dlpy.speech.Speech.transcribe

Speech.transcribe(audio_path, max_path_size=100, alpha=1.0, beta=0.0, gpu=None)

Transcribe the audio file into text.

This method assumes that the speech-to-text models published by SAS Viya 3.4 are used. Download the acoustic and language model files from: https://support.sas.com/documentation/prod-p/vdmml/zip/speech_19w21.zip

Parameters
audio_path : string

Specifies the location of the audio file on the client side. The path can be absolute or relative.

max_path_size : int, optional

Specifies the maximum number of paths kept as candidates of the final results during the decoding process. Default = 100

alpha : double, optional

Specifies the weight of the language model, relative to the acoustic model. Default = 1.0

beta : double, optional

Specifies the weight of the sentence length, relative to the acoustic model. Default = 0.0

gpu : class, optional

When specified, the action uses graphics processing unit (GPU) hardware. The simplest way to enable GPU processing is to specify gpu=1, which uses the default values of the other GPU parameters and enables all available GPU devices. Specifying gpu=0 disables GPU processing.
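
To build intuition for max_path_size, alpha, and beta, the sketch below shows one common way that beam-search decoders combine these quantities. It is an assumption for illustration only: this page does not state the formula that the action uses, and the names combined_score and prune_paths are hypothetical helpers, not part of DLPy.

```python
import heapq


def combined_score(acoustic_logp, lm_logp, num_words, alpha=1.0, beta=0.0):
    """Score one candidate path.

    Assumed rule (not confirmed by this documentation):
    acoustic log-probability, plus the language model log-probability
    weighted by alpha, plus the word count weighted by beta.
    """
    return acoustic_logp + alpha * lm_logp + beta * num_words


def prune_paths(candidates, max_path_size=100, alpha=1.0, beta=0.0):
    """Keep the max_path_size best candidates, best first.

    Each candidate is a tuple (text, acoustic_logp, lm_logp).
    """
    return heapq.nlargest(
        max_path_size,
        candidates,
        key=lambda c: combined_score(c[1], c[2], len(c[0].split()), alpha, beta),
    )
```

Under this assumed rule, a larger alpha favors word sequences that the language model considers likely, a positive beta favors longer transcriptions, and a smaller max_path_size discards more candidates at each step, trading accuracy for speed.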

Returns
string

Transcribed text from audio file located at ‘audio_path’.
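
Examples

A minimal sketch of calling transcribe on several files with the same decoding options. It assumes that speech is an already constructed dlpy.speech.Speech instance with the acoustic and language models loaded; the helper name transcribe_many and the file names in the usage note are hypothetical.

```python
def transcribe_many(speech, audio_paths, max_path_size=100, alpha=1.0,
                    beta=0.0, gpu=None):
    """Transcribe each file in audio_paths and return {path: text}.

    speech is assumed to expose the transcribe method documented above.
    The keyword arguments are passed through unchanged.
    """
    results = {}
    for path in audio_paths:
        results[path] = speech.transcribe(
            path,
            max_path_size=max_path_size,
            alpha=alpha,
            beta=beta,
            gpu=gpu,
        )
    return results
```

For example, transcribe_many(speech, ["clip1.wav", "clip2.wav"], alpha=0.8) would call speech.transcribe once per file and collect the returned strings in a dictionary keyed by path.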