dlpy.speech.Speech.transcribe

Speech.transcribe(audio_path, max_path_size=100, alpha=1.0, beta=0.0, gpu=None)

Transcribe the audio file into text.

Note that this API assumes the speech-to-text models published for SAS Viya 3.4 are used. Download the acoustic and language model files from: https://support.sas.com/documentation/prod-p/vdmml/zip/speech_19w21.zip
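As a convenience, the download and extraction step can be scripted. The following is a minimal sketch using only the Python standard library. The helper names `download_models` and `unpack_models` are hypothetical and not part of DLPy, and the layout of the files inside the archive is not assumed here; the helper simply returns whatever paths were extracted.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the note above.
MODEL_ZIP_URL = (
    "https://support.sas.com/documentation/prod-p/vdmml/zip/speech_19w21.zip"
)


def unpack_models(zip_path, dest_dir):
    """Extract a model archive and return the sorted paths of the extracted files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    # Skip directory entries; keep only regular files.
    return sorted(str(dest / name) for name in names if not name.endswith("/"))


def download_models(dest_dir, url=MODEL_ZIP_URL):
    """Download the archive into dest_dir, then extract it there (hypothetical helper)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    zip_path = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(zip_path))
    return unpack_models(zip_path, dest)
```

Calling `download_models("speech_models")` would fetch the archive and list its contents; inspect the returned paths to locate the acoustic and language model files.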

Parameters:

audio_path : string: Specifies the location of the audio file on the client side. The path can be absolute or relative.
max_path_size : int, optional: Specifies the maximum number of paths kept as candidates for the final result during the decoding process. Default = 100
alpha : double, optional: Specifies the weight of the language model, relative to the acoustic model. Default = 1.0
beta : double, optional: Specifies the weight of the sentence length, relative to the acoustic model. Default = 0.0
gpu : class, optional: When specified, the action uses Graphics Processing Unit (GPU) hardware. The simplest way to use GPU processing is to specify “gpu=1”, which enables all available GPU devices and uses the default values for the other GPU parameters. Setting gpu=0 disables GPU processing. Default = None
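The parameters above can be exercised with a small wrapper. This is a sketch only: `transcribe_with_options` is a hypothetical helper, not part of DLPy, and it assumes `speech` is an already-constructed `dlpy.speech.Speech` instance connected to a CAS session with the acoustic and language models loaded. The range check on `max_path_size` is an illustrative assumption, not a documented constraint.

```python
def transcribe_with_options(speech, audio_path, max_path_size=100,
                            alpha=1.0, beta=0.0, gpu=None):
    """Check the decoding options, then forward them to speech.transcribe.

    `speech` is assumed to be a dlpy.speech.Speech instance (or any object
    exposing a transcribe method with the signature documented above).
    """
    # Assumption: a candidate-path limit below 1 is not meaningful.
    if not isinstance(max_path_size, int) or max_path_size < 1:
        raise ValueError("max_path_size must be a positive integer")
    return speech.transcribe(
        audio_path,
        max_path_size=max_path_size,  # candidate paths kept while decoding
        alpha=float(alpha),           # language model weight
        beta=float(beta),             # sentence length weight
        gpu=gpu,                      # None keeps the default behavior
    )
```

For example, `transcribe_with_options(speech, "recording.wav", alpha=1.5)` gives the language model more weight relative to the acoustic model, where `recording.wav` is a placeholder file name.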

Returns: