Whisper's language detection works on a fixed 30-second window of audio. Shorter input is padded and longer input is trimmed to 30 seconds, so detection takes about the same time regardless of the input length, e.g. 1, 2, 20, or 30 seconds.
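To see why the cost does not depend on the clip length, here is a simplified, list-based sketch of what padding/trimming does. `pad_or_trim_sketch` is a hypothetical stand-in written for illustration, not Whisper's own function (the real `whisper.pad_or_trim` works on NumPy arrays and PyTorch tensors). The constants assume Whisper's defaults of 16 kHz audio and 30-second windows.

```python
# Assumed to match openai-whisper's audio defaults
SAMPLE_RATE = 16000                      # Whisper resamples audio to 16 kHz
CHUNK_LENGTH = 30                        # seconds per input window
N_SAMPLES = SAMPLE_RATE * CHUNK_LENGTH   # 480000 samples


def pad_or_trim_sketch(samples, length=N_SAMPLES):
    """Hypothetical, simplified stand-in for whisper.pad_or_trim.

    Inputs longer than `length` are cut; shorter ones are right-padded with zeros.
    """
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


for seconds in (1, 2, 20, 30, 45):
    clip = [0.1] * (seconds * SAMPLE_RATE)  # dummy audio of the given duration
    print(seconds, "s ->", len(pad_or_trim_sketch(clip)), "samples")
```

Every clip ends up with the same number of samples, so the model does the same amount of work on a 1-second clip as on a 30-second one.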
In my case, it took 6 seconds in CPU mode and 0.3 seconds in GPU mode.
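If you want to reproduce measurements like these, a small timing helper is enough. This is a generic sketch, not part of Whisper: `time_call` is a hypothetical name, and you would pass it something like `model.detect_language` and `mel` from the script that follows. Warm-up calls are excluded so first-call overhead does not skew the result. On the GPU, PyTorch launches kernels asynchronously, so when timing other steps you may need to call `torch.cuda.synchronize()` before reading the clock.

```python
import time
from statistics import median


def time_call(fn, *args, repeats=5, warmup=1):
    """Return the median wall-clock time (seconds) of fn(*args).

    Warm-up calls run first and are not measured.
    """
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return median(timings)


# Hypothetical usage with the objects from the Whisper script:
# print(time_call(model.detect_language, mel))
```

The median is used instead of the mean so a single slow run does not dominate the reported time.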
The language detection code below is taken from https://github.com/openai/whisper.
Pay attention to the line 'whisper.pad_or_trim(audio)'.
If you comment it out, the code raises an error, because the model expects a log-Mel spectrogram covering exactly 30 seconds of audio.
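The error is a shape mismatch: the audio encoder only accepts a spectrogram with the frame count of a 30-second window. The sketch below illustrates the arithmetic, assuming Whisper's defaults (16 kHz audio, a hop of 160 samples, so 3000 frames per 30 seconds). `mel_frames` and `check_encoder_input` are hypothetical helpers that mimic the check; they are not Whisper functions.

```python
SAMPLE_RATE = 16000
HOP_LENGTH = 160   # samples between mel frames (10 ms at 16 kHz), assumed default
N_FRAMES = 3000    # frames for a 30 s window: 30 * 16000 / 160


def mel_frames(n_samples):
    """Frames produced for n_samples of audio.

    The STFT yields n_samples // HOP_LENGTH + 1 frames and the last one
    is dropped, leaving n_samples // HOP_LENGTH.
    """
    return n_samples // HOP_LENGTH


def check_encoder_input(n_frames):
    """Simplified stand-in for the encoder's input-shape check."""
    if n_frames != N_FRAMES:
        raise ValueError(
            f"incorrect audio shape: got {n_frames} frames, expected {N_FRAMES}"
        )


check_encoder_input(mel_frames(30 * SAMPLE_RATE))      # 3000 frames: accepted
try:
    check_encoder_input(mel_frames(10 * SAMPLE_RATE))  # 1000 frames: rejected
except ValueError as err:
    print(err)
```

Padding or trimming the raw audio first guarantees the 3000-frame length, which is why removing that line breaks the pipeline.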
```python
import whisper
model = whisper.load_model("turbo")
# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
# print the recognized text
print(result.text)
```
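In the script above, `probs` is a dictionary mapping language codes to probabilities, and `max(probs, key=probs.get)` keeps only the single best guess. For a very short or ambiguous clip it can help to look at the runner-up languages too. A minimal sketch, where `top_languages` is a hypothetical helper and the probabilities are made up for illustration:

```python
def top_languages(probs, k=3):
    """Return the k most likely (language_code, probability) pairs.

    `probs` is assumed to be the dict returned as the second value of
    model.detect_language(mel) for a single spectrogram.
    """
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up probabilities, for illustration only
example_probs = {"en": 0.62, "ko": 0.30, "ja": 0.05, "de": 0.03}
print(top_languages(example_probs, k=2))
```

Sorting the whole dictionary is fine for Whisper's roughly one hundred languages; `heapq.nlargest` would be an alternative for much larger inputs.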