Whisper language detection elapsed time

2025. 3. 11. 17:41

Whisper's language detection function needs 30 seconds of input data.

Shorter audio is padded to 30 seconds first, so the elapsed time is the same regardless of the input length, e.g. 1 second, 2 seconds, 20 seconds, or 30 seconds.
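To see why the input length does not matter, here is a minimal pure-Python sketch of what padding or trimming to a fixed 30-second window does. It assumes a 16 kHz sample rate, and `pad_or_trim_sketch` is a hypothetical stand-in written for illustration, not the library's own implementation.

```python
SAMPLE_RATE = 16000                      # assumed: samples per second
CHUNK_SECONDS = 30                       # fixed window length used for detection
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480000 samples


def pad_or_trim_sketch(samples, length=N_SAMPLES):
    """Hypothetical stand-in: zero-pad short audio, cut long audio."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


for seconds in (1, 2, 20, 30, 45):
    audio = [0.1] * (SAMPLE_RATE * seconds)  # dummy audio of the given duration
    fixed = pad_or_trim_sketch(audio)
    print(seconds, "s ->", len(fixed), "samples")
```

Every input ends up with the same number of samples, so the model does the same amount of work each time.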

In my case, it took 6 seconds on CPU and 0.3 seconds on GPU.
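If you want to measure this on your own machine, a small timing helper is enough. The sketch below uses only the standard library. `time_call` is a hypothetical helper name, and the workload is a placeholder rather than the real detection call.

```python
import time


def time_call(fn, repeats=3):
    """Run fn() several times and return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


# Placeholder workload; swap in a lambda wrapping the real detection call.
seconds, value = time_call(lambda: sum(range(100_000)))
print(f"best of 3: {seconds:.4f} s")
```

With the `model` and `mel` objects from the code in this post, the call would presumably look like `time_call(lambda: model.detect_language(mel))`. Taking the best of several runs helps keep one-time warm-up costs out of the number.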

The code for getting the language code is taken from https://github.com/openai/whisper

Pay attention to the line 'whisper.pad_or_trim(audio)'.

If you comment out that line, it will cause an error unless the audio happens to be exactly 30 seconds long.
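A likely reason for the error is that the model expects a fixed number of spectrogram frames. The arithmetic can be sketched as follows, assuming a 16 kHz sample rate, a hop of 160 samples between frames, and 3000 frames per 30-second window. These constants are my assumptions about the library's defaults, and `mel_frames` is a hypothetical helper.

```python
SAMPLE_RATE = 16000     # assumed: samples per second
HOP_LENGTH = 160        # assumed: samples between spectrogram frames
EXPECTED_FRAMES = 3000  # assumed: frames in one 30-second window


def mel_frames(seconds):
    """Hypothetical helper: spectrogram frames for audio of this duration."""
    return (SAMPLE_RATE * seconds) // HOP_LENGTH


for seconds in (1, 20, 30):
    frames = mel_frames(seconds)
    status = "ok" if frames == EXPECTED_FRAMES else "shape mismatch"
    print(f"{seconds:>2} s -> {frames} frames ({status})")
```

Only the 30-second input yields the expected frame count, which is why the audio has to be padded or trimmed before the spectrogram is computed.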

import whisper

model = whisper.load_model("turbo")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)

Attribution-NonCommercial-ShareAlike
