Whisper UI - AI Audio Transcribe is a powerful and innovative app that lets you convert any audio file into text or subtitles in seconds. Whether you need to transcribe an interview, a lecture, a podcast, or a video, Whisper UI can handle it all with ease and accuracy.
Whisper UI is more than just a transcription app. It runs fully offline, using OpenAI Whisper, a state-of-the-art speech recognition model, to transcribe audio on your own computer. You don’t need an internet connection, and your data is never sent to a remote server. You get fast and secure transcription of your audio files without compromising on quality or privacy.
With Whisper UI, you can:
- Transcribe audio from a wide range of audio and video formats, including MP4, MOV, MKV, AVI, MJPEG, MPEG, F4V, FLV, M2T, M2TS, M2V, 3GP, 3G2, MP3, WAV, OGG, FLAC, M4A, M4V, AIFF
- Record and transcribe audio directly from your computer’s microphone or any audio input device
- Select the input audio language and output text language
- Translate audio from 57 different languages into English
- Specify the source language from any of the 57 supported languages
- Generate subtitles in various formats, including .srt, .ass, .vtt, .ssa, and .lrc
- Download the generated text or subtitle file
- Edit or correct the transcription within the app
- Install as a background service so that Scorpio Player can use live transcription and display subtitles using your own computer’s processing power
- Customize the app’s appearance with Mica, Mica Alt, Acrylic, or Dynamic Shader Animation backgrounds
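As an illustration of the subtitle export listed above, the following sketch shows how timed transcript segments could be written in the .srt layout (a running index, a start and end timestamp, then the text). It is not Whisper UI’s actual exporter: the function names and the (start, end, text) segment tuples are assumptions made for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    The tuple layout is a hypothetical input format chosen for this sketch.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    example = [(0.0, 2.5, "Hello and welcome."), (2.5, 4.0, "Let's begin.")]
    print(segments_to_srt(example))
```

The other subtitle formats differ mainly in their timestamp syntax and header lines, so the same segment list could feed a separate writer for each format.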
Whisper UI is the ultimate app for anyone who works with audio content. It saves you time and effort by producing accurate, editable transcriptions in minutes. It also helps you communicate and collaborate with people across languages and cultures by translating audio with a single tap.
Available models and languages
There are five model sizes, four of which also have English-only versions, offering tradeoffs between speed and accuracy. Below are the names of the available models with their approximate disk size, memory requirements, and SHA checksum. Actual transcription speed depends on many factors, including the available hardware.
Model    Disk     Memory    SHA
tiny     75 MB    ~125 MB   bd577a113a864445d4c299885e0cb97d4ba92b5f
base     142 MB   ~210 MB   465707469ff3a37a2b9b8d8f89f2f99de7299dac
small    466 MB   ~600 MB   55356645c2b361a969dfd0ef2c5a50d530afd8d5
medium   1.5 GB   ~1.7 GB   fd9727b6e1217c2f614f9b698455c4ffd82463b4
large    2.9 GB   ~3.3 GB   ad82bf6a9043ceed055076d0fd39f5f186ff8062
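The table above can be used in two ways: to choose the largest model that fits in the memory you have free, and to check that a model file is intact. The sketch below shows both. It rests on assumptions that the listing does not confirm: that the 40-character SHA values are SHA-1 digests of the model files, and that 1 GB is rounded to 1000 MB. The function names and any model file path you pass in are hypothetical.

```python
import hashlib

# Values copied from the table above: (disk MB, approx. memory MB, SHA).
# GB figures are converted assuming 1 GB = 1000 MB (an approximation).
MODELS = {
    "tiny": (75, 125, "bd577a113a864445d4c299885e0cb97d4ba92b5f"),
    "base": (142, 210, "465707469ff3a37a2b9b8d8f89f2f99de7299dac"),
    "small": (466, 600, "55356645c2b361a969dfd0ef2c5a50d530afd8d5"),
    "medium": (1500, 1700, "fd9727b6e1217c2f614f9b698455c4ffd82463b4"),
    "large": (2900, 3300, "ad82bf6a9043ceed055076d0fd39f5f186ff8062"),
}


def pick_largest_model(available_mb: float):
    """Return the largest model whose memory estimate fits, or None."""
    best = None
    for name, (_disk, mem, _sha) in MODELS.items():  # ordered small to large
        if mem <= available_mb:
            best = name
    return best


def file_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files are not loaded at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def matches_table(path: str, model: str) -> bool:
    """Compare a file's digest with the table entry (assumes SHA-1)."""
    return file_sha1(path) == MODELS[model][2]
```

For example, `pick_largest_model(2000)` selects `medium`, since its estimated ~1.7 GB fits while the ~3.3 GB needed by `large` does not. A mismatch reported by `matches_table` would suggest an incomplete or corrupted download, provided the SHA-1 assumption holds.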