

This is one of the actually decent uses of this model. I have used Whisper to transcribe phone calls, and just the other week I had to export the audio from a video I was working on and run Whisper on it to get subtitles. It’s still not a set-it-and-forget-it solution, but correcting its small mistakes here and there is so much faster than manually transcribing the audio.
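For anyone wanting to try the same thing, here is a rough sketch of that export-then-transcribe workflow. It only builds the two command lines, so it runs even without the tools installed. The file names, the 16 kHz mono settings, and the `small` model size are my own assumptions, and it presumes the `ffmpeg` binary and the `whisper` command from the openai-whisper package are on your PATH.

```python
import shlex


def extract_audio_cmd(video: str, wav: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream (-vn)
    and writes mono 16 kHz audio (assumed settings, not required)."""
    return ["ffmpeg", "-i", video, "-vn", "-ac", "1", "-ar", "16000", wav]


def whisper_cmd(wav: str, model: str = "small", out_dir: str = ".") -> list[str]:
    """Build an openai-whisper CLI command that writes an .srt subtitle file."""
    return [
        "whisper", wav,
        "--model", model,
        "--output_format", "srt",
        "--output_dir", out_dir,
    ]


if __name__ == "__main__":
    # Hypothetical file names, purely for illustration.
    for cmd in (extract_audio_cmd("clip.mp4", "clip.wav"), whisper_cmd("clip.wav")):
        # Pass `cmd` to subprocess.run(cmd, check=True) to actually execute it.
        print(shlex.join(cmd))
```

The resulting `.srt` still needs a read-through for the small mistakes mentioned above.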
Given how modular ffmpeg is, with everything selected through switches, a user never has to interact with that portion of the application. I can technically use ffmpeg to transcode an mp3 without ever touching the video components.
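To make that concrete, here is a small sketch of an audio-only invocation. The helper name, file names, and the 192k bitrate are assumptions for illustration, and the `libmp3lame` encoder has to be included in your ffmpeg build for the command to work.

```python
def audio_only_transcode_cmd(src: str, dst: str, bitrate: str = "192k") -> list[str]:
    """Build an ffmpeg command that re-encodes audio to MP3.

    -vn tells ffmpeg to ignore any video stream, so no video
    codec or filter is involved in the job at all.
    """
    return ["ffmpeg", "-i", src, "-vn", "-c:a", "libmp3lame", "-b:a", bitrate, dst]


if __name__ == "__main__":
    # Hypothetical input and output names.
    print(" ".join(audio_only_transcode_cmd("input.flac", "output.mp3")))
```

Nothing in that argument list refers to a video option other than the switch that turns video off.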
Have a look at HeliBoard. It’s open source. To get swipe typing you have to import a component extracted from Gboard that doesn’t come with the app, but it can be acquired from… places.