Isn’t the Internet wonderful? Just this morning I was searching and wondering why there is no audio “automatic transcription” software for desktop PCs. This evening… Google’s Live Caption feature is now available on the desktop PC, via the Chrome browser. For free, running locally and offline, and without a Google login.
To enable real-time live subtitles (aka ‘closed captions’ or ‘live captioning’) as your audio or video plays back, first get the latest Chrome then go…
Advanced
Accessibility
Captions
> arrow icon
Live Caption
…and turn it on in Chrome. At this point a set of speech-definition files will be downloaded, to enable the real-time detection of what’s being said. While you’re waiting, set up the preferences for fonts and colours etc.
Those used to AI sets of 1GB or more will find that Live Caption’s files are downloaded in a few minutes, even on a slow connection. Other than the initial download of the definition files, the service works locally on the PC and without a Cloud connection. So far as I’m aware this is the first time such a free service has been available without a Cloud upload being needed, still less in real-time.
For this reason I would expect to see third-party UserScripts relatively soon, to enable the transcription to be easily captured into an editable text file as it plays. The playback / transcription continues to run, even when Chrome is not the focus of what you’re doing on the PC, which should help with scripted capture. Obviously if you want the whole thing you would have to let it play back first, to get a full transcription.
Can a recorded .MP3 be loaded and work, as well as a live stream? Yes, it works very well. A podcast with a 90-year-old guy on a smartphone, with kind of ok-ish voice quality… it handled that well. In real-time.
As you watch it, it occasionally goes back and auto-corrects and seems to be doing this based on word context. So I’m guessing it’s not just speech-to-text, but also text-to-text context tweaking. But it can’t work miracles: “gorilla campaign” rather than “guerrilla campaign” etc. And swearing does get f****** bleeped out with asterisks. It can’t detect different speakers. You can’t copy-paste. Still, it’s going to be very useful, especially if you just want a few paragraphs for a quote. Until we get a capture script, you can do things like screen grab with Microsoft OneNote, which handles small fonts fine and can make text from a screengrab very easily.
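If you go the screen-grab route for more than a paragraph, successive grabs will overlap, because the caption box scrolls rather than clearing. Here is a minimal Python sketch for stitching them together, assuming you have already pasted the text of each grab into a plain string. The function name `merge_snapshots` and the sample strings are my own, purely for illustration, and nothing here talks to Chrome or OneNote.

```python
def merge_snapshots(snapshots):
    """Join successive caption grabs, dropping words repeated at each seam."""
    transcript = []
    for snap in snapshots:
        words = snap.split()
        # Look for the longest run of words that both ends the transcript
        # so far and starts this grab; that run is the overlap to skip.
        overlap = 0
        for k in range(min(len(transcript), len(words)), 0, -1):
            if transcript[-k:] == words[:k]:
                overlap = k
                break
        transcript.extend(words[overlap:])
    return " ".join(transcript)


# Hypothetical grabs, each overlapping the one before it.
grabs = [
    "the quick brown fox",
    "brown fox jumps over",
    "over the lazy dog",
]
print(merge_snapshots(grabs))
```

It only matches exact word-for-word overlaps. If Live Caption went back and auto-corrected a word between two grabs, the seam won’t be found and you will get some duplicated text to tidy up by hand.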