Speech-to-text technology isn’t new in 2023, but a sudden leap in quality means something exciting is happening! With the September 2022 launch of OpenAI’s Whisper, an open-source speech recognition model, the bar has been set high.
So, how does it compare to the seasoned player AWS Transcribe? Let’s dive in and find out which is the ultimate choice for your project.
The Tech Behind Whisper
OpenAI generally plays it close to the chest and rarely open-sources its models. But with Whisper, they broke the mold! This gem has been trained on a whopping 680,000 hours of data, including low-quality recordings, to maximise accuracy.
Whisper’s Many Models
Whisper comes in multiple models:
- Tiny
- Base
- Small
- Medium
- Large
- Large V2
Whisper is open-source, which means you can install it on your own server. However, if you go for a Large model, which is the most accurate, you need at least 10 GB of VRAM. It is also slower than the Tiny model.
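Here is a minimal sketch of what running Whisper on your own machine might look like, assuming the `openai-whisper` Python package and ffmpeg are installed. The `pick_model` helper is a hypothetical convenience of ours, not part of Whisper. Only the 10 GB figure for the Large models comes from this post; the other VRAM numbers are the approximate values given in Whisper's README, so treat them as rough guidance.

```python
# Approximate VRAM needs in GB, largest model first.
# Assumption: figures other than the 10 GB for Large are rough README values.
APPROX_VRAM_GB = [
    ("large-v2", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_model(available_vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits."""
    for name, needed_gb in APPROX_VRAM_GB:
        if available_vram_gb >= needed_gb:
            return name
    raise ValueError("not enough VRAM for any Whisper model")


def transcribe_locally(audio_path: str, available_vram_gb: float) -> str:
    """Load a model that fits the given VRAM and transcribe one file."""
    # Imported here so pick_model works even without the package installed.
    import whisper  # pip install -U openai-whisper

    model = whisper.load_model(pick_model(available_vram_gb))
    result = model.transcribe(audio_path)
    return result["text"]
```

For example, `transcribe_locally("interview.mp3", 6)` would load the Medium model; the file name is just a placeholder.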
But that’s not all: OpenAI recently announced an API for Whisper. So if your usage isn’t heavy, the API will be the more affordable option for you.
Whisper understands an incredible 97 languages and can even translate speech into English. Choose your desired language, and Whisper will handle the rest.
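If you would rather use the hosted API, a call might look roughly like the sketch below. It assumes the official `openai` Python package (version 1 or later) and an `OPENAI_API_KEY` environment variable. The `request_options` helper and both function names are ours, made up for illustration. As far as we know, the translation endpoint only translates into English.

```python
from typing import Optional


def request_options(language: Optional[str] = None) -> dict:
    """Build keyword arguments for a transcription request."""
    options = {"model": "whisper-1"}
    if language:
        # Language of the audio as an ISO-639-1 code, e.g. "fr".
        options["language"] = language
    return options


def transcribe_via_api(audio_path: str, language: Optional[str] = None) -> str:
    """Send one audio file to the hosted Whisper API and return its text."""
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    with open(audio_path, "rb") as audio_file:
        response = client.audio.transcriptions.create(
            file=audio_file, **request_options(language)
        )
    return response.text


def translate_to_english_via_api(audio_path: str) -> str:
    """Transcribe non-English audio directly into English text."""
    from openai import OpenAI

    client = OpenAI()
    with open(audio_path, "rb") as audio_file:
        response = client.audio.translations.create(
            model="whisper-1", file=audio_file
        )
    return response.text
```

Passing the language is optional. If you leave it out, the model tries to detect it on its own.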
If you want to try a demo of Whisper, you can visit listenmonster. They currently use the Large V2 model. It is a transcription and subtitle tool for internet creators.
Listenmonster functions as a complementary tool to Speechactors, providing an integrated experience that significantly enhances creative possibilities when utilized jointly.
What’s Cooking with AWS Transcribe?
AWS Transcribe is an AI speech-to-text service that is available through an API. It is a closed model, so nobody outside AWS knows what is going on behind the scenes.
It’s API-only, meaning there’s less flexibility to reduce costs as you scale up. AWS Transcribe covers 40 languages, some of which, such as English and Spanish, come in multiple accents.
AWS Transcribe was launched in 2017. Unlike Whisper, it does not offer translation. If you need it, you have to use AWS’s separate translation API, which simply means more cost.
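To give a feel for the two-step flow, here is a rough sketch using `boto3`, assuming your AWS credentials are already configured and the audio file sits in S3. We assume the separate translation API is Amazon Translate. The job name, bucket path, and helper names are hypothetical.

```python
def build_job_request(job_name: str, media_uri: str,
                      language_code: str = "en-US") -> dict:
    """Keyword arguments for Transcribe's start_transcription_job call."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        # Accent variants are picked via the code, e.g. "en-GB" or "es-US".
        "LanguageCode": language_code,
    }


def start_job(job_name: str, media_uri: str, language_code: str = "en-US"):
    """Kick off an asynchronous transcription job."""
    import boto3  # pip install boto3

    transcribe = boto3.client("transcribe")
    return transcribe.start_transcription_job(
        **build_job_request(job_name, media_uri, language_code)
    )


def translate_transcript(text: str, source_lang: str, target_lang: str) -> str:
    """Second, separately billed step: translate the finished transcript."""
    import boto3

    translate = boto3.client("translate")
    response = translate.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
    )
    return response["TranslatedText"]
```

Transcription jobs run asynchronously, so in practice you would poll `get_transcription_job` until the job finishes and download the transcript before handing the text to `translate_transcript`.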
If you want to test it without writing any code, you can sign up for AWS and search for AWS Transcribe to try the service.
The Head-to-Head Comparison
Accuracy
When it comes to turning spoken words into text, you want the most accurate tool. In tests, Whisper comes out on top compared to AWS Transcribe. This isn’t just for English; it’s the same for other languages too.
Whisper was trained using not-so-great audio on purpose, just to make sure it works well even when conditions aren’t ideal. Want to see for yourself? Try out AWS Transcribe by making an account and then compare it with Whisper by going to listenmonster.
Languages Supported
Whisper speaks 97 languages, but with no accent variants to choose from. You can check all the languages listed here.
AWS Transcribe supports only 40 languages, though some of them, such as English and French, come in multiple accents. You can check the full list here.
In simple words, Whisper is the clear winner here once again.
Pricing
AWS Transcribe’s API follows a pay-as-you-go model:
- First 250,000 minutes: $0.02400/min
- Next 750,000 minutes: $0.01500/min
- Over 5,000,000 minutes: $0.00780/min
Whisper, on the other hand, offers both an API and an open-source option. API costs start at $0.006/min, and some third-party services even offer rates as low as $0.001445/min.
If you are transcribing at a large scale, you can reduce the cost per minute even further by installing Whisper on your own server.
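To put the prices side by side, here is a small back-of-the-envelope calculator built only from the rates quoted in this post. The function names are ours. Since no rate is listed above for usage between 1,000,000 and 5,000,000 minutes, the sketch stops at 1,000,000 minutes instead of guessing.

```python
# (tier size in minutes, price per minute), as quoted in this post.
AWS_TIERS = [
    (250_000, 0.024),
    (750_000, 0.015),
]
AWS_LISTED_LIMIT = 1_000_000  # no rate is quoted for the range right after this
WHISPER_API_RATE = 0.006      # dollars per minute


def aws_transcribe_cost(minutes: float) -> float:
    """Tiered cost in dollars for up to 1,000,000 minutes."""
    if minutes < 0 or minutes > AWS_LISTED_LIMIT:
        raise ValueError("only 0 to 1,000,000 minutes are covered here")
    remaining = minutes
    total = 0.0
    for tier_size, rate in AWS_TIERS:
        used = min(remaining, tier_size)
        total += used * rate
        remaining -= used
    return round(total, 2)


def whisper_api_cost(minutes: float) -> float:
    """Flat-rate cost in dollars for the Whisper API."""
    return round(minutes * WHISPER_API_RATE, 2)
```

At 300,000 minutes, that works out to $6,750 on AWS Transcribe (250,000 minutes at $0.024 plus 50,000 at $0.015) against $1,800 on the Whisper API.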
The Winner
Whisper is like the Swiss Army knife of audio transcription. It’s affordable, accurate, and chock-full of features. Whether you opt for its API or install the model on your server, you’re in for some substantial savings.
AWS Transcribe is a closed model, and AWS has more customers. They might improve their transcription in the future if they see a potential threat from OpenAI’s Whisper.
However, at the time of writing this blog post in 2023, Whisper clearly crushes AWS Transcribe in every aspect.