Speech service overview
Oracle Cloud Infrastructure (OCI) Speech is an AI service that applies automatic speech recognition (ASR) technology to transform audio-based content into text. Developers can make API calls to integrate Speech’s pretrained models into their applications. Use the OCI Speech service for accurate, text-normalized, timestamped transcription via the Console, REST API, CLI, or SDKs. You can also use the Speech service in a Data Science notebook session. With Speech, you can filter profanities and get confidence scores for individual words or for the whole transcription.
1. Support for multiple languages: at GA, you can transcribe audio files in English, Spanish, or Portuguese.
2. The OCI Speech service is designed to integrate with existing customer solutions through the UI, REST APIs, SDK, and CLI. OCI Speech users can also take advantage of batching support and submit multiple files with one call.
3. Fast processing: transcribe hours of audio within single-digit minutes. The Speech service uses chunking to break your audio into small segments, transcribes each segment, and then joins the results into a single text file.
4. Text Normalization provides more readable text that resembles how humans write. For example, Speech would convert the audio “this laptop costs one thousand three hundred and fifty-five dollars” to “this laptop costs $1355”. Speech also normalizes addresses, times, numbers, URLs, and more.
5. Word Filtering (Profanity) – Speech can remove, mask, or tag profanity or other offensive text in the output text.
6. Job Canceling allows users to cancel a job after submitting it, as long as the job has not finished processing.
7. Confidence Score: provided per word and for the whole transcription.
8. Quick follow features: we will launch two additional features within a couple of weeks of the GA date:
· Punctuations – makes longer text more readable and allows downstream systems to process the text with less friction.
· SRT support – SRT is the most used closed caption (CC) output file format. With SRT support, users can add closed captions to their media, making it more accessible, or translate videos into other languages.
· Seamless integration: Speech is designed to integrate with existing customer solutions through the UI, REST APIs, SDK, and CLI. Users can also use Speech in Data Science notebooks.
· Security: Audio files are not retained after processing (unlike some other cloud providers).
· Zero ramp-up time: Speech pretrained models allow users to leverage automatic speech recognition (deep learning-based speech-to-text models) without any initial investment in data or model training.
· Batch processing: Customers working with larger volumes of data can transcribe audio files asynchronously in batches.
· Fully Managed Service: Customers don’t have to worry about choosing or managing the infrastructure used for model training and inference.
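To make two of the features above more concrete, here is a minimal Python sketch of the chunk-transcribe-join idea and of the three word-filtering behaviors (remove, mask, tag). It is an illustration only, not the Speech service's implementation or its API: the function names, the mode strings "REMOVE", "MASK", and "TAG", the tag markup, and the stand-in recognizer are all assumptions made for this example.

```python
from typing import Callable, List, Set


def split_into_chunks(samples: List[float], chunk_size: int) -> List[List[float]]:
    """Break a sequence of audio samples into fixed-size segments."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


def transcribe_in_chunks(
    samples: List[float],
    chunk_size: int,
    recognize: Callable[[List[float]], str],
) -> str:
    """Transcribe each segment separately, then join the pieces in order.

    `recognize` is a hypothetical stand-in for a real speech recognizer.
    """
    pieces = [recognize(chunk) for chunk in split_into_chunks(samples, chunk_size)]
    return " ".join(piece for piece in pieces if piece)


def filter_words(text: str, blocked: Set[str], mode: str) -> str:
    """Apply one of three illustrative filtering modes to blocked words."""
    if mode not in {"REMOVE", "MASK", "TAG"}:
        raise ValueError(f"unknown mode: {mode}")
    kept = []
    for word in text.split():
        if word.lower() not in blocked:
            kept.append(word)                       # ordinary word, keep as-is
        elif mode == "MASK":
            kept.append("*" * len(word))            # hide the word's characters
        elif mode == "TAG":
            kept.append(f"<profanity>{word}</profanity>")  # hypothetical markup
        # mode == "REMOVE": drop the word entirely
    return " ".join(kept)


if __name__ == "__main__":
    # Stand-in recognizer: reports only the segment length.
    fake_recognizer = lambda chunk: f"seg{len(chunk)}"
    print(transcribe_in_chunks([0.0] * 5, 2, fake_recognizer))
    print(filter_words("this darn laptop", {"darn"}, "MASK"))
```

A real recognizer would also have to handle words that straddle segment boundaries; this sketch ignores that and simply concatenates the per-segment text.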
How to access OCI Speech Service? Navigate to the Analytics & AI menu in the OCI console.
For regular updates, become a member of the Developer Partner Community; please register here.