Speech Recognition Model API

About The Project

Our speech-to-text interface lets you accurately convert speech into text through an API powered by deep neural networks for automatic speech recognition (ASR).

It provides the following features:

Speech-to-text transcription for a growing list of Indic languages.
Real-time transcription of content from stored files or audio bytes.
Subtitles or transcripts for your audio, depending on the output format you choose.
Support for audio formats such as WAV, MP3, and PCM.
Accurate punctuation of transcriptions (e.g., commas, question marks, and periods).
[beta] Transcription optimized for domain-specific quality requirements, using domain-specific models in the backend.
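If your audio is raw PCM and you would rather send a self-describing WAV file, the Python standard library's `wave` module can add the header in memory. This is a minimal sketch: the 16 kHz, mono, 16-bit defaults are assumptions, not documented requirements of this API, so match them to what your deployment expects.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in an in-memory WAV container.

    The defaults (16 kHz, mono, 16-bit) are assumptions, not documented
    requirements of the speech recognition API.
    """
    frame_size = channels * sample_width
    if len(pcm) % frame_size != 0:
        raise ValueError("PCM length is not a whole number of frames")
    buffer = io.BytesIO()
    # wave.open does not close a file object it did not open itself,
    # so the buffer is still readable after the with-block.
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

The returned bytes start with a standard RIFF/WAVE header followed by the unchanged samples.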

This developer documentation provides the guidelines you need to get started:

Architecture Overview
API reference
Developer Documentation
Contribute to this project
Tutorials Reference

Architecture Overview

ASR Architecture

The logical architecture is built around a gRPC server that hosts our speech recognition models and their dependencies; the server can run in any environment, including a Docker container. With gRPC, a service is defined once in a .proto file, and clients and servers can then be generated in any of gRPC's supported languages. These can run in environments ranging from servers inside a large data center to your own tablet, and gRPC handles the communication between different languages and environments for you. We also get the advantages of Protocol Buffers, including efficient serialization, a simple IDL, and easy interface updates. For more background, see the gRPC documentation.

In addition to gRPC stubs, we support REST calls to the gRPC server through Envoy acting as an API gateway. Once you specify the mapping rules, Envoy translates RESTful JSON over HTTP into gRPC requests. This means you can deploy the gRPC server behind Envoy and call its API with either a gRPC client or a JSON/HTTP client, which makes integration with other systems easier.
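A minimal sketch of the JSON/HTTP path, using only the Python standard library. It POSTs a JSON body to a URL exposed by Envoy and decodes the JSON reply. The URL is a placeholder: the real routes depend on the mapping rules configured for your deployment.

```python
import json
import urllib.request


def post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST `payload` as JSON and return the decoded JSON response.

    `url` is a placeholder for whatever route Envoy maps to the gRPC
    method you want to call; this sketch assumes no particular path.
    """
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4XX/5XX status codes.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The same request could equally be made with any other HTTP client; the point is that the caller only deals with JSON, while Envoy handles the gRPC side.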

API reference

Our API has predictable resource-oriented URLs, accepts JSON request bodies, returns JSON-encoded responses, and uses standard HTTP response codes and verbs.

Supported Endpoints

Endpoint	Purpose
recognize_audio	Streaming endpoint.
punctuate	Punctuates a given text.
recognize	Inference from an audio URL or audio bytes.

More details are available here.
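Because `recognize` accepts either an audio URL or audio bytes, a client has to send exactly one of the two. The sketch below builds such a JSON body. The field names (`config`, `language`, `audio`, `audioUri`, `audioContent`) and the language code are hypothetical placeholders; take the real ones from the service's `.proto` definitions. One detail that does hold generally: in the proto3 JSON mapping used by gRPC-JSON transcoding, `bytes` fields are carried as base64 strings.

```python
import base64
from typing import Optional


def build_recognize_payload(language: str,
                            audio_uri: Optional[str] = None,
                            audio_bytes: Optional[bytes] = None) -> dict:
    """Build a JSON body for the `recognize` endpoint.

    HYPOTHETICAL field names: "config", "language", "audio",
    "audioUri", "audioContent". Check the service's .proto file
    for the actual message definitions.
    """
    if (audio_uri is None) == (audio_bytes is None):
        raise ValueError("provide exactly one of audio_uri or audio_bytes")
    if audio_uri is not None:
        audio = {"audioUri": audio_uri}
    else:
        # proto3 JSON represents `bytes` fields as base64-encoded strings.
        audio = {"audioContent": base64.b64encode(audio_bytes).decode("ascii")}
    return {"config": {"language": language}, "audio": audio}
```

The resulting dictionary is meant to be serialized as the JSON body of a POST to whichever route Envoy maps to `recognize`.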

Authentication and Authorization

The API has no built-in authentication or authorization layer, so you can add whichever mechanism suits your deployment.
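As one example of such a mechanism, Envoy's `ext_authz` filter can delegate each request to an external HTTP authorization service and allow the request only when that service answers 200. The sketch below is a hypothetical WSGI app for that role. It assumes clients send an `Authorization: Bearer <token>` header and that Envoy forwards it; the token set is a placeholder.

```python
import hmac

# Placeholder token set; in practice load tokens from a secret store.
VALID_TOKENS = {"replace-me"}


def authz_app(environ, start_response):
    """Minimal bearer-token check for use as an ext_authz HTTP service.

    Returns 200 for a known token, 401 when no bearer token is present,
    and 403 for an unknown token.
    """
    header = environ.get("HTTP_AUTHORIZATION", "")
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        status = "401 Unauthorized"
    else:
        # WSGI exposes header values as latin-1 decoded str; compare as
        # bytes in constant time to avoid leaking token prefixes.
        presented = token.encode("latin-1")
        known = any(hmac.compare_digest(presented, t.encode("latin-1"))
                    for t in VALID_TOKENS)
        status = "200 OK" if known else "403 Forbidden"
    start_response(status, [("Content-Type", "text/plain")])
    return [status.encode("ascii")]
```

For local testing it can be served with `wsgiref.simple_server.make_server("127.0.0.1", 9000, authz_app).serve_forever()`; the port and the Envoy configuration that points at the service are up to you.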

Errors

Our API uses HTTP response codes to indicate the success or failure of an API request.

Status	Meaning	Description
200	OK	Everything worked as expected.
400	Bad Request	The request was unacceptable, often because a required parameter was missing.
401	Unauthorized	No valid credentials were provided (relevant only if you add an authentication layer).
402	Request Failed	The parameters were valid, but the request failed.
403	Forbidden	The credentials provided do not have permission to perform the request.
404	Not Found	The requested resource does not exist.
409	Conflict	The request conflicts with another request (perhaps due to using the same idempotency key).
429	Too Many Requests	Too many requests hit the API too quickly. We recommend exponential backoff when retrying.
5XX	Server Errors	Something went wrong on the server's end. (These are rare.)
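For 429 (and transient 5XX) responses, the exponential backoff recommended above can be sketched as follows. `RetryableError` is a hypothetical exception your client would raise when it sees one of those status codes, and the attempt count and delays are illustrative defaults. The sketch uses "full jitter" (sleeping a random time up to the current cap) so that many clients retrying at once do not hit the server in lockstep.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error type: raise it for HTTP 429 or 5XX responses."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 0.5, max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The cap doubles each attempt: base, 2*base, 4*base, ...
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.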

Developer Documentation

The API and all of its dependent components are open source.

API Realtime Inferencing

Tutorials Reference

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Git repository

https://github.com/Open-Speech-EkStep/speech-recognition-open-api.git

Contact

Connect with the community on Gitter

Project Link: https://github.com/Open-Speech-EkStep/speech-recognition-open-api.git