I know C++ and PHP, I know OOP and usages of Database technologies. I have to ma
Question
I know C++ and PHP, and I am familiar with OOP and database technologies. I need to build speech recognition software for my nation's language, whose symbols are unique but supported by UTF-8; so far no software company has taken the initiative to do so. I need to know which programming language would be best suited to this and which courses I should take to learn the process. I don't want to process the language via SAPI or built-in recognition technologies, as they are based on English (the problem is that the grammar and syntax are so different; it is Indo-European based). I want to build it from scratch, at the machine / voice-processing level: the processed sound should be parsed directly into my symbols, with no transformation through English. I hope you will understand, because I am looking forward to this; it is my nation's requirement. This is not meant to promote any programming language or course; I just need to know. (If my question does not fit here, please tell me where it fits best and be kind enough to move it to that forum. I have had a bitter experience with this.)
Explanation / Answer
Adding support for a new language is fairly straightforward: you mainly need to follow the documentation. You also need to know a scripting language, which will help you cut manual work on some steps. Unix command-line experience is a big plus, though you can work on Windows too.
1) Read the Introduction to become familiar with the concepts of speech recognition: features, acoustic models, language models, etc.
2) Try CMUSphinx with the US English model to understand how things work. Try training with the sample US English AN4 database, following the acoustic model training tutorial.
3) Read about your language on Wikipedia.
4) Collect a set of transcribed recordings for your language: podcasts, radio shows, audiobooks. You can also record some initial amount yourself. You need about 20 hours of transcribed data to start and 100 hours to create a good model.
5) Based on the data you collected, create a list of words and a phonetic dictionary. Most phonetic dictionaries can be created from simple rules with a small script in your favorite scripting language, such as Python. See Generating a dictionary for details.
6) Segment the audio into short sentences, manually or with the sphinx4 aligner, and create a database with the required files as described in the training tutorial.
7) Integrate the new model into your application and design a data collection process to improve your model.
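To give an idea of what the "small script" in step 5 might look like, here is a minimal Python sketch of a rule-based dictionary generator. The rule table `G2P_RULES` and the function names are hypothetical and use toy Latin letters only for illustration; you would replace the table with the real grapheme-to-phone rules of your language (UTF-8 strings work the same way). The output is one "word PHONE PHONE ..." line per word, which is the general shape of a phonetic dictionary; check the dictionary documentation for the exact format it expects.

```python
# Hypothetical grapheme-to-phone rules; replace with the rules of your language.
# Multi-character graphemes are allowed and are matched before shorter ones.
G2P_RULES = {
    "sh": "SH",
    "ch": "CH",
    "a": "AA",
    "b": "B",
    "i": "IY",
    "k": "K",
    "n": "N",
    "s": "S",
    "t": "T",
}


def to_phones(word, rules=G2P_RULES):
    """Convert a word to a list of phones by greedy longest-match lookup."""
    max_len = max(len(grapheme) for grapheme in rules)
    phones = []
    i = 0
    while i < len(word):
        # Try the longest possible grapheme first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phones.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched: fail loudly so the rule table can be extended.
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones


def build_dictionary(words, rules=G2P_RULES):
    """Return sorted 'word PHONE PHONE ...' lines, one per unique word."""
    return [f"{w} {' '.join(to_phones(w, rules))}" for w in sorted(set(words))]
```

Calling `build_dictionary` on the word list extracted from your transcripts and writing the returned lines to a UTF-8 text file gives a first draft of the dictionary. Words that raise `ValueError` show where the rule table is still incomplete, and irregular words can be corrected by hand afterwards.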
If you have questions, feel free to ask on CMU Sphinx / Forums.
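For step 6, most of the "required files" are plain-text lists that are easy to generate with a script. The sketch below assumes the layout I recall from the acoustic model training tutorial: a file-id list with one audio path per line (without extension), and a transcription file with lines of the form `<s> text </s> (file_id)`. Treat that layout as an assumption and verify it against the tutorial; the function name `training_lists` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def training_lists(utterances):
    """Build file-id and transcription lines from (wav_path, transcript) pairs.

    wav_path is relative to the audio directory, e.g. "speaker_1/file_1.wav".
    """
    fileids = []
    transcriptions = []
    for wav_path, text in utterances:
        # Drop the extension: "speaker_1/file_1.wav" -> "speaker_1/file_1".
        file_id = PurePosixPath(wav_path).with_suffix("")
        fileids.append(str(file_id))
        # Collapse repeated whitespace so words are separated by one space.
        words = " ".join(text.split())
        # The utterance id in parentheses is the file name without directory.
        transcriptions.append(f"<s> {words} </s> ({file_id.name})")
    return fileids, transcriptions
```

Write each returned list to its own UTF-8 text file, one entry per line. The words in the transcripts must match the entries in your phonetic dictionary exactly, including case.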