Improve a set of Perl scripts that can currently create speech recognition models.
## Deliverables
There is an open-source automatic speech recognition (ASR) system called Sphinx. It comes with a set of associated programs for training (SphinxTrain). Most of these programs are written in Perl.
Here is the link: <[login to view URL]>
Briefly: these programs take a set of sound files, associate them with a corresponding set of transcriptions in text files, and create a so-called "acoustic model" from this data.
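As a rough illustration of the pairing step described above, here is a minimal sketch that matches each sound file with its transcription and emits the two lists a training run typically consumes. It is written in Python purely for illustration (the SphinxTrain scripts themselves are Perl). The directory layout (a `.wav` and a same-named `.txt` side by side) and the function name `build_training_lists` are assumptions; the `<s> ... </s> (fileid)` line shape follows the SphinxTrain transcription convention as commonly documented and should be checked against the version in use.

```python
from pathlib import Path


def build_training_lists(data_dir):
    """Pair every .wav file with a same-named .txt transcript.

    Returns (fileids, transcription_lines). Audio files without a
    transcript are skipped. The layout is an assumption of this sketch.
    """
    fileids = []
    transcription_lines = []
    for wav in sorted(Path(data_dir).glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if not txt.exists():
            continue  # no transcript for this recording
        # Collapse newlines and repeated spaces into single spaces.
        text = " ".join(txt.read_text(encoding="utf-8").split())
        fileids.append(wav.stem)
        transcription_lines.append(f"<s> {text} </s> ({wav.stem})")
    return fileids, transcription_lines
```

The two returned lists could then be written out one entry per line as the file-ID list and the transcription file.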
Our company has a lot of sound files with transcriptions. These files are not well cropped, i.e. they contain fragments of neighboring utterances. Hence, we need to extend the SphinxTrain scripts with a procedure that precisely determines the margins (boundaries) of each utterance.
We need to do this in the same way as the LibriVox data was converted for the VoxForge project.
The sample is the following script: <[login to view URL]>
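To make the idea of utterance margins concrete, here is a minimal sketch in Python (for illustration only; the actual changes would go into the Perl scripts). It assumes that some alignment step has already produced start and end times for each transcribed word. That input, the `AlignedWord` type, the function names, and the `padding` parameter are all hypothetical, and this is not a reproduction of the referenced sample script. Given those timings, the sketch computes the crop window and writes the trimmed audio with the standard `wave` module.

```python
import wave
from dataclasses import dataclass


@dataclass
class AlignedWord:
    word: str
    start: float  # seconds from the beginning of the file
    end: float    # seconds from the beginning of the file


def utterance_margins(words, duration, padding=0.1):
    """Return (start, end) in seconds covering all aligned words,
    widened by `padding` and clamped to [0, duration]."""
    if not words:
        raise ValueError("no aligned words")
    start = max(0.0, min(w.start for w in words) - padding)
    end = min(duration, max(w.end for w in words) + padding)
    return start, end


def crop_wav(src, dst, start, end):
    """Copy the [start, end] window (seconds) of a PCM WAV file to dst."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        first = round(start * rate)
        last = round(end * rate)
        reader.setpos(first)
        frames = reader.readframes(last - first)
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)
        writer.writeframes(frames)  # header frame count is fixed on close
```

Anything outside the returned window, such as a fragment of the previous or next utterance, is dropped from the output file.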
The result of your work will be a precise list of modifications to the original script files (their content and locations).
Scripts should run under Linux (Fedora, or another distribution you suggest).
If the project is successful, we expect more projects with the winner in the future.
Any suggestions about the problem domain are welcome.
Sample WAV files will be given to the winner.
Thanks