AbbotDemo is a near-real-time, speaker-independent continuous speech recognition system for American and British accented English. The vocabulary size of this demonstration system is 5,000 words. The system uses a hybrid recurrent network/hidden Markov model acoustic model and a trigram language model. The recurrent network (which contains around 100,000 weights) was trained as a phone probability estimator using back-propagation through time.

It is available by FTP from svr-ftp.eng.cam.ac.uk in directory /pub/comp.speech/binaries. The file AbbotDemo.README gives more information on this system, and the remainder of the files provide the executables for various flavours of UNIX (Linux, SunOS4, HP-UX, IRIX). A 16-bit soundcard and a reasonable microphone are required. The Linux version is also available by FTP from sunsite.unc.edu (and mirrors) in directory /pub/Linux/apps/sound/speech. Sorry, but at this stage no sources and only limited documentation are provided.

Although the task domain is focused on noise-free read speech from a North American business newspaper (e.g. the Wall Street Journal), we hope that this system provides a fair representation of the state of the art in large vocabulary speech recognition and that it will encourage the creation of novel applications.

Tony Robinson (Cambridge University), Mike Hochberg (Cambridge University), Steve Renals (Sheffield University), and many, many more.

AbbotDemo URLs:

ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/binaries/

ftp://sunsite.unc.edu/pub/Linux/apps/sound/speech/
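The description above mentions a trigram language model. As a rough illustration of what that term means, and not of how AbbotDemo itself is built (no sources are provided), the following is a minimal sketch of an unsmoothed trigram estimate, where the probability of a word depends on the two words before it. The function names, the `<s>` and `</s>` padding tokens, and the toy corpus are all assumptions made for this example; a practical recognizer would also smooth or back off for unseen word sequences.

```python
from collections import Counter


def train_trigram_counts(sentences):
    """Count trigrams and their two-word histories over tokenized sentences."""
    trigram_counts = Counter()
    history_counts = Counter()
    for words in sentences:
        # Pad so that the first real word also has a two-token history.
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            trigram_counts[history + (padded[i],)] += 1
            history_counts[history] += 1
    return trigram_counts, history_counts


def trigram_probability(w1, w2, w3, trigram_counts, history_counts):
    """Unsmoothed maximum-likelihood estimate of P(w3 | w1, w2)."""
    seen = history_counts[(w1, w2)]
    if seen == 0:
        # History never observed: this sketch has no backoff, so return zero.
        return 0.0
    return trigram_counts[(w1, w2, w3)] / seen


# Hypothetical toy corpus, loosely in the spirit of business-news text.
toy_corpus = [
    "the market rose today".split(),
    "the market fell today".split(),
    "the market rose sharply".split(),
]
tri, hist = train_trigram_counts(toy_corpus)
print(trigram_probability("the", "market", "rose", tri, hist))
```

In this toy corpus the history "the market" occurs three times and is followed by "rose" twice, so the printed estimate is two thirds. During recognition, scores of this kind would be combined with the acoustic model's scores to rank candidate word sequences.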
Maintained by Bob Duin, e-mail: duin@ph.tn.tudelft.nl

Last update: February 17, 2004