View Full Version : speech recognition
bluesdog
12-04-2006, 04:45 PM
Anyone have some experience or suggestions for speech recognition/dictation software for Linux?
I think Dragon NaturallySpeaking can run under Wine, but I would really like to find a native Linux package or buildable open-source code.
I use text to speech but not speech to text.
I'll see if I can find anything.
fos....
ViaVoice from IBM may be an option. Their support for Linux has been hot and cold. It is Java based, so it should be portable?
There is a package called openmind / freespeech at Sourceforge. I couldn't check it out here at school. Most internet sites are blocked. :smiley5:
I'll check further this evening at home.
fos....
bhobjj
12-05-2006, 01:33 PM
Like fos, I use Festival for text2speech.
A couple of years ago, I played around with voice2text. I didn't have the patience to spend the time fine-tuning it for my voice, but it worked almost perfectly for my daughter.
The Sphinx project from Carnegie Mellon University is speech recognition software written in Java with a BSD license. There are some commercial programs that use it.
http://cmusphinx.sourceforge.net/sphinx4/
http://cmusphinx.sourceforge.net/html/cmusphinx.php
Article by Josephine Ciuca about using Perlbox with Sphinx for voice commands:
http://applications.linux.com/applications/05/01/18/2148234.shtml
Article by Marcel Gagné about setting up Festival for text2speech and Cvoicecontrol for voice commands:
http://www.linuxjournal.com/article/4723
bluesdog
12-05-2006, 11:34 PM
From the Sphinx-4 site:
Capabilities:
Live mode and batch mode speech recognizers, capable of recognizing discrete and continuous speech.
Generalized pluggable front end (http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/package-summary.html) architecture. Includes pluggable implementations of preemphasis (http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/filter/Preemphasizer.html), Hamming window (http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/window/RaisedCosineWindower.html), FFT (http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/transform/DiscreteFourierTransform.html), Mel frequency filter bank (http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/frequencywarp/MelFrequencyFilterBank.html), discrete cosine transform (http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/transform/DiscreteCosineTransform.html), cepstral mean normalization (http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/feature/BatchCMN.html), and feature extraction (http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/feature/DeltasFeatureExtractor.html) of cepstra, delta cepstra, double delta cepstra features.
Generalized pluggable language model architecture. Includes pluggable language model support for ASCII (http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/linguist/language/ngram/SimpleNGramModel.html) and binary (http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/linguist/language/ngram/large/LargeTrigramModel.html) versions of unigram, bigram, trigram, Java Speech API Grammar Format (JSGF) (http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/jsapi/JSGFGrammar.html), and ARPA-format FST grammars (http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/linguist/language/grammar/FSTGrammar.html)
:smiley11:
That sound was my head exploding...
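For anyone else whose head exploded: the first few front-end stages in that list (preemphasis, Hamming window, Fourier transform) are less scary than they sound. Below is a minimal sketch in plain Python, not Sphinx-4's actual code. The function names, the 0.97 preemphasis factor, the 0.54/0.46 window constants, and the frame sizes are textbook-style assumptions chosen for illustration, and the naive DFT stands in for a real FFT.

```python
import math


def preemphasize(samples, factor=0.97):
    """High-pass filter y[n] = x[n] - factor * x[n-1] (factor is an assumed value)."""
    if not samples:
        return []
    out = [samples[0]]  # first sample has no predecessor, so pass it through
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - factor * prev)
    return out


def hamming_window(size):
    """Hamming coefficients 0.54 - 0.46 * cos(2*pi*n / (size - 1))."""
    if size == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1))
            for n in range(size)]


def windowed_frames(samples, size, shift):
    """Cut the signal into overlapping frames and taper each one."""
    window = hamming_window(size)
    return [[s * w for s, w in zip(samples[start:start + size], window)]
            for start in range(0, len(samples) - size + 1, shift)]


def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum; a real front end would use an FFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(re * re + im * im)
    return spectrum


# Hypothetical test signal: a 1000 Hz tone sampled at 8000 Hz.
RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / RATE) for t in range(256)]
frames = windowed_frames(preemphasize(tone), size=64, shift=32)
spectrum = power_spectrum(frames[0])
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print("strongest frequency (Hz):", peak_bin * RATE / 64)
```

The later stages in the quoted list (Mel filter bank, discrete cosine transform, cepstral mean normalization, deltas) go on to compress each of these spectra into the small feature vector that the recognizer actually scores.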
AndreL
12-06-2006, 02:43 AM
So we're going to "talk" to each other, now!!!?! :tongue:
bluesdog
12-08-2006, 04:34 PM
Thanks for the suggestions, but from what I can discover the state of speech recognition in Linux remains quite primitive, or extremely technically challenging, or both.
While it appears trivially easy to set up Perlbox for command functions, a Linux or other open-source, full-fledged speech-to-text dictation program à la Dragon NaturallySpeaking is sadly absent.
IBM's ViaVoice is dead in the water, and IBM have even withdrawn the SDK, so it isn't possible to develop from the source.
I've arranged a purchase of Nuance's latest version of Dragon NaturallySpeaking, and will attempt to get it working under Wine.
bhobjj
12-12-2006, 07:51 PM
bluesdog wrote: "Thanks for the suggestions, but from what I can discover the state of speech recognition in Linux remains quite primitive, or extremely technically challenging, or both."
Probably not much demand for desktop use. There are small specialty companies that develop expensive custom speech recognition software for telecommunications, video captioning, etc.
bluesdog wrote: "IBM's Via Voice is dead in the water, and IBM have even withdrawn the SDK, so it isn't possible to develop from the source."
It was sold to Wizzard Software 5 years ago or so:
Note: In order to use the IBM TTS for Linux runtimes, you will have to develop your own program application in a Windows environment using the ViaVoice TTS SDK for Windows, and then compile it on Linux.
http://www.wizzardsoftware.com/products/IBMttssdk.php
So, if someone wants to go into business selling Linux versions, Wizzard will license the runtime.
Old Libranet post here:
http://www.debianquestions.com/ln-archive/viewtopic.php?t=9191
bluesdog
12-13-2006, 06:11 PM
Also from Wizzard, the best part:
Price (Minimum Order is 300 runtimes @ $5.00 each*): $1500.00
*Lower (per unit) pricing is available for higher volumes. Please contact Wizzard Sales with your anticipated volumes for a price quote.
I need at least 300 of these things... :smiley5:
If I'm not mistaken, the folks at www.oralux.org (http://www.oralux.com) are working on a deluxe version of their distro that includes IBM's ViaVoice. They were soliciting beta testers at one time.
fos....