questfox Special Vocabulary Training for Speech2Text with new OOV feature
Posted: 9 December 2019
OOV – Out of vocabulary – questfox is sometimes leaving us speechless.
Week by week we see incredible advances in the development of our speech-to-text approach inside questfox. One of the persistent challenges is the wrong transcription of words that are not correctly understood in a specific context.
When a computer misspells a word, it is usually because the word lies outside the standard vocabulary of the application. This happens often when words are used in a company-specific context. By the way, this seems to be true for every single market research project.
Questions about a brand often involve competitors with a very special name and a very special spelling. In these cases the transcription technology tends to fail to recognize those specific names. Technically speaking, the words used are “out of vocabulary” (OOV).
The questfox solution to the OOV problem: it is now possible to create a list of special words to be integrated in questfox. Every time the transcription engine of questfox is used, this list is sent along so that the engine has a better understanding of the specific vocabulary.
Here is our questfox experiment to test the tool with some less common brands:
| Brand | Transcription without OOV list | Transcription with OOV list |
| pangea labs | Pangea Labs | pangea Labs / Pangea Labs |
| Wisdom of Krauts | wisdom of crowds | wisdom of crowds |
| Tjaereborg Reisen | Sherlock Reisen | |
| Comme des Garçons | komm DKV / komm digga Song | comme de garcon Gasthaus.com Digger / kommen die ganzen |
As you can see, the overall transcription results do improve, but some issues remain.
Anyone who mixes spoken languages within one dictation may be familiar with the fact that transcription only works for one language at a time and hardly ever for two languages simultaneously.
Because questfox works with an “expected language” tag for the transcription, anything far away from the expected language is difficult for the tool to understand. With German as our pretest language, it was not possible to capture the French brand “Comme des Garçons”. We also failed with our made-up brand “Wisdom of Krauts”, which is still not transcribed in the desired way. The established phrase “wisdom of crowds” seems to dominate the process, and we get the undesired transcription result even after putting our brand into the list of special words. 😦
The only questfox way to “correct” this spelling would be the use of Regular Expressions in the questlogix. By the way: questlogix is such a distinct word, with nothing similar in the dictionary, that adding it to the list of special words worked right away. Here is how to use RegEx in questfox:
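The idea behind such a correction can be sketched outside of questfox as well. The following Python snippet is only an illustration and is not questlogix syntax; the `CORRECTIONS` table and the function `correct_transcript` are hypothetical names. It assumes the transcript arrives as plain text and rewrites the two mis-transcriptions from the experiment above.

```python
import re

# Hypothetical correction table (an assumption for this sketch):
# regex pattern for a known mis-transcription -> desired spelling.
CORRECTIONS = {
    r"\bwisdom\s+of\s+crowds\b": "Wisdom of Krauts",
    r"\bsherlock\s+reisen\b": "Tjaereborg Reisen",
}


def correct_transcript(text: str) -> str:
    """Replace known mis-transcriptions, ignoring upper/lower case."""
    for pattern, replacement in CORRECTIONS.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


print(correct_transcript("Ich kenne wisdom of crowds und Sherlock Reisen."))
```

Keep in mind that a rule like this also rewrites every genuine mention of “wisdom of crowds”, so it only makes sense in a project where the brand is clearly what respondents mean.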
Even though the feature is still in development, we hope that you appreciate its publication and share your experience with us.
For now the list is defined per questfox project, as we believe that vocabulary issues tend to be the same within one project. You can define your individual “Bag of Words” in the project settings.
The syntax is simple: separate the entries with commas. If you have several words, just list them in a row:
Word, word, word, combination of words,
Our typical list would be:
questfox, pangea labs, questlogix, what’s your quest?, Speech2Text
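Before pasting a longer list, it can help to check it. The following Python sketch shows one way to split such a comma-separated string into clean entries. The function name `parse_oov_list` is hypothetical, and the clean-up rules (trimming spaces, dropping empty entries and case-insensitive duplicates) are assumptions for illustration, not a description of how questfox processes the list internally.

```python
def parse_oov_list(raw: str) -> list[str]:
    """Split a comma-separated word list into clean, unique entries."""
    seen = set()
    entries = []
    for part in raw.split(","):
        term = " ".join(part.split())  # trim and collapse inner whitespace
        if not term:
            continue  # skip empty entries, e.g. after a trailing comma
        key = term.casefold()
        if key in seen:
            continue  # skip duplicates that differ only in case
        seen.add(key)
        entries.append(term)
    return entries


print(parse_oov_list("questfox, pangea labs, questlogix, Speech2Text"))
```

Multi-word entries such as "pangea labs" stay together as one entry because only the comma acts as a separator.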
In questfox you can paste one list of words per project under
-> Special List of Words for Transcription / Out of Vocabulary (OOV)
Please share what you learn about this feature.