Dictation



 

Introduction

For a couple of years now it has been possible to buy off the shelf a choice of continuous speech recognisers (CSR) for English. CSR effectively replaces the systems which required each word to be pronounced distinctly, the state of the art which had prevailed throughout the 1990s. As a result, speech input is now a routine part of even starter-level computing packages, and the potential it offers for hands-free data input of all kinds is now being widely taken up, with doctors' biopsy reports as a particularly effective application.

Crucial to this major step forward was achieving the ability to recognize very large vocabularies (over 30,000 words), while ensuring that the systems could still be used by an unlimited variety of speakers.

 

Where the Progress is Being Made

There are centres of expertise in the technical departments of the companies producing the basic products.

In the UK there are also significant research teams at the Universities of Edinburgh and Cambridge, at British Telecommunications' Research Labs, and at the Defence Evaluation and Research Agency in Malvern.

There are also many noted centres of excellence in the USA, France, Germany and Japan.

 

Sources for Products

The front-runners are:

Dragon Systems' NaturallySpeaking

IBM's ViaVoice

Lernout & Hauspie's Voice Xpress

Philips' FreeSpeech.

However, there is also considerable scope for products which embed these or other recognisers in an environment designed to favour effective generation of written text. One such example is the CyberTranscriber system, based on the ARMADA technology originated by DERA-SRU, and available over the Internet.

Other companies, such as Entropic (now absorbed by Microsoft), provide environments for customizing speech products for particular purposes. These give access to the latest technology without committing the user to specific decisions on the best way to integrate it with business and other practices.

 

Things to Watch Out for

  • Although a feature of many early spoken language recognition systems was simultaneous display of the text as it was dictated, it may be better, functionally, not to allow simultaneous editing, or indeed display. One of the major advantages of this technology is that it speeds up the production of text, and the temptation to correct as one goes tends to slow throughput.

  • The technology that has proved successful depends on a combination of Hidden Markov Models, to recognize sounds from their acoustics, and statistical language models, to estimate the likelihood of particular words in a given context. Both can only be built effectively from large amounts of background data: annotated recordings for the acoustic models, and relevant text for the language models. Without such data, CSR cannot be expected to be developed for a new language; but where the data is available, the techniques used should be effective, whatever the language.
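To make the role of the statistical language model concrete, here is a minimal sketch in Python. It is an illustration only, not the method used by any of the products named above. It assumes a toy three-sentence corpus, a simple bigram model with add-one smoothing, and made-up acoustic scores standing in for what a Hidden Markov Model would supply; the function names are hypothetical. It shows how word context can separate two candidates ("write" and "right") that sound identical.

```python
import math
from collections import Counter


def train_bigram_model(sentences):
    """Count single words and adjacent word pairs in the training text."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        # "<s>" marks the start of a sentence.
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(word, previous, unigrams, bigrams):
    """Add-one smoothed estimate of log P(word | previous).

    The count of the previous word is used as the context count.
    """
    vocab_size = len(unigrams)
    numerator = bigrams[(previous, word)] + 1
    denominator = unigrams[previous] + vocab_size
    return math.log(numerator / denominator)


def pick_word(previous, acoustic_log_scores, unigrams, bigrams):
    """Choose the candidate with the best combined acoustic and language score."""
    return max(
        acoustic_log_scores,
        key=lambda w: acoustic_log_scores[w] + log_prob(w, previous, unigrams, bigrams),
    )


# Toy "relevant text"; a real system needs a very large corpus.
corpus = [
    "the biopsy report is ready",
    "the doctor read the report",
    "write the report today",
]
unigrams, bigrams = train_bigram_model(corpus)

# Assumed acoustic scores: the two words sound the same, so they tie.
candidates = {"write": math.log(0.5), "right": math.log(0.5)}
best = pick_word("<s>", candidates, unigrams, bigrams)
print(best)
```

Because the acoustic scores tie, the choice rests entirely on the language model, which has seen "write" at the start of a sentence in the training text and has never seen "right". This is also why the amount and relevance of the background text matters so much.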

 

If you'd like to learn more about the potential of this technology, from an experienced but completely impartial source, it's time you got in touch with Linguacubun Ltd itself.



Linguacubun Ltd., Batheaston Villa, Bailbrook Lane, Bath BA1 7AA, UK. Tel: +44 (0)1225 852865. Fax: +44 (0)1225 859258.