Speech Recognition in the Healthcare Setting
Speech recognition will remain a long-term industry trend for two reasons. First, speech is the most natural form of human communication. Second, voice recognition devices free up hands, time, and working memory. They can therefore enhance the work experience in any field where workers need both hands free and must multitask, such as medicine, manufacturing, or routine office work. Before long, speech recognition is likely to be widely implemented in busy or high-risk industries such as healthcare.
There are many ways to integrate speech recognition into healthcare systems. One is to deliver information or document notes in real time, perhaps through a smartwatch or smart glasses, so that a doctor can quickly access or reference patient information while examining patients at the bedside or while operating in a sterile environment.
Speech recognition is already used for documentation in non-time-sensitive scenarios such as writing medical charts (Vogel et al., 2015). Vogel et al. (2015) also found that documenting medical records with speech recognition is 26% faster than traditional methods. Once the technology can document in real time, letting doctors examine patients and record observations simultaneously, both precision and time efficiency should improve further.
There are a few potential issues that must be addressed when using speech recognition.
First, technical issues, including environmental noise and user accents, are primary concerns when implementing a speech recognition system. A system is of limited value if it does not improve the user's efficiency, or if the user must shout into the microphone or pronounce and phrase words in a particular way for a sentence to register. As Parente et al. (2004) note, such problems can hinder the acceptance and adoption of speech recognition technology. Reducing background noise also improves recognition accuracy and leaves less room for error. User accents matter because the system must be usable by everyone.
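The noise problem above can be illustrated with a minimal sketch: an energy-based gate that passes audio frames only when they rise clearly above an estimated noise floor, so that steady ward background noise is not fed to the recognizer. The frame values, the `threshold_ratio`, and the calibration procedure here are illustrative assumptions, not parameters from any system cited in this essay.

```python
# Minimal energy-gate sketch for noise suppression.
# All numbers and thresholds below are illustrative assumptions,
# not settings from a real clinical speech recognition system.

def frame_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in frame) / len(frame)

def noise_gate(frames, calibration_frames, threshold_ratio=3.0):
    """Keep frames whose energy exceeds threshold_ratio x the noise floor.

    calibration_frames: frames captured during silence, used to
    estimate the ambient noise level (e.g. ward background noise).
    """
    noise_floor = sum(frame_energy(f) for f in calibration_frames) / len(calibration_frames)
    return [f for f in frames if frame_energy(f) > threshold_ratio * noise_floor]

# Toy example: quiet background frames plus one louder "speech" frame.
silence = [[0.01, -0.02, 0.01, -0.01]] * 4
speech = [0.5, -0.6, 0.55, -0.4]
kept = noise_gate(silence + [speech], calibration_frames=silence)
# Only the speech frame survives the gate.
```

A real recognizer would use spectral rather than raw-energy methods, but the design choice is the same: estimate the noise floor first, then suppress anything that does not clearly exceed it.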
Another challenge relates to parsing and syntax: how can a speech recognition device understand human speech and accurately convert it to text or perform the right action? Humans comprehend language with meaning dominance in mind and have a natural ability to resolve temporary ambiguity in sentences (Goldstein, 2019). Interpreting a sentence containing homophones such as "sun" and "son" is easy for humans because they comprehend sentences in context and adjust to the speaker; machines struggle unless trained on a large variety of data. That training effort will be extensive because users come from varied backgrounds and use varied syntax. Gildea and Palmer (2002) likewise emphasize that correctly identifying the syntax and semantics of a sentence is important for interpreting text and is essential for machine translation and automatic responses.
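The sun/son example above can be sketched as a tiny context-scoring routine: each homophone candidate is scored by how many of its typical neighbor words actually appear in the sentence that was heard. The word lists are made-up illustrations, not a trained language model, which would require the large and varied training data the paragraph describes.

```python
# Toy homophone disambiguation by context overlap.
# The context-word lists are illustrative assumptions, not a trained model.

CONTEXT_WORDS = {
    "sun": {"sky", "bright", "shine", "morning", "rose"},
    "son": {"my", "her", "his", "child", "born"},
}

def disambiguate(candidates, sentence_words):
    """Pick the candidate whose typical context words overlap most
    with the words actually heard around the ambiguous token."""
    heard = set(sentence_words)
    return max(candidates, key=lambda c: len(CONTEXT_WORDS[c] & heard))

# "her eldest [sun/son] was born in june" -> "son"
family = disambiguate(["sun", "son"], ["her", "eldest", "was", "born", "in", "june"])
# "the bright [sun/son] rose in the morning sky" -> "sun"
weather = disambiguate(["sun", "son"], ["the", "bright", "rose", "in", "the", "morning", "sky"])
```

Real systems score candidates with statistical or neural language models over far larger contexts, but the principle is the one described here: the surrounding words, not the sound alone, decide which word was meant.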
References
Gildea, D., & Palmer, M. (2002). The Necessity of Parsing for Predicate Argument Recognition. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 239–246. https://www.aclweb.org/anthology/P02-1031.pdf
Parente, R., Kock, N., & Sonsini, J. (2004). An analysis of the implementation and impact of speech-recognition technology in the healthcare sector. Perspectives in Health Information Management, 1, 5. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2047322/
Vogel, M., Kaisers, W., Wassmuth, R., & Mayatepek, E. (2015). Analysis of Documentation Speed Using Web-Based Medical Speech Recognition Technology: Randomized Controlled Trial. Journal of Medical Internet Research, 17(11), e247. https://doi.org/10.2196/jmir.5072