Many products with embedded computing require user input and display information that device users must understand or act on. This portion of product functionality is known as the Human Machine Interface (HMI). Today, the HMI is typically provided through a display screen, and user input methods have evolved from buttons, mice and keyboards to touchscreens that mimic the operation of our smartphones.
As of 2022, most users expect a smartphone-like interface on electronic products. But for OEMs, this can be difficult and expensive to develop in Embedded Linux and requires talented UI developers and additional graphical user interface (GUI) software tools to build. While the software may be open source, more powerful tools typically require the purchase of a development environment and licensing for devices.
In addition, touchscreen hardware for the finished product is expensive and adds significantly to the bill of materials (BOM) cost for embedded products. A glass display can be easily broken or damaged in everyday use in industrial environments, requiring expensive repair or replacement. Device makers in the medical and food industries face a further hygiene issue: bacteria on touch surfaces can be transferred between users.
Finally, most touch/display products designed for the smartphone market do not deliver the long service life (10+ years) expected of commercial or industrial products.
The ideal answer to many of these issues lies in voice control. Voice controlled devices allow users to interact with a device at a distance even when they cannot see what they are interacting with. This means that their concentration and focus can be on the task at hand, rather than on the device.
Speech is also a very efficient form of data input. Most people speak at around 150 words per minute, compared with an average typing speed of 40. Combined, these two benefits allow users to make relatively complex requests quickly.
Voice control provides significant advantages in industrial applications where, for example, it can increase the safety of users who are free to focus on the ultimate task rather than on controlling a device through touch interaction. In a medical setting such as an operating room, voice-controlled devices allow touchless interaction, which helps to avoid the transfer of bacteria.
Digi ConnectCore Voice Control is a ready-to-use software solution that is pre-integrated into Digi Embedded Yocto, for use with the Digi ConnectCore family of System on Modules (SOMs). ConnectCore Voice Control provides real-time voice recognition and text-to-speech capabilities with a customizable wake word, a customizable 60,000-word vocabulary, and support for 30 national languages.
ConnectCore Voice Control brings full voice processing at the IoT edge to any device with a Digi ConnectCore module, enabling zero-touch user interaction with the device. It does not require hardware-based AI/ML accelerators to operate, so product developers can add voice capabilities without hardware costs beyond off-the-shelf microphones and speakers.
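The zero-touch interaction pattern described above can be illustrated with a minimal sketch. All names below are hypothetical: the actual ConnectCore Voice Control API is not shown in this article, so audio capture and speech-to-text are simulated with plain strings purely to show the wake-word-then-command flow that runs entirely on-device.

```python
# Hypothetical sketch of an edge voice-control loop. The real
# ConnectCore Voice Control API will differ; recognized speech is
# simulated here as strings so the example is self-contained.

WAKE_WORD = "hello device"      # stands in for a customizable wake word
COMMANDS = {                    # tiny slice of a device vocabulary
    "start pump": "Pump started.",
    "stop pump": "Pump stopped.",
    "status": "All systems nominal.",
}

def handle_utterance(text, awake):
    """Process one recognized phrase; return (response, new awake state).

    Everything happens locally, so there is no network round trip:
    response time is bounded only by on-device recognition.
    """
    text = text.strip().lower()
    if not awake:
        # Ignore everything until the wake word is heard.
        return None, text == WAKE_WORD
    response = COMMANDS.get(text, "Sorry, I did not understand.")
    return response, False      # return to sleep after one command

# Simulated stream of phrases arriving from the microphone
awake = False
for phrase in ["turn it up", "hello device", "start pump"]:
    reply, awake = handle_utterance(phrase, awake)
    if reply:
        print(reply)            # a real device would route this to TTS
# prints: Pump started.
```

The key design point is that the recognizer, vocabulary and response generation all live on the module itself, which is what makes sub-second, offline interaction possible.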
Why do the processing at the IoT edge? When you use popular consumer voice control applications like Apple Siri or Amazon Alexa, you might have noticed a slight delay in the interaction, even when the device is right in your hand or on the kitchen counter. That delay occurs because the computer processing behind nearly all consumer voice applications is performed in the cloud.
While a delay of a few tenths of a second may not be a problem if you’re selecting a song or sending a text message, that latency can make voice control less effective in fast-moving workflows or when making precise adjustments. Needless to say, any interruption in connectivity to the cloud makes the problem worse.
ConnectCore Voice Control, however, performs its voice processing locally at the edge, delivering real-time response with reaction times of less than 100 milliseconds, compared with the variable latencies of cloud-based voice processing. It also eliminates the connection costs of cloud-based solutions.
Most voice control applications on the market operate in only two languages—English and Mandarin Chinese. ConnectCore Voice Control can communicate in 30 national languages, a significant advantage when developing a product for global deployment.
Processing data locally virtually eliminates the privacy and security issues that arise when transferring data to cloud services over the network: voice data never needs to leave the device or touch the Internet. ConnectCore Voice Control is compliant with the European Union’s General Data Protection Regulation (GDPR), another key benefit for global deployment.
Voice control is a valuable capability in any number of use cases. Given the speed advantage of speech over typing noted earlier, there is enormous value in improving speed and precision across a range of human-machine interaction scenarios, from industrial settings where operators must keep their focus on the task to medical environments that demand touchless interaction.
For OEM developers considering a voice interface for their next product, either as a current feature or as a future enhancement, Digi ConnectCore Voice Control provides pre-integrated, ready-to-use software for developing on Digi ConnectCore modules.
The development software is available for download on the Digi ConnectCore Voice Control documentation website. As part of the download, Digi provides a single software license for evaluation and development to customers who have already purchased a Digi ConnectCore 8M Nano Development Kit. (For deployment, OEMs can purchase licenses from the software vendor or through Digi for each device they sell.) This software download can be used to develop a proof of concept, demonstrate voice capabilities, and design the voice control application for a new customer product.
To learn more, download the Digi ConnectCore Voice Control data sheet.