August 11, 2020

Natural voice, the future of human-computer interaction

As technology develops, the human-computer interface keeps improving. From the original command line, through windowed graphical interfaces, to the touch screen, human-computer interaction has become more and more user-friendly. What kind of interaction will bring the next change after the touch screen? Voice technology is arguably the most anticipated candidate, because speech is the most natural way for humans to communicate. Imagine how wonderful it would be if your appliances could understand your requests the way a friend does and respond to them correctly. Today, this kind of voice interaction is increasingly being built into the electronic devices around us.

As early as 2011, IBM's supercomputer "Watson" caused a sensation by defeating two champion players of the American quiz show "Jeopardy!". To some extent, Watson could already communicate freely with humans, which depended on the powerful computing performance behind it. According to reports, IBM spent several years developing Watson. It ran on 10 racks of Power 750 servers under the Linux operating system, with 15 TB of memory and 2,880 processor cores, and could perform 80 trillion operations per second. The whole system was about the size of 10 refrigerators.

Although it is still challenging to achieve human-computer interaction as free as Watson's, this has not prevented voice technology from being applied in specific fields and scenarios. In embedded devices such as in-car systems and mobile terminals in particular, it has already won recognition from consumers.

Statistics from Strategy Analytics show that in 2012, shipments of telematics systems with voice human-machine interfaces supplied by original equipment manufacturers (OEMs) in China reached 3 million units, and they are expected to reach 20 million units in 2018. In the North American and European markets, in-vehicle applications with voice interaction have already become very popular. Ford SYNC, Ford's in-car multimedia communication and entertainment system designed to work with mobile phones and digital media players, is a successful example of voice interaction in an in-vehicle system, and it has been fitted widely across the Ford range. SYNC combines the display on the center console with voice control and with compatibility with, and control of, portable communication and entertainment devices. This makes tasks easier and more convenient for the driver on the road, through features such as voice dialing, text messages read aloud, and voice-controlled music playback.

Outside of cars, mobile Internet terminals are probably the product category most enthusiastic about voice interaction at present. After Apple introduced the Siri intelligent voice assistant with the iPhone 4S, Google launched the Google Now intelligent voice search and question-answering service in its Android smartphone operating system, and Microsoft applied voice technology to Windows Phone. Now almost every mobile phone manufacturer is trying to integrate voice technology into its mobile products, applications and services. One of the main reasons is that these devices are small and compact, which makes touch input very inconvenient. In this situation, voice interaction has become a much-needed complement to other forms of human-machine communication. I feel this keenly myself. After I switched to an Android phone, I stopped sending text messages for a while because it was too much trouble. Now that I have installed the Xunfei voice input method, I have started messaging my friends again. Voice input really is convenient and fast.

Although voice technology has brought us a great deal of fun and help, truly smooth and natural voice interaction still requires powerful software and hardware working together. Speech technology spans speech synthesis, speech recognition, speech evaluation, and natural language understanding, and the complexity and diversity of languages pose many challenges for its application. Nuance, iFLYTEK, Microsoft, IBM, and Google are all investing in research and development of new voice technologies. Among them, iFLYTEK, the leader in Chinese voice technology, holds more than 70% of the Chinese voice technology market. Its Xunfei voice cloud has more than 10,000 partners, and the Xunfei input method is also popular. Nuance's speech recognition platform is likewise widely used in the industry: both the Ford SYNC system and Apple's Siri mentioned above use Nuance technology. Not long ago, Microsoft announced that it was developing a new speech recognition technology based on "deep neural networks", which processes language in a way modeled on the human brain and is said to be twice as fast as current speech recognition technology.

Good voice software and algorithms must also be backed by high-performance hardware. I believe that as voice technology develops and hardware performance improves, natural voice will bring new changes to the next generation of human-computer interaction.

