As with many inquiries to Apple’s helper bot, asking Siri to “read my lips” comes back with an “I’m not sure I understand” message. After all, Apple has so far eschewed talking much about modern AI systems. Why would its voice recognition service also want to stare at me like HAL 9000 from 2001: A Space Odyssey?

A newly revealed patent from Apple shows the company has actively considered what a proprietary lip-reading program would look like. The patent application was originally filed in January of this year and describes a system for determining whether “motion data” matches a word or phrase. Diagrams specifically mention Siri and simple voice commands such as “Hey Siri,” “skip,” or “next song,” and show how those inputs could be improved by an algorithm analyzing the movement of users’ mouths.

As first noted by AppleInsider, Apple explains in the filing that voice recognition systems such as Siri have obvious problems. Voices can get distorted by background noise, and microphones that perpetually listen for speech expend a good deal of battery and processing power. Such a system wouldn’t necessarily use a device’s camera. Instead, the voice recognition software would use one of the phone’s motion sensors to track movement of the mouth, neck, or head and determine whether any of it could indicate human speech.

Those sensors could be an accelerometer or gyroscope, which Apple notes in its patent are much less likely to be corrupted by unwanted stimuli than a microphone. It doesn’t have to be just a phone, either: the patent describes how that kind of motion-sensing tech could be built into AirPods or even vaguely referenced “smart glasses,” which would then send the data over to a user’s iPhone. The devices could detect subtle facial muscle movements, vibrations, or head motions, per the document. Sure, Apple’s dreams of smart glasses died years ago, but the company is hoping for big things with its Vision Pro headset.
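
The patent doesn’t spell out the math, but the general idea is easy to sketch. Below is a minimal, hypothetical Swift example using Apple’s CoreMotion framework: it samples the accelerometer and gyroscope and fires a callback when a crude motion “energy” score crosses a threshold that might indicate speech. The 50 Hz sampling rate, the 0.05 threshold, and the class and callback names are all invented for illustration, not anything taken from the filing.

```swift
import Foundation
import CoreMotion

/// A toy "motion gate": watch the accelerometer and gyroscope and guess whether
/// the wearer might be speaking before waking a heavier voice-recognition pipeline.
/// The sampling rate, the 0.05 "energy" threshold, and these names are all
/// hypothetical; the patent doesn't specify them.
final class MotionSpeechGate {
    private let motionManager = CMMotionManager()

    /// Called when motion looks like it could be jaw, neck, or head movement from speech.
    var onPossibleSpeech: (() -> Void)?

    func start() {
        guard motionManager.isDeviceMotionAvailable else { return }
        motionManager.deviceMotionUpdateInterval = 1.0 / 50.0

        motionManager.startDeviceMotionUpdates(to: .main) { [weak self] motion, _ in
            guard let motion = motion else { return }

            // Crude "motion energy": magnitude of user acceleration plus a
            // down-weighted rotation rate from the gyroscope.
            let a = motion.userAcceleration
            let r = motion.rotationRate
            let energy = sqrt(a.x * a.x + a.y * a.y + a.z * a.z)
                + 0.1 * sqrt(r.x * r.x + r.y * r.y + r.z * r.z)

            // If the vibration pattern crosses the (hypothetical) threshold,
            // hand off to the audio-based recognizer.
            if energy > 0.05 {
                self?.onPossibleSpeech?()
            }
        }
    }

    func stop() {
        motionManager.stopDeviceMotionUpdates()
    }
}
```

In the patent’s framing, that handoff is the point: cheap motion sensors run constantly, and the battery-hungry microphone-plus-model pipeline only spins up when movement suggests someone is actually talking.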

For this kind of system, Apple would need a lot of data on how humans use their mouth parts. The company could set up a “voice profile” for users on the system. Siri is already supposed to recognize an iPhone’s main user’s voice, and Apple’s recent accessibility features have expanded that voice capture capability. The Personal Voice feature on iOS can record a user’s voice, which Live Speech then uses with text-to-speech to reproduce their intonations and speech patterns.

Apple then describes a “first language model” that would need to be trained on sample data sets. The patent doesn’t say what kind of machine learning model this would be, but it would make sense to train one to recognize facial movements from a large data set. That fits Apple’s current paradigm of keeping AI in the background of new features. The company referenced a “transformer language model” only once at its latest WWDC, when describing the new autocorrect features coming in iOS 17.
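
To give a flavor of what “trained on sample data sets” could mean in practice, here’s a toy Swift sketch of a classifier that maps motion-feature vectors to commands like “Hey Siri” or “skip.” It’s a nearest-centroid stand-in for whatever model Apple has in mind, not anything described in the filing; the labels, feature vectors, and training data are all made up.

```swift
import Foundation

/// Toy stand-in for the patent's "first language model": a nearest-centroid
/// classifier over fixed-length motion-feature vectors. The labels, features,
/// and training data below are hypothetical; the filing doesn't describe a
/// concrete architecture.
struct MotionCommandClassifier {
    private var centroids: [String: [Double]] = [:]

    /// "Training": average the feature vectors seen for each labeled command.
    mutating func train(samples: [(label: String, features: [Double])]) {
        var sums: [String: (vector: [Double], count: Int)] = [:]
        for sample in samples {
            var entry = sums[sample.label]
                ?? (vector: Array(repeating: 0.0, count: sample.features.count), count: 0)
            entry.vector = zip(entry.vector, sample.features).map { $0 + $1 }
            entry.count += 1
            sums[sample.label] = entry
        }
        centroids = sums.mapValues { entry in entry.vector.map { $0 / Double(entry.count) } }
    }

    /// Predict the command whose averaged motion signature is closest to the input.
    func predict(features: [Double]) -> String? {
        centroids.min { distance($0.value, features) < distance($1.value, features) }?.key
    }

    private func distance(_ a: [Double], _ b: [Double]) -> Double {
        zip(a, b).map { ($0 - $1) * ($0 - $1) }.reduce(0, +)
    }
}

// Hypothetical usage with made-up three-number "feature" vectors:
var classifier = MotionCommandClassifier()
classifier.train(samples: [
    (label: "hey_siri",  features: [0.9, 0.2, 0.1]),
    (label: "skip",      features: [0.1, 0.8, 0.3]),
    (label: "next_song", features: [0.2, 0.3, 0.9]),
])
print(classifier.predict(features: [0.85, 0.25, 0.15]) ?? "unknown")  // most likely "hey_siri"
```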

Sure, Apple does file a lot of patents, some crazier than others, and not all of them turn into products. But this one is recent and contains a bit more substance than some of the Cupertino company’s other ideas. Gizmodo reached out to Apple for comment but did not immediately hear back.

Apple supply chain analyst Ming-Chi Kuo wrote yesterday that Apple’s progress on generative AI was “significantly behind its competitors,” and that there was no sign the company would be integrating these kinds of deep learning models into its hardware products this year or next. That’s despite reports that Apple has developed its own internal chatbot, codenamed “Apple GPT.” Apple could be working to add more AI capabilities to Siri, especially as plenty of other apps have already created their own AI-based voice assistants for Apple devices.
