May 1, 2024

Apple Explains How It Trains Siri to Recognize Your Voice

Posted April 16, 2018 at 11:09pm by iClarified
Apple has posted a new entry on its Machine Learning Journal that explains how it trains Siri to recognize your voice. Apple calls this feature PHS, or “Personalized Hey Siri.” It uses two methods of user enrollment: explicit and implicit. During device setup, you are asked to speak a few phrases that begin with “Hey Siri.” This is explicit enrollment. Over time, Apple adds utterances spoken by the primary user in real-world situations. This is implicit enrollment. In Apple's words:

On each “Hey Siri”-enabled device, we store a user profile consisting of a collection of speaker vectors. As previously discussed, the profile contains five vectors after the explicit enrollment process. In the Model Comparison stage of Figure 1, we extract a corresponding speaker vector for every incoming test utterance and compute its cosine score (i.e., a length-normalized dot product) against each of the speaker vectors currently in the profile. If the average of these scores is greater than a pre-determined threshold (λ), then the device wakes up and processes the subsequent command. Lastly, as part of the implicit enrollment process, we add the latest accepted speaker vector to the user profile until it contains 40.
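The scoring and enrollment rule quoted above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Apple's implementation. The names `cosine_score` and `SpeakerProfile` are hypothetical. Speaker vectors are treated as plain lists of floats handed in from outside, because the quoted passage does not describe how the device extracts them. The threshold value is made up, and the toy two-dimensional vectors stand in for real speaker vectors. What the sketch does follow from the quote is this: average the cosine scores against every vector in the profile, wake only if that average exceeds λ, and append each accepted vector until the profile holds 40.

```python
import math
from typing import List, Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Length-normalized dot product of two speaker vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector never matches anything
    return dot / (norm_a * norm_b)


class SpeakerProfile:
    """Hypothetical sketch of a PHS-style user profile."""

    def __init__(self, enrollment_vectors: Sequence[Sequence[float]],
                 threshold: float, max_vectors: int = 40) -> None:
        # Explicit enrollment: the quoted post says the profile starts with five vectors.
        self.vectors: List[List[float]] = [list(v) for v in enrollment_vectors]
        self.threshold = threshold      # plays the role of lambda; the value is an assumption
        self.max_vectors = max_vectors  # the post caps the profile at 40 vectors

    def average_score(self, test_vector: Sequence[float]) -> float:
        # Score the incoming utterance's vector against every vector in the profile.
        scores = [cosine_score(test_vector, v) for v in self.vectors]
        return sum(scores) / len(scores)

    def accept(self, test_vector: Sequence[float]) -> bool:
        # Wake the device only if the average score exceeds the threshold.
        accepted = self.average_score(test_vector) > self.threshold
        if accepted and len(self.vectors) < self.max_vectors:
            # Implicit enrollment: keep adding accepted vectors until the cap is reached.
            self.vectors.append(list(test_vector))
        return accepted


if __name__ == "__main__":
    # Toy profile: five identical 2-D "speaker vectors" and a made-up threshold.
    profile = SpeakerProfile([[1.0, 0.0]] * 5, threshold=0.5)
    print(profile.accept([0.9, 0.1]))  # close to the enrolled speaker, so accepted
    print(profile.accept([0.0, 1.0]))  # orthogonal vector, so rejected
    print(len(profile.vectors))        # 6: only the accepted vector was enrolled
```

Averaging over the whole profile, rather than taking the single best match, means one unusual enrollment vector has limited influence on the decision. As the profile grows toward 40 implicitly enrolled vectors, it also comes to reflect the user's voice under real-world conditions, not just at setup time.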

More details are available in the full post on Apple's Machine Learning Journal.
