Google explains how Recorder Speaker Labels work, plans to use Tensor TPU to save power

As part of December’s Pixel Feature Drop, Google’s excellent Recorder app gained Speaker Labels that can identify multiple people. As it has done for earlier releases, the team behind the app has published an explanation of how the feature came to be.

Speaker Labels are powered by Turn-to-Diarize, Google’s new speaker diarization system. It has three main components that “run fully on the device”:

  • Speaker turn detection model that detects a change of speaker in the input speech
  • Speaker encoder model that extracts voice characteristics from each speaker turn
  • Multi-stage clustering algorithm that annotates speaker labels to each speaker turn in a highly efficient way

“Our speaker diarization system leverages several highly optimized machine learning models and algorithms to allow diarizing hours of audio in a real-time streaming fashion with limited computational resources on mobile devices.”
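To make the three stages concrete, here is a minimal Python sketch of how a turn detector, a speaker encoder, and a clustering step might fit together. It is not Google's implementation: the function names, the mean-of-frames "encoder," the greedy centroid clustering, and the 0.8 similarity threshold are all assumptions chosen for illustration, and the real system uses trained models and a multi-stage clustering algorithm.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def split_turns(frames, change_flags):
    """Stage 1 (stand-in): group frames into speaker turns.

    change_flags[i] is True when a speaker change is detected at frame i.
    In the real system a trained model would produce these flags.
    """
    turns, current = [], []
    for frame, changed in zip(frames, change_flags):
        if changed and current:
            turns.append(current)
            current = []
        current.append(frame)
    if current:
        turns.append(current)
    return turns

def encode_turn(turn):
    """Stage 2 (stand-in): one embedding per turn, here the mean frame vector."""
    dim = len(turn[0])
    return [sum(f[i] for f in turn) / len(turn) for i in range(dim)]

def cluster_turns(embeddings, threshold=0.8):
    """Stage 3 (stand-in): greedy clustering against running centroids.

    A turn joins the most similar existing speaker if the similarity
    reaches the threshold; otherwise it starts a new speaker.
    """
    centroids, counts, labels = [], [], []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1)
                               for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
            labels.append(best)
    return labels

def diarize(frames, change_flags, threshold=0.8):
    """Run the three stand-in stages and return one speaker label per turn."""
    turns = split_turns(frames, change_flags)
    embeddings = [encode_turn(t) for t in turns]
    return cluster_turns(embeddings, threshold)

# Toy input: five 2-D "frames" with speaker changes at frames 0, 2, and 4.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
flags = [True, False, True, False, True]
labels = diarize(frames, flags)
print(labels)
```

Because clustering operates on one embedding per turn rather than one per frame, the number of items to cluster grows with the number of speaker changes, not with the raw length of the recording.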

Google notes that audio recordings from the Recorder app can be “as long as up to 18 hours,” and that more audio means greater “confidence on predicted speaker labels.” As such, Recorder will “occasionally make corrections to previously predicted low-confidence speaker labels,” while users can manually make edits and split the transcript. 
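The correction behavior can be pictured with a short Python sketch. This is only an illustration under stated assumptions: it treats "confidence" as the cosine similarity between a turn's embedding and its assigned speaker centroid, and the function name `revise_labels` and the 0.7 cutoff are hypothetical. Google has not described its mechanism at this level of detail.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def revise_labels(embeddings, labels, confidences, centroids, min_confidence=0.7):
    """Re-assign only the turns whose stored confidence is below the cutoff.

    centroids are the current per-speaker averages, assumed to have been
    refined by audio that arrived after the original labels were predicted.
    """
    new_labels, new_confidences = list(labels), list(confidences)
    for i, emb in enumerate(embeddings):
        if confidences[i] >= min_confidence:
            continue  # high-confidence labels are left alone
        sims = [cosine(emb, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__)
        new_labels[i], new_confidences[i] = best, sims[best]
    return new_labels, new_confidences

# Two earlier turns, both labeled speaker 0; the second was a weak match.
embeddings = [[1.0, 0.0], [0.6, 0.8]]
labels = [0, 0]
confidences = [0.95, 0.6]
centroids = [[1.0, 0.0], [0.0, 1.0]]  # updated after more audio arrived

new_labels, new_confidences = revise_labels(embeddings, labels, confidences, centroids)
print(new_labels)
```

In this toy case only the second turn is reconsidered, and it moves to the second speaker because that centroid is now the closer match.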

The current system mostly runs on Tensor’s CPU, with both the first-generation chip and the G2 supported across the Pixel 6, 6 Pro, 6a, 7, and 7 Pro. Going forward, Google is “working on delegating more computations to the TPU block, which will further reduce the overall power consumption of the diarization system.” At the moment, Recorder 4.2 warns that Speaker Labels will not work if your “Device is too hot.”

“Another future work direction is to leverage multilingual capabilities of speaker encoder and speech recognition models to expand this feature to more languages.”

Author: Abner Li
Source: 9TO5Google
