Google explains how Recorder Speaker Labels work, plans to use Tensor TPU to save power

As part of December’s Pixel Feature Drop, Google’s excellent Recorder app gained Speaker Labels, which can distinguish between multiple people in a recording. As with previous editions, the team behind it has published an explanation of how the feature came to be.

Speaker Labels are powered by Turn-to-Diarize, Google’s new speaker diarization system. It has three main components that “run fully on the device”:

  • Speaker turn detection model that detects a change of speaker in the input speech
  • Speaker encoder model that extracts voice characteristics from each speaker turn
  • Multi-stage clustering algorithm that annotates speaker labels to each speaker turn in a highly efficient way
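To make the last two stages more concrete, here is a minimal Python sketch of how per-turn voice embeddings could be grouped into speaker labels. It is not Google’s implementation: the function names, the two-dimensional toy embeddings, and the 0.8 similarity threshold are all assumptions for illustration, and a simple greedy clustering pass stands in for the multi-stage algorithm Google describes.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_turns(turn_embeddings, threshold=0.8):
    """Assign a speaker index to each turn embedding, in arrival order.

    Hypothetical stand-in for the clustering stage: each turn joins the
    most similar existing speaker if the similarity reaches `threshold`,
    otherwise it opens a new speaker.
    """
    centroids = []  # running sum of embeddings per speaker
    labels = []
    for emb in turn_embeddings:
        best, best_sim = None, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)
        else:
            centroids[best] = [x + y for x, y in zip(centroids[best], emb)]
            labels.append(best)
    return labels


# Toy embeddings, one per detected speaker turn (made-up values).
turns = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95], [1.0, 0.05]]
print(label_turns(turns))
```

In this sketch the speaker turn detector and speaker encoder are assumed to have already run, so the input is one embedding per turn rather than raw audio.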

“Our speaker diarization system leverages several highly optimized machine learning models and algorithms to allow diarizing hours of audio in a real-time streaming fashion with limited computational resources on mobile devices.”

Google notes that audio recordings from the Recorder app can be “as long as up to 18 hours,” and that more audio means greater “confidence on predicted speaker labels.” As such, Recorder will “occasionally make corrections to previously predicted low-confidence speaker labels,” while users can manually make edits and split the transcript. 
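The correction behavior can be pictured with a short Python sketch. This is an illustration only, not how Recorder is known to work internally: the `Turn` record, the `revise_low_confidence` helper, the use of cosine similarity as a confidence score, and the 0.6 cutoff are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Turn:
    embedding: list    # voice embedding for this speaker turn
    label: int         # currently predicted speaker index
    confidence: float  # how sure the earlier prediction was


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def revise_low_confidence(turns, centroids, min_confidence=0.6):
    """Re-score only low-confidence turns against the current centroids.

    Returns the indices of turns whose label changed. Turns at or above
    `min_confidence` are left untouched.
    """
    corrected = []
    for idx, turn in enumerate(turns):
        if turn.confidence >= min_confidence:
            continue
        sims = [cosine(turn.embedding, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__)
        if best != turn.label:
            corrected.append(idx)
        turn.label = best
        turn.confidence = sims[best]
    return corrected


# Speaker centroids as they might look after more audio has been heard.
centroids = [[1.0, 0.0], [0.0, 1.0]]
turns = [
    Turn([1.0, 0.1], label=0, confidence=0.95),
    Turn([0.2, 1.0], label=0, confidence=0.40),  # early, uncertain guess
    Turn([0.9, 0.3], label=1, confidence=0.90),
]
changed = revise_low_confidence(turns, centroids)
print(changed, [t.label for t in turns])
```

Only the uncertain second turn is re-examined here. The design choice being illustrated is that revisiting just the low-confidence labels keeps the extra work small as a recording grows.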

The current system mostly runs on Tensor’s CPU, with both the first generation and G2 supported across the Pixel 6, 6 Pro, 6a, 7, and 7 Pro. For the future, Google is “working on delegating more computations to the TPU block, which will further reduce the overall power consumption of the diarization system.” At the moment, Recorder 4.2 warns that Speaker Labels will not work if your “Device is too hot.”

“Another future work direction is to leverage multilingual capabilities of speaker encoder and speech recognition models to expand this feature to more languages.”

Author: Abner Li
Source: 9TO5Google
