The voice and speech recognition tech market is anticipated to be worth $31.82 billion by 2025, driven by new applications in banking, health care, and automotive industries. And Tom Livne, who cofounded Verbit.ai with Eric Shellef and Kobi Ben Tzvi in 2017, asserts the Tel Aviv and New York-based startup will contribute substantially to the segment’s rise. Verbit’s adaptive speech recognition tech, which the company claims can generate detailed transcriptions with over 99% accuracy, recently attracted the attention of VCs including Vertex Ventures and Oryzn Capital, which both participated in the startup’s $23 million series A round. Now, roughly a year later, Verbit hopes to lay the groundwork for growth with an infusion of fresh capital.
Verbit today announced the close of a $31 million series B financing round at double its previous valuation led by New York-based equity growth firm Stripes, with participation from Viola Ventures, Vertex Ventures, HV Ventures, Oryzn Capital, and ClalTech. The tranche brings the company’s total raised to $65 million, and Livne says it’ll enable Verbit to expand its products’ language support and further develop its speech recognition technologies as it triples its New York office’s headcount this year.
“When I established Verbit [three] years ago, I didn’t anticipate we would become one of the market-leading companies in our industry so quickly,” said Livne, who added that revenue tripled in the past year. “This latest financing round is an important milestone in Verbit’s journey and strengthens the incredible momentum we had in 2019. The collaboration with Stripes is a great indicator of Verbit’s category-leading product and will allow us to continue innovating in the market.”
Voice transcription and captioning isn’t novel; it’s a decades-old venture with well-established players like Nuance and Google. Enterprise platforms such as Microsoft 365 offer AI-powered speech-to-text, as do Cisco and startups like Otter and Voicera.
But what sets Verbit apart is its reliance on “cutting-edge” advances in machine learning and natural language understanding, according to Livne.
Three models (an acoustic model, a linguistic model, and a contextual events model) inform Verbit’s captioning. They filter out background noise and echo and identify speakers regardless of accent, and then they detect domain-specific terms while incorporating current events and updates. Clients upload audio or video files to a cloud dashboard for processing, after which a team of over 15,000 human freelancers in over 20 countries edits and reviews the output, taking into account customer-supplied notes and guidelines. The finished transcriptions are available for export to services like Blackboard, Vimeo, YouTube, Canvas, and BrightCode.
Verbit’s web frontend shows the progress of each job and lets users edit and share files, define access permissions for each, add inline comments, request reviews, and view usage reports. A forthcoming feature, Verbit Express, will allow them to drag files in need of transcription to a folder on a desktop PC, from which they’ll be automatically uploaded and processed.
On the transcriber side of the equation, it’s a self-serve and on-demand affair. Verbit transcribers can choose the files they’d like to work on (the platform doesn’t assign them manually) and take advantage of built-in dictionary and research tools, keyboard shortcuts, speed control, a highlighter, and spell check. Those who consistently produce exceptional work and achieve high quality scores are offered the chance to become reviewers, who are responsible for proofreading and, if necessary, editing transcribers’ work.
Livne claims the platform can reduce operating costs by up to 50% and deliver results 10 times faster than the competition. In any case, it was enough to woo a healthy client base of over 150 educational institutions and commercial customers (up from 70 as of January 2019), including the NCAA, London Business School, Fashion Institute of Technology, Utah State University, University of Utah, University of Southern Utah, University of Vermont, Auburn University, Western Governors University, University of California Santa Barbara, Oakland University, Stanford, Coursera, Udacity, Panopto, Kaltura, and close to 100 others (up from 50 in May 2018).
Customers have to make a minimum commitment of $10,000 worth of work, a pricing structure that has apparently paid dividends. Verbit.ai isn’t disclosing exact revenue but says it’s in the “millions” and that the company is cash flow positive.
Verbit plans to explore verticals in the insurance and financial sectors as well as media and medical use cases, and to this end, it recently launched a human-in-the-loop transcription service for media firms with a delay of only a few seconds. “We are thrilled to partner with Tom and the rest of the Verbit team on their mission to build the leading AI-powered transcription and captioning platform,” said Stripes partner Saagar Kulkarni, who intends to join Verbit’s board of directors. “We are big believers in the power of AI to fundamentally change business models and provide critical services better, faster, and more affordably.”
Author: Kyle Wiggers
Source: VentureBeat