Amazon researchers use AI to improve the recognition of curved text

Optical character recognition (OCR), the conversion of images of handwritten or printed text into machine-readable text, is a science that dates back decades. But algorithms have long struggled to make out characters that don’t lie along a straight horizontal line, which is why researchers at Amazon developed what they call TextTubes: detectors for curved text in natural images that model each text instance as a tube around its medial (middle) axis. In a paper describing their work, the coauthors claim that their approach achieves state-of-the-art results on a popular OCR benchmark.

As the researchers explain, reading scene text is typically broken down into two successive tasks: text detection and text recognition. The first involves localizing characters, words, and lines using contextual clues, while the second aims to transcribe their content to the extent that it’s possible. Both are easier said than done: text in the wild is affected not only by deformations, but also by viewpoint changes and arbitrary fonts.

The team’s solution is a “tube” representation of the text reference frame that captures most of the variability, taking advantage of the fact that target text is usually a concatenation of characters of similar size. It’s formulated as a mathematical function that enables the training of machine learning scene text detectors, in contrast to traditional approaches that use overlap- and noise-prone rectangles and quadrilaterals to capture text information.
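To make the tube idea concrete, here is a minimal Python sketch of how a medial axis and a single radius can describe a curved text region. It is an illustration only, not the researchers’ implementation: the function name `tube_polygon`, the polyline input, and the constant radius are all assumptions made for this example.

```python
import math


def tube_polygon(medial_axis, radius):
    """Outline a tube of constant radius around a polyline.

    Hypothetical helper for illustration; not taken from the TextTubes paper.
    medial_axis: list of (x, y) points running along the middle of the text.
    radius: half the text height, assumed constant along the axis.
    """
    if len(medial_axis) < 2:
        raise ValueError("need at least two medial-axis points")
    side_a, side_b = [], []
    last = len(medial_axis) - 1
    for i, (x, y) in enumerate(medial_axis):
        # Estimate the tangent from neighbouring points (one-sided at the ends).
        x0, y0 = medial_axis[max(i - 1, 0)]
        x1, y1 = medial_axis[min(i + 1, last)]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            raise ValueError("repeated medial-axis points")
        # The unit normal is the tangent rotated by 90 degrees.
        nx, ny = -dy / length, dx / length
        side_a.append((x + radius * nx, y + radius * ny))
        side_b.append((x - radius * nx, y - radius * ny))
    # Walk along one side, then back along the other, to close the outline.
    return side_a + side_b[::-1]
```

For a straight axis from (0, 0) to (10, 0) with radius 2, the function returns the four corners of a 10-by-4 rectangle. A curved axis yields a polygon that bends with the text, which a single rectangle or quadrilateral cannot do.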

TextTubes’ performance was evaluated on two data sets. CTW-1500 consists of 1,500 images collected from natural scenes and image libraries, with over 10,000 text instances and at least one curved instance per image. Total-Text contains 1,255 training images and 300 test images, each with one or more curved text instances. The researchers report that they achieved industry-leading results with 83.65% accuracy on CTW-1500, compared with the closest method’s 75.6% accuracy.

“Modeling an instance’s medial axis and average radius … captures information about the instance overall,” wrote the paper’s coauthors. “On datasets that consist of individual words, such as Total-Text, our model is able to achieve state-of-the-art performance. On datasets that have line-level annotations, such as CTW-1500, our model is able to better capture textual information along an instance’s separate words.”

Assuming TextTubes makes its way into production someday, it could be a boon for enterprises that rely heavily on OCR to conduct business. It’s estimated that paper remains in over 80% of digital processes; roughly 97% of small businesses still use paper checks. That’s perhaps why the OCR solutions market is anticipated to be worth $13.38 billion by 2025, according to Grand View Research.


Author: Kyle Wiggers
Source: Venturebeat
