
Google releases SimCLR, an AI framework that can classify images with limited labeled data

A team of Google researchers recently detailed a framework called SimCLR, which improves on previous approaches to self-supervised learning, a family of techniques for converting an unsupervised learning problem (i.e., a problem in which AI models train on unlabeled data) into a supervised one by creating labels from unlabeled data sets. In a preprint paper and accompanying blog post, they say that SimCLR achieved a new record for image classification with a limited amount of annotated data and that it’s simple enough to be incorporated into existing supervised learning pipelines.

That could spell good news for enterprises applying computer vision to domains with limited labeled data.

SimCLR learns basic image representations on an unlabeled corpus and can be fine-tuned with a small set of labeled images for a classification task. The representations are learned through a method called contrastive learning, where the model simultaneously maximizes agreement between differently transformed views of the same image and minimizes agreement between transformed views of different images.
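
To make the contrastive objective concrete, here is a minimal sketch in plain Python of a normalized, temperature-scaled cross-entropy loss of the kind used in contrastive learning. It is an illustration, not Google's implementation: the function names, the use of cosine similarity over plain lists, and the temperature value of 0.5 are all assumptions chosen for readability.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(views_a, views_b, temperature=0.5):
    """Average contrastive loss over a batch.

    views_a[i] and views_b[i] are embeddings of two transformed views of
    the same image (a positive pair). Every other embedding in the batch
    acts as a negative. The temperature default is an assumed value.
    """
    embeddings = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Scaled similarity of the anchor to every other embedding in the batch.
        sims = {
            k: cosine_similarity(embeddings[i], embeddings[k]) / temperature
            for k in range(2 * n)
            if k != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        # Low when the positive pair is much more similar than the negatives.
        total += log_denominator - sims[positive]
    return total / (2 * n)
```

The loss falls when the two views of each image sit close together and far from views of other images, and rises when the pairing is scrambled.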

Above: An illustration of the SimCLR architecture.

Image Credit: Google

SimCLR first randomly draws examples from the original data set, transforming each example twice through a combination of cropping, color distortion, and blurring to create two sets of corresponding views. It then computes the image representation using a machine learning model, after which it generates a projection of the image representation using a module that maximizes SimCLR’s ability to identify different transformations of the same image. Finally, following the pretraining stage, SimCLR’s output can be used as the representation of an image or tailored with labeled images to achieve good performance for specific tasks.
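
The steps described above can be sketched roughly as follows. This is a toy illustration in plain Python, not SimCLR's actual code: images are small grayscale grids stored as nested lists, brightness jitter stands in for color distortion, and `encoder` and `projection_head` are hypothetical callables supplied by the caller. All function names and parameter values here are assumptions.

```python
import random


def random_crop(image, size, rng):
    """Cut a random size x size window out of a 2-D grayscale image."""
    top = rng.randrange(len(image) - size + 1)
    left = rng.randrange(len(image[0]) - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]


def jitter_brightness(image, rng, strength=0.2):
    """Grayscale stand-in for color distortion: scale pixels, clamp to [0, 1]."""
    factor = 1.0 + rng.uniform(-strength, strength)
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in image]


def box_blur(image):
    """Replace each pixel with the mean of its 3x3 neighborhood."""
    h, w = len(image), len(image[0])
    blurred = []
    for y in range(h):
        row = []
        for x in range(w):
            neighbors = [
                image[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(neighbors) / len(neighbors))
        blurred.append(row)
    return blurred


def make_view(image, rng, crop_size=4):
    """One randomly transformed view: crop, then distort, then blur."""
    return box_blur(jitter_brightness(random_crop(image, crop_size, rng), rng))


def two_views(batch, rng):
    """Transform every image twice, giving two sets of corresponding views."""
    views_a = [make_view(image, rng) for image in batch]
    views_b = [make_view(image, rng) for image in batch]
    return views_a, views_b


def embed_views(batch, encoder, projection_head, rng):
    """Run both sets of views through the encoder, then the projection module."""
    views_a, views_b = two_views(batch, rng)
    z_a = [projection_head(encoder(view)) for view in views_a]
    z_b = [projection_head(encoder(view)) for view in views_b]
    return z_a, z_b
```

During pretraining, the two sets of projections would be compared with a contrastive objective. Afterward, the encoder's output serves as the image representation, either used directly or fine-tuned on labeled images.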

Google says that in experiments SimCLR achieved 85.8% top-5 accuracy on a test data set (ImageNet) when fine-tuned on only 1% of the labels, compared with the previous best approach’s 77.9%.
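
For readers unfamiliar with the metric, top-5 accuracy counts a prediction as correct when the true label is among the model's five highest-scoring classes. A minimal sketch in plain Python, with a hypothetical function name and made-up toy scores:

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of examples whose true label is among the k highest-scoring classes.

    scores[i] is a list of per-class scores for example i, and labels[i]
    is the index of that example's true class.
    """
    hits = 0
    for class_scores, label in zip(scores, labels):
        ranked = sorted(
            range(len(class_scores)),
            key=lambda c: class_scores[c],
            reverse=True,
        )
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy usage with three classes and k=2 (scores are illustrative only).
toy_scores = [[0.1, 0.5, 0.4], [0.7, 0.2, 0.1]]
toy_labels = [2, 1]
toy_result = top_k_accuracy(toy_scores, toy_labels, k=2)
```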

“[Our results show that] pretraining on large unlabeled image data sets has the potential to improve performance on computer vision tasks,” wrote research scientist Ting Chen and Geoffrey Hinton, a Google Research VP, engineering fellow, and Turing Award winner, in a blog post. “Despite its simplicity, SimCLR greatly advances the state of the art in self-supervised and semi-supervised learning.”

Both the code and pretrained models of SimCLR are available on GitHub.


Author: Kyle Wiggers.
Source: VentureBeat
