AI & Robotics News

DeepSeek unveils new technique for smarter, scalable AI reward models

DeepSeek AI, a Chinese research lab gaining recognition for its powerful open-source language models such as DeepSeek-R1, has introduced a significant advancement in reward modeling for large language models (LLMs). Its new technique, Self-Principled Critique Tuning (SPCT), aims to create generalist and scalable reward models (RMs). This could lead to more capable AI applications for…
Read more
AI & Robotics News

LLMs Generalize Better with Less Hand-Labeled Training

Large language models (LLMs) can generalize better when left to create their own solutions, a new study by Hong Kong University and the University of California, Berkeley, shows. The findings, which apply to both LLMs and vision language models (VLMs)…