DeepSeek unveils new technique for smarter, scalable AI reward models

DeepSeek AI, a Chinese research lab gaining recognition for its powerful open-source language models such as DeepSeek-R1, has introduced a significant advancement in reward modeling for large language models (LLMs). Its new technique, Self-Principled Critique Tuning (SPCT), aims to create generalist and scalable reward models (RMs). This could lead to more capable AI applications for…
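
For context on what a generative reward model of this kind does, here is a minimal, hypothetical Python sketch of an SPCT-style scoring pipeline: the judge model first writes evaluation principles, then a critique against those principles, and finally a numeric score that gets parsed out; sampling several judgments and averaging them is the inference-time scaling axis the article alludes to. The `generate` stub stands in for any instruction-tuned LLM call, and the prompt format and score tag are illustrative assumptions, not DeepSeek's actual implementation.

```python
import re
from statistics import mean

def generate(prompt: str) -> str:
    """Stub for an LLM call; swap in any chat-model API.

    Returns canned text so the sketch runs end to end. A real system
    would sample fresh principles, a critique, and a score each call.
    """
    return (
        "Principles: 1) factual accuracy 2) completeness 3) clarity\n"
        "Critique: The response is accurate but omits edge cases.\n"
        "Score: 7"
    )

def spct_style_reward(question: str, response: str, n_samples: int = 4) -> float:
    """Score one response by sampling principle+critique+score judgments
    and averaging the parsed scores across samples."""
    prompt = (
        "First write evaluation principles for the question, then critique "
        "the response against them, then output 'Score: <1-10>'.\n"
        f"Question: {question}\nResponse: {response}"
    )
    scores = []
    for _ in range(n_samples):
        judgment = generate(prompt)
        match = re.search(r"Score:\s*(\d+)", judgment)
        if match:
            scores.append(int(match.group(1)))
    return mean(scores) if scores else 0.0

print(spct_style_reward("What causes tides?", "The Moon's gravity."))
```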

Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 — at 95% less cost

Chinese AI startup DeepSeek, known for challenging leading AI vendors with open-source technologies, just dropped another bombshell: a new open reasoning LLM called DeepSeek-R1. Based on the recently introduced DeepSeek V3 mixture-of-experts model, DeepSeek-R1 matches the performance of o1, OpenAI’s frontier reasoning LLM, across math, coding and reasoning tasks. The best part? It does this at a…
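
The "pure reinforcement learning" in the headline refers to training the model with verifiable, rule-based rewards rather than a learned critic; the DeepSeek-R1 paper describes this as Group Relative Policy Optimization (GRPO). Below is a minimal NumPy sketch of the group-relative advantage idea at the heart of that recipe: sample several answers per prompt, reward each by a checkable rule (such as exact match on a math answer), and normalize rewards within the group so the policy update favors above-average answers. The reward function here is an illustrative stub, not DeepSeek's actual code.

```python
import numpy as np

def rule_based_reward(answer: str, gold: str) -> float:
    """Verifiable reward: 1.0 for an exact match on the final answer,
    0.0 otherwise. Real recipes also add format and language rewards."""
    return 1.0 if answer.strip() == gold.strip() else 0.0

def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    """GRPO-style advantage: normalize each sampled answer's reward by
    the mean and std of its own group, so no learned value model
    (critic) is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Toy example: four sampled answers to one math prompt, gold answer "42".
samples = ["42", "41", "42", "six"]
rewards = np.array([rule_based_reward(s, "42") for s in samples])
advantages = group_relative_advantages(rewards)
print(rewards)     # [1. 0. 1. 0.]
print(advantages)  # positive for correct samples, negative otherwise
```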