DeepSeek AI, a Chinese research lab gaining recognition for its powerful open-source language models such as DeepSeek-R1, has introduced a significant advancement in reward modeling for large language models (LLMs).
Their new technique, Self-Principled Critique Tuning (SPCT), aims to create generalist and scalable reward models (RMs). This could lead to more capable AI applications for…
OpenAI expands Deep Research access to Plus users, heating up AI agent wars with DeepSeek and Claude
February 26, 2025
OpenAI announced today that it is rolling out its powerful Deep Research capability to all ChatGPT Plus, Team, Education and Enterprise users, significantly expanding access to what many experts consider the company’s most transformative AI agent since the original…
DeepSeek R1’s bold bet on reinforcement learning: How it outpaced OpenAI at 3% of the cost
January 27, 2025
DeepSeek R1’s Monday release has sent shockwaves through the AI community, disrupting assumptions about what’s required to achieve cutting-edge AI performance. Matching OpenAI’s o1 at just 3%-5% of the cost, this open-source model has not only captivated developers but…
Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 — at 95% less cost
January 21, 2025
Chinese AI startup DeepSeek, known for challenging leading AI vendors with open-source technologies, just dropped another bombshell: a new open reasoning LLM called DeepSeek-R1.
Based on the recently introduced DeepSeek V3 mixture-of-experts model, DeepSeek-R1 matches the performance of o1, OpenAI’s frontier reasoning LLM, across math, coding and reasoning tasks. The best part? It does this at a…