AI & Robotics News

Beyond ARC-AGI: GAIA and the search for a real intelligence benchmark

In a nutshell: Intelligence measurement in AI is evolving beyond traditional benchmarks like MMLU, with new tests like ARC-AGI and Humanity’s Last Exam focusing on real-world reasoning. The GAIA benchmark assesses practical AI capabilities across web browsing, code execution, and complex reasoning, setting a new standard for evaluating AI performance. Intelligence is pervasive, yet its…

ChatGPT’s memory can now reference all past conversations, not just what you tell it to

OpenAI is gradually rolling out improved memory in ChatGPT, making it the default for ChatGPT to reference past conversations. The change has raised fears that the platform is proactively “listening” to users, leaving some uncomfortable with how much it knows. ChatGPT already logs information from previous interactions through its Memory feature, ensuring preferences are saved and conversations…

DeepSeek unveils new technique for smarter, scalable AI reward models

DeepSeek AI, a Chinese research lab gaining recognition for its powerful open-source language models such as DeepSeek-R1, has introduced a significant advancement in reward modeling for large language models (LLMs). Its new technique, Self-Principled Critique Tuning (SPCT), aims to create generalist and scalable reward models (RMs). This could lead to more capable AI applications for…

New open source AI company Deep Cogito releases first models and they’re already topping the charts

Deep Cogito, a new AI research startup based in San Francisco, officially emerged from stealth today with Cogito v1, a new line of open source large language models (LLMs) fine-tuned from Meta’s Llama 3.2 and equipped with hybrid reasoning capabilities — the ability to answer quickly and immediately, or “self-reflect” like OpenAI’s “o” series and DeepSeek R1. The company aims to push…