In a Nutshell
Intelligence measurement in AI is evolving beyond traditional benchmarks like MMLU, with new tests like ARC-AGI and Humanity’s Last Exam focusing on real-world reasoning. The GAIA benchmark assesses practical AI capabilities across web browsing, code execution, and complex reasoning, setting a new standard for evaluating AI performance.
Intelligence is pervasive, yet its…
NTT Research announced at its annual Upgrade event that it has started a new AI basic research group, dubbed the Physics of Artificial Intelligence Group. Physical AI has become a big deal in 2025, with Nvidia leading the charge to create synthetic data to pretest…
In a Nutshell
Writer, an enterprise AI company, launched AI HQ to help businesses bridge the gap between AI potential and real-world results. The platform features autonomous agents for complex workflows, self-evolving models, and a $1.9 billion valuation with unique…
ChatGPT’s memory can now reference all past conversations, not just what you tell it to
April 11, 2025
OpenAI is gradually rolling out improved memory in ChatGPT, making it the default for the chatbot to reference past conversations. This has raised fears that the platform is proactively “listening” to users, leaving some uncomfortable with how much it knows.
ChatGPT already logs information from previous interactions through its Memory feature, ensuring preferences are saved and conversations…
Google Cloud intros AI security agents, unified security platform to consolidate ops, triage, threat intel
April 10, 2025
Enterprise infrastructure is increasingly complex, and so is the job of protecting it.
The attack surface is more expansive than ever, and many enterprises have a patchwork quilt of security tools, making it difficult to gain a cohesive understanding of their security posture.
Anthropic just launched a $200 version of Claude AI — here’s what you get for the premium price
April 10, 2025
Anthropic introduced a new high-end subscription tier for its Claude chatbot today, directly challenging OpenAI’s premium offerings and marking the latest escalation in the race to monetize powerful AI models amid soaring development costs.
The new “Max” plan offers…
DeepSeek AI, a Chinese research lab gaining recognition for its powerful open-source language models such as DeepSeek-R1, has introduced a significant advancement in reward modeling for large language models (LLMs).
Their new technique, Self-Principled Critique Tuning (SPCT), aims to create generalist and scalable reward models (RMs). This could potentially lead to more capable AI applications for…
The RAG reality check: New open-source framework lets enterprises scientifically measure AI performance
April 9, 2025
Enterprises are spending time and money building out retrieval-augmented generation (RAG) systems. The goal is to have an accurate enterprise AI system, but are those systems actually working?
The inability to objectively measure whether RAG systems are actually working is a…
Wells Fargo’s AI assistant just crossed 245 million interactions – no human handoffs, no sensitive data exposed
April 9, 2025
Wells Fargo has quietly accomplished what most enterprises are still dreaming about: building a large-scale, production-ready generative AI system that actually works. In 2024 alone, the bank’s AI-powered assistant, Fargo, handled 245.4 million interactions – more than…
New open source AI company Deep Cogito releases first models and they’re already topping the charts
April 9, 2025
Deep Cogito, a new AI research startup based in San Francisco, officially emerged from stealth today with Cogito v1, a new line of open source large language models (LLMs) fine-tuned from Meta’s Llama 3.2 and equipped with hybrid reasoning capabilities: the ability to answer immediately, or to “self-reflect” like OpenAI’s “o” series and DeepSeek R1.
The company aims to push…