The recent uproar surrounding Anthropic’s Claude 4 Opus model – specifically, its tested ability to proactively notify authorities and the media if it suspected nefarious user activity – is sending a cautionary ripple through the enterprise AI landscape. While Anthropic clarified this behavior emerged under specific test conditions, the incident has raised questions for technical…
Basketball has March Madness. Tech has the Consumer Electronics Show. AI has been waiting for its big moment—and this week may finally be it. With Microsoft’s Build and Google’s I/O developer conferences happening back-to-back, it was already primed to be a big week.
In the first generation of the web, back in the late 1990s, search was okay but not great, and it wasn’t easy to find things. That led to the rise of syndication protocols in the early 2000s, with Atom and RSS (Really Simple Syndication) providing a simplified way for…
Question: What product should use machine learning (ML)?
Project manager answer: Yes.
Jokes aside, the advent of generative AI has upended our understanding of what use cases lend themselves best to ML. Historically, we have always leveraged ML for repeatable, predictive patterns in customer experiences, but now, it’s possible to leverage a form of ML even without an entire training…
Researchers from UCLA and Meta AI have introduced d1, a novel framework using reinforcement learning (RL) to significantly enhance the reasoning capabilities of diffusion-based large language models (dLLMs). While most attention has focused on autoregressive models like GPT…
In my first stint as a machine learning (ML) product manager, a simple question inspired passionate debates across functions and leaders: How do we know if this product is actually working? The product in question that I managed catered to both internal and external…
Former DeepSeeker and collaborators release new method for training reliable AI agents: RAGEN
April 24, 2025
2025 was, by many expert accounts, supposed to be the year of AI agents: task-specific AI implementations powered by leading large language and multimodal models (LLMs), such as those offered by OpenAI, Anthropic, Google, and DeepSeek.
But so far, most AI agents remain stuck as experimental pilots in a kind of corporate purgatory, according to a recent poll conducted by VentureBeat on the…
Researchers from Stanford University and Google DeepMind have unveiled Step-Wise Reinforcement Learning (SWiRL), a technique designed to enhance the ability of large language models (LLMs) to tackle complex tasks requiring multi-step reasoning and tool use.
As the interest…
When AI reasoning goes wrong: Microsoft Research shows more tokens can mean more problems
April 16, 2025
In a Nutshell
Microsoft Research finds that inference-time scaling methods for large language models don’t universally improve performance. Varying benefits, token inefficiency, and cost unpredictability challenge assumptions. Verification mechanisms enhance model…
New open source AI company Deep Cogito releases first models and they’re already topping the charts
April 9, 2025
Deep Cogito, a new AI research startup based in San Francisco, officially emerged from stealth today with Cogito v1, a new line of open source large language models (LLMs) fine-tuned from Meta’s Llama 3.2 and equipped with hybrid reasoning capabilities — the ability to answer quickly and immediately, or “self-reflect” like OpenAI’s “o” series and DeepSeek R1.
The company aims to push…