Generative AI adoption has surged by 187% over the past two years. But at the same time, enterprise security investments focused specifically on AI risks have grown by only 43%, creating a significant gap in preparedness as AI attack surfaces rapidly expand. More than 70% of enterprises experienced at least one AI-related breach in the past year alone, with generative models now the primary…
The recent uproar surrounding Anthropic’s Claude 4 Opus model – specifically, its tested ability to proactively notify authorities and the media if it suspected nefarious user activity – is sending a cautionary ripple through the enterprise AI landscape. While…
Basketball has March Madness. Tech has the Consumer Electronics Show. AI has been waiting for its big moment—and this week may finally be it. With Microsoft’s Build and Google’s I/O developer conferences happening back-to-back, it’s already primed to be a big week.
In the first generation of the web, back in the late 1990s, search was rudimentary and it wasn’t easy to find things. That led to the rise of syndication protocols in the early 2000s, with Atom and RSS (Really Simple Syndication) giving website owners a simple way to make headlines and other content easily available and searchable.
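For readers who never touched RSS, a feed is just a small XML document listing headlines and links that any client can fetch and parse. Below is a minimal, self-contained Python sketch using only the standard library; the feed contents and URLs are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A minimal RSS 2.0 document. The channel, items, and URLs are
# invented purely for this example.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Tech Blog</title>
    <link>https://example.com</link>
    <description>Headlines syndicated via RSS</description>
    <item>
      <title>First headline</title>
      <link>https://example.com/post-1</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/post-2</link>
      <pubDate>Tue, 02 Jan 2024 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

def headlines(rss_xml: str) -> list[dict]:
    """Extract title/link pairs from an RSS 2.0 string."""
    root = ET.fromstring(rss_xml)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]

if __name__ == "__main__":
    for entry in headlines(RSS_SAMPLE):
        print(entry["title"], "->", entry["link"])
```

This fetch-and-parse simplicity is exactly what made syndication attractive: no scraping, no search crawl, just a stable, machine-readable list of what’s new.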
In the modern era of AI, a new group of…
Question: What product should use machine learning (ML)?
Project manager answer: Yes.
Jokes aside, the advent of generative AI has upended our understanding of what use cases lend themselves best to ML. Historically, we have always leveraged ML for repeatable, predictive…
Researchers from UCLA and Meta AI have introduced d1, a novel framework using reinforcement learning (RL) to significantly enhance the reasoning capabilities of diffusion-based large language models (dLLMs). While most attention has focused on autoregressive models like GPT…
In my first stint as a machine learning (ML) product manager, a simple question inspired passionate debates across functions and leaders: How do we know if this product is actually working? The product I managed catered to both internal and external customers. The model enabled internal teams to identify the top issues faced by our customers so that they could prioritize the right…
Former DeepSeeker and collaborators release new method for training reliable AI agents: RAGEN
April 24, 2025
2025 was, by many expert accounts, supposed to be the year of AI agents — task-specific AI implementations powered by leading large language models (LLMs) and multimodal models, like those offered by OpenAI, Anthropic, Google, and DeepSeek.
But so far, most AI agents remain…
Researchers from Stanford University and Google DeepMind have unveiled Step-Wise Reinforcement Learning (SWiRL), a technique designed to enhance the ability of large language models (LLMs) to tackle complex tasks requiring multi-step reasoning and tool use.
As the interest…
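The excerpt doesn’t detail SWiRL’s training recipe, so as a rough illustration of what “step-wise” optimization can mean in general, the sketch below breaks a multi-step rollout into per-step examples and scores each step rather than only the final answer. Everything in it, from the Step type to the judge function and threshold, is a hypothetical stand-in, not the paper’s actual implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in: one reasoning or tool-use action inside a
# multi-step trajectory. This type does not come from the SWiRL paper.
@dataclass
class Step:
    context: str   # everything the model saw before acting
    action: str    # the text or tool call the model produced

def split_into_step_examples(trajectory: list[Step]) -> list[tuple[str, str]]:
    """Turn one multi-step rollout into per-step (context, action) pairs
    so each intermediate decision can be scored and trained on
    individually, instead of rewarding only the final answer."""
    return [(step.context, step.action) for step in trajectory]

def filter_steps(
    examples: list[tuple[str, str]],
    judge: Callable[[str, str], float],
    threshold: float = 0.5,
) -> list[tuple[str, str]]:
    """Keep only the steps a judge scores above a threshold; the
    surviving steps become training data for the next iteration."""
    return [(ctx, act) for ctx, act in examples if judge(ctx, act) >= threshold]

if __name__ == "__main__":
    # Toy rollout: the model issues a (made-up) search call, then answers.
    rollout = [
        Step("Q: What is the population of France?", "search('France population')"),
        Step("Q: ... tool result: about 68M", "The population is about 68 million."),
    ]
    # Dummy judge that favors steps grounded in the tool interaction.
    judge = lambda ctx, act: 1.0 if "search" in act or "68" in act else 0.0
    print(filter_steps(split_into_step_examples(rollout), judge))
```

The point of the per-step decomposition is credit assignment: a wrong final answer no longer wipes out the signal from a well-chosen intermediate tool call.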
When AI reasoning goes wrong: Microsoft Research shows more tokens can mean more problems
April 16, 2025
In a Nutshell
Microsoft Research finds that inference-time scaling methods for large language models don’t universally improve performance: benefits vary across tasks, token use is inefficient, and costs are unpredictable, challenging common assumptions about these techniques. Verification mechanisms enhance model accuracy. Brute-force scaling has limits; conventional models can match reasoning models on simpler tasks but struggle with…
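One familiar shape for the “verification mechanisms” the researchers highlight is best-of-n sampling: generate several candidate answers and let a verifier pick one, spending extra tokens to buy accuracy. Here’s a minimal sketch; the generate and verify functions are toy stand-ins and do not reflect Microsoft’s actual setup.

```python
import random
from typing import Callable

def best_of_n(
    generate: Callable[[str], str],
    verify: Callable[[str, str], float],
    prompt: str,
    n: int = 8,
) -> str:
    """Sample n candidate answers and return the one the verifier
    scores highest. More samples mean more tokens, so accuracy gains
    depend on both n and the quality of the verifier."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verify(prompt, c))

if __name__ == "__main__":
    random.seed(0)
    # Toy "model" that answers correctly ~30% of the time, paired with
    # a perfect verifier. Neither resembles a real deployment.
    generate = lambda p: "42" if random.random() < 0.3 else str(random.randint(0, 9))
    verify = lambda p, c: 1.0 if c == "42" else 0.0
    print(best_of_n(generate, verify, "What is 6 * 7?"))
```

The cost caveats in the study show up directly in this pattern: token spend scales linearly with n, and a weak verifier can make the extra compute worthless.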