Enterprises need to know if the models that power their applications and agents work in real-life scenarios. This type of evaluation can be complex because specific scenarios are hard to predict. A revamped version of the RewardBench benchmark looks to give organizations a better idea of a model’s real-life performance.
The Allen Institute for AI (Ai2) launched RewardBench 2, an…
As AI advances toward expanded capabilities, knowledge workers are confronting not just job loss, but the deeper question of what makes them matter.
Fortune published the story of a 42-year-old software engineer with a computer science degree whose purpose has unraveled. He…
Beyond single-model AI: How architectural design drives reliable multi-agent orchestration
May 26, 2025
We’re seeing AI evolve fast. It’s no longer just about building a single, super-smart model. The real power, and the exciting frontier, lies in getting multiple specialized AI agents to work together. Think of them as a team of expert colleagues, each with their own…
Lightricks, the company behind popular creative apps like Facetune and VideoLeap, announced today the release of its most powerful AI video generation model to date. The LTX Video 13-billion-parameter model (LTXV-13B) generates high-quality AI video up to 30 times faster than comparable models while running on consumer-grade hardware rather than expensive enterprise GPUs.
The model introduces…
In my first stint as a machine learning (ML) product manager, a simple question inspired passionate debates across functions and leaders: How do we know if this product is actually working? The product in question that I managed catered to both internal and external…
Databricks is tackling the challenge of AI model performance by addressing the critical issue of data labeling. While labeled data has long been essential for training AI models, enterprises often face a significant bottleneck—not in technology, but in the lengthy process…
Breaking down Grok 3: The AI model that could redefine the industry
February 20, 2025
Less than two years since its launch, xAI has shipped what could arguably be the most advanced AI model to date. Grok 3 matches or beats the most advanced models on all key benchmarks as well as the user-evaluated Chatbot Arena, and its training has not even been completed yet.
We still don’t have a lot of details about Grok 3, as the team has not yet released a paper or technical report. But…
Elon Musk just released an AI that’s smarter than ChatGPT — here’s why that matters
February 19, 2025
Elon Musk’s artificial intelligence startup xAI has unveiled Grok 3, its latest AI model that the company claims outperforms leading competitors across key technical benchmarks. The announcement marks a significant escalation in the race to develop more powerful AI…
‘Personalized, unrestricted’ AI lab Nous Research launches first toggle-on reasoning model: DeepHermes-3
February 17, 2025
AI reasoning models, those that produce “chains-of-thought” (CoT) in text and reflect on their own analysis to try to catch errors midstream before outputting a response, are all the rage now thanks to the likes of DeepSeek and OpenAI’s “o” series. As posted…
AI Agents Are Coming: Decoding Your Personality
February 17, 2025
When I was a kid there were four AI agents in my life. Their names were Inky, Blinky, Pinky and Clyde and they tried their best to hunt me down. This was the 1980s and the agents were the four colorful ghosts in the iconic arcade game Pac-Man.
By today’s standards they weren’t particularly smart, yet they seemed to pursue me with cunning and intent. This was decades before neural networks were…