Anthropic claims new AI security method blocks 95% of jailbreaks, invites red teamers to try
February 4, 2025
Model developers have yet to come up with an effective defense — and, truthfully, they may never be able to deflect such attacks 100% — yet they continue to work toward that aim. To that end, OpenAI rival Anthropic, make of the Claude family of LLMs and chatbot, today released a new system it’s calling “constitutional classifiers” that it says filters the “overwhelming majority” of…