Maybe AI agents can be lawyers after all

<p id="speakable-summary" class="wp-block-paragraph">Last month, I wrote about <a href="https://techcrunch.com/2026/01/22/are-ai-agents-ready-for-the-workplace-a-new-benchmark-raises-doubts/" target="_blank" rel="noreferrer noopener">Mercor’s new benchmark</a> measuring AI agents’ capabilities on professional tasks like law and corporate analysis. At the time, the scores were pretty dismal, with every major lab scoring under 25%, so we concluded lawyers were safe from AI displacement, at least for now.</p>
<p class="wp-block-paragraph">But AI capabilities can change a lot in a couple of weeks.</p>
<p class="wp-block-paragraph"><a href="https://techcrunch.com/2026/02/05/anthropic-releases-opus-4-6-with-new-agent-teams/">This week’s release of Anthropic’s Opus 4.6</a> shook up <a href="https://www.mercor.com/apex/apex-agents-leaderboard/" target="_blank" rel="noreferrer noopener nofollow">the leaderboards</a>, with Anthropic’s new model scoring just shy of 30% in one-shot trials and averaging 45% when given a few more cracks at the problem. Notably, the release came with a bunch of new agentic features, such as “agent swarms,” which may have helped with this kind of multistep problem-solving.</p>
<p class="wp-block-paragraph">Regardless, the score is a huge jump from the previous state of the art, and a sign that progress on foundation models isn’t slowing down. Mercor CEO Brendan Foody, who was particularly impressed, said, “Jumping from 18.4% to 29.8% in a few months is insane.”</p>
<p class="wp-block-paragraph">Thirty percent is still a long way from 100%, so it’s not like lawyers need to be worried about getting replaced by machines next week. But they should be a lot less confident than they were last month!</p>