New Legal AI Benchmarking Report Evaluates Four AI Tools Across Seven Legal Tasks

Issue 20
On February 27, 2025, Vals AI released its Vals Legal AI Report (“VLAIR”), a first-of-its-kind evaluation of four legal industry AI tools (CoCounsel, Vincent AI, Harvey Assistant, and Oliver) across up to seven legal tasks commonly performed by lawyers, benchmarking their results against those of a lawyer control group.[i] For more on this newsletter’s previous coverage of legal benchmarking, see Issue 12.
Of the seven legal tasks evaluated, one or more AI tools beat the lawyer control group on four tasks, while the lawyer control group surpassed the AI tools on two tasks and matched the highest-performing tool on one task.[ii] Harvey Assistant, which participated in six of the seven tasks, had the strongest performance: it received the top score on five tasks and the second-place score on one task, and it beat or matched the lawyer control group on five tasks.[iii] CoCounsel also received one top score and ranked among the best-performing tools on four of the tasks.[iv]
The Tasks
The legal tasks evaluated by the study were:
- Document Extraction: this task evaluated the identification and extraction of specific information within a document, with Harvey Assistant and CoCounsel surpassing the lawyer control group.[v]
- Document Question-Answering: the report concluded that lawyers should find value in using generative AI to review and analyze the information in a document, as all of the AI tools outperformed the lawyer control group at this task.[vi]
- Document Summarization: the report also found that lawyers can use AI tools for document summarization with confidence, with all of the AI tools outperforming the lawyer control group.[vii]
- Redlining: the lawyer control group beat each of the AI tools participating in the study at this task.[viii]
- Transcript Analysis: the study noted certain challenges with transcript analysis, including potentially messy formatting and nuanced, subtle information.[ix] Both of the tools that participated in this task evaluation, Harvey Assistant and Vincent AI, outperformed the lawyer control group.[x]
- Chronology Generation: Harvey Assistant matched the lawyer control group in chronology generation.[xi]
- EDGAR Research: this task evaluated the ability to perform market-based research or answer questions about U.S. public companies using the U.S. Securities and Exchange Commission’s EDGAR database.[xii] The lawyer control group outperformed Oliver, the only AI tool participating in this task; however, the report noted that the lawyer control group was able to use non-AI research tools such as Google and the EDGAR search interface to complete its task.[xiii]
Future Studies Planned by Vals AI
VLAIR is the first iteration of Vals AI’s planned regular evaluation of legal industry AI tools.[xiv] Vals AI anticipates that additional AI tool vendors will opt in to future benchmarking evaluations, and that additional tasks and skills will be evaluated.[xv] Vals AI is currently conducting a study dedicated solely to legal research that will be released later this year.[xvi]
Takeaways for Lawyers Who Are Evaluating Their AI Options
VLAIR is essential reading for any lawyer who is considering adopting an AI tool that performs one or more of the tasks evaluated by this report. You can read the report in its entirety at https://www.vals.ai/vlair. As VLAIR notes, there is currently no consensus in the legal industry about which workflows hold the most generative AI potential.[xvii] VLAIR is an impressive effort to provide lawyers with practical, valuable information they can use when deciding among their AI options.
Considering that there are over 50 use cases for legal industry AI tools and over 200 legal industry AI tools on the market, it’s important to recognize that the vast majority of AI tools for lawyers are unlikely to be included in independent benchmarking studies in the near future. This is a reality of navigating the AI era, when new developments happen constantly, and the lack of comprehensive benchmarking resources should not be used as a justification for considering only those AI tools that have benchmarking data.
Instead, lawyers who wish to identify the AI solutions that can make the greatest impact for their organizations should start by clarifying and prioritizing the problems they need to solve with AI. This requires investigating your organization’s technology problems. Where is technology currently serving the people of your organization well, and where is there room for improvement? Is there work performed in your organization that routinely gets written off? What tasks are repetitive? What tasks can be streamlined? What work could be performed more consistently and accurately with technology? Where would a new technology tool make the biggest financial impact? How receptive are the people of your organization to new technology?
Once you understand your organization’s technology problems, you’ll be in a better position to match those problems with the solutions currently available from AI tools. This is also the point in the AI tool evaluation process where benchmarking reports like VLAIR can help guide decision-making. However, lawyers who are interested in AI tools that lack independent benchmarking data can still conduct their own evaluations and testing of the tools they have identified as most promising for their unique organizations. Chapter 5 of A Lawyer’s Practical Guide to AI lays out a process to help you identify whether any AI tools potentially meet your organization’s needs and, if so, how to evaluate them before implementing them. Once you know what you want an AI tool to do, you can use the directory of AI tools for lawyers in Chapter 6 of A Lawyer’s Practical Guide to AI as a starting point to quickly narrow down the available options and select one or more tools to evaluate further for compatibility with your organization.
Thanks for being here.
Jennifer Ballard
Good Journey Consulting
_____________________________________
[i] Executive summary, Vals Legal AI Report, https://www.vals.ai/vlair (last visited Mar. 1, 2025).
[ii] Id.
[iii] Id.
[iv] Id.
[v] Findings for each skill, Vals Legal AI Report, https://www.vals.ai/vlair (last visited Mar. 1, 2025).
[vi] Id.
[vii] Id.
[viii] Id.
[ix] Id.
[x] Id.
[xi] Id.
[xii] Id.
[xiii] Id.
[xiv] Future plans, Vals Legal AI Report, https://www.vals.ai/vlair (last visited Mar. 1, 2025).
[xv] Id.
[xvi] Methodology, Vals Legal AI Report, https://www.vals.ai/vlair (last visited Mar. 1, 2025).
[xvii] Id.