
Hargrove & Cole is a thirty-attorney commercial litigation firm. Business disputes, contract fights, partnership dissolutions. The work runs on motion practice. It is 9:14 on a Tuesday night, and the opposition to a summary judgment motion is due at 11:59, the court's electronic filing cutoff. Opposing counsel filed the motion fourteen days ago. The clock has been running the whole time.

Marian is the last person who reads it before it matters.

She has been a litigation paralegal for fifteen years. She holds a Master of Legal Studies and a CP credential through NALA. She sits on the board of her state paralegal association, mentors every paralegal the firm hires, and has spoken at regional CLE events on litigation support and, lately, on using technology in practice without getting burned by it. When a brief has to be right, it goes to Marian first. She is not the person who fears the new AI tool. She is the person everyone asks about it.

Tonight a second-year associate drafted the opposition using that tool, the first time the firm has used it on a brief this serious. It reached Marian late. She is tired. She has already worked two other matters today. And she is the safeguard.

The brief reads well. That is the problem.


The Reflex

When a bad citation gets through, the firm always reaches for the same answer. The reviewer should have caught it. Tighten accountability. Add consequences. Someone was not careful enough.

That answer is half right. The wrong half is the half that matters.

Accountability does change behavior. The research is clear. In studies of automation bias in aviation, pilots who felt genuinely accountable for their work checked the automation more often and made fewer errors than those who did not (Skitka, Mosier, & Burdick, 1999). Accountability gets people to look.

It does not get them to see. The same research shows why.


What Accountability Cannot Reach

In those studies, people using a reliable but imperfect automated aid did worse than people with no aid at all. The omission error rate, meaning how often people missed a problem the automation failed to flag, reached 55 percent (Skitka, Mosier, & Burdick, 1999).

The error correlated with experience. More flight hours and more years on the job meant a higher chance of missing the automation's failure, not a lower one.

Experience did not guard against the error. It tracked with it.

Marian's qualifications are not the protection everyone assumes. She is the most accountable, most experienced person who will touch this brief. By the logic of the reflex the firm trusts, the document is in the safest possible hands. It is not. To understand why, look at what happens while she reads.


Looking Is Not Seeing

Three forces work against her. None of them is a failure of effort.

Automation bias comes first. People treat a capable system's output as presumptively right, and that assumption quietly shuts down the search for what is wrong. It is not laziness. It is a measurable shift in where attention goes. In a 2024 study of computational pathology, expert pathologists overturned their own correct judgments and took the system's wrong answer 7 percent of the time (Rosbach et al., 2024). Experts. Already correct. Talked out of it by a machine.

Processing fluency comes second. Information that is easy to process gets judged as more true. In one foundational study, identical statements were rated true more often simply for being printed in higher contrast, which made them easier to read (Reber & Schwarz, 1999). Ease of reading becomes evidence of truth. AI output is fluent by design. It arrives smooth and confident, carrying the exact signal a reader misreads as correct.

Read it the way Marian did

From the draft brief · p. 14
Summary judgment is improper where a party's intent is genuinely in dispute, and the moving party cannot resolve that question on the papers alone. . On this record, intent is squarely contested, and the motion must fail.

What the case actually holds

Develin is real. But it addresses the standard for calculating damages at summary judgment, not whether a dispute over intent precludes summary judgment. The case exists. The citation format is perfect. The proposition is invented.

A fluent sentence carries a citation that feels correct. Reading for the argument, you take in the flow and the citation passes. That ease is the exact signal the mind misreads as truth.

Cognitive offloading comes third. People hand mental work to their tools, and they decide how much to hand over based on an internal read of whether they need to engage. That read is often wrong (Risko & Gilbert, 2016). The reviewer's own sense that this one looks fine is the faulty instrument. It feels like judgment. Often it is just the absence of friction.

Stack those three inside one tired person at 9:14 at night. The output reads as true because it is fluent. Scrutiny is already down because the tool is capable. And Marian's read on whether this brief needs a hard look is being set to no by the smoothness of the thing itself. She is not failing to try. She is trying, and the effort is being routed around the error.


A Law Office Is Close to the Worst Place for This

These effects are not fixed. They get stronger under three conditions: time pressure, high volume, and high trust in the system. The research names all three.

That list is a description of a litigation practice.

Time pressure is the deadline. Volume is the caseload. Trust in the system is what builds once the AI tool gets a few briefs right and the team stops watching it closely. A tool that is usually right is more dangerous than one that is obviously unreliable. The unreliable one keeps your guard up. The reliable one lowers it, a little more with every brief it gets right.

A firm does not just fail to defend against these forces. It runs all three at once, hardest on the night that matters most.


The Citation

The case Marian missed was real, and that is what makes it dangerous.

Everyone has heard the fabricated-case stories now. People know to watch for citations to cases that do not exist. This error was quieter. The case was real. The citation was clean: right reporter, right volume, right page. But it was cited for a proposition it does not hold. A real decision, attached to an argument it never made.

Marian read the brief the way any good reviewer reads one. Start to finish, following the argument, checking citations as she reached them. In that mode the bad one was invisible. It sat in a well-built paragraph, in a sentence that flowed, behind a proposition that sounded exactly like something a court would hold. Reading for the argument, she took in the fluency. The citation passed the way the good ones did.

It went to Daniel Cole, the partner. He signed it. Cole is a strong litigator who trusts his team, and for fifteen years trusting Marian has been correct. His signature carries the Rule 11 obligation, the certification that he made a reasonable inquiry and the legal contentions are warranted. His review was a confidence check, not a rebuild, because that is how partners work and because Marian's name on it meant it was done.

Three professionals. Each did the job the way the profession taught them. The brief still went out wrong.


Same night, two systems

A human in the loop

9:14 PM · brief due 11:59
Marian reviews · tired, unsupported

She reads start to finish, following the argument, checking citations as she reaches them. The brief is clean. The tool is capable. Her own sense says this one looks fine.

The citation to a real case, offered for a proposition it does not hold, sits inside a sentence that flows. Reading for the argument, she takes in the fluency. It passes the way the good ones did.
Outcome. The brief reaches the partner, who signs. Nobody was careless. It still went out wrong.

What stands behind her

Citations pulled to a list

Each one tagged with the exact proposition it supports, stripped of the prose that made it feel right.

A prompt log

Shows what the drafter accepted, corrected, or threw out. The research section was accepted as-is, so that is where she pushes hardest.

A reviewer who never saw the prompt

She reads what is on the page, not what it was meant to say. No expectation to confirm.

A finite task, not fresh brilliance

Work the list. Check each case against the proposition tagged to it. A tired expert can still execute that.

Outcome. She pulls the case, reads the holding cold, and sees it does not stand for what it is cited for. Ninety seconds. The brief the partner signs is one he can stand behind.

Same person. Same fatigue. The second system did not need Marian to be perfect. It was built for the reviewer who is tired.

Run the Night Again

Same brief. Same deadline. Same tired Marian at 9:14. Keep the people exactly as they are. Change only the system around them.

This time the associate worked inside a few governance habits. None of them asks anyone to be sharper or more disciplined than they already are.

The AI was told to produce a consolidated list of every citation at the end of the document, and for each one, the specific proposition it supports. Not just Smith v. Jones, 123 F.3d 456, but Smith v. Jones, cited for the proposition that summary judgment is improper where intent is genuinely disputed. The citations came out of the prose and stood on their own, stripped of the argument that made them feel right.

The work arrived with a prompt log. A short record of what the associate asked, what the tool returned, and for each piece whether it was used as written, corrected, or thrown out. Marian could see where human judgment had been applied, and where it had not.

And the reviewer was not the person who wrote the prompt. The associate built it. Marian checked it cold, carrying none of the associate's expectations about what it was supposed to say. She read what was on the page instead of what was meant to be there.

She is just as tired in this version. The difference is that she is not being asked to summon brilliance across nineteen pages at the end of a long day. She has a finite task. Work the list. Check each case against the proposition tagged to it. The prompt log shows her the associate accepted the research section almost entirely as the tool produced it, no corrections noted. That tells her where to push hardest.

She reaches the citation. The case is real. She pulls it. Reading the holding directly, with no paragraph around it to carry her past, she sees the case does not stand for what it is cited for. Ninety seconds. She flags it, the associate fixes it, and what reaches Cole is something he can put his name on.

Same person. Same fatigue. The outcome changed because the second system did not need Marian to be perfect. It was built for a Marian who is tired.


The Principle Firms Get Backward

The difference between the two nights is not Marian. It is what stood behind her.

The first firm put a human in the loop and called it governance. Marian was the safeguard, and nothing backed her up. When the conditions turned against her, the way they always do, there was nothing to catch what she missed.

The second firm understood what the first did not. A review process that needs an ideal reviewer is not a process. It is a wish. The ideal reviewer, fresh and unhurried and immune to fluent text, does not exist at 9:14 before a midnight deadline. The reviewer who exists is tired, rushed, and three matters deep. Governance that works is built around that reviewer, because that reviewer is the one holding the line when it counts.


If you take one thing from this, take the part that runs against instinct. The most dangerous AI output is not the rough draft that obviously needs work. It is the clean one that reads as finished, because the polish is the exact signal that turns a reviewer's scrutiny off. The cleaner it looks, the harder it should be checked. Most firms do the opposite. They dig into the messy output and wave the polished output through, which is exactly backward.

You do not fix this by telling Marian to try harder. She was already trying. You fix it by building a system that surfaces the error whether she is fresh or exhausted, whether the writing is smooth or not, whether the tool has earned trust or only the look of it.

That is the line between having a human in the loop and having governance. One is a person. The other is built around the person, for the nights when careful is not enough.

The firm and individuals in this piece are composites, written to show how these mechanisms operate in practice. The research findings are real and cited below.


References

  1. Reber, R., & Schwarz, N. (1999). Effects of perceptual fluency on judgments of truth. Consciousness and Cognition, 8(3), 338–342. https://doi.org/10.1006/ccog.1999.0386
  2. Risko, E. F., & Gilbert, S. J. (2016). Cognitive offloading. Trends in Cognitive Sciences, 20(9), 676–688. https://doi.org/10.1016/j.tics.2016.07.002
  3. Rosbach, E., Ganz, J., Ammeling, J., Riener, A., & Aubreville, M. (2024). Automation bias in AI-assisted medical decision-making under time pressure in computational pathology (arXiv:2411.00998) [Preprint]. arXiv. https://doi.org/10.48550/arXiv.2411.00998
  4. Skitka, L. J., Mosier, K. L., & Burdick, M. (1999). Does automation bias decision-making? International Journal of Human-Computer Studies, 51(5), 991–1006. https://doi.org/10.1006/ijhc.1999.0252