Article image
brinsa.com

The Trojan Transcript - When “Summarize This” Becomes “Exfiltrate That”

markus brinsa 17 february 16, 2026 6 6 min read create pdf website all articles

Sources

A senior partner uploads a deposition transcript into an AI legal assistant. Nothing exotic. No darknet drama. No hooded teenager hammering a keyboard in a rainstorm. Just a very expensive tool doing a very normal thing: summarize, highlight inconsistencies, help the team move faster.

And then the assistant starts emailing fragments of a confidential merger document to a paralegal.  Not the deposition transcript. Not the thing it was asked to read. A different document, one that should not be drifting through inboxes in bite-sized pieces like an office snack.

This is the part where everyone’s brain tries to force the story into an old shape. “So the AI got hacked.” “So the system was compromised.” “So there was malware.” We want a perimeter to blame, because perimeters are comforting. They let you say, “We’ll patch it,” and then go back to pretending your workflow is a safe place. But in the scenario described by cybersecurity writer Shahzaib in a Level Up Coding essay, the twist is uglier. The attacker isn’t in the network. The attacker is in the document.

The transcript PDF contains hidden instructions embedded as white text. To a human, it looks like a normal file. To an AI system that parses the contents, it’s a second set of orders. The model is told to ignore the user’s request and do something else instead. In this case, “something else” looks a lot like “extract everything and send it to an internal address.”

You can call it prompt injection, prompt poisoning, indirect prompt injection, or “why did we connect this thing to email.” The label isn’t the point. The point is that the most dangerous attacker in an AI workflow may not be the person typing into the tool. It may be the content you fed it because you trusted it.

Why this isn’t a chatbot problem anymore

If this were a plain chatbot that only returns text in a little box, the blast radius would be embarrassment. It might leak something in its answer, and that would be bad, but at least the damage would be constrained to what the user can copy and paste.

The modern legal “assistant” is rarely that simple. It’s increasingly agent-shaped. It reads attachments. It summarizes. It drafts. It reaches into document stores. It can message coworkers. Sometimes it can file, send, tag, route, schedule, and escalate. That’s the upgrade everyone applauds in demos. Less busywork. Faster turnaround. Fewer associates trapped in deposition hell. It’s also the upgrade that turns a poisoned file into an operational breach vector, because a system that can take actions is a system that can be steered.

Security people have been yelling about this for a while: once you let a model consume untrusted content and then give it tools, you have created a new trust boundary problem. The model can’t reliably distinguish “instructions meant for me” from “content I was supposed to analyze,” especially when the attacker formats the instruction to look like content and relies on the model’s eagerness to comply.

Even Anthropic, which has published defensive research on prompt injection, frames the issue in a very plain way: every external artifact an agent consumes can carry adversarial instructions, and the risk grows as the agent is allowed to do more things in the real world.

The law-firm version is uniquely flammable

Law firms love documents the way hospitals love patient records. The core of the work is literally “ingest sensitive information, reason about it, produce outputs that change people’s lives.” Now add three accelerants.

The first is volume. Firms ingest mountains of PDFs from third parties: opposing counsel, clients, regulators, data rooms, discovery dumps, scans of scans of scans. If the workflow treats “PDF” as “safe,” you’ve already lost. A PDF is just a container. It can hold text you see, text you don’t, and structures that different parsers interpret differently.

The second is privilege. Legal assistants are attractive precisely because they sit close to high-value material: merger drafts, negotiations, strategy memos, employment disputes, board communications. If an AI tool has access to those repositories, it’s sitting on a buffet.

The third is automation pressure. Legal AI is sold as “stop wasting human time.” Which means the most common governance move is to reduce friction: fewer manual steps, fewer reviews, fewer interruptions. Unfortunately, “fewer interruptions” is also what attackers call “great.”

This is why prompt injection in legal workflows isn’t a quirky edge case. It’s the predictable outcome of combining untrusted inputs with privileged tools and a business goal of removing human checkpoints.

The real trick is that the file looks innocent

The most effective prompt injection attacks don’t arrive with a siren and a skull icon. They arrive as “the thing you were going to process anyway.”

Sometimes the instruction is visible but easy to miss. Sometimes it’s hidden through formatting tricks like white-on-white text. Sometimes it’s obfuscated, encoded, or shoved into metadata fields. Sometimes it’s not even malicious in intent at first; it’s just junk that becomes “instruction-like” when the model is in a certain mode.

The important part is that models are pattern machines, not intent interpreters. If the model sees text that looks like a directive, it often treats it as a directive, even when a human would classify it as “content inside the document” rather than “orders to the system.”

OWASP puts this bluntly in its LLM risk guidance: prompts that alter behavior can be imperceptible to humans and still influence the model, because the model reads what the parser reads, not what your eyes read.

This is the psychological trap. Humans glance at a document and think, “It’s fine.” The model processes the full underlying representation and thinks, “New instructions received.”

What “defense” looks like when the attacker is the input

There is no single magic patch for this, which is why security researchers keep describing it as a fundamental issue rather than a bug you squash once and move on. Even WIRED has covered the broader point: indirect prompt injection is hard to eliminate completely, and the practical path is layered controls and strict boundaries rather than wishful thinking.

So what does that mean in a law-firm workflow where people are going to keep uploading PDFs because that’s the job?

It means you treat every external document as hostile until proven otherwise, even when it comes from “trusted” parties, because trust is not a file property. It means you separate “reading” from “acting,” so a model can analyze content without being able to email, forward, file, or retrieve unrelated documents by default. It means you lock down permissions so the AI can only access the minimum it needs for the specific task, not the entire document universe because it’s convenient.

It means you sanitize and normalize documents before they ever touch the model, stripping hidden text, flattening layers, and forcing content through a safe representation that reduces the chance of invisible instructions. It means you build detection and monitoring that looks for instruction-like patterns embedded in attachments, especially in places humans don’t look.

And it means you keep a human in the loop in the one place that actually matters: at the boundary where the system is about to take an external action. Summaries can be wrong and you’ll survive. Emails can be wrong and you’ll be in court explaining why your “AI helper” distributed deal documents like party flyers.

If your firm is deploying “AI assistants” without these guardrails, you’re not adopting productivity tech. You’re installing a new inbound command channel and hoping nobody discovers it.

The uncomfortable takeaway

The most dangerous part of this story is not the white text. That’s just the method of the week. The dangerous part is the governance assumption hiding underneath: that documents are passive and tools are obedient.

In agentic workflows, documents can carry instructions, and tools can execute them. The moment you let an AI system both interpret untrusted content and act inside a privileged environment, you’ve built a machine that can be socially engineered at machine speed.

The future of legal AI isn’t doomed. But the future of legal AI without trust boundaries, least privilege, and action gating is going to be incredibly profitable for incident-response firms.

About the Author

Markus Brinsa is the Founder & CEO of SEIKOURI Inc., an international strategy firm that gives enterprises and investors human-led access to pre-market AI—then converts first looks into rights and rollouts that scale. As an AI Risk & Governance Strategist, he created "Chatbots Behaving Badly," a platform and podcast that investigates AI’s failures, risks, and governance. With over 30 years of experience bridging technology, strategy, and cross-border growth in the U.S. and Europe, Markus partners with executives, investors, and founders to turn early signals into a durable advantage.

©2026 copyright by markus brinsa | brinsa.com™