Anonymization Isn't Enough Anymore

A PM I know walked me through their compliance process last month. Healthcare SaaS. They're building an AI feature to analyze patient records. Their legal team had exactly one condition: "Anonymize the data before sending it to the API."

Fine, they said. Removed the names. Stripped the patient IDs. Deleted the clinic identifiers. Legal signed off. Sprint kicked off.

That same week, a completely different story appeared about Anthropic's Claude Opus 4.7.

A Vox writer named Kelsey Piper ran an experiment. She gave Opus 4.7 a short passage — 125 words. Unpublished. Never appeared anywhere online. The model identified her correctly.

Then she gave it something else: a school progress report she'd written about a student's Pokémon essays. Still correct.

Then an unpublished film review of a 1942 WWII comedy she'd never publicly discussed. Also correct.

ChatGPT guessed wrong. Gemini guessed wrong. Opus 4.7 didn't.

The model also correctly identified close associates — people in Piper's social circle — capturing stylistic patterns that had apparently propagated between them. It wasn't working from memorization. It was doing something more unsettling than that.

Here's where I stop and think, as a PM.

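What is anonymization, in practice? Delete the name. Remove the ID. Mask the location. The data is "anonymized." Regulators largely accept this. GDPR exempts truly anonymous data from its scope. In US healthcare, HIPAA's Safe Harbor method is a checklist of 18 identifier types to strip. The whole compliance framework assumes that removing identifiers removes identity.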

But if Opus 4.7 can identify someone from writing style alone — no names, no IDs, no explicit identifiers — that assumption has a crack in it.

Think about healthcare SaaS specifically.

A physician writes dozens of patient notes per day. Every note carries a stylistic fingerprint — how they structure sentences, which phrases recur, their clinical vocabulary. Even with the name stripped out.

Send that anonymized text to an AI API. A capable model can potentially re-identify the author. And if it can re-identify the author, it can start inferring context you thought you'd removed. Knowing the physician narrows down the clinic, the specialty, and the pool of possible patients.

This isn't pure speculation. It's one data point, but it shows where capability is heading.
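To make "stylistic fingerprint" concrete, here is a minimal sketch of classic stylometry: compare how often each text uses a handful of common function words. This is not how Opus 4.7 works internally (nothing in the story says that), and it is far cruder than what a large model can do. The word list, the sample notes, and the author labels are all made up for illustration. The point is only that text with every name removed still carries author signal.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny feature set. Real stylometry uses far richer features.
FUNCTION_WORDS = ["the", "and", "of", "to", "with", "no",
                  "on", "for", "in", "was", "is", "will"]

def style_profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def closest_author(unknown: str, known_samples: dict[str, str]) -> str:
    """Return the label whose sample is stylistically closest to the unknown text."""
    profile = style_profile(unknown)
    return max(
        known_samples,
        key=lambda name: cosine(profile, style_profile(known_samples[name])),
    )

# Made-up notes: one terse author, one who writes in full sentences.
known = {
    "dr_a": "Patient presents with cough. Denies fever. No chest pain. "
            "Plan: rest, fluids, follow up with clinic.",
    "dr_b": "The patient is a pleasant woman who was seen in the office "
            "for evaluation of the cough and the fatigue.",
}

# A "de-identified" note: no names, no IDs, no clinic.
unknown = ("Patient presents with headache. Denies nausea. No visual changes. "
           "Plan: fluids, follow up with clinic.")

print(closest_author(unknown, known))  # dr_a
```

Twelve words of vocabulary are enough to separate these two toy authors. A model trained on a large share of the internet has a much bigger feature set to draw on.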

The PM problem here isn't technical. It's process.

If your compliance workflow is built on "we removed names, so we're good" — that workflow is operating on an assumption that's actively being challenged. And your legal team probably hasn't caught up to this specific development yet.

Whatever data processing agreement you signed three months ago has a definition of "anonymized data." That definition almost certainly doesn't account for what current models can do with stylistic analysis alone.

Two easy reactions — both wrong.

First: "This is an edge case, not relevant to real products." Partly true, for now. Kelsey Piper has a substantial digital footprint, so the model recognizing her is less surprising. But the model also got close associates right: not people it had memorized, but people it inferred from style. The capability is already bleeding outward.

Second: "This is Anthropic's problem, not ours." No. If your product sends data to an AI API, you're making the call about what goes in. The last point of accountability for user privacy is you.

What can a PM actually do?

There's no single right answer. But there are right questions to ask.

What kind of text is our AI feature sending? Is there enough in those inputs for stylistic inference? Is our compliance framework built on "anonymization equals sufficient"? Does our legal team know this specific capability now exists?

These aren't alarm questions. They're stay-current questions.
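One way to start answering the first two questions is to audit what actually leaves your system. The sketch below flags free-text fields in an outbound payload that are long enough to worry about. The 125-word threshold is borrowed from the passage length in the Piper experiment and is an assumption, not a validated safety limit. The field names are hypothetical.

```python
# Assumed threshold: the 125-word passage from the experiment described above.
# It is a starting point for discussion, not a proven boundary.
WORD_THRESHOLD = 125

def flag_free_text(payload: dict, threshold: int = WORD_THRESHOLD) -> list[str]:
    """Return names of string fields whose word count meets or exceeds the threshold."""
    flagged = []
    for field, value in payload.items():
        if isinstance(value, str) and len(value.split()) >= threshold:
            flagged.append(field)
    return flagged

# Hypothetical payload for an AI feature; field names are illustrative only.
payload = {
    "visit_type": "follow-up",
    "clinical_note": "word " * 200,  # stand-in for a 200-word note
    "age_band": "40-49",
}

print(flag_free_text(payload))  # ['clinical_note']
```

A flagged field doesn't mean the data is unsafe to send. It means that field deserves a conversation with legal that goes beyond "we removed the names."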

The threshold will probably drop. Six months from now, 40 words might be enough. Or it might work in a different language. Or on someone with a smaller footprint.

This doesn't mean abandoning anonymization. It means it's one layer, not the whole strategy.

Until now, "we anonymized the data" ended a conversation.

It's starting to open one instead.