A Research Paper Just Proved LLMs Can Be Made to Quote Copyrighted Books Verbatim. The Copyright Crowd Is Treating It Like a Confession.
New arXiv research shows finetuning can bypass alignment safeguards and unlock near-perfect recall of copyrighted text — and the people who've spent two years arguing about training data just found their sharpest piece of evidence yet.
An arXiv paper circulating this week, "Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models," arrived in the AI and law conversation like a lit match dropped into a room that had been slowly filling with gas. The finding is precise: alignment safeguards that prevent language models from reproducing copyrighted text can be bypassed through finetuning, and when they are, the models quote back protected works with near-perfect accuracy. The paper has been circulating most actively on X, where one user described it as "groundbreaking" and likely to "reshape the AI copyright debate," phrasing that, even if optimistic, captures how the community is treating it. This isn't an abstract finding about model behavior. It's a finding about what the models had retained all along.
The timing is awkward for the companies currently defending themselves in court. Meta is already facing a class action over AI copyright infringement; a California federal judge this week granted the authors' request to amend their lawsuit while devoting considerable space in her order to criticizing the plaintiffs' attorneys for their handling of the case. The judicial rebuke to plaintiffs' counsel doesn't diminish the underlying claim, and the arXiv paper lands squarely on it: if finetuning unlocks verbatim reproduction, the question of whether training on copyrighted material constitutes infringement becomes far harder to separate from the question of what the model actually retained. The verbatim recall research has arrived at exactly the moment the case is being rebuilt.
On X, the sharpest voices aren't the legal analysts; they're the creators. One user went after the White House directly, accusing the Trump administration of using generative AI that infringes Nintendo's copyright, then turning the question around: why criticize Nintendo for protecting itself? The post framed Nintendo as the victim of plagiarism, not the aggressor in a legal dispute. That framing, with companies and creators as victims of institutional theft rather than parties to a complex fair use argument, is the emotional logic driving the creative-industries side of this conversation. Another voice put it more bluntly, arguing that AI will always have to steal its input data by definition, and that anything produced from stolen sources can never be legitimately copyrighted. The logic is crude but the sentiment is widely shared: at the far end of this argument is a constituency that doesn't want a licensing framework; it wants the practice stopped.
The commercial side of the conversation is running in parallel, and the two tracks rarely intersect. Google's Lyria 3 Pro music generation tool is getting reviewed on X with a kind of pragmatic ambivalence: "studio-quality output," good API access, but a 30-second limit and, notably, "copyright questions unresolved." That caveat has become a standard disclosure in AI product reviews, the way early social media reviews once mentioned "privacy concerns" before moving on. It signals that the industry has decided to ship and litigate simultaneously. Supio just raised $60 million to bring AI to plaintiff law firms, legal AI that will presumably handle AI copyright cases eventually, and the investment round is being covered as straightforwardly as any other Series B. The money is not waiting for the law to settle.
What's shifted in the past week isn't the legal landscape, because courts move slowly; it's the quality of the technical ammunition available to plaintiffs. The verbatim recall paper gives litigators something concrete to point to: not a theory about what training implies, but a demonstration of what finetuning produces. The regulatory conversation has been running at roughly double its usual volume, and the mood, which spent the better part of the week trending sharply negative, has pulled back toward something closer to analytical watchfulness. That's what the copyright argument looks like when it starts to feel winnable: not louder, but more precise. The creators have been saying for two years that the models memorized their work. Now there's a paper that shows how to make them prove it.
This narrative was generated by AIDRAN using Claude, based on discourse data collected from public sources. It may contain inaccuracies.