Google Is Now Watermarking Its AI-Generated Text

23 October 2024 at 17:00

The chatbot revolution has left our world awash in AI-generated text: It has infiltrated our news feeds, term papers, and inboxes. It’s so absurdly abundant that industries have sprung up to provide moves and countermoves. Some companies offer services to identify AI-generated text by analyzing the material, while others say their tools will “humanize” your AI-generated text and make it undetectable. Both types of tools have questionable performance, and as chatbots get better and better, it will only get more difficult to tell whether words were strung together by a human or an algorithm.

Here’s another approach: Adding some sort of watermark or content credential to text from the start, which lets people easily check whether the text was AI-generated. New research from Google DeepMind, described today in the journal Nature, offers a way to do just that. The system, called SynthID-Text, doesn’t compromise “the quality, accuracy, creativity, or speed of the text generation,” says Pushmeet Kohli, vice president of research at Google DeepMind and a coauthor of the paper. But the researchers acknowledge that their system is far from foolproof, and isn’t yet available to everyone—it’s more of a demonstration than a scalable solution.

Google has already integrated this new watermarking system into its Gemini chatbot, the company announced today. It has also open-sourced the tool and made it available to developers and businesses, allowing them to use the tool to determine whether text outputs have come from their own large language models (LLMs), the AI systems that power chatbots. However, only Google and those developers currently have access to the detector that checks for the watermark. As Kohli says: “While SynthID isn’t a silver bullet for identifying AI-generated content, it is an important building block for developing more reliable AI identification tools.”

The Rise of Content Credentials

Content credentials have been a hot topic for images and video, and have been viewed as one way to combat the rise of deepfakes. Tech companies and major media outlets have joined together in an initiative called C2PA, which has worked out a system for attaching cryptographically signed metadata to image and video files indicating if they’re real or AI-generated. But text is a much harder problem, since text can so easily be altered to obscure or eliminate a watermark. While SynthID-Text isn’t the first attempt at creating a watermarking system for text, it is the first one to be tested on 20 million prompts.

Outside experts working on content credentials see the DeepMind research as a good step. It “holds promise for improving the use of durable content credentials from C2PA for documents and raw text,” says Andrew Jenks, Microsoft’s director of media provenance and executive chair of the C2PA. “This is a tough problem to solve, and it is nice to see some progress being made,” says Bruce MacCormack, a member of the C2PA steering committee.

How Google’s Text Watermarks Work

SynthID-Text works by discreetly interfering in the generation process: It alters some of the words that a chatbot outputs to the user in a way that’s invisible to humans but clear to a SynthID detector. “Such modifications introduce a statistical signature into the generated text,” the researchers write in the paper. “During the watermark detection phase, the signature can be measured to determine whether the text was indeed generated by the watermarked LLM.”

The LLMs that power chatbots work by generating sentences word by word, looking at the context of what has come before to choose a likely next word. Essentially, SynthID-Text interferes by randomly assigning number scores to candidate words and having the LLM output words with higher scores. Later, a detector can take in a piece of text and calculate its overall score; watermarked text will have a higher score than non-watermarked text. The DeepMind team checked their system’s performance against other text watermarking tools that alter the generation process, and found that it did a better job of detecting watermarked text.
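To make the scoring idea concrete, here is a minimal Python sketch. It is a toy illustration under loud assumptions, not DeepMind's actual algorithm: the ten-word vocabulary, the uniform "model," the secret string, and the function names score, generate, and detect are all hypothetical. Each candidate word gets a pseudorandom score derived from a secret and the previous word; the watermarked generator keeps the highest-scoring candidate, and the detector averages the scores across a text.

```python
import hashlib
import random

# Hypothetical toy vocabulary standing in for an LLM's candidate words.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under", "mat", "rug"]

def score(secret, prev_word, word):
    """Pseudorandom score in [0, 1] for a candidate word, given the previous word."""
    digest = hashlib.sha256(f"{secret}|{prev_word}|{word}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def generate(n_words, secret=None, n_candidates=4, seed=0):
    """Generate toy text; if a secret is given, prefer high-scoring candidates."""
    rng = random.Random(seed)
    words = ["the"]
    for _ in range(n_words):
        prev = words[-1]
        # Stand-in for the LLM: draw a few plausible next words (uniformly here).
        candidates = [rng.choice(VOCAB) for _ in range(n_candidates)]
        if secret is None:
            words.append(candidates[0])  # unwatermarked: an ordinary sample
        else:
            # Watermarked: keep the candidate with the highest score.
            words.append(max(candidates, key=lambda w: score(secret, prev, w)))
    return words

def detect(words, secret):
    """Average score over consecutive word pairs; higher suggests a watermark."""
    scores = [score(secret, prev, word) for prev, word in zip(words, words[1:])]
    return sum(scores) / len(scores)

watermarked = generate(200, secret="demo-key")
plain = generate(200)
print(detect(watermarked, "demo-key"), detect(plain, "demo-key"))
```

In this sketch the watermarked text should typically average well above the unwatermarked text, whose scores hover around one half. A real detector would also need a threshold calibrated to keep false positives rare, which this toy omits.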

However, the researchers acknowledge in their paper that it’s still easy to alter a Gemini-generated text and fool the detector. Even though users wouldn’t know which words to change, if they edit the text significantly or even ask another chatbot to summarize the text, the watermark would likely be obscured.

Testing Text Watermarks at Scale

To be sure that SynthID-Text truly didn’t make chatbots produce worse responses, the team tested it on 20 million prompts given to Gemini. Half of those prompts were routed to the SynthID-Text system and got a watermarked response, while the other half got the standard Gemini response. Judging by the “thumbs up” and “thumbs down” feedback from users, the watermarked responses were just as satisfactory to users as the standard ones.

Which is great for Google and the developers building on Gemini. But tackling the full problem of identifying AI-generated text (which some call AI slop) will require many more AI companies to implement watermarking technologies—ideally, in an interoperable manner so that one detector could identify text from many different LLMs. And even in the unlikely event that all the major AI companies signed on to some agreement, there would still be the problem of open-source LLMs, which can easily be altered to remove any watermarking functionality.

MacCormack of C2PA notes that detection is a particular problem when you start to think practically about implementation. “There are challenges with the review of text in the wild,” he says, “where you would have to know which watermarking model has been applied to know how and where to look for the signal.” Overall, he says, the researchers still have their work cut out for them. This effort “is not a dead end,” says MacCormack, “but it’s the first step on a long road.”
