Hiding Information in Plain Text 3

Hiding Information in Plain Text

Posted Tuesday, May 15, 2018 by Jeff Safire

Subtle changes to letter shapes can embed messages

By Charles Q. Choi

Hiding Information in Plain Text 1
Image: Columbia University

Computer scientists have now invented a way to hide secret messages in ordinary text by imperceptibly changing the shapes of letters.

The new technique, named FontCode, works with common font families such as Times Roman and Helvetica. It is compatible with most word-processing software, including Microsoft Word, as well as image-editing and drawing programs, such as Adobe Photoshop and Adobe Illustrator.

Although there are obvious applications for espionage with FontCode, its inventors suggest it has more practical uses in terms of embedding metadata into texts, much like watermarking. “You can imagine that it would be used to provide extra information, such as authors, copyright and so on, about a document,” says study senior author Changxi Zheng, a computer scientist at Columbia University. “Another application is to protect legal documents: Our technique can be used to detect if a document, even when printed on paper, has been tampered with or not. It can even be used to tell which part of the document is tampered.”

Changxi and his collaborators will detail their findings in August at the SIGGRAPH conference in Vancouver.

Another potential application of FontCode is as an alternative to QR codes. For instance, when people snap a photo of a poster with FontCode-modified text, their smartphones may be redirected “to a website or Youtube video about that poster,” Zheng says. “This is similar to what a QR code can do, but now without the need of putting a black-and-white pattern that can be distracting or compromise the aesthetics of the poster.”

Hiding Information in Plain Text 2
Image: Changxi Zheng/Columbia Engineering

Someone using FontCode would supply a secret message and a carrier text document. FontCode converts the secret message to a bit string (ASCII or Unicode) and then into a sequence of integers.

FontCode embeds data into texts using minute perturbations to components of letters. This includes changing the width of strokes, adjusting the height of ascenders and descenders, and tightening or loosening the curves in serifs and the bowls of letters such as o, p, and b.

A kind of artificial-intelligence system known as a convolutional neural network can recognize these perturbations and help recover the embedded messages. The amount of information FontCode can hide is limited only by the number of letters on which it acts, the researchers say.

“Traditionally, a text document is meant to deliver information to the human only. Now we show that it can also deliver embedded information to digital intelligent systems, and the two parts of information delivery do not conflict,” Zheng says. “This is drastically different from existing methods such as QR codes or optical barcodes, which are meant to be read by digital systems but occupy a certain area on the paper.”

To account for potential distortions to text due to concerns such as lighting, blurriness, or camera angle, the scientists relied on the 1,700-year-old Chinese remainder theorem, which can help reconstruct missing information. This strategy could help recover hidden messages even when 25 percent of perturbations to texts are not recognized correctly.

Moreover, FontCode not only embeds messages in text, but can also encrypt them. For instance, users can agree on a private key that can specify the order in which hidden letters are read.

Although other methods exist to hide a message in text, FontCode’s inventors say their new technique is the first to work independent of document type. It can also retain secret information even when a document or an image with text is printed onto paper or converted to another file type, they say.

The researchers have filed a patent for FontCode with Columbia Technology Ventures. In addition, “we want to extend this technique to other languages,” Zheng says. “We demonstrated using English in this project; it might require a bit of thought to extend it to other languages—especially the logographic languages, such as Chinese.”

This article originally posted at IEEE Spectrum, May 15, 2018.


No Comments