Be careful with copy-paste: fingerprinting text with non-printable characters

Original author: Tom
  • Translation
Don't want to read? Check out the demo.

Zero-width characters are non-printing characters that most applications do not display. For example, I've inserted ten zero-width spaces into this sentence; can you tell? (Hint: paste the sentence into Diff Checker to see where the characters are!) These characters can be used as unique “fingerprints” that identify which user a piece of text came from.


Of course, they could be in here too. And you'd never guess.

Why?


Well, the original use case isn't very exciting. A few years ago, I was on a team that competed in various video games. Among other things, the team had a private message board for important announcements. Eventually, though, these announcements started being reposted elsewhere, where they were used to mock the team and, worse, revealed confidential information and team tactics.

The site's security seemed fairly solid, so we suspected an insider: someone logging in with their own username and password, then simply copying the announcements and posting them elsewhere. So I wrote a script that invisibly embeds the username of whoever is viewing an announcement into the text of that announcement.

After a recent post by Zach Aysan, it became clear that people are interested in zero-width characters. So I decided to publish this method here, along with an interactive demo for everyone. The code samples have been updated to modern JavaScript, but the overall logic is the same.

How?


The exact steps and logic are described below, but in a nutshell: the username is converted to binary, and each bit of that binary string is converted to a zero-width character. The resulting zero-width string is then invisibly inserted into the text. If the text is published on another site, the zero-width string can be extracted and the process reversed to find out the username of whoever copied and pasted it!

Text fingerprinting


1. Get the username of the logged-in user and convert it to binary

Here we simply convert each letter of the username to its binary equivalent.

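// Left-pad a binary string to 8 digits, e.g. '1000001' -> '01000001'
const zeroPad = num => '00000000'.slice(String(num).length) + num;

// Convert each character to its 8-bit binary code, separated by spaces
const textToBinary = username => (
  username.split('').map(char =>
    zeroPad(char.charCodeAt(0).toString(2))).join(' ')
);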
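The zero-padding can also be written with the built-in `String.prototype.padStart`. A minimal sketch of the same step; `toBinaryByte` and `textToBinaryPadded` are hypothetical names, not part of the original script:

```javascript
// Hypothetical alternative to zeroPad: pad each character code to 8 binary digits.
const toBinaryByte = char => char.charCodeAt(0).toString(2).padStart(8, '0');

// One 8-digit group per character, groups separated by spaces.
const textToBinaryPadded = username =>
  username.split('').map(toBinaryByte).join(' ');

console.log(textToBinaryPadded('Tom'));
// prints "01010100 01101111 01101101"
```

For characters up to U+00FF this yields one 8-digit group per letter, matching `zeroPad`; for higher code points both versions simply emit a longer group, which still decodes because the groups are space-separated.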

2. Convert the binary username into zero-width characters

The following script iterates over the binary string and converts each 1 into a zero-width space and each 0 into a zero-width non-joiner. The space between letters becomes a zero-width joiner, which marks where one letter ends and the next begins.

const binaryToZeroWidth = binary => (
  binary.split('').map((binaryNum) => {
    const num = parseInt(binaryNum, 10);
    if (num === 1) {
      return '\u200B'; // zero-width space
    } else if (num === 0) {
      return '\u200C'; // zero-width non-joiner
    }
    return '\u200D'; // zero-width joiner (replaces the space between letters)
  }).join('') // no extra separator needed: the joiners mark letter boundaries
);
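Because the output is invisible, it helps to print the code points when testing. A small sketch, assuming the same mapping as above (the constant names are illustrative); it uses a compact rewrite of `binaryToZeroWidth` so the snippet runs on its own:

```javascript
// The three characters used for the fingerprint, written as escapes.
const ZW_SPACE = '\u200B';      // bit 1
const ZW_NON_JOINER = '\u200C'; // bit 0
const ZW_JOINER = '\u200D';     // separator between letters

// Compact equivalent of binaryToZeroWidth from step 2.
const binaryToZeroWidth = binary =>
  binary.split('').map(bit => {
    if (bit === '1') return ZW_SPACE;
    if (bit === '0') return ZW_NON_JOINER;
    return ZW_JOINER; // the space between letters
  }).join('');

// Make the invisible result visible by listing each code point.
const encoded = binaryToZeroWidth('01 10');
const codePoints = [...encoded].map(
  c => 'U+' + c.codePointAt(0).toString(16).toUpperCase()
);
console.log(codePoints.join(' '));
// prints "U+200C U+200B U+200D U+200B U+200C"
```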

3. Insert the zero-width “username” into the confidential text

Here we simply insert the block of zero-width characters into the confidential text.
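The original does not show the code for this step. One possible sketch, assuming the fingerprint goes right after the first word of the text; `insertFingerprint` is a hypothetical helper, and where to place the fingerprint is a design choice:

```javascript
// Hypothetical helper: hide a zero-width fingerprint inside visible text.
// Assumption: insert it right after the first word; if the text has no
// spaces, append it to the end instead.
const insertFingerprint = (text, fingerprint) => {
  const firstSpace = text.indexOf(' ');
  if (firstSpace === -1) {
    return text + fingerprint;
  }
  return text.slice(0, firstSpace) + fingerprint + text.slice(firstSpace);
};

// Stand-in for the output of binaryToZeroWidth.
const fingerprint = '\u200C\u200B\u200D\u200B\u200C';
const original = 'Meeting moved to Friday.';
const tagged = insertFingerprint(original, fingerprint);

console.log(tagged === original); // false, although both look identical
console.log(tagged.length - original.length); // 5
```

Note that a copy which leaves out the first word will not carry the fingerprint.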

Extracting the username from fingerprinted text


The same steps, in reverse.

1. Extract the zero-width “username” from the text

Strip the visible text from the string, leaving only the zero-width characters.
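This step is also described without code. A minimal sketch, assuming the three characters from step 2 are the only zero-width characters in the text; `extractZeroWidth` is a hypothetical name:

```javascript
// Keep only U+200B, U+200C and U+200D; drop all visible text.
const extractZeroWidth = text => text.replace(/[^\u200B\u200C\u200D]/g, '');

const tagged = 'Meeting\u200C\u200B\u200D\u200B\u200C moved to Friday.';
const hidden = extractZeroWidth(tagged);
console.log(hidden.length); // 5
```

That assumption matters: emoji sequences such as family emoji also use the zero-width joiner (U+200D), so real-world text may need a more careful filter.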

2. Convert the zero-width “username” back to binary

Here we split the string into individual characters. Each one is either a bit or one of the joiners inserted between letters. For each character we return 1 or 0 to rebuild the binary string. If a character is neither, we have reached a letter separator (the zero-width joiner), which means the binary for the current letter is complete: we add a single space and move on to the next letter.

const zeroWidthToBinary = string => (
  string.split('').map((char) => {
    if (char === '\u200B') { // zero-width space
      return '1';
    } else if (char === '\u200C') { // zero-width non-joiner
      return '0';
    }
    return ' '; // zero-width joiner: end of a letter, add a single space
  }).join('')
);

3. Convert the username from binary back to text

Finally, we parse the binary string and convert each group of 1s and 0s back into the corresponding character.

const binaryToText = string => (
  string.split(' ').map(num =>
    String.fromCharCode(parseInt(num, 2))).join('')
);
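Putting the pieces together, here is an end-to-end sketch. The four conversion functions are compact rewrites of the ones above (using `padStart` and escape sequences); `insertFingerprint` and `extractZeroWidth` are hypothetical helpers, not from the original script:

```javascript
const textToBinary = text =>
  text.split('').map(c => c.charCodeAt(0).toString(2).padStart(8, '0')).join(' ');

// 1 -> zero-width space, 0 -> zero-width non-joiner, space -> zero-width joiner
const binaryToZeroWidth = binary =>
  binary.split('').map(b => (b === '1' ? '\u200B' : b === '0' ? '\u200C' : '\u200D')).join('');

const zeroWidthToBinary = zw =>
  zw.split('').map(c => (c === '\u200B' ? '1' : c === '\u200C' ? '0' : ' ')).join('');

const binaryToText = binary =>
  binary.split(' ').map(bits => String.fromCharCode(parseInt(bits, 2))).join('');

// Hypothetical: insert after the first word, or append if there is none.
const insertFingerprint = (text, fp) => {
  const i = text.indexOf(' ');
  return i === -1 ? text + fp : text.slice(0, i) + fp + text.slice(i);
};

const extractZeroWidth = text => text.replace(/[^\u200B\u200C\u200D]/g, '');

// Encode the username and hide it in the announcement.
const announcement = 'Practice moved to Friday, keep it quiet.';
const tagged = insertFingerprint(announcement, binaryToZeroWidth(textToBinary('Tom')));

// Later: recover the username from a copy found elsewhere.
const username = binaryToText(zeroWidthToBinary(extractZeroWidth(tagged)));
console.log(username); // prints "Tom"
```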

Conclusion


Companies are paying more attention than ever to information leaks and to finding insiders, and this is just one of many tricks that can be used. Depending on your line of work, it may be vital to understand the risks of copying and pasting text. Very few applications display zero-width characters. For example, you might expect your terminal to show them (mine doesn't!).

Back to the private message board: the plan worked as intended. Shortly after the script went live, a new announcement was posted. Within hours, the text turned up elsewhere, complete with the zero-width string. The culprit's username was identified and they were banned: happy ending!

Of course, this method comes with some caveats. For example, a user who knows about the script could, in theory, replace the zero-width characters with another user's fingerprint to frame them. It is therefore better to embed a unique secret ID instead of the username.
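A minimal sketch of the secret-ID variant, assuming a Node.js server that keeps a private table of issued IDs; `issueFingerprintId`, `lookupLeaker`, and the 4-byte ID length are illustrative choices, not the original author's. The returned ID would be passed through `textToBinary` and `binaryToZeroWidth` in place of the username:

```javascript
const crypto = require('crypto');

// Hypothetical server-side table: random ID -> username.
// Assumption: it stays private and is never sent to the client.
const idToUser = new Map();

// Issue a fresh random ID (4 random bytes as 8 hex digits), retrying on collision.
const issueFingerprintId = username => {
  let id;
  do {
    id = crypto.randomBytes(4).toString('hex');
  } while (idToUser.has(id));
  idToUser.set(id, username);
  return id;
};

// Find who was issued the ID recovered from a leaked copy (undefined if unknown).
const lookupLeaker = id => idToUser.get(id);

const id = issueFingerprintId('alice');
console.log(lookupLeaker(id)); // prints "alice"
```

Since IDs are random and unknown to users, someone trying to frame another person would have to guess an ID that was actually issued to that person.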

To play with the script, try the demo or look at the source code.
