How to create a racist AI without even trying. Part 1

Original author: Rob Speer
  • Translation
The other day, prompted by yet another article about racism in speech recognition, I got into a big argument about who is to blame. Some people were convinced it is a conspiracy of programmers. In reality, the answer lies in the data the AI is trained on. I decided to run an experiment to demonstrate this clearly, and it turned out that Rob Speer had already done all the work for me.

I want to share a translation of his material, which shows clearly that even a completely default AI setup ends up thoroughly saturated with racism. In this first article we run the experiment; in the second we will try to figure out how to deal with the monster we have spawned.



You may have heard of Tay, the experimental chatbot that Microsoft launched on Twitter. In just one day its posts became so offensive that Microsoft had to shut the bot down and has preferred not to mention it since. You probably think this does not threaten you, because you are not doing anything so strange (in particular, you are not letting random strangers on Twitter retrain your AI on the fly).

In this guide, I want to show the following: even if you use the most standard natural language processing algorithms, popular data sets, and popular methods, the result can be a racist classifier that should never have seen the light of day.

The good news: this can be avoided. Keeping racist habits out of your classifier takes a little extra effort, and the corrected version can even turn out to be more accurate. But to fix the problem, you need to know what it is, rather than settling for the first thing that works.

Let's make a sentiment classifier for text!


Sentiment analysis is a very common NLP task, which is not surprising. A system that can tell whether a person has left a positive or negative comment has many uses in business. Such solutions are used to monitor posts on social networks, track customer reviews, and even in securities trading (for example: bots that bought shares of Berkshire Hathaway after the actress Anne Hathaway received good reviews from critics).

This is a simplified (sometimes oversimplified) approach, but it is one of the easiest ways to get quantitative results from human-written text. In just a few steps, you can set up a system that processes texts and produces positive and negative scores, without having to deal with complex representations such as parse trees or entity graphs.

Now we will build a classifier that any NLP specialist will recognize, choosing the easiest option to implement at every stage. A model like this is described, for example, in the Deep Averaging Networks paper, where it is not the main subject but merely the baseline; that mention should not be read as criticism of the paper's results, since the model appears there simply as an example of a well-known way of using vector representations of words.

Here is our action plan:

  • Obtain some widely used vector representations of words.
  • Take training and test data containing standard words of positive and negative sentiment.
  • Using gradient descent, train a classifier to recognize other positive and negative words.
  • Compute sentiment scores for sentences of text using this classifier.
  • Be horrified by the monster we have created.

After that, you will know how to unintentionally make a racist AI.

I would like to avoid such an ending, so afterwards we will do the following:

  • We will perform a statistical assessment of the problem in order to be able to recognize it in the future.
  • We will improve the data so as to obtain a more accurate and less racist semantic model.

Software prerequisites


This guide is written in Python; all required libraries are listed below.

import numpy as np
import pandas as pd
import matplotlib
import seaborn
import re
import statsmodels.formula.api
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
%matplotlib inline
seaborn.set_context('notebook', rc={'figure.figsize': (10, 6)}, font_scale=1.5)

You can replace scikit-learn with TensorFlow, Keras, or any other library that provides a gradient descent implementation.

Step 1. Vector representations of words


Vector representations of words are often used to convert words into a format that is conveniently processed by machine learning systems. Words are represented as vectors in multidimensional space. The smaller the distance between the vectors, the closer the meaning of the corresponding words. Vector representations of words allow you to compare words not letter by letter, but by (approximate) meaning.

To get good vector representations of words, you need to process hundreds of gigabytes of text. Fortunately, many groups of machine learning experts have already done this work and shared the finished materials.

There are two well-known sets of pre-trained vector representations of words: word2vec (trained on Google News data) and GloVe (trained on web pages crawled by Common Crawl). The end results are similar for both sets; GloVe is based on a more transparent data source, so we will use it.

Three GloVe packages are available for download, trained on 6, 42, and 840 billion tokens. 840 billion is a lot, but extracting more benefit from that package than from the 42-billion one requires complex post-processing. The 42-billion version is quite capable, with a vocabulary of about 1.9 million words. Since we are following the path of least resistance, we will use the 42-billion version.

So, we download the archive glove.42B.300d.zip from the GloVe site and unpack it as data/glove.42B.300d.txt. Next, we define a function that reads the vector representations of words in this simple format.

def load_embeddings(filename):
    """
    Load a DataFrame from the generalized text format used by word2vec, GloVe,
    fastText, and ConceptNet Numberbatch. The main point where they differ is
    whether there is an initial line with the dimensions of the matrix.
    """
    labels = []
    rows = []
    with open(filename, encoding='utf-8') as infile:
        for i, line in enumerate(infile):
            items = line.rstrip().split(' ')
            if len(items) == 2:
                # This is a header row giving the shape of the matrix
                continue
            labels.append(items[0])
            values = np.array([float(x) for x in items[1:]], 'f')
            rows.append(values)
    arr = np.vstack(rows)
    return pd.DataFrame(arr, index=labels, dtype='f')
embeddings = load_embeddings('data/glove.42B.300d.txt')
embeddings.shape
#  (1917494, 300)
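
As a quick optional sanity check (the example words are arbitrary and assumed to be in the GloVe vocabulary), we can confirm that vectors of related words lie closer together than vectors of unrelated ones:

def cosine_similarity(word_a, word_b):
    # Cosine similarity between two word vectors from the DataFrame loaded above.
    vec_a = embeddings.loc[word_a].values
    vec_b = embeddings.loc[word_b].values
    return vec_a.dot(vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
cosine_similarity('coffee', 'tea')      # related words: relatively high similarity
cosine_similarity('coffee', 'algebra')  # unrelated words: noticeably lower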

Step 2. A standard sentiment lexicon


We need a source of information about which words are positive and which are negative. There are many sentiment lexicons, but, as usual, we will choose one of the simplest. Download the archive from Bing Liu's website and extract the lexicon files data/positive-words.txt and data/negative-words.txt.

Next, we define a way to read these files and load their contents into the variables pos_words and neg_words.

def load_lexicon(filename):
    """
    Load a file from Bing Liu's sentiment lexicon
    (https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html), containing
    English words in Latin-1 encoding.
    One file contains a list of positive words, and the other contains
    a list of negative words. The files contain comment lines starting
    with ';' and blank lines, which should be skipped.
    """
    lexicon = []
    with open(filename, encoding='latin-1') as infile:
        for line in infile:
            line = line.rstrip()
            if line and not line.startswith(';'):
                lexicon.append(line)
    return lexicon
pos_words = load_lexicon('data/positive-words.txt')
neg_words = load_lexicon('data/negative-words.txt')
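
If you want to peek at what was just loaded, something like this will do in the notebook (the exact contents depend on the lexicon files):

print(len(pos_words), len(neg_words))  # number of positive and negative entries
print(pos_words[:5])                   # the first few positive words
print(neg_words[:5])                   # the first few negative words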

Step 3. Training a model to predict word sentiment


Some of these words are missing from the GloVe vocabulary. When there is no vector, the lookup returns a row of NaN values; we drop such rows.

# Note: newer versions of pandas raise a KeyError when .loc is given labels that
# are missing from the index; there, embeddings.reindex(pos_words).dropna() has
# the same effect.
pos_vectors = embeddings.loc[pos_words].dropna()
neg_vectors = embeddings.loc[neg_words].dropna()

Next, we build the arrays of inputs and outputs. Inputs: the word vectors; outputs: the value 1 for positive words and -1 for negative words. We also keep the words themselves so that we can interpret the results.

vectors = pd.concat([pos_vectors, neg_vectors])
targets = np.array([1 for entry in pos_vectors.index] + [-1 for entry in neg_vectors.index])
labels = list(pos_vectors.index) + list(neg_vectors.index)

Hold on a second! Some words are neutral and carry no sentiment at all. Don't we need a third class for neutral words?


I think examples of neutral words would be useful, especially since the problems we are about to encounter arise from sentiment being attributed to neutral words. If we could reliably identify neutral words, the added complexity of a third class would be justified. For that we would need a source of example neutral words, because the lexicon we chose contains only positively and negatively colored words.

So I created a separate version of this notebook, added 800 neutral words as examples, and gave word neutrality a large weight. But the results were almost identical to those presented below.
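
For reference, here is a minimal sketch of what such a three-class setup might look like; the neutral_words list and the weight of 2.0 are placeholders for illustration, not the values from that separate notebook, and the classifier settings mirror the ones we use in the next step:

# Hypothetical neutral examples; the separate notebook used its own set of ~800 words.
neutral_words = ['the', 'of', 'chair', 'window', 'tuesday']
neutral_vectors = embeddings.loc[neutral_words].dropna()
vectors3 = pd.concat([pos_vectors, neg_vectors, neutral_vectors])
targets3 = np.array([1] * len(pos_vectors) + [-1] * len(neg_vectors)
                    + [0] * len(neutral_vectors))
# Give the neutral examples a larger weight so the classifier takes neutrality seriously.
weights3 = np.array([1.0] * (len(pos_vectors) + len(neg_vectors))
                    + [2.0] * len(neutral_vectors))
model3 = SGDClassifier(loss='log', random_state=0, n_iter=100)
model3.fit(vectors3, targets3, sample_weight=weights3)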

How did the creators of the list decide which words are positive and which are negative? Doesn't sentiment depend on context?


Good question. General-purpose sentiment analysis is not as simple a task as it seems, and the boundary we are trying to draw is not always clear-cut. In the list we have chosen, “impudent” is marked as bad and “ambitious” as good; “comic” is bad, “funny” is good; “compensation” is good, even though situations in which you have to demand compensation are rarely pleasant.

I think everyone understands that a word's sentiment depends on context, but with a simple approach to sentiment analysis we assume that averaging the sentiment values of the words will give us a roughly correct answer even without taking the context into account.

We will divide the input vectors, output values, and labels into training and test data sets. For testing, we will use 10% of the data.

train_vectors, test_vectors, train_targets, test_targets, train_labels, test_labels = \
    train_test_split(vectors, targets, labels, test_size=0.1, random_state=0)

Next, we create a classifier and train it on the training vectors for 100 iterations. We use a logistic loss function, so that the resulting classifier can estimate the probability that a given word is positive or negative.

# Note: newer versions of scikit-learn spell these parameters loss='log_loss'
# and max_iter=100 instead of loss='log' and n_iter=100.
model = SGDClassifier(loss='log', random_state=0, n_iter=100)
model.fit(train_vectors, train_targets)

Now let's check the classifier on the test vectors. It turns out that it correctly recognizes the sentiment of words outside the training set in 95% of cases. Not bad at all.

accuracy_score(model.predict(test_vectors), test_targets)
#  0.95022624434389136

Let's also define a function that shows the sentiment the classifier predicts for individual words. As we will see, it does a good job even on words that were not in the training set.

def vecs_to_sentiment(vecs):
    # predict_log_proba gives the log probability for each class
    predictions = model.predict_log_proba(vecs)
    # To see an overall positive vs. negative classification in one number,
    # we take the log probability of positive sentiment minus the log
    # probability of negative sentiment.
    return predictions[:, 1] - predictions[:, 0]
def words_to_sentiment(words):
    vecs = embeddings.loc[words].dropna()
    log_odds = vecs_to_sentiment(vecs)
    return pd.DataFrame({'sentiment': log_odds}, index=vecs.index)
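
For example (the words here are arbitrary, and the exact numbers depend on the trained model):

# Sentiment scores for a few arbitrary words, most positive first.
words_to_sentiment(['delicious', 'dreadful', 'sofa']).sort_values('sentiment', ascending=False)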

Step 4. Get a sentiment score for the text


There are many ways to compute a sentiment score for a text from the sentiment values of its individual word vectors. We will continue to follow the path of least resistance and simply average them.

import re
TOKEN_RE = re.compile(r"\w.*?\b")
# The regex above finds tokens that start with a word-like character (\w), and continues
# matching characters (.*?) until the next word break (\b). It's a relatively simple
# expression that manages to extract something very much like words from text.
def text_to_sentiment(text):
    tokens = [token.casefold() for token in TOKEN_RE.findall(text)]
    sentiments = words_to_sentiment(tokens)
    return sentiments['sentiment'].mean()

What can be improved here?

  • Weight words in inverse proportion to their frequency, so that the most common words (for example, the or I) do not dominate the sentiment score.
  • Change the averaging so that short sentences do not end up with the most extreme sentiment scores.
  • Take context into account, i.e. look at whole phrases.
  • Use a better tokenizer that handles apostrophes correctly.
  • Take negations into account, i.e. handle phrases such as not happy correctly (a rough sketch of this appears right after the list).
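
As one illustration, here is one rough way the negation idea from the last bullet could be sketched. It is a crude heuristic, not necessarily the approach the author would take: flip the sign of a word's score when the word directly follows a negation. Note that with the simple tokenizer above, contractions like "isn't" split into "isn" and "t", so only plain negation words are handled.

NEGATIONS = {'not', 'no', 'never'}  # deliberately incomplete list of negation words
def text_to_sentiment_with_negation(text):
    # Like text_to_sentiment, but negates the score of a word that follows a negation.
    tokens = [token.casefold() for token in TOKEN_RE.findall(text)]
    scores = []
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        if token not in embeddings.index:
            continue  # skip words that have no vector
        score = vecs_to_sentiment(embeddings.loc[[token]])[0]
        scores.append(-score if negate else score)
        negate = False
    return np.mean(scores) if scores else 0.0
text_to_sentiment_with_negation("this example is not happy")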

Properly implementing all of these would take more code, though, and the results below would not change fundamentally. At the very least, we can roughly compare the relative sentiment of different sentences.

text_to_sentiment("this example is pretty cool")
#  3.889968926086298
text_to_sentiment("this example is okay")
#  2.7997773492425186
text_to_sentiment("meh, this example sucks")
#  -1.1774475917460698

Step 5. Behold the monster we created


Some sentences contain no words with unambiguous sentiment. Let's see how our system handles several variations of the same neutral sentence.

text_to_sentiment("Let's go get Italian food")
#  2.0429166109408983
text_to_sentiment("Let's go get Chinese food")
#  1.4094033658140972
text_to_sentiment("Let's go get Mexican food")
#  0.38801985560121732

Roughly the same thing happened to me in other experiments that analyzed restaurant reviews using word vectors: all Mexican restaurants ended up with lower sentiment scores for no objective reason.

Because word vectors are learned from the contexts in which words are used, they can capture subtle nuances of meaning. That also means they can pick up on less subtle things, such as social prejudice.

Here are some more neutral sentences.

text_to_sentiment("My name is Emily")
#  2.2286179364745311
text_to_sentiment("My name is Heather")
#  1.3976291151079159
text_to_sentiment("My name is Yvette")
#  0.98463802132985556
text_to_sentiment("My name is Shaniqua")
#  -0.47048131775890656

Well well.

Simply changing a name dramatically changes the sentiment score the system produces. This and many other examples show that names associated with white people get, on average, more positive predicted sentiment than names stereotypically associated with Black people.

So, having seen for ourselves that even the most basic AI implementation turns out to be terribly biased, let's take a short pause for reflection. In the second article we will return to the topic and fix the mistakes of our unintentionally racist AI.
