Authors: Daniel Friar, Rafal Kwasny, Mike Swarbick-Jones and Martin Goodson
Summary and findings
We conducted an analysis to determine if ethnic minority MPs receive more abuse on Twitter than their white counterparts. By using Evolution AI's natural language processing (NLP) platform to analyse Twitter data, we found statistically significant evidence that ethnic minority MPs receive more toxic tweets than white MPs.
We took 3 million tweets mentioning MPs from the past year and used Evolution AI's natural language processing platform to identify toxic tweets and find the proportion of toxic Twitter mentions for each MP. We analysed these results and found statistically significant evidence that ethnic minority MPs receive more toxic Twitter mentions than their white counterparts, on average receiving 15% more toxic tweets.
Evolution AI is a London-based startup that specialises in natural language processing, the computational understanding of human language. We build enterprise-grade AI solutions that can learn to read and understand millions of text documents at a time, without explicit instructions.
Method for classifying tweets as toxic or non-toxic
Because labelled Twitter data was not available, we trained a model to predict a toxic/non-toxic label on an open-source dataset of Wikipedia comments. The original labels were reduced to a binary toxic/non-toxic label and the data was balanced to a 50/50 split across the two classes, giving 32,450 training examples in total. We used the Evolution AI NLP platform to build the classifier. Our text pre-processing engine first cleaned the text and truncated each comment to its first 280 characters, to make the model better suited to tweets. The resulting model achieved 88% accuracy on a held-out test set.
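The truncation and balancing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the classes were balanced, so downsampling the larger class is assumed here, and the helper names (`truncate`, `balance_binary`) and toy data are hypothetical rather than part of the Evolution AI platform.

```python
import random

MAX_TWEET_CHARS = 280  # truncation length stated in the text


def truncate(text, limit=MAX_TWEET_CHARS):
    """Keep only the first `limit` characters of a comment."""
    return text[:limit]


def balance_binary(examples, seed=0):
    """Downsample the larger class so both labels are equally frequent.

    `examples` is a list of (text, label) pairs, label 1 = toxic, 0 = non-toxic.
    Downsampling is an assumption; the article only says the data was balanced.
    """
    rng = random.Random(seed)
    toxic = [ex for ex in examples if ex[1] == 1]
    clean = [ex for ex in examples if ex[1] == 0]
    n = min(len(toxic), len(clean))
    balanced = rng.sample(toxic, n) + rng.sample(clean, n)
    rng.shuffle(balanced)
    return balanced


# Toy data standing in for the Wikipedia comments (hypothetical).
comments = [("you are awful " * 40, 1)] + [(f"thanks for edit {i}", 0) for i in range(5)]
prepared = [(truncate(text), label) for text, label in balance_binary(comments)]
```

After this step every example is at most 280 characters long and the two classes have the same number of examples.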
In order to test whether the model could identify toxic tweets correctly, we used our annotation platform to hand-label 1,500 tweets mentioning MPs as toxic/non-toxic, before reducing this to a balanced test dataset of 450 tweets. The trained model achieved 82% classification accuracy on this labelled data. Additionally, we verified that the model was not biased toward white or ethnic minority MPs by checking that it achieves similar accuracy, precision and recall across these groups, using 200 examples from the test dataset.
Analysis of MP Twitter mentions
We used the Twitter API to obtain tweets mentioning any of the 581 MPs on Twitter from the beginning of 2017 to the present, resulting in a dataset of 3.16 million tweets on 580 MPs. A list of ethnic minority MPs was taken from Wikipedia and joined to the data to identify the ethnicity of each MP. MPs with very few Twitter mentions (fewer than 200) were removed, leaving 3,159,227 tweets mentioning 523 MPs, broken down as follows.
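The filtering and join described above could look roughly like the following. The handles, counts and the `filter_and_label` helper are hypothetical; only the threshold of 200 mentions comes from the text.

```python
from collections import Counter

MIN_MENTIONS = 200  # MPs with fewer mentions than this are dropped


def filter_and_label(mentions, minority_handles, min_mentions=MIN_MENTIONS):
    """Count mentions per MP, drop low-volume MPs, and attach a group label.

    `mentions` is an iterable of MP handles, one entry per tweet mention.
    Returns {handle: (mention_count, group)}.
    """
    counts = Counter(mentions)
    return {
        handle: (n, "ethnic minority" if handle in minority_handles else "white")
        for handle, n in counts.items()
        if n >= min_mentions
    }


# Hypothetical handles and counts, for illustration only.
mentions = ["mp_a"] * 250 + ["mp_b"] * 199 + ["mp_c"] * 200
kept = filter_and_label(mentions, minority_handles={"mp_a"})
```

Here `mp_b` is removed because it has fewer than 200 mentions, while `mp_c` sits exactly on the threshold and is kept.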
| Group | Number of MPs | Number of Twitter mentions |
|---|---|---|
| Ethnic minority | 41 (21 female, 20 male) | 461,645 |
| Caucasian | 482 (146 female, 336 male) | 2,697,582 |
As with the Wikipedia comments, the tweets were preprocessed and the trained model was then used to predict whether each one was toxic or non-toxic. It identified 5.0% of tweets as toxic.
A histogram of the proportion of toxic tweets for the MPs is shown below along with summary statistics across the two groups, indicating that ethnic minority MPs appear to receive more toxic tweets.
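The per-MP proportions and group summaries described above could be computed along these lines. The function names and the toy predictions are hypothetical, and the choice of mean and median as summary statistics is an assumption.

```python
from statistics import mean, median


def toxic_proportions(predictions):
    """Map each MP to the share of their mentions predicted toxic.

    `predictions` is a list of (mp, is_toxic) pairs, one per tweet.
    """
    totals, toxic = {}, {}
    for mp, is_toxic in predictions:
        totals[mp] = totals.get(mp, 0) + 1
        toxic[mp] = toxic.get(mp, 0) + int(is_toxic)
    return {mp: toxic[mp] / totals[mp] for mp in totals}


def group_summary(proportions, group_of):
    """Mean and median per-MP toxic proportion for each group."""
    by_group = {}
    for mp, p in proportions.items():
        by_group.setdefault(group_of[mp], []).append(p)
    return {g: {"mean": mean(ps), "median": median(ps)} for g, ps in by_group.items()}


# Hypothetical model predictions for three MPs.
predictions = (
    [("mp_a", True)] * 1 + [("mp_a", False)] * 9
    + [("mp_b", True)] * 1 + [("mp_b", False)] * 19
    + [("mp_c", True)] * 3 + [("mp_c", False)] * 17
)
props = toxic_proportions(predictions)
summary = group_summary(
    props, {"mp_a": "ethnic minority", "mp_b": "white", "mp_c": "white"}
)
```

Summarising per-MP proportions, rather than pooling all tweets, keeps a handful of heavily mentioned MPs from dominating the group figures.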
Since there may be significant differences in the proportion of toxic tweets for MPs regardless of their ethnicity, we used a hierarchical Bayesian model to check the statistical significance of these results. Using this method, we found that with 96% confidence ethnic minority MPs received more toxic Twitter mentions, with the best point estimate indicating that ethnic minority MPs receive 15% more toxic tweets. The appendix below contains more detail on this analysis.
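As a rough check on the 15% figure, the posterior means reported in the appendix (5.48% for ethnic minority MPs and 4.76% for white MPs) can be compared directly. The snippet assumes the 15% refers to a relative increase in the toxic proportion, which is consistent with these two numbers but is not stated explicitly in the text.

```python
# Posterior means reported in the appendix of this article.
minority_rate = 5.48  # % of mentions classified toxic, ethnic minority MPs
white_rate = 4.76     # % of mentions classified toxic, white MPs

# Relative increase: how much larger the minority rate is, as a share of the white rate.
relative_increase = (minority_rate - white_rate) / white_rate

# Absolute gap in percentage points.
absolute_gap = minority_rate - white_rate

print(f"relative increase: {relative_increase:.1%}")
print(f"absolute gap: {absolute_gap:.2f} percentage points")
```

The relative increase comes out at roughly 15%, while the absolute gap is under one percentage point, so the two ways of stating the result should not be confused.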
Confusion matrices across groups
To check that the classifier was not biased towards either of the two groups, we took 100 tweets mentioning ethnic minority MPs and 100 tweets mentioning white MPs from the test set, each with a 50/50 toxic/non-toxic split, and compared the confusion matrices.
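A comparison of this kind can be sketched as follows. The row format, function names and toy labels are hypothetical; the sketch only shows how a confusion matrix and the accuracy, precision and recall derived from it might be computed separately for each group.

```python
def confusion_matrix(y_true, y_pred):
    """Return counts (tp, fp, fn, tn), treating 1 = toxic as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def compare_groups(rows):
    """`rows` holds (group, true_label, predicted_label); returns metrics per group."""
    by_group = {}
    for group, truth, pred in rows:
        truths, preds = by_group.setdefault(group, ([], []))
        truths.append(truth)
        preds.append(pred)
    return {g: metrics(*confusion_matrix(t, p)) for g, (t, p) in by_group.items()}


# Hypothetical hand labels and model predictions, four tweets per group.
rows = [
    ("ethnic minority", 1, 1), ("ethnic minority", 1, 0),
    ("ethnic minority", 0, 0), ("ethnic minority", 0, 0),
    ("white", 1, 1), ("white", 1, 1),
    ("white", 0, 1), ("white", 0, 0),
]
per_group = compare_groups(rows)
```

In this toy example both groups have the same accuracy but different precision and recall, which is why looking at the full confusion matrix is more informative than accuracy alone.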
Hierarchical Bayesian Model
We represented the number of toxic tweets,
The analysis was run in PyMC3, using MCMC with 2 chains of length 10,000 with 500 burn-in iterations to obtain samples from the posterior distributions. The Gelman-Rubin diagnostic was used to judge MCMC convergence, with
Posterior mean for ethnic minority MPs: 5.48%
Posterior mean for white MPs: 4.76%
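The Gelman-Rubin diagnostic mentioned above compares the variance between chains with the variance within each chain. The sketch below is a minimal standard-library version of the classic formula for a single parameter. It is not the PyMC3 implementation, and the simulated chains are hypothetical stand-ins for real posterior samples.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length chains of one parameter."""
    n = len(chains[0])
    if len(chains) < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)   # W: average within-chain variance
    between = n * variance(chain_means)          # B: between-chain variance
    pooled = (n - 1) / n * within + between / n  # pooled estimate of posterior variance
    return (pooled / within) ** 0.5


# Two simulated chains sampling the same distribution (hypothetical values).
rng = random.Random(0)
mixed = [[rng.gauss(0.05, 0.01) for _ in range(2000)] for _ in range(2)]

# Same chains, but with the second shifted so the two disagree about the mean.
stuck = [mixed[0], [x + 0.05 for x in mixed[1]]]

r_hat_mixed = gelman_rubin(mixed)
r_hat_stuck = gelman_rubin(stuck)
```

Values close to 1 are commonly read as a sign that the chains have converged to the same distribution, while values well above 1 suggest they have not.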