Why AI is Struggling to Detect Hate Speech

by Ben Dickson

As online trolling and hate speech are becoming more problematic, companies like Facebook and Twitter are under increasing pressure to identify and block hateful speech on their networks. And like many other problems that involve the massive amounts of online content, these companies have turned to artificial intelligence for solutions.

All major social media networks use AI algorithms to moderate online content. But while AI shows promise in detecting some types of content, it is hard-pressed when it comes to spotting hate speech.

A recent study by scientists at the University of Washington, Carnegie Mellon University, and the Allen Institute for Artificial Intelligence has found that the leading AI systems for detecting hate speech are deeply biased against African Americans. This includes Google Perspective, an AI tool for moderating online conversations.

The study and the unending struggles of tech companies to automate hate speech detection highlight the limits of current AI technologies in understanding the context of human language.

Understanding language context is hard

Advances in deep learning have helped automate complicated tasks such as image classification and object detection. Artificial neural networks, the key innovation behind deep learning algorithms, learn to perform tasks by reviewing examples. The general belief is that the more quality data you provide a neural network, the better it performs. This is true, to some extent.

At their core, neural networks are statistical machines, albeit very complicated ones. This might not pose a problem for image classification, which is largely dependent on the visual features of objects. For instance, a neural network that is trained on millions of labeled images creates a mathematical representation of the common pixel patterns between different objects and can detect them with remarkable accuracy.

But when it comes to natural language processing and generation (NLP/NLG), machine learning might not be enough. There are still plenty of things statistical representations can do. There are several cases of AI models translating text with impressive precision or generating coherent text. But while those feats are remarkable, they barely scratch the surface of human language. These AI models perform their tasks by calculating the probability that words appear in a certain sequence, based on the examples they’ve viewed during training.
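To make the idea of "probability of a word sequence" concrete, here is a minimal sketch of a bigram model in Python. It is a toy illustration, not how production language models are built: the three-sentence corpus and the function names `train_bigram_model` and `sequence_probability` are assumptions made up for this example.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows another in the training sentences."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of a sentence.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            follow_counts[prev][nxt] += 1
    return follow_counts

def sequence_probability(model, sentence):
    """Multiply the estimated probability of each word given the previous word."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        total = sum(model[prev].values())
        if total == 0:
            return 0.0  # the previous word never appeared in training
        prob *= model[prev][nxt] / total
    return prob

# Toy corpus (an assumption for illustration only).
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram_model(corpus)

p_seen = sequence_probability(model, "the cat sat")    # word order seen in training
p_unseen = sequence_probability(model, "sat the cat")  # same words, unseen order
```

The model gives a nonzero probability to the ordering it has seen and zero to the shuffled one. It knows nothing about what the words mean; it only knows which sequences were frequent in its examples.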

Hate-speech-detecting AI models draw their training from data sets that only include sample sentences and their corresponding toxicity scores. In their study, the authors used publicly available AI models that have been trained on millions of annotated tweets and other social media posts.
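As a rough illustration of what learning from sentence-and-score pairs looks like, the sketch below trains a deliberately naive word-averaging scorer. It is not the method used by the study or by any real moderation tool; the training sentences, the scores, and the helper names `train_word_scores` and `predict_toxicity` are all hypothetical.

```python
from collections import defaultdict

def train_word_scores(examples):
    """Average the toxicity labels of every sentence each word appears in."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sentence, toxicity in examples:
        for word in set(sentence.lower().split()):
            totals[word] += toxicity
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}

def predict_toxicity(word_scores, sentence, default=0.5):
    """Score a sentence as the mean of its word scores; unknown words get `default`."""
    words = sentence.lower().split()
    if not words:
        return default
    return sum(word_scores.get(w, default) for w in words) / len(words)

# Hypothetical training data: (sentence, toxicity score) and nothing else.
training_data = [
    ("you are an idiot", 1.0),
    ("you are a friend", 0.0),
    ("what an idiot move", 1.0),
    ("have a nice day", 0.0),
]
scores = train_word_scores(training_data)

# The only input is the text, so the score cannot depend on who wrote it.
score = predict_toxicity(scores, "you idiot")
```

Because the input is just a string, the same words receive the same score whether they come from a stranger or from a close friend joking around. Real models are far more sophisticated, but they share this limitation whenever the training data carries no information beyond the text and its label.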

But statistics do not represent context. When we interpret a sentence, we don’t only consider the sequence of words and how they compare to other sentences we’ve heard before. We also take into account other factors, such as the characteristics of the person who is speaking. A sentence might sound offensive coming from one person, while another person saying the same thing might be totally fine.

In their study, the researchers from Carnegie Mellon, AI2 and the University of Washington show examples of sentences that would sound hateful and racist if said by a white person but acceptable if said by a black person.

Figure: Depending on who is saying a sentence, it may sound toxic or not (source: University of Washington)

The authors suggest that the people who annotate the data should know about the demographics and characteristics of the people who wrote the posts they are labeling. This would help them improve the quality of the data sets and train AI models that are much more accurate.

It’s hard to agree on what is hate speech

Annotating the data set with relevant meta-data sounds like a good idea, and the results of the experiments show that it reduces bias in the hate-speech-detection AI algorithms. But there are two problems that would make this solution incomplete.

First, annotating training data with relevant information is an enormous task. In many cases, that information is not available. For instance, tweets don’t contain information about the race, nationality and religion of the author, unless the user explicitly states that information in their bio. Some of that information can be inferred by looking at the timeline of the user and other content they have posted online. But finding and annotating that kind of information is much more difficult than labeling cats and dogs in photos.

Second, even adding author information would not be enough to automate hate speech detection. Hate speech is deeply tied to culture, and culture varies across different regions. What is considered hateful or acceptable can vary not only across countries, but also across different cities in the same country. The relevance of things such as race, gender and religion can also vary when you go to different geographical areas. And culture is something that changes over time. What is considered the norm today might be considered offensive tomorrow.

Hate speech is also very subjective. Humans of similar backgrounds, races and religions often argue over whether something is hateful or not.

It’s hard to see how you could develop an AI training data set that could take into account all those factors and make sense of all these different complicated dialects we’ve developed over thousands of years.

When it comes to vision, hearing and physical reflexes, our brain and nervous system are far inferior to those of wild animals. But language is the most complicated function of our brains.

All animals have some way to communicate with one another. Some of the more advanced species even have rudimentary words to represent basic things such as food and danger. But our ability to think in complicated ways and communicate knowledge, opinions and feelings gives us the edge over all other living beings. Neuroscientists still haven’t been able to find out the exact mechanisms of formation and interpretation of language in the human brain.

Many companies think they can outsource their NLP tasks to outside contractors, hoping that human labor will train their AI and eventually create a fully-automated system.

But it’s difficult to imagine anything short of a large-scale human brain being able to make sense of all the different nuances of the diverse languages of the people who inhabit this planet. For the moment, our AI algorithms will be able to find common patterns and help filter down the huge amounts of content we create, but we can’t remove humans from the loop when it comes to detecting hate speech.
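One way to picture AI filtering content while humans stay in the loop is a simple triage step. The sketch below is an assumption about how such a pipeline could be arranged, not a description of any platform's actual system; the function name `triage`, the `pass_below` threshold and the example scores are hypothetical.

```python
def triage(scored_posts, pass_below=0.2):
    """Let low-scoring posts through and queue the rest for human moderators.

    `scored_posts` is a list of (text, toxicity_score) pairs produced by some
    model. Nothing is blocked automatically: a person makes the final call.
    """
    passed, needs_review = [], []
    for text, score in scored_posts:
        if score < pass_below:
            passed.append(text)
        else:
            needs_review.append((score, text))
    # Show moderators the highest-scoring posts first.
    needs_review.sort(reverse=True)
    return passed, [text for _, text in needs_review]

# Hypothetical model outputs.
scored = [("post a", 0.05), ("post b", 0.90), ("post c", 0.40)]
passed, review_queue = triage(scored)
```

Here the model only narrows down what moderators have to read and in which order. The judgment about whether a flagged post is actually hateful stays with a person.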

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business and politics.
This article was originally published on Tech Talks. Read the original article here.
