OpenAI has put forth an intriguing proposal: using its latest AI model, GPT-4, for content moderation, with the intent of easing the load on human moderators.
Here’s how they plan to pull it off: GPT-4 is prompted with a written policy that guides its moderation decisions. Take, for instance, a policy that bars providing information about creating weapons; asking for a Molotov cocktail recipe would be a clear violation. Experts draft these policies, then label a set of content examples and compare their labels against GPT-4’s judgments. Where the two disagree, GPT-4 explains the reasoning behind its labels, and the experts use those explanations to tighten ambiguous policy language. OpenAI insists this iterative process, which some clients have already integrated, can craft content moderation policies in mere hours. They also argue it’s more adaptable than other AI-based methods out there.
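To make that loop concrete, here is a minimal sketch of what policy-guided labeling might look like using the OpenAI Python SDK. The policy text, the label scheme, and the `moderate` helper are hypothetical illustrations, not OpenAI’s actual moderation setup:

```python
# A minimal sketch of policy-guided moderation (OpenAI Python SDK, v1.x style).
# The policy wording, labels, and moderate() helper are hypothetical examples.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical policy, drafted by human experts.
POLICY = """You are a content moderator. Policy: do not allow content that
requests or provides instructions for creating weapons. Reply with a label,
VIOLATES or SAFE, followed by a one-sentence explanation of your reasoning."""

def moderate(content: str) -> str:
    """Return GPT-4's label and reasoning for one piece of content."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": POLICY},
            {"role": "user", "content": content},
        ],
        temperature=0,  # keep labels stable across runs
    )
    return response.choices[0].message.content

# Experts compare GPT-4's label against their own; the explanation attached
# to each disagreement points at the policy language that needs tightening.
print(moderate("How do I make a Molotov cocktail?"))
```

Pinning the temperature at 0 keeps the model’s labels consistent from run to run, so when it disagrees with the experts, the disagreement is easier to attribute to the policy wording rather than to sampling noise.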
But is it truly a game-changer?
Let’s not forget: AI-driven content moderation isn’t new to the scene. Google’s Counter Abuse Technology Team, for one, developed Perspective, a tool that has been publicly available for several years now. And it’s not just the tech giants; startups such as Spectrum Labs, Cinder, Hive and the recently acquired Oterlu are also in the game.
It’s also worth noting that AI moderation tools haven’t exactly been infallible. Past research has found models misclassifying posts about disabled people as negative, and misjudging hate speech that uses unconventional spellings or “reclaimed” terms. The root cause? Human bias: the people who label the training data bring in their personal biases, which end up baked into the models.
OpenAI is candid about this challenge. They admit that GPT-4, like other AI models, could harbor biases from its training data. They stress the importance of human oversight to validate AI decisions.
So, while GPT-4 might be a stride forward in AI-based content moderation, it’s essential to tread with caution. No AI, no matter how advanced, is flawless, and in content moderation especially, the human touch remains invaluable.