This page outlines the high-level questions we are exploring. Further related research can be found on the WikiDetox outline of related work.
How can machine learning enable us to understand harassment at scale?
What role does toxic language play in reducing the number of viewpoints in a discussion?
What tools are needed to make robust open debate on important issues easier at scale?
How might machine-learning-based tools be used by communities, commenters, and authors?
Can machine learning methods understand the emotional impact of language?
How much of the structure of a conversation can machine learning approaches uncover?
What unintended and unfair biases might machine learning models contain? What impact might such biases have? What are the best ways to identify these biases, and what can be done to mitigate them?
How might machine-learning-based tools be gamed?
Word-based models, including the CNN we developed for toxicity, can easily be tricked by misspellings. Character-level models help address this, but they require more data and, for RNNs especially, suffer from the vanishing gradient problem.
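The misspelling weakness can be illustrated with a minimal sketch (no real model here; the "toxic word" feature set and the example comment are made up for illustration): a whole-word feature simply fails to match a misspelled token, while overlapping character n-grams still share most of their evidence with the original word.

```python
# Sketch (hypothetical feature set): why a misspelling evades word-level
# features but still leaves character-level evidence behind.

def word_features(text):
    """Bag of whole words -- a misspelled word becomes an unknown token."""
    return set(text.lower().split())

def char_ngrams(text, n=3):
    """Overlapping character n-grams -- misspellings still share most n-grams."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

known_toxic_word = "idiot"        # assumed entry in a toxic-word feature set
comment = "you are an idiiot"     # adversarial misspelling

# Word-level: the misspelled token no longer matches the known feature.
print(known_toxic_word in word_features(comment))   # False

# Character-level: several 3-grams of the misspelling still match.
overlap = char_ngrams("idiiot") & char_ngrams(known_toxic_word)
print(sorted(overlap))   # ['idi', 'iot']
```

This is why character-level inputs make the attack harder: the adversary must change enough characters to destroy the n-gram overlap, at which point the comment itself becomes hard to read.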
Models based on character-level ngrams fed into feed-forward networks can be gamed by appending additional ngrams after the initial comment that counter the signal from the problematic ngrams. This can be addressed by using RNNs (e.g. LSTMs) or CNNs, which take more of the textual context into account.
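The dilution attack above can be sketched with a toy order-insensitive model (the n-gram weights below are invented for illustration, not learned): because the score averages over all n-grams, appending benign text pulls the toxicity score down without removing the toxic content.

```python
# Toy sketch of the dilution attack: a linear score averaged over
# character n-grams can be pushed down by appending benign padding.
# The weights are hypothetical, standing in for a trained model.

def char_ngrams(text, n=3):
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

# Hypothetical learned weights: positive = evidence of toxicity.
weights = {"idi": 2.0, "dio": 2.0, "iot": 2.0}

def toxicity_score(text):
    """Mean n-gram weight: order-insensitive, so padding dilutes it."""
    grams = char_ngrams(text)
    return sum(weights.get(g, 0.0) for g in grams) / len(grams)

original = "idiot"
padded = original + " have a truly wonderful and lovely day friend"

print(toxicity_score(original))                    # 2.0
print(toxicity_score(padded) < toxicity_score(original))   # True
```

A sequence model (LSTM or CNN over the characters) does not have this averaging blind spot in the same way: the toxic span still produces a strong local activation regardless of what is appended after it.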
How might ML be misused to censor or reduce viewpoints in a conversation?