This page outlines the high-level questions we are exploring. Further related research can be found on the WikiDetox outline of related work.
How can machine learning enable us to understand harassment at scale?
What role does toxic language play in reducing the number of viewpoints in a discussion?
What tools are needed to make robust open debate on important issues easier at scale?
How might machine learning-based tools be used by communities, commenters, and authors?
Can machine learning methods understand the emotional impact of language?
How much of the structure of a conversation can machine learning approaches uncover?
What unintended and unfair biases might machine learning models contain? What impact might such biases have? What are the best ways to identify these biases? And what can be done to mitigate them?
How might machine learning-based tools be gamed?
Word-based models, including the CNN we developed for toxicity, can easily be tricked by creative misspellings. Character-level models can help address this, but they require more data and their training suffers from the vanishing gradient problem.
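A minimal sketch of why misspellings matter, using a hypothetical vocabulary and toy tokenizers (not our production models): a word-level lookup loses the misspelled token entirely, while most of its character n-grams survive.

```python
# Toy illustration (hypothetical vocabulary, not the actual models):
# a creative misspelling disappears from a word-level vocabulary,
# but most of its character n-grams survive.

def word_tokens(text):
    return text.lower().split()

def char_ngrams(text, n=3):
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

vocab = {"you", "are", "an", "idiot"}  # hypothetical learned vocabulary

comment = "you are an idiiot"  # creative misspelling of "idiot"

# Word-based view: the insult is out-of-vocabulary and simply vanishes.
print([t for t in word_tokens(comment) if t in vocab])
# -> ['you', 'are', 'an']

# Character-level view: the misspelling still shares most of its
# trigrams with the original word.
print(char_ngrams("idiot") & char_ngrams("idiiot"))
# -> {'idi', 'iot'} (set order may vary)
```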
Models based on character-level n-grams fed into feed-forward networks, like our TOXICITY_FAST model, can easily be gamed by appending additional n-grams after the initial comment to counter the signal from the problematic n-grams. This can be addressed with RNNs and CNNs, like our TOXICITY model, which take more of the textual context into account.
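To make the dilution attack concrete, here is a toy scorer with hypothetical n-gram weights (not the actual TOXICITY_FAST model): because features are averaged over the whole comment, appending benign text drives the score toward zero.

```python
# Toy dilution attack on a bag-of-character-n-grams scorer
# (hypothetical weights, not the actual TOXICITY_FAST model).

def char_ngrams(text, n=3):
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Hypothetical learned weights; a positive weight marks a toxic n-gram.
weights = {"idi": 2.0, "dio": 2.0, "iot": 2.0}

def score(text):
    """Average the per-n-gram weights over the whole comment."""
    grams = char_ngrams(text)
    return sum(weights.get(g, 0.0) for g in grams) / len(grams)

print(round(score("you idiot"), 3))
# -> 0.857: the toxic n-grams dominate the short comment

print(round(score("you idiot " + "have a nice day " * 5), 3))
# -> 0.068: the same toxic n-grams are drowned out by benign padding
```

A sequence model, by contrast, can learn that the problematic span is toxic regardless of what follows it, which is why this particular attack is less effective against the TOXICITY model.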
The practical impact of gaming ML models is an open research question, and is likely to depend on the way the ML is applied. Moreover, there are different threat models for different applications of ML: gaming an authorship experience is quite different from gaming a system that receives suggestions and considers retraining on them.
How might ML be misused to censor or reduce viewpoints in a conversation?