Our research aims to help increase participation, quality, and empathy in online conversation at scale. Our three primary areas of research are:
One of the main challenges of machine learning research on online discussion is the limited availability of high-quality data. We want to change that by creating, publishing, and identifying high-quality public datasets related to online discussion.
To submit a link to an existing public dataset, or to get help creating new public datasets for related research, please fill out this form.
The Perspective API (demo; API documentation) serves models created by the Conversation AI research initiative to enable platforms, developers, and researchers to explore how ML might help conversations online. The API scores a comment based on its potential impact on a conversation.
Our first alpha-version perspective model, and the model you can play with on the public demo, is ‘toxicity’ (currently English only). The toxicity model scores text based on its similarity to comments people have said are “toxic” or likely to make them leave a conversation. There are also several other experimental perspective models you can play with, such as a model that tries to identify insubstantial comments. These are described in our developer documentation. As the research develops we will add new models to the API and list them there.
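As a concrete illustration, a sketch of what requesting a toxicity score might look like, based on the public developer documentation. The endpoint path and payload shape follow that documentation; `YOUR_API_KEY` is a placeholder for a key you would obtain yourself, and this is an illustrative sketch rather than official client code.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: your own API key
ANALYZE_URL = (
    "https://commentanalyzer.googleapis.com/v1alpha1/"
    "comments:analyze?key=" + API_KEY
)

def build_analyze_request(text):
    """Build the JSON body for a comments:analyze call."""
    return {
        "comment": {"text": text},
        "languages": ["en"],  # the alpha toxicity model is English-only
        "requestedAttributes": {"TOXICITY": {}},
    }

def score_comment(text):
    """POST the comment and return the summary toxicity score (0..1)."""
    body = json.dumps(build_analyze_request(text)).encode("utf-8")
    req = urllib.request.Request(
        ANALYZE_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # The response nests the score under the requested attribute name.
    return result["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```

The returned summary score is a probability-like value between 0 and 1; how you interpret it (a review queue threshold, a reading filter, author feedback) is up to the application.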
The models provided by the API are trained on more comments than are available in our public datasets listed above. This is because some contributors do not wish for the comments they shared with us to be more widely available.
Our models are still far from perfect: they will make errors. They will be unable to detect patterns of toxicity they have not seen before, and they will falsely flag comments that resemble previous toxic conversations. To help improve the machine learning, the API supports sending us suggested scores, which allow the models to improve.
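Submitting a suggested score might look like the sketch below, again based on the public developer documentation (the `comments:suggestscore` method). The payload shape mirrors the analyze response; treat this as an assumption-laden illustration, not official client code.

```python
def build_suggest_score_request(text, toxicity_value):
    """Build the JSON body for a comments:suggestscore call:
    tell the service what score a human reviewer thinks the
    comment should have received (a value between 0 and 1)."""
    return {
        "comment": {"text": text},
        "attributeScores": {
            "TOXICITY": {"summaryScore": {"value": toxicity_value}}
        },
    }

# Example: a reviewer judged this comment clearly non-toxic.
feedback = build_suggest_score_request("Thanks, that was helpful!", 0.0)
```

Aggregated suggestions like this are one of the inputs used to retrain and improve the models over time.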
The gallery of perspective hacks illustrates various ideas people have had for how the API might be used.
We do not recommend using the API as a tool for automated moderation: the models make too many errors. We have released early access to the API to support research and to let developers, publishers, and platforms try creating new conversational experiences, e.g. to help human moderators choose what to review, to help people reading comments choose what they read, or to help authors get another perspective on what they are writing. Our blog post on how to use ML models despite their imperfections outlines some further considerations and illustrates both the challenge and the opportunity.
If you would like to receive email updates about the API - for example when we add new models, or deprecate old ones - you can subscribe to perspective-announce to receive email updates from email@example.com. This list will be used only to share release information, and will never be used to ask you for login details.
The research effort was started by Jigsaw and the Google Counter-Abuse Technology Team and welcomes contributions from anyone interested in better online conversation. If you have questions about our research, you can email us at firstname.lastname@example.org.
Are you building a machine learning tool for great conversation that you’d like us to know about and link to from this page? Let us know!
Do you have examples of comments (both comments you want in your community and comments you don’t) that you would like to contribute for use in research and products to improve conversations online? You can submit examples here.