How AI Will End Spoilers, Once And For All

The internet is one of humanity's greatest achievements of the past century. Not long ago, getting information meant borrowing a book from a library or reading a newspaper for the latest news; today the internet offers almost instant access to news, weather and online discussion. However, public discussion inevitably came to include cinema and other media, and with it the cancerous growth of spoilers.

A UC San Diego professor summed up the pain of spoilers neatly:

Spoilers are everywhere on the internet, and are very common on social media. As internet users, we understand the pain of spoilers, and how they can ruin one’s experience
— Ndapa Nakashole, professor of computer science at UC San Diego

A team of researchers from UC San Diego set out to tackle the problem of spoilers using AI. Automatic spoiler filtering comes down to the general problem of spotting something in text: deciding what counts as a spoiler and which patterns recur in spoiler text. From this the concept of “SpoilerNet” was born. SpoilerNet is an AI-powered tool designed to find patterns that may indicate the presence of a spoiler and let the user block it.

At the 2019 annual meeting of the Association for Computational Linguistics in Italy, the team announced that the tool's final form would be a browser extension requiring little to no setup by the user. Although the concept seemed promising, the team soon hit its first roadblock: to be effective, the AI would have to be trained on vast swathes of text, both with and without spoilers. They gathered a large data set of 1.3 million book reviews in which the reviewers themselves had flagged spoilers. The sections of each review that contained a spoiler were conveniently marked with a spoiler tag.

UC San Diego, Geisel Library - an area where students and staff can access a variety of resources and research projects such as SpoilerNet.

To our knowledge, this is the first dataset with spoiler annotations at this scale and at such a fine-grained granularity
— Mengting Wan, Ph.D. student in CS at UC San Diego

However, one challenge the data posed was that different users had different ideas about what constituted a spoiler, and the neural networks needed to be carefully adjusted to account for these variations. Another issue was that the same word can have different meanings in different texts: “stark”, for example, could be an adjective in one text but the name of a character who dies in another. This was a major challenge for the team.

SpoilerNet was trained on 80 percent of the book reviews, with the text run through several layers of neural networks. When tested on the remaining 20 percent, the system detected spoilers with 89-92 percent accuracy. On reviews of TV shows, accuracy was, as expected, lower at 75-80 percent. Keywords such as “killed” or “died” were responsible for most of the false warnings.
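To make the 80/20 evaluation and the "false warning" problem concrete, here is a minimal Python sketch. It is not SpoilerNet: the neural network is swapped for a naive keyword check, and the sentences, labels and function names are all invented for illustration. It only shows how a held-out split and an accuracy score work, and why trigger words like “killed” and “died” can mislead a detector.

```python
import random

# Hypothetical toy data: (sentence, is_spoiler) pairs standing in for
# labelled review sentences. None of these come from the real data set.
SENTENCES = [
    ("The hero is killed in the final chapter.", True),
    ("Her brother died saving the king.", True),
    ("It turns out the narrator was the thief.", True),
    ("The pacing is slow in the middle.", False),
    ("I killed an afternoon reading this.", False),
    ("My phone died before I finished the audiobook.", False),
    ("The prose is beautiful throughout.", False),
    ("The villain is revealed to be her father.", True),
    ("I would recommend this to fantasy fans.", False),
    ("The cover art is gorgeous.", False),
]

# Assumed trigger words, taken from the two examples named in the article.
TRIGGER_WORDS = ("killed", "died")


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle the items and split them into a training and a test portion."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def keyword_flags_spoiler(sentence):
    """Naive stand-in detector: flag any sentence containing a trigger word.

    This is a plain substring match, so it ignores context entirely.
    """
    lowered = sentence.lower()
    return any(word in lowered for word in TRIGGER_WORDS)


def accuracy(examples, predict):
    """Fraction of examples where the prediction matches the label."""
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


def false_warnings(examples, predict):
    """Sentences flagged as spoilers that were not labelled as spoilers."""
    return [text for text, label in examples if predict(text) and not label]


train, test = split_train_test(SENTENCES)
print(f"{len(train)} training sentences, {len(test)} held-out sentences")
print(f"held-out accuracy: {accuracy(test, keyword_flags_spoiler):.2f}")
for text in false_warnings(SENTENCES, keyword_flags_spoiler):
    print("false warning:", text)
```

The keyword check needs no training, so the training portion goes unused here; a real model would learn from it and be scored only on the held-out sentences. The false warnings it prints are everyday uses of “killed” and “died” that spoil nothing.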

This technology is still in its infancy and has a long way to go. However, we can expect to see it developed into browser extensions fairly quickly, and at some point perhaps even baked into browsers themselves. Hopefully, in a year or two this technology will be widespread and social media will no longer be able to spoil the next Marvel movie.