AI Trained On Research Papers Can Discover What Humans Never Could

Artificial intelligence researchers have recently used machine learning systems to uncover new information hidden in older scientific papers and journals.

At Lawrence Berkeley National Laboratory, researchers used a machine learning algorithm called “Word2Vec” to sift through a large body of scientific papers and articles on materials science, looking for knowledge that scientists may have missed or not recognised when the work was published. Relying mainly on word associations, Word2Vec pointed the researchers to possible “future thermoelectric materials” in the literature. Surprisingly, the AI achieved these results without being told what thermoelectric means and without any explicit instruction in materials science.

Every subject field has an extensive backlog of research papers. Could AI be the key to harnessing its power? Source:

However, the team did train the algorithm in other ways. Using other programmes, they compiled 3.3 million texts related to materials science and built from them a list of about 500,000 subject-specific words. This list was given to Word2Vec, which used machine learning to study the connections between the words. As a result, the AI could pick up the kind of language used in materials science, which was essential to making its search effective.
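To give a rough feel for how connections between words can be learned from text alone, here is a minimal sketch in Python. It is not Word2Vec and not the researchers' code: it swaps in a much simpler technique, counting which words appear near each other and comparing those counts with cosine similarity. The four-sentence corpus, the function names and the window size are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy corpus invented for illustration; it is not taken from the study.
CORPUS = [
    "bismuth telluride is a thermoelectric material",
    "lead telluride is a thermoelectric material",
    "copper is a metallic conductor",
    "silver is a metallic conductor",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = cooccurrence_vectors(CORPUS)
# Words used in similar contexts end up with similar vectors.
sim_same = cosine(vectors["bismuth"], vectors["lead"])
sim_diff = cosine(vectors["bismuth"], vectors["copper"])
print(f"bismuth~lead: {sim_same:.2f}, bismuth~copper: {sim_diff:.2f}")
```

In this toy example "bismuth" and "lead" score as more similar to each other than "bismuth" and "copper", purely because they appear in similar contexts. Word2Vec learns dense vectors with a small neural network rather than raw counts, but the underlying intuition is comparable.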

Using only these words, the AI could capture ideas such as the structure of molecules and the periodic table. It linked words together in webs of interconnected phrases, producing strings of related words that helped to explain concepts, most of which were well known and a few of which had not been defined in this way before.

In one experiment, the researchers gave the algorithm only scientific papers published before 2009, and it managed to predict one of the most useful modern thermoelectric materials three and a half years before it was brought to light in 2012.
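The logic of this kind of look-back test can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the team's method: it again uses simple co-occurrence counts in place of Word2Vec, and the paper records, the names "compound_x" and "compound_y", and the helper functions are all hypothetical. It keeps only texts from before the cutoff year, then ranks candidate materials that had never been mentioned alongside the target word by how similar their contexts are to it.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (year, text) records invented for illustration only.
PAPERS = [
    (2005, "bismuth telluride is a thermoelectric with low thermal conductivity"),
    (2006, "lead telluride is a thermoelectric with low thermal conductivity"),
    (2007, "compound_x shows low thermal conductivity"),
    (2008, "compound_y forms ductile structural alloys"),
    (2012, "compound_x is a thermoelectric"),  # published after the cutoff
]
CUTOFF = 2009
TARGET = "thermoelectric"
CANDIDATES = ["compound_x", "compound_y"]


def context_vectors(texts, window=3):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for text in texts:
        tokens = text.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(papers, cutoff, target, candidates):
    """Rank candidates by similarity to `target`, using only texts before `cutoff`."""
    past = [text.lower() for year, text in papers if year < cutoff]
    vectors = context_vectors(past)
    # Keep only candidates never mentioned in the same text as the target so far.
    unseen = [
        c for c in candidates
        if not any(c in t.split() and target in t.split() for t in past)
    ]
    scored = [(cosine(vectors[c], vectors[target]), c) for c in unseen]
    return sorted(scored, reverse=True)


ranking = rank_candidates(PAPERS, CUTOFF, TARGET, CANDIDATES)
for score, name in ranking:
    print(f"{name}: {score:.2f}")
```

With these made-up records, "compound_x" ranks first because it shares context words with known thermoelectrics in the pre-cutoff texts, and the later 2012 record can then be checked against that ranking. The real experiment worked on a vastly larger corpus and with learned embeddings, so this only shows the shape of the evaluation.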

Because the algorithm can be trained on any body of subject literature, it is versatile and can be applied in many different fields. Through machine learning, it builds its own connections and could in principle adapt to any subject, provided it has access to enough texts.

Research

Edward Bristow