Researchers Reveal Recipe for Protein Composition

Proteins are integral to the life of our cells: they catalyse metabolic reactions and carry out many of the processes that keep our bodies running. Scientists have therefore long tried to engineer and design “artificial” proteins that can perform new, more useful tasks, such as treating specific diseases, capturing carbon to reduce CO2 emissions, or harvesting energy. The processes involved in manufacturing these proteins, however, are very time-intensive and have a high failure rate.

However, scientists at the University of Chicago have reported a breakthrough that could have positive implications across various sectors, including the medical, chemical and agricultural. Their method uses an artificial intelligence-led process, driven by large amounts of data, to design these new proteins.

By developing machine-learning models and feeding them information from existing genome databases, the researchers quite quickly found relatively simple rules for designing and building artificial proteins. Once constructed, these proteins showed chemistry that rivalled that of proteins found in nature.

This is no mean feat, given how complex proteins are. Proteins are made up of hundreds or thousands of amino acids, and the sequence of these building blocks determines the protein’s structure and associated function. The problem is that we have struggled to work out exactly how to combine these amino acids to create novel proteins; past work has produced methods that can specify protein structure, but function has proved more elusive.

Credits: Shutterstock Images

The researchers recognised that genome databases contain vast amounts of information about the basic rules of protein structure and function, so they could use mathematical modelling and machine learning to reveal new information about these rules. They studied the chorismate mutase family of metabolic enzymes, a type of protein that is important for life in many bacteria, fungi, and plants. Using these machine learning methods, they were able to determine a basic formula for protein composition.

The model shows that just two ingredients, conservation at individual amino acid positions and correlations in the evolution of pairs of amino acids, are sufficient to predict new artificial sequences that would have the properties of the protein family.
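To make these two ingredients concrete, here is a minimal Python sketch that measures them on a toy set of aligned sequences. It is an illustration only and not the researchers' model: the sequences are invented, the function names are hypothetical, and mutual information is used here as a simple stand-in for "correlation between pairs of positions".

```python
import math
from collections import Counter
from itertools import combinations

# Toy alignment: invented sequences, NOT real chorismate mutase data.
# Each string is one sequence; each column is one aligned position.
ALIGNMENT = [
    "MKAL",
    "MRAL",
    "MKGV",
    "MRGV",
    "MKAL",
    "MRGV",
]


def column(alignment, i):
    """Return the residues found at position i across all sequences."""
    return [seq[i] for seq in alignment]


def entropy(symbols):
    """Shannon entropy (in bits) of a list of symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def conservation(alignment, i):
    """Score of 1.0 for a fully conserved position, lower for variable ones.

    Assumption: entropy is normalised by log2(20), i.e. 20 standard amino acids.
    """
    return 1.0 - entropy(column(alignment, i)) / math.log2(20)


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between positions i and j."""
    a, b = column(alignment, i), column(alignment, j)
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


length = len(ALIGNMENT[0])

# Ingredient 1: how conserved each single position is.
conservation_scores = [conservation(ALIGNMENT, i) for i in range(length)]

# Ingredient 2: how strongly each pair of positions varies together.
pair_scores = {
    (i, j): mutual_information(ALIGNMENT, i, j)
    for i, j in combinations(range(length), 2)
}

if __name__ == "__main__":
    for i, score in enumerate(conservation_scores):
        print(f"position {i}: conservation {score:.2f}")
    for pair, score in sorted(pair_scores.items(), key=lambda kv: -kv[1]):
        print(f"positions {pair}: mutual information {score:.2f} bits")
```

In this toy alignment the first position never changes, so it scores as fully conserved, while the last two positions always change together (A with L, G with V) and so form the most strongly coupled pair. A generative model of the kind described would need to go further and sample new sequences consistent with such statistics, which this sketch does not attempt.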

In the future, this knowledge could potentially be used to address societal problems such as climate change or cancer. If we can learn to control the molecules that make us whole, there may be few limits to human potential.