The Mathematics Behind AI
Only a small fraction of the population really understands what it means to create an artificially intelligent machine. Most of society sees only the applications of the concept, whether a personal stylist on a clothing website or a virtual doctor; many people only ever witness an abstract view of what artificial intelligence is, which often leads to exaggerated fears that it is taking over the world and stealing our jobs. It may come as a surprise, then, that most artificial intelligence merely attempts to mimic human intelligence through mathematical models and statistics.
In this article I am going to focus specifically on neural networks, the systems that allow sites such as Google to show you targeted advertisements and that underpin computer vision and speech recognition. As defined by Wikipedia, “neural networks are non-linear statistical data modeling or decision making tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data”. Artificial neurons, the basic units of an artificial neural network (ANN), behave more or less like biological neurons: an artificial neuron responds to the signals it receives and sends out its own signal to other neurons. At a simple level, these neurons can take inputs and compute logical functions such as OR, AND and XOR to produce the output we would expect. But how are these neurons trained?
Essentially, what we are trying to do is classify points that have one outcome and separate them from points with a different outcome. If we imagine these points on a graph, the neuron has to be a function that separates the outputs we consider successful from those we do not. This may be difficult to picture, so let's consider an example. Say we are a sportswear retailer wanting to know how likely a customer is to buy sports products based on two variables: how much they play sport and how much they earn. The customers who do buy are coloured green and the customers who do not are coloured pink. The neuron can then be trained on this data: by adjusting its function it learns to separate the green and pink points, so that it can accurately predict whether a customer is likely to purchase a product from the retailer. Each neuron has a bias (the y-axis intercept) and a weight (the gradient); the weight affects the magnitude of each input and the bias shifts the output of the neuron.
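As a sketch of how such a neuron might be trained, here is a minimal single-neuron classifier (a perceptron) in Python. The retailer scenario, the customer numbers and the labels are all made up for illustration; the point is how the weights and bias are nudged until the line separates the two groups:

```python
# A single artificial neuron: fire (output 1) if the weighted sum of
# the inputs plus the bias is positive, otherwise stay off (output 0).
# The weights set the gradient of the decision line; the bias shifts it.

def predict(weights, bias, point):
    total = sum(w * x for w, x in zip(weights, point))
    return 1 if total + bias > 0 else 0

def train(points, labels, lr=0.1, epochs=500):
    """Perceptron learning rule: for each misclassified point, nudge
    the weights and bias toward that point's correct side of the line."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for point, label in zip(points, labels):
            error = label - predict(weights, bias, point)
            weights = [w + lr * error * x for w, x in zip(weights, point)]
            bias += lr * error
    return weights, bias

# Hypothetical customers: (hours of sport per week, income in £10,000s);
# label 1 = bought sportswear ("green"), 0 = did not ("pink").
points = [(8, 4.0), (6, 5.5), (7, 3.0), (1, 2.0), (2, 2.5), (1, 4.5)]
labels = [1, 1, 1, 0, 0, 0]
w, b = train(points, labels)
```

Because these made-up points are linearly separable, the perceptron learning rule is guaranteed to end up with a line that classifies every training point correctly.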
[Figure: the data is yet to be classified accurately]
However, not all data is as easily classifiable as above. What about data points arranged so that no single straight line can separate them?
It is impossible to use one straight line to separate the two classes, which means we may have to use several straight lines. In many neural networks, the inputs do not connect directly to the output neurons. Instead, a group of neurons lives between the inputs and the outputs, connected to both. These hidden-layer neurons allow multiple decision boundaries to be placed on the graph: we set the weights and biases of each hidden-layer neuron so that it blocks off a region of the input space belonging to a single output class, and then set the weights and bias of the output neuron to perform a logical operation on the activity of the hidden-layer neurons, predicting which type of output a given input will produce. Essentially, the hidden layer increases the capacity of a neural network to solve problems.
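The classic case is XOR: its outputs cannot be separated by one straight line. A small hand-wired sketch shows the idea, with the weights and biases chosen by hand rather than learned; each hidden neuron draws one line, and the output neuron performs a logical operation on their activity:

```python
# Step activation: the neuron fires (1) only if its total input is positive.
def step(x):
    return 1 if x > 0 else 0

def neuron(inputs, weights, bias):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_network(x1, x2):
    # Hidden neuron 1: fires when at least one input is on (x1 + x2 > 0.5).
    h1 = neuron([x1, x2], [1, 1], -0.5)
    # Hidden neuron 2: fires only when both inputs are on (x1 + x2 > 1.5).
    h2 = neuron([x1, x2], [1, 1], -1.5)
    # Output: fires when h1 is on AND h2 is off — the region between
    # the two lines the hidden neurons have drawn.
    return neuron([h1, h2], [1, -1], -0.5)
```

Each hidden neuron contributes one straight decision boundary, and the output neuron combines them, which is exactly why the hidden layer lets the network separate data no single line could.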
Many relationships in data are not described accurately by a straight line. Sometimes we can classify outputs by taking a mathematical model and adjusting its shape until it describes a dataset as accurately as possible; this is called curve fitting. Data scientists use different techniques to decide which model best fits the data, and once a model is applied, it essentially becomes a machine that makes predictions. By adding more neurons we can make ever more complicated curves that fit the data. When training neurons during curve fitting, we measure how well a curve fits the set of data points by calculating the average distance between the points and the curve, and we minimize this value by adjusting the weights and biases of the neurons.
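This "adjust until the average distance shrinks" idea can be sketched for the simplest possible model, a line with one weight and one bias, using gradient descent on the mean squared vertical distance. The data here is hypothetical, generated from y = 2x + 1:

```python
# Fit a line y = w*x + b to data by repeatedly nudging w and b
# downhill on the mean squared distance between points and curve.
def fit_line(xs, ys, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data lying on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = fit_line(xs, ys)
```

A full neural network does the same thing at scale: many weights and biases, all adjusted a little at a time to reduce the measured error between the curve and the data.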
A problem arises because data scientists usually do not know beforehand the mathematical form of the function that represents the data. As a result we have to approximate complicated functions by other means, such as plotting a series of rectangles to match a curve, much like the trapezium rule: the more rectangles we add, the closer they come to matching the function, and any continuous function can be approximated by a series of rectangles to any precision. If we can figure out how to make an ANN that outputs a series of arbitrary rectangles, then we have an argument that ANNs can encode any continuous function as precisely as we want. Since hidden-layer neurons can work together to produce rectangles, ANNs can approximate any continuous function to any precision, provided there are enough neurons in the hidden layer. This result has a name: the universal approximation theorem.
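The rectangle argument can be made concrete. One common construction, sketched below under the assumption of sigmoid activations, subtracts two steep sigmoids to make one "rectangle" (a bump that switches on at one point and off at another); summing many such bumps, each a pair of hidden neurons, approximates a curve such as sin(x):

```python
import math

# Numerically stable sigmoid, the classic smooth step activation.
def sigmoid(z):
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    return math.exp(z) / (1 + math.exp(z))

def bump(x, a, b, height, steepness=200):
    """One 'rectangle' built from a pair of hidden neurons: a steep
    sigmoid switching on at x = a, minus one switching on at x = b."""
    return height * (sigmoid(steepness * (x - a)) - sigmoid(steepness * (x - b)))

def approx_sin(x, n_rects=100):
    """Approximate sin(x) on [0, pi] by summing n_rects rectangles,
    each with height equal to sin at the rectangle's midpoint."""
    width = math.pi / n_rects
    total = 0.0
    for i in range(n_rects):
        left = i * width
        total += bump(x, left, left + width, math.sin(left + width / 2))
    return total
```

With 100 rectangles (200 hidden neurons) the approximation is already within a few percent of the true sine curve, and adding more rectangles tightens it further, which is the universal approximation theorem in miniature.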
The theorem demonstrates the predictive power of these networks, which produce models with a surprising degree of accuracy, all due to the mathematical principles that have been applied. Yet there is a drawback: neural networks are great at providing predicted values and outputs, as long as an analytical form is not required, because even though a neural network may be able to model the sine curve that fits your data, it does not truly understand that it is performing that exact action. This idea is key to understanding where we are on the timeline of artificial intelligence: although these systems may have amazing predictive power, they are limited in recognising what their actions are and the effects they have. For now, these systems have significant mathematical power, but in terms of emotional and intuitive intelligence, there is still a long way to go.