Curious about AI? This article delves into the history of natural-language processing, explains the science behind artificial brains and considers the implications of machine learning. By Tom Michaelis, an engineering student at Oxford.
In the last decade, an area of research called Deep Learning - a branch of artificial intelligence - began producing ground-breaking results in computers' ability to process human language, using a tool called a Deep Neural Network (DNN). This technology has recently reached the point where computer-generated language can be indistinguishable from our own. The applications are endless, and a new age of automatic translation, summarisation, writing and processing is upon us.
In 1943, British codebreakers built Colossus - one of the first machines able to perform calculations faster than a human - in order to crack Nazi ciphers. The machine relied on hard-coded logical rules, a technique that researchers in language processing had originally hoped would transfer to their field. To understand the difference, consider the following sentence:

‘White castles.’
Assuming a basic knowledge of chess, this sentence appears fairly straightforward: ‘White castles’ is a simple phrase made up of one noun and one verb. However, without understanding the context, ‘White castles’ could be erroneously read as an adjective and a plural noun. It is this fundamental requirement for the computer to understand the meaning of language in order to process it, rather than just follow simple rules, that stunted development in language processing for so long. To overcome this restriction, developers began taking advantage of Deep Neural Networks: but how do these allow a computer to ‘understand’ language? First, we must understand how a biological net (a brain) works.
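The ambiguity can be made concrete with a short sketch. The mini-lexicon below is entirely hypothetical - real taggers use far larger dictionaries and statistics - but it shows why a rule that only looks at individual words cannot choose between the two readings:

```python
# Minimal sketch: a hypothetical lexicon where each word admits
# several parts of speech, so word-by-word rules cannot disambiguate.
from itertools import product

LEXICON = {
    "white": ["adjective", "noun"],
    "castles": ["verb", "plural noun"],
}

def possible_parses(sentence):
    """Enumerate every combination of tags the words could take."""
    words = sentence.lower().split()
    tag_options = [LEXICON[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*tag_options)]

for parse in possible_parses("White castles"):
    print(parse)
```

Both the chess reading (noun + verb) and the adjective + plural-noun reading appear among the four combinations; only the surrounding context can pick between them.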
Neural nets are artificial, but directly inspired by the way a brain works (hence the name). All animal brains are made from millions of cells called neurons. Each neuron consists of a main body called a soma, branching extensions called dendrites that receive incoming signals, and a conductive fibre called an axon that carries signals onwards. Electrical signals travel along a neuron's axon until they reach a junction with the next neuron called a synapse. Whether the next neuron passes the signal on down its own axon depends on factors such as the strength of the signal, the strength of the connection, and whether it has also been stimulated by other neurons. As the number of neurons increases, this process facilitates the development of complex responses to electrical stimulus from sensory organs. In turn, this allows your brain to perform tasks such as walking, controlling your heart rate, recognising images, and understanding the words in front of you right now.
A DNN mimics this process on a computer chip by simulating layers of nodes, exchanging carbon-based cells for silicon. The electrical signals of the neuron become real numbers indicating signal strength, and dendrites become simple connections, each with a ‘connection strength’ attribute. To understand how this facilitates pattern recognition and other complex responses to input, let's consider a (very) simplified model. Say we created a DNN that aims to predict whether a sentence is talking about chess or bees using the last three nouns used (however, we will consider the case where only the nouns pawn, queen and honey are available). The architecture of our net may look something like this:
This net has already been trained on sample data, as can be seen from the varying connection strengths between nodes. The output layer has two nodes, each indicating the probability that chess or bees is being discussed. If we input the sentence “Queen takes the black pawn”, the nodes for queen and pawn gain a numeric value (shown by the darker colour), which is transmitted along the connections until the output is reached.
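This forward pass can be sketched in a few lines of code. The connection strengths below are invented purely for illustration - in a real net they would be learned from training data - but the mechanism is the same: active input nodes send their values along weighted connections, and the sums are converted into probabilities:

```python
# Toy sketch of the forward pass described above.
# The weights are invented for illustration, not learned from data.
import math

WORDS = ["pawn", "queen", "honey"]   # input nodes
CLASSES = ["chess", "bees"]          # output nodes

# Connection strengths from each input node to each output node.
# "queen" is ambiguous (chess piece or queen bee), so it pulls
# weakly towards both classes.
WEIGHTS = {
    "pawn":  {"chess": 2.0, "bees": -1.0},
    "queen": {"chess": 0.5, "bees": 0.5},
    "honey": {"chess": -1.0, "bees": 2.0},
}

def classify(sentence):
    """Activate input nodes for words present, then sum weighted signals."""
    active = [w for w in WORDS if w in sentence.lower()]
    scores = {c: sum(WEIGHTS[w][c] for w in active) for c in CLASSES}
    # Softmax turns the raw scores into probabilities that sum to 1.
    total = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / total for c, s in scores.items()}

print(classify("Queen takes the black pawn"))
```

Running this on “Queen takes the black pawn” activates the queen and pawn nodes, and the chess output receives the higher probability - exactly the behaviour the diagram describes.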
We can see that, even with a simple model, it quickly becomes hard to understand what is happening - hence DNNs are often described as black boxes, where the reasons for their outputs are hard to determine. This small network has correctly assigned a high probability to the sentence referring to chess. Whilst this is a fairly easy task, once the net is scaled up, a DNN can achieve human-like pattern recognition - which has a multitude of applications in the field of language processing.
While history would suggest we should be cautious about making grandiose statements about the future of natural language processing, on this occasion there is solid evidence that the revolution is underway. If you're sceptical, just say ‘Hey Siri’, or type anything into the Google search bar, and shortly a large neural net will be processing the language you just used. As computing power increases and more training data becomes available online, the number and effectiveness of NLP applications will only grow. One of the fields most likely to reap the rewards of language processing is research, an area that is currently pitifully inefficient: when have you ever received useful information from any results beyond Google’s first page (despite the hundreds of thousands of results generated for you)? Language processing is bound to improve the accuracy of search engines, but perhaps a more exciting development is automatic text summarisation. Gone will be the days of manually sifting through large texts for relevant information - instead, software will download, understand, condense and rewrite material for you at the touch of a button. And if you’re reading this in a few years’ time, maybe a computer has already reprocessed these words for you.