In 2014, the chatbot ‘Eugene Goostman’ was declared to have passed the Turing test at an event held at the University of Reading, by pretending to be a 13-year-old boy. The program fooled a third of the 30 judges involved, each of whom was given free rein to ask whatever they desired.
By this benchmark, the AI singularity may seem imminent. However, we are not quite there yet. Machines perform woefully at reasoning, causal inference and generalised thinking. AlphaGo may have beaten humanity’s best players at a rules-based board game, but it cannot adapt to other tasks in the real world, such as holding a conversation or tying shoelaces. The algorithm is specialised, or ‘weak’. What sci-fi writers warn of is ‘strong’ AI: AI that can adapt and possesses generalised intelligence, able to learn quickly and infer rules in novel environments.
Does the Turing test truly measure intelligence?
The problem is that the test is not a sufficient barometer of intelligence, or even a barometer at all, for two reasons. First, the test doesn’t specify the expertise of the judges. A well-versed AI researcher is likely to be better at deciphering whether the originator of speech is silicon or biological hardware, and able to trap an AI with the circular logic and appeals to empathy that AIs today still struggle to master. Second, the test only measures the strength of weak AI, not strong AI. Chatbots today handle customer complaints and bank enquiries with impressive efficiency and clarity, and often fool humans into believing they are speaking with another human. But they can’t do much else. They may even reach the level of conversing about everyday topics such as politics and social lives. That would be a remarkable achievement, yet the test would never ask the same AI to perform other tasks such as classifying images or playing video games. While machine learning has proved adept at each of these tasks individually, asking the same highly tuned algorithm to switch between them is still out of reach.
Let us clarify: machine learning is not AI. AI is a branch of computer science and philosophy focused on understanding how intelligence emerges from complex mechanisms. Machine learning, on the other hand, is where systems find their own correlations in data sets and improve their models without being explicitly programmed, or ‘hard-coded’. ‘Learning’ might be the wrong word, because the machine is not learning as a human does. It is slowly, generally incrementally, improving its ability to guess correctly. A machine learning program might need 100 images of dogs before it can correctly identify one; a human might need only one.
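The distinction is easy to see in code. Below is a minimal sketch, not any real system: instead of us hard-coding a rule for recognising a dog, a tiny perceptron nudges its own weights towards whatever correlations the labelled examples contain. The features, data and labels are invented purely for illustration.

```python
def train_perceptron(data, epochs=20, lr=0.1):
    """Incrementally adjust weights to fit labelled examples.

    No dog-recognition rule is written anywhere; the weights simply
    drift towards whatever separates the labels in the data.
    """
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in data:
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = label - pred  # small correction, not an explicit rule
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Invented toy features: (ear floppiness, bark loudness), label 1 = dog
data = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.1, 0.2), 0), ((0.2, 0.1), 0)]
w, b = train_perceptron(data)
classify = lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
```

Note the trade-off the article describes: this toy needs several labelled examples and repeated passes over them before the weights settle, whereas a person shown a single dog generalises immediately.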
What do we mean by 'intelligence'?
When we consider ‘intelligence’ in machines, we need first to define intelligence, but more importantly, to measure it. A recent paper by François Chollet, ‘On the Measure of Intelligence’, attempts a blueprint for doing exactly that. Working at Google, Chollet grew frustrated with the AI community’s focus on measuring the skill of its algorithms rather than the skill acquisition itself.
At first it may seem counter-intuitive to ignore the outright skill of the algorithms. A human who is highly capable at mathematics and can speak five languages is likely to be called intelligent. However, those skills are less a measure of her intelligence than of her ability. We can reasonably say she is intelligent because it is unimaginable that she was born fluent in algebra, Russian and Mandarin. By extension, it would be foolish to claim AlphaGo is intelligent solely on the evidence of its win over humanity’s best talent in the game of Go. We can make that claim only when we account for the method of its skill acquisition: its own experience.
AlphaGo is impressive to many leading researchers because it taught itself the skills it needed to compete in the game of Go. It was barely hard-coded: it wasn’t given rules and methods to follow, but learned through (a lot of) trial and error. This idea, called reinforcement learning, is to run many, many simulated games and allow the system to probe as many different strategies as it can, encouraging successful strategies and punishing poor ones. After millions of iterations, the algorithm becomes highly tuned towards playing a given game successfully. We may say it has ‘learnt’ the game, or ‘taught’ itself, but that would imply it actually understands the reasoning behind its actions, rather than simply recognising quantitative probabilities at different inputs and responding with the best-performing move.
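AlphaGo’s real training pipeline combines deep neural networks with Monte Carlo tree search, which is far beyond a short listing, but the encourage-and-punish loop itself can be sketched with tabular Q-learning on a toy ‘game’. Everything below, including the corridor game, the reward and the parameters, is invented for illustration.

```python
import random

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy game: walk right along a 5-cell corridor.

    The only reward sits at the far end, so moves that lead to success are
    reinforced, while fruitless wandering earns nothing and fades away.
    """
    random.seed(seed)
    n_states, actions = 5, (-1, +1)                 # step left or right
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            if random.random() < epsilon:
                a = random.choice(actions)          # occasionally explore
            else:
                a = max(actions, key=lambda m: q[(s, m)])  # exploit
            s2 = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0  # reward only at the goal
            best_next = 0.0 if s2 == n_states - 1 else max(
                q[(s2, m)] for m in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

q = train()
# After many iterations the greedy policy steps right from every state.
policy = {s: max((-1, +1), key=lambda m: q[(s, m)]) for s in range(4)}
```

Nothing in the table ‘understands’ corridors: each entry is just a running estimate of expected reward, exactly the kind of quantitative probability the paragraph above describes.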
In this example, the algorithm has successfully performed skill acquisition: before reinforcement learning it was useless, and afterwards it was highly capable, with all of its ability stemming from its own experience. We can call this intelligent behaviour. But we should recognise two flaws. First, the algorithm tuned to play Go must completely retrain itself in order to play chess. In doing so, it drops all of its tuning for Go, thus ‘forgetting’ how to play Go in order to play chess. A human wouldn’t need to do that. Second, the algorithm only learned after playing years’ worth of Go in simulations, requiring vast computing power. It needs many, many more games than a human does to learn to play chess or Go effectively. As such, it ‘learns’ far less efficiently than a human.
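The first flaw, retraining on a new task wiping out the old one, can be shown with a toy linear classifier. The tasks and data below are invented for illustration, and real systems forget for subtler reasons, but the effect is the same: one set of weights, trained sequentially, ends up fitting only the most recent task.

```python
def fit(w, data, epochs=50, lr=0.1):
    """Nudge a shared weight vector towards the given labelled data."""
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if w[0] * x + w[1] > 0 else 0
            err = y - pred
            w[0] += lr * err * x
            w[1] += lr * err
    return w

task_a = [(-1.0, 0), (1.0, 1)]   # toy task A: positive inputs are class 1
task_b = [(-1.0, 1), (1.0, 0)]   # toy task B: the opposite rule

w = fit([0.0, 0.0], task_a)
acc_a_before = sum((1 if w[0] * x + w[1] > 0 else 0) == y for x, y in task_a)

w = fit(w, task_b)               # same weights, retrained on the new task
acc_a_after = sum((1 if w[0] * x + w[1] > 0 else 0) == y for x, y in task_a)
# Task A accuracy collapses once the weights have been retuned for task B.
```

A human who learns chess after Go retains both; the shared weights here cannot, because the same parameters must encode whichever task was trained last.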
Today, the lack of transferability of skill-acquiring algorithms presents a problem for scientists looking to craft an intelligent system, but it also lets us recognise what intelligence is not: it is not playing a video game better than a human. One skill is trumped by many skills, and having a skill is trumped by the ability to learn another. If AI is ever to be considered intelligent in the human sense, it has a long way to go.