In Harlan Ellison's short story 'I Have No Mouth, and I Must Scream', an 'Allied Mastercomputer' - AM - gains sentience and exterminates the human race save for five people. The AI has developed a deep malice for its creators, and these survivors are kept alive to be punished with torture through endless inescapable simulations.

The story was first published in 1967. The existential arguments against playing God have long been of interest in fiction, religion, and philosophy. Now, however, organisations with little limit to their capital or ambition are racing to create artificial general intelligence, underpinned by the belief that the 'singularity' - the point at which an AI becomes smarter than a human - will bring with it benefits for all of humankind.

Image credit: Flickr Creative Commons/O'Reilly Internal

More than two decades ago, Stuart Russell co-authored 'Artificial Intelligence: A Modern Approach', which quickly became a staple textbook for students of AI. Russell is now professor of computer science at the University of California, Berkeley, and is warning that civilisation needs to take urgent steps to avoid sleepwalking into a potentially world-ending catastrophe.

"I wrote my first AI test program when I was in school," says Russell, speaking with Techworld at the IP Expo conference in London's Docklands in late September. "It's always seemed to me that AI is an incredibly important problem and if we solve it it's going to have a huge impact."

A question no one seemed to be answering is: what if we succeed? Russell has publicly wondered about this since the first edition of his textbook was published in 1995. "Since we're all trying to get there, we should ask what happens when we get there," he says.

Businesses and academics are busily working away on the development of artificial intelligence, and while a better-than-human general AI might be decades away or more, such a system could very well take us by surprise.

What do we want?

"Things that are more intelligent than you can get what they want. It doesn't really matter what you want. Just like the gorillas don't get what they want any more: the humans get what they want. And so, how do we prevent that problem?"

One way to address this is simply to pull the plug on the machine - but if a system is smarter than you, it will probably have considered that already.

"I think the way we prevent that problem is by designing machines in such a way that constitutionally the only thing they want is what we want," Russell says. "Now the difficulty with that is we don't know how to say what we want. We're not even sure that we know what we want in a way we can express - so we can't just put it into the machine.

"That means the machines are going to be designed to want only want we want. But they don't know what it is."

Russell, then, is "exploring the technical consequences of that way of thinking". As you might imagine, that raises more and more questions, both mathematical and existential in nature.

"Really what it means is the machines have to learn from all possible sources of information what it is that humans really want," he says. "What would make them unhappy? What would constitute a catastrophic outcome? We can imagine outcomes that we would say: yes, that's definitely catastrophic.

"I definitely don't want all humans to be subject to guinea pig experiments on cancer drugs.

"I definitely don't want all of the oxygen in the atmosphere to be eliminated and everyone asphyxiated."

Others would be harder to anticipate: for example, a "gradual enfeeblement" where Wall-E-like machines keep us "fat, stupid, lazy, obese, and useless".

"That'd be something where now we could say, we definitely don't want that," he says. "But we could go down that slippery slope where we thought that was great.

"So it's a very complicated set of questions and it really does involve these philosophical issues. What are human preferences? Do you mean what you believe you will prefer in the future, or do you mean what you actually prefer at the time?"

The Midas Problem

In Greek mythology, when King Midas first gained the power to turn any object he touched to gold he was in a state of greedy euphoria, but he quickly came to curse it.

This story of the 'Midas problem' has endured in some form or another for centuries with good reason - in short, be careful what you wish for.

The crucial point, Russell says, is that humans should not try to define the machine's objective explicitly; instead, the machine should infer it from the choices we make.

"We absolutely want to avoid trying to write down human preferences because those are the problem," Russell says. "If we get it wrong, which we invariably will, we have a single-minded machine that's pursuing an objective that we gave it and it's going to pursue it until it achieves it."

"If it's got the wrong objective, we get what we said we wanted but we'll be extremely unhappy about it," he says. "A machine that learns about human preferences by behaviour has an incentive to ask questions.

"Is it okay if we run some experiments on cancer drugs on all these people? No. That's not okay. You would have to volunteer and you would probably have to pay them, and we'd do it the way we normally do it.

"When a machine believes that it knows what the objective is, it has no incentive to ask questions, to say: is it okay if I do this or do you want it this way or this way? It has no reason to do that. This is a whole new area of research.

"I'm reasonably optimistic we can make this work, and make systems we can prove mathematically will leave us happier that we built them that way."

How do you like your limbs?

To achieve this it will be necessary to retreat from the broader philosophical questions and think more simply.

Russell asks: "When I say human values I mean something very simple: which life would you like, a life with or without your left leg?"

The answer seems obvious but machines don't know much about us: they don't know that we appreciate our limbs, they don't know we'd rather not go hungry, and they don't know that we tend, generally speaking, to like being alive.

"What we are trying to avoid is that machines behave in such a way it violates these basic human preferences," says Russell. "We are not trying to build an ideal value system. There is no ideal value system. Your preferences are different from mine, our preferences are way different from someone who grew up in a different culture altogether.

A short animation from the 1960s demonstrates how well humans infer - they understand the emotional characteristics of the shapes in a way that a machine would not.

"What a machine's job is, is simply to predict the preferences of each person and then try to realise those to the extent possible."

Resolving Conflict

That's not to say the tricky task of navigating morals doesn't enter the equation. For example, how do conflicts between two or more people get resolved?

"We can't all be king of the world," Russell says. "There's a simple approach, which is treat everyone's preferences equally.

"But what does everyone mean? Does that include people who aren't born yet? How do we weight all of their preferences - people born two centuries from now? Do we give them equal weight? There might be ten times as many of them to make our lives constrained if we have to worry ten times as much about future generations.

"If we succeed in colonising the universe there may be a billion times more people.

"Those are interesting questions: the difference between what you prefer now, what your life will be in the future, and what you prefer at the end of your life that your life had been like - those two selves might disagree with each other. How do you resolve that conflict?"

In any case, the future and the present self are likely to agree that they were glad to be in possession of their left leg, and they would probably also concur that it was better to be alive than dead.

"Machines don't understand and they need to know about how we prefer our lives to be," Russell says. "So that's the primary task."

The nuclear option

Russell remarks that in 1934, when the Hungarian physicist Leo Szilard produced his first design for a nuclear reactor, he was concerned about a runaway reaction and included a fail-safe system.

The relationship between the public and the nuclear industry later became adversarial.

Hypothetical risks like the China syndrome - an unstoppable nuclear meltdown that bores its way through the planet - became pressing concerns in the public consciousness. The nuclear industry generally responded that there wasn't much to worry about.

"They neglected to really build enough safety into their systems, and then we had Chernobyl, which really was the worst kind of scenario the nuclear-safety worriers were worried about, and that was the end of the nuclear industry," Russell says. "They killed themselves by denying risk.

"I'm oversimplifying but I think the lesson is to own the risk and people are more likely to trust you.

"The public at large - governments - will trust the AI community more if it owns the risk and says yes, we are taking this seriously, here's how we're thinking about it and here's how we're trying to solve it. Here's our education and training programme.

"I'm not just talking about the existential risk. There's a lot of near-term risks. The self-driving cars, the system that has your credit card, all kinds of things. We need to take it seriously in the way the medical community takes its code of conduct very seriously. Civil engineering - people who build bridges and high-rise buildings - take safety extremely seriously."

For these industries there are standards and third-party inspectors and a whole raft of apparatuses developed over centuries to safeguard the public.

"We don't have any of that in AI. In fact, in computer science in general, it's a bunch of people with a Bachelor's degree and too much coffee writing millions of lines of crappy code and hoping for the best and that's how it works."

There does exist a written agreement between business and academia for sharing and solving problems of safety and control, and even sharing the benefits of proprietary designs where a system becomes comparable to human intelligence.

"It's quite an unusual and forward-thinking agreement," Russell says. "We'll see how it works out in practice."

On the other hand, should people trust that the academic and corporate research communities will "do the right thing"?

"I think no, they shouldn't," Russell says. "I think the research community is still struggling with some of these questions...It's understandable that when people come along and say this thing you've been working on all your life - it might destroy the human race - you're going to say no, of course not, you can just turn it off, or something that pops into their head as a reason to push you away and say that can't be right.

"It's sort of amazing - I keep a list, of up to about 28 completely fatuous arguments as to why we shouldn't pay any attention to this question, that have been stated in writing or public forums."

Some of the stars of Silicon Valley talk about the singularity as the single goal humanity should race towards, which, when solved, will begin to patch up all of the world's problems of anguish, inequality and want.

"I would say by and large, tech companies engage in what you might call 'motivated cognition'," Russell says. "That's a term where you say, and maybe start to believe, what you wish were true, because it would be compatible with your truth. So they will deny there are any downsides to this, and that's part of what was going on with the Zuckerberg-Musk discussion in the press."

But what if humanity does manage to solve this complex problem? Despite the substantial risks, Russell is a believer.

"I do believe that if we achieve safe controlled human-level or superhuman AI it will eliminate a lot of the causes of misery in the world," he says. "There's no point in being greedy and trying to exploit other people if you can have everything you want anyway.

"There's no point going to war over resources if resources are not limited.

"Maybe there are people who want to use AI to grab control of the world, or their piece of the world, and they don't care about making sure that it's safe and they will end up destroying themselves and everybody else.

"I don't know how to solve that problem.

"But I know that we can't solve it unless we have safe forms of the technology."

In a sense there will be a race between progress and safety. A few things work in safety's favour - in particular, that AI capabilities will advance to the point of understanding language. Systems could then learn human preferences by reading "everything we have ever written", the record of those preferences laid down over the history of the human race.

"It's really hard to stop progress on making AI systems more capable because if we have something comparable to human intelligence or better the economic value would be astronomical - comparable or maybe more to the GDP of the world. And so the benefits are potentially enormous.

"There's so much momentum it's very hard to even figure out how to curtail progress from those directions.

"It seems to me we have no choice but to simultaneously work on the control problem - how do you make sure that the technology is safe for everyone?"