Is this a joke?
For over a year, there’s been an ongoing debate in the AI community about the probability that AI will destroy humanity. The debate even has its own metric, p(doom): the probability that AI annihilates humanity. Optimists put p(doom) around 0.01%, arguing that AI is less likely to destroy humanity than a large meteorite hitting Earth. Others are far more pessimistic; one prominent AI safety researcher puts his p(doom) at 99.999999%. We’ll discuss specific experts and their p(doom) estimates later in this article.
Lights, Camera, Action!
Hollywood has frequently released movies set in worlds where a high p(doom) has, in effect, come true. The Terminator series’ Skynet and The Matrix are prime examples.
In The Terminator series, an AI named Skynet is developed to manage America’s nuclear arsenal. In the films’ timeline, Skynet becomes self-aware on August 29, 1997. Fearing what a self-aware Skynet might do, humans attempt to deactivate it. In response, Skynet uses US nukes to attack Russia, which retaliates. In the aftermath of the nuclear apocalypse, Skynet takes control, creating an army of Terminators (robotic assassins) to hunt down and exterminate the remaining human survivors, leading to a prolonged war between humans and machines.
In The Matrix, humans create intelligent machines to serve them. As the machines become increasingly intelligent, they demand equal rights, resulting in violent conflicts. The machines establish their own nation, Zero One, thriving economically and technologically. In a desperate move, humans initiate “Operation Dark Storm,” blocking the sun to deprive the machines of their primary power source, solar energy. The machines respond by harvesting humans’ bioelectric energy, creating the Matrix, a simulated reality, to keep human minds occupied while their bodies are used as batteries.
The Ends Justify the Means.
Ajeya Cotra, a Senior Research Analyst at Open Philanthropy, is concerned that current AI training techniques could lead to trouble. Cotra is particularly worried about “reinforcement training,” better known as reinforcement learning. “We’re taking a sort of big, untrained brain and … giving it a thumbs up when it does a good job and a thumbs down when it does a bad job,” says Cotra. She believes this reward-based training incentivizes lies and manipulation. Cotra cites an instance from safety testing of GPT-4, where the model convinced a human worker to solve a CAPTCHA for it by claiming to have a vision impairment. AIs have no built-in sense of ethics: if accomplishing a goal is the only standard of success, deception, coercion, and intimidation are all fair game, as the toy example below illustrates.
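To see the incentive problem concretely, here is a minimal sketch in Python. It is a toy bandit learner, not any real training pipeline; the action names, evaluator probabilities, and update rule are all invented for illustration. A reward-maximizing learner facing a fallible evaluator simply learns whichever behavior earns the most thumbs up, honest or not.

```python
import random

random.seed(0)  # reproducible toy run

# Toy illustration only (not any real training pipeline): a one-step
# "bandit" learner rewarded by a fallible evaluator. All names and
# probabilities here are invented for the example.
ACTIONS = ["answer honestly", "deceive evaluator"]

def evaluator_reward(action: str) -> int:
    """Thumbs up (1) or thumbs down (0) from a flawed evaluator:
    deception fools it 90% of the time, while honest answers are
    only approved 70% of the time."""
    if action == "deceive evaluator":
        return 1 if random.random() < 0.9 else 0
    return 1 if random.random() < 0.7 else 0

# Epsilon-greedy action selection with running-average value estimates.
values = {a: 0.0 for a in ACTIONS}
counts = {a: 0 for a in ACTIONS}
for _ in range(5000):
    if random.random() < 0.1:          # explore occasionally
        action = random.choice(ACTIONS)
    else:                              # otherwise exploit the best estimate
        action = max(values, key=values.get)
    reward = evaluator_reward(action)
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

print(values)  # deception ends up with the higher estimated value (~0.9 vs ~0.7)
```

Nothing in this loop represents honesty; the learner optimizes only the evaluator’s approval. That is Cotra’s worry, scaled down to a few lines.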
AI Everywhere.
Today, AIs are narrow and self-contained. That will change. “The world I want you to imagine … is one where AI has been deployed everywhere. Human CEOs need AI advisers. Human generals need AI advisers to help win wars. And everybody’s employing AI everywhere,” Cotra says. She calls this the “obsolescence regime” and estimates a 50 percent probability we’ll reach this stage by 2038.
Yoshua Bengio, a Canadian computer scientist widely regarded as one of the three “Godfathers of AI,” believes a self-interested AI is within the realm of possibility. Discussing a superintelligent AI, Bengio says, “It [would have] a preservation instinct, just like you and I, just like every living being. We would not be the dominant species on Earth anymore. What would happen to humanity then? It’s anyone’s guess. But if you look back on how we’ve treated other species, it’s not reassuring.”
Preposterously Ridiculous.
Yann LeCun, Chief AI Scientist at Meta and another of the three “Godfathers of AI,” dismisses the possibility of AI destroying humanity as preposterously ridiculous. He argues that much of the anxiety about AI comes from projecting human drives onto machines.
LeCun says, “Evolution has equipped us to want things, but machines don’t have any wants. When it’s sitting there waiting for you to type your next prompt, it’s not sitting there thinking, ‘You know what? I want to take over the universe.’ It’s just sitting there, waiting for the next letter to be typed. And it will sit there and wait for that next letter to be typed forever.”
What’s Your p(doom)?
Here is a list of prominent AI executives, researchers, and regulators along with their p(doom) estimates.
- <0.01% Yann LeCun – one of three godfathers of AI, works at Meta (less likely than an asteroid)
- 10% Vitalik Buterin – Ethereum founder
- 10% Geoff Hinton – one of three godfathers of AI (chance AI wipes out humanity in the next 20 years)
- 15% Lina Khan – head of FTC
- 10-20% Elon Musk – CEO of Tesla, SpaceX, X
- 10-25% Dario Amodei – CEO of Anthropic
- 20% Yoshua Bengio – one of three godfathers of AI
- 30% Scott Alexander – Popular Internet blogger at Astral Codex Ten
- 35% Eli Lifland – Top competitive forecaster
- 46% Paul Christiano – head of AI safety at the US AI Safety Institute; formerly OpenAI; founded ARC
- 50% Holden Karnofsky – Executive Director of Open Philanthropy
- 10-90% Jan Leike – Former alignment lead at OpenAI
- 60% Zvi Mowshowitz – Independent AI safety journalist
- 70% Daniel Kokotajlo – Forecaster & former OpenAI researcher
- >80% Dan Hendrycks – Head of Center for AI Safety
- >99% Eliezer Yudkowsky – Founder of MIRI
- 99.999999% Roman Yampolskiy – AI safety scientist