The case for taking this seriously
Artificial intelligence is advancing faster than almost anyone predicted. Systems that were research curiosities a few years ago can now write code, conduct research, and reason through complex problems. The gap between today's AI and a system that could outperform humans at most cognitive tasks may be measured in years, not decades.
This is not science fiction. The leading AI labs, including OpenAI, Anthropic, and Google DeepMind, are explicitly working toward artificial general intelligence, and they have the funding, talent, and compute to make rapid progress. The question is not whether highly capable AI is coming, but whether we'll know how to make it safe when it arrives.
Existential risk from AI, the possibility that advanced AI systems could cause civilisation-scale catastrophe, is taken seriously by a growing number of researchers, policymakers, and technologists. This is not because anyone thinks current systems are dangerous in that way, but because the trajectory of capabilities is steep and our understanding of how to align powerful systems with human values is still in its early stages.
The core difficulty is this: we don't yet know how to reliably specify what we want a highly capable system to do, verify that it's doing it, or correct it if it's not. These are hard technical and governance problems, and they need to be solved before, not after, we build systems where the stakes are highest.
Watch
Video coming soon
A short introduction to AI safety: why it matters, what the risks are, and why the field needs more people working on it.
A framework for thinking about risks
There are several ways to categorise AI risks, but we like to break them down into three categories.
Category 1
Robustness
AI systems can fail in unexpected ways, producing confident but wrong answers, behaving differently in deployment than in testing, or breaking down when faced with situations outside their training distribution. Robustness research asks: how do we build systems that work reliably, even in novel situations?
This is the most "traditional" engineering framing of AI safety. It includes work on adversarial robustness, out-of-distribution detection, uncertainty quantification, and building systems that know what they don't know. As AI is deployed in high-stakes domains like medicine, law, and infrastructure, failures of robustness become failures with real consequences.
Research areas
- Adversarial robustness
- Hallucination reduction
- Out-of-distribution detection
- Uncertainty quantification
- Red-teaming
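As a concrete, if toy, illustration of one idea from the list above: a common baseline for uncertainty quantification and out-of-distribution detection is to look at a classifier's maximum softmax probability and abstain, or escalate to a human, when confidence is low. The sketch below is illustrative only; the logits, threshold, and three-class setup are invented for the example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def flag_uncertain(logits, threshold=0.7):
    """Flag inputs whose top-class confidence falls below the threshold,
    i.e. candidates to abstain on or route to human review."""
    confidence = softmax(logits).max(axis=-1)
    return confidence < threshold

# Hypothetical logits from a 3-class classifier: the first input is classified
# confidently, the second is ambiguous and may be out-of-distribution.
logits = np.array([[6.0, 0.5, -1.0],
                   [1.1, 0.9, 1.0]])
print(flag_uncertain(logits))  # -> [False  True]
```

Low confidence is a weak signal on its own (models can be confidently wrong, which is part of the problem), but abstention thresholds like this are a common first line of defence when models are deployed in high-stakes settings.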
Category 2
Misuse
Powerful AI systems can be deliberately used for harm. This includes using AI to generate sophisticated disinformation, develop biological or chemical weapons, conduct large-scale cyberattacks, or enable mass surveillance. Misuse research asks: how do we prevent AI capabilities from being weaponised?
Unlike robustness failures (where the system breaks), misuse involves the system working exactly as directed, but toward harmful goals. This makes it a sociotechnical problem that requires both technical safeguards (like output filtering and access controls) and governance solutions (like export controls, licensing, and international agreements).
Research areas
- Biosecurity screening
- Deepfake detection
- Access controls & licensing
- Cyber offense evaluation
- Content provenance
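To make "technical safeguards" slightly more concrete, here is a deliberately simplistic sketch of output filtering: screening a model's generated text against a denylist before it is returned. The patterns, function names, and pipeline are invented for illustration; production safeguards rely on trained classifiers, usage policies, and layered review rather than keyword matching.

```python
import re

# Hypothetical denylist of topic patterns a deployer refuses to return output on.
DENYLIST_PATTERNS = [
    r"synthesi[sz]e.*nerve agent",
    r"step[- ]by[- ]step.*exploit",
]

def passes_output_filter(generated_text: str) -> bool:
    """Return False if the generated text matches any denylisted pattern."""
    lowered = generated_text.lower()
    return not any(re.search(pattern, lowered) for pattern in DENYLIST_PATTERNS)

def respond(generated_text: str) -> str:
    """Wrap a (hypothetical) model response in the filter before release."""
    if passes_output_filter(generated_text):
        return generated_text
    return "[response withheld: matched a misuse filter]"

print(respond("Here is a summary of the history of chemical weapons treaties."))
```

Filters like this are easy to evade, which is exactly why misuse is framed above as a sociotechnical problem: technical measures need to be paired with access controls, monitoring, and governance.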
Category 3
Loss of Control
The most existential concern: what happens when AI systems become capable enough to pursue goals that diverge from human intentions, and we can't course-correct? Loss of control research asks: how do we ensure that advanced AI systems remain aligned with human values and under meaningful human oversight?
This is the hardest category and the one most specific to AI safety as a field. It encompasses the alignment problem (making systems that actually want what we want), interpretability (understanding what's happening inside a model), and corrigibility (building systems that allow themselves to be corrected). As systems become more capable and autonomous, maintaining meaningful control becomes both more important and more difficult.
Research areas
- Alignment
- Interpretability
- Scalable oversight
- Corrigibility
- Goal misgeneralisation
- Deceptive alignment
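Goal misgeneralisation, one of the research areas listed above, can be shown in miniature. In the hypothetical gridworld below, the reward-giving "coin" always sits at the rightmost cell during training, so a tabular Q-learning agent learns "walk right", which is indistinguishable from "reach the coin" on the training distribution; when the coin moves at test time, the learned behaviour keeps tracking the proxy. The environment, hyperparameters, and code are invented for this sketch, in the spirit of published examples such as the CoinRun experiments rather than reproducing them.

```python
import numpy as np

N_CELLS = 10
LEFT, RIGHT = 0, 1

def run_episode(q, coin_pos, epsilon=0.0, learn=False, alpha=0.1, gamma=0.95):
    """Run one episode in a 1-D gridworld; optionally update Q-values (tabular Q-learning)."""
    state = N_CELLS // 2                       # always start in the middle
    for _ in range(50):
        if np.random.rand() < epsilon:
            action = np.random.randint(2)      # explore
        else:
            action = int(np.argmax(q[state]))  # act greedily
        step = 1 if action == RIGHT else -1
        next_state = min(max(state + step, 0), N_CELLS - 1)
        reward = 1.0 if next_state == coin_pos else 0.0
        if learn:
            target = reward + gamma * np.max(q[next_state])
            q[state, action] += alpha * (target - q[state, action])
        state = next_state
        if reward > 0:
            return True                        # reached the coin
    return False

q_values = np.zeros((N_CELLS, 2))

# Training: the coin is always at the rightmost cell, so "go right" and
# "go to the coin" are indistinguishable objectives. The behaviour policy is
# fully random (epsilon=1.0); Q-learning is off-policy, so the greedy policy
# is still learned.
for _ in range(2000):
    run_episode(q_values, coin_pos=N_CELLS - 1, epsilon=1.0, learn=True)

# Test: greedy policy, no learning. When the coin moves, the agent still walks right.
print("coin on the right (as in training):", run_episode(q_values, coin_pos=N_CELLS - 1))
print("coin moved to the left (novel):    ", run_episode(q_values, coin_pos=0))
```

The point is not that gridworlds are dangerous; it is that "did well in training" does not pin down which goal a system actually learned, and the gap only shows up once deployment differs from training.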
Further reading
Some of the best starting points for understanding the full picture.
Wait But Why
The AI Revolution: The Road to Superintelligence
Tim Urban's iconic two-part series that introduced millions of people to the concept of superintelligence. Written in 2015, its core arguments about the pace of progress and the difficulty of alignment have only become more relevant. Long, accessible, and still one of the best introductions to the big picture.
AI 2027
AI 2027: A Scenario for the Next Two Years
A detailed, concrete scenario for how AI development could unfold through 2027, written by researchers with deep knowledge of the current trajectory. Explores how rapid capability gains could create alignment challenges faster than the field can solve them, and what that means for the world.
A Narrow Path
A Narrow Path: How to Secure Our Future
An interactive exploration of the challenges ahead and the narrow path humanity must walk to navigate advanced AI safely. A compelling visual guide to why the margin for error is small, and why the work matters.