Can AI Forecast a Nuclear War Before It Breaks Out?

May 6, 2026

How accurately can AI anticipate what lies ahead? In pursuit of an answer, a number of AI researchers are turning to earlier periods for perspective. Talkie, unveiled recently, stands as both a newcomer and an elder within the AI lineup. It was trained exclusively on material published before December 31, 1930, absorbing a spectrum of sources: dusty tomes, scientific journals, newspapers, poetry compilations, etiquette handbooks, and encyclopedias.

Cut off from the internet, the “vintage” language model carries an unmistakable old-money charm—a refreshing contrast to ChatGPT’s pitch of delivering “a new way of working that helps people operate at a fundamentally different speed.” When I inquired about advice for young people, Talkie suggested measures such as living temperately and with integrity while avoiding idleness and dissipating habits that might undermine future independence.

Yet Talkie’s creators see potential in the AI beyond mere nostalgia. Given the limitations of current large language models, they believe the tool could help policymakers evaluate how well contemporary models can forecast (that is, render explicit predictions about the future rather than broad statements) across long horizons. And this pursuit of validating AI’s predictive modeling is growing more urgent as advocates press the U.S. government to embed the technology into risk assessments and decision-making.

For officials, forecasting is a crucial step in shaping long-term policy choices. For instance, the National Intelligence Council has delivered its Global Trends report to every incoming president since 1997, outlining what the world might look like 15 to 20 years down the line. Yet AI proponents argue the technology can offer policymakers more than wide-ranging scenario planning: it could provide explicit probabilities tied to specific decisions, updated in near real-time as new information comes in.

The push for the government to adopt AI-driven analyses has grown stronger as the models have become more accurate and more cost-effective. “Over the past year, AI has moved from being barely helpful to now delivering predictions that exceed those of the majority of humans,” Peter Wildeford, a leading forecaster and head of policy at the AI Policy Network, told The Dispatch. He noted that in his own forecasting he uses AI to conduct research, then applies his own critical judgment to verify and adjust the AI’s conclusions.

The team at Metaculus, a forecasting platform built on open-source principles, has likewise observed a rise in AI’s capacity to anticipate events. “Bots are ahead of the general public, competitive with those who enter this space, but still behind the seasoned pros,” explained Metaculus engineer Benjamin Wilson. The platform’s research traces trend lines suggesting that AI systems will begin outperforming professional forecasters by June 2027.

The boost in AI forecasting capability has sharply cut the cost of generating reliable estimates: employing human forecasters to predict outcomes can cost thousands of dollars, whereas AI models cost only a fraction of that and can operate faster, around the clock. “You can’t, each time you begin probing a strategic question and building scenarios, simply lock away experts for two days,” said Anthony Vassalo, director of RAND Corporation’s forecasting initiative. “We needed to develop AI tools to handle that.”

Vassalo, who has held senior roles within the intelligence community, is among the experts who see meaningful government applications for AI’s predictive prowess.

His forecasting approach unfolds as follows: start with a strategic question—such as the prospect of a Chinese invasion of Taiwan—and let AI tools first break down the core drivers that could enable or inhibit an invasion, including economic interdependence, military posture, and political will, as well as observable signals indicating the direction of each factor. Human experts then validate the AI schema before the AI partitions the drivers into a set of plausible future scenarios—ranging from nuclear war to a full-scale invasion, a blockade, or limited action. From there, the AI can generate hundreds of questions and answer them in parallel, acting like a crowd-sourced forecaster to produce initial probabilities cheaply and at scale.
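In code, that workflow might look something like the minimal sketch below. Everything here is hypothetical: ask_model() is a stub standing in for a real LLM API, the drivers and scenarios are hard-coded for illustration, and the uniform probabilities are placeholders for the crowd-style fan-out Vassalo describes. A production system would insert human review between each stage.

```python
from dataclasses import dataclass, field

@dataclass
class Driver:
    name: str
    signals: list[str] = field(default_factory=list)

def ask_model(prompt: str) -> str:
    """Stand-in for a real LLM API call; returns canned text here."""
    return "stub response"

def decompose_drivers(question: str) -> list[Driver]:
    # Step 1: have the model break the strategic question into core drivers
    # and the observable signals that indicate each driver's direction.
    ask_model(f"List drivers that could enable or inhibit: {question}")
    return [
        Driver("economic interdependence", ["cross-strait trade volume"]),
        Driver("military posture", ["amphibious exercise frequency"]),
        Driver("political will", ["leadership statements"]),
    ]

def build_scenarios(drivers: list[Driver]) -> list[str]:
    # Step 2: partition the validated drivers into plausible futures.
    return ["nuclear war", "full-scale invasion", "blockade", "limited action"]

def forecast(question: str) -> dict[str, float]:
    drivers = decompose_drivers(question)  # human experts validate this schema
    scenarios = build_scenarios(drivers)
    # Step 3: a real system would fan out hundreds of sub-questions answered
    # in parallel; uniform placeholder probabilities stand in for that here.
    return {s: round(1.0 / len(scenarios), 2) for s in scenarios}

print(forecast("Will China invade Taiwan by 2030?"))
```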

Nevertheless, the precision and reliability of AI forecasts often hinge on the system employed. While widely available large language models can be useful for tracing trend lines or broadly assessing how uncertain a policymaker should be about a given event, more sophisticated agentic AIs offer greater specificity. Consider the Mantic AI system, for example. Developed by British entrepreneur and former Google DeepMind researcher Toby Shevlane, the model performs better than both standard LLMs and human forecasters.

Yet humans remain essential to the forecasting process. After models like Mantic pose questions and present initial probabilities, experts are needed to explain the “why” behind the estimates, translating the AI-produced results into actionable guidance for policymakers. “It’s not useful to tell someone there’s a 32 percent chance of something happening. You have to explain there’s a 32 percent chance, why that is, what it was last week—say, 12 percent—and why it’s changing,” Vassalo told The Dispatch. “That gives you a trend line, and it tells you why you should care and how you should influence it.”

This human intervention also helps counteract AI’s frequently “jagged” reasoning, noted Haifeng Xu, an assistant professor of computer science at the University of Chicago who identifies as broadly optimistic about AI forecasting. Xu contributed to a benchmark assessing the forecasting capabilities of various AI models and found that each model exhibits slightly different patterns or quirks. He observed that models like Google’s Gemini tend to be more “conservative,” preferring to stay close to available data with minor adjustments, while others like DeepSeek tend to produce bold, sometimes extreme, probabilistic forecasts.
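One standard way to quantify those quirks is the Brier score: the mean squared error between a probabilistic forecast and the eventual 0-or-1 outcome, where lower is better. The toy numbers below are invented rather than drawn from Xu’s benchmark, but they show how a bold forecaster can score worse than a conservative one once a single confident call goes wrong.

```python
def brier(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)

outcomes = [1, 0, 1, 0, 0]                # five resolved yes/no questions
conservative = [0.6, 0.4, 0.6, 0.4, 0.4]  # hugs the base rate on every call
extreme = [0.95, 0.05, 0.95, 0.05, 0.95]  # bold calls; the last one is wrong

print(f"conservative Brier score: {brier(conservative, outcomes):.2f}")  # ~0.16
print(f"extreme Brier score:      {brier(extreme, outcomes):.2f}")       # ~0.18
```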

A separate study by scholars at King’s College London offers perhaps the clearest example of how AI reasoning can diverge from human judgment. When placed in simulated war games, AI models escalated conflicts by threatening nuclear strikes in 95 percent of scenarios, with Claude and Gemini showing a particular propensity to push the red button.

Another hurdle for AI predictive modeling is the unreliability of long-term assessments. Most modern models learn from online data, making it difficult to test the accuracy of their long-range forecasts. For example, you cannot confine Gemini’s knowledge to December 2010 and then ask it to forecast the future uptake of autonomous vehicles, since the model has already absorbed newer information.

Older, vintage models, by contrast, may be less vulnerable to such contamination. This is where Talkie could play a role. “Talkie will demonstrate whether it is feasible to develop AI systems capable of predicting questions of interest over considerably longer time horizons than we can presently manage,” said Nick Levine, one of the model’s co-creators. While comparing AI success in predicting the future against humans using historical events is tricky, Levine believes it is possible. “We can examine how well current human forecasters predict the future, then reproduce new equivalent questions for the past and assess how well AI would have performed if forecasting had existed in 1929 compared with humans,” he told The Dispatch.
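One way such a backtest could be scored is sketched below. The question set is invented (though the outcomes reflect the historical record), and vintage_forecast() is a hypothetical stand-in for querying a model with a 1930 knowledge cutoff; its scores would then be compared against how human forecasters perform on equivalent present-day questions.

```python
def vintage_forecast(question: str) -> float:
    """Stand-in for querying a model trained only on pre-1931 text."""
    return 0.5  # placeholder probability

# Questions a 1929 forecaster could have been asked, with actual outcomes.
historical_questions = [
    ("Will the United States repeal Prohibition by 1935?", 1),
    ("Will another world war begin by 1940?", 1),
    ("Will commercial television reach U.S. homes by 1935?", 0),
]

# Score with the Brier score: mean squared error against known outcomes.
score = sum(
    (vintage_forecast(q) - outcome) ** 2 for q, outcome in historical_questions
) / len(historical_questions)

print(f"vintage-model Brier score: {score:.2f}")  # compare with human baseline
```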

Yet Talkie has limitations of its own: it is prone to hallucinations, and its data corpus is much smaller than that of most contemporary AI models. Moreover, many obstacles hindering the government’s embrace of AI forecasting extend beyond the models’ own capabilities.

For instance, some analysts and proponents warn that more accurate projections could incentivize global actors, especially foreign adversaries, to behave more unpredictably to avoid preemption. Robin Hanson, an associate professor of economics at George Mason University, compared the emerging geopolitical landscape to a tennis match: if one player learns exactly how often the other hits cross-court, the other can randomize his shots to erase that predictive advantage.
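The underlying logic is that of mixed strategies in game theory. The toy simulation below, with invented probabilities and a deliberately naive predictor, shows how a forecasting edge collapses once the opponent randomizes.

```python
import random

random.seed(0)
TRIALS = 100_000

def predictable_receiver() -> str:
    # Hits cross-court 80 percent of the time, a pattern a model can learn.
    return "cross" if random.random() < 0.8 else "line"

def randomizing_receiver() -> str:
    # Plays a 50/50 mixed strategy, leaving nothing to predict.
    return random.choice(["cross", "line"])

def predictor_hit_rate(receiver) -> float:
    # The predictor always guesses the opponent's historically likeliest shot.
    return sum(receiver() == "cross" for _ in range(TRIALS)) / TRIALS

print(f"vs. predictable opponent: {predictor_hit_rate(predictable_receiver):.2f}")  # ~0.80
print(f"vs. randomizing opponent: {predictor_hit_rate(randomizing_receiver):.2f}")  # ~0.50
```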

Ultimately, the world remains “highly uncertain” and “hard to predict,” Hanson told The Dispatch. “It’s hard to truly imagine a different world until you present alternative histories.”

Correction, May 5, 2026: This story has been updated to reflect the correct title of Peter Wildeford.

Pilar Marrero

Pilar Marrero approaches political reporting with a strong interest in power, institutions, and the decisions that shape public life. Her coverage focuses on U.S. and international politics, offering clear, readable analysis of the events that influence the global conversation, with particular attention to the links between local developments and worldwide political shifts.