
Superforecasting - Philip E. Tetlock


Matheus Puppe




  • The chapter introduces Bill Flack, an ordinary man who has proven to be an excellent forecaster based on his performance in forecasting geopolitical and economic events.

  • Unlike forecasting celebrities like Tom Friedman, Bill Flack’s accuracy has been rigorously evaluated through the forecasts he has submitted to a forecasting tournament.

  • The chapter notes there are thousands of volunteers in the tournament who provide forecasts, and about 2% of them have proven to be “superforecasters” with excellent prediction accuracy, like Bill Flack.

  • These superforecasters come from all walks of life, including engineers, lawyers, artists, scientists, Wall Street professionals, and more.

  • The book aims to explain what makes these superforecasters so skilled at forecasting, and how others can learn to improve their own forecasting abilities.

  • It notes that while experts’ forecasts are widely relied upon, their actual forecasting accuracy is rarely measured, in contrast to how performance statistics are routinely tracked for decisions in other fields like sports.

So in summary, the chapter introduces the concept of superforecasters who have proven high accuracy through rigorous evaluation, in contrast to the typical reliance on famous experts whose forecasting skills are untested. It sets up explaining what makes superforecasters effective.

  • The passage discusses a famous story about a researcher who studied thousands of predictions from experts in areas like economics, politics, and world events. He found the experts did only slightly better than random chance at predicting outcomes.

  • This became a popular joke, suggesting experts were no better than chimpanzees throwing darts. However, the researcher notes his actual study was more nuanced and found experts could predict shorter-term outcomes somewhat better.

  • The researcher initially didn’t mind the joke but grew tired of how it was misinterpreted and mutated over time to suggest all expert forecasts are useless. His research didn’t actually support such an extreme conclusion.

  • The researcher describes himself as an “optimistic skeptic.” He acknowledges forecasting limits but believes it is possible to foresee some future events, at least to an extent, if one cultivates the right skills.

  • The passage uses the example of Mohamed Bouazizi, a Tunisian street vendor whose self-immolation sparked the Arab Spring, to illustrate the limits of prediction. No one could have predicted this single event would have such huge repercussions.

  • The concept of the “butterfly effect” is discussed - how tiny initial changes can greatly affect complex systems, illustrating inherent limits in predicting the future.

  • Laplace imagined a hypothetical entity (a “demon”) that knows everything about the present state of the universe. For such an intellect, he argued, the future would be as certain as the past.

  • Lorenz’s work on chaos theory showed that even tiny errors in measurements could lead to vastly different outcomes over time, undermining perfect predictability. Climate and weather are highly sensitive to initial conditions.

  • While the future is not perfectly predictable due to complexity and nonlinearity, many aspects of daily life are still predictable to varying degrees depending on timeframe and circumstances. Things like traffic patterns, billing cycles, sunrise/sunset times can be forecast reliably.

  • Reality consists of both predictable “clocks” and unpredictable “clouds”. Predictability depends on the system and timeframe. Weather is fairly predictable a few days out but less so a week ahead. Markets can remain predictable for years but suddenly change.

  • The relationship between time and predictability is also complex - some physical systems, such as planetary orbits, can be predicted far into the future, while ecological outcomes like extinction events are very hard to foresee. Overall, separating the predictable from the unpredictable requires careful analysis of specific systems and conditions.

  • Forecasting is often done without rigorously measuring accuracy, so there is no way to learn from mistakes and improve over time. Some key domains like politics, economics and intelligence analysis have never seriously tried to measure their forecasting accuracy.

  • The Good Judgment Project, led by Tetlock, conducted rigorous forecasting experiments over 4 years with thousands of volunteer forecasters. It found that some “superforecasters” had real skill at forecasting geopolitical and economic events out to 1-1.5 years in the future.

  • Studying what superforecasters do revealed their foresight is a result of certain thinking habits and information gathering/processing techniques, not innate gifts. These skills can potentially be learned by anyone motivated to improve.

  • Even basic training on forecasting principles improved accuracy by 10% in the GJP experiments. Small improvements sustained over time can make a big difference.

  • But overall, forecasting remains an area with huge potential for progress if more domains adopted rigorous measurement and worked to understand what drives accuracy so those insights could be applied more broadly.

  • The chapter explores the psychology that leads us to think we know more than we really do, even about our ability to forecast accurately. This “illusion of knowledge” has historically hindered progress, especially in medicine.

  • For centuries, doctors relied on experience and perception rather than scientific testing to determine what treatments work. This limited medical advances. A similar revolution is needed in forecasting - rigorous testing and evaluation is required.

  • The author conducted one of the earliest large tests of expert political forecasting accuracy in the late 1980s. While most experts were not very accurate, some did show a modest ability to foresee events. What differentiated these experts was not specific information or beliefs, but how they thought - in a way that was open-minded, careful, curious, and self-critical.

  • Inspired by this, IARPA later created a forecasting tournament that uncovered “superforecasters” - those with an exceptionally high degree of accuracy. Understanding what makes them superior is the focus of subsequent chapters; the key is how they think, not just intelligence, knowledge, or time spent. Rigorous self-improvement is also important.

The passage describes an incident in 1956 where Archie Cochrane underwent surgery for what his specialist diagnosed as terminal cancer. However, when the pathologist examined the removed tissue, they discovered there was no cancer at all.

Cochrane had been given a death sentence that turned out to be completely wrong. This highlighted issues with how medicine was practiced historically - specialists and physicians were often too confident in their own judgments and did not consider alternative possibilities or wait for conclusive evidence. Treatments over centuries were frequently ineffective or harmful, yet medical theories and practices changed little over time.

Experimentation began gaining acceptance in the early 20th century as a way to more rigorously evaluate treatments. Randomized controlled trials became the gold standard for determining what works through careful measurement and statistical analysis, as opposed to relying on anecdotal evidence or physicians’ asserted expertise alone. Cochrane’s misdiagnosis showed how important it was to introduce more scientific standards to medicine.

  • Randomized controlled trials have become standard in medicine today, but they were revolutionary when first introduced because medicine had historically not been a scientific practice. It had the trappings of a science but lacked scientific rigor.

  • Archie Cochrane was a pioneer in advocating for randomized controlled trials to determine what medical treatments actually worked. He was frustrated that the medical establishment resisted scientific validation and empirical evidence.

  • Cochrane proposed one of the first randomized controlled trials to compare cardiac care units to home care for heart attack patients. Physicians resisted, saying the units were obviously superior, but the trial found no significant difference.

  • Cochrane advocated that policies and approaches should be subjected to randomized trials to determine their actual effects, rather than assumed to work based on intuition alone.

  • Yet even Cochrane acquiesced to a doctor’s diagnosis of terminal cancer without demanding scientific evidence or a second opinion, highlighting the difficulty of overcoming intuitive thinking.

  • Modern psychology describes two systems of thinking - System 1 is fast, intuitive thinking while System 2 is slower, more deliberate and analytical. Cochrane advocated moving from System 1 thinking to System 2 thinking in medicine through randomized trials.

  • The bat-and-ball cognitive test (a bat and a ball together cost $1.10; the bat costs a dollar more than the ball; how much is the ball?) shows that people often give an intuitive but incorrect answer (“ten cents”) without reflecting further. This reveals how our thinking can be driven by initial hunches from System 1.
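
The arithmetic behind the bat-and-ball problem (a bat and a ball cost $1.10 together, and the bat costs $1.00 more than the ball) makes the System 1 error concrete. Working in cents, a quick check shows why the intuitive “ten cents” fails and what the constraint actually implies:

```python
# Work in cents to avoid floating-point noise.
total, difference = 110, 100

# Intuitive System 1 answer: ball = 10. But then the bat is 110
# and the pair costs 120 cents, violating the stated total of 110.
assert 10 + (10 + difference) != total

# System 2: ball + (ball + difference) = total  =>  2*ball = total - difference.
ball = (total - difference) // 2
bat = ball + difference
print(ball, bat, ball + bat)   # 5 105 110
```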

  • In the ancestral environment, quick System 1 judgments were often beneficial for survival. Gathering all evidence carefully could be too slow when facing threats.

  • However, System 1 tends to accept the available evidence as sufficient and reliable without questioning it. Kahneman calls this “What You See Is All There Is” (WYSIATI).

  • People have a strong urge to explain and make sense of things. Split-brain experiments show how people will confabulate explanations even when they have no real reason for their actions.

  • Reactions to events like the Oslo attacks demonstrate how people rush to a plausible initial hypothesis and gather supporting evidence without seriously considering alternatives, displaying confirmation bias.

  • Scientists are trained to approach hypotheses with careful doubt rather than certainty, seeking evidence that could disprove ideas rather than just support them. This runs counter to human nature but builds more accurate understanding.

  • Intuitions and snap judgments can sometimes be accurate due to pattern recognition from extensive experience, as in firefighters accurately sensing danger from subtle cues in a fire. However, intuitions are also prone to false positives if the person lacks sufficient valid cues to recognize patterns.

  • Kahneman and Klein, who originally had differing views on intuitions, agreed that intuitions from experts in domains with valid cues available can be trusted more than domains like stock picking where accurate patterns may be harder to discern.

  • Even strong intuitions benefit from conscious reconsideration and “double checking” as the chess champion Magnus Carlsen does, since intuitions can still be wrong at times.

  • While the tip-of-your-nose perspective of intuition can provide insights, it is also prone to illusions. If time allows before an important decision, conscious reconsideration is advisable, since intuitions that seem obvious can later turn out to be false. Proceeding with caution is recommended even when an intuition feels convincing.

  • In the early 1980s, many experts warned that a nuclear holocaust between the US and Soviet Union was not just possible but inevitable if nuclear arsenals were not reduced or eliminated. Famous writer Jonathan Schell wrote that a holocaust “not only might occur but will occur.” Large anti-nuclear protests occurred.

  • In 1984, a distinguished panel convened by the National Research Council and funded by major foundations was charged with “preventing nuclear war.” The panel included several luminaries and Nobel laureates. Tetlock, then a young professor, was also included due to relevant research.

  • The panel invited many experts, including intelligence analysts, military officers, officials, and scholars, to discuss the issues. These experts were also very impressive and confident in their assessments of what was happening geopolitically and where things were headed.

  • However, Tetlock notes that judging the accuracy of predictions and forecasts turned out to be much more difficult than supposed. Ambiguities in language and undefined terms made it hard to definitively say whether predictions were ultimately right or wrong. Keeping score proved challenging.

  • Experts disagreed on who would succeed the ailing Soviet leaders in the early 1980s, but most expected another Communist hardliner.

  • Konstantin Chernenko did become the next leader but died soon after. Unexpectedly, Mikhail Gorbachev was then selected as general secretary.

  • Gorbachev ushered in new reforms of glasnost and perestroika that liberalized the Soviet Union and improved relations with the US. Few experts saw this change coming.

  • After the fact, experts spun explanations for why the change was actually predictable or inevitable, to save face after their incorrect predictions. They downplayed their failures.

  • Experts often make vague and undefined forecasts without clear timeframes or probabilities, making their predictions difficult or impossible to really test against outcomes.

  • Even defining probabilities poses challenges, as Sherman Kent discovered - the same term like “serious possibility” could be interpreted very differently by different people.

So in summary, the passage questions the predictive abilities of experts and their tendencies to rationalize failures after the fact, while also outlining challenges in really judging forecasts due to vague language, unclear timeframes and differing views of probabilities.

  • Sherman Kent advocated using precise numerical probabilities rather than vague terms like “serious possibility” in intelligence estimates. This would reduce misunderstandings.

  • However, different people interpreted such vague terms very differently - one thought “fair chance” meant 80% odds of success, another thought it meant 20% odds.

  • The Bay of Pigs invasion was authorized based on a “fair chance” of success assessment, but the actual assessment was 3:1 odds of failure, showing how vague terms can mislead.

  • Kent proposed a standardized scale assigning numerical ranges to common probability terms like “probable” to resolve this issue. But it was never adopted.

  • Some resisted numbers due to a misconception that they imply false certainty, rather than expressing subjective estimates. Others had an “aesthetic revulsion” to numbers.

  • Importantly, vague terms allow for more twisting of meaning after the fact if the event occurs or not, escaping accountability. Numbers do not permit this.

  • Only by generating many probabilistic forecasts over time can their accuracy be assessed based on calibration, via seeing if stated probabilities match observed frequencies - but this requires standardized numerical probabilities.
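
Kent’s proposed standardization (later published in his essay “Words of Estimative Probability”) mapped each common term to a numeric range. A rough reconstruction, with approximate range bounds, might look like:

```python
# Approximate reconstruction of Sherman Kent's word-to-probability scale
# (percent ranges); the exact bounds varied across drafts, so these
# numbers are illustrative rather than authoritative.
KENT_SCALE = {
    "certain":              (100, 100),
    "almost certain":       (87, 99),
    "probable":             (63, 87),
    "chances about even":   (40, 60),
    "probably not":         (20, 40),
    "almost certainly not": (1, 13),
    "impossible":           (0, 0),
}

def words_for(percent: float) -> str:
    """Return the first Kent term whose range covers the given probability."""
    for term, (lo, hi) in KENT_SCALE.items():
        if lo <= percent <= hi:
            return term
    return "no standard term"

print(words_for(80))   # probable
```

A shared table like this removes the ambiguity Kent observed: an analyst writing “probable” is committing to roughly 63-87%, not whatever number the reader happens to assume.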

  • The passage describes a hypothetical forecaster who is overconfident: the things she says are 80% likely to happen actually happen only 50% of the time. She overestimates the likelihood of her predictions coming true.

  • On a calibration graph, being overconfident means the estimated probabilities sit above the actual outcome rates, so the forecaster’s curve falls below the diagonal line. Someone who is properly calibrated has estimated probabilities that match the actual outcome rates, falling on the line.

  • So when she estimates something is 80% likely, it is actually only 50% likely based on past performance; she overestimates how often her predictions will turn out to be correct.
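
Calibration, as used in these bullets, can be computed mechanically: group forecasts by their stated probability and compare each stated level with the fraction of events that actually occurred. A minimal sketch with invented data:

```python
from collections import defaultdict

def calibration(forecasts):
    """forecasts: iterable of (stated_probability, outcome) pairs,
    where outcome is 1 if the event happened and 0 otherwise.
    Returns {stated_probability: observed_frequency}."""
    buckets = defaultdict(list)
    for prob, outcome in forecasts:
        buckets[prob].append(outcome)
    return {p: sum(v) / len(v) for p, v in sorted(buckets.items())}

# An overconfident forecaster: events rated "80%" happen only half the time.
data = [(0.8, 1), (0.8, 0), (0.8, 1), (0.8, 0),
        (0.6, 1), (0.6, 0), (0.6, 0)]
print(calibration(data))   # the 0.8 bucket comes out at 0.5
```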

  • The research program, called Expert Political Judgment (EPJ), studied the accuracy of experts’ forecasts and predictions across various domains.

  • The results showed that on average, experts were about as accurate as random guesses or a dart-throwing chimpanzee.

  • However, there were two groups - one group did slightly better than random, barely beating simple algorithms, while the other group did worse than random.

  • The critical factor was not what the experts thought, but how they thought. One group organized their thinking around “Big Ideas” and were overconfident ideologues called “hedgehogs.” The other took a more pragmatic, data-driven approach considering multiple perspectives, called “foxes.”

  • The foxes significantly outperformed the hedgehogs in prediction accuracy. The hedgehog style, despite greater confidence, led to distorted judgments through an overly ideological lens.

  • Famous experts tended to be hedgehogs who told simple, confident stories that were attractive to media but did not correlate with accuracy. Foxes were less attractive due to acknowledging uncertainties.

So in summary, the research found ideology and rigid thinking styles hampered experts’ abilities to make accurate predictions compared to a more pragmatic, data-driven approach.

  • Francis Galton observed people accurately guessing the weight of an ox at a fair by taking the average of many random guesses. This demonstrated the “wisdom of crowds” phenomenon where aggregating judgments beats individual accuracy.

  • Aggregation works best when people have differing pieces of information that add up to a more complete picture when combined. Errors cancel each other out while valid information points in the same direction.

  • Richard Thaler held a contest in which people guessed a number between 0 and 100, the winner being whoever came closest to two-thirds of the average guess. Iterating the logic all the way down leads to a guess of 0, but the actual average was 18.91, showing that people do not reason with perfect logic (nor assume that others will).

  • To improve judgments, one should consider multiple perspectives like logic/illogic and factor those into their own guess, similar to how Captain Kirk synthesized Spock and McCoy’s views. Considering more perspectives further sharpens judgment.

  • The best analogy is the dragonfly’s eye: each eye contains thousands of lenses, giving thousands of unique perspectives that are synthesized into near 360-degree vision. A fox wishing to aggregate information well should likewise examine problems from many vantage points.
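
Galton’s result is easy to reproduce in simulation: give many guessers independent noisy views of a true value and the crowd’s average lands far closer to the truth than a typical individual does, because the errors cancel while the shared signal accumulates. (The 1,198 lb figure is the dressed weight of Galton’s ox; the noise level here is an arbitrary choice.)

```python
import random

random.seed(0)
TRUE_WEIGHT = 1198   # dressed weight of the ox at Galton's fair, in pounds

# Each guesser sees the truth corrupted by independent noise.
guesses = [TRUE_WEIGHT + random.gauss(0, 100) for _ in range(800)]

crowd_estimate = sum(guesses) / len(guesses)
typical_error = sum(abs(g - TRUE_WEIGHT) for g in guesses) / len(guesses)

print(round(crowd_estimate))   # within a few pounds of 1198
print(round(typical_error))    # a typical individual is off by ~80 lbs
```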

The passage captures a key reason why the foresight of foxes is superior to that of hedgehogs - foxes are better able to aggregate multiple perspectives. While aggregation does not come naturally to humans, foxes tend to engage in the hard work of consulting other perspectives through temperament, habit, or conscious effort.

Annie Duke, an elite poker player and psychologist, gives an example demonstrating this. In a poker training seminar, she walks students through hypothetical scenarios where they hold strong hands. When an opponent raises the bet, the students incorrectly assume the opponent must have a strong hand too. However, through roleplaying different perspectives, the students realize they would not themselves raise with a strong hand, because they don’t want to scare off their opponent. Only by considering alternative views do they understand their initial assumption was flawed. While the students are experienced and passionate poker players, it takes this exercise for them to appreciate different perspectives. This shows the difficulty of escaping one’s own narrow viewpoint without deliberate effort to aggregate views like foxes do.

  • Saddam Hussein was playing a game of hide-and-seek with UN arms inspectors in 2002-2003 regarding Iraq’s weapons programs. This risked triggering an invasion by the US and its allies and the downfall of Saddam’s regime.

  • It’s difficult for historians and analysts to truly put themselves in the position of leaders at the time, without the benefit of hindsight. We have to be careful not to substitute the easier question of “Were they correct?” for the harder question of “Was their judgment reasonable at the time?“.

  • While the US intelligence community’s (IC) conclusion that Iraq had WMD programs turned out to be incorrect, their judgment at the time could still have been reasonable given the information available. The issue is they were too confident in their conclusion and failed to properly consider alternative possibilities or that they could have been wrong.

  • If the IC assessment had acknowledged more uncertainty, it may have influenced Congress’ decision on authorizing the Iraq war. But the IC assessments spoke definitively without recognizing any uncertainty, which was a major error.

  • This suggests Saddam was taking a risky gamble in playing games with inspectors, as it increased the chances of an invasion that could lead to his downfall, but it’s difficult to fully judge his motivations and decision-making at the time.

  • IARPA (Intelligence Advanced Research Projects Activity) planned to sponsor a massive forecasting tournament to test new methods for making predictions about geopolitical and economic events, as recommended in a National Research Council report.

  • The tournament would focus on “Goldilocks zone” questions that were neither too easy nor too difficult to predict, with timeframes ranging from 1 month to 1 year.

  • Research teams would compete to beat the control group’s combined “wisdom of the crowd” forecast, with the margin of victory expected to increase over the 4 years. Teams could also run experiments within their groups.

  • The Good Judgment Project recruited thousands of volunteers online to participate. Their forecasts would be aggregated using a method that weighted top forecasters more heavily and “extremized” the forecasts.

  • Surprisingly, the Good Judgment Project forecasts, made by ordinary people, consistently beat forecasts from intelligence professionals with classified information. This suggested crowdsourcing could complement traditional analysis.

  • One top forecaster, Doug Lorch, a retired computer programmer, outperformed other forecasters with his high volume and accuracy of predictions over the first year. This challenged ideas about who could analyze geopolitics.
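
The aggregation scheme described above - weight the better forecasters more heavily, then “extremize” the pooled probability - can be sketched as follows. The transform below is one common way to extremize (pushing the pooled probability away from 0.5); the exponent is an illustrative choice, not GJP’s actual tuned parameter:

```python
def aggregate(probs, weights, a=2.0):
    """Weighted-average a set of probability forecasts, then extremize.

    Extremizing (a > 1) compensates for the fact that each forecaster
    holds only part of the available information, so the pooled forecast
    should be more confident than the average individual. The exponent
    a=2.0 is illustrative; in practice it is tuned on past data.
    """
    pooled = sum(p * w for p, w in zip(probs, weights)) / sum(weights)
    return pooled**a / (pooled**a + (1 - pooled)**a)

# Three forecasters lean "yes"; the one with the best track record counts double.
p = aggregate([0.70, 0.65, 0.80], weights=[1.0, 1.0, 2.0])
print(round(p, 3))   # noticeably more extreme than the 0.7375 weighted average
```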

  • Doug Lorch was an extraordinarily successful forecaster in IARPA’s forecasting tournaments, outperforming other individual forecasters, prediction markets, algorithms, and even intelligence analysts who had access to classified information.

  • However, there were 58 other top performers like Doug in the first year, known as “superforecasters”. As a group, superforecasters significantly outperformed other forecasters.

  • While superforecasters’ results seem to indicate strong forecasting skill, their success could potentially be explained by randomness. With thousands of people making forecasts, some are likely to have more accurate results just by chance, not necessarily skill.

  • People often misinterpret or overestimate the significance of random success stories. Just as some people in a large crowd of coin flippers will toss long runs of heads through chance alone, some forecasters in a large group are likely to post accurate records through luck rather than ability.

  • So while superforecasters’ results seem impressive, more evidence is needed to conclusively prove their success is due to real forecasting skill rather than random chance, given the large number of people making predictions in the tournaments. Their performance deserves further testing and analysis.

  • When a corporation or executive becomes hugely successful, they often write a book attributing their success to certain qualities or actions. However, these books often don’t provide solid evidence that these factors truly caused the success, and they rarely acknowledge the role of luck.

  • Regression to the mean is an important statistical concept - very good or very poor performances tend to move closer to the average over time due to random variation/luck. This provides a way to test how much skill vs. luck influences results.

  • The superforecasters defied expectations by improving rather than regressing to the mean across years. This suggests teamwork and recognition elevated their performance above luck.

  • However, individuals within the group did still show some regression to the mean over time, with about 30% falling from the top ranks each year.

  • Both skill and luck influence forecasting performance. While the superforecasters as a group perform well, any given individual should not be seen as infallible, as occasional poor performance could be due to unlucky years. Recognition of luck’s role is important.
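
The luck-versus-skill argument can be demonstrated directly: give a large pool of forecasters pure coin-flip accuracy, rank them after a first “year”, and the apparent stars regress completely to chance in the second year. Persistent outperformance across years, which the superforecasters showed, is what distinguishes skill from this null model. A toy simulation:

```python
import random

random.seed(1)
N_FORECASTERS, N_QUESTIONS = 2000, 100

def yearly_score():
    """Fraction of questions 'called' correctly by pure guessing."""
    return sum(random.random() < 0.5 for _ in range(N_QUESTIONS)) / N_QUESTIONS

year1 = [yearly_score() for _ in range(N_FORECASTERS)]
year2 = [yearly_score() for _ in range(N_FORECASTERS)]

# The year-one top 2% look like "superforecasters"...
top = sorted(range(N_FORECASTERS), key=lambda i: year1[i], reverse=True)[:40]
year1_top = sum(year1[i] for i in top) / len(top)
year2_top = sum(year2[i] for i in top) / len(top)

print(round(year1_top, 2))   # well above the 0.5 chance level
print(round(year2_top, 2))   # ...but with no skill, they fall back to ~0.5
```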


  • Sandy Sillman was diagnosed with MS in 2008 and had to retire from his career as an atmospheric scientist by 2012. He decided to volunteer for a forecasting tournament as something stimulating to do.

  • Sandy achieved remarkable forecasting accuracy, tying for first in his first year. He has an extremely impressive educational background with multiple advanced degrees from top universities.

  • Early on, it seemed intelligence and extensive knowledge could explain the success of superforecasters like Sandy. However, further analysis found:

  • Regular forecasters scored higher than about 70% of the general population on intelligence/knowledge tests, while superforecasters scored higher than 80%.

  • But the big jumps were between regular forecasters and the general public, not between forecasters and superforecasters.

  • Most superforecasters fell below what would be considered “genius” level intelligence.

  • So intelligence/knowledge help to a point, but superforecasting does not require the highest levels. It depends more on how intelligence and knowledge are applied.

So in summary, the passage explores whether intelligence alone can explain superforecasting success, but finds intelligence/knowledge are helpful only to a degree - what really matters is how those attributes are used in the forecasting process.

  • The passage discusses the technique of “Fermi-izing” or Fermi estimation, popularized by physicist Enrico Fermi. This involves breaking down complex questions into simpler known components to arrive at reasonable estimates, even without direct information.

  • Fermi would pose puzzling questions like estimating the number of piano tuners in Chicago and expect his students to provide a thoughtful estimate rather than no answer.

  • The passage walks through applying Fermi estimation to that piano tuner question, breaking it down into smaller pieces like number of pianos, tuning frequency, time per piano, tuner work hours, and estimating each piece.

  • The estimate of around 63 piano tuners arrived at through this process aligned reasonably well with other data, showing the power of structured guesstimation over random guesses.

  • Fermi estimation trains people to separate knowable and unknowable factors, bring guessing into the open, avoid looking foolish, and often arrive at surprisingly accurate estimates through this process.

  • The technique is useful for forecasting, helping avoid cognitive biases and traps by breaking problems down methodically rather than relying on initial hunches from limited information.

So in summary, it explores the forecasting benefits of Fermi’s structured approach to estimation over superficial initial guesses.
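
The piano-tuner decomposition can be written out explicitly. The component guesses below are illustrative placeholders in the spirit of the chapter’s walkthrough (its exact inputs differ slightly), chosen to show how small, checkable guesses multiply into the ~63 figure:

```python
# Fermi estimate: how many piano tuners are there in Chicago?
# Every input is an admitted guess; the value of the method is that each
# guess is small, explicit, and individually correctable.

population = 2_500_000           # people in Chicago
pianos_per_person = 1 / 40       # guess: one piano per 40 people
tunings_per_piano_per_year = 1   # guess: each piano tuned about once a year

hours_per_tuning = 2             # including travel between jobs
work_hours_per_year = 40 * 50    # 40 hours/week, 50 weeks/year

pianos = population * pianos_per_person
tunings_needed = pianos * tunings_per_piano_per_year
tunings_per_tuner = work_hours_per_year / hours_per_tuning

tuners = tunings_needed / tunings_per_tuner
print(tuners)   # 62.5 -- in line with the chapter's ~63
```

Changing any single guess (say, one piano per 100 people instead of one per 40) changes the answer proportionally, which is exactly what makes the estimate easy to debate and refine piece by piece.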

Here are the key steps Bill took in his analysis of whether Yasser Arafat was poisoned with polonium:

  1. He used a Fermi-style approach to break down the question, considering what factors would lead to a “yes” or “no” answer, rather than jumping to conclusions.

  2. He established that scientists could potentially detect polonium on remains years after death, based on reviewing the Swiss testing report. This allowed him to move past the scientific feasibility.

  3. He generated alternative hypotheses for how Arafat’s remains could have been contaminated with polonium beyond just “Israel poisoned him” - such as Palestinian enemies or postmortem contamination to frame Israel. Considering alternatives increased the likelihood of contamination.

  4. He noted the question would be answered “yes” if just one of the two European testing teams found evidence, further nudging his analysis in that direction.

  5. Rather than diving straight into the complex politics, he framed his investigation of the “inside view” around systematically evaluating each contamination hypothesis he generated through his initial Fermi-style analysis.

So in summary, Bill took a structured, hypotheses-driven approach informed by an initial Fermi-style analysis, rather than jumping straight to conclusions or getting lost in irrelevant details. This set him up well to thoughtfully analyze the question.

Here are the key points regarding the potential involvement of Israel in the death of Yasser Arafat:

  1. Israel had access to polonium, a rare and dangerous radioactive substance. Israel is known to have a nuclear weapons program and the ability to produce radioactive materials.

  2. Israel had strong motivation to eliminate Arafat. As the longtime leader of the Palestinian nationalist movement, Arafat was a top enemy and target for Israel. Removing him could seriously weaken Palestinian opposition.

  3. Israel had the capability to poison Arafat secretly. As a technologically advanced nation with intelligence capabilities, Israel plausibly could have carried out a covert poisoning operation against Arafat.

However, there are also reasons to question Israeli involvement:

  • Directly poisoning a foreign leader could severely damage Israel’s international reputation and standing. It would be an extremely risky action.

  • Other groups also had reasons to target Arafat and could have acquired polonium. Assigning blame requires thoroughly investigating all potential perpetrators.

  • No public evidence has directly linked polonium found in Arafat’s body to an Israeli source. The investigation did not find conclusive proof of the exact cause of death.

So in summary, Israel was one possible suspect given its means and motive, but the evidence currently available is not definitive. A thorough, impartial investigation considering all angles would be needed to make a conclusive determination. Plausible alternative explanations also exist and cannot be ruled out.

  • Active open-mindedness is the concept of being open to evidence that contradicts one’s beliefs and considering disagreement rather than just agreement. Superforecasters exemplify this by treating beliefs as testable hypotheses rather than treasures to protect.

  • Doug displays active open-mindedness. He is not just passively open-minded but actively seeks out opposing views and evidence against his beliefs in order to evaluate and potentially change his perspective.

  • Active open-mindedness is a core feature of what makes superforecasters so successful, as it allows them to thoughtfully consider different perspectives rather than clinging to preconceived notions.

  • The passage describes a scene in the movie Zero Dark Thirty where Maya, a CIA analyst, insists bin Laden is hiding in an uncovered compound in Pakistan. She grows increasingly frustrated as weeks go by without action.

  • The real Leon Panetta understands uncertainty better than the fictional Panetta portrayed in the movie. Nothing is 100% certain.

  • A similar scene in Mark Bowden’s book describes Obama listening to a range of estimates from CIA analysts, from 30% confident to 95% confident bin Laden was there.

  • Obama declares it a “fifty-fifty” decision, which likely meant “I’m not sure” rather than a literal 50% probability. This decision-making acknowledges uncertainty better than insisting on near-certainty.

  • Historically, humans have generally only had three settings for dealing with uncertainty: “gonna happen,” “not gonna happen,” and “maybe.” Finer distinctions required more deliberative thinking that wasn’t always useful or possible for ancestral decision-making. Acknowledging uncertainty has pragmatic advantages over unwarranted confidence.

In summary, the passage examines how both fictional and real-life decision makers approached uncertainty in assessing the likelihood that bin Laden was hiding in an uncovered compound. It analyzes the trade-offs between acknowledging uncertainty vs. insisting on near-certainty in probabilistic judgments.

  • People naturally think in binary terms of yes/no or certain/not certain due to evolutionary pressures. Early humans had to quickly assess threats and identify worry-free environments.

  • This intuitive two-setting mental dial of certain/not doesn’t map well to probability. Real-world outcomes are often uncertain and probabilistic rather than binary.

  • Even educated people revert to binary thinking when dealing with concrete issues, interpreting high probabilities like 80% as certainties rather than acknowledging possibilities of other outcomes.

  • Scientists accept uncertainty as inherent to reality. Nothing can be absolutely certain according to modern science. They represent knowledge through probability rather than facts carved in granite.

  • Probabilistic thinking requires recognizing degrees of uncertainty on a continuum from very likely to very unlikely. Finely calibrated probability assessments are better than vague terms like “probably.”

  • Robert Rubin, a former Treasury Secretary, rejected the idea of certainty and thought strictly in terms of numerical probabilities rather than vague categories. He found that most people reacted to this approach with surprise, since it struck them as counterintuitive.

  • Probabilistic thinking, which acknowledges uncertainty, comes more naturally to scientists and mathematicians. It differs from more vague, categorical thinking most people rely on. Each type of thinking seems strange to those used to the other.

  • Superforecasters excel at probabilistic thinking. They grasp the difference between “epistemic” uncertainty that could be reduced with knowledge, versus “aleatory” uncertainty that cannot be eliminated. They cautiously estimate ranges rather than precise numbers for questions with much aleatory uncertainty.

  • Compared to regular forecasters, superforecasters are much more granular in their probability estimates, considering differences of just 1% rather than rounding to the nearest 10%. Research shows their greater granularity corresponds to improved forecasting accuracy.

  • While useful, even sophisticated groups like the U.S. intelligence community could benefit from encouraging analysts to think with the level of granularity superforecasters demonstrate on certain questions. This could yield a clearer view of future uncertainties and possibilities.

  • Superforecasters do not follow a paint-by-numbers method, but they often tackle questions in a similar systematic way: breaking the question into components, distinguishing knowns from unknowns, critically examining assumptions, adopting both an outside view (comparative perspective) and inside view (uniqueness), comparing their views to others, and synthesizing different perspectives into a precise probability judgment.

  • This process is demanding, taking significant time and mental effort. However, it is just the beginning, as forecasts need to be continually updated based on new information.

  • Forecasts are living judgments based on current information, not static lottery tickets filed away. As new data emerges, like new polls showing a candidate surging or a competitor declaring bankruptcy, superforecasters revise their forecasts accordingly.

  • In the IARPA tournament, forecasters like Bill Flack could update their forecasts as often as they liked in response to new information about their questions, such as polonium detection in Arafat’s remains. So superforecasters closely followed news to reevaluate their judgments over time.

  • Devyn Duffy is a superforecaster who frequently updates his forecasts as new information becomes available, often making over 16 forecasts per question. He credits his success to frequent updating.

  • Superforecasters in general update their forecasts much more often than regular forecasters. Updating forecasts to reflect the latest information makes them more accurate and informed.

  • An example is given of superforecasters quickly updating their forecasts after President Obama announced plans to take action against ISIS in Syria.

  • However, simply paying close attention to news and updating frequently is not the only factor in their success. Their initial forecasts were still more accurate even without updating.

  • Good updating requires the same skills as making the initial forecast and can be challenging, such as correctly interpreting subtle new information.

  • Examples are given of a superforecaster, Bill Flack, skillfully updating his forecast on Arafat’s death based on new evidence, and another time failing to update on a visit by the Japanese prime minister to a shrine.

  • Both underreacting and overreacting to new information can diminish forecast accuracy. It’s important to weigh new information carefully rather than adjust forecasts too radically.

  • The US Air Force attacked targets in Syria on September 22, 2014, resolving the question of whether foreign militaries would intervene in Syria before December 1, 2014.

  • Frankel’s mistake was not raising his forecast from 82% to 99% that the US would intervene in Syria after Obama announced plans to go after ISIL, as he later said he should have. He was too busy to properly update his forecast given the quick unfolding of events.

  • Bill Flack underestimated the likelihood of Japanese Prime Minister Shinzo Abe visiting the Yasukuni Shrine. The subtler explanation is that Flack unconsciously substituted the question “Would I visit as PM?” instead of objectively assessing if Abe would visit. He dismissed new information because it was irrelevant to his replacement question.

  • Committing to beliefs makes people resistant to changing their minds even when presented with new facts. Extreme commitment, as seen in those who defended imprisoning over 100,000 Japanese Americans with no evidence of sabotage, can lead to extreme reluctance to admit being wrong when the facts change.

  • Superforecasters may have an advantage because they are less committed to views as part of their careers or identities. This makes them better able to acknowledge mistakes and properly update forecasts when new evidence emerges. Extreme commitment to beliefs undermines accuracy by fostering underreaction to updates.

  • Irrelevant information can unconsciously sway judgments by diluting stereotypes and making people seem more multifaceted. This is known as the “dilution effect.”

  • However, constantly swaying in response to irrelevant information is an overreaction and bias. It’s better to maintain some commitment to one’s initial judgments rather than be swayed by every stray piece of data.

  • In forecasting, overreacting or underreacting to new information are both risks, akin to navigating between Scylla and Charybdis in Greek mythology.

  • Top superforecaster Tim Minto won the IARPA tournament by constantly updating his forecasts, but doing so through many small incremental changes rather than large swings. This allowed him to avoid both underreacting and overreacting.

  • Updating forecasts in small, gradual increments is effective because it allows one to carefully balance old information with new without being too swayed by either. This method, though unexciting, proved highly accurate according to IARPA data on superforecasting performance.
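The small-increment style of updating described above can be modeled with Bayes' rule in odds form, which shows why weak evidence should nudge a forecast only slightly while strong evidence justifies a larger move. This is an illustrative sketch, not a procedure from the book; the likelihood-ratio values below are hypothetical.

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Update a probability with Bayes' rule in odds form.

    posterior odds = prior odds * likelihood ratio, where the
    likelihood ratio is P(evidence | event) / P(evidence | no event).
    """
    odds = (prior / (1 - prior)) * likelihood_ratio
    return odds / (1 + odds)

# Weak evidence (ratio near 1) produces the small increments
# superforecasters favor; strong evidence warrants a large swing.
p = 0.60
p = bayes_update(p, 1.2)   # mildly supportive news: ~0.64
p = bayes_update(p, 10.0)  # decisive news: ~0.95
```

A likelihood ratio of exactly 1 (evidence equally likely either way) leaves the forecast unchanged, which is the formal counterpart of not overreacting to irrelevant information.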

  • Mary Simpson missed predicting the 2007-2008 financial crisis despite her background in economics, which frustrated her and motivated her to improve her forecasting abilities.

  • She heard about the Good Judgment Project and became a top forecaster/superforecaster through her participation.

  • Psychologist Carol Dweck’s research on growth vs fixed mindsets is relevant. Those with a growth mindset see abilities as something that can be developed through effort, while those with a fixed mindset see abilities as immutable traits.

  • Dweck found that when given difficult puzzles, some children lost interest while others enjoyed the challenge. This was not due to different abilities, but different mindsets - those who saw ability as fixed gave up more easily when facing difficulties, while those with a growth mindset saw challenges as opportunities to improve.

  • Simpson exhibited a growth mindset by wanting to get better at forecasting after her failure, and embracing the challenges of the Good Judgment Project to develop her skills through effort and learning from mistakes. This allowed her to become a top forecaster.

  • The key to John Maynard Keynes’ success as an investor was his growth mindset and willingness to change his views when faced with new evidence. Even when he was wrong, he sought to learn from mistakes rather than stubbornly sticking to beliefs.

  • As an example, Keynes lost money in 1920 due to wrong currency forecasts. However, he analyzed what went wrong, adopted new ideas like value investing, and went on to make fortunes for himself and others despite economic turmoil like the Great Depression.

  • Try, fail, analyze, adjust and try again was Keynes’ consistent approach to learning and improving. Failure meant identifying mistakes and exploring alternatives, not giving up. This process of learning through experience underlies skill development from childhood on.

  • Effective practice for skill-building and forecasting requires not just experience but informed practice using guidance from lessons learned by others. Timely feedback is also important to draw the right insights from experiences. Keynes’ flexibility exemplifies an openness to different perspectives that benefits forecasting.

  • Police officers often lack clear feedback on whether their judgments about lying were right or wrong, as legal cases can take months or years to resolve. This leads to overconfidence that grows faster than accuracy.

  • Unlike police, meteorologists and bridge players get prompt feedback, allowing them to improve. Meteorologists know right away if their weather forecast was wrong, and bridge players see their scores at the end of each hand.

  • Forecasters often use vague language like “possibly” or “might” which makes it impossible to accurately evaluate forecasts later on. The Forer effect shows people inflate the accuracy of vague personality assessments.

  • Time lags between forecasts and outcomes allow memory biases like hindsight bias to distort later evaluations. Experts revised their past predictions of Soviet collapse upwards after the fact due to this bias.

  • Without precise feedback, forecasters cannot learn from mistakes. They are like basketball players shooting free throws in the dark, not knowing if shots went in or not. Precise scores, like a meteorologist’s Brier score, allow for meaningful evaluation.

So in summary, ambiguous language and long time lags prevent forecasters from getting the clear, prompt feedback needed to improve calibration and reduce overconfidence by learning from errors.
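The Brier score mentioned above has a simple form: for a binary question it is the squared error between the forecast probability and the outcome, summed over both alternatives, so it runs from 0 (a perfect forecast) to 2 (maximal confidence in the wrong outcome). A minimal sketch, using made-up forecasts:

```python
def brier_score(forecast: float, outcome: int) -> float:
    """Two-alternative Brier score for one binary question (outcome 0 or 1).

    Equivalent to (forecast - outcome)**2 + ((1 - forecast) - (1 - outcome))**2.
    0.0 is a perfect forecast; 2.0 is full confidence in the wrong outcome.
    """
    return 2 * (forecast - outcome) ** 2

def mean_brier(forecasts, outcomes):
    """Average Brier score across many questions; lower is better."""
    return sum(brier_score(f, o) for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical record: three 70% calls, two of which came true.
print(mean_brier([0.7, 0.7, 0.7], [1, 1, 0]))  # ~0.447
```

This kind of numeric score is what gives meteorologists, and tournament forecasters, the prompt and unambiguous feedback the passage says most pundits never receive.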

The passage discusses the attitudes and behaviors of superforecasters when re-examining their assumptions and past forecasts. Superforecasters are as keen to evaluate how they can improve as they are to review their past performance. They often have in-depth discussions with teammates to thoroughly analyze forecasts. Additionally, superforecasters reflect deeply on their own about what they could have done better. One forecaster mentioned reviewing old forecasts in the shower to try and understand his past reasoning.

Part of this review process involves being highly self-critical, even when forecasts turn out to be correct. An example is given of forecasters acknowledging that luck played a role in some of their accurate predictions. The passage contrasts this to most experts who are less open to the idea that success was not entirely due to their own skill. Overall, superforecasters show a growth mindset by continually seeking to learn from both successes and failures through rigorous self-examination. Their focus on improvement demonstrates grit, which together with an open mindset helps them progress over time.

The passage compares the decision-making that led to the failed Bay of Pigs invasion of Cuba in 1961 with that of the Cuban Missile Crisis in 1962. It describes how the same team of advisers, led by President Kennedy, badly misjudged the Bay of Pigs plan but then handled the more dangerous Missile Crisis with great skill.

Psychologist Irving Janis’s theory of “groupthink” is discussed. This describes how close-knit groups can develop “shared illusions” that undermine critical thinking. The Bay of Pigs team exhibited groupthink by failing to question flawed assumptions.

After this failure, Kennedy changed the decision-making process. Skepticism was encouraged by having advisers think as generalists rather than specialists. Bobby Kennedy and Theodore Sorensen took on the role of “intellectual watchdogs” who challenged ideas aggressively. Protocol was loosened and outside experts sometimes brought in, with Kennedy occasionally leaving discussions. These changes allowed the same team to avoid groupthink and manage the Missile Crisis competently through open debate of options.

  • Kennedy initially set up decision-making in a way that discouraged true debate and give-and-take when he was present. Allowing his advisers to discuss openly without him led to better consideration of alternatives.

  • The group discussions during the Cuban Missile Crisis were contentious but ultimately helped change Kennedy’s thinking and led to a negotiated solution rather than war.

  • How Kennedy’s team improved decision-making provides lessons for management and public policy, as groups can cause mistakes but also sharpen judgment through collaboration.

  • When forming teams of superforecasters, there were risks of groupthink, “cognitive loafing,” and overconfidence if they were told of their superforecaster status. However, teams on average performed 23% more accurately than individuals.

  • Special superforecaster teams were created with guidance on high-performance teamwork. While there were challenges like potential discord from distance, it also made maintaining independence of thought easier. The experiment sought to determine if “superteams” could achieve even greater accuracy.

  • Marty Rosenthal reflected on his first year on a forecasting team, where people were hesitant to directly disagree or criticize others for fear of causing offense. They would speak indirectly or “dance around” issues.

  • With experience, people realized this hindered critical examination and made efforts to assure others that pushback and criticism was welcome. This made discussions more open and direct.

  • Superforecasting teams had to develop structure and norms even without formal leadership or meeting in person. Marty took on an informal leadership role by example, explaining his views in detail and organizing a call for team coordination.

  • Face-to-face meetings helped foster social connections between team members. Marty hosted teammates at his home which strengthened commitment to the team.

  • Well-performing forecasters boosted their skills and confidence through teamwork. Elaine Rich took more responsibility to contribute actively rather than “freeloading.”

  • While workloads divided, team members invested more effort as their commitment grew. Elaine found it more stimulating than working alone.

  • Teams were highly effective at gathering information through diverse research styles and sharing findings. An individual could not match a team’s coverage.

  • On average, forecasters improved 50% more when placed on superforecasting teams compared to working alone. Teams significantly outperformed prediction markets.

  • Prediction markets beat ordinary groups by about 20% in forecasting accuracy.

  • Superteams, which were highly collaborative groups of top individual forecasters, beat prediction markets by 15-30%.

  • However, critics argue prediction markets may have performed better with more liquidity and real monetary stakes. More testing is needed.

  • Superteams did well by avoiding extremes of groupthink and flame wars, fostering respectful challenge of ideas and admitting ignorance. This promoted actively open-minded thinking.

  • Teams with a higher degree of active open-mindedness (AOM) correlated with greater accuracy. A team’s AOM is an emergent property depending on communication, not just members’ individual AOM.

  • Winning teams fostered a culture of open sharing, with “givers” who generously contributed without expecting immediate returns, improving group behavior and performance over time.

  • While the recipe for success is not simple to replicate, diversity, ability, and information sharing all contributed to superteams’ effectiveness. More research is still needed.

Here are a few leaders who could be considered humble in addition to Gandhi:

  • Nelson Mandela - He promoted national reconciliation in South Africa and was not driven by personal ambition or ego despite leading the anti-apartheid movement.

  • Mother Teresa - She dedicated her life to serving the poor and sick in India in a quiet, modest way focused more on compassion than recognition.

  • Abraham Lincoln - He was not born into wealth or status and carried himself with a folksy, down-to-earth demeanor despite bearing huge responsibilities as President during the Civil War.

The article raises an important point that leaders must balance the roles of forecaster and decision-maker, and allowing autonomy within structure and guidance can enable both roles. Leaders like Moltke recognized the inherent uncertainty in complex situations and emphasized empowering subordinates to adapt, while still providing clear objectives. This approach seems to have enabled the German military to act decisively yet flexibly on a decentralized basis, avoiding both rigid control and lack of coherence.

The passage discusses command and decision-making in the German Wehrmacht during World War II. It notes that Wehrmacht commanders often laid out broad objectives but expected subordinates to use their own judgment in how to achieve them - a principle known as Auftragstaktik.

This decentralized style is contrasted with the strict obedience and micromanagement seen in Nazi propaganda. The successful attack on the Eben Emael fortress in 1940 is used as an example, where junior officers improvised when plans went awry and still achieved objectives.

The passage argues the Wehrmacht’s effectiveness in early victories was partly due to this emphasis on independent thinking by officers and soldiers at all levels. However, it acknowledges the Wehrmacht ultimately failed due to overreliance on Hitler’s poor strategic decisions and micromanagement in violation of delegation principles.

The US military of the era is then compared, noting its discouragement of independent thinking until later adopting mission command principles in the 1980s after studying Israeli and German examples. Key US generals like Eisenhower and Patton who de facto enabled initiative within commands are highlighted.

  • In the Iraqi city of Mosul, General David Petraeus drew on his military history knowledge to develop counterinsurgency strategies to secure the people and deny support for insurgents. He implemented these strategies without asking for permission, knowing he wouldn’t get it.

  • The strategies worked and stabilized Mosul as long as he was in command. In 2007, when insurgency was spreading, Petraeus was given overall command in Iraq and implemented the same strategies nationwide. Violence decreased significantly.

  • Petraeus emphasized developing flexible, independent thinking in commanders. He pushed people out of their intellectual comfort zones, like creating unpredictable live-fire training exercises. He also supported officers getting advanced degrees to encounter different perspectives.

  • Petraeus balanced deliberation and bold action. He saw the false dichotomy between “doers” and “thinkers” - leaders must be both. Flexible thinking is needed to determine the right move and execute it boldly.

  • The concept of mission command, or Auftragstaktik, can be found in innovative organizations like 3M and Amazon that empower employees while setting clear goals. Former military officers bring these concepts when advising companies.

  • While successful leaders often have confidence, true humility is needed for good judgment. Even talented people like poker champion Annie Duke are prone to overconfidence without self-doubt. Humility helps avoid cognitive biases when making complex decisions.

  • The author discusses his long-time conversations with cognitive psychologist Daniel Kahneman about the author’s work on forecasting tournaments.

  • Kahneman questions whether superforecasters are fundamentally different types of people, or if they just employ different analytical approaches/methods. The author’s view is that it’s a bit of both - superforecasters tend to be more intelligent and open-minded, but what truly distinguishes them is the systematic work they put into research, self-criticism, considering multiple perspectives, and updating judgments over time.

  • However, maintaining such an intensive, conscious analytical approach is exhausting. Even the best forecasters are susceptible to slipping back into more intuitive, biased styles of thinking aligned with Kahneman’s concept of “WYSIATI” (“what you see is all there is”) - only seeing what is immediately visible.

  • As an example, the author discusses former head of DIA Michael Flynn making an overbroad claim about unprecedented global conflict despite data showing declining war trends. Flynn fell victim to only seeing the immediate bad news on his desk.

  • The author’s point is that no one is immune to cognitive biases and illusions, likening them to the impossible-to-ignore Müller-Lyer optical illusion. Superforecasters have to constantly monitor their thinking for inadvertent slips back into less rigorous analysis.

  • Daniel Kahneman studied a cognitive bias called “scope insensitivity” where people’s willingness to pay or assess risk does not meaningfully change based on the scope or scale presented. For example, people were equally willing to pay around $10 to clean a single lake or 250,000 lakes.

  • Kahneman collaborated with Barbara Mellers to study if superforecasters showed less scope insensitivity. They tested responses to the likelihood of Assad’s regime falling in Syria over 3 vs 6 months. Regular forecasters showed similar responses, while superforecasters incorporated the different timescales more.

  • The author believes superforecasters have practiced self-corrections so much that these have become habitual, part of their intuitive thinking. However, superforecasting remains difficult work and success is fragile.

  • Nassim Taleb criticizes forecasting efforts as misguided, arguing history is determined by “black swans” - unpredictable, high-impact surprises. The author acknowledges black swans as important but questions whether some events dubbed black swans were actually more foreseeable, or “gray.” Overall, forecasting has value, but unpredictable events impose clear limits on it.

  • The passage discusses critiques of predictability and superforecasting from Daniel Kahneman and Nassim Taleb, who argue that events like black swans are truly unpredictable.

  • However, the author argues that some events labeled black swans, like WWI or 9/11, were actually preceded by awareness of the risks and possible scenarios. Even if such events were highly improbable, forecasting tournaments cannot rule out the ability to anticipate the kinds of consequences that define black swan status.

  • Slow, incremental changes like rising life expectancies are also important to history besides rare shocks. Accurate incremental forecasting, like predicting probabilities, can be meaningful even if big surprises cannot be predicted.

  • Consequences take time to develop, so the full impacts of events like 9/11 that are seen as black swans now were not inevitable. Different decisions could have led to very different consequences.

  • While long-term predictions beyond 5-10 years are limited, planning and considering alternative scenarios can help systems be resilient or antifragile to inevitable surprises beyond forecasters’ horizons. But preparing for all possibilities is costly, so priorities and some forecasting are still needed.

  • Engineering standards that are designed to withstand major earthquakes make sense for regions that are prone to significant seismic activity. However, the same high standards may not be as practical or cost-effective in regions with less risk of major quakes, especially in poorer countries.

  • Long-term military and geopolitical planning requires making explicit judgments about probability and risk. For example, the US policy of preparing for two simultaneous wars was based on an assessment that this risk level justified the costs, but the same calculation did not apply to preparing for three or more wars simultaneously.

  • Probability judgments in planning are often not made explicitly and risks are swept under the rug. It is important to openly discuss probability assessments so they can be scrutinized and improved. When there is significant uncertainty, that uncertainty should be acknowledged rather than pretending greater certainty exists than really does.

  • Forecasts about distant futures, like 20-year geopolitical predictions, are prone to error because humans have a psychological bias toward seeing both the past and the future as more predictable than they actually are. We also tend to ignore low-probability “black swan” events that have outsized impacts.

  • Geopolitical and historical possibilities have a “fat-tailed” probability distribution, with more potential for extreme or unexpected outcomes than standard models assume. Key historical events like the world wars could have played out very differently depending on small initial changes or probabilities.

  • The Scottish independence referendum in September 2014 saw Scotland vote to remain part of the UK rather than becoming independent, defeating the “Yes” campaign by a surprising margin of 55.3% to 44.7%.

  • Political scientist Daniel Drezner said the result was a “teachable moment” that showed pundits need to make unambiguous predictions that can be scored, rather than vague analyses, so they can accurately assess what they got right or wrong.

  • Tournaments like the IARPA forecasting competition help improve forecasting by providing clear feedback through scoring predictions, which allows forecasters to refine their models based on outcomes. This could lead to an “evidence-based forecasting” approach like evidence-based medicine.

  • However, forecasting is also impacted by political and partisan interests, as illustrated by different reactions to Nate Silver’s predictions depending on whether they supported a partisan view. Some argue accuracy is secondary to advancing political interests.

  • The book argues while self-interest is a factor, people also value other things like accuracy, and change is possible if the “attentive public” demands better forecasting. An example of early efforts to introduce accountability in medicine through outcome tracking is provided.

  • Ernest Codman pioneered the idea of tracking clinical outcomes and doctor performance at hospitals to improve quality of care. He proposed judging doctors solely based on patient outcomes rather than reputation or bedside manner.

  • Hospitals and doctors strongly opposed this at first. Keeping score could damage reputations. Codman was eventually forced out of his position for pushing the idea too aggressively.

  • Over time, as the benefits became clearer, Codman’s core insight gained acceptance. Hospitals now routinely track outcomes and quality measures. Evidence-based medicine became the standard approach.

  • This success inspired the expansion of evidence-based thinking to other fields like government policymaking, charitable foundations, and sports. Rigorous data analysis and evaluation are used to improve performance and determine what programs are effective.

  • Technological advances now make data collection and testing much easier across many domains of human activity. There is a broad shift away from intuition-based decision making toward evidence-based, data-driven approaches.

  • However, some argue this trend risks overstating the role of metrics and numbers. Not all important factors can be quantitatively measured, and an overreliance on metrics could undermine important qualitative judgments. Balance is needed in applying evidence-based thinking.

  • Numerical scoring systems for forecast accuracy, like Brier scoring, are useful tools but still have room for improvement. They may not properly weight different types of errors.

  • The “big questions” that truly matter, like “How will a geopolitical situation unfold?”, cannot be easily scored numerically. But they can be broken down into smaller, more specific questions that contribute pieces to understanding the overall situation.

  • By asking many narrowly-focused questions about relevant events and having forecasters make predictions, it is possible to gain cumulative insight into answering the larger, unscorable questions over time. This is likened to the pointillism technique in art.

  • While accurate forecasting is important, good judgment requires more dimensions like moral reasoning and asking insightful questions. Identifying the right questions to consider may be just as important as making predictions.

  • The ideal qualities for a “superforecaster” who is highly accurate in predictions may differ from those of a “superquestioner” who excels at generating thought-provoking questions. Both roles are valuable but should acknowledge each other’s strengths.

  • The author argues that debates around major policy issues, like Keynesian vs Austerian responses to the 2008 financial crisis, have often failed to have a productive exchange of ideas or lessons learned. Participants tend to stubbornly defend their original positions rather than evaluating forecasts against real world outcomes.

  • He proposes a model of “adversarial collaboration” where opposing sides work together, with moderators, to precisely define their disagreements in testable forecasts. Clear questions could then evaluate the forecasts against results, providing an opportunity for views to evolve based on evidence.

  • While celebrity debaters may not engage in good faith tests, there are likely reasonable voices within ideological camps that could participate productively. Making score-keeping serious could help bring lessons from empirical results rather than just rhetorical defense of preconceptions.

  • The goal is not for either side to “win” but for collective understanding to improve through an open and evidence-based process of reconciling perspectives with reality. Even mixed outcomes from multiple forecasts could indicate complex realities rather than simple dichotomies.

  • There is no guarantee that a great batter will continue to have success at the plate indefinitely. Success depends not just on the quality of pitching but also luck and going through inevitable slumps.

  • Stats tell us that after periods of exceptional performance, regression to the mean is likely. The better batter acknowledges that luck played a role in his past success and that luck may turn against him, hurting his numbers.

  • Remaining open-minded and willing to modify predictions based on feedback is important for improving forecasting skills over time. Taking risks with forecasts but also being cautious allows one to learn from mistakes.

  • The described approach of trying, failing, analyzing mistakes, adjusting forecasts and trying again is seen as the ideal process for a “superforecaster” to constantly refine their abilities. Being in a perpetual state of learning from experiences, both successes and failures, is what allows one to become truly excellent at forecasting complex outcomes.
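The regression-to-the-mean point can be shown with a toy simulation (the skill level, roster size, and at-bat counts below are illustrative assumptions): give every batter the same true hit probability, then watch the first-half "stars" fall back toward the average in the second half, because their early edge was pure luck.

```python
import random

random.seed(0)
TRUE_SKILL = 0.27          # every batter's real hit probability (assumed)
BATTERS, AT_BATS = 1000, 100

def batting_average(n_at_bats: int) -> float:
    """Simulate n at-bats and return the fraction that were hits."""
    return sum(random.random() < TRUE_SKILL for _ in range(n_at_bats)) / n_at_bats

first_half = [batting_average(AT_BATS) for _ in range(BATTERS)]
second_half = [batting_average(AT_BATS) for _ in range(BATTERS)]

# Pick the "stars": the top 5% of batters in the first half.
stars = sorted(range(BATTERS), key=lambda i: first_half[i], reverse=True)[:50]
star_then = sum(first_half[i] for i in stars) / len(stars)
star_now = sum(second_half[i] for i in stars) / len(stars)

print(f"stars, first half:  {star_then:.3f}")   # well above .270: skill plus luck
print(f"stars, second half: {star_now:.3f}")    # back near .270: the luck washed out
```

Nothing about the stars changed between halves; only the luck that selected them failed to repeat.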

  • The passage advocates adopting a nuanced, graded view of uncertainty rather than seeing things in binary terms of certain vs impossible. Distinguishing different degrees of uncertainty can improve forecasting accuracy.

  • It encourages applying rigorous probabilistic thinking to national security decisions, as is routinely done for sports predictions. Specific numeric probabilities force clearer and more evidence-based assessments.

  • Superforecasters aim to balance caution and decisiveness by qualifying forecasts enough to avoid carelessness but also taking definitive positions rather than endless hedging.

  • Self-critique and post-mortems of failures and successes are important for identifying weaknesses and avoiding overconfidence born of luck. But hindsight bias should also be guarded against.

  • Collaboration, allowing different perspectives, and constructive challenge can strengthen forecasting, though leadership requires finesse in group dynamics.

  • Developing forecasting skills requires deliberate practice with feedback, not just casually engaging with information.

  • Guidelines rather than rigid rules are appropriate given no two situations are identical. Constant mindfulness is needed in applying principles flexibly.

  • The passage acknowledges the sometimes astonishing patience and perseverance of the author’s wife Sandra and mother June, who were necessary for his participation in writing the book.

  • It also notes “the Queen is the Queen, dammit. Long may she reign.” This appears to be acknowledging Queen Elizabeth II as the monarch of the United Kingdom.

  • The main points are thanking the author’s wife and mother for their support in allowing him to write the book, and also acknowledging the British monarch. The language used is informal and colloquial.

Here is a 308-word summary of the key points:

Leadership and political outcomes could have turned out very differently under alternative scenarios. However, evaluating a president’s full term based on different hypothetical situations would require expertise far beyond what most people, including expert journalists, possess.

Instead of directly judging a president’s performance, voters tend to use proxies like the state of the economy in the months before the election. If voters feel the country and local economies have been trending positively recently, they are more likely to approve of the incumbent party and president, even if conditions varied throughout their full term. Voters essentially replace the question “Did the president do a good job over the past four years?” with “Do I feel the country has been roughly on the right track over the last six months?”

While voters may not explicitly acknowledge this, research shows many intuitively substitute these proxy questions instead of critically examining a leader’s full multi-year record. Their evaluation is swayed more by their own recent pocketbook experiences than by attempting to assess the entirety of complex national issues and decisions over a president’s term. As a result, election outcomes can depend more on short-term fluctuations than on comprehensively evaluating long-term leadership and policy impacts.

  • The passage discusses whether superforecasters may possess some unique cognitive abilities or traits that make them better forecasters compared to others.

  • It notes that superforecasters seem to treat forecasting as a skill that can be cultivated through practice, whereas intelligence analysts do not see prediction as their main goal.

  • Some superforecasters appear comfortable admitting they could be wrong in their judgments and taking an outside view rather than an overconfident inside view.

  • However, precisely estimating the ratio of skill versus luck for superforecasters is difficult. While active superforecasters often outperformed, their skill/luck ratio is uncertain and likely depends on factors like the question topics.

  • Overall, the passage questions whether superforecasters are truly “supersmart” or just treat forecasting as a learnable practice compared to other analysts. More research would be needed to understand the cognitive factors behind their success.

  • Government wants to maintain relations with moderate Islamists who could act as intermediaries between the government and terrorist groups like Boko Haram. Boko Haram may also be interested in at least appearing open to negotiations. However, the analyst balanced this against Boko Haram’s ferocity.

  • The analyst initially estimated the chances of successful negotiations at 30% based on outside-view factors. They then averaged this with an inside view accounting for differing perspectives, yielding a 25% chance. The estimate was scheduled to decline as the deadline approached.

  • The analysis resulted in a top 10% score on forecasting the outcome, despite rumors of pending talks that misled some other forecasters.

  • Superforecaster Regina Joseph applied a similar approach to forecasting another bird flu outbreak in China, starting with the base rate from outside data but adjusting based on inside factors like public health policies. This led to a better than average but not spectacular score.

  • Former military officer Welton Chang used outside data on how long urban battles typically last to set a base probability of 10-20% for a rebel group taking Aleppo. He then adjusted down based on the rebel group’s actual military capabilities, achieving a top 5% forecast score.

  • Even crude guesses can yield fairly good forecasts when one overtly considers both the outside-view base rate and inside-view adjustments, rather than letting intuition drive judgments covertly.
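The blend-and-decay procedure the Boko Haram forecast describes can be sketched as follows. The 30% outside view and the implied 20% inside view come from the passage; the linear decay schedule and the 90-day window are illustrative assumptions, not the analyst's actual method:

```python
def blended_forecast(outside_view: float, inside_view: float) -> float:
    """Average the outside-view base rate with the inside-view estimate."""
    return (outside_view + inside_view) / 2

def decayed(prob: float, days_left: int, horizon_days: int) -> float:
    """Scale the probability down linearly as the deadline approaches:
    an event that hasn't happened yet has less time left to happen.
    (Linear decay is an illustrative choice, not the analyst's rule.)"""
    return prob * days_left / horizon_days

p0 = blended_forecast(0.30, 0.20)        # the 25% starting estimate
print(round(p0, 2))                      # 0.25
print(round(decayed(p0, 30, 90), 3))     # a third of the window left: 0.083
```

The point is not the particular decay curve but that the adjustment is explicit and scheduled, rather than left to covert intuition.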

Here is a summary of the opinion article “What You Don’t Know Makes You Nervous” by Barry Schwartz:

  • The article discusses how uncertainty and lack of information can breed anxiety and poor decision making. When people are uncertain, they tend to exaggerate risks and threats.

  • Research on risk perception shows that what people perceive as risky is influenced more by dread of the worst possible outcome than objective analysis of probabilities. Uncertain situations trigger fear since people’s imagination runs wild contemplating all the possible bad outcomes.

  • The article cites examples like how people highly overestimate terrorism risks after 9/11 due to lack of information and control. German citizens also misinterpret weather probability forecasts due to uncertainty.

  • To combat this, the article argues people need more accurate information and understanding of uncertainties. When provided context and frequencies, people make less extreme and alarmist judgments. Education can counteract the “dilution effect” of uncertainty on risk perception.

  • However, complete certainty and information is impossible. The world is inherently uncertain. So managing perceptions and providing rational perspectives on uncertainty is important for decision making under risk. Too much uncertainty breeds unnecessary fear and dysfunction.

Here are summaries of the two papers:

“Do Stock Prices Move Too Much to Be Justified by Subsequent Changes in Dividends?,” National Bureau of Economic Research Working Paper no. 456, 1980:

  • Examines whether short-term stock price movements can be justified by subsequent changes in dividends.

  • Finds that stock prices change much more than can be explained based on available information about dividends.

  • Suggests stock prices fluctuate more than actual firm performance would warrant, implying other non-rational factors may be influencing prices.

“Do Investors Trade Too Much?,” American Economic Review 89, no. 5 (1999): 1279–98:

  • Examines whether individual investors would be better off holding diversified portfolios rather than actively trading.

  • Finds individual investors trade more frequently than can be justified by expected returns and transaction costs, implying overconfidence in their ability to pick stocks.

  • Concludes the level of trading by individual investors means their returns are lower than simply buying and holding a balanced portfolio. Frequent trading does not appear to be a winning strategy.

  • Even purported “superforecasters” are subject to cognitive biases and limits in predicting future events, especially rare or “black swan” events that are literally unpredictable. Overconfidence in predictive abilities is common.

  • Geopolitical and economic trends are highly complex and nonlinear, influenced by countless unknown unknowns. Accurately predicting specific developments decades into the future is virtually impossible.

  • Historical data and base rates should inform forecasts, but each situation has unique aspects that complicate direct analogy to the past. Pinpoint predictions of wars, economic growth rates, and technological disruptions often prove wrong in retrospect.

  • While expertise and analytical skills may enhance judgment at the margins, humility is needed given stochastic uncertainties. Outside reviewers often spot oversights insiders miss. Prudent policy balancing multiple scenarios is difficult but preferable to fixating on a single predicted future.

In summary, no forecaster or forecasting group is truly “super” in the sense of consistently achieving highly accurate predictions, especially of low-probability events. Rigorous outside review of methods and probabilities is important to prevent overconfidence outstripping predictive abilities.

Here is a summary of the key points from the passage:

  • Nassim Taleb posed a tough question about how to assess the accuracy of forecasters on rare events that only occur once every few decades. It is difficult to evaluate forecasts of such rare events.

  • Two approaches are suggested to attempt to address this problem:

    • Assess forecaster consistency in estimating likelihoods of rare events over different time periods, like injury risks in 1 vs 10 years. Inconsistencies fail a basic logic test.
    • Develop early warning indicators for rare mega-events and assess accuracy on those indicators to identify better guides for historical trajectories.
  • Neither approach fully solves the problem Taleb posed, but they are better than nothing and better than an “extreme precautionary principle” that rejects any change based on plausible worst-case scenarios.

  • The passage discusses the difficulty of evaluating forecasts of rare events that occur infrequently, and puts forth two potential approaches to provide some way to assess forecaster accuracy and consistency on such predictions, even if an ideal solution is not possible.

Here is a summary of the Wall Street Journal article from February 14, 2015:

  • The article discusses how President Obama seems to speak with more precision and confidence about probabilities in sports compared to national security matters like the raid to kill Osama Bin Laden.

  • It raises three points about this: 1) Sports outcomes recur frequently, so probabilities rest on rich data, while national security judgments may be “plucked from thin air.” 2) Whether Obama is equally precise about repetitive national security events like drone strikes. 3) Whether policymakers would take probability judgments seriously in less repeatable domains if they knew the judgments could be reliable.

  • It notes testing the reliability of predictions for unique events is impossible if policymakers don’t collect or evaluate such predictions.

  • It quotes a passage from a 1986 book by Tommy Lasorda and David Fisher about the role of creativity, taking risks, and balancing possible “hits” and “false positives” when imagining unprecedented scenarios.

  • It provides brief biographies of the article’s authors, Philip Tetlock and Dan Gardner, noting their relevant publications and areas of research/writing.
