
How to Lie with Statistics - Huff, Darrell


Matheus Puppe


“If you liked the book, you can purchase it using the links in the description below. By buying through these links, you contribute to the blog without paying any extra, as we receive a small commission. This helps us bring more quality content to you!”



Here is a summary of the key points from the excerpt:

  • The article examines a statistic stating the average Yale graduate from the class of 1924 earns $25,111 per year.

  • This impressive, precise number seems dubious. Incomes are rarely known down to the dollar, and an income of about $25,000 usually includes investment returns, not just salary.

  • The number likely comes from a questionnaire asking Yale alums to self-report income. People may exaggerate or downplay their income on surveys.

  • The statistic is based on a sample, since not all alumni responded. The sample is biased because “lost” lower-earning alumni are underrepresented.

  • Higher earning alums are easier to find through resources like Who’s Who. The lower earners like clerks and unemployed are more likely to be “lost.”

  • The sample overrepresents the higher-earning Yale grads, skewing the average income figure upward. The actual average income is likely much lower.

  • The article cautions that statistics from biased or too-small samples underlie much of what we read, though they can appear precise and scientific.

  • The article examines the limitations and potential biases of statistical sampling. It uses the example of a Yale class’s reported average income to illustrate how samples can be biased by who chooses to respond. Those with lower incomes are less likely to respond.

  • The article warns against taking statistics like average tooth brushing frequency at face value, as people may not respond honestly to surveys. It gives the example of a study on magazine readership where people claimed to read more highbrow magazines than popular ones, which was contradicted by publishers’ circulation data.

  • Similarly, data on improvements in cancer survival rates can be biased by how the sample is selected. The article advises applying skepticism when considering statistics based on sampling.

  • Truly representative samples require random sampling with no biases. But perfectly random sampling is difficult and costly. So techniques like stratified sampling are used instead, which can still introduce biases.

  • Opinion polls and market research surveys rarely achieve perfect random samples. Factors like how respondents are selected and interviewed can skew the sample. Pollsters try to minimize biases but some inevitably remain.

  • Critical evaluation of the sampling method and potential biases is essential when interpreting statistics from polls and surveys. Broad conclusions should be drawn only after questioning the representativeness of the sample.

  • Different averages (mean, median, mode) can give very different pictures of the same data. The “average” income in the book’s example is misleading because the figure quoted switches from one type of average to another.

  • Mean, median, and mode give similar results for data like heights that follow a standard bell curve distribution. But for skewed data like incomes, they can differ significantly.

  • With skewed income data, the mean is pulled up by a few very high incomes. So the mean can be much higher than the median, which represents the middle of the distribution.

  • When statistics like “average income” are given without qualification, it is often the misleading mean. The median or mode would provide a more typical picture.

  • Business executives often cite misleading high “average” wages or pay, inflated by their salaries. The median or mode would show more typical lower income.

  • “Average” has a loose meaning and can be misleading. To interpret averages correctly, you must know which type is used and whether the data is skewed. Qualifiers like “mean” or “median” should always be included.

  • Be skeptical of claims based on small or inadequate sample sizes. A test group of just 12 people, like the toothpaste example, is statistically insignificant and the results are likely due to chance.

  • Small samples produce variable results. Repeating an experiment with a small sample will yield different results each time by chance. Large samples are needed to produce reliable results that reflect the actual probability.

  • The number of trials needed depends on the size and variability of the population studied. Medical studies often use sample sizes too small to produce meaningful results.

  • Results can be misleading even with large samples if the actual number of relevant cases is small, as in the polio vaccine example, where the disease incidence was so low that even a large group yielded too few cases to show an effect.

  • Public pressure for quick solutions can lead to the adoption of unproven treatments before adequate testing. Skepticism is needed when sample sizes are small or unclear. The full statistical context must be examined, not just the claim.

  • Statistics can be misleading if important details are omitted, like measures of significance and range around an average. Leaving these out can make findings seem more definitive than they are.

  • Averages can be over-applied, as with housing built for the “average” family size of 3.6 people. This neglects the diversity of actual family sizes.

  • Normal ranges or averages for child development can worry parents unnecessarily if their child falls slightly outside the range. The full distribution should be shown.

  • Reporting an average or prevalence can make something seem excellent or desirable, when the researcher meant it as a neutral observation (e.g. Kinsey).

  • Journalists often report figures uncritically without digging into their meaning. Examples are given of vague claims that collapse under scrutiny.

  • Child height prediction charts are useless for predicting a child’s height. Better to look at parents/grandparents.

  • Statistics should be reported carefully. Essential details like significance and distributions are needed to interpret them correctly. When they are missing, the figure deserves skepticism.

  • Intelligence tests like IQ are imperfect measures of intelligence. They neglect essential capacities like leadership and creativity.

  • IQ scores have a margin of error, usually around 3 points. So two scores a few points apart may not reflect any real difference.

  • You should think of IQ scores in ranges, not absolute numbers. Comparing a child in the “normal” range of 90-110 to one outside that range may be meaningful. But slight differences within that range are not.

  • Ignoring margins of error in statistics can lead to faulty conclusions. Magazine editors may make poor decisions based on minor readership survey differences that fall within the margin of error.

  • Advertisers like Old Gold cigarettes can create misleading ads based on tiny meaningless differences in product tests. Overall the brands tested were virtually identical in their content. But Old Gold exploited being at the bottom of the list ever so slightly.

  • The critical point is that minor statistical differences are often meaningless, even though numbers may appear precise. Consider the margin of error and whether a difference makes a difference before concluding.

  • Humpty Dumpty was confident in his mastery of words, but few people feel the same confidence with numbers. This can make it hard for writers and advertisers to convey numerical information effectively.

  • Charts and graphs can help visualize statistics but can also mislead if designed poorly or manipulatively. Examples are given of truncated axes and altered proportions.

  • Pictorial charts, which replace bars with pictures, can also deceive. Doubling the height of an image while keeping its proportions quadruples its area, exaggerating the visual difference. This technique has been used misleadingly with moneybags representing wages.

  • Overall, numbers and statistical charts should be presented honestly and in proper proportion to convey accurate information. But they are often manipulated to exaggerate trends and mislead readers. Critical examination is required.

  • The “semiattached figure” refers to statistics or other numbers that are presented in a misleading way to prove a point. The numbers may be accurate, but are not truly relevant to the argument.

  • Examples given include:

  1. Advertising a cold remedy by citing a laboratory test showing it kills germs, without evidence it cures colds.

  2. Citing poll results on black job opportunities that reveal more about prejudice than job conditions.

  3. Advertising that more doctors smoke a particular cigarette brand proves nothing about the brand’s quality.

  4. Advertising a juicer extracts 26% more juice than a hand reamer, an irrelevant comparison.

  5. Citing highway accident statistics without considering how many drivers there are at different times of the day.

  6. Comparing transportation fatality numbers over time without considering rates per passenger mile.

  • The point is you can make misleading arguments by attaching semi-relevant numbers. Proper interpretation requires understanding what the numbers represent.
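To make the earlier point about averages concrete, here is a minimal sketch in Python. The income figures are hypothetical, invented only to produce a skewed distribution; they are not taken from the book.

```python
from statistics import mean, median, mode

# Hypothetical annual incomes at a small firm (assumed values, for illustration only):
# many modest salaries plus a handful of very large ones.
incomes = [20_000] * 12 + [30_000] * 6 + [50_000] * 4 + [100_000, 150_000, 450_000]

avg_mean = mean(incomes)      # arithmetic average, pulled up by the top earners
avg_median = median(incomes)  # middle value: half earn more, half earn less
avg_mode = mode(incomes)      # most common value

print(f"mean:   {avg_mean:,.0f}")
print(f"median: {avg_median:,.0f}")
print(f"mode:   {avg_mode:,.0f}")
```

All three numbers can honestly be called "the average", yet in this made-up data the mean is well above what most people in the list actually earn.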

A few further takeaways on statistics that seem attached to a claim but aren’t:

  • Rates and percentages can be more meaningful than raw numbers for comparison. For example, looking at transportation fatalities per passenger mile traveled rather than total fatalities.

  • Be wary of surveys or polls that lump together minor complaints as evidence of widespread opposition. The phrasing and grouping can distort the real meaning.

  • Returns on investment versus returns on sales are very different figures. It’s essential to understand the difference.

  • Inconsistent reporting and definitions can skew medical and health statistics over time or between locations.

  • Comparing groups like the general population to the Navy is problematic because they differ systematically in age, health, etc.

  • Changes in diagnosis and reporting can make trends seem worse than they are, as with the polio cases. Looking at deaths instead of cases may offer more reliable data.

  • Political campaigns misuse stats, like presenting cherry-picked before-and-after numbers, often ignoring external factors.

The key is to dig deeper behind the surface numbers and understand how they were produced and what they mean in context. Statistics can inform, but also mislead if not approached thoughtfully.
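The point about rates versus raw numbers can be sketched with a short calculation. The two transport modes and all figures below are hypothetical, chosen only to show how the ranking can flip once exposure is taken into account.

```python
# Hypothetical figures, invented purely for illustration.
modes = {
    "mode A": {"deaths": 1_000, "passenger_miles": 500_000_000_000},
    "mode B": {"deaths": 100, "passenger_miles": 10_000_000_000},
}


def deaths_per_100m_miles(deaths, passenger_miles):
    """Return fatalities per 100 million passenger miles."""
    return deaths / (passenger_miles / 100_000_000)


rates = {name: deaths_per_100m_miles(**data) for name, data in modes.items()}

for name, rate in rates.items():
    total = modes[name]["deaths"]
    print(f"{name}: {total} deaths in total, {rate:.2f} per 100M passenger miles")
```

In this made-up data, mode A has ten times as many deaths in total, but mode B is five times as dangerous per mile traveled.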

The passage discusses the post hoc ergo propter hoc fallacy, which involves making unsupported assumptions of causation based solely on the observation that one thing follows another.

The key points are:

  • Correlation does not imply causation. Just because two things are correlated does not mean one caused the other.

  • Spurious correlations can occur due to chance or a third factor influencing both variables.

  • Correlations may be accurate but the cause-effect relationship remains unclear.

  • Correlations should not be assumed to continue beyond the data.

  • The passage provides several examples of the post hoc fallacy, including claims about education increasing income.

  • Statistics may show a real correlation but still not prove causation. Assumptions of causation require scrutiny.

  • The passage overall warns against making unsupported causal conclusions based on correlations alone. Care is needed to avoid the post hoc fallacy.
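One way to see how a third factor can produce a correlation is a small simulation. This is only a sketch under stated assumptions: the variables `x`, `y`, and the hidden factor `z` are synthetic, and the noise levels are arbitrary choices, not anything from the book.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sqrt(sum((a - mx) ** 2 for a in xs))
    sy = sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2_000

z = [rng.gauss(0, 1) for _ in range(n)]    # hidden third factor (synthetic)
x = [zi + rng.gauss(0, 0.5) for zi in z]   # depends on z only, never on y
y = [zi + rng.gauss(0, 0.5) for zi in z]   # depends on z only, never on x

r_xy = pearson(x, y)

# Subtract the hidden factor's contribution and correlate what is left.
x_rest = [a - c for a, c in zip(x, z)]
y_rest = [b - c for b, c in zip(y, z)]
r_rest = pearson(x_rest, y_rest)

print(f"correlation of x and y:            {r_xy:.2f}")
print(f"after removing the hidden factor:  {r_rest:.2f}")
```

Here `x` and `y` are strongly correlated even though neither influences the other; once the shared factor is removed, the correlation largely disappears.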

Some further key points:

  • The article discusses how statistical data can be misrepresented or manipulated to support misleading conclusions. Examples are given, such as spurious correlations between milk drinking and cancer rates.

  • The article cautions against assuming causation from correlation and warns about distortions from things like maps that visually exaggerate findings.

  • There is a discussion of how statistical manipulation is often not accidental, with biases leading to one-sided errors. Special interests may selectively promote statistics that favor their position.

  • An example is given of a map exaggerating government spending by shading low-population Western states, which visually overstates the impact. Better methods are suggested.

  • Another example involves a study exaggerating average family income using a faulty calculation method.

  • Overall, the article advises scrutiny of statistical claims, as misinformation or “statisticulation” is common, whether it stems from incompetence or intent to deceive. The integrity of the statistical source and methodology matters.

  • Statistics can be misleading if not presented carefully. Averages can hide details and precision can appear more accurate than it is.

  • Percentages are often misused or misunderstood. The base they are calculated on must be clear. Adding percentages together is usually invalid because they do not share a common base.

  • Comparisons using different bases, like pay cuts and raises, can be deceptive. The whole context needs to be considered.

  • Decimal places and percentiles can create a false sense of accuracy. The original data may not justify such precision.

  • Percentage points and percentages are often confused, which obscures the scale of a change. Percentiles measure rank but can be misread as percentages.

  • Examples of errors and misuse of statistics by media, businesses, and others are given. Common mistakes include invalid adding of percentages, mixing bases, false precision and confusion over statistical concepts.

  • The piece advises care in handling statistics to avoid deception or exaggeration, whether intentional or not. Context and clarity are essential.
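The percentage pitfalls above are easy to check with arithmetic. The sketch below uses hypothetical numbers (a wage of 100, costs of 60 and 40, a rate moving from 3% to 6%) purely for illustration.

```python
# 1. Shifting bases: a 50% cut followed by a 50% raise does not restore the wage.
wage = 100.0
after_cut = wage * 0.5          # 50% of the ORIGINAL wage is removed
after_raise = after_cut * 1.5   # 50% of the REDUCED wage is added back
print(f"wage after cut and raise: {after_raise}")

# 2. Adding percentages: two cost items each rise 10%; the total rises 10%, not 20%.
materials, labor = 60, 40
new_total = materials * 110 // 100 + labor * 110 // 100
total_increase_pct = (new_total - (materials + labor)) / (materials + labor) * 100
print(f"total cost increase: {total_increase_pct}%")

# 3. Percentage points vs. percent: a rate moving from 3% to 6%.
old_rate_pct, new_rate_pct = 3, 6
point_change = new_rate_pct - old_rate_pct                       # percentage points
relative_change = (new_rate_pct - old_rate_pct) / old_rate_pct * 100  # percent
print(f"change: {point_change} percentage points, or {relative_change}% in relative terms")
```

The same change from 3% to 6% can be reported as "up 3 points" or "up 100 percent", which is why the base always needs to be stated.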

Here are a few key points to summarize from the passage:

  • Statistics can be misleading due to bias, both conscious and unconscious. Look for who is providing the statistics and consider their motives.

  • Question the methodology behind the numbers. How was the data collected and analyzed? Is the sample representative?

  • Be wary of impressive, vague assertions like “studies show” or “experts say.” Ask for specifics on who conducted the study and how.

  • Check if conclusions follow logically from the data presented. Don’t assume causation from correlation.

  • Watch for shifting definitions, ambiguous wording, convenient comparisons, and other techniques that can distort the story behind the numbers.

  • Proper use of statistics requires objectivity and transparency. But statistics can also be misused to push particular agendas or perspectives. Maintain a healthy skepticism.

  • With a critical eye and a few key questions, you can better evaluate the validity and honesty of statistical claims. This allows you to make more informed decisions rather than being swayed by numbers alone.


  • When evaluating statistics, watch out for samples too small to yield reliable conclusions. Correlations also need enough cases to be significant.

  • Key information is sometimes left out, like the number of cases or a measure of reliability for a correlation. This should raise suspicion.

  • Averages can be misleading if the type is not specified, since the mean and the median may differ significantly.

  • Figures often need a comparison point to retain meaning.

  • Raw numbers are sometimes omitted when percentages are given, which can distort the picture.

  • Changes over time may be due to other factors, not the one implied. The influencing factors may be left out.

  • Indexes and percentages can mislead if the base figures are not provided or are selectively chosen.

  • Increases in reported cases of something are not always actual increases in occurrences.

  • Subjects are sometimes changed between raw figures and conclusions drawn from them.

  • What people say is not always the same as actual behavior, especially for sensitive topics.

  • Definitions and criteria may change between one measurement and another.

  • People may misreport information like ages.

  • The motivation for providing information, like a census, can influence results.

So, watch out for these issues when evaluating the meaning of statistics. Essential information is often left out or subject to misinterpretation.
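The first item on that checklist, samples too small to be reliable, can be illustrated with a coin-toss simulation. This is a sketch with assumed settings (a fair coin, 200 repetitions, sample sizes of 10 and 1,000); none of these numbers come from the book.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable


def heads_share(n_tosses):
    """Share of heads in n_tosses of a simulated fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses)) / n_tosses


def spread(n_tosses, repeats=200):
    """Lowest and highest share of heads seen across repeated experiments."""
    shares = [heads_share(n_tosses) for _ in range(repeats)]
    return min(shares), max(shares)


small_lo, small_hi = spread(10)
large_lo, large_hi = spread(1_000)

print(f"10 tosses per experiment:    heads share ranged {small_lo:.2f} to {small_hi:.2f}")
print(f"1,000 tosses per experiment: heads share ranged {large_lo:.2f} to {large_hi:.2f}")
```

With only 10 tosses, individual experiments can land far from 50%, so a single small trial proves very little; with 1,000 tosses the results cluster tightly around the true value.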

  • The post first points out how statistics can be manipulated by changing the subject matter being compared. It gives examples of how companies and publications do this.

  • It then discusses how interest rates can be misleading if the loan terms are unclear. 6% interest can cost vastly different amounts depending on how it is calculated.

  • The post criticizes using semantics to change the subject and make something sound better, giving an example of accountants advocating to replace the term “surplus” with “retained earnings.”

  • It emphasizes using common sense and critical thinking when evaluating statistics rather than taking impressive figures at face value. Examples are given of statistics that contradict logic and real-world observations.

  • The post warns about extrapolating trends too far into the future, as trends rarely continue indefinitely without change. It gives absurd examples of extrapolating TV ownership and population growth.

  • The critical point is that statistics can be misleading if not considered carefully and skeptically. The post advocates applying logic and critical thinking rather than unthinkingly accepting numbers.

  • The passage humorously calculates that, based on the rate at which the Lower Mississippi River is shortening (1 1/3 miles per year), it must have been over 1 million miles long just 1 million years ago.

  • It then extrapolates that 742 years from now, the river will only be 1 3/4 miles long, with New Orleans and Cairo joined under one mayor.

  • The author remarks that science allows people to make sweeping conjectures and interpretations based on limited facts.

  • The anecdote about the author’s stock response to critical letters demonstrates providing a vague, polite reply that satisfies without encouraging further correspondence.

  • Similarly, the anecdote about the minister’s generic praise of babies highlights making positive yet noncommittal comments.

The passage uses humor and hyperbole to satirize making grand extrapolations from limited data and illustrates the effectiveness of vague, polite responses in certain situations.
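The river arithmetic can be reproduced as a straight-line extrapolation. In this sketch the rate (1 1/3 miles per year), the 742-year horizon, and the 1 3/4-mile result are the figures quoted above; the present-day length is an assumption, back-solved from those figures rather than taken from any survey.

```python
from fractions import Fraction

RATE = Fraction(4, 3)           # miles of shortening per year, as quoted above
FUTURE_YEARS = 742              # horizon quoted above
FUTURE_LENGTH = Fraction(7, 4)  # 1 3/4 miles, as quoted above

# Assumption: the present length is whatever makes the quoted figures consistent.
present_length = FUTURE_LENGTH + RATE * FUTURE_YEARS


def length_in(years_from_now):
    """River length under naive linear extrapolation (negative years = the past)."""
    return present_length - RATE * years_from_now


print(f"implied present length: {float(present_length):,.1f} miles")
print(f"a million years ago:    {float(length_in(-1_000_000)):,.0f} miles")
print(f"in 742 years:           {float(length_in(742)):.2f} miles")
print(f"in 744 years:           {float(length_in(744)):.2f} miles")
```

The line fits the observed stretch of years well enough, but pushed far outside that range it yields a river over a million miles long in the past and one of negative length shortly after the 742-year mark, which is the joke.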
