The Art of Statistics – David Spiegelhalter (by Blinkist)

What’s in it for me? Improve your data literacy and learn to see the agenda behind the numbers.

You might think that, with the growing availability of data and user-friendly statistical software to do the mathematical heavy lifting for you, there’s less need for training in statistical methods.

But the ease with which data can now be accessed and analyzed has led to a rise in the use of statistical figures and graphics as a means of furnishing supposedly objective evidence for claims. Today, it’s not just scientists who make use of statistics as evidence, but also political campaigns, advertisements, and the media. As statistics are separated from their scientific basis, their role is changing to persuade rather than to inform.

And the people generating such statistical claims are not necessarily trained in statistical methods. An increasingly diverse range of sources produces and distributes statistics with very little oversight to ensure their reliability. Even when data is produced by scientists undertaking research, errors and distortions of statistical claims can occur at any point in the cycle – from flaws in the research itself to misrepresentations by the media and the public.

So, in today’s world, data literacy has become invaluable in order to accurately evaluate the credibility of the myriad news stories, social media posts, and arguments that use statistics as evidence. These blinks will give you all the tools you need to better assess the statistics you encounter on a daily basis.

In these blinks, you’ll learn

how statistics can be used to catch serial killers;
whether drinking alcohol is good for your health or not; and
which remarkable creature can respond to human emotions even after it has died.

Statistics can help us answer questions about the world.

Have you ever wondered what statisticians actually do?

To many, statistics is an esoteric branch of mathematics, only slightly more interesting than the others because it makes use of pictures.

But today, the mathematical side of statistics is considered only one component of the discipline. Statistics deals with the entire life cycle of data, which has five stages that can be summarized by the acronym PPDAC: Problem, Plan, Data, Analysis, and Conclusion. The job of a statistician is to identify a problem, design a plan to solve it, gather the relevant data, analyze it, and draw an appropriate conclusion.

Let’s illustrate how this process works by considering a real-life case that the author was once involved in: the case of the serial killer Harold Shipman.

With 215 definite victims and 45 probable ones, Harold Shipman was the United Kingdom’s most prolific serial killer. Before his arrest in 1998, he used his position of authority as a doctor to murder many of his elderly patients. His modus operandi was to inject his patients with a lethal dose of morphine and then alter their medical records to make their deaths look natural.

The author was on the task force set up by a public inquiry to determine whether Shipman’s murders could have been detected earlier. This constitutes the first stage of the investigative cycle – the problem.

The next stage – the plan – was to collect information regarding the deaths of Shipman’s patients and compare this with information regarding other patient deaths in the area to see if there were any suspicious incongruities in the data.

The third stage of the cycle – data – involves the actual process of collecting data. In this case, that meant examining hundreds of physical death certificates from 1977 onwards.

In the fourth stage, the data was entered into software, analyzed, and compared using graphs. The analysis brought two things to light: First, Shipman’s practice recorded a much higher number of deaths than the average for his area. Second, whereas patient deaths for other general practices were dispersed throughout the day, Shipman’s victims tended to die between 1:00 p.m. and 5:00 p.m. – precisely when Shipman undertook his home visits.
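As a rough illustration of the kind of comparison made at this stage, here is a minimal Python sketch. The figures are invented for illustration – they are not the inquiry’s data – and it simply compares what share of deaths fall in the early afternoon for a suspect practice versus other local practices.

```python
# Illustrative sketch only: the hours below are invented, not the inquiry's data.

# Hour of death (24-hour clock) for two hypothetical groups of patient deaths.
suspect_practice = [14, 15, 13, 14, 16, 15, 14, 13, 15, 14, 10, 15]
other_practices = [3, 7, 9, 11, 13, 15, 17, 19, 21, 23, 1, 5]

def afternoon_share(hours, start=13, end=17):
    """Fraction of deaths recorded between `start` and `end` o'clock."""
    return sum(1 for h in hours if start <= h < end) / len(hours)

print(f"Suspect practice: {afternoon_share(suspect_practice):.0%} of deaths in early afternoon")
print(f"Other practices:  {afternoon_share(other_practices):.0%} of deaths in early afternoon")
```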

The final stage is the conclusion. The author’s report concluded that if someone had been monitoring the data, Shipman’s activities could have been discovered as early as 1984 – 15 years earlier – which could have saved up to 175 lives.

So, what do statisticians do? They look at patterns in data to solve real-world problems.

The accuracy of data is often skewed by systematic bias.

Data is not just cold, hard facts – it’s subject to human judgments and biases like any other form of knowledge.

In fact, human judgment is involved in the very first step. Before we can collect data, we sometimes have to make fairly arbitrary decisions about what we’re measuring. If our problem is to count how many trees there are on the planet, then we need to define what exactly a “tree” is. For instance, most studies of this type only include trees that have achieved a diameter of at least 4 inches.

Consequently, data can be skewed if the definition of what is being measured changes midway through measurement. For example, the number of sexual offenses recorded by the police in the UK between 2014 and 2017 almost doubled from 64,000 cases to 121,000 cases. It might seem like crime skyrocketed in those years. However, the real reason for the increase was that sexual offenses were taken more seriously after a 2014 report criticized police recording practices.

So, we should never assume that data is a fully accurate representation of reality. Consider that a lot of data is collected from surveys that ask people questions relating to their experience, such as how happy they feel. Of course, those questions can’t be expected to capture the full range of human experience on a spreadsheet. And biases in how people interpret and answer them can further skew the data. 

This is why designing appropriate questions is one of the big challenges of statistics. The language used can influence how the respondent feels about the question. When one UK survey asked respondents how they felt about “giving 16- and 17-year-olds the right to vote,” 52 percent supported it while 41 percent opposed it. But, when the same respondents were asked the logically identical question of how they felt about “reducing the voting age from 18 to 16,” support dropped to 37 percent with 56 percent opposed.

In other instances, it’s not the question that causes bias, but the answers the survey permits. In 2017, Ryanair proudly announced that 92 percent of its passengers were satisfied with their flight experience. It turned out, however, that their customer satisfaction survey only permitted the responses “excellent, very good, good, fair, and ok.”

What this adds up to is that before statisticians even touch the data, they’re often already dealing with misleading information.

How data is presented affects how it’s interpreted.

The pervasive problem of human interpretation doesn’t only affect the collection of data, but its presentation as well.

Recent years have seen a rise in the study of data visualization as a method for communicating statistical results. Data visualizations are graphical devices used to render data visible to the eye. Statisticians like to say that a good visualization has an “inter-ocular” impact – the pattern hits you between the eyes, discernible from sight alone, without any mental mathematics.

An example would be when a bar chart is used to compare the number of deaths resulting from heart surgery across different hospitals. Without having to look at the figures, any hospital that departs dramatically from the average will be obvious to the naked eye.
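As a minimal sketch of such a chart – the hospital names and mortality rates below are invented placeholders, not real figures – something like this would make the outlier jump out:

```python
# Illustrative only: hospital names and mortality rates are invented placeholders.
import matplotlib.pyplot as plt

hospitals = ["Hospital A", "Hospital B", "Hospital C", "Hospital D"]
deaths_per_100_ops = [2.1, 1.8, 4.9, 2.3]  # hypothetical deaths per 100 heart operations

plt.bar(hospitals, deaths_per_100_ops)
plt.axhline(sum(deaths_per_100_ops) / len(deaths_per_100_ops),
            linestyle="--", color="gray", label="average")
plt.ylabel("Deaths per 100 operations")
plt.title("Heart-surgery mortality by hospital (illustrative data)")
plt.legend()
plt.show()
```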

But graphics require careful design if they’re to be accurate and effective. Everything from color and font to ordering and wording affects how the data is interpreted. That’s why, nowadays, statisticians work with psychologists to evaluate how alternative graphics are likely to be perceived.

To stick with the example of comparing mortality rates in hospitals, imagine a statistician presented her data in a table. She has to decide in what order she’s going to list the hospitals. It might seem like common sense to list the hospitals by mortality rate. The problem is, this order might give the impression that the hospitals are ranked according to their quality, which would be very misleading; the best hospitals often have higher mortality rates because the most severe cases have to be treated there.

Another well-documented case of how presentation affects interpretation is the effect of framing. The language in which a statistical claim is framed affects its emotional impact.

A few years ago, an advertising campaign on the London Underground claimed that 99 percent of young Londoners do not commit serious youth violence. The aim, presumably, was to reassure London’s citizens of their safety. But we could reverse the claim’s emotional impact by flipping the statistic around to say that “1 percent of young Londoners commit serious youth violence.” That’s a little more threatening. And if we use an actual figure in place of a percentage, the effect is even more pronounced: “London has 10,000 violent young offenders!”

Statistics communicators often use framing to their advantage, depending on whether they want to shock or reassure their audience. Researchers need to be careful to preempt inappropriate gut reactions to data by using deliberate design and clear language.

There is a positive bias in scientific literature caused by selective reporting.

Many researchers spend their entire lives combing data for important discoveries that they rarely find. The pressure to publish significant work sometimes leads researchers to massage the data a little.

Despite their fidelity to the truth, even scientists have been known to engage in some questionable research practices. One such practice is multiple testing – running test after test until the results come out the way the researcher wants. The more tests a researcher runs, the greater the chance of false positives – results that seem to confirm a hypothesis but are actually due to chance.

To understand why this is a problem, let’s take a look at a study carried out by a team of very reputable researchers in 2009. Brain imaging was used to see which areas of a subject’s brain would light up when they were shown a series of photographs of people expressing different emotions. The catch was that the “subject” was a four-pound, dead Atlantic salmon. Out of the 8,064 sites measured in the fish’s brain, 16 showed a response to the photographs. Rather than conclude that the fish had remarkable powers, the team correctly surmised that over 8,000 tests are bound to lead to a few false positives.
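To see why thousands of tests are bound to throw up a few false positives, here’s a minimal simulation – my sketch, not the researchers’ code. It assumes a hypothetical per-test false-positive rate of 0.1 percent and counts how many of 8,064 tests on pure noise come out “significant” anyway.

```python
# Sketch: with many tests, a few false positives are expected even from pure noise.
import random

random.seed(1)

n_tests = 8_064   # the number of brain sites tested in the salmon study
alpha = 0.001     # hypothetical per-test false-positive rate (an assumption)

# Each test on pure noise comes out "significant" with probability alpha.
false_positives = sum(random.random() < alpha for _ in range(n_tests))
print(f"{false_positives} of {n_tests} noise-only tests came out significant by chance")
# With these assumptions you expect roughly 8 false positives on average.
```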

False positives aren’t necessarily a problem in themselves, but often they’re the only results that get reported. Even in scientific reporting, only positive or interesting results tend to get published. This has inevitably led to a positive bias within scientific literature, meaning that the public only sees the studies that seem to support a hypothesis – not the ones that don’t. Naturally, this impacts how the results are interpreted.

For example, you might be shocked if a study found that eating bacon sandwiches increases your risk of cancer. But, if you knew that 20 previous studies had found no link at all, your shock would likely be muted.

The reasons for such selective reporting are complex, owing both to excessive academic pressure and to our taste for sensational, groundbreaking stories.

Positive bias is what led John Ioannidis, a Stanford University professor, to claim that “most published research findings are false.” While Ioannidis was being intentionally provocative, his statement serves as a warning not to take research findings at face value just because they’re published in a scientific journal.

The media tends to emphasize storytelling at the expense of accuracy.

Once research has been published, it then gets reported by the media. The media, however, tends to exercise creative license.

The good news is that data journalism is flourishing. Journalists are increasingly being trained in how to interpret and communicate data. Statistics can enrich stories by bringing clarity and insight to important issues.

But there’s always the risk that statistical claims will be distorted in the process of storytelling. Stories usually require an emotional punch, which science journals rarely provide. For institutions more concerned about increasing web traffic than reporting accurate research, the temptation will always be to shy away from nuanced conclusions in favor of sensationalized ones.

The author once experienced such sensationalizing first-hand after a careless comment he made in a talk. He was responding to the results of a national survey of the sexual habits of the British public, which found that young people in Britain were having sex 20 percent less frequently than they had been a decade earlier. The author’s speculation that the rise of streaming services like Netflix might have something to do with the decline provoked a flurry of absurd headlines, such as “sex will be obsolete by 2030 because of Netflix, according to one lone scientist.”

Aside from such outright fabrications, one of the most common ways the media contrives an emotional punch is by exaggerating statistical claims of risk.

When a report by the World Health Organization found that regularly eating processed meat led to an 18 percent increased risk of developing bowel cancer, this 18 percent figure was widely reported by the media. And, admittedly, it does sound scary. But how worried should we really be?

While the media did report the 18 percent figure accurately, they failed to distinguish between relative and absolute risk.

The 18 percent increase is relative to the roughly 6 percent risk of bowel cancer for people who do not regularly eat processed meat. An 18 percent increase on that baseline gives us about 7 percent (6 × 1.18 ≈ 7.08). So, in absolute terms, the risk faced by people who regularly eat processed meat is only about 1 percentage point higher than the risk faced by people who don’t – a lot less scary.
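Here’s that arithmetic spelled out in a short Python snippet, using the figures from the paragraph above:

```python
# Relative vs. absolute risk, using the bowel-cancer figures above.
baseline_risk = 0.06        # risk for people who don't regularly eat processed meat
relative_increase = 0.18    # the widely reported "18 percent" figure

new_risk = baseline_risk * (1 + relative_increase)   # 0.0708, i.e. about 7%
absolute_increase = new_risk - baseline_risk         # about 0.01, i.e. 1 percentage point

print(f"Risk rises from {baseline_risk:.1%} to {new_risk:.2%}")
print(f"That's an absolute increase of about {absolute_increase:.1%},")
print(f"or roughly one extra case per {round(1 / absolute_increase)} regular eaters")
```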

Exaggerating risk is just one of the common ways that statistical claims are misrepresented. In the following blinks, we will look at some other common interpretation fallacies.

Reported averages can be misleading when the type of average used isn’t specified.

Inappropriate uses of averages have inspired some terrible jokes among statisticians.

Consider this one: “Most of us have more legs than average.” This is true, in a sense, if you calculate the average number of legs using the mean average, which is something like 1.9999 – brought down from 2 by people who’ve lost their legs.

Or, if that one wasn’t for you, what about this: “On average, the general public has one testicle.” This is also true – but only if you use a mean average that includes women in the calculation.

Both of these bizarre statements are achieved by inappropriately using the mean average when the median or mode would have led to more sensible estimations.

Let’s do a quick overview of these three types of average. The mean average is calculated by adding up all the numbers in a data set and then dividing by how many numbers there are. The median average is the number that lies in the middle when all the numbers in the data set are lined up in ascending order. Finally, the mode average is the most common number in the data set.

Different forms of average are appropriate for different circumstances. The mean average, for example, is best used when all the numbers in a data set cluster symmetrically around a central value. But in many other cases, it can be very misleading.

Let’s consider another example. The UK National Survey of Sexual Attitudes and Lifestyle asked respondents to report how many sexual partners they’d had. Here’s how the data looks: the most commonly reported number of sexual partners was 1, the majority reported between 0 and 20, and a minority reported numbers between 20 and 500.

Because of the small number of outliers reporting numbers much higher than 20, the mean average of sexual partners is likely to be far higher than the vast majority of people’s experience and is, therefore, a misleading average to use. The median average is going to provide a figure much closer to the typical person’s experience, and the mode gives us insight into the most common experience.
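To make this concrete, here’s a minimal Python sketch with invented numbers standing in for the survey’s responses (not its actual data), showing how the three averages pull apart on skewed data:

```python
# Invented, skewed numbers standing in for "number of sexual partners" responses.
from statistics import mean, median, mode

responses = [1] * 30 + [2] * 20 + [3] * 15 + [5] * 10 + [10] * 8 + [20] * 5 + [100, 250, 500]

print(f"mean:   {mean(responses):.1f}")  # dragged upward by a few very large outliers
print(f"median: {median(responses)}")    # the middle person's answer
print(f"mode:   {mode(responses)}")      # the most common answer
```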

The type of average being used is almost never stated alongside reported conclusions. Most commonly, it’s the mean average, which we’ve seen is often inappropriate – so many statistical claims we hear in the media are misleading and not relevant to our experiences.

The statistician’s mantra is that correlation does not imply causation.

The maxim that correlation does not imply causation has become something of a cliché among statisticians. Yet the fallacy of inferring cause from correlation is still commonly committed by the media and the public alike, so it’s a message that needs to be reiterated.

This misconception is what leads to comical headlines, such as: “Why going to university increases the risk of getting a brain tumor.”

The study that this headline is based on found that a slightly larger proportion of people who developed a brain tumor were from a higher socio-economic background. But this in itself doesn’t mean that there is a causal link between the two.

In fact, even the authors of the study speculated that the correlation was due to a form of ascertainment bias – that is, people from a higher socio-economic background were more likely to be tested for, and subsequently diagnosed with, a brain tumor.

So, when two data sets correlate, we shouldn’t assume that one causes the other. The correlation could be explained by any of three other possibilities.

First, it might be sheer coincidence that two data sets correlate, as this silly example illustrates: between 2000 and 2009, there was a strong correlation between per capita consumption of mozzarella cheese in the US and the number of engineering doctorates awarded. Despite the correlation, it’s unlikely that the rise in cheese consumption had anything to do with the number of people becoming engineers.

Secondly, correlated data could equally be explained by the reverse causal relationship to the one we expect. For example, many studies that compare alcohol consumption with health outcomes reveal that people who don’t drink have a higher death rate than people who drink moderately. It’s studies like these which beget such wishful headlines as “drinking a glass of wine a night is actually good for you.” However, this is thought to be an example of reverse causation, since people who are already ill tend to avoid alcohol.

Finally, correlation between two sets of data could be the result of a lurking factor – something that isn’t measured by the study but that influences both of the things that are. For example, a correlation between ice-cream sales and drownings is likely to be caused by the weather, which influences both.
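A short simulation – a sketch with made-up numbers, not real sales or drowning figures – shows how a lurking factor can manufacture a correlation between two things that never influence each other:

```python
# Sketch: hot days drive both ice-cream sales and drownings; neither causes the other.
import random

random.seed(0)

def correlation(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx ** 0.5 * vy ** 0.5)

temps = [random.uniform(10, 35) for _ in range(365)]        # daily temperature (the lurking factor)
ice_cream = [3 * t + random.gauss(0, 10) for t in temps]    # sales depend on heat, not on drownings
drownings = [0.2 * t + random.gauss(0, 1.5) for t in temps] # swimming depends on heat, not on ice cream

print(f"correlation(ice cream, drownings) = {correlation(ice_cream, drownings):.2f}")
```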

So, just in case you didn’t hear it the first time: correlation does not imply causation.

Probability is frequently misunderstood.

When the author was once asked why people tend to find probability difficult and counterintuitive, he replied that probability just really is difficult and counterintuitive. 

Even the people running the country have trouble understanding it. In 2012, 97 Members of Parliament in the UK were asked this question: “If you flip a coin two times, what’s the probability of getting two heads?” The answer is one quarter, since getting two heads is one of four equally likely outcomes. The majority of the MPs – 60 out of 97 – could not give the correct answer.
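If it helps, here’s the reasoning as a tiny Python sketch that simply lists the four equally likely outcomes:

```python
# Enumerate the four equally likely outcomes of two coin flips.
from itertools import product

outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT
two_heads = [o for o in outcomes if o == ("H", "H")]
print(f"P(two heads) = {len(two_heads)}/{len(outcomes)}")   # 1/4
```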

Consider another probability question: Assume that roughly 1 percent of women have breast cancer and that mammography screening is 90 percent accurate in detecting it. If a woman’s screening result comes back positive, what’s the probability that she actually has breast cancer?

Naturally, we might assume that the woman has a 90 percent chance of having breast cancer, since that’s how accurate the scan is. But, in truth, she has only about an 8 percent chance. The reason for this counterintuitive result is that the false positives from the much larger group of women who don’t have breast cancer far outnumber the true positives from the small group who do.
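One way to see where the 8 percent comes from is to think in expected counts out of 10,000 women screened. Here’s a minimal sketch; the 10 percent false-positive rate is an assumption implied by the 8 percent answer, not a figure stated above.

```python
# Expected counts out of 10,000 women screened, under the stated assumptions.
women = 10_000
prevalence = 0.01           # 1% actually have breast cancer
sensitivity = 0.90          # 90% of real cancers are correctly detected
false_positive_rate = 0.10  # assumption: 10% of healthy women are wrongly flagged

with_cancer = women * prevalence                         # 100 women
without_cancer = women - with_cancer                     # 9,900 women

true_positives = with_cancer * sensitivity               # 90 correct alarms
false_positives = without_cancer * false_positive_rate   # 990 false alarms

p_cancer_given_positive = true_positives / (true_positives + false_positives)
print(f"P(cancer | positive result) = {p_cancer_given_positive:.0%}")   # about 8%
```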

Another common probability fallacy is the gambler’s fallacy, which is when people alter their expectations of the likelihood of individual events based on what’s come before. For example, after a long run of black on the roulette wheel, the temptation is to imagine that red is somehow “due.” True to its name, this fallacy underpins the success of casinos around the world.

However, while there’s no mechanism that forces individual random events to balance out, it’s a remarkable fact of statistics that proportions of random events do remain roughly uniform in the long run. If you keep flipping a fair coin ad infinitum, the proportion of heads and tails will approach 50 percent each. It’s considered miraculous that such uniformity should emerge out of apparent chaos.
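A quick simulation – a sketch, not anything from the book – shows the proportion of heads settling toward 50 percent even though every individual flip stays unpredictable:

```python
# Sketch: the proportion of heads settles down even though each flip is random.
import random

random.seed(42)

heads, flips = 0, 0
for checkpoint in (10, 100, 1_000, 10_000, 100_000):
    while flips < checkpoint:
        heads += random.random() < 0.5
        flips += 1
    print(f"after {flips:>6} flips: proportion of heads = {heads / flips:.3f}")
```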

In much the same way, unpredictable social events display a remarkable uniformity at a macro level. Just as the random movement of molecules in a gas produces uniform physical properties, the unpredictable goings-on of millions of human lives come together to produce uniform social properties – such as suicide statistics that hardly change from year to year.

So, when used properly, statistics is like “social physics.” It can be wielded to make reliable long-term predictions about thoroughly unpredictable events.

Final summary

The key message in these blinks:

Statisticians study patterns in data to help us answer questions about the world. When reported accurately, statistical research can enrich storytelling and inform the public about important issues. Unfortunately, there are a great many distorting filters that research has to pass through before it reaches the public, including scientific journals and the media. As statistical data creeps into our lives more and more, there is a growing need for us all to improve our data literacy so we can appropriately assess the findings.

Actionable advice:

Don’t take statistics at face value.

View statistical information the way you might view your friends: they’re the source of some great stories, but they’re not always the most accurate. Statistical information should be treated with the same skepticism you apply to other kinds of claims, facts and quotes. And, where possible, you should examine the sources of statistics behind the headlines so you can assess how accurately the information has been reported.

Got feedback?

We’d sure love to hear what you think about our content! Just drop an email to remember@blinkist.com with the title of this book as the subject line and share your thoughts!

What to read next: How to Lie with Statistics, by Darrell Huff

We’ve seen how statistical claims can be distorted in their passage from research to the public ear. Usually, these distortions of the data are unintentional and arise from a misunderstanding of statistical methods. Sometimes, however, these distortions are quite deliberate. 

The blinks to How to Lie with Statistics, by author Darrell Huff, deal with this darker side of statistics. They introduce the techniques that media and advertisements use to alter how data is perceived and interpreted. They also go deeper into some familiar themes, such as the difficulty of truly random sampling, the error of inferring cause from correlation, and the misuse of averages. To avoid getting fooled, head on over to our blinks on How to Lie with Statistics.
