Soccermatics Mathematical Adventures in t - David Sumpter

Matheus Puppe

Here is a summary of the key points from the excerpt:

  • The author is a mathematics professor who loves both math and football, but recognizes that football is far more popular and exciting for most people.

  • However, the author believes math can offer insights into football, through statistical analysis, probability, and mathematical modeling. Football also helps illustrate mathematical concepts.

  • Numbers and data analytics play an increasingly important role in football, but math goes beyond stats to provide understanding. In a football context, math can answer questions about randomness, geometry, incentives, big data patterns, probability, and more.

  • The author aims not just to provide football trivia, but to change how we see both math and football, recognizing what each can offer the other.

  • Both math and football start from theoretical rules/laws, but unexpected things happen in practice. Combining theory and practice makes football exciting. The same is true when math meets the real world.

  • Overall, the author believes math can’t compete with the excitement of football, but they can learn from each other, with football helping to explain mathematical ideas. The book aims to explore the relationships between these two very different worlds.

  • The author argues that math is often seen as abstract and detached from reality, but he uses it to model real-world phenomena like urban growth, social networks, and biological systems.

  • He takes a creative approach, letting his emotions and intuitions guide which problems he studies rather than just logic. He sees connections between diverse fields and uses math to link them.

  • His philosophy of “Soccermatics” applies this creative mathematical modeling to football. He shows how math reveals insights about players, teams, tactics, betting, and more.

  • The book explains mathematical ideas intuitively using words, simulations, and diagrams rather than abstract formulas. The goal is to make modeling accessible.

  • Modeling is about seeing patterns and making analogies. The author argues anyone can do it, using their own interests and aptitudes as a starting point, like football.

  • The book ultimately aims to show how mathematical modeling can shed light on the modern world beyond just football, from house prices to genetics. Math is a way of seeing connections and solutions.

  • The author traces his fascination with finding patterns in football back to childhood, when he would pore over statistics tables in books. He and a friend created a simulated football league using dice rolls to generate match results when they didn’t have time to play full seasons.

  • The author sees unpredictability and randomness as essential to what makes football exciting. He analyzes goal data from the 2012/13 Premier League season and finds that the number of goals per match follows a Poisson distribution, the pattern expected when goals occur randomly and independently over the course of a match.

  • He validates this by showing NHL hockey data for the same season also follows a Poisson distribution, despite having a higher average number of goals per game. This indicates the timing of goals is random in both sports.

  • The author argues that the more complex the factors involved in a sport, the greater the randomness in scoring. The Poisson distribution emerges from this randomness as a pattern. Bus arrival times are another example where intended schedules break down into randomness that follows a Poisson distribution.

In summary, the author shows how mathematical models and distributions like Poisson can reveal predictable patterns in seemingly unpredictable events like scoring in sports. This fascination began in childhood and helps explain the author’s career as a mathematician.
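
As a rough illustration of the Poisson pattern described above, the sketch below computes the probability of seeing exactly k goals in a match from an average scoring rate. The average of 2.8 goals per match is a placeholder chosen for illustration, not a figure taken from the book, and the function names are hypothetical.

```python
import math

# Assumed average number of goals per match (illustrative placeholder only).
AVERAGE_GOALS_PER_MATCH = 2.8


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events when the average count is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def expected_match_counts(n_matches: int, max_goals: int = 8) -> list:
    """Expected number of matches ending with 0, 1, ..., max_goals goals."""
    return [
        n_matches * poisson_pmf(k, AVERAGE_GOALS_PER_MATCH)
        for k in range(max_goals + 1)
    ]


if __name__ == "__main__":
    # A 20-team league plays 380 matches in a season.
    for goals, matches in enumerate(expected_match_counts(380)):
        print(f"{goals} goals: about {matches:.1f} matches")
```

Comparing such expected counts with a histogram of real match totals is one simple way to check how closely a season follows the Poisson shape.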

  • The Poisson distribution describes events that happen randomly and independently over time, like goals scored in a football match. It was first applied to rare events of this kind by Ladislaus Bortkiewicz in 1898, who modelled deaths from horse kicks and child suicides.

  • The Poisson distribution applies to many situations involving randomness and unpredictability, like manufacturing defects, computer viruses, divorces, and even wars. It is commonly used by statisticians to model accidents and misfortunes.

  • In 2015, a study showed cancer is heavily influenced by random mutations during cell division, meaning cancer is largely due to “bad luck.” This randomness is like the unpredictability of goals scored during a football match.

  • The fact that football scores are so random in timing makes their overall distribution predictable. Mathematicians leverage this to make reasonable predictions about seasons and tournaments by simulating matches based on team scoring rates.

  • Although each simulation produces different results, over many simulations the most likely outcomes emerge. The unpredictability of each match leads to the predictability of the league table. Randomness allows us to explain and predict.
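
The idea of simulating many seasons from team scoring rates can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's model: the four teams and their scoring rates are invented, each team's goals are drawn independently of the opponent, and home advantage is ignored.

```python
import math
import random
from collections import Counter

# Hypothetical average goals per match for four invented teams.
RATES = {"Team A": 2.0, "Team B": 1.5, "Team C": 1.2, "Team D": 0.9}


def sample_poisson(rate, rng):
    """Knuth's method: multiply uniform draws until the product falls below e^-rate."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_season(scoring_rates, rng):
    """Every team plays every other team home and away; 3 points for a win, 1 for a draw."""
    points = {team: 0 for team in scoring_rates}
    for home in scoring_rates:
        for away in scoring_rates:
            if home == away:
                continue
            home_goals = sample_poisson(scoring_rates[home], rng)
            away_goals = sample_poisson(scoring_rates[away], rng)
            if home_goals > away_goals:
                points[home] += 3
            elif away_goals > home_goals:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
    return points


def title_frequencies(scoring_rates, n_seasons=2000, seed=1):
    """Fraction of simulated seasons in which each team finishes top."""
    rng = random.Random(seed)
    winners = Counter()
    for _ in range(n_seasons):
        points = simulate_season(scoring_rates, rng)
        # Ties on points go to the team listed first (a simplification).
        winners[max(points, key=points.get)] += 1
    return {team: winners[team] / n_seasons for team in scoring_rates}


if __name__ == "__main__":
    print(title_frequencies(RATES))
```

Any single simulated season can produce a surprise champion, but the frequencies over many seasons settle down, which is the sense in which randomness at match level gives predictability at league level.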

  • The author’s father believes football is largely random, coming down to occasional moments of skill or mistakes rather than any real structure or tactics.

  • To understand the structure of football better, the author suggests taking a wider perspective and looking at formations rather than just focusing on individuals.

  • Formations like 4-4-2 and 3-5-2 reflect a team’s intended strategy and player roles. The evolution of formations over time shows structure is important.

  • The ‘WM’ formation dominated in the 1930s-1950s before being replaced by 4-4-2. This demonstrates tactical innovations can dramatically change the game.

  • Looking at football from a distance reveals overall patterns and structure not visible when zoomed in on TV highlights. Studying things like average player positions and passes between players provides insight.

  • The author aims to use mathematical models to show his father there is genuine structure and tactics at work in football, beyond just moments of skill and mistakes.

  • In 1872, the first international football match between England and Scotland ended 0-0, despite both teams using very attacking formations (England 1-2-7, Scotland 2-2-6).

  • Formations have evolved over time as tactics changed, even when the rules stayed the same. Four notable formations are highlighted: Hungary 1953, Inter Milan 1960s, Liverpool 1970s, Barcelona 2010/11.

  • Network diagrams of the formations show connections between players based on passing options. Hungary’s network features a central attacking midfielder to link play. Inter Milan’s is very compact and defensive. Liverpool’s rigid structure features right-angled triangles. Barcelona’s uses wide triangles that allow smooth ball circulation.

  • Triangles are mathematically efficient for connecting points, as shown by the ‘suburb connection problem’. Slime moulds also use triangular networks to link food sources efficiently, similar to Tokyo’s rail system.

  • Barcelona’s triangles allow them to retain possession and move the ball quickly. Their formation spreads play evenly in all directions, like the networks of slime moulds. Efficient networks use triangular junctions to connect points smoothly.

  • The passing networks and formations used by teams like Barcelona are mathematically similar to the networks formed by slime moulds and railway systems.

  • They all use networks of wide triangles to efficiently cover space. This creates symmetrical zones on the pitch that players move between.

  • Barcelona’s ‘tiki-taka’ style involves rapid passing that aims to destabilize the opposition defense by drawing them onto the edges between zones.

  • When you analyze Barcelona’s play in slow motion, you can see how players like Messi, Xavi, and Iniesta move to create space and passing angles. Their movement creates optimal zones that defenders struggle to cover.

  • Goals result from this clever structure and positioning, not just individual brilliance. Barcelona don’t explicitly calculate the math, but their training ingrains an intuitive geometrical awareness.

  • The key is that efficient coverage of space creates good passing opportunities, and good positioning to receive passes creates space. Barcelona have mastered this structure to break down defenses.

  • The author noticed that young children playing football tend to chase the ball in a big clump rather than maintain positions and structure.

  • Some believe the clump is inevitable in children’s football, but the author argues it has nothing to do with good football, which requires structure.

  • Movement and positioning are central to the author’s research on animal behavior. He sees similarities in understanding the dynamics of animals and of football teams.

  • When coaching his son’s team, the author wanted to understand the dynamics and get the players thinking about movement and positioning rather than just chasing the ball.

  • The piggy-in-the-middle exercise demonstrates the problems with having one defender try to intercept passes from two attackers - the defender will eventually get the ball.

  • This shows the need for different exercises that promote structure, movement and positioning to avoid everyone chasing the ball. Barcelona’s La Masia academy successfully practices this with young players.

  • Mehdi Moussaïd conducted experiments with students walking in a narrow corridor to study how they avoid collisions.

  • When one student was stationary and another walked towards them, there was only a weak tendency to pass on the left or right.

  • Moussaïd used these results to create a simulation model of two students walking towards each other.

  • The model predicted that with two moving students there would be a stronger bias to pass on the right side.

  • Moussaïd tested this experimentally and found the model was correct - two moving students showed a clear right-side bias when passing each other.

  • The study shows how mathematical models can be used to predict behaviors in new situations, by extrapolating from experimental data in simple scenarios.

  • Mehdi’s experiments show that individuals have only a weak preference for passing left or right. The stronger convention arises through repeated interactions. This explains why it is hard to avoid bumping into tourists who don’t share the local convention.

  • In football, attackers try to get past defenders who aim to block their path. Defending well in one-on-one situations is described as a great art.

  • Research by Selina Pan found a defensive strategy of narrowing down the attacker’s options is very effective. Her computer model showed it is impossible for an attacker to get past two defenders using this zone-minimizing approach.

  • Lioness hunting groups also use a cooperative strategy of surrounding prey and reducing escape routes before going in for the kill.

  • Good football defending works similarly, with pressing and coordinated positioning to limit the attacker’s space. Bayern Munich demonstrated this against Barcelona in 2013 by aggressive pressing that didn’t allow Barça time or space to build up attacks.

  • The chapter discusses statistically brilliant players like Lionel Messi, Cristiano Ronaldo, Usain Bolt, and the Williams sisters, who have dominated their sports for extended periods.

  • It looks at how to quantify their statistical brilliance using z-scores, which measure how many standard deviations a player’s performance lies above the mean of their peers.

  • Messi’s goalscoring record from 2011-12 produced an exceptionally high z-score of 3.4, indicating his performance was 3.4 standard deviations above the average. This statistically brilliant season has likely never been matched in the history of football.

  • Bolt’s 100m world record in 2009 was 2.2 standard deviations faster than the average of other elite sprinters, demonstrating extreme brilliance.

  • The chapter emphasizes that statistical brilliance requires sustained dominance over years and multiple seasons, not just short periods of peak performance.

  • Other examples of statistical brilliance include cricketers Bradman and Sobers, basketball player Chamberlain, chess player Carlsen, and table tennis player Waldner.

  • The rarity of true statistical brilliance illustrates how remarkable the achievements of these players are compared to their peers. Quantifying their z-scores puts their dominance into perspective.
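
The z-score calculation referred to in this chapter can be sketched in a few lines. The peer goal tallies below are invented for illustration, and the use of the population standard deviation (rather than the sample version) is an assumption, since the summary does not say which one the author uses.

```python
from statistics import mean, pstdev


def z_score(value, peers):
    """Number of standard deviations by which `value` exceeds the peer mean."""
    return (value - mean(peers)) / pstdev(peers)


if __name__ == "__main__":
    # Hypothetical season goal tallies for a group of peer strikers.
    peer_goals = [2, 4, 4, 4, 5, 5, 7, 9]
    print(z_score(9, peer_goals))
```

A z-score above 3 means the performance sits more than three standard deviations above the peer average, which is why such seasons are so rare.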

  • To understand extremes like the greatest footballers or financial crises, we need statistics of extremes. Looking at past data can help predict the likelihood of future extreme events.

  • The ‘guessing game’ provides a rule of thumb for estimating the probability of an extreme event occurring. If the event has failed to happen on n previous occasions, the chance that it happens next time is around 1/(n + 1).

  • Using an extreme value distribution as the mathematical model, Messi’s achievement of 50 goals in a La Liga season is estimated to be a once-in-a-lifetime event, expected to occur only once every 73 years.

  • The extreme value distribution can also be applied to analyze scoring records in women’s football. Hanna Ljungberg’s record of 39 goals in a Swedish league season is estimated to be a once-in-a-generation achievement.

  • Overall, statistics of extremes can help provide insight into the likelihood and magnitude of remarkable achievements or rare events in fields like sports, finance, and climate. Comparing data to mathematical models allows probabilities to be estimated.
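
The ‘guessing game’ rule of thumb mentioned above can be checked with a small simulation. The sketch reads the rule as 1/(n + 1), where n is the number of past seasons, and assumes that seasons are independent draws from the same distribution, which is exactly the assumption that a genuine game-changer would break. Function names are hypothetical.

```python
import random


def guessing_game_estimate(n_past):
    """Rule of thumb: chance that the next observation beats all n_past earlier ones."""
    return 1 / (n_past + 1)


def simulated_record_rate(n_past, trials=20000, seed=0):
    """Fraction of trials in which a new draw exceeds the best of n_past earlier draws."""
    rng = random.Random(seed)
    records = 0
    for _ in range(trials):
        past_best = max(rng.random() for _ in range(n_past))
        if rng.random() > past_best:
            records += 1
    return records / trials


if __name__ == "__main__":
    for n in (4, 9, 49):
        print(n, guessing_game_estimate(n), simulated_record_rate(n))
```

The rule only says how often a record falls, not by how much it is beaten. Estimating the size of an extreme is where a fitted extreme value distribution comes in.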

  • Extreme value theory allows us to predict rare events like record-breaking athletic performances or extreme weather. It shows the probability of events at the tails of a statistical distribution.

  • In football, extreme value theory can predict the likelihood of players like Messi or Marta breaking goalscoring records. It shows their performances are statistically extreme.

  • For climate science, extreme value theory helps estimate the chance of rare floods or storms, like the devastating 1953 North Sea flood. This informs policy decisions like Dutch sea defenses.

  • However, extreme events can sometimes defy predictions if they represent a dramatic change, like Usain Bolt’s 100m world record in 2008. Similarly, climate change may be altering weather patterns beyond what historical data can predict.

  • So while extreme value theory is useful for understanding outliers, we have to be cautious when historical patterns are disrupted by genuine game-changers. The rules may be changing in ways standard statistical models cannot capture.

  • Statistical models like the Premier League’s Performance Index aim to objectively measure player performance, but have limitations. They must be designed by humans who make judgement calls on how to weight different actions.

  • The original Performance Index ranked Fulham’s goalkeeper Mark Schwarzer as the best player in 2008/09, ahead of attacking players like Ronaldo. This was because it valued defensive actions like saves and blocks highly for preventing goals.

  • The index was revised to also consider goals, assists and team performance. This boosted attacking players in the rankings to better align with fan interest.

  • Football poses statistical challenges that a sport like baseball does not. Team context impacts individual stats, so it’s hard to separate player skill from their role, teammates, and opponents.

  • Managers want objective data to find rising talent globally, but must be cautious in interpretation. The ideal is to combine stats with expert qualitative judgements on players. There are no perfect, fully objective player rankings in football.

  • Zlatan Ibrahimovic scored an amazing bicycle kick goal against England in 2012.

  • The physics of the goal can be analyzed using Newton’s equations of motion. The trajectory of the ball follows a parabolic path determined by the launch angle and speed.

  • There is a narrow range of angles and speeds that will result in the ball going into the goal rather than over the bar or bouncing in front of the goal.

  • Zlatan had to strike the ball at just the right velocity and angle to score. This required great coordination and timing.

  • While there was some luck involved in the goal, Zlatan weighted the ball perfectly to send it over the keeper and into the net from long range.

  • The relationship between angle, speed and trajectory that determines whether a shot goes in is complex, making this type of goal very difficult to pull off intentionally.
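
The narrow window of angles and speeds can be explored with the standard drag-free projectile equation, y(x) = h + x·tan(θ) − g·x² / (2·v²·cos²θ). The sketch below is an idealisation: it ignores air resistance, spin and the goalkeeper, treats a ball that reaches the ground before the goal line as a miss, and uses made-up values for distance, speed and launch height rather than measurements of Zlatan's goal. Only the crossbar height (2.44 m) is a real regulation figure.

```python
import math

G = 9.81                # gravitational acceleration, m/s^2
CROSSBAR_HEIGHT = 2.44  # regulation crossbar height, m


def height_at(distance, speed, angle_deg, launch_height=0.0):
    """Ball height (m) after travelling `distance` metres horizontally, with no drag."""
    angle = math.radians(angle_deg)
    drop = G * distance ** 2 / (2 * speed ** 2 * math.cos(angle) ** 2)
    return launch_height + distance * math.tan(angle) - drop


def goes_in(distance, speed, angle_deg, launch_height=0.0):
    """True if the ball crosses the goal line above the ground and under the bar."""
    return 0.0 < height_at(distance, speed, angle_deg, launch_height) < CROSSBAR_HEIGHT


def scoring_angles(distance, speed, launch_height=0.0):
    """Whole-degree launch angles (1 to 89) that put the ball in the goal."""
    return [a for a in range(1, 90) if goes_in(distance, speed, a, launch_height)]


if __name__ == "__main__":
    # Hypothetical overhead kick: 25 m out, struck at 20 m/s from 1.5 m above the ground.
    print(scoring_angles(25, 20, launch_height=1.5))
```

Running the scan for a few different speeds shows how quickly the set of scoring angles shrinks or disappears, which is the sense in which such a goal is hard to produce on purpose.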

  • As a child, the author was not a fan of Jimmy Hill, who was seen by his Scottish peers as embodying negative English stereotypes like being pompous and arrogant.

  • In 1982, Hill dismissed David Narey’s wonder goal for Scotland against Brazil as just a “toe poke”, ruining Scotland’s moment of glory. Brazil went on to win 4-1.

  • Hill was the perfect example of an armchair expert who oversimplified and offered sweeping opinions from the comfort of the studio.

  • However, some of Hill’s innovations were brilliant, like introducing three points for a win and abolishing the maximum wage.

  • Hill recognized the potential financial power of football on television. He negotiated the first major TV deal for the First Division in 1962.

  • The author argues that armchair experts and their simplifications are needed to balance the complexity of football. Different perspectives are useful even if they seem contradictory.

  • Hill was disliked for his persona but made major positive changes. The author now recognizes Hill’s accomplishments despite not liking him as a child.

  • The author discusses the TV pundit Jimmy Hill, who was known for his rational, logical analysis of football matches. However, the author felt Hill overlooked the emotional side of football for fans.

  • Despite reservations about Hill’s on-screen persona, the author champions Hill’s behind-the-scenes work in changing football rules. In particular, Hill pushed for 3 points for a win rather than 2 points. This spread across many countries by the 1990s.

  • The author analyzes how the points system changes incentives to attack or defend in a game. With 2 points for a win, defending is better as it secures 1 point. With 3 points, attacking becomes better as the potential points reward is higher.

  • The author provides a hypothetical model of a mid-table team playing Arsenal. With 2 points for a win, defending is best as it secures more expected points. With 3 points, attacking becomes better as it increases expected points.

  • The author relates this to animal contests, using shore crabs fighting over food as an example. The incentives change based on the relative strength of the crabs, similar to the football scenario.
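
The incentive shift can be made concrete with an expected-points calculation. The win, draw and loss probabilities below are invented for illustration and are not the values used in the book's mid-table-versus-Arsenal example. They are chosen only so that attacking raises both the chance of winning and the chance of losing.

```python
# Hypothetical outcome probabilities (win, draw, loss) for each approach.
OUTCOMES = {
    "attack": (0.30, 0.20, 0.50),
    "defend": (0.10, 0.65, 0.25),
}


def expected_points(p_win, p_draw, points_for_win):
    """Expected points from one match: a draw is always worth 1, a loss 0."""
    return points_for_win * p_win + 1 * p_draw


def best_approach(points_for_win):
    """Approach with the higher expected points under the given scoring system."""
    return max(
        OUTCOMES,
        key=lambda name: expected_points(OUTCOMES[name][0], OUTCOMES[name][1], points_for_win),
    )


if __name__ == "__main__":
    for points_for_win in (2, 3):
        print(points_for_win, "points for a win ->", best_approach(points_for_win))
```

With these particular numbers, defending is the better choice under 2 points for a win and attacking is better under 3, because the extra point multiplies only the win probability.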

  • Dora Biro and colleagues study dominance hierarchies in pigeon lofts. They use automated tracking to measure how pigeons interact when feeding and flying.

  • The tracking algorithm identifies dominant pigeons that push forward vs subordinate pigeons that move aside. There is a similar hierarchy when the birds are flying - some lead while others follow.

  • The hierarchies are transitive - if A dominates B, and B dominates C, then A dominates C. This reduces conflict and makes life simple.

  • Transitive hierarchies occur widely in nature, from ants to chimpanzees. They arise from competition over finite resources.

  • Jimmy Hill introduced the 3-point system in English football to encourage more attacking play and break the transitive hierarchy, where stronger teams attack and weaker teams defend.

  • Statistics show there were fewer draws after the introduction of 3 points for a win, supporting Hill’s theory that it encouraged attacking football.

  • Different species of birds, such as the Dark-eyed Junco, demonstrate the same home-field advantage that we see in football. This is because birds have evolved intuitive strategies that work well - those that are too aggressive or too timid die off.

  • Cancer cells also demonstrate intuitive strategies that allow them to grow and spread more effectively, without any intelligent planning.

  • We can simulate the evolution of football strategies in a simple computer model. Teams adopt rules like ‘Attack’, ‘Defend’, ‘Stronger’, and ‘Twice’ (attack unless opponents are twice as strong).

  • Over time, the ‘Twice’ strategy dominates as unsuccessful strategies are eliminated. This shows how intuition can evolve to an optimal strategy through a natural selection process, without managers needing to do mathematical calculations.

  • Real-life managers learn good intuitive strategies through experience and copying successful teams. The incentives of the game shape their intuition over time.

  • Strategic innovations like 3 points for a win have improved football by encouraging more attacking play. Teams like Southampton gamble and play positively even against stronger opponents, to the benefit of the spectacle.

  • Figure 7.1 shows the passing networks for Italy and England in their Euro 2012 quarter-final match. Italy’s network was much more interconnected, while England mostly relied on long balls from the goalkeeper to the tall striker Andy Carroll.

  • The analysis reveals that England’s approach was very direct and lacked consistency in midfield. Italy dominated possession but their network was too centralized around Andrea Pirlo.

  • Research by Thomas Grund on Premier League teams shows that a higher passing rate and a less centralized network correlate with more goals scored.

  • Based on this, Italy’s very centralized network around Pirlo indicates a potential weakness, despite their dominating possession and shots against England.

  • The tactical map provided by the passing network analysis gives insights beyond just looking at possession stats or match outcome. It reveals subtleties about each team’s approach and effectiveness.

  • Thomas Grund’s analysis showed that football teams with more decentralized passing networks tend to be more successful than teams that rely on a central playmaker.

  • Spain demonstrated this in Euro 2012 - their passing network was more distributed across the team compared to Italy’s reliance on Pirlo. Spain beat Italy 4-0 in the final.

  • Abby Wambach was the leader and focal point of the US Women’s National Team. But in the 2015 World Cup, the US performed better when they moved away from relying on long balls to Wambach and instead used a more decentralized approach.

  • The structure of social networks can reveal important patterns - for example, a map of romantic/sexual relationships in a US high school showed that people rarely date their ex’s exes.

  • Network maps of football passing can demonstrate a team’s style - decentralized networks like Barcelona’s promote better ball movement. Bayern Munich is another very decentralized team.
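
One simple way to put a number on how centralized a passing network is can be sketched as follows. This is an illustrative index in the spirit of degree centralization, not necessarily the measure Grund used: it scores 0 when every player is involved in the same number of passes and 1 when every pass goes through a single hub. The pass counts and player labels are invented.

```python
from collections import defaultdict


def centralization(passes):
    """Centralization of a pass network given {(passer, receiver): count}.

    Only players who appear in at least one pass are counted, and at least
    three such players are needed for the index to be defined.
    """
    involvement = defaultdict(int)
    for (passer, receiver), count in passes.items():
        involvement[passer] += count
        involvement[receiver] += count
    total_passes = sum(passes.values())
    top = max(involvement.values())
    spread = sum(top - value for value in involvement.values())
    # (n - 2) * total_passes is the largest spread any network can reach.
    return spread / ((len(involvement) - 2) * total_passes)


if __name__ == "__main__":
    # Hypothetical networks: everything through player "A" versus an even ring.
    hub_team = {("A", "B"): 10, ("A", "C"): 10, ("A", "D"): 10}
    ring_team = {("A", "B"): 10, ("B", "C"): 10, ("C", "D"): 10, ("D", "A"): 10}
    print(centralization(hub_team), centralization(ring_team))
```

A team built around one playmaker would score closer to the hub example, while a side that circulates the ball evenly would score closer to the ring.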

  • The author analyzes the passing tactics of 4 teams that reached the Champions League semi-finals in 2015: Bayern Munich, Barcelona, Real Madrid, and Juventus.

  • Data provided by Opta Sports and visualized through maps reveals differences in passing patterns between the teams.

  • Bayern Munich under Pep Guardiola passed the most, controlling midfield. Juventus passed less but attacked strongly down the right wing.

  • Barcelona and Bayern had similar overall pass distributions, but Barcelona’s passing network shows more penetration, especially the link between Messi and Suarez. In contrast, Bayern’s passes circled in midfield.

  • The author argues that data visualizations like passing maps are useful for understanding and comparing team tactics beyond just goals scored. The maps reveal the structure and strategy underlying a team’s play.

  • The 2014/15 Champions League semi-finals featured contrasting styles between Real Madrid and Juventus. Real Madrid played explosive, attacking football built around Ronaldo, Benzema, and Bale. Juventus relied on a sturdy defense and counterattacks.

  • Ronaldo takes a high volume of shots from all over, while Benzema is more clinical inside the box. Bale shoots from a few preferred zones.

  • Real Madrid’s danger zones show they build attacks down the left side before getting the ball into the central danger zone in front of goal.

  • Juventus structured their defense to deal with Real’s left sided attacks. Lichtsteiner and Marchisio on the right side intercepted many passes.

  • Real Madrid’s defensive actions were higher up the pitch, indicative of their pressing style. Ronaldo barely regained possession.

  • Real’s attacking approach wasn’t enough to get past Juventus’ organized defense in the semi-finals, leading to Ancelotti’s dismissal. The tactical styles of the teams were complete contrasts.

  • The author uses an example of a work dilemma between coworkers to illustrate the need for “pros and cons tables” rather than simple pros and cons lists to account for how others may adjust their behavior.

  • In the example, the author and a colleague must finish a report for their boss. The author could write the report himself or hope his colleague does it so he can go to a football match. His colleague faces the same choice.

  • They rank the potential outcomes on a 4-point scale based on whether they worked on the report or shirked responsibility. The best outcome (3 points) is if the colleague does all the work while the author goes to the match. The worst (0 points) is if neither does the work.

  • The table shows what each should do based on expectations of the other’s actions. It illustrates how social dilemmas depend on anticipating others’ behaviors.

  • The author relates this to a common problem in companies where employees may “shirk or work” in collaborations. If some start shirking, others may follow, leading to cooperation breaking down.

  • The author suggests this dilemma is key to understanding cooperation more broadly, including in animal and human evolution. Good managers find ways to structure teams where it pays for individuals to cooperate.
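
The ‘pros and cons table’ for the report dilemma can be written out as a small payoff table. The summary gives only the best outcome (3 points: the colleague works while I go to the match) and the worst (0 points: nobody writes the report). The two middle scores below are assumptions made to complete the table.

```python
# My score for each (my action, colleague's action) pair.
PAYOFF = {
    ("shirk", "work"): 3,   # from the text: colleague does everything, I see the match
    ("work", "work"): 2,    # assumed: the work is shared
    ("work", "shirk"): 1,   # assumed: I write the report alone
    ("shirk", "shirk"): 0,  # from the text: the report never gets written
}

ACTIONS = ("work", "shirk")


def best_response(colleague_action):
    """My best action given what I expect my colleague to do."""
    return max(ACTIONS, key=lambda mine: PAYOFF[(mine, colleague_action)])


if __name__ == "__main__":
    for theirs in ACTIONS:
        print(f"If my colleague chooses to {theirs}, I should {best_response(theirs)}.")
```

Under these scores the best choice flips depending on what the other person is expected to do, which is why a table indexed by both people's actions captures more than a one-sided list of pros and cons.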

  • Evolutionary models predict selfish behavior will dominate, but in reality humans and animals cooperate extensively. Examples given include lion hunting packs, birds warning of danger, and ants leaving pheromone trails.

  • One explanation is that cooperation evolved within small groups of genetic relatives. By helping your kin, you help spread your shared genes. This is formalized in Hamilton’s rule: helping is favoured when the benefit to the recipient, multiplied by genetic relatedness, exceeds the cost to the helper.

  • This tribal cooperation remains encoded in our genes, even as society has moved beyond tight family groupings. We form bonds and cooperate readily with non-relatives.

  • Football teams once consisted entirely of local players but now draw talent from around the world. Fans still form passionate bonds with diverse squads. Our ability to cooperate extends beyond kin groups.

  • Dynamo Kiev provides an example of a tightly knit group beating teams of superstars, showing the power of cooperation. Their manager Lobanovskyi was talented in math and engineering, bringing scientific approaches to build cooperation.
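
Hamilton’s rule from the kin-selection point above is usually written r·B > C, where r is genetic relatedness, B the benefit to the recipient and C the cost to the helper. A minimal check, using made-up benefit and cost values and a hypothetical function name:

```python
def helping_is_favoured(relatedness, benefit, cost):
    """Hamilton's rule: helping can spread when relatedness * benefit exceeds cost."""
    return relatedness * benefit > cost


if __name__ == "__main__":
    # Full siblings share about half their genes; first cousins about one eighth.
    print(helping_is_favoured(0.5, benefit=3.0, cost=1.0))
    print(helping_is_favoured(0.125, benefit=3.0, cost=1.0))
```

The same act of help can therefore pay off towards a sibling but not towards a distant relative, which is why this mechanism alone does not explain cooperation among unrelated teammates.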

Here is a summary of the key points about Valeriy Lobanovskyi and the National Technical University in Kiev:

  • Lobanovskyi studied cybernetics, a new scientific discipline focused on how parts of a system interact, at the National Technical University in Kiev in the 1960s-1970s. Cybernetics pioneered a mathematical approach to complex systems.

  • As a manager, Lobanovskyi applied cybernetics concepts to football. He used statistics to evaluate player performance and considered the team as a mathematical system with interacting parts.

  • Lobanovskyi realized a team’s performance could be greater than the sum of its parts (super-linear) through synchronized player movements. If one player fails, the whole system falls apart.

  • Research on ant colonies also showed super-linear team performance. Larger colonies consistently outperformed smaller ones in food collection due to cooperative trail laying.

  • Medium-sized ant colonies could have high or low performance depending on the initial number of ants on the trail. This demonstrates performance depends on initial conditions.

  • In football, a motivational leader can temporarily increase team effort, lifting performance to a higher level which is then maintained even when effort drops back down. This explains performance variability.

  • Teams can benefit from a super-linear performance curve, where performance grows with the square of effort, so extra effort yields disproportionately high performance. This encourages cooperation, as it is in each player’s interest to contribute fully to the team plan.

  • However, super-linearity also makes teams vulnerable. If contributions fall below a critical point, performance can collapse rapidly as no player has an incentive to contribute to a failing plan.

  • To avoid this, managers must rebuild trust and commitment simultaneously across the team. The Dutch style of “total football” aimed to bring together individual stars into an effective collective, with shared goals.

  • Both the regimented Soviet team and the individualistic Dutch stars exhibited super-linearity. But the Dutch ultimately triumphed by getting their stars to fully commit to the team plan.

  • Football teams now have access to huge amounts of tracking data from matches and training sessions. A single Premier League match can generate over 100 million data points.

  • This data explosion presents challenges in how to effectively analyze and summarize the data into useful insights. Teams employ data analysts and scientists to help.

  • Researchers like those at Disney are developing methods to identify formations and player roles from the raw tracking data. Their models can reveal subtle variations in formations that are not apparent just from watching.

  • The tracking data allows new metrics to be created, like “space creation” - how much a player’s movement opens up gaps for teammates. Data can quantify the subjective insights coaches have had for years.

  • Overall, the massive influx of data has huge potential to provide new insights into the game, but turning the raw numbers into something useful remains a major challenge. Summarizing and visualizing the key patterns is crucial.

  • Figure 9.2 shows four examples of team formations adopted by different teams over a season. The symbols indicate the average position of players during matches. This allows analysis of different formations like 4-4-2.

  • Identifying formations is a first step. A Disney study looked at factors like player positioning and movement to predict attacking success. Counter-attacks often provide good goalscoring opportunities.

  • Dynamic analysis looks at player interactions over time. Figure 9.3 shows player positions and directions 1 second after Figure 9.1, illustrating coordinated movement.

  • A model by Tamás Vicsek showed that alignment can emerge from simple local interactions, without tracking all players. This explains flocking in nature.

  • Research shows defender and midfielder coordination is highest, while forwards are less coordinated. Synchronization could indicate team cohesion. Teams tend to be more synchronized when playing stronger opposition or with less congested schedules.
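
The alignment model attributed to Vicsek above can be sketched in its usual textbook form: each agent moves at constant speed and, at every step, adopts the average heading of the agents within a fixed radius, plus a little random noise. All parameter values below are arbitrary illustrative choices, and the square box with wrap-around edges is a modelling convenience rather than a football pitch.

```python
import math
import random


def vicsek_step(positions, headings, radius, noise, speed, box, rng):
    """Advance all agents one step; each aligns with neighbours within `radius`."""
    new_headings = []
    for xi, yi in positions:
        sum_sin = sum_cos = 0.0
        for (xj, yj), heading in zip(positions, headings):
            dx = abs(xi - xj)
            dy = abs(yi - yj)
            dx = min(dx, box - dx)  # shortest distance across wrap-around edges
            dy = min(dy, box - dy)
            if dx * dx + dy * dy <= radius * radius:  # includes the agent itself
                sum_sin += math.sin(heading)
                sum_cos += math.cos(heading)
        mean_heading = math.atan2(sum_sin, sum_cos)
        new_headings.append(mean_heading + rng.uniform(-noise / 2, noise / 2))
    new_positions = [
        ((x + speed * math.cos(h)) % box, (y + speed * math.sin(h)) % box)
        for (x, y), h in zip(positions, new_headings)
    ]
    return new_positions, new_headings


def alignment(headings):
    """Order parameter: near 0 for random headings, 1 when all headings agree."""
    n = len(headings)
    return math.hypot(
        sum(math.cos(h) for h in headings) / n,
        sum(math.sin(h) for h in headings) / n,
    )


def run(n_agents=40, steps=50, radius=1.0, noise=0.1, speed=0.05, box=3.0, seed=0):
    """Start from random positions and headings; return the final alignment."""
    rng = random.Random(seed)
    positions = [(rng.uniform(0, box), rng.uniform(0, box)) for _ in range(n_agents)]
    headings = [rng.uniform(-math.pi, math.pi) for _ in range(n_agents)]
    for _ in range(steps):
        positions, headings = vicsek_step(positions, headings, radius, noise, speed, box, rng)
    return alignment(headings)


if __name__ == "__main__":
    print("alignment after 50 steps:", run())
```

No agent sees the whole group, yet a shared direction can emerge from purely local rules. The same order parameter could in principle be computed from player tracking data as a rough measure of how synchronized a team's movement is.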

  • The author discusses how pigeons navigate when flying in pairs. Like humans walking together, pigeons experience conflicting forces - wanting to follow their own familiar route but also stay together with their partner.

  • Research by Dora Biro found that when landmarks are close together, pigeon pairs compromise and fly a route in between their preferred paths. But when landmarks are farther apart, one pigeon becomes the leader and the other follows its route.

  • Leadership has little to do with navigation skill. Pigeons that fly faster when alone tend to move slightly ahead of their partner and become the leader.

  • In football, players similarly make continual subtle movements and positioning decisions without overt communication. Mate Nagy developed a method to detect lags between players changing direction, revealing networks of leaders and followers.

  • In one analysed match, the team captain emerged as the clear leader, directing moves when his team was behind. After adopting his direction, they equalised and took the lead. His subtle leadership was key to the victory.

  • More detailed collective motion analysis in football is hindered by lack of available data. But methods developed in animal behaviour research could revolutionise understanding of team coordination and dynamics if data access improves.

  • Paul Power read an article about superorganisms and was inspired to apply the ideas to studying football. He got a job at Prozone to do this.

  • As a coach, Paul wants the mathematical analysis to lead to practical training applications.

  • Paul aims to use match data to design effective training drills, an approach that goes back to Dutch manager Rinus Michels.

  • An important tactical aspect is when and how to press opponents. Different teams use different pressing styles.

  • Paul used tracking data to model viable passing networks and see how pressing disrupts them.

  • He identified principles for effective counter-pressing (two players within 5.5 seconds) and deep pressing (reduce speed, don’t commit multiple defenders).

  • Paul worked with managers to tailor training based on these principles.

  • The key is using biological models as inspiration, but relying on math and statistics to gain real insight into improving team performance.

  • The author attended Steven Gerrard’s final home game for Liverpool in 2015. The fans united in song to celebrate their captain, chanting “Stevie Gerrard Is Our Captain” and “Impossible Forty Yards.”

  • Football chants spread through the crowd exponentially, like bacterial growth. Each singing fan recruits others to join in.

  • Mathematical models show how social contagion spreads in an S-shaped curve. Growth is slow at first, becomes exponential as more join in, then levels off as capacity is reached.

  • The author tested this model by studying applause spreading through a group of students after a seminar. Mapping each student’s claps over time showed the S-shaped spread of the applause.

  • Analysis found the clapping spread through social contagion, with people more likely to start clapping as the proportion of clappers increased. The growth of clapping and its cessation both followed S-shaped curves.

  • This shows how ideas and behaviors spread through crowds, analogous to the spread of chants through football fans or disease through a population. The S-shaped curve captures the dynamics of social contagion.
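
The S-shaped curve described above can be sketched with a very small contagion simulation. This is only an illustration under stated assumptions: the function name `simulate_chant` and the parameters `n_fans`, `initial`, `rate` and `steps` are hypothetical, and the update rule is a generic discrete logistic model rather than the author's fitted model.

```python
def simulate_chant(n_fans=1000, initial=5, rate=0.5, steps=40):
    """Discrete-time logistic contagion (illustrative parameters).

    At each step the number of new singers is proportional to the current
    number of singers times the fraction of fans who are still silent.
    """
    singers = float(initial)
    history = [singers]
    for _ in range(steps):
        silent_fraction = 1.0 - singers / n_fans
        singers += rate * singers * silent_fraction
        history.append(singers)
    return history


if __name__ == "__main__":
    curve = simulate_chant()
    # Print every fifth step: slow start, rapid growth, then levelling off.
    for t in range(0, len(curve), 5):
        print(f"step {t:2d}: {curve[t]:7.1f} fans singing")
```

Growth is slow while few fans sing, fastest around the midpoint, and flattens as the crowd saturates, which is the qualitative shape the bullets describe.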

Here are the key points from the passage:

  • Mexican waves behave similarly to applause and chanting in crowds, spreading through social contagion.

  • Physicist Illés Farkas was interested in the waves seen at the 1986 World Cup in Mexico.

  • Together with colleagues Tamás Vicsek and Dirk Helbing, Farkas created a model for Mexican waves based on analogies between disease spread and social contagion.

  • Their model assumes each spectator has a probability of standing up that depends on the actions of their neighbors.

  • Once standing, an individual will sit down again after a fixed time.

  • The model reproduces the observed wave-like behavior seen in real crowds.

  • Though popular internationally, Mexican waves are seen as ‘naff’ by British football fans who prefer tribal chanting and advice to players.

  • Illés and colleagues modeled the spread of a simulated “disease” through fans in a simulated stadium, finding it took little to trigger a Mexican wave that propagated rapidly.

  • Fish can also exhibit escape waves, with information about threats spreading through schools very quickly. Modeling by the author and Teddy Herbert-Read confirmed the importance of such waves for fish survival.

  • In cricket stadiums, the wave propagation is non-local - fans anticipate the arrival of the wave rather than just responding to their immediate neighbors.

  • Jesse Silverberg modeled moshing at heavy metal concerts, identifying classic pit, circle pit, and “train” patterns emerging from simple rules of following, random moshing, and collisions.

  • Modeling moshing helps explain the patterns and also has implications for understanding crowd behavior more generally, like at football matches. Crowd simulations can inform stadium design.

  • Overall, the examples illustrate how waves and collective motion can propagate through human crowds as well as animal groups, sometimes with crucial implications for safety and survival. Simple models can provide insight into the emergence of complex patterns.

  • Crowds can be collectively wiser than individuals when making numerical estimates. Experiments with guessing the number of sweets in a jar show the group average is often very close to the true number.

  • This “wisdom of crowds” effect comes from a combination of overestimators and underestimators balancing each other out.

  • Betting markets show the same wisdom of crowds, with market prices reflecting the aggregated information of all bettors.

  • Simple models and experiments show that herd behavior can lead markets astray, such as everyone copying the early bettors.

  • Diversity of opinion is key - markets go wrong when everyone has the same information and makes correlated errors.

  • Betting against market trends (“contrarian betting”) can exploit crowd errors, but is risky and requires strong nerves.

  • Overall, markets are robust and hard to manipulate, but also imperfect. Combining market data with your own information is the optimal strategy.

  • Bookmakers can effectively “beat the crowd” by adjusting betting spreads to balance their books, even if they are uninformed about the true probabilities.

  • They do this by tracking the number of over/under bets and adjusting the spread accordingly, without needing to actually learn the true probabilities.

  • Even if you are better informed than the bookmakers and the betting crowd, it is hard to profit because the crowd’s average guess tends to converge to the true value quickly.

  • So the wisdom of crowds makes it hard for even an expert to beat the bookies.

  • However, crowds do not always predict accurately (e.g. 2015 UK elections), so their wisdom should not be taken for granted.

  • An experiment showed most people could not correctly estimate the number of coin flips equivalent to the odds of winning the lottery, illustrating that crowds are not always wise.
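
The averaging effect behind the wisdom of crowds can be illustrated with a short simulation. This is a sketch under simple assumptions, not the experiment from the book: guesses are modelled as the true value plus independent Gaussian noise, and the function name `crowd_guess` and all default parameters are hypothetical.

```python
import random
import statistics


def crowd_guess(true_value=500, n_people=500, noise_sd=150, seed=1):
    """Return (crowd average, typical individual error) for independent guesses."""
    rng = random.Random(seed)
    # Each person guesses the true value plus their own independent error,
    # so overestimates and underestimates tend to cancel in the average.
    guesses = [rng.gauss(true_value, noise_sd) for _ in range(n_people)]
    crowd_average = statistics.mean(guesses)
    typical_error = statistics.median(abs(g - true_value) for g in guesses)
    return crowd_average, typical_error


if __name__ == "__main__":
    average, typical_error = crowd_guess()
    print(f"crowd average error:      {abs(average - 500):.1f}")
    print(f"typical individual error: {typical_error:.1f}")
```

The cancellation relies on the errors being independent. If everyone copied the same early guess, the errors would be correlated and the average would inherit the shared mistake, which is the herding problem noted above.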

Here are the key points from the passage:

  • Journalist Joe Prince-Wright made predictions for the 2014/15 Premier League season. He accurately predicted the top 6 teams, but made some errors further down the table.

  • On average, Prince-Wright’s predictions were off by 2.3 positions from the actual final standings.

  • Simply using the previous season’s final standings as a prediction for the next season was about as accurate as Prince-Wright’s predictions.

  • Out of 17 journalists who made predictions, only 1 did better than just using the previous season’s standings.

  • Overall, the passage suggests that expert predictions struggle to significantly outperform simple rules of thumb like using the previous season’s standings. While some experts can be quite accurate, most do not demonstrate a strong ability to foresee how the season will play out.

  • There are often only small differences in Premier League teams’ league positions from one season to the next. However, some seasons see much bigger changes, like in 2013/14 when several teams dropped 7-8 places.

  • Experts struggle to accurately predict team positions for upcoming Premier League seasons. When their predictions for 2013/14 and 2014/15 are compared to a benchmark based on average position changes, most experts do worse than the benchmark.

  • Statistical models like the Euro Club Index can predict league positions about as well as experts for a season with smaller changes like 2014/15. But for a more unpredictable season like 2013/14, the model performed poorly.

  • When people confer and share guesses, like in the sweets counting experiment, it can lead to worse estimates. People overly rely on others’ guesses rather than fully trusting their own judgement.

  • In situations of uncertainty, people tend to follow others rather than make their own assessments. This can lead to cascades where most people end up agreeing, whether correctly or incorrectly.

  • Before placing bets, it is important to understand odds, probabilities, and how bookmakers make money.

  • UK odds (e.g. 3/7) can be awkward to work with. Converting to European odds (e.g. 1.43) makes calculations simpler.

  • Always calculate the probability of your prediction before looking at odds. Odds tell you potential profit, probabilities estimate likelihood.

  • Gambling for fun without thinking through the details is irrational. To gamble seriously, ask questions like: What is the probability of each outcome? What is my expected profit/loss? How much are the bookmakers making?

  • The key is to bet only when your estimated probability differs substantially from the probability implied by the odds. This creates profitable betting opportunities.

  • To be successful, bet selectively in your area of expertise, and don’t be swayed by others’ opinions. A mathematically competent person with specialist knowledge could potentially beat the bookies.
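
The arithmetic behind these points can be sketched in a few helper functions. The function names are hypothetical, and the three-way odds in the usage example are made-up numbers for illustration only.

```python
from fractions import Fraction


def fractional_to_decimal(odds):
    """Convert UK fractional odds such as '3/7' to European decimal odds."""
    return 1 + float(Fraction(odds))


def implied_probability(decimal_odds):
    """Probability at which a bet at these odds breaks even."""
    return 1 / decimal_odds


def expected_profit(my_probability, decimal_odds, stake=1.0):
    """Average profit per bet if my_probability is the true chance of winning."""
    win = my_probability * (decimal_odds - 1) * stake
    loss = (1 - my_probability) * stake
    return win - loss


def bookmaker_margin(decimal_odds_for_all_outcomes):
    """Amount by which the implied probabilities exceed 1 (the bookmaker's cut)."""
    return sum(1 / o for o in decimal_odds_for_all_outcomes) - 1


if __name__ == "__main__":
    decimal = fractional_to_decimal("3/7")
    print(f"3/7 in decimal odds: {decimal:.2f}")
    print(f"implied probability: {implied_probability(decimal):.2f}")
    # Hypothetical home/draw/away odds, for illustration only.
    print(f"margin: {bookmaker_margin([1.43, 4.5, 8.0]):.3f}")
```

A bet is only worth considering when `expected_profit` is positive, that is, when your own estimated probability is higher than the probability implied by the odds.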

Here are a few thoughts on your betting challenge:

  • Setting a budget and sticking to it is wise. Gambling responsibly within your means is important.

  • Using mathematical strategies and calculating probabilities and potential returns can help make informed betting decisions. However, predicting sports outcomes has a high degree of uncertainty.

  • Shopping around between multiple bookmakers for the best odds will minimize the bookmakers’ built-in advantage. This improves your chances but does not guarantee profits.

  • It may be difficult to consistently beat the odds over just a 5 week period. Luck plays a big role, especially in the short term. Managing risks and expectations is prudent.

  • If you’re not finding the experience enjoyable or it’s causing you stress, it may be best to stop. Gambling should be entertaining, not harmful.

  • Overall, a thoughtful and controlled approach is sensible. But recognize the challenges in seeking to reliably profit from sports betting over such a short timeframe. Patience and discipline will be key.

  • The author will set aside any opinions formed from watching football and instead make bets based on mathematical models set up in advance. The models use publicly available data on past matches and bookmaker odds.

  • The models are built from scratch using undergraduate-level math and statistics. The author gets some help understanding odds but programs the models himself.

  • Two potential weaknesses in bookmaker odds are identified from the 2014/15 Premier League season data:

    • Odds underestimate wins by strong favorites (home odds between 1.33 and 1.43)
    • Odds underestimate draws between well-matched teams (win probability difference <10%)
  • This leads to an “odds bias” betting strategy: bet on strong favorites and well-matched draws.

  • A second strategy uses the Euro Club Index, which is slower to respond to short-term form changes than bookmakers. This index-based strategy bets against the public perception.

  • The strategies will be tested during the betting period to see if they can beat the house edge. The author aims to have fun, win or lose.

  • The author introduces a third betting strategy based on performance indicators like expected goals and passing rates. He finds these are better predictors of future results than past results alone.

  • Using data from previous matches, the author calculates expected goals based on the probability of scoring from different zones on the pitch.

  • Passing rate - successful passes per minute of possession - is also a good indicator. Teams that pass more score more goals.

  • Combining expected goals and passing rates into a ranking system gave profitable bets in Week 4, with Chelsea and Manchester United both losing to lower ranked teams.

  • The fourth strategy is to follow the predictions of football expert Joe Prince-Wright, who successfully predicted many outcomes in previous seasons with his Premier League picks.

In summary, the author presents two new performance indicators to supplement his betting strategies, and identifies an expert to potentially exploit insider knowledge of the Premier League.
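
The "odds bias" rule described above can be written as a simple selection function. This is a sketch, not the author's code: the function names are hypothetical, and it assumes that "win probability difference" means the gap between the home and away probabilities implied by the odds after normalising them to sum to 1.

```python
def implied_probabilities(home, draw, away):
    """Normalise 1/odds so the three outcome probabilities sum to 1."""
    raw = [1 / home, 1 / draw, 1 / away]
    total = sum(raw)
    return [r / total for r in raw]


def odds_bias_pick(home, draw, away):
    """Return 'home', 'draw' or None, following the two biases noted above.

    Arguments are decimal (European) odds for the three outcomes.
    """
    p_home, _, p_away = implied_probabilities(home, draw, away)
    if 1.33 <= home <= 1.43:
        return "home"  # strong home favorite
    if abs(p_home - p_away) < 0.10:
        return "draw"  # well-matched teams (assumed definition)
    return None  # no bet


if __name__ == "__main__":
    # Hypothetical odds, for illustration only.
    print(odds_bias_pick(1.40, 4.75, 8.5))
    print(odds_bias_pick(2.6, 3.2, 2.8))
    print(odds_bias_pick(1.8, 3.6, 4.5))
```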

  • The author simulated two gambler models - Lucky Luke who bets randomly and Calamity Jane who bets based on a slight edge over the bookmakers. After 10,000 simulations, Jane ended up with more money on average, but there was substantial overlap in their outcomes due to randomness.

  • This shows how in real life it can be hard to separate lucky gamblers from skilled ones based on short-term results. The Lukes may boast of their “amazing new system” while the Janes abandon theirs after losses.

  • The author examined his four betting strategy models over the first four weeks of the 2015/16 Premier League season before placing his own bets. There was large variation between them - Prince-Wright’s expert tips did very well in Week 1 but poorly later, while the performance indicators were more consistent.

  • This shows the challenges of combining different strategies - some will do well and others poorly each week, so the author needs to average them and only bet where there is an edge over the bookmakers’ odds.

  • The results will demonstrate whether the author’s combined model can overcome the randomness and bookmakers’ advantage to profit over the season. But short-term losses don’t necessarily mean the strategies are flawed.

  • The author tested four gambling strategies on the 2015/16 English Premier League season: performance indicators, odds bias, Euro Club Index, and expert tips.

  • The performance indicator and odds bias strategies were most profitable, while the Euro Club Index lost money steadily. The expert tips performed poorly.

  • In week 1 of real betting, the model suggested betting mainly on draws. Only 1 draw occurred out of 10 matches, so the author lost money.

  • In week 2, excluding the expert tips improved performance. The model made a small profit through bets on draws, away wins, and a home win.

  • Analysis showed the Euro Club Index declined steadily at the same rate as random bets, due to the bookies’ advantage.

  • Research suggests Elo systems like the Euro Club Index can rank teams well but don’t make profits when betting due to the bookies’ edge. Variations like FiveThirtyEight’s Soccer Power Index incorporate more info and performed better.
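
The Lucky Luke and Calamity Jane comparison mentioned above can be sketched as a Monte Carlo simulation. The set-up here is an assumption made for illustration: every bet is a unit stake at fixed decimal odds of 1.95, Luke wins 50% of the time (no edge) and Jane wins 53% (a slight edge). None of these numbers, nor the function names, come from the book.

```python
import random


def season_profit(win_probability, odds=1.95, n_bets=100, rng=random):
    """Total profit from n_bets unit-stake bets at fixed decimal odds."""
    profit = 0.0
    for _ in range(n_bets):
        if rng.random() < win_probability:
            profit += odds - 1  # winnings on a successful bet
        else:
            profit -= 1  # stake lost
    return profit


def compare_gamblers(n_sims=10_000, seed=0):
    """Return (mean Luke profit, mean Jane profit, share of runs Luke wins)."""
    rng = random.Random(seed)
    luke = [season_profit(0.50, rng=rng) for _ in range(n_sims)]  # no edge
    jane = [season_profit(0.53, rng=rng) for _ in range(n_sims)]  # slight edge
    mean_luke = sum(luke) / n_sims
    mean_jane = sum(jane) / n_sims
    # Fraction of paired seasons in which Luke still finishes ahead of Jane.
    luke_ahead = sum(l > j for l, j in zip(luke, jane)) / n_sims
    return mean_luke, mean_jane, luke_ahead


if __name__ == "__main__":
    mean_luke, mean_jane, luke_ahead = compare_gamblers()
    print(f"Luke mean profit: {mean_luke:+.2f}")
    print(f"Jane mean profit: {mean_jane:+.2f}")
    print(f"Luke finishes ahead in {luke_ahead:.0%} of simulated seasons")
```

Under these assumptions Jane does better on average, yet Luke still finishes ahead in a sizeable share of runs, which is why short-term results say little about skill.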

Here is a summary of the two tournaments:

  • Nate Silver’s World Cup model performed very well in predicting match results during the 2014 World Cup. However, Nate advises against using his weekly American football Elo model to bet on NFL games, as historical testing shows it does not make money against Vegas betting lines.

  • My own Premier League model based on expected goals and other performance indicators initially seemed promising, but after 9 weeks its performance was poor. The simple odds-bias strategy of betting on draws between well-matched teams and favorites to win did much better, turning £100 into £240 over 90 matches.

The key points are:

  • Nate Silver’s World Cup model succeeded, but his NFL model does not beat Vegas lines.

  • My Premier League model failed to make money, while a simple odds-bias strategy profitably exploited mispriced draws and favorites.

Here are a few key points on how to get started in football data analysis:

  • Start writing! Create a blog or Medium page to showcase your analysis skills. This serves as your resume to get noticed.

  • Learn to use data analysis tools like R or Python and optimize code for efficiency. Master stats like expected goals, shot quality models, and passes completed.

  • Build a portfolio analyzing matches, players, tactics. Create novel graphics and visualizations to present insights.

  • Share analysis on social media and engage with the online analytics community to make connections. Offer to collaborate on projects.

  • Consider getting an internship or entry-level role at a small club or consultancy to get experience. Be willing to work for little pay at first.

  • Keep honing your skills in analysis, data vis, coding. Stay on top of new techniques and ideas. Show you can translate analytics into actionable insights.

  • Be patient and persistent in job search. Many got their break through personal contacts made online. Getting that first job is hardest.

  • Technical skills are essential but so are communication skills. Can you explain complex models to non-technical staff? Storytelling with data is key.

The field is competitive but opportunities exist if you work hard to build a reputation online and network. Passion for football is a must.

  • Omar works to debunk myths and outdated assumptions in football, using statistical analysis. Many common football beliefs don’t stand up to scrutiny when examined logically and statistically.

  • Transfer fees are not solely determined by a player’s ability. Marketability and a club’s specific needs also drive up prices.

  • Many clubs are now using “spreadsheet scouting”, sorting statistical metrics like tackles and interceptions to identify promising players. Leicester City used this approach to sign N’Golo Kanté.

  • Statistics alone are not enough. Scouting players in person is still vital. Successful clubs blend statistical analysis with traditional scouting.

  • Clubs should take a strategic, analytics-driven approach to transfers rather than just reacting to agent proposals.

  • In MLS, statistical analysis is seamlessly integrated with scouting. Teams like Atlanta United rely on collaboration between data analysts and scouts.

  • While statistics are useful, they have limitations. Football is more fluid and low-scoring than sports like baseball where analytics have been successfully applied.

  • Ted Knutson has developed ‘player radars’ that visually display key statistics for players to help assess their strengths and weaknesses. Stats used include dribbles, assists, dispossessions, shots, goals for attackers and tackles, long balls, blocks, aerial wins, interceptions for defenders.

  • Knutson warns radars just show statistical output and numbers change depending on league, team, position, age. They have strengths and weaknesses but are useful with other information.

  • Knutson looked at Kanté’s high interceptions stat: once adjusted for Leicester’s playing style, which focused on interceptions and counterattacks, the number dropped, showing the importance of context.

  • Football will never be Moneyball - it’s a team sport. Simple stats like passes don’t give complete assessment. Models like expected goals that account for shot quality are more useful.

  • Sam Green developed expected goals models showing that United’s shot conversion relied on shooting centrally from better positions, which proved unsustainable when the conversion rate dropped the next season. He now works for Aston Villa, combining stats and scout knowledge.

  • Sarah Rudd used Markov chain models of attack to find probability of scoring from different pitch locations. Showed scoring chances increase exponentially as you get closer to goal. Useful for analysing team’s playing style.
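
A Markov chain model of attack can be sketched with a handful of pitch zones. This is a toy illustration, not Sarah Rudd's model: the three zones, the two end states ("goal" and "lost") and every transition probability below are hypothetical numbers chosen only so that each row sums to 1.

```python
# Hypothetical transition probabilities (each row sums to 1); not real data.
TRANSITIONS = {
    "midfield": {"midfield": 0.50, "wing": 0.20, "box": 0.10, "lost": 0.20},
    "wing":     {"midfield": 0.20, "wing": 0.30, "box": 0.20, "lost": 0.30},
    "box":      {"midfield": 0.10, "wing": 0.10, "goal": 0.15, "lost": 0.65},
}


def scoring_probabilities(transitions, iterations=200):
    """Probability that a possession ends in a goal, starting from each zone.

    'goal' and 'lost' are absorbing states. The values for the other zones
    are found by repeatedly applying the transition probabilities until
    they settle.
    """
    p = {zone: 0.0 for zone in transitions}
    p["goal"], p["lost"] = 1.0, 0.0
    for _ in range(iterations):
        for zone, row in transitions.items():
            p[zone] = sum(prob * p[target] for target, prob in row.items())
    return p


def pass_value(p, source, target):
    """Credit for moving the ball: the change in scoring probability."""
    return p[target] - p[source]


if __name__ == "__main__":
    p = scoring_probabilities(TRANSITIONS)
    for zone in TRANSITIONS:
        print(f"{zone:9s} {p[zone]:.3f}")
    print(f"midfield -> box is worth {pass_value(p, 'midfield', 'box'):+.3f}")
```

The difference in scoring probability between two zones gives a natural way to credit the player who moved the ball from one to the other.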

Here are the key points from the passage:

  • Sarah Rudd developed a Markov chain model to assign credit to players for their contribution to an attacking move. The model breaks the pitch down into zones - box, midfield, wing - and calculates the increase in probability of scoring when the ball moves from one zone to another. This allows credit to be distributed fairly among players involved.

  • The model can be expanded to have hundreds or thousands of pitch zones, incorporating more information like number of defenders ahead of the ball. Arsenal are secretive but appear to use Sarah’s model in player assessment and tactics.

  • Thom Lawrence developed a method to assess defenders by looking at how far opposition teams advance the ball in a defender’s “patch”. Players who allow fewer forward passes in their patch get a better score. His model highlighted Samuel Umtiti before his move to Barcelona.

  • Limitations of models are openly acknowledged. Leicester allowed progress down wings but defended well near box - a problem for Thom’s model. Sarah’s model benefits from Arsenal’s style of quick passing between zones.

  • Omar, Thom and Sarah made successful transitions from bloggers to consultants by combining strong technical skills with football knowledge. Building a career as an analyst requires hard work but is achievable.

  • Many people working in football clubs feel frustrated by the lack of intellectual engagement from players outside of training and matches. Players have a lot of free time and often just play video games rather than pursuing broader intellectual interests.

  • However, research shows football players have high “design fluency” intelligence - they are very good at creative problem solving tasks like connecting dots. This suggests they have untapped intellectual potential.

  • Managers like Pep Guardiola recognize the creative intelligence of players. Rather than micromanaging tactics, Guardiola gives players principles and lets them figure out solutions. This allows smart players to use their full potential.

  • Overall, football clubs could do more to engage players intellectually outside of just training. This would allow them to tap into players’ creativity and intelligence, potentially improving performance. Managers need to recognize that top players are very smart and can be given more autonomy.

  • Pep Guardiola utilizes a grid system to teach players positioning and spacing on the pitch. He added “half-spaces” between the wings and center to maximize the space between players.

  • Playing in the half-spaces gives players a wider view of the pitch and allows them to find gaps between defenders in a 4-4-2 formation.

  • Guardiola gives players guidelines for positioning but allows creativity in the final third.

  • Geometry and angles are important for shooting. The angle between the goalposts from the shooting position is proportional to the chance of scoring.

  • Young players can be taught geometry concepts like reducing shot angles while defending and seeing more of the goal when shooting.

  • Probability models show that the best scoring chances are central near the goal, with the probability decreasing further out and towards the wings. Factors like distance to goal and angle between posts determine the chances of scoring.

  • Arsenal had more “expected goals” than any other team in 2015-16, but Leicester won more real matches and the title. This highlights the limitations of relying too much on expected goals models.

  • Arsenal’s style focused on getting the ball in high danger areas to maximize expected goals. But opponents adapted by packing more defenders in the box, reducing Arsenal’s chances.

  • Players like Coutinho balance long-distance shooting with good passing. Long goals are highlights but the numbers show most are low probability. Managers have to balance magic moments with tactical structure.

  • Leicester exceeded expectations through shared understanding between players. Their direct passing style was very effective but ran against trends toward possession play.

  • Numbers and models provide useful guides but can’t account for everything. Magic moments, tactical battles, and team chemistry matter too. The challenge is balancing all factors.

  • Leicester City won the 2015-16 Premier League playing a very direct style, with long passes upfield and quick counterattacks. This was a tactical evolution from the common tiki-taka short passing style.

  • Analysts found Leicester made fewer than 4 passes in the 30 seconds before a shot, and their passes traveled 9m further upfield per pass compared to other teams. Their direct style was unique.

  • Managers like Pep Guardiola and Jose Mourinho have continued to evolve tactics to counter each other. Mourinho crowded the area in front of the box against Guardiola’s Barcelona. Guardiola stretched teams wide to counter 5 defender formations.

  • Analysts and hackers are now using computational tools like Voronoi diagrams to analyze space on the pitch and find new tactical opportunities. Manchester City held a hackathon allowing participants to use their detailed tracking data.

  • Tactical evolution is ongoing as managers and analysts keep finding new ways to exploit space on the pitch using mathematical and computational ideas. Pep Guardiola and modern managers have a deep tactical understanding needed to lead players.
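
The angle between the goalposts mentioned above is straightforward to compute. The sketch below only calculates the angle; it does not turn it into a scoring probability. The coordinate convention and the function name `shot_angle` are assumptions made for this example, while the goal width of 7.32 m is the standard full-size goal.

```python
import math

GOAL_WIDTH = 7.32  # metres, standard full-size goal


def shot_angle(x, y):
    """Angle in degrees between the two posts as seen from (x, y).

    x is the distance from the goal line and y the sideways offset from
    the centre of the goal, both in metres (x must be positive).
    """
    half = GOAL_WIDTH / 2
    angle = math.atan2(half - y, x) - math.atan2(-half - y, x)
    return math.degrees(angle)


if __name__ == "__main__":
    print(f"central, 11 m out:      {shot_angle(11, 0):.1f} degrees")
    print(f"15 m wide, 11 m out:    {shot_angle(11, 15):.1f} degrees")
    print(f"central, 25 m out:      {shot_angle(25, 0):.1f} degrees")
```

The angle shrinks both with distance and as the shooter moves wide, which matches the point that the best chances are central and close to goal.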

Here are the key points from the passage:

  • Areas opposition leave open: The analysis team studies videos of opposing teams to identify weaknesses and patterns in their play. Things they find interesting are quantified statistically to see if they represent larger trends.

  • Common pass sequences: The analysts try to measure and characterize common passing sequences used by the opposition. This allows them to prepare countermeasures.

  • Compactness of defences: The analysts look at the geometry and spacing of the opposition’s defensive positioning and try to measure its compactness. More compact defences are harder to break down.

  • Current approach starts with video analysis: The process begins by coaches watching videos and noticing tactical aspects. These are then quantified statistically from match data.

  • Analysts have to connect statistics to winning games: Managers want to know how statistics can help win matches. Analysts have to link their numbers to tangible football improvements.

  • Tactical data use still limited: Clubs are not yet exploiting player tracking data much for tactical advancement. Younger, more analytics-focused coaches are more likely to do this.

  • Author can’t reveal details: The author signed confidentiality agreements about tactical data usage. He is advising clubs but can’t divulge specifics.

  • Maths will increasingly impact tactics: Mathematical analysis of tactical data remains an unexploited area but will become more important over time.

  • Lionel Messi scored an incredible 50 goals in La Liga during the 2011/12 season, setting a new record for goals scored in a single season.

  • To calculate the likelihood of such an event, we can use statistics and extreme value theory. By looking at goalscoring data from past seasons, we can fit a statistical distribution to model the extreme upper tail.

  • Using the Gumbel distribution, the probability of someone scoring 50 goals or more in a season is estimated to be around 1 in 2,000. So Messi’s feat is very rare statistically.

  • Extreme events tell us about the limits of what is possible in a system. They can be modelled statistically across many areas like floods, storms and heatwaves.

  • Climate change is predicted to increase the frequency and severity of certain extreme weather events. Statistical models help quantify the changing risks.
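
The Gumbel calculation mentioned above comes down to one formula: for a Gumbel distribution with location μ and scale β, the probability of a value of at least x is 1 − exp(−exp(−(x − μ)/β)). The sketch below implements that formula. The parameter values in the usage example are hypothetical placeholders, not the values fitted in the book, so the printed number should not be read as a reproduction of the estimate quoted above.

```python
import math


def gumbel_exceedance(x, location, scale):
    """P(X >= x) for a Gumbel (maximum) distribution."""
    z = (x - location) / scale
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-math.exp(-z))


def return_period(x, location, scale):
    """Average number of seasons between events at least as extreme as x."""
    return 1 / gumbel_exceedance(x, location, scale)


if __name__ == "__main__":
    # Hypothetical parameters; in practice they would be fitted to the
    # top scorers' goal tallies from past seasons.
    location, scale = 30.0, 4.0
    p = gumbel_exceedance(50, location, scale)
    print(f"P(50+ goals) = {p:.5f}, about 1 in {return_period(50, location, scale):.0f}")
```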

This paper examines observed changes in daily temperature and precipitation extremes across different regions of the globe over the second half of the 20th century. The key findings are:

  • There has been a widespread significant decrease in the number of cold days and cold nights, and an increase in the number of warm days and warm nights, indicating a warming trend.

  • The trends are generally more pronounced for temperature extremes than for precipitation extremes.

  • Increases in heavy precipitation events are also observed, particularly in North America and Europe.

  • The observed changes are consistent with expected changes under global warming and cannot be explained by natural climate variability alone.

  • The results highlight that climate change is already having an impact in terms of altering temperature and precipitation extremes. This has implications for impacts on natural and human systems.

In summary, the study provides evidence that recent climate change is manifesting itself through changes in daily climate extremes, as expected from climate model projections. The results underline the vulnerability of society and ecosystems to climate change.

Here is a summary of the key points from Chapters 10-12 of Soccermatics:

Chapter 10: You’ll Never Walk Alone

  • Analogy of football chants spreading like a biological contagion. Rate of spread depends on number of fans singing and number not yet singing. Follows a logistic growth curve.

  • Analysis of spread of Luis Suarez chant shows rapid exponential growth followed by saturation. Matches spread of news stories.

  • Mexican waves explained through simple threshold model - people stand when neighbors stand up.

  • Danger of crushing during stadium evacuations due to stop-and-go waves in crowds. Analysis of Love Parade disaster.

Chapter 11: Bet Against the Masses

  • Wisdom of crowds - average guess of large group often accurate. But copying others reduces collective wisdom.

  • Prediction markets like bookmakers odds aggregate diverse information. Give better forecasts than experts.

  • Bookmakers odds are biased towards popular teams. Correcting for bias improves betting strategy.

  • Following the crowd leads to backing favorites. Going against the crowd can identify betting opportunities.

Chapter 12: Putting My Money Where My Mouth Is

  • Describes the Football Experiment betting strategy based on bookmaker odds.

  • Gives results for first season showing profits from betting on underdogs and draws.

  • Discusses improvements for second season, like using difference in team abilities and network analysis.

  • Emphasizes importance of staking strategy and money management as well as forecasting accuracy.

Here are the key points I took away from the summary:

  • Rankings are calculated using a formula that weights passing rate and expected goals scored. This was found to best predict match outcomes through logistic regression.

  • Passing rates underestimate actual rates since dead ball time is included, but still reflect relative team passing ability.

  • Probabilities for match outcomes are set based on bookmaker odds, with adjustments made in cases where the odds seem skewed too high or low for one team.

  • The betting strategy is profitable, beating random betting in 96.4% of simulations. This suggests the rankings formula has value in predicting match results.

  • Key factors in the model are passing ability to reflect team quality, using bookmaker odds as a baseline, and adjusting probabilities where the odds look off. The profitability suggests this is an effective approach.

  • Barcelona 2010/11 team and tactics with emphasis on La Masia youth system and tiki-taka style

  • Panathinaikos 2010 match illustrates Barcelona tactics

  • Tactical analysis of other teams like Bayern Munich and Juventus

  • Discussion of brilliant players like Messi and Ronaldo and their goal scoring feats

  • Analysis of betting, odds, and predictions around football matches

  • Use of data and analytics by companies like Opta to analyze player and team performance

  • Concepts of randomness, luck and patterns in football

  • Models of team motion, positioning and pressing based on data

  • Social behavior in crowds at matches including Mexican waves and contagion

  • Concept of statistical brilliance and extreme performances by top players

Jonas Holst


#book-summary

About Matheus Puppe