Resources
This page provides a selection of my favorite soccer analytics resources, including open datasets, Python libraries, career advice blog posts, books and other learning resources.
Datasets
- The StatsBomb Open Data repository contains StatsBomb event data and 360 data for a number of domestic leagues and international tournaments, including the UEFA Men’s EURO 2024, the CONMEBOL Men’s Copa América 2024, the CAF Men’s Africa Cup of Nations 2023, the FIFA Men’s World Cup 2022, the UEFA Women’s EURO 2022, the 2015/2016 season in the top divisions of England, Spain, Germany, Italy and France, all Lionel Messi club matches, and Bayer Leverkusen’s invincible 2023/2024 Bundesliga title win.
- The Wyscout Match Event Dataset contains Wyscout event data for the FIFA Men’s World Cup 2018, the UEFA Men’s EURO 2016, and the 2017/2018 season in the top divisions of England, Spain, Germany, Italy and France. The dataset is accompanied by a research paper that provides further details on its content.
- The SkillCorner Open Data repository contains SkillCorner broadcast tracking data for nine matches between the champions and runners-up in the 2019/2020 season in the top divisions of England, Spain, Germany, Italy and France.
- The Metrica Sports Sample Data contains Metrica Sports broadcast tracking data and corresponding event data for three anonymized matches.
Open-source Python libraries such as kloppy, socceraction, floodlight and statsbombpy provide functionality to load and process these open datasets.
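Each data provider describes pitch locations in its own coordinate frame. This is one reason a standardized representation is useful. The minimal sketch below shows the idea with a plain linear rescaling from a provider frame to a 105 x 68 meter pitch with the origin at the bottom-left corner.

It rests on several assumptions:
- The source is StatsBomb’s 120 x 80 frame with the origin at the top-left corner.
- The PitchFrame class and to_metric function are hypothetical helpers, not part of kloppy or any other library.
- The 105 x 68 target size is an assumed default.

Real conversions must also handle provider-specific details such as attacking direction and frames that do not scale linearly, so check each provider’s documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitchFrame:
    """Hypothetical description of a provider's pitch coordinate frame."""
    length: float         # extent of the x axis in provider units
    width: float          # extent of the y axis in provider units
    y_points_down: bool   # True if y = 0 lies on the top touchline


def to_metric(x, y, source, length_m=105.0, width_m=68.0):
    """Linearly rescale (x, y) from `source` to a metric pitch with a bottom-left origin.

    The 105 x 68 meter default is an assumption; real pitch sizes vary.
    """
    x_m = x / source.length * length_m
    y_frac = y / source.width
    if source.y_points_down:
        # Flip the y axis so that y = 0 becomes the bottom touchline.
        y_frac = 1.0 - y_frac
    return x_m, y_frac * width_m


# Assumed StatsBomb frame: 120 x 80 units, origin at the top-left corner.
STATSBOMB = PitchFrame(length=120.0, width=80.0, y_points_down=True)

print(to_metric(60.0, 40.0, STATSBOMB))  # (52.5, 34.0), the center spot
```

A library such as kloppy takes care of these provider conventions for you. The sketch only shows why a shared representation matters before combining data from several sources.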
Python libraries
PySport maintains an extensive overview of open-source Python projects for soccer. The following list includes my favorite Python libraries.
- kloppy: to load event and tracking data from multiple data providers, and to transform the data into a standardized representation that facilitates subsequent analysis.
- socceraction: to load event data from multiple data providers, to transform the data into the SPADL or Atomic-SPADL unified representation, and to value the individual actions performed by soccer players using VAEP, Atomic-VAEP or Expected Threat (xT).
- floodlight: to load event and tracking data from multiple data providers, and to compute geometric and kinematic metrics such as the stretch index and metabolic power metrics.
- statsbombpy: to load StatsBomb event data into a pandas DataFrame representation.
- soccerdata: to gather data from websites such as Club Elo, FBref, SoFIFA and WhoScored.
- scraperfc: to gather data from websites such as SofaScore, Transfermarkt and Understat.
- soccer_xg: to train and analyze expected-goals models.
- penaltyblog: to estimate team abilities and predict match outcomes using statistical models.
- mplsoccer: to produce different types of soccer visualizations.
- plottable: to produce visually appealing data tables.
- football-data-analytics: to perform different types of player, team and match analyses.
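Expected Threat (xT) values each pitch zone by the probability that having the ball there eventually leads to a goal. It is commonly computed by value iteration on the recursion below:

xT(z) = s(z)·g(z) + m(z)·Σ T(z→z′)·xT(z′)

The terms are:
- s(z): the probability of shooting from zone z.
- m(z): the probability of moving the ball from zone z.
- g(z): the probability that a shot from zone z scores.
- T(z→z′): the probability that a move from zone z ends in zone z′.

The minimal sketch below illustrates that recursion in plain Python. It is not socceraction’s implementation. The expected_threat function is hypothetical, and the two-cell probabilities are made-up toy values. In practice these probabilities are estimated from event data over a grid of pitch zones.

```python
def expected_threat(shot_prob, goal_prob, move_prob, transition, tol=1e-12, max_iter=1000):
    """Value iteration for xT over a grid flattened into n cells (illustrative sketch).

    shot_prob[i]     P(shot | ball in cell i)
    goal_prob[i]     P(goal | shot from cell i)
    move_prob[i]     P(move | ball in cell i); 1 - shot - move = possession lost
    transition[i][j] P(move ends in cell j | move starts in cell i); rows sum to 1
    """
    n = len(shot_prob)
    xt = [0.0] * n
    for _ in range(max_iter):
        new_xt = [
            shot_prob[i] * goal_prob[i]
            + move_prob[i] * sum(transition[i][j] * xt[j] for j in range(n))
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_xt, xt))
        xt = new_xt
        if delta < tol:  # stop once the values no longer change noticeably
            break
    return xt


# Hypothetical two-cell "pitch": cell 0 is far from goal, cell 1 is close to goal.
shot = [0.1, 0.5]
goal = [0.02, 0.2]
move = [0.8, 0.4]
T = [[0.5, 0.5],
     [0.3, 0.7]]

xt = expected_threat(shot, goal, move, T)
print([round(v, 4) for v in xt])  # [0.1079, 0.1569]
```

Every move probability is below 1 and each row of T sums to 1. As a result, each iteration shrinks the error, and the loop converges to a unique fixed point.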
Career advice
- David Sumpter discussed potential pathways into soccer analytics with Javier Fernández, Sudarshan Gopaladesikan, Pascal Bauer and Fran Peralta in the Friends of Tracking episode “Advice for Anyone Who Wants to Become a Football Data Scientist” in March 2020.
- Sam Gregory published a “Getting Into Sports Analytics” blog post in November 2017 and a follow-up, “Getting Into Sports Analytics 2.0”, in January 2020.
- Benoit Pimpaud published a three-part blog series in 2022: “A Career in Football Analytics, The What”, “A Career in Football Analytics, The How” and “A Career in Football Analytics, The Reality”.
- Liam Henshaw published a “How to Get Started in Data and the Football Industry” blog post in April 2022 and a follow-up, “How to Become a Football Analyst”, in October 2024.
Book recommendations
The following two books have arguably shaped my thinking on soccer analytics the most.
- Moneyball: The Art of Winning an Unfair Game by Michael Lewis.
- The Undoing Project: A Friendship That Changed Our Minds by Michael Lewis.
Soccer analytics
- How to Win the Premier League: The Inside Story of Football’s Data Revolution by Ian Graham.
- Data Game: The Story of Liverpool FC’s Analytics Revolution by Josh Williams.
- Soccernomics (2022 World Cup Edition): Why European Men and American Women Win and Billionaire Owners Are Destined to Lose by Simon Kuper and Stefan Szymanski.
- Net Gains: Inside the Beautiful Game’s Analytics Revolution by Ryan O’Hanlon.
- Expected Goals: The Story of How Data Conquered Football and Changed the Game Forever by Rory Smith.
- Football Hackers: The Science and Art of a Data Revolution by Christoph Biermann.
- Soccermatics: Mathematical Adventures in the Beautiful Game by David Sumpter.
- The Numbers Game: Why Everything You Know About Soccer Is Wrong by Chris Anderson and David Sally.
Soccer tactics
- Zonal Marking: From Ajax to Zidane, the Making of Modern Soccer by Michael Cox.
- The Mixer: The Story of Premier League Tactics, from Route One to False Nines by Michael Cox.
- Inverting the Pyramid: The History of Soccer Tactics by Jonathan Wilson.
Basketball
- Sprawlball: A Visual Tour of the New Era of the NBA by Kirk Goldsberry.
- Basketball on Paper: Rules and Tools for Performance Analysis by Dean Oliver.
Other sports
- Sports Analytics: A Guide for Coaches, Managers, and Other Decision Makers by Benjamin Alamar and Dean Oliver.
- Footballistics by James Coventry.
Beyond sports
- Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic.
- Superforecasting: The Art and Science of Prediction by Philip Tetlock and Dan Gardner.
- The Signal and the Noise: Why So Many Predictions Fail but Some Don’t by Nate Silver.
- Thinking, Fast and Slow by Daniel Kahneman.
- Predictably Irrational, Revised and Expanded Edition: The Hidden Forces That Shape Our Decisions by Dan Ariely.
- Trees, Maps, and Theorems: Effective Communication for Rational Minds by Jean-luc Doumont.
Learning resources
- My Soccer Analytics Review blog posts from 2023, 2022, 2021 and 2020.
- Soccer Analytics Handbook by Devin Pleuler.
- Football Analytics by Edd Webster.
- Guide to Sports Analytics by Dominic Samangy.
- Sports Analytics Bibliography by Scott Nestler.
- Sports Analytics Data Sources by Scott Nestler.