Resources

This page provides a selection of my favorite soccer analytics resources, including open datasets, Python libraries, career advice blog posts, books and other learning resources.

Datasets

Open-source Python libraries such as kloppy, socceraction, floodlight and statsbombpy provide functionality to load and process these open datasets.

Python libraries

PySport maintains an extensive overview of open-source Python projects for soccer. The following list includes my favorite Python libraries.

  • kloppy to load event and tracking data from multiple data providers, and to transform the data into a standardized representation that facilitates subsequent analysis.
  • socceraction to load event data from multiple data providers, to transform the data into the SPADL or Atomic-SPADL unified representation, and to value the individual actions performed by soccer players using VAEP, Atomic-VAEP or Expected Threat (xT).
  • floodlight to load event and tracking data from multiple data providers, and to compute geometric and kinematic metrics such as the stretch index and metabolic power metrics.
  • statsbombpy to load StatsBomb event data into a pandas DataFrame representation.
  • soccerdata to gather data from websites such as Club Elo, FBref, SoFIFA and WhoScored.
  • scraperfc to gather data from websites such as SofaScore, Transfermarkt and Understat.
  • soccer_xg to train and analyze expected-goals models.
  • penaltyblog to estimate team abilities and predict match outcomes using statistical models.
  • mplsoccer to produce different types of soccer visualizations.
  • plottable to produce visually appealing data tables.
  • football-data-analytics to perform different types of player, team and match analyses.
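To make one of the metrics named above concrete, here is a minimal sketch of the stretch index, taken here as the mean distance of a team's players to the team centroid in a single tracking frame. It is written in plain Python and does not use floodlight's API. The function name `stretch_index` and the sample coordinates are hypothetical and serve only as an illustration.

```python
import math


def stretch_index(positions):
    """Return the mean Euclidean distance of the players to their centroid.

    `positions` is a list of (x, y) tuples for one team in one frame.
    Hypothetical helper for illustration; not part of any library.
    """
    if not positions:
        raise ValueError("positions must contain at least one player")

    n = len(positions)
    # Team centroid: the average x and y coordinate over all players.
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n

    # Average distance from each player to the centroid.
    return sum(math.hypot(x - cx, y - cy) for x, y in positions) / n


# Made-up frame: four players on the corners of a 10 m x 10 m square.
frame = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(round(stretch_index(frame), 2))  # about 7.07, i.e. sqrt(50)
```

A larger value means the team is more spread out around its centroid. Computing the value per frame gives a time series of how compact the team is over the course of a match.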

Career advice

Book recommendations

The following two books have arguably shaped my thinking on soccer analytics the most.

Soccer analytics

Soccer tactics

Basketball

Other sports

Beyond sports

Learning resources