class soccerdata.MatchHistory(leagues=None, seasons=None, proxy=None, no_cache=False, no_store=False, data_dir=PosixPath('/home/docs/soccerdata/data/MatchHistory'))

Provides pd.DataFrames from CSV files available at

Data will be downloaded as necessary and cached locally in ~/soccerdata/data/MatchHistory.

  • leagues (string or iterable) – IDs of leagues to include.

  • seasons (string, int or list) – Seasons to include. Supports multiple formats. Examples: '16-17'; 2016; '2016-17'; [14, 15, 16]

  • proxy ('tor' or dict or list(dict) or callable, optional) –

    Use a proxy to hide your IP address. Valid options are:
    • "tor": Uses the Tor network. Tor should be running in the background on port 9050.

    • dict: A dictionary with the proxy to use. The dict should be a mapping of supported protocols to proxy addresses. For example:

          {
              'http': '',
              'https': '',
          }

    • list(dict): A list of proxies to choose from. A different proxy will be selected from this list after failed requests, allowing rotating proxies.

    • callable: A function that returns a valid proxy. This function will be called after failed requests, allowing rotating proxies.

  • no_cache (bool) – If True, will not use cached data.

  • no_store (bool) – If True, will not store downloaded data.

  • data_dir (Path, optional) – Path to directory where data will be cached.
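The callable form of the proxy option can be sketched as a small rotation helper. This is a minimal illustration, not part of soccerdata: `make_rotating_proxy` and the `.invalid` addresses are hypothetical, and it assumes the library calls the function with no arguments.

```python
from itertools import cycle
from typing import Callable, Dict, List

ProxyDict = Dict[str, str]

# Placeholder addresses on the reserved .invalid TLD; substitute real proxies.
PROXIES: List[ProxyDict] = [
    {"http": "http://proxy-a.invalid:3128", "https": "http://proxy-a.invalid:3128"},
    {"http": "http://proxy-b.invalid:3128", "https": "http://proxy-b.invalid:3128"},
]


def make_rotating_proxy(proxies: List[ProxyDict]) -> Callable[[], ProxyDict]:
    """Return a zero-argument callable that yields the next proxy on each call."""
    pool = cycle(proxies)

    def next_proxy() -> ProxyDict:
        # Each call advances the cycle, wrapping around at the end of the list.
        return next(pool)

    return next_proxy


next_proxy = make_rotating_proxy(PROXIES)

# Hypothetical usage, assuming soccerdata is installed:
# reader = soccerdata.MatchHistory(proxy=next_proxy)
```

Passing the list itself as `proxy` should give similar behaviour according to the option list above; a callable is mainly useful when the proxy has to be looked up or refreshed at request time.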

property seasons: List[str]

Return a list of selected seasons.
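To show how the accepted season formats ('16-17', 2016, '2016-17', [14, 15, 16]) can be reduced to one list of strings, here is a rough sketch. The helper names, the two-digit-year pivot, and the 'YYYY-YYYY' output form are assumptions made for illustration; the strings this property actually returns may look different.

```python
from typing import Iterable, List, Union

SeasonSpec = Union[str, int]


def _full_year(value: int) -> int:
    """Expand a two-digit year; the pivot at 50 is an assumption of this sketch."""
    if value >= 100:
        return value
    return 2000 + value if value < 50 else 1900 + value


def normalize_season(spec: SeasonSpec) -> str:
    """Map '16-17', 2016, '2016-17' or 16 to the hypothetical form '2016-2017'."""
    # Only the starting year is read; the end year is taken to be start + 1.
    start_text = str(spec).split("-")[0]
    start = _full_year(int(start_text))
    return f"{start}-{start + 1}"


def normalize_seasons(specs: Union[SeasonSpec, Iterable[SeasonSpec]]) -> List[str]:
    """Accept a single season or an iterable of seasons and return a list."""
    if isinstance(specs, (str, int)):
        specs = [specs]
    return [normalize_season(s) for s in specs]
```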


Retrieve game history for the selected leagues and seasons.

Column names are explained here:

Return type:


classmethod available_leagues()

Return a list of league IDs available for this source.

Return type:


get(url, filepath=None, max_age=None, no_cache=False, var=None)

Load data from url.

By default, the source of url is downloaded and saved to filepath. If filepath exists, the url is not visited and the cached data is returned.
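The cache-first behaviour described here can be sketched as follows. This is not the library's implementation: `cached_get` and its `fetch` argument are hypothetical stand-ins so the example runs without network access, and age-based expiry is left out.

```python
import io
from pathlib import Path
from typing import Callable, Optional


def cached_get(
    url: str,
    fetch: Callable[[str], bytes],
    filepath: Optional[Path] = None,
    no_cache: bool = False,
) -> io.BytesIO:
    """Return the content of url, preferring a local copy at filepath."""
    if filepath is not None and filepath.exists() and not no_cache:
        # Cache hit: the URL is not visited at all.
        return io.BytesIO(filepath.read_bytes())

    # Cache miss (or caching bypassed): download, then store if a path was given.
    payload = fetch(url)
    if filepath is not None:
        filepath.parent.mkdir(parents=True, exist_ok=True)
        filepath.write_bytes(payload)
    return io.BytesIO(payload)
```

With `filepath=None` every call downloads again, which matches the documented behaviour that data is not cached in that case.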

  • url (str) – URL to download.

  • filepath (Path, optional) – Path to save downloaded file. If None, downloaded data is not cached.

  • max_age (int for age in days, or timedelta object) – Maximum age of the locally cached file before it is downloaded again.

  • no_cache (bool) – If True, will not use cached data. Overrides the class property.

  • var (str or list of str, optional) – Return a JavaScript variable instead of the page source.


Raises:

TypeError – If max_age is not an integer or timedelta object.
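The max_age handling can be illustrated with a short sketch that converts the value to a timedelta and raises TypeError for anything else. The helper names `as_timedelta` and `is_stale` are hypothetical. Comparing against the file's modification time and rejecting bool values are assumptions of this example, not documented behaviour.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Union


def as_timedelta(max_age: Union[int, timedelta]) -> timedelta:
    """Interpret an int as a number of days; pass timedelta objects through."""
    if isinstance(max_age, timedelta):
        return max_age
    # bool is a subclass of int; rejecting it is a choice made in this sketch.
    if isinstance(max_age, int) and not isinstance(max_age, bool):
        return timedelta(days=max_age)
    raise TypeError("max_age must be an int (days) or a timedelta object")


def is_stale(filepath: Path, max_age: Union[int, timedelta]) -> bool:
    """Return True if the file was last modified more than max_age ago."""
    modified = datetime.fromtimestamp(filepath.stat().st_mtime)
    return datetime.now() - modified > as_timedelta(max_age)
```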


Returns:

File-like object of downloaded data.

Return type:


property leagues: List[str]

Return a list of selected leagues.