This page was generated from doc/datasources/WhoScored.ipynb.
[3]:
import soccerdata as sd

WhoScored

[4]:
ws = sd.WhoScored(leagues="ENG-Premier League", seasons=2021)
print(ws.__doc__)
/cw/dtaijupiter/NoCsBack/dtai/pieterr/Projects/soccerdata/soccerdata/_common.py:462: UserWarning: Season id "2021" is ambiguous: interpreting as "20-21"
  warnings.warn(msg)
Provides pd.DataFrames from data available at http://whoscored.com.

    Data will be downloaded as necessary and cached locally in
    ``~/soccerdata/data/WhoScored``.

    Parameters
    ----------
    leagues : string or iterable, optional
        IDs of Leagues to include.
    seasons : string, int or list, optional
        Seasons to include. Supports multiple formats.
        Examples: '16-17'; 2016; '2016-17'; [14, 15, 16]
    proxy : 'tor' or dict or list(dict) or callable, optional
        Use a proxy to hide your IP address. Valid options are:
            - "tor": Uses the Tor network. Tor should be running in
              the background on port 9050.
            - dict: A dictionary with the proxy to use. The dict should be
              a mapping of supported protocols to proxy addresses. For example::

                  {
                      'http': 'http://10.10.1.10:3128',
                      'https': 'http://10.10.1.10:1080',
                  }

            - list(dict): A list of proxies to choose from. A different proxy will
              be selected from this list after failed requests, allowing rotating
              proxies.
            - callable: A function that returns a valid proxy. This function will
              be called after failed requests, allowing rotating proxies.
    no_cache : bool
        If True, will not use cached data.
    no_store : bool
        If True, will not store downloaded data.
    data_dir : Path
        Path to directory where data will be cached.
    path_to_browser : Path, optional
        Path to the Chrome executable.
    headless : bool, default: True
        If True, will run Chrome in headless mode. Setting this to False might
        help to avoid getting blocked.
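
The callable form of the proxy parameter described above can be sketched as a small closure that cycles through a fixed pool. This is only an illustration: the helper name make_proxy_rotator and the second proxy address are hypothetical, and it assumes the callable is invoked without arguments. The call that hands it to sd.WhoScored is left commented out because it needs a reachable proxy and a browser.

```python
from itertools import cycle

# Proxy mappings in the dict format from the docstring above.
# The second entry is a made-up placeholder address.
PROXIES = [
    {"http": "http://10.10.1.10:3128", "https": "http://10.10.1.10:1080"},
    {"http": "http://10.10.1.11:3128", "https": "http://10.10.1.11:1080"},
]


def make_proxy_rotator(proxies):
    """Return a zero-argument callable that yields the next proxy on each call."""
    pool = cycle(proxies)

    def next_proxy():
        return next(pool)

    return next_proxy


next_proxy = make_proxy_rotator(PROXIES)

# Assumed usage, following the `proxy` parameter documented above:
# ws = sd.WhoScored(leagues="ENG-Premier League", seasons=2021, proxy=next_proxy)
```

Passing the list PROXIES directly should give similar rotation according to the docstring; a callable is mainly useful when the pool has to be fetched or refreshed at run time.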

Game schedule

[5]:
epl_schedule = ws.read_schedule()
epl_schedule.head()
[5]:
game_id home_team away_team date url stage
league season game
ENG-Premier League 2021 2020-09-12 Crystal Palace-Southampton 1485186 Crystal Palace Southampton 2020-09-12 15:00:00 https://www.whoscored.com/Matches/1485186/Live... NaN
2020-09-12 Fulham-Arsenal 1485187 Fulham Arsenal 2020-09-12 12:30:00 https://www.whoscored.com/Matches/1485187/Live... NaN
2020-09-12 Liverpool-Leeds United 1485188 Liverpool Leeds United 2020-09-12 17:30:00 https://www.whoscored.com/Matches/1485188/Live... NaN
2020-09-12 West Ham United-Newcastle United 1485191 West Ham United Newcastle United 2020-09-12 20:00:00 https://www.whoscored.com/Matches/1485191/Live... NaN
2020-09-13 Tottenham-Everton 1485189 Tottenham Everton 2020-09-13 16:30:00 https://www.whoscored.com/Matches/1485189/Live... NaN

Injured and suspended players

[6]:
missing_players = ws.read_missing_players(match_id=1485184)
missing_players.head()
[6]:
game_id player_id reason status
league season game team player
ENG-Premier League 2021 2021-01-12 Burnley-Manchester United Burnley Charlie Taylor 1485184 107462 injured doubtful Doubtful
Dwight McNeil 1485184 357427 injured doubtful Doubtful
Jay Rodriguez 1485184 33891 injured doubtful Doubtful
Jimmy Dunne 1485184 366743 injured doubtful Doubtful
Manchester United Eric Bailly 1485184 243814 injured doubtful Doubtful

Match event stream data

[7]:
events = ws.read_events(match_id=1485184)
events.head()
[7]:
period minute expanded_minute type outcome_type team player qualifiers x y end_x end_y goal_mouth_y goal_mouth_z is_touch is_shot is_goal related_event_id related_player_id blocked_x blocked_y card_type game_id team_id player_id
league season game id
ENG-Premier League 2021 2021-01-12 Burnley-Manchester United 2253458317 PreMatch 0 0 FormationSet Successful Burnley NaN [{'type': {'displayName': 'TeamPlayerFormation... 0.0 0.0 NaN NaN NaN NaN False NaN NaN NaN NaN NaN NaN NaN 1485184 184 NaN
2253458375 PreMatch 0 0 FormationSet Successful Man Utd NaN [{'type': {'displayName': 'CaptainPlayerId', '... 0.0 0.0 NaN NaN NaN NaN False NaN NaN NaN NaN NaN NaN NaN 1485184 32 NaN
2253487469 FirstHalf 0 0 Start Successful Burnley NaN [] 0.0 0.0 NaN NaN NaN NaN False NaN NaN NaN NaN NaN NaN NaN 1485184 184 NaN
2253487473 FirstHalf 0 0 Start Successful Man Utd NaN [] 0.0 0.0 NaN NaN NaN NaN False NaN NaN NaN NaN NaN NaN NaN 1485184 32 NaN
2253487625 FirstHalf 0 0 Pass Successful Burnley Ashley Westwood [{'type': {'displayName': 'Angle', 'value': 21... 50.3 50.3 30.5 50.3 NaN NaN True NaN NaN NaN NaN NaN NaN NaN 1485184 184 79050.0

Match event stream data can be returned in several formats, selected with the output_fmt parameter:

  • events (default): Returns a dataframe with all events.

  • raw: Returns the original unformatted WhoScored JSON.

  • spadl: Returns a dataframe with the SPADL representation of the original events.

  • atomic-spadl: Returns a dataframe with the Atomic-SPADL representation of the original events.

  • loader: Returns a socceraction.data.opta.OptaLoader instance.

[12]:
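import json

# Request the unprocessed WhoScored JSON; the result is keyed by match id
events = ws.read_events(match_id=1485184, output_fmt="raw")

# Pretty-print the first event of the match
print(json.dumps(events[1485184][0], indent=2))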
{
  "eventId": 2,
  "expandedMinute": 0,
  "id": 2253487473,
  "isTouch": false,
  "minute": 0,
  "outcomeType": {
    "displayName": "Successful",
    "value": 1
  },
  "period": {
    "displayName": "FirstHalf",
    "value": 1
  },
  "qualifiers": [],
  "satisfiedEventsTypes": [],
  "second": 0,
  "teamId": 32,
  "type": {
    "displayName": "Start",
    "value": 32
  },
  "x": 0,
  "y": 0
}
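
To show how the nested raw JSON relates to the flat columns of the default events dataframe, the sketch below flattens the event printed above into a single record. The helper flatten_event is hypothetical and written for illustration only; it is not soccerdata's own parsing code and covers just the fields visible in this one event.

```python
import json

# The raw event printed above, copied verbatim.
RAW_EVENT = json.loads("""
{
  "eventId": 2,
  "expandedMinute": 0,
  "id": 2253487473,
  "isTouch": false,
  "minute": 0,
  "outcomeType": {"displayName": "Successful", "value": 1},
  "period": {"displayName": "FirstHalf", "value": 1},
  "qualifiers": [],
  "satisfiedEventsTypes": [],
  "second": 0,
  "teamId": 32,
  "type": {"displayName": "Start", "value": 32},
  "x": 0,
  "y": 0
}
""")


def flatten_event(event):
    """Map one raw event dict to a flat record with snake_case keys."""
    return {
        "id": event["id"],
        "period": event["period"]["displayName"],
        "minute": event["minute"],
        "expanded_minute": event["expandedMinute"],
        "type": event["type"]["displayName"],
        "outcome_type": event["outcomeType"]["displayName"],
        "team_id": event["teamId"],
        "x": float(event["x"]),
        "y": float(event["y"]),
        "is_touch": event["isTouch"],
    }


flat = flatten_event(RAW_EVENT)
print(flat["period"], flat["type"], flat["outcome_type"])
```

The resulting values (FirstHalf, Start, Successful, team 32) line up with the row for id 2253487473 in the default events output shown earlier.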
[13]:
actions = ws.read_events(match_id=1485184, output_fmt='spadl')
actions.head()
[13]:
game_id original_event_id period_id time_seconds team_id player_id start_x end_x start_y end_y type_id result_id bodypart_id action_id player team
0 1485184 2253487625 1 0.0 184 79050.0 52.815 32.025 34.204 34.204 0 1 0 0 Ashley Westwood Burnley
1 1485184 2253487639 1 2.0 184 131464.0 31.080 38.220 36.312 15.844 0 1 0 1 James Tarkowski Burnley
2 1485184 NaN 1 4.5 184 80067.0 38.220 43.365 15.844 12.512 21 1 0 2 Matthew Lowton Burnley
3 1485184 2253487685 1 7.0 184 80067.0 43.365 90.300 12.512 49.708 0 1 0 3 Matthew Lowton Burnley
4 1485184 2253487689 1 11.0 184 93473.0 90.300 105.000 49.708 38.828 11 0 0 4 Robbie Brady Burnley
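
The SPADL coordinates above differ from those in the default events output (for example 50.3 becomes 52.815). The numbers are consistent with rescaling percentage coordinates to a 105 by 68 meter pitch. The sketch below checks that relationship for the first pass; the pitch dimensions and the helper to_spadl_coords are assumptions inferred from the displayed values, and any direction-of-play normalization the converter may apply is ignored.

```python
# Assumed pitch dimensions in meters, inferred from the values shown above.
PITCH_LENGTH = 105.0
PITCH_WIDTH = 68.0


def to_spadl_coords(x_pct, y_pct):
    """Rescale percentage coordinates (0-100) to meters on the assumed pitch."""
    return x_pct / 100 * PITCH_LENGTH, y_pct / 100 * PITCH_WIDTH


# First pass of the match: x=50.3, y=50.3, end_x=30.5, end_y=50.3
start_x, start_y = to_spadl_coords(50.3, 50.3)
end_x, end_y = to_spadl_coords(30.5, 50.3)

print(round(start_x, 3), round(start_y, 3), round(end_x, 3), round(end_y, 3))
```

Up to floating-point rounding, the result matches start_x, start_y, end_x and end_y in row 0 of the SPADL table.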
[14]:
atomic_actions = ws.read_events(match_id=1485184, output_fmt='atomic-spadl')
atomic_actions.head()
[14]:
game_id original_event_id action_id period_id time_seconds team_id player_id x y dx dy type_id bodypart_id player team
0 1485184 2253487625 0 1 0.00 184 79050.0 52.815 34.204 -20.790 0.000 0 0 Ashley Westwood Burnley
1 1485184 2253487625 1 1 1.00 184 131464.0 32.025 34.204 0.000 0.000 23 0 James Tarkowski Burnley
2 1485184 2253487639 2 1 2.00 184 131464.0 31.080 36.312 7.140 -20.468 0 0 James Tarkowski Burnley
3 1485184 2253487639 3 1 3.25 184 80067.0 38.220 15.844 0.000 0.000 23 0 Matthew Lowton Burnley
4 1485184 NaN 4 1 4.50 184 80067.0 38.220 15.844 5.145 -3.332 21 0 Matthew Lowton Burnley
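
Comparing the two tables suggests how the formats relate: an Atomic-SPADL row keeps the start location as x and y and stores the movement as a displacement dx, dy instead of an end location. The sketch below reproduces that arithmetic for one action. The helper to_atomic is hypothetical and only mirrors what the displayed numbers imply; it does not generate the extra rows with type_id 23, which in the output above sit at the end location of the preceding action with zero displacement.

```python
def to_atomic(action):
    """Replace start/end coordinates with a location and a displacement."""
    return {
        "x": action["start_x"],
        "y": action["start_y"],
        "dx": action["end_x"] - action["start_x"],
        "dy": action["end_y"] - action["start_y"],
    }


# Second SPADL action from the earlier table (original_event_id 2253487639).
spadl_action = {
    "start_x": 31.080,
    "end_x": 38.220,
    "start_y": 36.312,
    "end_y": 15.844,
}

atomic = to_atomic(spadl_action)
print({key: round(value, 3) for key, value in atomic.items()})
```

The displacement comes out as roughly dx = 7.14 and dy = -20.468, in line with row 2 of the Atomic-SPADL table.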
[15]:
# Scrape all games and return a socceraction.data.opta.OptaLoader
loader = ws.read_events(output_fmt='loader')

# Now use this loader to load the data
print("Games:")
df_games = loader.games(competition_id="ENG-Premier League", season_id="2021")
display(df_games.head())

print("Teams:")
df_teams = loader.teams(game_id=1485184)
display(df_teams.head())

print("Players:")
df_players = loader.players(game_id=1485184)
display(df_players.head())

print("Events:")
df_events = loader.events(game_id=1485184)
display(df_events.head())

# You can use the socceraction package to convert the events
# to SPADL and to compute xT or VAEP action values
Games:
game_id season_id competition_id game_day game_date home_team_id away_team_id home_score away_score duration referee venue attendance home_manager away_manager
0 1485494 2021 ENG-Premier League None 2021-04-04 12:00:00 18 184 3 2 98 Andre Marriner St. Mary's Stadium 0 Ralph Hasenhüttl Sean Dyche
1 1485300 2021 ENG-Premier League None 2020-12-16 20:00:00 170 211 0 0 95 Robert Jones Craven Cottage 0 Scott Parker Graham Potter
2 1485264 2021 ENG-Premier League None 2020-12-06 19:15:00 26 161 4 0 97 Craig Pawson Anfield 2000 Jürgen Klopp Nuno Espírito Santo
3 1485519 2021 ENG-Premier League None 2021-05-16 16:30:00 175 26 1 2 102 Mike Dean The Hawthorns 0 Sam Allardyce Jürgen Klopp
4 1485436 2021 ENG-Premier League None 2021-03-19 20:00:00 170 19 1 2 100 David Coote Craven Cottage 0 Scott Parker Marcelo Bielsa
Teams:
team_id team_name
0 184 Burnley
1 32 Man Utd
Players:
game_id team_id player_id player_name is_starter minutes_played jersey_number starting_position
0 1485184 184 105720 Nick Pope True 102 1 GK
1 1485184 184 80067 Matthew Lowton True 102 2 DR
2 1485184 184 94935 Ben Mee True 102 6 DC
3 1485184 184 131464 James Tarkowski True 102 5 DC
4 1485184 184 24148 Erik Pieters True 102 23 DL
Events:
game_id event_id period_id team_id player_id type_id timestamp minute second outcome start_x start_y end_x end_y qualifiers related_player_id touch shot goal type_name
0 1485184 2253487473 1 32 NaN 32 2021-01-12 20:15:00 0 0 True 0.0 0.0 0.0 0.0 {} NaN False False False start
1 1485184 2253487469 1 184 NaN 32 2021-01-12 20:15:00 0 0 True 0.0 0.0 0.0 0.0 {} NaN False False False start
2 1485184 2253487625 1 184 79050.0 1 2021-01-12 20:15:00 0 0 True 50.3 50.3 30.5 50.3 {213: '3.1', 178: True, 141: '50.3', 212: '20.... NaN True False False pass
3 1485184 2253487639 1 184 131464.0 1 2021-01-12 20:15:02 0 2 True 29.6 53.4 36.4 23.3 {178: True, 213: '5.0', 212: '21.7', 141: '23.... NaN True False False pass
4 1485184 2253487685 1 184 80067.0 1 2021-01-12 20:15:07 0 7 True 41.3 18.4 86.0 73.1 {1: True, 213: '0.7', 56: 'Center', 178: True,... NaN True False False pass
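
The loader tables share keys: events reference teams through team_id and players through player_id. As a minimal sketch of how they fit together, the snippet below attaches names to the first few events using plain dictionaries filled with values from the outputs on this page. The helper label_event is hypothetical and the data is hard-coded so the example runs without scraping.

```python
# Lookup tables built from the Teams output and the player names shown earlier.
TEAM_NAMES = {184: "Burnley", 32: "Man Utd"}
PLAYER_NAMES = {
    79050: "Ashley Westwood",
    131464: "James Tarkowski",
    80067: "Matthew Lowton",
}

# First three rows of the Events output, reduced to the key columns.
EVENTS = [
    {"event_id": 2253487473, "team_id": 32, "player_id": None, "type_name": "start"},
    {"event_id": 2253487469, "team_id": 184, "player_id": None, "type_name": "start"},
    {"event_id": 2253487625, "team_id": 184, "player_id": 79050, "type_name": "pass"},
]


def label_event(event):
    """Attach team and player names; events without a player get None."""
    return {
        **event,
        "team_name": TEAM_NAMES[event["team_id"]],
        "player_name": PLAYER_NAMES.get(event["player_id"]),
    }


labeled = [label_event(event) for event in EVENTS]
for row in labeled:
    print(row["type_name"], row["team_name"], row["player_name"])
```

With the actual dataframes, the same join would typically be written as a pandas merge of df_events with df_teams on team_id and with df_players on player_id.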