[3]:
import soccerdata as sd
WhoScored
[4]:
ws = sd.WhoScored(leagues="ENG-Premier League", seasons=2021)
print(ws.__doc__)
/cw/dtaijupiter/NoCsBack/dtai/pieterr/Projects/soccerdata/soccerdata/_common.py:462: UserWarning: Season id "2021" is ambiguous: interpreting as "20-21"
warnings.warn(msg)
Provides pd.DataFrames from data available at http://whoscored.com.
Data will be downloaded as necessary and cached locally in
``~/soccerdata/data/WhoScored``.
Parameters
----------
leagues : string or iterable, optional
IDs of Leagues to include.
seasons : string, int or list, optional
Seasons to include. Supports multiple formats.
Examples: '16-17'; 2016; '2016-17'; [14, 15, 16]
proxy : 'tor' or dict or list(dict) or callable, optional
Use a proxy to hide your IP address. Valid options are:
- "tor": Uses the Tor network. Tor should be running in
the background on port 9050.
- dict: A dictionary with the proxy to use. The dict should be
a mapping of supported protocols to proxy addresses. For example::
{
'http': 'http://10.10.1.10:3128',
'https': 'http://10.10.1.10:1080',
}
- list(dict): A list of proxies to choose from. A different proxy will
be selected from this list after failed requests, allowing rotating
proxies.
- callable: A function that returns a valid proxy. This function will
be called after failed requests, allowing rotating proxies.
no_cache : bool
If True, will not use cached data.
no_store : bool
If True, will not store downloaded data.
data_dir : Path
Path to directory where data will be cached.
path_to_browser : Path, optional
Path to the Chrome executable.
headless : bool, default: True
If True, will run Chrome in headless mode. Setting this to False might
help to avoid getting blocked.
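The warning above shows that `seasons=2021` is ambiguous and is read as the 2020-21 season. That season then appears as `2021` in the `season` index level of the tables below. The following sketch shows one way the season formats listed in the docstring could be normalised to such four-character codes. `normalize_season` is a hypothetical helper, not soccerdata's implementation. Its rules are assumptions inferred from the docstring examples and the warning text.

```python
import warnings


def normalize_season(season):
    """Hypothetical helper: map a season id to a 'YYZZ' code such as '1617'.

    Assumed rules, modelled on the documented examples (not soccerdata's code):
    - '16-17' and '2016-17' are split on the dash.
    - A start year such as 2016 or 16 becomes '1617'.
    - A 4-digit id whose halves are consecutive years (e.g. 2021) is
      ambiguous; it is read as '20-21' and a warning is emitted.
    """
    s = str(season).strip()
    if "-" in s:
        start, end = s.split("-", 1)
        return start[-2:].zfill(2) + end[-2:].zfill(2)
    if not s.isdigit() or len(s) not in (1, 2, 4):
        raise ValueError(f"Unrecognised season id: {season!r}")
    if len(s) == 4:
        first, second = int(s[:2]), int(s[2:])
        if (first + 1) % 100 == second:
            warnings.warn(
                f'Season id "{s}" is ambiguous: interpreting as "{s[:2]}-{s[2:]}"'
            )
            return s
    start = int(s) % 100  # keep only the last two digits of the start year
    return f"{start:02d}{(start + 1) % 100:02d}"


# A list of seasons is normalised element by element.
print([normalize_season(s) for s in [14, 15, 16]])  # ['1415', '1516', '1617']
```

Under these assumed rules, an unambiguous form such as `'20-21'` or `'2020-21'` avoids the warning.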
Game schedule
[5]:
epl_schedule = ws.read_schedule()
epl_schedule.head()
[5]:
league | season | game | game_id | home_team | away_team | date | url | stage
---|---|---|---|---|---|---|---|---
ENG-Premier League | 2021 | 2020-09-12 Crystal Palace-Southampton | 1485186 | Crystal Palace | Southampton | 2020-09-12 15:00:00 | https://www.whoscored.com/Matches/1485186/Live... | NaN
ENG-Premier League | 2021 | 2020-09-12 Fulham-Arsenal | 1485187 | Fulham | Arsenal | 2020-09-12 12:30:00 | https://www.whoscored.com/Matches/1485187/Live... | NaN
ENG-Premier League | 2021 | 2020-09-12 Liverpool-Leeds United | 1485188 | Liverpool | Leeds United | 2020-09-12 17:30:00 | https://www.whoscored.com/Matches/1485188/Live... | NaN
ENG-Premier League | 2021 | 2020-09-12 West Ham United-Newcastle United | 1485191 | West Ham United | Newcastle United | 2020-09-12 20:00:00 | https://www.whoscored.com/Matches/1485191/Live... | NaN
ENG-Premier League | 2021 | 2020-09-13 Tottenham-Everton | 1485189 | Tottenham | Everton | 2020-09-13 16:30:00 | https://www.whoscored.com/Matches/1485189/Live... | NaN
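In the schedule, the `game_id` column matches the number in the `url` path. The `game` index label combines the match date with the home and away team names. This `game_id` is the value passed as `match_id` in the calls that follow. The sketch below expresses both relationships, assuming the URL pattern seen in this table. `game_id_from_url` and `game_label` are hypothetical helpers, not soccerdata functions.

```python
import re
from datetime import datetime

# Assumption: match pages follow https://www.whoscored.com/Matches/<game_id>/...
MATCH_URL = re.compile(r"whoscored\.com/Matches/(\d+)/")


def game_id_from_url(url: str) -> int:
    """Hypothetical helper: extract the numeric game id from a match URL."""
    match = MATCH_URL.search(url)
    if match is None:
        raise ValueError(f"Not a WhoScored match URL: {url!r}")
    return int(match.group(1))


def game_label(kickoff: datetime, home_team: str, away_team: str) -> str:
    """Hypothetical helper: rebuild a 'game' index label from its parts."""
    return f"{kickoff:%Y-%m-%d} {home_team}-{away_team}"


# The URL is copied as displayed in the table (it is truncated there).
print(game_id_from_url("https://www.whoscored.com/Matches/1485187/Live..."))  # 1485187
print(game_label(datetime(2020, 9, 12, 12, 30), "Fulham", "Arsenal"))  # 2020-09-12 Fulham-Arsenal
```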
Injured and suspended players
[6]:
missing_players = ws.read_missing_players(match_id=1485184)
missing_players.head()
[6]:
league | season | game | team | player | game_id | player_id | reason | status
---|---|---|---|---|---|---|---|---
ENG-Premier League | 2021 | 2021-01-12 Burnley-Manchester United | Burnley | Charlie Taylor | 1485184 | 107462 | injured doubtful | Doubtful
ENG-Premier League | 2021 | 2021-01-12 Burnley-Manchester United | Burnley | Dwight McNeil | 1485184 | 357427 | injured doubtful | Doubtful
ENG-Premier League | 2021 | 2021-01-12 Burnley-Manchester United | Burnley | Jay Rodriguez | 1485184 | 33891 | injured doubtful | Doubtful
ENG-Premier League | 2021 | 2021-01-12 Burnley-Manchester United | Burnley | Jimmy Dunne | 1485184 | 366743 | injured doubtful | Doubtful
ENG-Premier League | 2021 | 2021-01-12 Burnley-Manchester United | Manchester United | Eric Bailly | 1485184 | 243814 | injured doubtful | Doubtful
Match event stream data
[7]:
events = ws.read_events(match_id=1485184)
events.head()
[7]:
league | season | game | id | period | minute | expanded_minute | type | outcome_type | team | player | qualifiers | x | y | end_x | end_y | goal_mouth_y | goal_mouth_z | is_touch | is_shot | is_goal | related_event_id | related_player_id | blocked_x | blocked_y | card_type | game_id | team_id | player_id
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
ENG-Premier League | 2021 | 2021-01-12 Burnley-Manchester United | 2253458317 | PreMatch | 0 | 0 | FormationSet | Successful | Burnley | NaN | [{'type': {'displayName': 'TeamPlayerFormation... | 0.0 | 0.0 | NaN | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1485184 | 184 | NaN
ENG-Premier League | 2021 | 2021-01-12 Burnley-Manchester United | 2253458375 | PreMatch | 0 | 0 | FormationSet | Successful | Man Utd | NaN | [{'type': {'displayName': 'CaptainPlayerId', '... | 0.0 | 0.0 | NaN | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1485184 | 32 | NaN
ENG-Premier League | 2021 | 2021-01-12 Burnley-Manchester United | 2253487469 | FirstHalf | 0 | 0 | Start | Successful | Burnley | NaN | [] | 0.0 | 0.0 | NaN | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1485184 | 184 | NaN
ENG-Premier League | 2021 | 2021-01-12 Burnley-Manchester United | 2253487473 | FirstHalf | 0 | 0 | Start | Successful | Man Utd | NaN | [] | 0.0 | 0.0 | NaN | NaN | NaN | NaN | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1485184 | 32 | NaN
ENG-Premier League | 2021 | 2021-01-12 Burnley-Manchester United | 2253487625 | FirstHalf | 0 | 0 | Pass | Successful | Burnley | Ashley Westwood | [{'type': {'displayName': 'Angle', 'value': 21... | 50.3 | 50.3 | 30.5 | 50.3 | NaN | NaN | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1485184 | 184 | 79050.0
Match event stream data can be returned in several formats, selected with the "output_fmt" parameter:
- events (default): Returns a dataframe with all events.
- raw: Returns the original unformatted WhoScored JSON.
- spadl: Returns a dataframe with the SPADL representation of the original events.
- atomic-spadl: Returns a dataframe with the Atomic-SPADL representation of the original events.
- loader: Returns a socceraction.data.opta.OptaLoader instance.
[12]:
import json
# Fetch the unformatted WhoScored JSON and pretty-print the first raw event of game 1485184
events = ws.read_events(match_id=1485184, output_fmt="raw")
print(json.dumps(events[1485184][0], indent=2))
{
"eventId": 2,
"expandedMinute": 0,
"id": 2253487473,
"isTouch": false,
"minute": 0,
"outcomeType": {
"displayName": "Successful",
"value": 1
},
"period": {
"displayName": "FirstHalf",
"value": 1
},
"qualifiers": [],
"satisfiedEventsTypes": [],
"second": 0,
"teamId": 32,
"type": {
"displayName": "Start",
"value": 32
},
"x": 0,
"y": 0
}
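Compare this JSON with the default `events` dataframe shown earlier. Nested fields such as `period`, `type` and `outcomeType` appear to be reduced to their `displayName`, and camelCase keys become snake_case columns. The sketch below performs that flattening for the fields shown here. `flatten_event` is hypothetical. Resolving `teamId` to a team name such as "Man Utd" needs a separate lookup, which is not shown.

```python
import json

# The raw event printed above, copied verbatim.
raw_event = json.loads("""
{"eventId": 2, "expandedMinute": 0, "id": 2253487473, "isTouch": false,
 "minute": 0, "outcomeType": {"displayName": "Successful", "value": 1},
 "period": {"displayName": "FirstHalf", "value": 1}, "qualifiers": [],
 "satisfiedEventsTypes": [], "second": 0, "teamId": 32,
 "type": {"displayName": "Start", "value": 32}, "x": 0, "y": 0}
""")


def flatten_event(event: dict) -> dict:
    """Hypothetical helper: keep display names of nested fields, snake_case keys."""
    return {
        "id": event["id"],
        "period": event["period"]["displayName"],
        "minute": event["minute"],
        "expanded_minute": event["expandedMinute"],
        "type": event["type"]["displayName"],
        "outcome_type": event["outcomeType"]["displayName"],
        "team_id": event["teamId"],
        "x": float(event["x"]),
        "y": float(event["y"]),
        "is_touch": event["isTouch"],
    }


print(flatten_event(raw_event))
```

The result matches the row with id 2253487473 in the default events table: FirstHalf, Start, Successful, team_id 32, x = y = 0.0.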
[13]:
actions = ws.read_events(match_id=1485184, output_fmt='spadl')
actions.head()
[13]:
| | game_id | original_event_id | period_id | time_seconds | team_id | player_id | start_x | end_x | start_y | end_y | type_id | result_id | bodypart_id | action_id | player | team |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1485184 | 2253487625 | 1 | 0.0 | 184 | 79050.0 | 52.815 | 32.025 | 34.204 | 34.204 | 0 | 1 | 0 | 0 | Ashley Westwood | Burnley |
| 1 | 1485184 | 2253487639 | 1 | 2.0 | 184 | 131464.0 | 31.080 | 38.220 | 36.312 | 15.844 | 0 | 1 | 0 | 1 | James Tarkowski | Burnley |
| 2 | 1485184 | NaN | 1 | 4.5 | 184 | 80067.0 | 38.220 | 43.365 | 15.844 | 12.512 | 21 | 1 | 0 | 2 | Matthew Lowton | Burnley |
| 3 | 1485184 | 2253487685 | 1 | 7.0 | 184 | 80067.0 | 43.365 | 90.300 | 12.512 | 49.708 | 0 | 1 | 0 | 3 | Matthew Lowton | Burnley |
| 4 | 1485184 | 2253487689 | 1 | 11.0 | 184 | 93473.0 | 90.300 | 105.000 | 49.708 | 38.828 | 11 | 0 | 0 | 4 | Robbie Brady | Burnley |
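The SPADL coordinates are consistent with rescaling WhoScored's 0-100 pitch coordinates to a 105 x 68 m pitch. In the events table, Ashley Westwood's pass goes from (50.3, 50.3) to (30.5, 50.3). Here it goes from (52.815, 34.204) to (32.025, 34.204). The sketch below shows only that rescaling step. `to_spadl_xy` is a hypothetical helper. It ignores any direction-of-play adjustment that the real socceraction converter may apply to away-team actions.

```python
# Assumption: WhoScored/Opta x and y are percentages of pitch length and width;
# SPADL expresses locations in metres on a 105 x 68 m pitch.
PITCH_LENGTH_M = 105.0
PITCH_WIDTH_M = 68.0


def to_spadl_xy(x_pct: float, y_pct: float) -> tuple[float, float]:
    """Hypothetical helper: rescale a WhoScored (x, y) pair to SPADL metres."""
    return x_pct / 100 * PITCH_LENGTH_M, y_pct / 100 * PITCH_WIDTH_M


# Ashley Westwood's first pass: start (50.3, 50.3), end (30.5, 50.3).
start = to_spadl_xy(50.3, 50.3)
end = to_spadl_xy(30.5, 50.3)
print([round(v, 3) for v in start + end])  # [52.815, 34.204, 32.025, 34.204]
```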
[14]:
atomic_actions = ws.read_events(match_id=1485184, output_fmt='atomic-spadl')
atomic_actions.head()
[14]:
| | game_id | original_event_id | action_id | period_id | time_seconds | team_id | player_id | x | y | dx | dy | type_id | bodypart_id | player | team |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1485184 | 2253487625 | 0 | 1 | 0.00 | 184 | 79050.0 | 52.815 | 34.204 | -20.790 | 0.000 | 0 | 0 | Ashley Westwood | Burnley |
| 1 | 1485184 | 2253487625 | 1 | 1 | 1.00 | 184 | 131464.0 | 32.025 | 34.204 | 0.000 | 0.000 | 23 | 0 | James Tarkowski | Burnley |
| 2 | 1485184 | 2253487639 | 2 | 1 | 2.00 | 184 | 131464.0 | 31.080 | 36.312 | 7.140 | -20.468 | 0 | 0 | James Tarkowski | Burnley |
| 3 | 1485184 | 2253487639 | 3 | 1 | 3.25 | 184 | 80067.0 | 38.220 | 15.844 | 0.000 | 0.000 | 23 | 0 | Matthew Lowton | Burnley |
| 4 | 1485184 | NaN | 4 | 1 | 4.50 | 184 | 80067.0 | 38.220 | 15.844 | 5.145 | -3.332 | 21 | 0 | Matthew Lowton | Burnley |
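Comparing this table with the SPADL one suggests the following rules:

- Each action keeps its start location and gains a displacement `dx = end_x - start_x`, `dy = end_y - start_y`.
- `result_id` disappears.
- A successful pass is followed by a `type_id` 23 row (a receival) at the pass's end location.
- The receival is credited to the next action's player and timed halfway to that action (1.00 between 0.0 and 2.0, 3.25 between 2.0 and 4.5).

The sketch below reproduces the five rows above under these inferred rules. `to_atomic` is a hypothetical simplification, not socceraction's converter. It ignores failed passes and every other case the real conversion covers.

```python
from dataclasses import dataclass

# type_id values as they appear in the tables above.
PASS, DRIBBLE, RECEIVAL = 0, 21, 23


@dataclass
class Action:
    time_seconds: float
    player: str
    type_id: int
    result_id: int  # 1 = success in the SPADL table above
    start_x: float
    start_y: float
    end_x: float
    end_y: float


@dataclass
class AtomicAction:
    time_seconds: float
    player: str
    type_id: int
    x: float
    y: float
    dx: float
    dy: float


def to_atomic(actions: list[Action]) -> list[AtomicAction]:
    """Hypothetical simplification of the SPADL -> Atomic-SPADL step."""
    atomic = []
    for i, a in enumerate(actions):
        # Keep the start location and store the movement as a displacement.
        atomic.append(AtomicAction(a.time_seconds, a.player, a.type_id, a.start_x,
                                   a.start_y, a.end_x - a.start_x, a.end_y - a.start_y))
        nxt = actions[i + 1] if i + 1 < len(actions) else None
        if a.type_id == PASS and a.result_id == 1 and nxt is not None:
            # Receival at the pass end point, halfway in time to the next action.
            t = (a.time_seconds + nxt.time_seconds) / 2
            atomic.append(AtomicAction(t, nxt.player, RECEIVAL, a.end_x, a.end_y, 0.0, 0.0))
    return atomic


# First three SPADL rows above (note the table orders start_x, end_x, start_y, end_y).
spadl = [
    Action(0.0, "Ashley Westwood", PASS, 1, 52.815, 34.204, 32.025, 34.204),
    Action(2.0, "James Tarkowski", PASS, 1, 31.080, 36.312, 38.220, 15.844),
    Action(4.5, "Matthew Lowton", DRIBBLE, 1, 38.220, 15.844, 43.365, 12.512),
]
for row in to_atomic(spadl):
    print(row)
```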
[15]:
from IPython.display import display  # already available inside Jupyter
# Scrape all games and return a socceraction.data.opta.OptaLoader
loader = ws.read_events(output_fmt='loader')
# Now use this loader to load the data
print("Games:")
df_games = loader.games(competition_id="ENG-Premier League", season_id="2021")
display(df_games.head())
print("Teams:")
df_teams = loader.teams(game_id=1485184)
display(df_teams.head())
print("Players:")
df_players = loader.players(game_id=1485184)
display(df_players.head())
print("Events:")
df_events = loader.events(game_id=1485184)
display(df_events.head())
# You can use the socceraction package to convert the events
# to SPADL and to compute xT or VAEP action values
Games:
| | game_id | season_id | competition_id | game_day | game_date | home_team_id | away_team_id | home_score | away_score | duration | referee | venue | attendance | home_manager | away_manager |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1485494 | 2021 | ENG-Premier League | None | 2021-04-04 12:00:00 | 18 | 184 | 3 | 2 | 98 | Andre Marriner | St. Mary's Stadium | 0 | Ralph Hasenhüttl | Sean Dyche |
| 1 | 1485300 | 2021 | ENG-Premier League | None | 2020-12-16 20:00:00 | 170 | 211 | 0 | 0 | 95 | Robert Jones | Craven Cottage | 0 | Scott Parker | Graham Potter |
| 2 | 1485264 | 2021 | ENG-Premier League | None | 2020-12-06 19:15:00 | 26 | 161 | 4 | 0 | 97 | Craig Pawson | Anfield | 2000 | Jürgen Klopp | Nuno Espírito Santo |
| 3 | 1485519 | 2021 | ENG-Premier League | None | 2021-05-16 16:30:00 | 175 | 26 | 1 | 2 | 102 | Mike Dean | The Hawthorns | 0 | Sam Allardyce | Jürgen Klopp |
| 4 | 1485436 | 2021 | ENG-Premier League | None | 2021-03-19 20:00:00 | 170 | 19 | 1 | 2 | 100 | David Coote | Craven Cottage | 0 | Scott Parker | Marcelo Bielsa |
Teams:
| | team_id | team_name |
|---|---|---|
| 0 | 184 | Burnley |
| 1 | 32 | Man Utd |
Players:
| | game_id | team_id | player_id | player_name | is_starter | minutes_played | jersey_number | starting_position |
|---|---|---|---|---|---|---|---|---|
| 0 | 1485184 | 184 | 105720 | Nick Pope | True | 102 | 1 | GK |
| 1 | 1485184 | 184 | 80067 | Matthew Lowton | True | 102 | 2 | DR |
| 2 | 1485184 | 184 | 94935 | Ben Mee | True | 102 | 6 | DC |
| 3 | 1485184 | 184 | 131464 | James Tarkowski | True | 102 | 5 | DC |
| 4 | 1485184 | 184 | 24148 | Erik Pieters | True | 102 | 23 | DL |
Events:
| | game_id | event_id | period_id | team_id | player_id | type_id | timestamp | minute | second | outcome | start_x | start_y | end_x | end_y | qualifiers | related_player_id | touch | shot | goal | type_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1485184 | 2253487473 | 1 | 32 | NaN | 32 | 2021-01-12 20:15:00 | 0 | 0 | True | 0.0 | 0.0 | 0.0 | 0.0 | {} | NaN | False | False | False | start |
| 1 | 1485184 | 2253487469 | 1 | 184 | NaN | 32 | 2021-01-12 20:15:00 | 0 | 0 | True | 0.0 | 0.0 | 0.0 | 0.0 | {} | NaN | False | False | False | start |
| 2 | 1485184 | 2253487625 | 1 | 184 | 79050.0 | 1 | 2021-01-12 20:15:00 | 0 | 0 | True | 50.3 | 50.3 | 30.5 | 50.3 | {213: '3.1', 178: True, 141: '50.3', 212: '20.... | NaN | True | False | False | pass |
| 3 | 1485184 | 2253487639 | 1 | 184 | 131464.0 | 1 | 2021-01-12 20:15:02 | 0 | 2 | True | 29.6 | 53.4 | 36.4 | 23.3 | {178: True, 213: '5.0', 212: '21.7', 141: '23.... | NaN | True | False | False | pass |
| 4 | 1485184 | 2253487685 | 1 | 184 | 80067.0 | 1 | 2021-01-12 20:15:07 | 0 | 7 | True | 41.3 | 18.4 | 86.0 | 73.1 | {1: True, 213: '0.7', 56: 'Center', 178: True,... | NaN | True | False | False | pass |
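The loader keeps Opta-style qualifiers as a dict mapping qualifier id to value. Assume the commonly documented Opta ids: 140/141 for the pass end x/y and 213 for the pass angle in radians. Under that assumption, the pass end point and direction can be decoded as below. `pass_geometry` is hypothetical, and the id mapping should be checked against Opta's own qualifier list. For Ashley Westwood's first pass, qualifier 141 ('50.3') matches the `end_y` column. An angle of 3.1 rad (about 178 degrees) fits a pass played backwards along the x axis, from 50.3 to 30.5 at constant y.

```python
import math

# Assumption: Opta qualifier ids 140/141 hold the pass end x/y and 213 the
# pass angle in radians.
PASS_END_X, PASS_END_Y, ANGLE = 140, 141, 213


def pass_geometry(qualifiers: dict) -> dict:
    """Hypothetical helper: decode pass end point and angle from qualifiers.

    Qualifier values arrive as strings (e.g. '50.3'), so they are cast to
    float; qualifiers that are absent are skipped.
    """
    geometry = {}
    if PASS_END_X in qualifiers:
        geometry["end_x"] = float(qualifiers[PASS_END_X])
    if PASS_END_Y in qualifiers:
        geometry["end_y"] = float(qualifiers[PASS_END_Y])
    if ANGLE in qualifiers:
        geometry["angle_deg"] = math.degrees(float(qualifiers[ANGLE]))
    return geometry


# Visible part of the qualifiers of Ashley Westwood's first pass
# (the rest of the dict is cut off in the table above).
print(pass_geometry({213: "3.1", 178: True, 141: "50.3"}))
```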