Efficiently get Fotmob match IDs with worldfootballR
With worldfootballR recently adding a wrapper for extracting Fotmob data, I thought it might be a good time to write a short post on how to get historical match IDs fairly efficiently.
Most of the fotmob functions need a match ID (or a series of IDs), so getting those is vital, but doing so can involve a few steps.
This guide walks through a couple of ways to get them.
First, the current season
If it’s just the current season we want match IDs for, then the below code (adapted from Tony’s code in the package vignette) will make that easy.
Calling fotmob_get_league_matches() with the country and league_name, then unnesting the nested data frames, leaves us with a table of all EPL matches for the current season.
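library(worldfootballR)
library(dplyr) # provides the %>% pipe

league_matches <- fotmob_get_league_matches(
  country = "ENG",
  league_name = "Premier League"
) %>%
  dplyr::select(match_id = id, home, away) %>%
  # spread the nested home and away fields into home_* and away_* columns
  tidyr::unnest_wider(c(home, away), names_sep = "_")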
From here, we could simply take all the match IDs contained in the league_matches data frame. However, this will include matches not yet played (postponed matches, matches not yet scheduled, etc.). These unplayed match IDs could cause headaches when used in the match data functions, so we want to remove them:
league_match_ids <- league_matches %>%
  # filter for records where the home_score isn't missing
  dplyr::filter(!is.na(home_score)) %>%
  # then pull out the match IDs:
  dplyr::pull(match_id)

head(league_match_ids)
## [1] "3609929" "3609934" "3609930" "3609931" "3609932" "3609933"
From here, the world is your oyster (for the current season only): you can now use the league_match_ids vector with the match-level functions outlined in the package vignette.
Ok, but I’m here for historical Match IDs
The section above made it easy to get current-season match IDs for a league, but what if we want the match IDs for a previous season? Well, that’s a bit trickier…
One option would be to pass in all dates in a calendar year to
fotmob_get_matches_by_date(), then filter for the league you want and get all match IDs that way, but this will be inefficient as you will be getting matches for ALL leagues played on every date.
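To put a rough number on the brute-force option described above, here is a minimal sketch in base R that builds every date in a calendar year. The year 2021 is an arbitrary choice for illustration, and nothing here calls fotmob; it only counts how many date pages would have to be read.

```r
# Every date in the 2021 calendar year (the year is just an example)
all_dates <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")

# One fotmob date summary page would be requested per element of this vector
n_requests <- length(all_dates)
n_requests
```

Every one of those pages would also return matches for every league played that day, most of which would be filtered straight back out.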
Instead, we can fairly quickly get match dates from FBref using
get_match_results() and then use these dates to get fotmob match data.
Below, we’re going to get match dates played in the Australian A-League Men’s competition for the 2020-21 season.
# get dates A-League games are played - this is easiest done using worldfootballR
aleague <- get_match_results(country = "AUS", gender = "M", season_end_year = 2021)

aleague_dates <- aleague %>%
  dplyr::filter(!is.na(HomeGoals)) %>%
  dplyr::pull(Date) %>%
  unique()

head(aleague_dates)
## [1] "2020-12-28" "2020-12-29" "2020-12-30" "2020-12-31" "2021-01-02"
## [6] "2021-01-03"
Ok now that we have some dates, we can go ahead and get the data we need.
Note: this will take a few minutes (longer, depending on how many dates matches are played on), as you’re effectively reading in every date summary page from fotmob.
The result of the code below will be a data set containing league metadata and matches for every league with games on the dates used.
# get match IDs from fotmob
match_id_df <- fotmob_get_matches_by_date(aleague_dates)
From there, we go and filter for the league we want and again, we filter out any matches that haven’t been played.
The league ID (primary_id) can be found in the league URL on the fotmob site, so replace 113 with the relevant league ID you’re after.
required_match_id <- match_id_df %>%
  # select the league you need
  dplyr::filter(primary_id == 113) %>%
  # select the columns/nested df we want
  dplyr::select(primary_id, ccode, league_name = name, matches) %>%
  # Unnest the nested df of matches
  tidyr::unnest_longer(matches) %>%
  # only keep the matches actually played
  dplyr::filter(matches$status$started)
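If you would rather not read the league ID off the URL by eye, it can be pulled out with a regular expression. This is only a sketch: the URL below is a hypothetical example, and the "/leagues/<id>/" layout is an assumption about how fotmob structures its league pages, so check it against the page you are actually looking at.

```r
# Hypothetical league URL; the "/leagues/<id>/" layout is an assumption
league_url <- "https://www.fotmob.com/leagues/113/overview/a-league"

# Capture the run of digits that follows "/leagues/" and convert it to an integer
league_id <- as.integer(sub(".*/leagues/([0-9]+).*", "\\1", league_url))
league_id
```

The resulting league_id could then stand in for the hard-coded 113 in the filter on primary_id.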
Then we get a vector of unique match IDs, and now the world really is your oyster. These can then be used with the match-level data functions outlined in the vignette.
# get a unique vector of all match IDs on fotmob
match_ids <- required_match_id$matches$id %>%
  unique()

head(match_ids)
## [1] 3488733 3488760 3488856 3488741 3488749 3488738