This content originally appeared on Level Up Coding - Medium and was authored by Paul Corcoran
We love football data, and fbref is one of the best freely available sources out there!
We are going to scrape the league table and some shooting statistics to get a feel for what is possible. fbref publishes its data as HTML tables, which is a key requirement for what we are doing today. As always, the full code is here: https://github.com/PaulyCorcoran/Medium_Football/blob/main/fb%20ref.ipynb
The first step is to identify which table we want to scrape. For today’s article, I am scraping the La Liga league table, located here: https://fbref.com/en/comps/12/La-Liga-Stats.
The only library we need to import to scrape HTML tables is pandas (pd.read_html() does rely on an HTML parser such as lxml being installed). Import the library and pass the target URL to the pd.read_html() function.
import pandas as pd
df = pd.read_html('https://fbref.com/en/comps/12/La-Liga-Stats')
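To see what pd.read_html() hands back without hitting the live site, here is a minimal sketch that parses a small inline HTML string instead. The HTML, the team names and the numbers are made-up placeholders, not fbref data, and the variable names are my own.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for a downloaded page: two tiny HTML tables.
# Team names and numbers are placeholders, not real fbref data.
html = """
<table>
  <thead><tr><th>Rk</th><th>Squad</th><th>Pts</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Team A</td><td>30</td></tr>
    <tr><td>2</td><td>Team B</td><td>27</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>Squad</th><th>SoT</th></tr></thead>
  <tbody>
    <tr><td>Team A</td><td>50</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))
print(type(tables).__name__, len(tables))

# Each element is an ordinary DataFrame.
league_table = tables[0]
print(league_table)
```

The same call works with a URL in place of the StringIO object, which is what the article does.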
pd.read_html() returns a list of DataFrames, one for each table found on the page, so printing df directly is messy and hard to read. However, we can make the output easier to inspect using a simple for loop.
for idx, table in enumerate(df):
    print("***************************")
    print(idx)
    print(table)
Here we are asking Python to loop through df and print each table along with its index. Indexing starts at 0 in Python, and each table is printed after its index. We want the table at index 0, which we can extract by simply indexing the list: df[0].
Which leaves us with the extracted table... cool, huh?
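Printing every table in full can get long on a page with many tables. As a rough alternative, the sketch below prints one summary line per table and then picks one out by index. The list here is hand-built to mimic what pd.read_html() returns, the summarise helper is hypothetical, and all values are placeholders.

```python
import pandas as pd

# Hypothetical stand-in for the list returned by pd.read_html().
# Placeholder values, not real fbref data.
tables = [
    pd.DataFrame({"Rk": [1, 2], "Squad": ["Team A", "Team B"], "Pts": [30, 27]}),
    pd.DataFrame({"Squad": ["Team A", "Team B"], "SoT": [50, 41]}),
]


def summarise(tables):
    """Return one line per table: index, shape and first few column names."""
    lines = []
    for idx, table in enumerate(tables):
        cols = ", ".join(str(c) for c in table.columns[:5])
        lines.append(f"[{idx}] shape={table.shape} columns: {cols}")
    return lines


for line in summarise(tables):
    print(line)

# Plain list indexing picks out the table we want.
league_table = tables[0]
print(league_table)
```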
The rest of the notebook goes through another La Liga table extraction. When scraping fbref HTML tables, the resulting df can have multi-level columns, which will have an effect on any analysis or plotting. We can drop a level by using df.columns.droplevel() and assigning the result back to df.columns. Work through the provided notebook to see an example.
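As a small illustration of dropping a column level, the sketch below builds a DataFrame with two-level columns by hand and flattens it. The top-level labels, team names and numbers are assumptions made for the example and may not match what fbref returns.

```python
import pandas as pd

# Hand-built two-level columns, loosely imitating a scraped stats table.
# Labels and values are placeholders for illustration only.
columns = pd.MultiIndex.from_tuples(
    [
        ("Unnamed: 0_level_0", "Squad"),
        ("Standard", "Sh"),
        ("Standard", "SoT"),
    ]
)
shooting = pd.DataFrame(
    [["Team A", 100, 40], ["Team B", 90, 30]],
    columns=columns,
)

# droplevel returns a new index, so assign it back to .columns.
flat = shooting.copy()
flat.columns = flat.columns.droplevel(0)

print(list(flat.columns))
print(flat)
```

Working on a copy keeps the original multi-level frame around in case the top-level labels are needed later.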
Lastly, I pulled up the shooting stats for the league, filtered on the top 4 and plotted their shots on target (SoT) figures. This is a brief example of what can be done with football statistics scraped from fbref, and we can gain interesting insights by plotting. Sevilla are much less successful at getting shots on target! Hope you enjoy playing!
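The filter-then-plot step might look something like the sketch below. This is not the notebook's code: the team names and SoT values are placeholders, the plot_sot helper is hypothetical, and it assumes matplotlib is installed if you call it.

```python
import pandas as pd

# Placeholder shooting stats, not real fbref figures.
shooting = pd.DataFrame(
    {
        "Squad": ["Team A", "Team B", "Team C", "Team D", "Team E"],
        "SoT": [50, 45, 44, 30, 28],
    }
)

# Assumed list of the current top 4 squads.
top4 = ["Team A", "Team B", "Team C", "Team D"]

# Keep only the top 4 rows and turn them into a Series indexed by squad.
top4_sot = (
    shooting[shooting["Squad"].isin(top4)]
    .set_index("Squad")["SoT"]
    .sort_values(ascending=False)
)
print(top4_sot)


def plot_sot(series):
    """Draw a bar chart of shots on target (requires matplotlib)."""
    import matplotlib.pyplot as plt

    ax = series.plot.bar(title="Shots on target, top 4")
    ax.set_ylabel("SoT")
    plt.tight_layout()
    plt.show()
```

Calling plot_sot(top4_sot) draws the chart; the import sits inside the function so the filtering part runs even without matplotlib.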
Quickly and easily Scrape FBREF using just Pandas was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.
Paul Corcoran | Sciencx (2022-04-01T13:22:42+00:00) Quickly and easily Scrape FBREF using just Pandas. Retrieved from https://www.scien.cx/2022/04/01/quickly-and-easily-scrape-fbref-using-just-pandas/