All-Time-Premier-League-Player-Statistics

From MaRDI portal
Dataset:6036645



OpenML ID: 43548, MaRDI QID: Q6036645

OpenML dataset with id 43548

No author found.

Full work available at URL: https://api.openml.org/data/v1/download/22102373/All-Time-Premier-League-Player-Statistics.arff

Upload date: 23 March 2022
Copyright license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International



Dataset Characteristics

Number of features: 59 (numeric: 52, symbolic: 0, binary: 0)
Number of instances: 571
Number of instances with missing values: 571
Number of missing values: 10,224
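
Taken together, these figures mean that every instance has at least one gap and that roughly 30% of all cells are empty (10,224 of 571 × 59 = 33,689). The sketch below shows one way such counts could be derived. It assumes rows are held as lists with `None` marking a missing value; the sample rows and the name `missing_summary` are invented for illustration and are not taken from the dataset.

```python
def missing_summary(rows):
    """Return (rows_with_missing, total_missing) for a list of rows.

    A cell counts as missing when it is None. This is an assumption:
    ARFF files mark missing values with '?', which a loader would
    typically map to None or NaN.
    """
    rows_with_missing = sum(1 for row in rows if any(v is None for v in row))
    total_missing = sum(v is None for row in rows for v in row)
    return rows_with_missing, total_missing


# Hypothetical sample rows: (name, position, goals, saves)
sample_rows = [
    ["Player A", "Forward", 10, None],
    ["Player B", "Goalkeeper", None, 85],
    ["Player C", "Midfielder", 3, None],
]

rows_with_missing, total_missing = missing_summary(sample_rows)

# Share of empty cells reported for the full dataset: 571 rows x 59 features.
reported_missing_share = 10224 / (571 * 59)
```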

Context

I am a really huge football fan and the Premier League is one of my favourite football (or soccer, whatever you like to call it) leagues. So, as my very first dataset, I thought this would be a great opportunity to make a dataset of player statistics for all seasons of the Premier League.

The Premier League, often referred to as the English Premier League or the EPL outside England, is the top level of the English football league system. Contested by 20 clubs, it operates on a system of promotion and relegation with the English Football League (EFL). Home to some of the most famous clubs, players, managers and stadiums in world football, the Premier League is the most-watched league on the planet, with one billion homes watching the action in 188 countries.

The league takes place between August and May and involves the teams playing each other home and away across the season, a total of 380 matches. Three points are awarded for a win, one point for a draw and none for a defeat, with the team with the most points at the end of the season winning the Premier League title. The teams that finish in the bottom three of the league table at the end of the campaign are relegated to the Championship, the second tier of English football. Those teams are replaced by three clubs promoted from the Championship: the sides that finish in first and second place, and a third decided by the end-of-season playoffs.

Details about the dataset

Players in some positions may lack certain statistics. For example, a goalkeeper may not have a value for "Shot Accuracy".

The filename format is "dataset - yyyy-mm-dd", where the date is the day the file was last updated.
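
As a minimal sketch, the last-update date could be read back from a filename in this format as follows. The optional file extension, the example filename, and the names `FILENAME_PATTERN` and `last_updated` are assumptions made for illustration; only the "dataset - yyyy-mm-dd" pattern comes from the description above.

```python
import re
from datetime import date, datetime

# Assumed pattern: "dataset - yyyy-mm-dd", optionally followed by an extension.
FILENAME_PATTERN = re.compile(r"^dataset - (\d{4}-\d{2}-\d{2})(?:\.\w+)?$")


def last_updated(filename: str) -> date:
    """Extract the last-update date from a filename of the form above."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    # strptime also rejects impossible dates such as 2020-13-40.
    return datetime.strptime(match.group(1), "%Y-%m-%d").date()


# Hypothetical filename, used only to illustrate the format.
example_date = last_updated("dataset - 2020-09-24.csv")
```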

Content

The data was acquired from: https://www.premierleague.com/

I made a BeautifulSoup4 web scraper in Python 3 which automatically outputs a CSV file of all the player statistics. The script takes about 20 minutes to run, though this varies with the bandwidth of the Internet connection. I made this program so that the dataset could be updated weekly: the statistics change after each match a player plays, so I felt such a program was needed to keep the results up to date. Planning this project took 2 days, writing the program in Python 3 took 7 days, and testing and bug fixing took another 5 days. The project was completed in the span of 2 weeks.

Acknowledgements

Source credits: https://www.premierleague.com/
Image credits: https://rb.gy/wuiwth

Inspiration

How do variables like age, nationality and club affect player performance?

Known issues in the dataset

Goals per match shows an abnormally high value for a few players because the HTML displays an incorrect value during the first few milliseconds of loading the page. I am trying to fix this analytically rather than scraping the value directly from the website.
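
One possible reading of an analytical fix is to recompute the ratio from the raw counts instead of trusting the scraped figure. The sketch below follows that assumption; it is not the author's confirmed method. The record keys, the sample values, and the rounding to two decimals are hypothetical, not the dataset's actual column names or conventions.

```python
def goals_per_match(goals, appearances):
    """Recompute goals per match from the raw counts.

    Returns None when either count is missing or the player has no
    appearances, so no value is invented where the ratio is undefined.
    """
    if goals is None or appearances is None or appearances == 0:
        return None
    return round(goals / appearances, 2)


# Hypothetical records; the keys are illustrative, not the dataset's columns.
players = [
    {"name": "Player A", "goals": 30, "appearances": 120, "scraped_gpm": 30.0},
    {"name": "Player B", "goals": 0, "appearances": 0, "scraped_gpm": None},
]

for player in players:
    # Replace the unreliable scraped value with the recomputed ratio.
    player["goals_per_match"] = goals_per_match(
        player["goals"], player["appearances"]
    )
```

In this reading, a scraped value far above the recomputed ratio (as for the first record) would flag a row affected by the loading issue.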