Winning in (Fantasy) Football
With the recent conclusion of the FIFA Women’s World Cup and the Copa America, and with the Africa Cup of Nations still underway, there will be lots of disappointed football fans who are puzzled as to why their teams crashed out. Their ire is often directed at the team managers, whom they blame for what they see as poor team selection and tactics.
As sports franchises increasingly turn to data and analytics to drive performance, a data-led approach to team selection could well help the misfiring football manager. Here is my take on a data-science-led approach to football team selection and performance in that most competitive of leagues, the Fantasy Premier League.
A great passion of mine is using my ever-growing data science toolset on the decisions to be made around me. I am an avid football fan and have been for most of my life. I was introduced to the Fantasy Premier League almost 10 years ago and I have been playing ever since.
Here is how it works: the Fantasy Premier League, like most fantasy games, allows you to select a side composed of different players in the English Premier League (EPL). Each football player has a price attached to him and the user has a maximum budget of £100m. With that budget, the user selects 15 players and is rewarded with points on a weekly basis depending on how well those players performed in their scheduled matches.
An example of a squad is given below. Even though it is a 15-man squad, only 11 players are selected in any given game week, and each of those selected is awarded points. For example, Callum Wilson (our striker) is awarded 2 points if he plays 60 minutes or more, 4 points for every goal he scores and 3 points for every assist credited to him. Further bonus points (3, 2 or 1) are awarded to the best players in a particular match. Callum Wilson has been selected as our captain and will thus receive double the points he is awarded.
This is a simple overview of the game mechanics but a more detailed breakdown can be found at https://fantasy.premierleague.com/help/rules.
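As a rough illustration, the scoring rules described above for a forward could be written as a small Python function. This is a simplified sketch: the function name and parameters are hypothetical, it covers only the rules mentioned in this article, and it ignores everything else in the official rules (cards, missed penalties and so on).

```python
def forward_points(minutes, goals, assists, bonus=0, captain=False):
    """Points for a forward under the simplified rules in this article."""
    points = 0
    if minutes >= 60:       # assumption: exactly 60 minutes also counts
        points += 2         # appearance points
    points += 4 * goals     # 4 points per goal for a forward
    points += 3 * assists   # 3 points per assist
    points += bonus         # 0 to 3 bonus points for the match
    # The captain's score for the week is doubled.
    return points * 2 if captain else points


# A captain who plays 90 minutes, scores once, assists once and
# earns the maximum bonus: (2 + 4 + 3 + 3) * 2 points.
print(forward_points(90, goals=1, assists=1, bonus=3, captain=True))
```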
I have spent many hours with friends and colleagues discussing the potential of introducing machine learning in this particular environment, but we often hit a brick wall over the best approach.
I recently found player and weekly data for the 2016/17, 2017/18 and 2018/19 EPL seasons on a Kaggle page. I began playing around with the data and then I remembered reading an article a while back about “dead” squads that tend to perform well.
A “dead” squad is generally a squad that the user has selected and then forgotten about, making no transfers and no weekly substitutions. The season is 38 weeks long, so leaving a squad dead is certainly not recommended. The game allows you to replace some of your players on a weekly basis. This is called a transfer and can come with a points deduction: generally, a user has 1 free transfer a week and is then deducted 4 points for every additional transfer. Transfers help to improve your squad by guarding against injured players, players who have lost form and players who face a particularly challenging run of fixtures.
This article got me thinking: what if I used the first 8 weeks of the season to train a model that would then spit out an optimal squad for the remaining 30 weeks?
Getting the data into an acceptable format took some time. My input into the model was how a player had performed in the first 8 weeks and, if the data was available, how many points they had collected in the previous 2 seasons.
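The transfer cost described above is simple arithmetic, sketched here in Python. The function name and defaults are hypothetical and only encode the general rule stated in this article (1 free transfer a week, 4 points for each one after that).

```python
def transfer_penalty(transfers_made, free_transfers=1, cost_per_extra=4):
    """Points deducted in a week for transfers beyond the free allowance."""
    extra = max(0, transfers_made - free_transfers)
    return extra * cost_per_extra


# Three transfers in one week: one is free, two cost 4 points each.
print(transfer_penalty(3))
```

A "dead" squad never pays this penalty, but it also never gets the benefit of reacting to injuries or fixtures.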
I found several performance metrics such as the number of goals scored, the number of clean sheets, the threat of a particular player, how many chances they had missed, how many minutes they had played and so on.
Armed with 3 seasons of data, my plan was to use the first 2 seasons to build and train a model and then test its performance on the most recent season. Finally, I would overlay a linear optimizer to select the squad subject to certain constraints. The variable I was looking to predict was the number of points each player would accrue between weeks 9 and 38. This was a simple sum over those weeks, which I added to my training set.
The different metrics offered different insights depending on the player’s position. A clean sheet (not conceding a goal) is far more valuable to a defender than it is to a striker; a goal, of course, is the other way round. This led me to build a separate model for each position, namely Goalkeepers, Defenders, Midfielders and Strikers.
After training and cross-validating my models, I applied them to the first 8 weeks of the most recent campaign. This gave a 30-week score prediction for each of the available players.
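To make the target concrete, here is a minimal sketch of summing weekly points over a range of game weeks. The row layout (dictionaries with "player", "week" and "points" keys) is an assumption made for illustration and is not the actual layout of the Kaggle data.

```python
from collections import defaultdict


def sum_points(weekly_rows, first_week, last_week):
    """Total points per player over the inclusive range of game weeks."""
    totals = defaultdict(int)
    for row in weekly_rows:
        if first_week <= row["week"] <= last_week:
            totals[row["player"]] += row["points"]
    return dict(totals)


# Hypothetical weekly rows for two players.
rows = [
    {"player": "A", "week": 8, "points": 6},
    {"player": "A", "week": 9, "points": 2},
    {"player": "A", "week": 38, "points": 5},
    {"player": "B", "week": 10, "points": 3},
]

features = sum_points(rows, 1, 8)   # one input feature: points in weeks 1 to 8
targets = sum_points(rows, 9, 38)   # value to predict: points in weeks 9 to 38
print(features, targets)
```

The same helper serves both sides of the split: weeks 1 to 8 feed the inputs and weeks 9 to 38 give the value to predict.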
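The idea of one model per position can be sketched as follows. The model type is a stand-in chosen for brevity, not the one used for the results in this article: each position gets a one-feature least-squares line, and the field names ("position", "points_first_8", "points_9_to_38") are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature).

    Assumes the xs are not all identical, otherwise the slope is undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_position(training_rows):
    """Fit a separate line for each position found in the training rows."""
    grouped = defaultdict(lambda: ([], []))
    for row in training_rows:
        xs, ys = grouped[row["position"]]
        xs.append(row["points_first_8"])
        ys.append(row["points_9_to_38"])
    return {pos: fit_line(xs, ys) for pos, (xs, ys) in grouped.items()}


def predict(models, position, points_first_8):
    """Predicted points for weeks 9 to 38 from the position's own model."""
    intercept, slope = models[position]
    return intercept + slope * points_first_8
```

Grouping before fitting is what lets a clean sheet or a goal carry a different weight for a defender than for a striker once more features are added.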
I then used the “lpSolve” linear programming package to maximize the total predicted score subject to the list of constraints. Instead of selecting 15 players for 100m, I selected 11 for 82m. This is because I allocated the cheapest possible players to the four substitute slots, as there was no room or time to test rotating players on a weekly basis.
So I selected the 11 players that the model believed would maximize points over the last 30 weeks without any transfers. The 11 players selected by the model can be found below.
Mohamed Salah was the player predicted to score the most points, so I selected him as the team captain. I then calculated the points this team actually accrued over the remaining 30 weeks, took the points per week and scaled that up to 38 weeks to get a figure comparable with a full season. This brought the aggregated points to a hair under 2000 and, even though I cannot map that to an exact rank, the team would have finished in the top 25% of teams playing in the 18/19 campaign.
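The selection step amounts to choosing a fixed number of players that maximizes total predicted points while staying within a budget and within per-position limits. The sketch below is not the lpSolve model; it swaps in brute-force enumeration over a tiny, made-up pool so that it runs with the Python standard library alone. The player tuples, the limits and the budget are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def best_team(players, team_size, budget, position_limits):
    """Exhaustively find the team with the highest total predicted points.

    players: list of (name, position, price, predicted_points) tuples.
    position_limits: {position: (min_count, max_count)}.
    """
    best, best_score = None, float("-inf")
    for team in combinations(players, team_size):
        if sum(p[2] for p in team) > budget:
            continue  # over budget
        counts = Counter(p[1] for p in team)
        if any(not lo <= counts.get(pos, 0) <= hi
               for pos, (lo, hi) in position_limits.items()):
            continue  # breaks a position constraint
        score = sum(p[3] for p in team)
        if score > best_score:
            best, best_score = team, score
    return best, best_score


# Hypothetical pool: (name, position, price in millions, predicted points).
pool = [
    ("GK1", "GK", 5.0, 100), ("GK2", "GK", 4.5, 90),
    ("DEF1", "DEF", 6.0, 125), ("DEF2", "DEF", 4.5, 80),
    ("MID1", "MID", 12.0, 200), ("MID2", "MID", 7.0, 130),
    ("FWD1", "FWD", 11.0, 170),
]
limits = {"GK": (1, 1), "DEF": (1, 2), "MID": (1, 2), "FWD": (0, 1)}
team, score = best_team(pool, team_size=4, budget=28.0, position_limits=limits)
print([p[0] for p in team], score)
```

Enumeration is only workable for a toy pool like this one. With several hundred real players and 11 slots the number of combinations is far too large, which is why an integer programming solver is the practical choice for the same objective and constraints.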
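The scaling from 30 weeks to a full season is a straight pro-rata calculation, sketched below. The function name is hypothetical and the 1500-point input is a made-up figure, not the team's actual total.

```python
def scale_to_full_season(points, weeks_played=30, season_weeks=38):
    """Pro-rate a points total from weeks_played to a full season."""
    return points / weeks_played * season_weeks


# A hypothetical 1500 points over 30 weeks is 50 per week, or 1900 over 38.
print(scale_to_full_season(1500))
```

Run in reverse, a scaled total just under 2000 corresponds to just under about 1,579 points over the 30 weeks actually played (2000 × 30 / 38).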
This is an impressive return for a “dead” side and would outperform many active managers. The insights are also valuable as it is often said that full-backs outperform center-backs in terms of value and the model concurred. There also seems to be far greater value in selecting high-price midfielders compared to high-price forwards.
The Way Forward
The above analysis took me roughly a weekend to complete, but I think it is a good starting point. The natural way forward would involve reinforcement learning to make decisions on a week-by-week basis.
That is an interesting option, but before then I would like to try some feature engineering and explore optimizing a squad for weeks 9 – 23 and then building a completely new squad for weeks 24 – 38. This can be done using a rule of Fantasy Premier League called “wild-carding.”
So there you have it, one of many data-science-led approaches to football team selection and predicted performance. And with the new season almost upon us, anxious football managers or fans are more than welcome to get in touch!