Predicting the Winning Football Team

Can we design a model that accurately predicts whether the home team will win a football match?

Steps

  • Clean the dataset
  • Split it into training and testing data (12 features and 1 target: the winning team, Home/Away/Draw)
  • Train 3 different classifiers on the data: Logistic Regression, Support Vector Machine, and XGBoost
  • Use the best classifier to predict who will win given an away team and a home team

History

Sports betting is a 500 billion dollar market (Sydney Herald)

Kaggle hosts a yearly March Madness prediction competition, March Machine Learning Mania

https://www.kaggle.com/c/march-machine-learning-mania-2017/kernels

Several papers address this problem:

https://arxiv.org/pdf/1511.05837.pdf

"It is possible to predict the winner of English county twenty twenty cricket games in almost two thirds of instances."

https://arxiv.org/pdf/1411.1243.pdf

"Something that becomes clear from the results is that Twitter contains enough information to be useful for predicting outcomes in the Premier League"

https://qz.com/233830/world-cup-germany-argentina-predictions-microsoft/

For the 2014 World Cup, Bing correctly predicted the outcomes for all of the 15 games in the knockout round.

So the right questions to ask are:

  • What model should we use?
  • What are the features (the aspects of a game) that matter most for predicting a team win?
  • Does being the home team give a team an advantage?

Dataset

  • Football is played by 250 million players in over 200 countries (most popular sport globally)
  • The English Premier League is the most popular domestic league in the world
  • Retrieved the dataset from http://football-data.co.uk/data.php

  • Football is a team sport, and a cheering home crowd helps morale
  • Familiarity with the pitch and weather conditions helps
  • No need to travel (less fatigue)

Acronyms: https://rstudio-pubs-static.s3.amazonaws.com/179121_70eb412bbe6c4a55837f2439e5ae6d4e.html

Other repositories

Import Dependencies

Data Exploration
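
The exploration step can be sketched roughly as follows. This is a minimal illustration, not the notebook's original cell: the rows are made-up toy values, and the column names (`HomeTeam`, `AwayTeam`, `FTHG`, `FTAG`, `FTR`) are assumed to follow the football-data.co.uk acronym key. It counts matches and features, then measures how often the home team wins.

```python
import pandas as pd

# Toy rows in the assumed football-data.co.uk layout (values are invented).
matches = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Chelsea", "Everton", "Liverpool", "Arsenal", "Everton"],
    "AwayTeam": ["Chelsea", "Everton", "Liverpool", "Arsenal", "Everton", "Chelsea"],
    "FTHG": [2, 1, 0, 3, 1, 2],              # full-time home goals
    "FTAG": [0, 1, 2, 1, 1, 0],              # full-time away goals
    "FTR": ["H", "D", "A", "H", "D", "H"],   # full-time result: Home/Draw/Away
})

n_matches = len(matches)
n_features = matches.shape[1] - 1            # every column except the target FTR

# Share of each outcome; the "H" share is the home win rate.
outcome_share = matches["FTR"].value_counts(normalize=True)
home_win_rate = outcome_share.get("H", 0.0)

print(f"Matches: {n_matches}, features: {n_features}")
print(f"Home win rate: {home_win_rate:.2%}")
```

A home win rate well above one third on the real data would suggest that home advantage is worth keeping as a signal.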

Preparing the Data
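
One plausible way to prepare the data is sketched below. The feature names (`home_form_points`, `away_form_points`, `goal_diff_gap`) and all values are hypothetical, and reducing the target to "home win or not" is an assumption that matches the opening question rather than a confirmed detail of the original cell. The sketch separates features from the target, makes a stratified split, and standardizes using statistics from the training set only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical pre-match features; the real dataset has 12 features.
data = pd.DataFrame({
    "home_form_points": [9, 4, 7, 3, 12, 6, 10, 2, 8, 5, 11, 1],
    "away_form_points": [3, 8, 5, 9, 2, 7, 4, 10, 6, 9, 1, 12],
    "goal_diff_gap":    [5, -3, 2, -4, 8, 0, 4, -6, 1, -2, 7, -9],
    "FTR": ["H", "A", "H", "D", "H", "D", "H", "A", "H", "A", "H", "D"],
})

X = data.drop(columns="FTR")
y = (data["FTR"] == "H").astype(int)   # assumption: 1 = home win, 0 = draw or away win

# Stratify so both splits keep the same share of home wins.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Fit the scaler on training rows only to avoid leaking test statistics.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```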

Training and Evaluating Models
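
A rough sketch of comparing the three classifiers is shown below. It uses synthetic data from `make_classification` as a stand-in for the 12 match features, and a binary target is assumed. If the `xgboost` package is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier`, which is a different implementation of boosted trees and not XGBoost itself.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

try:
    from xgboost import XGBClassifier
    boosted = XGBClassifier(random_state=42)
except ImportError:
    # Fallback only so the sketch runs without xgboost installed.
    from sklearn.ensemble import GradientBoostingClassifier
    boosted = GradientBoostingClassifier(random_state=42)

# Synthetic stand-in for the match data: 12 features, binary target.
X, y = make_classification(n_samples=400, n_features=12, n_informative=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(kernel="rbf"),
    "Boosted trees": boosted,
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "f1": f1_score(y_test, pred),
        "accuracy": accuracy_score(y_test, pred),
    }
    print(f"{name}: F1={scores[name]['f1']:.3f}, accuracy={scores[name]['accuracy']:.3f}")
```

Scores on synthetic data say nothing about football; the point is only the shape of the comparison loop.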

XGBoost appears to be the best model, as it has the highest F1 score and accuracy on the test set.
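
For reference, the two metrics can be computed by hand. The sketch below assumes a binary labelling in which "H" (home win) is the positive class and "NH" means anything else; the helper name `accuracy_and_f1` and the example labels are made up for illustration.

```python
def accuracy_and_f1(y_true, y_pred, positive="H"):
    """Return (accuracy, F1) treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


y_true = ["H", "NH", "H", "H", "NH", "NH", "H", "NH"]
y_pred = ["H", "H", "H", "NH", "NH", "NH", "H", "H"]

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")  # accuracy=0.625, f1=0.667
```

Accuracy rewards every correct call equally, while F1 focuses on how well home wins specifically are identified, so the two can rank models differently.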

Tuning the parameters of XGBoost.
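
A tuning pass might look like the grid search sketched here. The grid values are arbitrary examples rather than the settings used in the original notebook, the data is synthetic, and the fallback to scikit-learn's `GradientBoostingClassifier` (a different boosted-tree implementation) exists only so the sketch runs when `xgboost` is not installed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

try:
    from xgboost import XGBClassifier
    estimator = XGBClassifier(random_state=42)
except ImportError:
    # Fallback only so the sketch runs without xgboost installed.
    from sklearn.ensemble import GradientBoostingClassifier
    estimator = GradientBoostingClassifier(random_state=42)

# Synthetic stand-in for the match data: 12 features, binary target.
X, y = make_classification(n_samples=200, n_features=12, n_informative=6, random_state=42)

# Small illustrative grid; these parameter names exist on both estimators.
param_grid = {
    "n_estimators": [20, 40],
    "max_depth": [2, 3],
    "learning_rate": [0.1, 0.3],
}

# 3-fold cross-validation, selecting the combination with the best F1 score.
search = GridSearchCV(estimator, param_grid, scoring="f1", cv=3)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated F1: {search.best_score_:.3f}")
```

After the search, `search.best_estimator_` holds the model refit on all of the data with the winning parameters.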

Possible Improvements?

  • Adding sentiment from Twitter and news articles
  • More features from other data sources (how much others bet, player-specific health stats)