Predicting the Winning Football Team


Can we design a predictive model capable of accurately predicting if the home team will win a football match?



  • We will clean our dataset
  • Split it into training and testing data (12 features and 1 target: the winning team, Home/Away/Draw)
  • Train 3 different classifiers on the data: Logistic Regression, Support Vector Machine, and XGBoost
  • Use the best classifier to predict who will win given an away team and a home team



Sports betting is a 500 billion dollar market (Sydney Herald)


Kaggle hosts a yearly competition on predicting March Madness (the NCAA college basketball tournament)

Several papers have been published on this topic:

"It is possible to predict the winner of English county twenty twenty cricket games in almost two thirds of instances."

"Something that becomes clear from the results is that Twitter contains enough information to be useful for predicting outcomes in the Premier League"

For the 2014 World Cup, Bing correctly predicted the outcomes for all of the 15 games in the knockout round.

So the right questions to ask are:

  • What model should we use?
  • Which features (aspects of a game) matter the most for predicting a team win?
  • Does being the home team give a team an advantage?


  • Football is played by 250 million players in over 200 countries (most popular sport globally)
  • The English Premier League is the most popular domestic league in the world
  • Retrieved dataset from


  • Football is a team sport, and a cheering crowd helps morale
  • Familiarity with the pitch and weather conditions helps
  • No need to travel (less fatigue)


Other repositories


Import Dependencies


Data Exploration
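
One useful exploratory check is the share of matches won by the home side, since it bears directly on the home-advantage question. The following is a minimal sketch in plain Python; the `results` list and its H/D/A encoding are assumptions made for illustration, not values from the dataset.

```python
from collections import Counter

# Hypothetical full-time results, one per match: "H" = home win,
# "D" = draw, "A" = away win. Illustrative values only, not real data.
results = ["H", "A", "H", "D", "H", "A", "H", "D", "H", "A"]


def outcome_rates(results):
    """Return the share of matches ending in each outcome."""
    counts = Counter(results)
    total = len(results)
    return {outcome: counts[outcome] / total for outcome in ("H", "D", "A")}


rates = outcome_rates(results)
for outcome, rate in rates.items():
    print(f"{outcome}: {rate:.0%}")
```

If the home-win share is well above one third, a model that always predicts a home win is already a meaningful baseline to beat.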


Preparing the Data
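
A minimal sketch of a typical preparation step, assuming scikit-learn and NumPy are installed. The data here is synthetic (random numbers standing in for the 12 features), the target is assumed to be binary (home win or not), and the 25% test size is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real table: 200 matches, 12 numeric features.
X = rng.normal(size=(200, 12))
# Assumed binary target: 1 = home win, 0 = not a home win (synthetic labels).
y = rng.integers(0, 2, size=200)

# Stratify so both splits keep roughly the same share of home wins.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits,
# so no information from the test set leaks into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Scaling matters most for the Logistic Regression and Support Vector Machine models; tree-based models such as XGBoost are largely insensitive to feature scale.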


Training and Evaluating Models


XGBoost appears to be the best model, as it has the highest F1 score and accuracy on the test set.
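
A comparison of this kind can be sketched as follows, assuming scikit-learn is installed. The data is synthetic (generated by `make_classification`), so the printed scores say nothing about the football results. Only two of the three classifiers are included to keep the example free of extra dependencies.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data standing in for the match table (12 features, binary target).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    scores[name] = {
        "f1": f1_score(y_test, predictions),
        "accuracy": accuracy_score(y_test, predictions),
    }
    print(f"{name}: F1={scores[name]['f1']:.3f}, "
          f"accuracy={scores[name]['accuracy']:.3f}")

best = max(scores, key=lambda name: scores[name]["f1"])
print("Highest test F1:", best)
```

XGBoost's `XGBClassifier` exposes the same `fit`/`predict` interface, so it can be added to the `models` dictionary when the `xgboost` package is available.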

Tuning the parameters of XGBoost.
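
A minimal sketch of parameter tuning with a cross-validated grid search, assuming scikit-learn is installed. To avoid an extra dependency, scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost here; the grid values and the synthetic data are arbitrary illustrations, not the settings used for the results above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the match table (12 features, binary target).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, random_state=0
)

# Candidate values to try; every combination is evaluated.
param_grid = {
    "n_estimators": [20, 40],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

# 3-fold cross-validation, selecting the combination with the best mean F1.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated F1: {search.best_score_:.3f}")
```

`XGBClassifier` also accepts `n_estimators`, `max_depth`, and `learning_rate`, so the same grid can be passed to it unchanged.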



Possible Improvements?

  • Adding sentiment from Twitter and news articles
  • More features from other data sources (how much others bet, player-specific health stats)