Data and Analysis: Detecting Match-Fixing Patterns In Tennis

The Python code below runs an anonymized implementation of the methodology, described here, that was used in "The Tennis Racket". That methodology contains many important details; please read it before continuing.

Importing The Data

Match Selection

The code below excludes opening odds that implied probabilities more than 10 percentage points higher or lower than the median of all bookmakers’ opening odds for the match. (Otherwise the return of these odds toward the consensus could be mistaken for a sign of suspicious betting.) The code also excludes matches that were noted as "canceled" — typically a result of pre-match withdrawals — or "walkover" on OddsPortal.
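The filtering rules above can be sketched roughly as follows. This is a minimal illustration, not the notebook's original code: the function names, the dictionary layout, and the choice to convert decimal odds to probabilities by removing the bookmaker's margin are all assumptions made for the example.

```python
from statistics import median


def implied_probability(odds_a, odds_b):
    """Player A's implied chance of winning from decimal odds.

    Assumption: the bookmaker margin is removed by normalizing the two
    raw probabilities so they sum to 1.
    """
    raw_a, raw_b = 1 / odds_a, 1 / odds_b
    return raw_a / (raw_a + raw_b)


def filter_opening_odds(opening_probs, max_gap=0.10):
    """Drop bookmakers whose opening probability sits more than
    max_gap (10 percentage points) above or below the median."""
    consensus = median(opening_probs.values())
    return {
        book: prob
        for book, prob in opening_probs.items()
        if abs(prob - consensus) <= max_gap
    }


def keep_match(status):
    """Exclude matches noted as canceled or walkover (hypothetical labels)."""
    return status not in {"canceled", "walkover"}
```

For example, with opening probabilities of 0.50, 0.52 and 0.70 from three hypothetical books, the median is 0.52 and only the 0.70 quote is dropped.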

Odds-Movement Calculation

The code below finds the odds movement for a bookmaker in a given match by calculating the difference between each player’s chance of winning implied by the opening odds and by the final odds.
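As a rough sketch of that calculation, the example below takes one bookmaker's opening and final decimal odds for both players and returns each player's change in implied probability. The function names and the margin-removal step are assumptions for illustration; the original notebook may convert odds differently.

```python
def implied_probs(odds_a, odds_b):
    """Implied win probabilities for both players from decimal odds.

    Assumption: probabilities are normalized to sum to 1, which removes
    the bookmaker's margin.
    """
    raw_a, raw_b = 1 / odds_a, 1 / odds_b
    total = raw_a + raw_b
    return raw_a / total, raw_b / total


def odds_movement(opening_odds, final_odds):
    """Change in each player's implied probability (final minus opening).

    A negative value means the market moved against that player.
    """
    open_a, open_b = implied_probs(*opening_odds)
    final_a, final_b = implied_probs(*final_odds)
    return final_a - open_a, final_b - open_b
```

With normalized probabilities the two movements are mirror images of each other, so a 25-point move against one player is a 25-point move toward the other.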

Player Selection

The code below selects only matches where, in at least one book, the odds moved more than 10 percentage points. The 10-percentage-point cutoff is based on discussions with sports-betting investigators, who said that movement above this threshold was what prompted them to give greater scrutiny to a match.

Players who lost more than 10 such “high-movement” matches are selected for analysis.
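The two selection rules above could be expressed along these lines. The record layout (`loser`, `movements`) is hypothetical, and the sketch treats movement in either direction as qualifying, which is an assumption; the methodology document is the authority on how direction is handled.

```python
from collections import Counter


def is_high_movement(movements_by_book, threshold=0.10):
    """True if at least one book moved more than 10 percentage points.

    Assumption: the absolute size of the move is what counts.
    """
    return any(abs(move) > threshold for move in movements_by_book.values())


def select_players(matches, min_losses=10):
    """Players who lost more than min_losses high-movement matches.

    Each match is a dict with hypothetical keys:
      "loser"     -> anonymized player identifier
      "movements" -> {bookmaker: odds movement for that match}
    """
    losses = Counter(
        match["loser"] for match in matches if is_high_movement(match["movements"])
    )
    return {player for player, count in losses.items() if count > min_losses}
```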

Simulation

The code below runs a series of simulations to estimate the unlikelihood of each player’s outcomes. Each simulation uses the player’s implied chance of winning — based on each match’s opening odds — to generate a set of outcomes for each string of matches. BuzzFeed News ran the simulation 1 million times per player. The result: The estimated chance that the player would have lost as many (or more) high-movement matches as the player did, if the chances implied by the opening odds were correct.
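A minimal Monte Carlo sketch of this idea is shown below. It assumes the input is one opening-odds win probability per high-movement match for a single player; the function name, the default of 100,000 runs (rather than the 1 million described above), and the fixed seed are choices made only to keep the example quick and repeatable.

```python
import random


def estimate_loss_likelihood(win_probs, observed_losses, n_sims=100_000, seed=0):
    """Estimate the chance of losing at least observed_losses matches.

    win_probs holds the player's implied chance of winning each match,
    taken from opening odds. Every simulation draws an outcome for each
    match and counts the losses; the estimate is the share of simulations
    with as many losses as observed, or more.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # The player loses a simulated match when the draw is not below
        # the win probability.
        losses = sum(rng.random() >= p for p in win_probs)
        if losses >= observed_losses:
            hits += 1
    return hits / n_sims
```

As a sanity check, a player given a 50% chance in each of four matches should lose all four in roughly 6% of simulations (0.5 to the fourth power is 0.0625).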

Classify Likelihood

Note on reading the likelihood_level_open column:

  • Players who have Bonferroni likelihood below 5%: ****
  • Players who have an overall likelihood below 1%: **
  • Players who have an overall likelihood below 5%: *

In some simulations an additional player received an estimated likelihood just barely under 0.05. To be conservative we are not including that player among our totals.
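One way to read the tiers in the note is sketched below. The marker strings are copied from the note; the single-star tier is assumed to use a 5% cutoff, and the Bonferroni correction is assumed to multiply each player's likelihood by the number of players analyzed. The function name and both parameters are hypothetical.

```python
# Marker strings as they appear in the note above.
BONFERRONI_MARK = "****"
BELOW_ONE_PERCENT_MARK = "**"
BELOW_FIVE_PERCENT_MARK = "*"


def classify_likelihood(likelihood, n_players):
    """Assign a marker to a player's estimated likelihood.

    Assumption: the Bonferroni-corrected likelihood is the raw likelihood
    multiplied by the number of players tested, capped at 1.
    """
    bonferroni = min(1.0, likelihood * n_players)
    if bonferroni < 0.05:
        return BONFERRONI_MARK
    if likelihood < 0.01:
        return BELOW_ONE_PERCENT_MARK
    if likelihood < 0.05:  # assumed cutoff for the lowest tier
        return BELOW_FIVE_PERCENT_MARK
    return ""
```

The checks run from strictest to loosest, so each player receives only the strongest marker that applies.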

How Many Questionable Matches Have Players On Investigators' List Lost?

The strings below represent the anonymized names of the 28 players flagged in a 2008 report by investigators for the Association of Tennis Professionals. Each anonymized name is the SHA256 hash of the name plus a randomly generated salt.
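For readers unfamiliar with salted hashing, the anonymization step might look something like the sketch below. It assumes the salt is appended to the name before hashing and that one secret salt is shared across all names so the same player always maps to the same string; neither detail is stated in the text, and the function name is hypothetical.

```python
import hashlib
import secrets


def anonymize(name, salt):
    """Return the SHA256 hex digest of the name followed by the salt.

    Assumption: the salt is appended to the name as plain text.
    """
    return hashlib.sha256((name + salt).encode("utf-8")).hexdigest()


# A randomly generated salt, kept secret so the hashes cannot be
# reversed by hashing a list of known player names.
salt = secrets.token_hex(16)
anonymized = anonymize("Example Player", salt)
```

Each result is a 64-character hexadecimal string. Reusing the same salt keeps the mapping consistent, which allows a list of flagged players to be matched against the rest of the data without revealing names.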