Wayde van Niekerk stunned the world when he won the gold medal in the 400 metres at the 2016 Summer Olympics in a world record time of 43.03 seconds. Not only did he take gold, he also obliterated Michael Johnson’s 17-year-old world record from 1999. Van Niekerk became the first man to win the Olympic 400 metres from lane eight since Scotland’s Eric Liddell at the 1924 Paris Olympics.
South Africa is still basking in the afterglow of this phenomenal achievement (and will be for a while). It is a great occasion to have some fun with advanced analytics, using R and Microsoft Power BI to show how exceptional his record time is against the backdrop of historic results.
R is an open-source programming language for statistical computing, predictive analytics and graphics. It is the world’s most popular programming language in this space, used by more than 2 million people worldwide. Microsoft acquired Revolution Analytics, a commercial provider of software and services for the R programming language, which led to an elegant integration of R into the SQL Server 2016 platform.
The data set in this exercise consists of all the gold medalists in the men’s 400 metres at the Olympics, together with their winning times.
After loading this data set into Power BI, it is easy to create an overview like the one below.
The real fun kicks in when we use some R scripts to visualise the data in a more statistically appealing manner in Power BI. R provides so many time series modelling functions that you’ve got all your bases covered! In this example I used Holt-Winters, but you can easily extend the model, e.g. to include ARIMA and seasonal smoothing, or to incorporate automated model selection to get the best fit.
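To give a feel for what a Holt-Winters forecast does under the hood, here is a minimal sketch in Python of the non-seasonal variant (Holt’s linear trend smoothing). It is not the R script behind the dashboard: the function name `holt_forecast`, the fixed smoothing parameters `alpha` and `beta`, and the sample series are all assumptions for illustration. R’s `HoltWinters` would normally estimate the smoothing parameters from the data rather than fix them.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=10):
    """Forecast `horizon` steps ahead with Holt's linear trend smoothing.

    alpha and beta are fixed, hypothetical smoothing parameters; a real
    model would usually optimise them against the historic data.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")

    # Start the level at the second observation and the trend at the
    # first difference.
    level = series[1]
    trend = series[1] - series[0]

    for y in series[2:]:
        prev_level = level
        # New level: blend the observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # New trend: blend the latest change in level with the old trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend

    # Extrapolate the final level along the final trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Placeholder values for illustration only, NOT the real Olympic results.
illustrative_times = [45.0, 44.6, 44.3, 44.1, 43.9]
print(holt_forecast(illustrative_times, horizon=3))
```

Each forecast is simply the last smoothed level plus a multiple of the last smoothed trend, which is why the forecast line in such a chart is straight.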
Looking at the historic data, it is easy to create a combined histogram and boxplot like the one below, showing the distribution of the gold medal times across history. The one data point on the far right of the histogram clearly stands out as an outlier: Thomas Burke’s 54.2 seconds at the first modern Olympics in 1896.
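A boxplot typically flags such points with the 1.5 × IQR rule (Tukey’s fences). As a rough sketch of that rule in Python: the helper name `tukey_outliers` is hypothetical, and apart from the 54.2 seconds mentioned above the sample values are placeholders rather than real results. Quartile conventions also differ slightly between tools, so the fences may not match R’s boxplot exactly.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences (k times the IQR)."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Placeholder winning times for illustration; only 54.2 comes from the post.
sample_times = [43.5, 43.8, 44.0, 44.3, 44.6, 45.0, 46.2, 54.2]
print(tukey_outliers(sample_times))
```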
Let’s have a look into the future. The next dashboard shows an overview of the forecasted values for the next 10 Olympics (the next 40 years!).
What is great about the Power BI integration is that the slicers and other data objects in the model interact with the R visualisations as seamlessly as with normal Power BI objects. The overview above shows the forecast 400m times for the next 10 Olympics. The blue, light grey and dark grey areas indicate, respectively, the forecast values and the 80% and 95% confidence intervals.
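For readers wondering where such bands come from: under the common assumption of normally distributed forecast errors, an interval is the point forecast plus or minus a multiple of the forecast’s standard error. The sketch below, in Python, illustrates that idea only. The function name `forecast_interval` and the standard error of 0.5 seconds are made up for the example, and the dashboard’s actual intervals come from the R model.

```python
from statistics import NormalDist


def forecast_interval(point_forecast, std_error, coverage):
    """Symmetric interval around a forecast, assuming normal errors."""
    # Two-sided multiplier: roughly 1.28 for 80% and 1.96 for 95% coverage.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return point_forecast - z * std_error, point_forecast + z * std_error


# 43.43 is the forecast quoted in this post; 0.5 is a hypothetical standard error.
for coverage in (0.80, 0.95):
    low, high = forecast_interval(43.43, 0.5, coverage)
    print(f"{coverage:.0%} interval: {low:.2f} to {high:.2f} seconds")
```

The 95% band is always wider than the 80% band, because more certainty about capturing the true value requires a wider range.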
So, how exceptional was Wayde’s achievement? Based on this forecast model, the expected 400m time 40 years from now is 43.43 seconds, and that includes Wayde’s new record time. If we exclude the new world record, the forecast time is 43.98 seconds. In other words, even 40 years from now the model expects a winning time slower than the 43.03 seconds he has already run. Very exceptional indeed, and it highlights the excitement that often only sport can bring. The confidence intervals help a bit more in determining how far we can go when placing our bets. All eyes will be on Wayde next time, as he seems to be an individual who is able to beat the statistical odds!
If you want to see the dashboard in action, take a look at this video blog, in which Erwin navigates through it and explains how to drill down for more detail.
As the Country Partner of the Year for South Africa, with a track record of over 15 years in data management and analytics, Karabina can assist you with incorporating advanced analytics in your environment, so you can make better-informed business decisions. Contact us if you have any questions or pop a comment below!
Photo Credit: Xavier Laine, Getty Images Sport