I have decided to do another statistical breakdown of Tim Duncan's incredible defensive performance this year. While blocks and rebounding certainly provide a glimpse into his defensive prowess, they cannot give a comprehensive statistical picture of why Duncan has truly been a defensive machine on the floor this year. What we need is some sort of composite summary of Tim Duncan's defensive game to confirm what all of the isolated statistics suggest: that Tim Duncan is the Defensive Player of the Year. What I hope to demonstrate in this post is that his defensive play possesses more of the qualities evident in past Defensive Players of the Year than that of any other candidate. To do that, I'm going to perform a statistical simulation of the voting process based on how voters have voted in the past. I promise this will be fun.
Let me briefly explain how I plan to accomplish this.
Machine Learning to Predict Outcomes
Machine learning is a growing family of statistical methods that predict outcomes based on previous results. This brand of artificial intelligence is employed in many kinds of pattern recognition and outcome prediction, such as speech recognition (think Siri), spam detection (the Gmail kind, not Salty Processed Artificial Meat), search engines, bioinformatics, text processing, image mapping... ahem... sorry. I got a little carried away. It won't happen again.
Recently, machine learning has even made its way into the study of basketball. A review of the MIT Sloan Sports Analytics Conference research topics shows that data mining and machine learning techniques have moved to the forefront of basketball analytics. Scouts and GMs are picking up on the power of machine learning in recognizing successful plays, gifted draft prospects, and potential free agent pickups. As such, it seems an appropriate application for the statistical analysis of the DPOY award.
If you want a more detailed look into how machine learning works, feel free to get lost deep-diving through Wikipedia or Google Scholar. I am not going to even begin to offer a detailed explanation here because I have neither the time nor mastery to do so. What I'll try to do is describe in simple terms my algorithm to predict the DPOY based on past results.
Training the Algorithm
So we want to know who is going to win DPOY? Well, what better way to find out than to compare this year's candidates to the previous 30 winners? This is where machine learning comes into play. We are going to "train" the algorithm to recognize what was important in past DPOY selections, and then compare the current candidates to the past winners with this training in mind.
One of the difficulties that arises in developing the algorithm is that we have to decide which statistics we want to compare. Obviously we are more interested in defensive statistical measures, but do we want to look at per game, per 36, or advanced statistics? Are we more interested in the absolute statistical values, or the values relative to the rest of the league?
I'm a big believer in advanced statistics. Percentage-based stats (REB%, BLK%, etc.) give a more robust indication of a player's on-court impact than per game or per 36 numbers, since they account for team pace. Likewise, statistics such as DRtg (Defensive Rating) and DWS (Defensive Win Shares) provide a fairly comprehensive view of a player's defensive impact in ways that a simple count of defensive plays cannot.
For this analysis, I am going to compare each player's relative performance, meaning his league rank, in all of the advanced defensive statistics. I've summarized each of the candidates by their ranking this season among big men (centers and forwards) in each statistical category in the table below:
We should note a few things about our candidates. First, they are all in the top 20 in defensive rating and defensive win shares among big men. None of them are especially good offensive rebounders, and only Duncan and Sanders are in the top 20 in total rebounding and blocks. Lastly, even among big men, none of these players is particularly good at generating steals.
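Every comparison in this post is made on league ranks rather than raw numbers. As a rough illustration of that conversion step, here is a minimal sketch that turns raw stats into ranks among a pool of big men. The player names and numbers are hypothetical placeholders, and ties are not handled specially (tied players simply get consecutive ranks). The one stat that needs special treatment is DRtg, which counts points allowed per 100 possessions, so a lower value is better.

```python
def league_ranks(players: dict[str, dict[str, float]],
                 lower_is_better: set[str]) -> dict[str, dict[str, int]]:
    """Return {player: {stat: rank}}, where rank 1 is the best value."""
    stats = next(iter(players.values())).keys()
    ranks: dict[str, dict[str, int]] = {name: {} for name in players}
    for stat in stats:
        # Sort best-first: ascending for "lower is better" stats, else descending.
        ordered = sorted(players, key=lambda p: players[p][stat],
                         reverse=stat not in lower_is_better)
        for position, name in enumerate(ordered, start=1):
            ranks[name][stat] = position
    return ranks


# Hypothetical big men, NOT real season data.
big_men = {
    "Player A": {"DWS": 5.1, "DRtg": 96.0, "BLK%": 4.5},
    "Player B": {"DWS": 4.2, "DRtg": 99.0, "BLK%": 6.0},
    "Player C": {"DWS": 3.0, "DRtg": 101.0, "BLK%": 2.1},
}
print(league_ranks(big_men, lower_is_better={"DRtg"}))
# {'Player A': {'DWS': 1, 'DRtg': 1, 'BLK%': 2}, 'Player B': {'DWS': 2, 'DRtg': 2, 'BLK%': 1}, 'Player C': {'DWS': 3, 'DRtg': 3, 'BLK%': 3}}
```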
Essentially what we see is that the argument for each of these players for DPOY has merit. Each is elite in at least one defensive category. In order to skim the cream from the top, we will have to weight our comparisons to focus on statistics which are more important in determining the DPOY. To do that, we need to look at the rankings of the past winners. For the sake of comparison, only big men will be considered.
|Metta World Peace||2003-2004||15||7||96||93||94||1||72|
We see that big men have won for a variety of reasons, but on average, the winners had exceptional DWS and DRtg rankings. From the averages, we can list the statistics in order of importance and assign each one a weight that reflects that importance. Each weight was computed by dividing 1 by past winners' average ranking in that statistic, so statistics in which past winners consistently ranked near the top were given more weight. Listed below are the statistics in order of importance, along with their weights.
1. DWS, 0.2759
2. DRtg, 0.1644
3. DRB%, 0.0777
4. TRB%, 0.0755
5. BLK%, 0.0659
6. ORB%, 0.0339
7. STL%, 0.0214
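The weighting rule can be sketched in a few lines. This is a sketch under one assumption: that each weight is exactly 1 divided by the mean rank of past winners in that statistic, with no further normalization (the listed weights do not sum to 1, which fits that reading). The toy ranks are hypothetical. Inverting the rule on the weights listed above shows the average ranks they would imply; for example, DWS at 0.2759 corresponds to an average rank of about 3.6.

```python
from statistics import mean

# The weights as listed in the post.
post_weights = {"DWS": 0.2759, "DRtg": 0.1644, "DRB%": 0.0777, "TRB%": 0.0755,
                "BLK%": 0.0659, "ORB%": 0.0339, "STL%": 0.0214}


def rank_weights(past_winner_ranks: dict[str, list[int]]) -> dict[str, float]:
    """Assumed rule: weight = 1 / (mean league rank of past winners in that stat)."""
    return {stat: 1 / mean(r) for stat, r in past_winner_ranks.items()}


# Hypothetical ranks for three imaginary winners, just to show the rule.
toy_weights = rank_weights({"DWS": [1, 3, 5], "STL%": [20, 40, 60]})
print(toy_weights)  # {'DWS': 0.3333333333333333, 'STL%': 0.025}

# Inverting the assumed rule gives the average ranks the post's weights imply.
implied_avg_rank = {stat: round(1 / w, 1) for stat, w in post_weights.items()}
print(implied_avg_rank)
# {'DWS': 3.6, 'DRtg': 6.1, 'DRB%': 12.9, 'TRB%': 13.2, 'BLK%': 15.2, 'ORB%': 29.5, 'STL%': 46.7}
```

Using a reciprocal means a stat where winners averaged a top-5 rank counts many times more than one where they averaged around 40th, which matches the ordering of the list above.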
The Voting Simulation
Hopefully you haven't fallen asleep yet, because we're now ready to conduct our own virtual DPOY vote. By weighting each of the statistical rankings by level of importance, we have "trained" our algorithm on how to recognize a DPOY based on past winners. Now all that remains is to perform the simulation.
To predict the Defensive Player of the Year, I will use a nearest neighbor model. In simple terms, we calculate the weighted statistical distance from each past winner to each of the candidates. Each candidate is then ranked by his similarity to that past winner. To simulate the voting process, we treat these rankings as ballots: the candidate who most resembles a given past winner receives a 1st-place vote, the next most similar a 2nd-place vote, and so on down the line. We then scale these results to 124 voters, just like in the actual media voting, where a 1st-place vote is worth 5 points, a 2nd-place vote 3, and a 3rd-place vote 1. Fourth-place votes earn no points. With these things in mind, DRUM ROLL PLEASE....
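The voting procedure described above can be sketched roughly as follows. The post does not say which distance was used, so this sketch assumes a weighted Euclidean distance over the rank vectors; the helper names (`ranks`, `weighted_distance`, `ballot`, `tally`) are hypothetical, and the rank vectors are made-up toy data, not the real candidates or past winners. The step that scales ballots up to 124 voters is not reproduced, since the post does not describe it.

```python
import math

# Weights listed in the post, in the order given there.
WEIGHTS = {"DWS": 0.2759, "DRtg": 0.1644, "DRB%": 0.0777, "TRB%": 0.0755,
           "BLK%": 0.0659, "ORB%": 0.0339, "STL%": 0.0214}
POINTS = (5, 3, 1, 0)  # points for 1st, 2nd, 3rd, 4th place


def ranks(*values: int) -> dict[str, int]:
    """Build a rank vector from values given in WEIGHTS order."""
    return dict(zip(WEIGHTS, values))


def weighted_distance(a: dict[str, int], b: dict[str, int]) -> float:
    """Weighted Euclidean distance between two rank vectors (assumed metric)."""
    return math.sqrt(sum(w * (a[s] - b[s]) ** 2 for s, w in WEIGHTS.items()))


def ballot(winner: dict[str, int],
           candidates: dict[str, dict[str, int]]) -> list[str]:
    """One past winner 'votes': candidates ordered from most to least similar."""
    return sorted(candidates,
                  key=lambda name: weighted_distance(winner, candidates[name]))


def tally(winners: list[dict[str, int]],
          candidates: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum 5-3-1-0 points over every past winner's ballot (up to four candidates)."""
    points = {name: 0 for name in candidates}
    for winner in winners:
        for place, name in enumerate(ballot(winner, candidates)):
            points[name] += POINTS[place]
    return points


# Hypothetical rank vectors, NOT the real data behind the post.
candidates = {
    "Candidate X": ranks(1, 1, 10, 10, 10, 30, 50),
    "Candidate Y": ranks(20, 20, 2, 2, 2, 30, 50),
}
past_winners = [
    ranks(2, 3, 12, 12, 15, 30, 40),
    ranks(18, 25, 1, 1, 3, 30, 50),
    ranks(3, 5, 9, 9, 8, 25, 45),
]
print(tally(past_winners, candidates))  # {'Candidate X': 13, 'Candidate Y': 11}
```

Because the weights multiply squared rank gaps, a few places of difference in DWS or DRtg moves a candidate further from a past winner than a much larger gap in STL%.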
|Player||1st Place||2nd Place||3rd Place||4th Place||Total Points|
|1. Tim Duncan||67||42||16||0||477|
|2. Joakim Noah||42||78||5||0||449|
|3. Larry Sanders||10||5||93||15||158|
|4. Marc Gasol||5||0||10||109||35|
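The point totals in the table follow directly from the 5-3-1 scoring. As a quick check, this sketch recomputes each total from the vote counts copied from the table above.

```python
# Point values for 1st, 2nd, 3rd and 4th place votes, as described in the post.
POINT_VALUES = (5, 3, 1, 0)

# Vote counts copied from the results table: (1st, 2nd, 3rd, 4th).
votes = {
    "Tim Duncan": (67, 42, 16, 0),
    "Joakim Noah": (42, 78, 5, 0),
    "Larry Sanders": (10, 5, 93, 15),
    "Marc Gasol": (5, 0, 10, 109),
}


def total_points(counts: tuple[int, ...]) -> int:
    """Weighted sum of place counts under the 5-3-1 scoring system."""
    return sum(count * value for count, value in zip(counts, POINT_VALUES))


for player, counts in votes.items():
    print(f"{player}: {total_points(counts)}")
# Tim Duncan: 477
# Joakim Noah: 449
# Larry Sanders: 158
# Marc Gasol: 35
```

Duncan's 28-point margin over Noah comes from his 25 extra 1st-place votes (worth 125 points), which more than offset Noah's 36 extra 2nd-place votes (worth 108 points).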
It was a close race, but Tim Duncan edged out Joakim Noah. Personally, I think this is spot on. Both of these players are having phenomenal defensive years, and they should be the two players with the best shot at winning Defensive Player of the Year. The 3rd- and 4th-place results are more surprising. I would have expected Gasol to earn a significant number of 3rd-place votes due to his impressive DWS. Sanders is a terrific shot blocker, like many past DPOYs, and that is probably why he received the majority of the 3rd-place votes. Gasol has shown that he has a positive defensive impact, but since he is not an exceptional shot blocker or rebounder, his relatively poor DRtg (compared to Duncan and Noah) on a team with the second-best DRtg really hurts him in this race.
This was an interesting exercise. I need to be very clear that this was more of a thought experiment than a robust statistical analysis. The results of machine learning algorithms are highly influenced by the selection of statistics and how they are weighted. While it certainly wouldn't hold up as a peer-reviewed journal submission, it shows us several key things. First of all, big men have earned DPOY through a variety of avenues. Some, like Rodman and Wallace, were incredible rebounders. Others, like David Robinson and Alonzo Mourning, blocked shots all day. What all of them had in common, however, was a top 10 ranking in DWS and a top 25 ranking in DRtg. And this makes sense. Through whatever means necessary, the best defensive players find a way to disrupt opposing offenses while serving as the defensive backbone of their team. Duncan, Gasol, and Noah are elite in terms of DWS, all ranking in the top 6 in the league on top 6 team defenses. In my (biased) eyes, however, Duncan gets the nod for his league-leading defensive rating, his ability to control opposing offenses through both suffocating shot blocking and consistent defensive rebounding, and his overall command of the floor through extraordinarily keen court awareness.