Question: How do I determine a player's real/predicted UZR, including age adjustments?
Why I asked the question: I needed a way for my Manager's Scorecard to determine a player's defensive ability, and I couldn't find a good method. I was told that an average player's UZR should decrease by ~1 run per year, but my work on CF UZR/150 showed different results.
I collected the UZR from Fangraphs for all players since 2002 who had over 100 innings at a position. I decided not to use raw UZR because a few bad plays in a small sample can give a player a fairly large positive or negative UZR. I needed to regress the data to a value that better reflects the player's actual ability. I used 125 games (as recommended by Mitchel Lichtman) as the point at which 50% of a player's UZR can be determined from his current stats and the rest comes from the league average. The value of 125 games is a little more conservative than the 100 games mentioned in this article at Fangraphs (http://www.fangraphs.com/blogs/index.php/fielding-update-arms-and-double-plays), but I will use it for my analysis.
I assumed that a player regresses to a value of 0, so for each player I got their yearly regressed UZR (rUZR) with the following equation:
rUZR = (1 - (125 / (number of games + 125))) * Yearly UZR
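As a quick illustration of the regression step, here is a minimal Python sketch. The function name `regressed_uzr` and the `regression_games` parameter are hypothetical names of my own, not part of any Fangraphs tool. It implements the equation above using the fact that 1 - 125/(G + 125) simplifies to G/(G + 125), so a 125-game season keeps exactly half of its UZR.

```python
def regressed_uzr(games: float, yearly_uzr: float, regression_games: float = 125.0) -> float:
    """Regress one season's UZR toward 0 (league average).

    The weight kept on the observed UZR is games / (games + regression_games),
    which equals 1 - regression_games / (games + regression_games).
    """
    if games < 0 or regression_games <= 0:
        raise ValueError("games must be >= 0 and regression_games must be > 0")
    weight = games / (games + regression_games)
    return weight * yearly_uzr


if __name__ == "__main__":
    # +10 UZR over 125 games keeps half of it
    print(regressed_uzr(125, 10.0))
    # +10 UZR over only 25 games is pulled much closer to 0
    print(regressed_uzr(25, 10.0))
```

Swapping in 100 for `regression_games` reproduces the less conservative value from the Fangraphs article.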
I ran this formula on all players (~3200) born after 1968, except pitchers and catchers, who have no UZR data. (I limited it to players born after 1968 because the data from Fangraphs doesn't have a playerID, so I was getting too many duplicate names.) I combined all the positions and got the following results:
It is tough to draw too many conclusions from the data, but the following can be observed:
Players are noticeably better when they are younger than 30 than when they are older than 30, and within each of these two age groups the values seem to cluster together.
Even though the general trend is downward, there seems to be a learning curve for a few years, then a peak, and then a slow decline.
After that, I aggregated the data for each position and got the following average yearly rUZR numbers for 23- to 34-year-olds (above and below these ages, the amount of data for certain positions, shortstop for example, was severely lacking):
||Yearly regressed numbers||
|Position||Change in UZR per year||R-squared|
The R-squared shows how well a straight-line change over time fits each position's data: the left field numbers are actually statistically significant, while the third base numbers show no correlation.
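For anyone wanting to build this kind of positional table, here is a sketch of one way to do it, assuming the data has already been reduced to (age, rUZR) pairs for a single position: average rUZR at each age from 23 to 34, then fit a straight line through those averages. The slope plays the role of "Change in UZR per year" and the fit's R-squared the role of the R-squared column. The helper names are hypothetical, and fitting the age averages (rather than every player-season) is an assumption, since the exact fitting procedure isn't spelled out.

```python
import math
from collections import defaultdict


def mean_by_age(rows, min_age=23, max_age=34):
    """Average rUZR at each age for one position.

    rows: iterable of (age, ruzr) player-seasons; ages outside
    [min_age, max_age] are skipped because the samples there are thin.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for age, ruzr in rows:
        if min_age <= age <= max_age:
            totals[age] += ruzr
            counts[age] += 1
    return {age: totals[age] / counts[age] for age in sorted(totals)}


def linear_fit(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared). r_squared is NaN when the
    y values have no variance, since there is nothing to explain.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else math.nan
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical player-seasons for one position: (age, rUZR)
    seasons = [(23, 1.2), (23, 0.8), (24, 0.9), (25, 0.7), (26, 0.6), (35, 2.0)]
    averages = mean_by_age(seasons)
    slope, _, r2 = linear_fit(list(averages), list(averages.values()))
    print(f"change per year: {slope:.3f}, R-squared: {r2:.2f}")
```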
Here is a chart of the actual rUZR values by year and position:
The preceding data seems to indicate that there are 3 categories for rUZR:
Outfielders – All three groups show about the same decrease of ~0.125 rUZR per year of age.
Corner Infielders – Both are fairly steady over the years, with values close to 0, but 1B seems to decline a little more than 3B, and its decline rate is the closest of any position to the overall decline rate.
Middle Infielders – These two positions were a little confusing because their regressed UZR trended upward over the years, so I needed to look into them a little more. I selected the players who played 4 straight seasons at SS and looked at how much their rUZR changed from year to year, and the results were the same: the players seem to have an improving rUZR up to a peak around age 27 and then decline after that. Here are the graphs for these 2 positions:
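The shortstop check can be sketched as follows: keep only players with at least 4 consecutive seasons at the position, then average the change in rUZR between back-to-back seasons at each age. This is an assumed reconstruction; the input layout (a dict mapping a player name to (age, rUZR) seasons) and the function names are hypothetical.

```python
from collections import defaultdict


def longest_run(ages):
    """Length of the longest streak of consecutive ages."""
    ordered = sorted(set(ages))
    best = run = 1 if ordered else 0
    for prev, cur in zip(ordered, ordered[1:]):
        run = run + 1 if cur == prev + 1 else 1
        best = max(best, run)
    return best


def mean_change_by_age(players, min_run=4):
    """Average year-over-year rUZR change, keyed by the later age of each pair.

    players: dict mapping a player name to a list of (age, ruzr) seasons at
    one position. Players without at least `min_run` consecutive seasons are
    skipped; only back-to-back ages contribute a change.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for seasons in players.values():
        by_age = dict(seasons)
        if longest_run(by_age) < min_run:
            continue
        for age in sorted(by_age):
            if age + 1 in by_age:
                totals[age + 1] += by_age[age + 1] - by_age[age]
                counts[age + 1] += 1
    return {age: totals[age] / counts[age] for age in sorted(totals)}


if __name__ == "__main__":
    # Hypothetical shortstops: (age, rUZR) by season
    shortstops = {
        "Player A": [(24, 0.0), (25, 1.0), (26, 1.5), (27, 1.5)],
        "Player B": [(25, 2.0), (26, 3.0), (28, 1.0)],  # no 4-year streak, skipped
    }
    print(mean_change_by_age(shortstops))  # -> {25: 1.0, 26: 0.5, 27: 0.0}
```

A positive average change at younger ages followed by negative changes later would show the same rise-then-decline shape described above.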
Once you look at the positional rUZR, the overall rUZR values make a little more sense when the following 3 trends are combined:
General overall decline (outfielders)
Increase at the beginning (middle infielders)
General overall leveling (corner infielders)
Creating a prediction formula
After collecting the information on individual positions, I wanted to use the age adjustment to help predict players' actual rUZR scores. In our discussion, Mitchel Lichtman recommended the following formula, which weights recent years more heavily than earlier years:
((5 * # of games last year * UZR last year) +
(4 * # of games 2 years ago * UZR 2 years ago) +
(3 * # of games 3 years ago * UZR 3 years ago) +
(2 * # of games 4 years ago * UZR 4 years ago)) /
((5 * # of games last year) +
(4 * # of games 2 years ago) +
(3 * # of games 3 years ago) +
(2 * # of games 4 years ago))
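The weighted formula translates directly into code. In this sketch (the function name is hypothetical), seasons are passed most-recent-first as (games, UZR) pairs. weights=(5, 4, 3, 2) gives the formula above, and weights=(2, 1) gives the 2,1 variant, because `zip` stops after the last weight, which is the same as giving older seasons a weight of 0.

```python
def weighted_uzr(seasons, weights=(5, 4, 3, 2)):
    """Recency- and games-weighted average of past UZR.

    seasons: list of (games, uzr) pairs, most recent season first.
    weights: recency weights, most recent first. zip() stops at the shorter
    sequence, so seasons beyond len(weights) get an implicit weight of 0.
    """
    numerator = 0.0
    denominator = 0.0
    for weight, (games, uzr) in zip(weights, seasons):
        numerator += weight * games * uzr
        denominator += weight * games
    if denominator == 0:
        raise ValueError("no games in the weighted seasons")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical player, most recent season first: (games, UZR)
    history = [(150, 6.0), (140, 2.0), (120, -1.0), (100, 4.0)]
    print(round(weighted_uzr(history), 2))          # 5,4,3,2 weights -> 3.24
    print(round(weighted_uzr(history, (2, 1)), 2))  # 2,1 weights -> 4.73
```

Because games appear in both the numerator and denominator, a short injury-shortened season pulls the estimate much less than a full season with the same recency weight.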
I ran this formula against all the players with over 100 games at a position for 5 straight years, and it was a very good predictor. There was a slight decimal difference from a perfect prediction, but it was a very good method. I also found that using weights of 2,1 (2 for last year, 1 for 2 years ago, and 0 for the rest) was equally good.
The one thing I did find was that -.03 was not a good indicator of these players' loss of defensive ability. The main reason I saw a larger change in the negative direction is that I was using 4 years of data to predict the 5th year, so the youngest players could not be predicted until they were ~25 years old, which leaves a sample weighted toward older, declining players. Here are the age adjustments and the standard deviations for predicting a player's UZR using the 5,4,3,2 method and the 2,1 method:
||2,1 Age Adjustment||Standard Deviation for 2,1||5,4,3,2 Age Adjustment||Standard Deviation for 5,4,3,2|
|Year's Regressed UZR||-0.35||1.61||-0.70||1.87|
|Year's Actual UZR||-1.00||4.44||-1.35||4.45|
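The post doesn't spell out how these columns were computed, but one plausible reading is that the age adjustment is the average of (actual minus predicted) across all predicted seasons, and the standard deviation is the sample standard deviation of those same errors. Under that assumption, here is a sketch; the function name is hypothetical.

```python
import statistics


def age_adjustment_and_sd(pairs):
    """Mean and sample standard deviation of prediction errors.

    pairs: list of (predicted, actual) UZR values for the same player-seasons.
    The error is actual - predicted, so a negative mean says players came in
    below their prediction on average (i.e. they declined with age).
    """
    errors = [actual - predicted for predicted, actual in pairs]
    if len(errors) < 2:
        raise ValueError("need at least two predictions")
    return statistics.mean(errors), statistics.stdev(errors)


if __name__ == "__main__":
    # Hypothetical (predicted, actual) rUZR pairs
    results = [(1.0, 0.0), (2.0, 2.0), (0.0, -1.0), (3.0, 3.0)]
    adjustment, sd = age_adjustment_and_sd(results)
    print(f"age adjustment: {adjustment:.2f}, SD: {sd:.2f}")
    # An adjusted prediction simply adds the mean error back in
    print(f"adjusted prediction for a 2.0 forecast: {2.0 + adjustment:.2f}")
```

Under this reading, a 2,1 prediction would be adjusted by adding -0.35 to the regressed value, and the standard deviation gives a rough sense of how far off a single prediction can be.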
I might go back sometime and work out the overall and positional age adjustments using the 2,1 formula. For now I feel the data is good enough for the Manager's Scorecard, but it does raise some questions about exact positional aging patterns.
I have put a spreadsheet on the web (with a web page coming soon) that you can download to enter a player's Games/Chances, UZR, regression adjustment, and age adjustment; it then outputs the player's rUZR.
Defensive metrics are not well understood, but my hope is to help remove some of the old clutter (and possibly create new clutter) and help people interpret defensive metrics like UZR better. As always, I am open to comments and suggestions.