With the start of the baseball season around the corner, it is time to get stat predictions for the most cherished people on the field: the umpires. More specifically, I wanted to examine the men behind the plate and how they affect the scoring of the games they call. Most of the data I used came from the website The Logical Approach. It contains data on all umpires since 2000, the year the separate AL and NL umpiring groups were combined into one MLB group. I used the following stats: K/BB, BB/9, Base runners/IP, ERA and Total Runs/Game.
Procedure: To get to the final numbers, I calculated the total of all game scores for each umpire, added these up for all the seasons and averaged them to get the all-time average values. Then I used this average and the yearly averages to create “umpire factors” for each year to normalize the stats. I took the umpires who worked in all nine years I had data for and determined the point at which they had regressed to 50% (r-value) of their final value. Using these r-values, I predicted each umpire's 2009 numbers. I have made the sheet available to all at Google Docs. It includes all the umpires assigned to crews in 2009 (including the crew to which they are assigned) and any umpire who worked at all in the majors in 2008: http://spreadsheets.google.com/ccc?key=pYilmL4DSNmkgSYzN_o80ZA
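As a rough illustration of the normalization step, here is a minimal sketch of one way a yearly "umpire factor" could be computed and applied. The exact formula is not spelled out above, so the ratio used here (all-time average divided by that year's average) is an assumption, as are the function names and the sample numbers, which are made up for illustration.

```python
from statistics import mean


def yearly_factors(league_by_year):
    """Return {year: all-time average / that year's average}.

    Assumption: the all-time average is a simple mean of the yearly
    averages; a games-weighted mean would be a reasonable alternative.
    """
    all_time = mean(league_by_year.values())
    return {year: all_time / avg for year, avg in league_by_year.items()}


def normalize(umpire_by_year, factors):
    """Scale an umpire's yearly stat onto the all-time run environment."""
    return {year: value * factors[year] for year, value in umpire_by_year.items()}


# Illustrative numbers only, not taken from the spreadsheet.
league_rpg = {2006: 9.70, 2007: 9.60, 2008: 9.30}
umpire_rpg = {2006: 10.10, 2007: 9.20, 2008: 9.50}

factors = yearly_factors(league_rpg)
normalized = normalize(umpire_rpg, factors)
print(normalized)
```

A low-scoring year gets a factor above 1, so an umpire's numbers from that year are scaled up before they are compared with numbers from higher-scoring years.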
Here is a chart of the averages across all years of data, the highest and lowest predicted values, and the number of games at which an umpire's numbers have regressed to 50% of their actual value:
||K/BB||BB/9||Base Runners/IP||ERA||Total Runs/Game|
|Average Values for All Umpires||2.01||3.16||1.40||4.59||9.55|
|Highest Predicted Value||2.85||3.53||1.47||4.91||10.33|
|Lowest Predicted Value||1.67||2.55||1.29||4.11||8.70|
|Number of games to get to 50% regression||35||97||133||97||194|
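The last row of the table can be read as a regression constant for each stat. The sketch below shows one common way to use such a constant: weight the umpire's observed value by games / (games + k) and fill the rest with the all-umpire average. This is an assumed form of the projection, not necessarily the exact calculation behind the spreadsheet, and the function name `project` is hypothetical.

```python
# Values taken from the table above.
LEAGUE_AVG = {"K/BB": 2.01, "BB/9": 3.16, "BR/IP": 1.40, "ERA": 4.59, "RPG": 9.55}
GAMES_TO_HALF = {"K/BB": 35, "BB/9": 97, "BR/IP": 133, "ERA": 97, "RPG": 194}


def project(stat, observed, games):
    """Regress an umpire's observed value toward the all-umpire average.

    Assumption: at GAMES_TO_HALF[stat] games the observed value and the
    average each get half the weight.
    """
    k = GAMES_TO_HALF[stat]
    weight = games / (games + k)
    return weight * observed + (1 - weight) * LEAGUE_AVG[stat]


# An umpire observed at 10.55 runs per game over exactly 194 games lands
# halfway between his observed value and the 9.55 average.
print(round(project("RPG", 10.55, 194), 2))
```

Under this form, an umpire with only a handful of games is projected very close to the average, while a long-tenured umpire keeps most of his observed tendency.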
To get a general idea of which umpires are pro-pitcher or pro-hitter, I ranked each umpire in the five categories and then added these five rankings together (not the most scientific method, but it will work for general analysis). Here are the top 10 pitcher-friendly and hitter-friendly umpires:
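The rank-sum described above can be sketched as follows. The umpire names and stat lines are placeholders, and the direction of each category (a higher K/BB favors pitchers, while lower values favor pitchers in the other four) is my reading of the stats rather than something stated explicitly. Ties are not handled specially.

```python
# True means a higher value is treated as pitcher-friendly (assumption).
HIGHER_FAVORS_PITCHER = {
    "K/BB": True,
    "BB/9": False,
    "BR/IP": False,
    "ERA": False,
    "RPG": False,
}


def rank_sums(umpires):
    """Sum each umpire's rank over the five categories (1 = most pitcher-friendly)."""
    totals = {name: 0 for name in umpires}
    for stat, higher_is_better in HIGHER_FAVORS_PITCHER.items():
        ordered = sorted(umpires, key=lambda n: umpires[n][stat], reverse=higher_is_better)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return totals


# Placeholder umpires with made-up predicted values.
umpires = {
    "Ump A": {"K/BB": 2.60, "BB/9": 2.70, "BR/IP": 1.31, "ERA": 4.20, "RPG": 8.90},
    "Ump B": {"K/BB": 2.00, "BB/9": 3.15, "BR/IP": 1.40, "ERA": 4.60, "RPG": 9.50},
    "Ump C": {"K/BB": 1.70, "BB/9": 3.50, "BR/IP": 1.46, "ERA": 4.85, "RPG": 10.20},
}

totals = rank_sums(umpires)
# Lowest total = most pitcher-friendly, highest total = most hitter-friendly.
for name in sorted(totals, key=totals.get):
    print(name, totals[name])
```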
Finally, I used these numbers to see how they would affect a single pitcher. Using the ERA numbers, I looked at the Royals' Gil Meche in 2008. I added up how much each of his home plate umpires' ERAs differed from the average ERA. He had an extra 2 runs scored against him over the season because of umpire differences. Taking these differences into account, his ERA drops from 3.98 to 3.90.
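A minimal sketch of this kind of adjustment is shown below. It assumes each start's umpire effect is the difference between the umpire's ERA and the 4.59 average, scaled by the innings pitched in that start. That scaling, the function name, and the three sample starts are my assumptions for illustration, not Meche's actual game log.

```python
LEAGUE_ERA = 4.59  # average ERA for all umpires, from the table above


def umpire_adjusted_era(earned_runs, starts):
    """Return (raw ERA, umpire-adjusted ERA, extra runs attributed to umpires).

    starts is a list of (innings_pitched, umpire_era) pairs, one per start,
    with innings given as true decimals (6 1/3 innings is 6.333, not 6.1).
    """
    innings = sum(ip for ip, _ in starts)
    # Positive when the pitcher drew hitter-friendly umpires.
    extra_runs = sum((ump_era - LEAGUE_ERA) * ip / 9 for ip, ump_era in starts)
    raw = 9 * earned_runs / innings
    adjusted = 9 * (earned_runs - extra_runs) / innings
    return raw, adjusted, extra_runs


# Three made-up starts: a hitter-friendly, a neutral and a pitcher-friendly umpire.
starts = [(7.0, 4.91), (6.0, 4.59), (5.0, 4.11)]
raw_era, adjusted_era, extra_runs = umpire_adjusted_era(8, starts)
print(raw_era, adjusted_era, extra_runs)
```

When the extra-run total is positive, as in the Meche example, the adjusted ERA comes out lower than the raw one. In the made-up sample it is slightly negative, so the adjusted ERA is slightly higher.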
Improvements – I wanted to make this data available before the start of the season, but I feel it is far from complete. I plan on adding these two improvements for future analysis:
Incorporate Pitch F/X. I currently don't have all the data downloaded, but hope to have it soon. I plan on implementing something like what Jonathan Hale did in this article for the Hardball Times. I would like to see how well each umpire's actual strike zone correlates with the game stats.
Use Retrosheet data and Park Factors. With Retrosheet I would be able to go back further in time and get any stat I need (please suggest any additions that would be helpful to the research). Some umpires only work in certain areas of the country, so their numbers might be skewed due to the parks they work in.
Conclusion: With a spread of about 1.6 R/G between the lowest and highest predicted umpires (8.70 to 10.33), teams and pitchers will need to adjust for the umpire behind the plate. Hopefully I have given people a better understanding of which umpires have which tendencies. Finally, I plan on expanding the research to include more and better data, including the data from Pitch F/X. As always, I am open to comments and suggestions.