Wednesday, December 17, 2008

What factors have an effect on runs scored at MLB Parks?

Question: What factors have an effect on runs scored at MLB Parks?

Why I asked the question: I was trying to find out why Chase Field had such a high Park Factor over the years, and the study just exploded from there.

Analysis: Originally I did some research on effects at all MLB and MiLB stadiums (link to spreadsheet). After it was available, someone pointed out that Chase Field had a very high Park Factor and that elevation and temperature could not explain the entire effect. To tell whether the cause was the lower humidity (as some suggested) or park size, I decided to run a multiple regression of the average total runs scored per game (home and away team scores averaged) against elevation, average temperature, average humidity, park size and foul territory area.

I plan to use park factors at some point in the analysis, but I can only find park factors rounded to whole numbers (e.g. 102 rather than 102.45). Using these leads to a very low R-squared due to too much rounding. Once I find more precise data, I will run the regression against it to predict park factors.

Sources of data:

Runs Scored – Collected from the Retrosheet database – average values from the last three years

Elevation – List of Major League Baseball stadiums on Wikipedia

Temperature – Collected from the Retrosheet database – last three years

Humidity – Average values from April to September from websites BBC.com and CityRating.com

Park Fair Area – Calculated using the equations discussed on The Book blog

Park Foul Area – Areas were calculated by Mitchel Lichtman

The data for each of the major league parks is in the following table.

Table 1

Team Park Original Runs scored (league adjusted) Fair Area Foul Area Humidity % Elevation (ft) Average Temp (F)
Arizona Diamondbacks Chase Field 9.67 109999 26712 18.3 1087 80.6
Atlanta Braves Turner Field 9.58 113031 23852 56.0 928 79.8
Baltimore Orioles Oriole Park at Camden Yards 9.99 106745 23055 52.5 30 80.6
Boston Red Sox Fenway Park 9.68 99147 17859 58.2 15 69.1
Chicago Cubs Wrigley Field 9.77 107892 18564 58.0 600 69.6
Chicago White Sox U.S. Cellular Field 9.69 108913 25663 58.0 595 68.3
Cincinnati Reds Great American Ball Park 10.13 104830 23376 55.5 542 76.2
Cleveland Indians Progressive Field 9.41 105817 21664 58.2 656 69.1
Colorado Rockies Coors Field 10.55 116260 28269 35.2 5198 73.7
Detroit Tigers Comerica Park 9.75 112983 30227 54.7 600 71.9
Florida Marlins Dolphin Stadium 9.66 109542 28144 61.8 5 83.2
Houston Astros Minute Maid Park 9.13 102328 25139 62.3 48 74.5
Kansas City Royals Kauffman Stadium 9.39 109689 23528 61.5 877 76.8
Los Angeles Angels of Anaheim Angel Stadium of Anaheim 8.86 108130 22021 67.2 153 75.1
Los Angeles Dodgers Dodger Stadium 8.93 105899 18276 67.2 517 73.4
Milwaukee Brewers Miller Park 9.32 108367 23047 63.0 598 72.0
Minnesota Twins Hubert H. Humphrey Metrodome 8.35 104404 33578 57.5 992 69.5
New York Mets Shea Stadium 9.14 112137 25665 54.8 12 72.4
New York Yankees Yankee Stadium 9.77 115349 18949 54.8 17 72.2
Oakland Athletics Oakland-Alameda County Coliseum 7.98 103539 40153 59.8 7 64.3
Philadelphia Phillies Citizens Bank Park 10.23 101280 25107 53.7 19 74.8
Pittsburgh Pirates PNC Park 9.37 108670 22914 53.7 726 73.0
St. Louis Cardinals Busch Stadium 9.20 111239 25783 54.2 453 76.9
San Diego Padres PETCO Park 7.71 107331 21473 65.3 21 69.4
San Francisco Giants AT&T Park 8.98 107077 23267 59.8 10 64.4
Seattle Mariners Safeco Field 8.72 109120 26290 53.5 17 64.4
Tampa Bay Rays Tropicana Field 9.03 106163 25590 59.2 38 72.0
Texas Rangers Rangers Ballpark in Arlington 10.51 110743 21494 56.5 546 81.2
Toronto Blue Jays Rogers Centre 8.46 108540 30327 57.3 290 71.4
Washington Nationals Nationals Park 9.04 108752 26400 54.2 17 76.1

I ran a regression on the data to get a best-fit equation. I originally included fence height in the regression, but the resulting equation was even less accurate. The equation to predict the runs scored after running the regression is:

Runs Scored = 0.0513 * Degrees F + 0.000212 * Elevation in feet – 0.0176 * % Relative Humidity – 0.0000543 * Foul Area in square feet – 0.00000982 * Fair Area in square feet + 8.874 [+ 0.4267 runs for an AL park]

Standard Deviation of 0.47 and R-squared of 0.526
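
As a quick check, the equation can be evaluated directly from the Table 1 inputs. The sketch below is only an illustration: the function name `projected_runs` and its argument names are made up for this example, and the coefficients are the rounded values from the equation, so a result can differ from the tables in the last digit.

```python
def projected_runs(temp_f, elevation_ft, humidity_pct,
                   foul_area_sqft, fair_area_sqft, al_park=False):
    """Total runs per game from the regression equation (rounded coefficients)."""
    runs = (0.0513 * temp_f
            + 0.000212 * elevation_ft
            - 0.0176 * humidity_pct
            - 0.0000543 * foul_area_sqft
            - 0.00000982 * fair_area_sqft
            + 8.874)
    if al_park:
        runs += 0.4267  # AL park adjustment from the equation
    return runs

# Inputs copied from Table 1
chase = projected_runs(80.6, 1087, 18.3, 26712, 109999)
coors = projected_runs(73.7, 5198, 35.2, 28269, 116260)
print(f"Chase Field: {chase:.2f}")
print(f"Coors Field: {coors:.2f}")
```

With the rounded coefficients this comes out to about 10.39 for Chase Field and 10.46 for Coors Field, in line with the projected values in Table 3.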

Here is a simple chart of the factors for easy comparison.

Table 2

Factor | Change in Total Runs Scored per Game
10 degree F increase | +0.51
Increase in RH by 10% | -0.18
10,000 sq ft increase in foul area | -0.54
10,000 sq ft increase in playing area | -0.098
1000 ft increase in elevation | +0.21

As can be seen, each factor can significantly affect the runs scored. The following table shows the original and projected runs scored for each of the ballparks.

Table 3

Team Park Original Runs scored (league adjusted) Projected Runs scored (league adjusted) Original Runs Scored (not league adjusted) Projected Runs Scored (not league adjusted) Difference (Original – Projected)
Arizona Diamondbacks Chase Field 9.67 10.38 9.67 10.38 -0.71
Atlanta Braves Turner Field 9.58 9.77 9.58 9.77 -0.18
Baltimore Orioles Oriole Park at Camden Yards 9.99 9.79 10.42 10.21 0.21
Boston Red Sox Fenway Park 9.68 9.45 10.11 9.88 0.23
Chicago Cubs Wrigley Field 9.77 9.48 9.77 9.48 0.29
Chicago White Sox U.S. Cellular Field 9.69 9.02 10.12 9.44 0.68
Cincinnati Reds Great American Ball Park 10.13 9.62 10.13 9.62 0.52
Cleveland Indians Progressive Field 9.41 9.31 9.84 9.74 0.10
Colorado Rockies Coors Field 10.55 10.46 10.55 10.46 0.09
Detroit Tigers Comerica Park 9.75 8.97 10.18 9.40 0.78
Florida Marlins Dolphin Stadium 9.66 9.45 9.66 9.45 0.21
Houston Astros Minute Maid Park 9.13 9.23 9.13 9.23 -0.10
Kansas City Royals Kauffman Stadium 9.39 9.56 9.82 9.98 -0.17
Los Angeles Angels of Anaheim Angel Stadium of Anaheim 8.86 9.31 9.29 9.74 -0.45
Los Angeles Dodgers Dodger Stadium 8.93 9.53 8.93 9.53 -0.60
Milwaukee Brewers Miller Park 9.32 9.26 9.32 9.26 0.06
Minnesota Twins Hubert H. Humphrey Metrodome 8.35 8.78 8.78 9.21 -0.43
New York Mets Shea Stadium 9.14 9.13 9.14 9.13 0.01
New York Yankees Yankee Stadium 9.77 9.45 10.20 9.88 0.32
Oakland Athletics Oakland-Alameda County Coliseum 7.98 7.92 8.40 8.34 0.06
Philadelphia Phillies Citizens Bank Park 10.23 9.41 10.23 9.41 0.82
Pittsburgh Pirates PNC Park 9.37 9.51 9.37 9.51 -0.15
St. Louis Cardinals Busch Stadium 9.20 9.46 9.20 9.46 -0.26
San Diego Padres PETCO Park 7.71 9.06 7.71 9.06 -1.35
San Francisco Giants AT&T Park 8.98 8.81 8.98 8.81 0.17
Seattle Mariners Safeco Field 8.72 8.73 9.15 9.16 -0.01
Tampa Bay Rays Tropicana Field 9.03 9.10 9.46 9.52 -0.06
Texas Rangers Rangers Ballpark in Arlington 10.51 9.90 10.94 10.32 0.61
Toronto Blue Jays Rogers Centre 8.46 8.87 8.89 9.30 -0.41
Washington Nationals Nationals Park 9.04 9.32 9.04 9.32 -0.28

The equation predicts some stadiums' run production very accurately. Here is a table of the parks where the regression equation predicted the runs scored within 0.1 runs.

Table 4

Team Park Original Runs scored (league adjusted) Projected Runs scored (league adjusted) Difference (Original – Projected)
Houston Astros Minute Maid Park 9.13 9.23 -0.10
Tampa Bay Rays Tropicana Field 9.03 9.10 -0.06
Seattle Mariners Safeco Field 8.72 8.73 -0.01
New York Mets Shea Stadium 9.14 9.13 0.01
Milwaukee Brewers Miller Park 9.32 9.26 0.06
Oakland Athletics Oakland-Alameda County Coliseum 7.98 7.92 0.06
Colorado Rockies Coors Field 10.55 10.46 0.09
Cleveland Indians Progressive Field 9.41 9.31 0.10

I grouped the parks whose difference exceeded the standard deviation of 0.47. These are the parks whose run-scoring environment is not explained as well by the factors.

Team Park Original Runs scored (league adjusted) Projected Runs scored (league adjusted) Difference (Original – Projected)
San Diego Padres PETCO Park 7.71 9.06 -1.35
Arizona Diamondbacks Chase Field 9.67 10.38 -0.71
Los Angeles Dodgers Dodger Stadium 8.93 9.53 -0.60
Cincinnati Reds Great American Ball Park 10.13 9.62 0.52
Texas Rangers Rangers Ballpark in Arlington 10.51 9.90 0.61
Chicago White Sox U.S. Cellular Field 9.69 9.02 0.68
Detroit Tigers Comerica Park 9.75 8.97 0.78
Philadelphia Phillies Citizens Bank Park 10.23 9.41 0.82
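
Both groupings come down to comparing each park's difference (original minus projected) with a cutoff. Here is a minimal sketch using a handful of rows from Table 3. The `classify` helper, the constant names and the group labels are made up for illustration.

```python
STD_DEV = 0.47  # standard deviation reported for the regression
CLOSE = 0.10    # "predicted accurately" cutoff used for Table 4

# (park, original runs, projected runs), league adjusted, copied from Table 3
parks = [
    ("PETCO Park", 7.71, 9.06),
    ("Chase Field", 9.67, 10.38),
    ("Safeco Field", 8.72, 8.73),
    ("Shea Stadium", 9.14, 9.13),
    ("Wrigley Field", 9.77, 9.48),
    ("Citizens Bank Park", 10.23, 9.41),
]

def classify(original, projected):
    """Label a park by how far the projection misses the original value."""
    # Round to two decimals so float noise does not move a park across a cutoff.
    diff = round(original - projected, 2)
    if abs(diff) <= CLOSE:
        return "well predicted"
    if abs(diff) > STD_DEV:
        return "poorly explained"
    return "in between"

for name, original, projected in parks:
    print(f"{name:20s} {original - projected:+.2f}  {classify(original, projected)}")
```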

For future analysis, I will look into park factors and also any other factors related to the parks. If you have another set of data (e.g. hits or home runs), feel free to send it and I will run it against these factors. You could also run your own regression using LINEST() in Excel or OpenOffice (for us cheap people).
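
For anyone who would rather script it than use LINEST(), here is a rough pure-Python sketch of an ordinary least squares fit on the Table 1 data. It is not necessarily the exact procedure behind the equation in this post: it assumes the AL adjustment is already baked into the league-adjusted runs column, and the helper names `solve` and `fit` are made up for this example.

```python
# Table 1 rows: (runs, fair area, foul area, humidity %, elevation ft, avg temp F)
DATA = [
    (9.67, 109999, 26712, 18.3, 1087, 80.6),   # Chase Field
    (9.58, 113031, 23852, 56.0, 928, 79.8),    # Turner Field
    (9.99, 106745, 23055, 52.5, 30, 80.6),     # Camden Yards
    (9.68, 99147, 17859, 58.2, 15, 69.1),      # Fenway Park
    (9.77, 107892, 18564, 58.0, 600, 69.6),    # Wrigley Field
    (9.69, 108913, 25663, 58.0, 595, 68.3),    # U.S. Cellular Field
    (10.13, 104830, 23376, 55.5, 542, 76.2),   # Great American Ball Park
    (9.41, 105817, 21664, 58.2, 656, 69.1),    # Progressive Field
    (10.55, 116260, 28269, 35.2, 5198, 73.7),  # Coors Field
    (9.75, 112983, 30227, 54.7, 600, 71.9),    # Comerica Park
    (9.66, 109542, 28144, 61.8, 5, 83.2),      # Dolphin Stadium
    (9.13, 102328, 25139, 62.3, 48, 74.5),     # Minute Maid Park
    (9.39, 109689, 23528, 61.5, 877, 76.8),    # Kauffman Stadium
    (8.86, 108130, 22021, 67.2, 153, 75.1),    # Angel Stadium
    (8.93, 105899, 18276, 67.2, 517, 73.4),    # Dodger Stadium
    (9.32, 108367, 23047, 63.0, 598, 72.0),    # Miller Park
    (8.35, 104404, 33578, 57.5, 992, 69.5),    # Metrodome
    (9.14, 112137, 25665, 54.8, 12, 72.4),     # Shea Stadium
    (9.77, 115349, 18949, 54.8, 17, 72.2),     # Yankee Stadium
    (7.98, 103539, 40153, 59.8, 7, 64.3),      # Oakland Coliseum
    (10.23, 101280, 25107, 53.7, 19, 74.8),    # Citizens Bank Park
    (9.37, 108670, 22914, 53.7, 726, 73.0),    # PNC Park
    (9.20, 111239, 25783, 54.2, 453, 76.9),    # Busch Stadium
    (7.71, 107331, 21473, 65.3, 21, 69.4),     # PETCO Park
    (8.98, 107077, 23267, 59.8, 10, 64.4),     # AT&T Park
    (8.72, 109120, 26290, 53.5, 17, 64.4),     # Safeco Field
    (9.03, 106163, 25590, 59.2, 38, 72.0),     # Tropicana Field
    (10.51, 110743, 21494, 56.5, 546, 81.2),   # Rangers Ballpark
    (8.46, 108540, 30327, 57.3, 290, 71.4),    # Rogers Centre
    (9.04, 108752, 26400, 54.2, 17, 76.1),     # Nationals Park
]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit(rows):
    """Least squares with an intercept; the first column of each row is the response."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    # Centering every column removes the intercept from the normal equations.
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    y = [r[0] for r in centered]
    x = [r[1:] for r in centered]
    p = k - 1
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    coefs = solve(xtx, xty)
    intercept = means[0] - sum(c * mu for c, mu in zip(coefs, means[1:]))
    ss_res = sum((yi - sum(c * v for c, v in zip(coefs, row))) ** 2
                 for row, yi in zip(x, y))
    ss_tot = sum(yi ** 2 for yi in y)
    return coefs, intercept, 1 - ss_res / ss_tot

coefs, intercept, r2 = fit(DATA)
names = ["fair area", "foul area", "humidity", "elevation", "temperature"]
for name, c in zip(names, coefs):
    print(f"{name:12s} {c: .6g}")
print(f"intercept    {intercept: .4f}")
print(f"R-squared    {r2: .3f}")
```

Swapping the first column for another per-park number (hits, home runs) would fit the same factors against that data instead.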

I would like to use wind as a measurement, but I haven't found a good way to convert wind speed and direction into one variable. Again, if anyone has any ideas, or can think of other factors to consider, please let me know.

