Wednesday, January 7, 2009

What factors have an effect on runs scored at MLB Parks? Part 2

Question: What factors have an effect on runs scored at MLB Parks? Part 2

Why I asked the question: I was trying to figure out why Chase Field has had such a high Park Factor over the years, and from there the research expanded to all stadiums.

Analysis: I had previously researched the factors affecting runs scored at all MLB and MiLB stadiums (link to spreadsheet). After it was published, someone pointed out that Chase Field's Park Factor was so high that elevation and temperature could not explain the entire difference. Unable to tell whether the cause was the lower humidity (as some suggested) or the park's size, I decided to run a multiple regression of the average park factors over the last 3 years (2006 to 2008) against elevation, average temperature, average humidity, park dimensions (RF, RC, CF, LC, LF), average wall height, errors per game, wind direction, surface type and foul territory area.

Explanation of source of data:

Park Factors – I originally used runs scored per game because the only Park Factors I could find did not include any decimal places. After my original article ran, I was flooded with different sets of data, and I am now using Patriot's numbers from his website (http://gosu02.tripod.com/id103.html).
Elevation (ft) – Collected from the List of Major League Baseball stadiums on Wikipedia. Elevation is a major factor in determining the distance a ball travels and the time the defense has to react to the ball once it is in play.
Temperature (degrees F) – Collected from the Retrosheet database for the last 3 years, except for Washington, which has only 1 year's worth of data. The higher the temperature, the farther a ball will travel.
Relative Humidity (percentage) – Average values from April to September from the websites BBC.com and CityRating.com. Humidity is not supposed to have much of an effect on the distance a ball travels, but I wanted to see whether small differences in humidity could explain some of the variation.
Dimensions (ft) – Taken from Wikipedia. Only 5 measurements were used: LF, LC, CF, RC and RF. I originally used the total area of each ballpark, but I found that of two stadiums with about the same area, the one shaped like #1 below had a higher park factor:

Stadium #1: 360-380-400-380-360 (LF-LC-CF-RC-RF)
Stadium #2: 380-380-380-380-380 (LF-LC-CF-RC-RF)

Park Foul Area (sq ft) – Areas were calculated by Mitchel Lichtman. The larger the foul area, the more foul balls will be caught and, therefore, the fewer runs scored.
Wind Strength and Direction (mph) – I used Retrosheet data from the last 3 years (1 year for Washington). The Retrosheet data comes in the form of 8 different directions, from which I created the following matrix:

Wind Direction | X Component (toward RF) | Y Component (out to CF)
To LF | -0.71 | 0.71
To CF | 0 | 1
To RF | 0.71 | 0.71
LF to RF | 1 | 0
From LF | 0.71 | -0.71
From CF | 0 | -1
From RF | -0.71 | -0.71
RF to LF | -1 | 0

For each game, I multiplied the X and Y values by the wind speed, summed each component over all games and then divided by the number of games. The Y component is wind blowing out to CF, while the X component is wind blowing toward RF.

  • Question: When collecting this data, I found no games with wind blowing in from right field and thought I had made a mistake somewhere. I searched the games database for that wind direction, and the most recent case was in 2003. Has the wind really not blown in from RF in a single game over 5 years? Is there some unspoken rule that scorers don't mark it this way?
Opponents' Errors per Game – I was looking for a way to measure how tough a stadium is to play in (e.g., fly balls in the Metrodome). The best metric I could come up with was the average number of errors the opposing team makes per game.
Playing Surface – The three stadiums with turf were given a value of 1 and the rest 0. Since these parks use the newer Field Turf, I wondered whether runs scored might go down, because batted balls would travel slower than on the old "AstroTurf" and take fewer weird bounces.
Average Wall Height (ft) – Averaged the values from ballparks.com.
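The wind averaging described above can be sketched in a few lines. This is a minimal illustration, not the author's spreadsheet: the direction labels are the ones in the matrix, the input format (one direction and speed per game) is an assumption, and treating any game whose direction is not in the matrix (for example an indoor or unrecorded game) as zero wind is also an assumption, since the text does not say how those games were handled.

```python
# X = wind toward RF, Y = wind out to CF, as in the matrix above.
WIND_COMPONENTS = {
    "To LF": (-0.71, 0.71),
    "To CF": (0.0, 1.0),
    "To RF": (0.71, 0.71),
    "LF to RF": (1.0, 0.0),
    "From LF": (0.71, -0.71),
    "From CF": (0.0, -1.0),
    "From RF": (-0.71, -0.71),
    "RF to LF": (-1.0, 0.0),
}


def average_wind(games):
    """Average (X, Y) wind components in mph.

    games: list of (direction, speed_mph) tuples, one per game.
    Assumption: directions not in the matrix count as zero wind but
    still count as a game in the average.
    """
    if not games:
        return 0.0, 0.0
    x_total = y_total = 0.0
    for direction, speed in games:
        dx, dy = WIND_COMPONENTS.get(direction, (0.0, 0.0))
        x_total += dx * speed
        y_total += dy * speed
    return x_total / len(games), y_total / len(games)


# Usage: one game blowing out to CF at 10 mph, one blowing in from CF at 4 mph.
print(average_wind([("To CF", 10), ("From CF", 4)]))  # (0.0, 3.0)
```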

Now it is time for a few tables that show the data collected.
  • Note: Data on the Washington Nationals is only from 2008, since they moved into a new park that year.
The initial data for each of the major league parks is in the following tables.

Park Factors and Park Characteristics

Team | Park Factor | LF (ft) | LC (ft) | CF (ft) | RC (ft) | RF (ft) | Average Wall Height (ft) | Foul Area (sq ft) | Surface (1 = turf)
Arizona Diamondbacks | 1.0505 | 330 | 376 | 407 | 376 | 335 | 13.33 | 26712 | 0
Atlanta Braves | 0.9934 | 335 | 380 | 400 | 390 | 330 | 8.00 | 23852 | 0
Baltimore Orioles | 0.9946 | 333 | 364 | 410 | 373 | 318 | 13.00 | 23055 | 0
Boston Red Sox | 1.0313 | 310 | 335 | 390 | 380 | 302 | 19.33 | 17859 | 0
Chicago Cubs | 1.0257 | 355 | 368 | 400 | 368 | 353 | 14.33 | 18564 | 0
Chicago White Sox | 1.0285 | 335 | 375 | 400 | 375 | 330 | 8.00 | 25663 | 0
Cincinnati Reds | 1.0150 | 328 | 365 | 404 | 365 | 325 | 9.33 | 23376 | 0
Cleveland Indians | 0.9819 | 325 | 360 | 405 | 375 | 325 | 11.67 | 21664 | 0
Colorado Rockies | 1.1066 | 347 | 390 | 415 | 382 | 350 | 11.00 | 28269 | 0
Detroit Tigers | 0.9838 | 345 | 370 | 420 | 388 | 330 | 10.00 | 30227 | 0
Florida Marlins | 0.9667 | 330 | 360 | 434 | 373 | 345 | 8.00 | 28144 | 0
Houston Astros | 1.0006 | 315 | 335 | 436 | 365 | 326 | 12.33 | 25139 | 0
Kansas City Royals | 1.0033 | 330 | 375 | 410 | 375 | 330 | 8.00 | 23528 | 0
Los Angeles Angels of Anaheim | 0.9774 | 330 | 382 | 400 | 365 | 330 | 11.33 | 22021 | 0
Los Angeles Dodgers | 0.9751 | 330 | 368 | 400 | 368 | 330 | 8.00 | 18276 | 0
Milwaukee Brewers | 1.0031 | 344 | 370 | 400 | 374 | 337 | 12.00 | 23047 | 0
Minnesota Twins | 0.9971 | 343 | 370 | 408 | 352 | 327 | 14.33 | 33578 | 1
New York Mets | 0.9754 | 338 | 371 | 410 | 371 | 338 | 8.00 | 25665 | 0
New York Yankees | 0.9879 | 318 | 399 | 408 | 385 | 314 | 7.67 | 18949 | 0
Oakland Athletics | 0.9836 | 330 | 362 | 400 | 362 | 330 | 10.67 | 40153 | 0
Philadelphia Phillies | 1.0273 | 329 | 355 | 401 | 357 | 330 | 9.53 | 25107 | 0
Pittsburgh Pirates | 0.9913 | 325 | 389 | 399 | 364 | 320 | 16.00 | 22914 | 0
St. Louis Cardinals | 0.9816 | 334 | 378 | 396 | 387 | 322 | 9.00 | 25783 | 0
San Diego Padres | 0.9248 | 339 | 368 | 399 | 378 | 309 | 7.67 | 21473 | 0
San Francisco Giants | 0.9977 | 331 | 375 | 405 | 365 | 326 | 13.67 | 23267 | 0
Seattle Mariners | 0.9632 | 335 | 375 | 400 | 375 | 335 | 8.00 | 26290 | 0
Tampa Bay Rays | 0.9878 | 315 | 370 | 404 | 370 | 322 | 9.50 | 25590 | 1
Texas Rangers | 1.0497 | 330 | 380 | 400 | 380 | 330 | 10.00 | 21494 | 0
Toronto Blue Jays | 1.0220 | 328 | 375 | 400 | 375 | 328 | 10.00 | 30327 | 1
Washington Nationals | 1.0070 | 336 | 377 | 402 | 370 | 335 | 7.00 | 26400 | 0

Natural Factors and Errors

Park | Team | Away Team Errors per Game | Relative Humidity (%) | Wind Strength to CF (mph) | Wind Strength to RF (mph) | Elevation (ft) | Average Temp (°F)
Chase Field | Arizona Diamondbacks | 0.58 | 18.3 | -0.60 | 0.59 | 1087 | 80.6
Turner Field | Atlanta Braves | 0.66 | 56.0 | 0.13 | 1.68 | 928 | 79.8
Oriole Park at Camden Yards | Baltimore Orioles | 0.59 | 52.5 | 0.71 | 0.44 | 30 | 80.6
Fenway Park | Boston Red Sox | 0.62 | 58.2 | 1.85 | 2.16 | 15 | 69.1
Wrigley Field | Chicago Cubs | 0.72 | 58.0 | -1.20 | 0.77 | 600 | 69.6
U.S. Cellular Field | Chicago White Sox | 0.64 | 58.0 | 0.05 | 2.46 | 595 | 68.3
Great American Ball Park | Cincinnati Reds | 0.55 | 55.5 | 0.47 | 0.42 | 542 | 76.2
Progressive Field | Cleveland Indians | 0.63 | 58.2 | -0.87 | 4.23 | 656 | 69.1
Coors Field | Colorado Rockies | 0.74 | 35.2 | -1.40 | 0.20 | 5198 | 73.7
Comerica Park | Detroit Tigers | 0.63 | 54.7 | -0.12 | -0.31 | 600 | 71.9
Dolphin Stadium | Florida Marlins | 0.67 | 61.8 | -5.99 | 1.30 | 5 | 83.2
Minute Maid Park | Houston Astros | 0.45 | 62.3 | 0.39 | -0.46 | 48 | 74.5
Kauffman Stadium | Kansas City Royals | 0.64 | 61.5 | -0.40 | -0.77 | 877 | 76.8
Angel Stadium of Anaheim | Los Angeles Angels of Anaheim | 0.67 | 67.2 | 4.55 | 1.14 | 153 | 75.1
Dodger Stadium | Los Angeles Dodgers | 0.69 | 67.2 | 5.39 | 0.46 | 517 | 73.4
Miller Park | Milwaukee Brewers | 0.66 | 63.0 | -0.78 | 0.35 | 598 | 72.0
Hubert H. Humphrey Metrodome | Minnesota Twins | 0.64 | 57.5 | 0.00 | 0.00 | 992 | 69.5
Shea Stadium | New York Mets | 0.65 | 54.8 | 0.59 | -0.68 | 12 | 72.4
Yankee Stadium | New York Yankees | 0.70 | 54.8 | 1.84 | -0.12 | 17 | 72.2
Oakland-Alameda County Coliseum | Oakland Athletics | 0.56 | 59.8 | 3.47 | 7.73 | 7 | 64.3
Citizens Bank Park | Philadelphia Phillies | 0.71 | 53.7 | 1.49 | 3.38 | 19 | 74.8
PNC Park | Pittsburgh Pirates | 0.64 | 53.7 | 1.65 | -1.30 | 726 | 73.0
Busch Stadium | St. Louis Cardinals | 0.57 | 54.2 | 2.46 | 0.39 | 453 | 76.9
PETCO Park | San Diego Padres | 0.44 | 65.3 | 1.89 | 8.23 | 21 | 69.4
AT&T Park | San Francisco Giants | 0.66 | 59.8 | 8.63 | 2.85 | 10 | 64.4
Safeco Field | Seattle Mariners | 0.73 | 53.5 | 0.42 | 1.21 | 17 | 64.4
Tropicana Field | Tampa Bay Rays | 0.65 | 59.2 | 0.00 | 0.00 | 38 | 72.0
Rangers Ballpark in Arlington | Texas Rangers | 0.63 | 56.5 | -4.65 | 4.52 | 546 | 81.2
Rogers Centre | Toronto Blue Jays | 0.56 | 57.3 | 0.17 | 2.07 | 290 | 71.4
Nationals Park | Washington Nationals | 0.58 | 54.2 | 5.68 | -5.68 | 17 | 75.9

I ran a multiple regression on the preceding data to get an equation for Park Factor.

The initial regression equation had an R-squared of 0.714, and the standard deviation of the difference between the actual and projected Park Factors was 0.0178.

There were two problems with that initial equation:

1. The coefficient for wind blowing out to CF was negative, meaning the harder the wind blew out, the less scoring there would be. That defies all logic, so I threw out both wind components for the next round of analysis.
2. The coefficient for Wall Height was positive, meaning the higher the wall, the more runs are scored. A higher wall should turn some home runs into doubles, and home runs score more runs than doubles, so I decided to remove Wall Height as well.

After rerunning the regression without Wall Height and Wind, I got the following equation, with a standard deviation of 0.0184 and an R-squared of 0.692:

Park Factor = Away Team Errors per Game * (0.016) + % Relative Humidity * (-0.0012) + Foul Area * (-0.00000061) + Elevation * (0.000021) + Average Temperature * (0.00077) + Left Field * (-0.0010) + Left Center Field * (-0.00063) + Center Field * (-0.0010) + Right Center Field * (-0.00020) + Right Field * (0.0011) + 0.0090 (if Surface is turf) + 1.7056
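Here is a minimal sketch of applying the final equation to one park. The coefficients are the rounded ones printed above, and the dictionary key names are hypothetical labels chosen for this example. Because small rounding in each coefficient is multiplied by values in the hundreds (dimensions) or tens of thousands (foul area), projections from this sketch can differ from the projected values in the tables that follow by a few hundredths.

```python
# Rounded coefficients from the final equation; key names are hypothetical.
COEFFICIENTS = {
    "away_errors_per_game": 0.016,
    "relative_humidity_pct": -0.0012,
    "foul_area_sqft": -0.00000061,
    "elevation_ft": 0.000021,
    "avg_temp_f": 0.00077,
    "lf_ft": -0.0010,
    "lc_ft": -0.00063,
    "cf_ft": -0.0010,
    "rc_ft": -0.00020,
    "rf_ft": 0.0011,
}
TURF_TERM = 0.0090  # added only when the surface is turf
CONSTANT = 1.7056


def projected_park_factor(park: dict, turf: bool = False) -> float:
    """Sum each coefficient times the park's value, plus turf term and constant."""
    total = CONSTANT + (TURF_TERM if turf else 0.0)
    for name, coef in COEFFICIENTS.items():
        total += coef * park[name]
    return total


# Usage: Chase Field inputs copied from the two data tables.
chase_field = {
    "away_errors_per_game": 0.58,
    "relative_humidity_pct": 18.3,
    "foul_area_sqft": 26712,
    "elevation_ft": 1087,
    "avg_temp_f": 80.6,
    "lf_ft": 330, "lc_ft": 376, "cf_ft": 407, "rc_ft": 376, "rf_ft": 335,
}
print(round(projected_park_factor(chase_field), 4))
```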

Here is a simple chart comparing how much each factor affects the Park Factor and the run-scoring environment.

Table 3. Effect of each factor on Park Factor and runs scored (9.54 runs per game was the average scored by both teams combined over the past 3 years)

Factor | Change in Park Factor | Change in Runs Scored per Game (at 9.54 runs per game)
10 degree F increase in temperature | 0.008 | 0.07
10 percentage point increase in relative humidity | -0.012 | -0.12
10,000 sq ft increase in foul area | -0.006 | -0.06
Turf playing surface | 0.009 | 0.09
1,000 ft increase in elevation | 0.021 | 0.20
1 additional error per game by the away team | 0.016 | 0.15
10 ft increase in LF | -0.010 | -0.10
10 ft increase in LC | -0.006 | -0.06
10 ft increase in CF | -0.010 | -0.10
10 ft increase in RC | -0.002 | -0.02
10 ft increase in RF | 0.011 | 0.10
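Each row of Table 3 comes from multiplying a coefficient by the size of the change and then scaling by the 9.54 runs per game environment. A small sketch of that arithmetic follows; the function name is hypothetical, and because the printed coefficients are rounded, a row or two may come out a hundredth off from the table.

```python
RUNS_PER_GAME = 9.54  # both teams, 2006-2008 average, from the text


def factor_effect(coefficient: float, change: float,
                  runs_per_game: float = RUNS_PER_GAME) -> tuple:
    """Return (change in Park Factor, change in runs per game).

    A Park Factor scales the run environment, so a change of d in the
    Park Factor is roughly d * runs_per_game runs per game.
    """
    delta_pf = coefficient * change
    return delta_pf, delta_pf * runs_per_game


# Usage: the elevation row (0.000021 per ft, 1,000 ft increase).
pf, runs = factor_effect(0.000021, 1000)
print(f"{pf:.3f} PF, {runs:.2f} runs per game")  # 0.021 PF, 0.20 runs per game
```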

As can be seen, each factor can significantly affect runs scored. The following tables compare the original and projected Park Factors for the ballparks. I have also added a column that adds the combined stadium attributes (dimensions, foul area and surface type) to the equation's constant, to help show which stadium designs lead to more runs.

The regression equation predicts some stadiums' run production quite well. Here are the stadiums where the regression predicted the Park Factor within 0.01:

Team | Park | Original Value | Dimensions plus Constant | Projected Value | Difference (Original - Projected)
Los Angeles Angels of Anaheim | Angel Stadium of Anaheim | 0.9774 | 0.9960 | 0.9860 | -0.0086
Minnesota Twins | Hubert H. Humphrey Metrodome | 0.9971 | 0.9839 | 0.9981 | -0.0011
Oakland Athletics | Oakland-Alameda County Coliseum | 0.9836 | 0.9982 | 0.9840 | -0.0004
Colorado Rockies | Coors Field | 1.1066 | 0.9729 | 1.1056 | 0.0010
Kansas City Royals | Kauffman Stadium | 1.0033 | 0.9874 | 1.0001 | 0.0033
Cincinnati Reds | Great American Ball Park | 1.0150 | 0.9986 | 1.0096 | 0.0054
Milwaukee Brewers | Miller Park | 1.0031 | 0.9947 | 0.9963 | 0.0068
Baltimore Orioles | Oriole Park at Camden Yards | 0.9946 | 0.9794 | 0.9875 | 0.0070


I grouped the parks whose difference exceeded the standard deviation of 0.0184. These are the stadiums where the factors I am using can't explain the runs scored.

Team | Park | Original Value | Dimensions plus Constant | Projected Value | Difference (Original - Projected)
Seattle Mariners | Safeco Field | 0.9632 | 0.9962 | 0.9925 | -0.0293
San Diego Padres | PETCO Park | 0.9248 | 0.9724 | 0.9539 | -0.0291
Los Angeles Dodgers | Dodger Stadium | 0.9751 | 1.0065 | 1.0030 | -0.0278
Cleveland Indians | Progressive Field | 0.9819 | 1.0028 | 1.0087 | -0.0268
Toronto Blue Jays | Rogers Centre | 1.0220 | 1.0022 | 1.0023 | 0.0197
San Francisco Giants | AT&T Park | 0.9977 | 0.9894 | 0.9769 | 0.0208
Chicago White Sox | U.S. Cellular Field | 1.0285 | 0.9912 | 0.9958 | 0.0328
Texas Rangers | Rangers Ballpark in Arlington | 1.0497 | 0.9946 | 1.0096 | 0.0401
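The two groupings above follow a simple rule: take the difference between the original and projected Park Factor, then compare its size against 0.01 and against the 0.0184 standard deviation. A sketch of that rule is below; the function and group labels are hypothetical, and the sample values are copied from the tables.

```python
STANDARD_DEVIATION = 0.0184  # final equation's SD, from the text
CLOSE = 0.01                 # threshold for the "predicted well" table


def classify(original: float, projected: float) -> tuple:
    """Return (difference, group), where difference = original - projected."""
    diff = original - projected
    if abs(diff) < CLOSE:
        group = "predicted within 0.01"
    elif abs(diff) > STANDARD_DEVIATION:
        group = "not explained by the factors"
    else:
        group = "between 0.01 and one SD"
    return diff, group


for team, original, projected in [
    ("Los Angeles Angels of Anaheim", 0.9774, 0.9860),
    ("Seattle Mariners", 0.9632, 0.9925),
    ("Texas Rangers", 1.0497, 1.0096),
]:
    diff, group = classify(original, projected)
    print(f"{team}: {diff:+.4f} ({group})")
```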

Using the preceding data, we can analyze future parks. I will pick the Mets' new stadium, Citi Field. Most of the natural effects will be the same as at Shea and the error rate isn't known yet, but we can look at the dimensions and foul area to draw some conclusions.

Feature | Shea Stadium | Contribution to PF | Citi Field | Contribution to PF | Difference (Citi - Shea)
LF (ft) | 338 | -0.34 | 335 | -0.33 | 0.01
LC (ft) | 371 | -0.24 | 379 | -0.24 | 0
CF (ft) | 410 | -0.41 | 408 | -0.41 | 0
RC (ft) | 371 | -0.07 | 383 | -0.08 | -0.01
RF (ft) | 338 | 0.36 | 330 | 0.35 | -0.01
Foul Area (sq ft) | 25665 | -0.02 | 20900 | -0.01 | 0

Total = -0.01
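Since the natural factors and errors are assumed to be the same at both parks, they cancel, and only the dimension and foul-area terms matter. The sketch below computes that difference directly from the rounded coefficients in the final equation. The function name is hypothetical, and because the coefficients are rounded, multiplying its result by 9.54 will not exactly match the per-game estimate quoted in the text.

```python
# Rounded coefficients from the final equation (dimensions and foul area only).
COEFS = {"LF": -0.0010, "LC": -0.00063, "CF": -0.0010,
         "RC": -0.00020, "RF": 0.0011, "Foul Area": -0.00000061}

SHEA = {"LF": 338, "LC": 371, "CF": 410, "RC": 371, "RF": 338, "Foul Area": 25665}
CITI = {"LF": 335, "LC": 379, "CF": 408, "RC": 383, "RF": 330, "Foul Area": 20900}


def park_factor_difference(new: dict, old: dict) -> float:
    """Sum of coefficient * (new - old) over the features that differ.

    Natural factors and errors are assumed identical, so they cancel out.
    """
    return sum(coef * (new[k] - old[k]) for k, coef in COEFS.items())


diff = park_factor_difference(CITI, SHEA)
print(f"Citi - Shea: {diff:+.4f}")
```

The result rounds to the table's total of -0.01.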

The new Mets stadium looks likely to allow fewer runs per game than Shea. In the 9.54 runs per game environment, it would allow 0.12 fewer runs per game, or about 10 fewer runs over 81 home games.

I have had a lot of help putting this study together, with special thanks to Mitchel Lichtman and Patriot for providing data and to Sky Kalkman for his many suggestions. I hope the data gives people more insight into the variables that go into a stadium and how much each one affects the run-scoring environment.


Extra information for those that want to do their own regression analysis.

You can run your own regression with LINEST() in OpenOffice using the data I have collected. The spreadsheet is available for download, and if you insert your own park factors, it will calculate the values for you.

Note: LINEST() returns the coefficients in the reverse of the order in which the columns appear in the table, and the third value down in its leftmost output column is the R-squared value.

Instructions:

1. Open the spreadsheet in OpenOffice Calc. I use OpenOffice because it's free for everyone and creates the variables simply.
2. Insert your values for the various teams into Tables 1, 2, 3 and 4 under the Park Factor columns. The LINEST() cells will automatically update using those numbers.
3. Copy all the LINEST() values and paste them into the area after Table 2. The upper left-hand corner of the copied data should be pasted onto the cell that has a border. See the following image.



4. All the values in Table 2 will then update automatically.
5. Do the same with Tables 3 and 4, which don't include the Wall Height and Wind factors.
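For readers without OpenOffice, the core of what LINEST() does (an ordinary least-squares fit with an intercept, plus R-squared) can be sketched in plain Python with no libraries. This is an illustrative stand-in, not the spreadsheet's method: it solves the normal equations by Gaussian elimination, which is less numerically robust than a dedicated statistics routine, so for the real park data (columns ranging from fractions to tens of thousands) a library fit would be the safer choice. It also omits the standard errors and other statistics LINEST() reports. The check data below is made up, generated from y = 2 + 3*x1 - x2 so the fit should be exact.

```python
def least_squares_fit(rows, y):
    """Fit y on the columns of rows plus an intercept.

    Returns (coefficients, intercept, r_squared). Coefficients are in the
    same order as the columns; LINEST() lists them in reverse order.
    """
    n, p = len(rows), len(rows[0]) + 1
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 = intercept
    # Normal equations: (X^T X) b = X^T y.
    A = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)] for a in range(p)]
    rhs = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    b = [0.0] * p
    for r in reversed(range(p)):
        b[r] = (rhs[r] - sum(A[r][c] * b[c] for c in range(r + 1, p))) / A[r][r]
    # R-squared = 1 - SS_residual / SS_total.
    fitted = [sum(X[i][c] * b[c] for c in range(p)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b[1:], b[0], 1.0 - ss_res / ss_tot


# Made-up check data: y = 2 + 3*x1 - x2 exactly.
rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 7]]
y = [3, 7, 6, 11, 10]
coefs, intercept, r2 = least_squares_fit(rows, y)
# Reorder to match LINEST(): last column's coefficient first, intercept last.
print("LINEST order:", [round(c, 6) for c in reversed(coefs)] + [round(intercept, 6)])
print("R-squared:", round(r2, 6))
```

To use it on the park data, each row would hold one park's inputs (in the same column order as the tables) and y would hold the Park Factors.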
