Simple Arithmetic

November 29, 2009


I caught the end of tonight’s Notre Dame-Stanford game. After Gerhart throws a TD pass on fourth-and-4 to tie the game at 38, Notre Dame goes three-and-out, giving Stanford the ball back at its own 29-yard line. During the ensuing four-minute-and-change march down the field, I keep thinking that there’s no way Notre Dame is going to stop Stanford from scoring on this drive, so they should just let Stanford run it in and give themselves as much time as possible to score again. However, with a few minutes to go and maybe 30 or so yards separating Stanford from the end zone, I realize that no team in this day and age would give up on its defense this early; it would probably hope that its defense could limit the opposition to a field goal.

That all changes when Toby Gerhart gets tackled inside the 5-yard line, earning the first down with the score knotted at 38 and 59 seconds to go. Stanford uses its first timeout, and Notre Dame has one timeout left. When play resumes, Gerhart gets the ball, and he walks into the end zone untouched. Which makes plenty of sense.

What doesn’t make sense is the following thirty seconds of commentary. I don’t watch college football that often, so I don’t know any of the commentators, but when you hear someone say something to the effect of “Charlie Weis was probably having a discussion about whether this was the right decision, and he won that battle,” you have to wonder why these sportscasters are there to begin with.

It’s a matter of simple arithmetic: with 0:59 to go, Notre Dame needs to stop Stanford THREE times to get Stanford to THINK about a field goal. Stanford will run the ball each time, and, even if they are unsuccessful in scoring, the clock will continue to run for FORTY SECONDS after each play unless either team calls time out…

…of which Notre Dame has precisely ONE.
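
To put rough numbers on that, here is a minimal Python sketch. The 5 seconds per running play is an assumption for illustration, as is the simplification that a full 40 seconds bleeds off between plays whenever the defense has no timeout to spend; the function name is hypothetical.

```python
def clock_after_third_down(clock=59, defensive_timeouts=1,
                           play_seconds=5, runoff=40):
    """Seconds left once the offense has run three plays without scoring.

    Assumptions (not from the broadcast): every run takes `play_seconds`,
    and between plays the clock loses `runoff` seconds unless the defense
    spends a timeout to stop it.
    """
    for down in (1, 2, 3):
        clock -= play_seconds                 # the running play itself
        if down < 3:                          # dead time before the next snap
            if defensive_timeouts > 0:
                defensive_timeouts -= 1       # defense stops the clock
            else:
                clock -= runoff               # clock keeps running
        clock = max(clock, 0)
    return clock


print(clock_after_third_down())                       # one timeout: 4
print(clock_after_third_down(defensive_timeouts=3))   # three timeouts: 44
```

Under these assumptions, a defense with one timeout is left with about 4 seconds by the time a field goal is even on the table, versus roughly 44 seconds with all three timeouts in hand.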

There’s no discussion or argument. I can sort of understand people having problems wrapping their heads around the algebra necessary to understand the Belichick decision two weeks ago, but this is just silly. I just can’t see why commentators don’t note these things during fairly big college football games, but I’m obviously not the only one who’s been railing against sports broadcasters.


Reflecting on One Blogger’s Predictions

November 5, 2009

So, two out of three ain’t bad, especially when you get the last one right. Good times. Not sure Matsui was a good pick for MVP, but Damon getting hurt and Matsui winning MVP leaves open the possibility for heated debate in the offseason as to who to keep.


One Blogger’s World Series Predictions

October 28, 2009

This doesn’t represent the views of my fellow blogger, but since we haven’t had much of a chance to blog recently, I thought I’d share my varied predictions about the World Series:

1) Yankees will go with a three-man rotation. Despite what you hear about Gaudin being stretched out, I think you have to go with C.C. three times in this series rather than throw Gaudin against this line-up and then hope to get Sabathia out of the ‘pen on two days’ rest in a potential Game 7, and I think Yankees management is very willing to rest its Series hopes on its ace.

2) Posada catches Burnett at least once in this series. They’ll do it at least the second time around, possibly in Game 2 if the Yankees lose Game 1. This seems rather likely now, given the rumors that Bruney’s been added to the roster for Cervelli, but I just don’t think that the Yankees are willing to play in the Phillies’ park with a line-up that has neither Matsui nor Posada (and has Pettitte and Molina batting instead).

3) Yankees in 6.

Hopefully, in the off-season, we’ll have a chance to blog some more and look at some interesting things, but we’ll get there when we get there.


2009 MLB Playoffs: Do I Feel Lucky?

October 8, 2009

It’s that time of year again! Baseball Playoffs!

I know what you’re thinking — okay, fine, I only got four of the eight playoff teams in my preseason predictions — this kid sucks at predictions. Blame the World Baseball Classic. But this year, we’re in luck. Break out your baseball gear, you bandwagoners! For those unfamiliar with the spectacle, not only will we see the sudden emergence of all things Red Sox, but with four other major-market teams (and their major payrolls), we also get to see the New York, Philadelphia and Los Angeles pockets magically find their colors.

Apparently, you can buy playoff appearances. So buckle up — we’re in for some obnoxious big-city fun this October. But before we begin, repeat after me: The playoffs are a crapshoot. Anything can happen in one series. The regular season is about being the best team. The playoffs are about being the luckiest team. So you could throw me season split stats, career postseason numbers, or some other obscure stat, but this sabermagician knows better.
The random variation in a five- or seven-game series is far too great to try to back up your playoff predictions. So sorry, A-Rod haters or Manny fanatics, I don’t want to hear the word “clutch.” They’re great players, but the results of those crucial postseason at-bats come down to luck. It’s like any regular season series, where someone is on a hot streak (the playoff hero) or a cold streak (the playoff goat). Luck. And timing. Did I mention luck?

There’s been a lot of work done on the probability of favorites and underdogs advancing in the playoffs. While there are issues such as home-field advantage and starting starters on short rest, the goal is to show how likely it is for an upset to occur. In general, take two teams, A and B, and put them in a five-game series, first to three. Team A always has a 60 percent chance of winning a game — thus, B always has a 40 percent chance of winning the game. If you do the math, it turns out that team B has a 31.7 percent chance of taking the series. Those are pretty good odds for the underdog to pull off an upset.
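
The 31.7 percent figure can be checked with a few lines of Python. This is a sketch under the same simplifying assumptions as the paragraph above: every game is independent and the per-game probability never changes, so home field and rotation effects are ignored. The function name is made up for this example.

```python
from math import comb


def series_win_probability(p_game, wins_needed=3):
    """Chance of winning a first-to-`wins_needed` series when each game is
    won independently with probability `p_game`."""
    q = 1.0 - p_game
    # The series ends on the clinching win, after exactly k losses (k < wins_needed).
    return sum(comb(wins_needed - 1 + k, k) * p_game ** wins_needed * q ** k
               for k in range(wins_needed))


underdog = series_win_probability(0.40, wins_needed=3)
print(f"{underdog:.3f}")  # 0.317
```

Passing `wins_needed=4` gives the seven-game version, and moving `p_game` toward 0.5 pushes the underdog's chances up from there.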

Of course, in real life, teams don’t usually follow the 60/40 split — that was for dramatic effect. It’s usually a lot closer to 50 percent, which means the chances for upsets are higher.
Having said that, let’s make predictions anyway. Half the fun is guessing, the other half is watching.

Red Sox vs. Angels: Angels in 5. That .300, switch-hitting, walk-taking, versatile offense will be too much. More simply: I’ll take .350 OBP 1-9 over El Capitan — Jason Varitek — and Alex Gonzalez any day. Just pray Brian Fuentes learns how to pitch.

Twins vs. Yankees: Yankees in 3. So the pitching’s shaky. But that offense is insane, and that sandbox park helps too.

Cardinals vs. Dodgers: Cardinals in 4. For the NL, Chris Carpenter, Adam Wainwright, Albert Pujols and Matt Holliday are the names of the Four Horsemen.

Rockies vs. Phillies: Phillies in 4. This is less the Phanatics than the Rockies — they’re just worse everywhere, and starting on the road doesn’t help either.

ALCS — Angels vs. Yankees: Yankees in 5. I really don’t want to see the Angels’ flyball pitchers in the Bronx. That shiny stadium will go well with that shiny pennant.

NLCS — Cardinals vs. Phillies: Cardinals in 6. Chase Utley, Ryan Howard and Raul Ibanez are just begging to be Tony LaRussa’d. Your pick: Joel Pineiro vs. Joe Blanton. And no, I’m not bitter over last year, I swear.

World Series — Cardinals vs. Yankees: Yankees in 6. You know that miracle nobody-team that struggled through adversity and overcame ridiculous odds to win? This isn’t them. How good are the Yanks? Scary Good. Scary “what happens when they buy more players next year?” good.

Top Ten of Tennis

August 23, 2009

A non-baseball related post, but a point worth making all the same, I feel: on this Yahoo! Sports post, a blogger argues against a British newspaper’s claim that the current top 10 is the greatest top 10 ever (maybe just because they’re just so damn pissed that Murray can’t win Wimbledon). He goes on to argue on behalf of certain top 10s in 1987 and 1993, saying there certainly existed better top 10s at some point.

However, his way of comparing top 10s is by comparing the number of Grand Slams won by each member in the top 10. This is clearly not a relevant point; assume, for instance, that Andy Roddick won every Grand Slam over the past 10 years (a fantasy, I know) and led the current top 10, and compare that to a top 10 where every player won 4 Grand Slams in total over the previous ten years. The number of grand slams won by each player in the top 10 is completely unrelated to the level of play by each player and is more a testament to the competitiveness of play during the relevant time period.

The author does mention that the dominance of Federer and Nadal might have something to do with this, but then why bother even making this comparison? It’s not like I just disagree with his claim (in fact, I think he’s right), but this is obviously not a sensible way of comparing players across eras.


Random is as Random Does–A New Model to Predict BABIP

August 21, 2009

Batting average on balls in play (BABIP), or how often a batter is awarded a hit for a batted ball (excluding home runs and fouls), has been receiving a great deal of attention recently. The reason is the widely held belief that once a batter puts a ball in play, whether the ball is lined right at an infielder or a foot to his left (where it cannot be fielded) is mere chance. As a result, a player whose batted balls happen to avoid fielders more often than those of other players will have a higher BABIP, but this is often due more to luck than skill, and this luck-dependent statistic can have dramatic effects on a player’s more traditional statistics.
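
For reference, the usual way to compute BABIP from a box-score line is (H - HR) / (AB - K - HR + SF). The sketch below uses that standard formula; the season line fed into it is hypothetical and only there to show the arithmetic.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical season line, for illustration only.
print(round(babip(hits=150, home_runs=20, at_bats=500, strikeouts=100, sac_flies=5), 3))  # 0.338
```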

For example, let’s look at New York Yankee second baseman Robinson Cano’s triple-slash lines over the past three years, with his BABIP for each year in parentheses:


2007: 0.306/0.353/0.488 (0.329)

2008: 0.271/0.305/0.410 (0.283)

2009: 0.318/0.351/0.509 (0.317)

While it isn’t clear which BABIP is most representative of Cano’s true BABIP, it is obvious that fluctuations in BABIP can have rather large effects on more traditional statistics. This assumes, of course, that Cano did not suddenly become a worse player in 2008 and then get better again in 2009, which would be the other explanation for the 2008 drop-off in production.

Examining a player’s swings in performance in certain years and looking at the BABIPs during those years leads to similar results. It seems obvious that BABIP is rather fickle and bounces around quite a bit, and it’s usually pretty difficult to predict a randomly moving target. The fluctuations of BABIPs appear rather random, but I wondered–is it possible that these random fluctuations, when looked at over the entire population of major league players, are similarly random each year? If so, what sort of implications could that have for predicting BABIP?

Well, as it turns out, there is a name for processes that exhibit similar statistical properties over time. A process is said to be stationary if all aspects of its behavior are unchanged by shifts in time. In particular, there are weakly stationary processes, whose mean, variance, and covariance (but not necessarily higher moments such as skewness and kurtosis) are unchanged by time shifts. Real-life examples of stationary processes include the changes in stock prices, but not stock prices themselves. A quick refresher: the mean of a process or distribution is a fancy way of saying its average, while the variance of a distribution is a way to quantify how “spread out” the values are away from the average value.
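
The stock-price remark can be illustrated with a small simulation. This is only a sketch: the "changes" are simulated as independent standard normal draws (an assumption, not real market data) and the "price" is their running sum. Looking across many simulated paths at each time step, the variance of the changes stays flat while the variance of the prices keeps growing.

```python
import random
from statistics import pvariance

random.seed(0)
N_PATHS, N_STEPS = 2000, 50

# Each path is a sequence of i.i.d. "changes"; the "price" is their running sum.
changes = [[random.gauss(0.0, 1.0) for _ in range(N_STEPS)] for _ in range(N_PATHS)]
prices = []
for path in changes:
    level, levels = 0.0, []
    for step in path:
        level += step
        levels.append(level)
    prices.append(levels)


def variance_at(paths, t):
    """Variance across all simulated paths at time index t."""
    return pvariance([path[t] for path in paths])


change_var = [variance_at(changes, t) for t in range(N_STEPS)]
price_var = [variance_at(prices, t) for t in range(N_STEPS)]

print("changes:", round(min(change_var), 2), "to", round(max(change_var), 2))
print("prices: ", round(price_var[4], 2), "at step 5 vs", round(price_var[-1], 2), "at step 50")
```

This is the same cross-sectional check applied to BABIP later in the post: compute the mean and variance over the whole population at each point in time and see whether they drift.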

I first sought to determine whether BABIP was a stationary process. To do this, my plan was to take all player seasons in which the player was of a certain age, then examine the BABIPs for those same players as they aged. To test for stationarity, I would look at the statistical distribution of BABIPs for these players at each age; specifically, I would look at the mean of the BABIPs, as well as their variance, and hope to find that they remained constant over time. The reason for including a later season only if the player also played at the starting age is to keep the pool of players under consideration the same from year to year. This ensures that any changes in BABIP are due to the players aging, not to different players being considered.

I used the standard and advanced batting statistics from 1984 to 2008. I only included hitters who played at least 40 games in a given year and averaged at least 3.1 plate appearances per game. The 40 was somewhat arbitrarily chosen, and, while I’m not sure how many games I should have used, I don’t think the particular number changes the results of the study. After creating a list of player-seasons that met these qualifications for each year between 1984 and 2008, I searched for all players who played at a certain age, then collected the BABIPs for these players for each year following that year.
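
The selection and summary steps can be sketched as follows. The records, the field layout, and the function names are all hypothetical, and the use of the population variance (dividing by n rather than n - 1) is an assumption, since the post does not say which was used.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical records: (player, age, games, plate_appearances, babip).
seasons = [
    ("Player A", 27, 150, 620, 0.310), ("Player A", 28, 148, 600, 0.295),
    ("Player B", 27, 120, 480, 0.280), ("Player B", 28, 35, 140, 0.330),
    ("Player C", 27, 90, 200, 0.350), ("Player C", 28, 140, 560, 0.300),
    ("Player D", 27, 155, 650, 0.290), ("Player D", 28, 150, 640, 0.305),
]


def qualifies(games, plate_appearances, min_games=40, min_pa_per_game=3.1):
    """Playing-time filter: enough games and enough plate appearances per game."""
    return games >= min_games and plate_appearances / games >= min_pa_per_game


def cohort_summary(records, start_age):
    """Mean and variance of BABIP at each age for players who qualified at start_age."""
    qualified = [r for r in records if qualifies(r[2], r[3])]
    cohort = {player for player, age, *_ in qualified if age == start_age}
    by_age = defaultdict(list)
    for player, age, _, _, value in qualified:
        if player in cohort and age >= start_age:
            by_age[age].append(value)
    return {age: (mean(vals), pvariance(vals)) for age, vals in sorted(by_age.items())}


for age, (avg, var) in cohort_summary(seasons, start_age=27).items():
    print(age, round(avg, 4), round(var, 6))
```

In this toy data, Player C misses the age-27 cohort on plate appearances per game, and Player B's age-28 season is dropped for too few games, so the pool shrinks with age exactly as it would in the real data.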

An example of this is the following: below are the BABIPs for players who played a year at age 19 (the table is cut off at age 26 due to spacing issues):


[Table: BABIPs at ages 19 through 26 for *Ken Griffey, Jr., Ivan Rodriguez, Alex Rodriguez, Adrian Beltre, B.J. Upton, and Justin Upton; the cell values are missing from this copy.]

Next, I computed the variance at each age, first by calculating the squared difference between a player’s BABIP and the mean BABIP for all players at that age:


[Table: squared differences from the mean BABIP at ages 19 through 26; the cell values are missing from this copy.]


The time series of the mean and variance of BABIPs can be plotted:



Note that the mean and variance are quite noisy. The mean fluctuates between 0.282 and 0.325, while the standard deviation (the square root of the variance) moves from nearly 0 to 0.034. The noisiness might have something to do with the fact that players at these ages may still be maturing physically and getting faster, but it’s more likely due to the small number of seasons available. I chose age 19 for this example primarily to show how the calculations were carried out, and the numbers here don’t seem to suggest that BABIP is a stationary process. Now let’s look at ages for which we have more data.

There were 945 seasons by players 27 years of age between 1984 and 2008. A graph of the average of their future BABIPs between ages 27 and 36 can be seen as the blue line below (I stopped at 36 since there were only 100 player-seasons at age 36):



As you can see, the mean and variance of BABIP are fairly stable. The mean here never drops below 0.292 and never goes above 0.299, while the variance stays between 0.00125 and 0.00150. Fitting a linear trend to the variance graph gives a slope of -0.000001, suggesting that the variance is largely constant over time. A closer look at the actual distribution of BABIPs can be found in Appendix A, which follows the main post.
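
A trend slope like the one quoted above comes from an ordinary least-squares line through the (age, variance) points. The sketch below shows the calculation; the variance values are placeholders picked inside the 0.00125 to 0.00150 band, not the actual series, so the printed slope will not match the post's figure.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


ages = list(range(27, 37))  # ages 27 through 36
# Placeholder variances, NOT the real data.
variances = [0.00138, 0.00131, 0.00142, 0.00127, 0.00149,
             0.00135, 0.00140, 0.00129, 0.00145, 0.00133]
print(f"{ols_slope(ages, variances):.7f}")
```

A slope that is tiny relative to the level of the variance itself is what "largely constant" means here.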

Examining the results for other ages shows similar patterns, which suggests that BABIP is stationary. By accepting this result, we’re saying that all baseball players who played at a certain age see their BABIPs vary similarly around the same average BABIP as they grow older. While some may argue that players are more likely to see their BABIPs begin to drop as they get older, it is possible that this is offset by a player’s increased experience or willingness to train harder in order to stay in the game at an older age. As the Red Queen said in Lewis Carroll’s Through the Looking-Glass, “it takes all the running you can do, to keep in the same place.”

Since our process appears to be stationary, we can now fit the data to certain parsimonious models; here, parsimonious refers to models without excess variables. One type of model that we can now use is called an autoregressive (AR) model. An autoregressive process of order p is a process whose next observation is modeled as a weighted average of its previous p observations plus some error. In other words, autoregressive processes follow a regression model where Yt is the “dependent” or “response” variable and past values Yi, where 0 < i < t, are the “independent” or “predictor” variables.

The simplest autoregressive process is the AR(1) process, which states that under certain restrictions:

Yt – u = a * (Yt-1 – u) + et

for all t, where u is the mean of the process, |a| < 1, and et is normally distributed with mean zero and variance s^2 (also called white noise). The AR stands for autoregressive, while the 1 denotes that only the Yt-1 term is used as a predictor of Yt. A possible interpretation of the term a * (Yt-1 – u) is that it represents “memory” of the past carried into the current value of the process. Other models, such as moving average (MA) models and autoregressive moving average (ARMA) models, can be used to fit the data as well, and some information about them is detailed in Appendix B.
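
To make the recursion concrete, here is a small simulation of an AR(1) process with the same constant u subtracted on both sides (an assumption of this sketch). The parameter values are hypothetical, chosen only so that the output looks BABIP-like; they are not fitted to any data.

```python
import random


def ar1_next(y_prev, u, a, e_t=0.0):
    """One step of Y_t - u = a * (Y_{t-1} - u) + e_t."""
    return u + a * (y_prev - u) + e_t


def simulate_ar1(n, u, a, s, y0=None, seed=0):
    """Simulate n steps with Gaussian white noise of standard deviation s."""
    rng = random.Random(seed)
    y = u if y0 is None else y0
    path = []
    for _ in range(n):
        y = ar1_next(y, u, a, rng.gauss(0.0, s))
        path.append(y)
    return path


# Hypothetical parameters: long-run mean .300, half of last year's deviation carried over.
path = simulate_ar1(n=20000, u=0.300, a=0.5, s=0.03)
print(round(sum(path) / len(path), 3))   # long-run average, close to u
print(ar1_next(0.320, u=0.300, a=0.5))   # noise-free forecast after a .320 season
```

With a = 0.5 and no noise, a season 20 points above the mean is forecast to be followed by one about 10 points above it, which is the "memory" interpretation in numbers.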

If you remember from above, the one thing we haven’t tested for in order to be able to use an AR model is covariance. One way to check whether we can use an AR model is to look at the autocorrelation function (ACF) of the data, which gives the correlation between values of the process separated by a given lag. So long as the autocorrelation function decreases as the lag increases and is not significantly different from zero at a lag greater than the order of our model, we can use an AR model.
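
A sample autocorrelation function is short enough to write out by hand. The sketch below uses the common estimator that divides the lagged cross-products by the total sum of squares; the input series is hypothetical, and the 1.96 / sqrt(n) band is only the usual rough yardstick for "significantly different from zero" under white noise.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return num / denom


# Hypothetical sequence of yearly values, for illustration only.
values = [0.301, 0.322, 0.295, 0.310, 0.288, 0.305, 0.317, 0.292, 0.299, 0.308]
band = 1.96 / len(values) ** 0.5   # rough white-noise threshold
for lag in range(4):
    print(lag, round(autocorrelation(values, lag), 3), "band:", round(band, 3))
```

With only ten points the band is very wide, which is one reason short individual histories say little on their own.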

The autocorrelation functions are not shown here, but an example of using AR models can be seen by fitting AR(2) models to the BABIP data for player-seasons at ages for which there is sufficient data. To use one of these models to predict 2009 BABIPs, you would take players who were 27 in 2009, take their BABIPs at ages 25 and 26, and plug them into the age-27 model to get their 2009 BABIPs. In short, each model applies to players of a certain age and is based on their BABIPs from the two previous years.

Using MATLAB gives us the following AR(2) models:

Age 21: Yt = 1 – 1.139 * (1 – Yt-1) + 0.138 * (1-Yt-2)

Age 22: Yt = 1 – 0.7559 * (1 – Yt-1) – 0.2435 * (1-Yt-2)

Age 23: Yt = 1 – 0.5237 * (1 – Yt-1) – 0.4766 * (1-Yt-2)

Age 24: Yt = 1 – 0.3756 * (1 – Yt-1) – 0.6241 * (1-Yt-2)

Age 25: Yt = 1 – 0.2386 * (1 – Yt-1) – 0.7612 * (1-Yt-2)

Age 26: Yt = 1 – 0.3228 * (1 – Yt-1) – 0.6775 * (1-Yt-2)

Age 27: Yt = 1 – 0.8248 * (1 – Yt-1) – 0.1756 * (1-Yt-2)

Age 28: Yt = 1 – 0.672 * (1 – Yt-1) – 0.3287 * (1-Yt-2)

Age 29: Yt = 1 – 1.004 * (1 – Yt-1) + 0.001366 * (1-Yt-2)

Age 30: Yt = 1 – 0.9495 * (1 – Yt-1) – 0.05216 * (1-Yt-2)

Age 31: Yt = 1 – 0.86 * (1 – Yt-1) – 0.142 * (1-Yt-2)

Let’s use these preliminary models on the 2009 New York Yankees and 2009 Tampa Bay Rays and see how their 2009 BABIPs match up with those predicted by these models (older players are not predicted since models weren’t fitted due to sample size issues):


Player (age)          2007 BABIP   2008 BABIP   2009 Predicted BABIP   2009 Actual BABIP
Mark Teixeira (29)    0.342        0.316        0.314                  0.284
Robinson Cano (26)    0.329        0.283        0.314                  0.314
Melky Cabrera (25)    0.295        0.271        0.286                  0.278
Nick Swisher (28)     0.301        0.249        0.266                  0.272
Dioner Navarro (25)   0.249        0.318        0.266                  0.235
Carlos Pena (31)      0.297        0.298        0.296                  0.236
Jason Bartlett (29)   0.300        0.332        0.330                  0.390
Carl Crawford (27)    0.374        0.297        0.310                  0.361
B. J. Upton (24)      0.393        0.344        0.375                  0.316
Gabe Gross (29)       0.243        0.279        0.272                  0.333
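
As a check on how the predicted column is produced, the sketch below plugs the 2008 and 2007 BABIPs into the AR(2) equations listed earlier, matching each player to a model by the age shown in parentheses (that matching is an assumption of this example). Only four of the models are copied in, and the names are hypothetical.

```python
# (coefficient on 1 - Y_{t-1}, coefficient on 1 - Y_{t-2}), with signs as printed above.
AR2_MODELS = {
    26: (-0.3228, -0.6775),
    27: (-0.8248, -0.1756),
    28: (-0.672, -0.3287),
    29: (-1.004, 0.001366),
}


def predict_babip(age, babip_prev, babip_prev2):
    """Y_t = 1 + c1 * (1 - Y_{t-1}) + c2 * (1 - Y_{t-2}) for the given age's model."""
    c1, c2 = AR2_MODELS[age]
    return 1 + c1 * (1 - babip_prev) + c2 * (1 - babip_prev2)


print(round(predict_babip(26, 0.283, 0.329), 3))  # Robinson Cano: 0.314
print(round(predict_babip(29, 0.316, 0.342), 3))  # Mark Teixeira: 0.314
```

Both values agree with the table to three decimals.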

Lastly, some questions I anticipate being asked, along with my thoughts:

How is this different from xBABIP?

xBABIP, a description of which can be found here, is a pure linear regression model. Its regressors are various characteristics about a player, including line drive percentage, a measure of plate discipline, and contact rate, all of which must be measured and inputted into the model, and its authors found that all of their variables together explained about 35% of the variation in a hitter’s BABIP. Mainly, it is a descriptive model that uses data from one year to explain BABIP in that year alone.

Here, we showed that BABIP for the population of all baseball players remained stationary over time, which allowed us to use models like autoregressive models to predict future BABIP. These models only require a certain number of previous years’ BABIPs to come up with an estimate for the next year’s BABIP. Unlike xBABIP, this model is predictive in nature and attempts to forecast future BABIPs.

Is this better than xBABIP?

There’s still a great deal of work to be done before anything really conclusive can be said. Going back to look at additional data and attempting to fit better models is a start, but I think there could be an empirical reason (i.e. experience) why the mean and variance of player BABIPs would stay the same over time, which would mean that this model rests on sound assumptions.

What’s next?

I plan to go back and collect all of the yearly data from Baseball-Reference, then be more careful about which player-seasons are counted and why. Then, I’ll have to look more closely at how many player-seasons I want to fit before trying to determine the best stationary model to use.

Also, there might be an argument for separating players based on their speed. Players who are speedsters in their early twenties might quickly drop off as they hit their thirties, so their mean BABIPs might actually decrease as they age. If this is the case, then stationarity is violated, meaning these models will no longer hold. We’ve briefly looked at Bill James’s Speed Score and at the “fastest” players (in particular, those with Speed Scores two standard deviations above the mean in any given year), but the sample size that results is laughably small. We will have to consider loosening the requirements for the “fastest” players, or build in other factors as well.

If anyone has any suggestions regarding any of these, please feel free to comment.

This concludes the very introductory look into stationarity of BABIP and its possible implications for finding a model that best predicts future BABIP. The appendix, which includes information on the actual distribution of BABIPs over time as well as other possible stationary models, follows, and we’ll announce any adjustments or new findings as they come.


Appendix A: Distribution of BABIPs

While we found that the mean and variance of BABIPs remain constant over time, I was curious about the actual distribution of BABIPs, so I looked at the seasons at ages 27 through 37 for players who played at age 27 and binned the BABIPs by hundredths. That is, BABIPs greater than 0.310 and less than 0.320 were put into the “0.310 bin”; bins were created for BABIPs below 0.180, between 0.180 and 0.190, and so on up to between 0.410 and 0.420, plus a final bin for BABIPs between 0.420 and 1.000. The histograms for the player-seasons between ages 27 and 30 are below:
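
The binning can be sketched like this. The function name and the sample values are hypothetical, and putting a value that sits exactly on an edge (say 0.310) into the bin that starts there is an assumption, since the description above leaves the edges open.

```python
from collections import Counter


def babip_bin(value):
    """Label of the hundredths-wide bin that a BABIP falls into."""
    thousandths = round(value * 1000)   # work in integers to avoid float edge cases
    if thousandths < 180:
        return "<0.180"
    if thousandths >= 420:
        return ">=0.420"
    return f"0.{(thousandths // 10) * 10:03d}"


# Hypothetical BABIPs, for illustration only.
sample = [0.317, 0.312, 0.290, 0.301, 0.150, 0.455, 0.299, 0.318]
histogram = Counter(babip_bin(v) for v in sample)
for label in sorted(histogram):
    print(label, histogram[label])
```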





The results look surprisingly normal, but why just guess when you can test it? Using a “mean” and “variance” taken by averaging the means and variances over ages 27 through 37, I de-meaned and scaled the BABIPs for each player for each year and used the Kolmogorov-Smirnov test for normality by testing the transformed BABIPs against the standard normal distribution. The results are below. The first column is the age of the BABIPs being tested; the second column is 1 if the null hypothesis of normality is rejected at the 5% significance level and 0 otherwise; and the third column is the p-value, or the probability of seeing a departure from normality at least this large if the values were truly drawn from a normal distribution. The p-value is capped at 0.5.

Age H_0 p-value
27 0 0.0997
28 0 0.0869
29 0 0.0352
30 0 0.2688
31 0 0.8715
32 0 0.7591
33 0 0.4093
34 0 0.8195
35 0 0.5836
36 0 0.9466
37 0 0.1074

Here, there is no evidence for rejecting the null hypothesis, which means that there is no evidence to say that the BABIPs at each age from 27 onward are not normally distributed.
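
For readers who want to see what the Kolmogorov-Smirnov test does, here is a from-scratch sketch using only the Python standard library. It is not the routine used to produce the table above: the p-value comes from the asymptotic Kolmogorov series, which is only a rough approximation for small samples, and the sample values, mean, and variance are placeholders.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_statistic(z_scores):
    """Largest gap between the empirical CDF and the standard normal CDF."""
    xs = sorted(z_scores)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Asymptotic p-value: 2 * sum((-1)^(k-1) * exp(-2 k^2 lam^2)), lam = sqrt(n) * d."""
    lam = math.sqrt(n) * d
    if lam < 0.2:            # the series is unreliable here; the p-value is essentially 1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def standardize(values, mean, variance):
    """De-mean and scale values using a given mean and variance."""
    sd = math.sqrt(variance)
    return [(v - mean) / sd for v in values]


# Placeholder inputs, NOT the real data.
babips = [0.262, 0.281, 0.290, 0.297, 0.301, 0.309, 0.318, 0.333, 0.247, 0.345]
z = standardize(babips, mean=0.295, variance=0.00135)
d = ks_statistic(z)
print(round(d, 3), round(ks_pvalue(d, len(z)), 3))
```

A statistical package will give more accurate p-values for small samples; the point here is only the mechanics of standardizing and then measuring the largest CDF gap.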

We can also use the Lilliefors test to test for normality. There is no de-meaning or scaling here, as the test checks the data for a fit against a normal distribution with unknown parameters. For ages 27 through 37, we have the following results (the format follows from above):

Age H_0 p-value
27 1 0.001
28 0 0.2168
29 1 0.0438
30 0 0.0907
31 0 0.1457
32 0 0.5
33 0 0.0984
34 0 0.5
35 0 0.5
36 0 0.4022
37 0 0.2851

While normality is rejected at ages 27 and 29, the p-value at the latter age is 0.044, fairly close to the 5% significance level. In general, this seems to suggest that not only does BABIP have a constant mean and variance, but each value in the process is also normally distributed.

Appendix B: Other Stationary Models

Another model that requires stationary data is called the moving average (MA) model. While AR models show correlation at all lags, MA models only have correlation at short lags. The simplest moving average model is the MA(1) process, which states that:

Yt – u = et – h * et-1

where et and et-1 are again white noise variables.

Sometimes, however, you might want to fit a model to have properties of both AR and MA models. For that, we have autoregressive moving average (ARMA) models. An ARMA(p,q) model can be written as:

(Yt – u) = a1*(Yt-1 – u) + … + ap*(Yt-p – u) + et – h1*et-1 – … – hq*et-q

Note that an ARMA(1,0) model reduces to an AR(1) model, while an ARMA(0,1) model reduces to an MA(1) model. Thus, we can use statistical software to fit our data for BABIPs following a given year to an ARMA model, then repeat the process for all given years.
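
The difference between the two model families shows up clearly in simulation. The sketch below simulates an ARMA(1,1) process and then switches off one side at a time; the parameter values are hypothetical. For an MA(1) process the lag-1 autocorrelation should be near -h / (1 + h^2) and the lag-2 autocorrelation near zero, while for an AR(1) process the autocorrelation decays like a, a^2, and so on.

```python
import random


def simulate_arma11(n, u, a, h, s, seed=0):
    """Y_t - u = a * (Y_{t-1} - u) + e_t - h * e_{t-1}, with Gaussian white noise."""
    rng = random.Random(seed)
    dev_prev, e_prev = 0.0, 0.0
    path = []
    for _ in range(n):
        e_t = rng.gauss(0.0, s)
        dev = a * dev_prev + e_t - h * e_prev
        path.append(u + dev)
        dev_prev, e_prev = dev, e_t
    return path


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return num / denom


ma1 = simulate_arma11(50000, u=0.300, a=0.0, h=0.5, s=0.03)   # ARMA(0,1) = MA(1)
ar1 = simulate_arma11(50000, u=0.300, a=0.5, h=0.0, s=0.03)   # ARMA(1,0) = AR(1)
for name, series in (("MA(1)", ma1), ("AR(1)", ar1)):
    print(name, [round(autocorrelation(series, lag), 2) for lag in (1, 2, 3)])
```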

If the process itself is not stationary but its differences are, then one uses an autoregressive integrated moving average (ARIMA) model, which amounts to fitting an ARMA model to the differenced series.

Yes! CarGo!

August 17, 2009

So after a week’s hiatus I figured I should get back into posting. If anyone watched the Rockies and Marlins doubleheader yesterday, you might have noticed outfielder Carlos Gonzalez smacked a homer in each game. I only noticed because I remember laughing to myself. Remember him? He’s the player that went over along with Huston Street and Greg Smith to the Colorado Rockies in the Matt Holliday deal this past offseason. He was rated the #1 Oakland Athletics prospect and the prospect with the best power by Baseball America heading into the 2008 season. Boy how time flies.

Last year, as a 22-year-old for the Oakland Athletics, CarGo hit .242/.273/.361 on his way to a disappointing season. He was a pretty good fielder though, with a UZR of 10.2. When he was called up on June 5th this year, he continued his abysmal performance, posting a .202/.280/.333 mark up to the All-Star Break. Apparently, something’s clicked since then and he’s been on a tear, hitting .388/.419/.746, good for a 1.165 OPS. Obviously he’s probably not this good, but it’s encouraging to see that he’s rediscovered his power stroke, posting an ISO of .232 vs. .119 last year, as well as posting a better batting average while drawing a fair number of walks.
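
For anyone unfamiliar with the shorthand, OPS is on-base percentage plus slugging percentage, and ISO (isolated power) is slugging percentage minus batting average. A quick sketch using the slash lines quoted above:

```python
def ops(obp, slg):
    """On-base plus slugging."""
    return obp + slg


def iso(avg, slg):
    """Isolated power: slugging percentage minus batting average."""
    return slg - avg


print(round(ops(obp=0.419, slg=0.746), 3))   # post-break line: 1.165
print(round(iso(avg=0.242, slg=0.361), 3))   # 2008 line: 0.119
```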

Digging deeper, we find the standard signs that explain most of his current success. We see more line drives, fewer ground balls, a higher fly ball percentage and a higher HR/FB ratio this year compared to last year. One bit of concern is that while his walk rate is roughly 8.5%, he’s striking out 25.8% of the time. Granted, if he continues to hit homers and extra-base hits, the Rockies won’t mind, but his strikeouts might catch up with him. His BABIP is also at a high .349, which suggests he’s due to fall back to earth.

As you might expect, Carlos seems to be affected by Coors Field, where he posts a .280/.316/.560 line at home vs. a .289/.364/.474 line away. The thing that jumps out is the OBP and SLG difference: while he maintains roughly the same batting average, on the road he’s more apt to take a walk, whereas at home he’s more apt to swing and let the ball fly, where I imagine he tries to use the park to his advantage. That seems to explain why his slugging percentage is much higher at home, where his OBP is almost in line with his batting average, and why his OBP is higher on the road.

So what does this mean? Well, if he continues to hit well over the next 40 games, it could mean that next year he’ll be a force to be reckoned with, finally fulfilling his prospect potential. Imagine: with his power, Coors Field, and all this at only age 24, that could be pretty special.