Interview with J.C. Bradbury, Economist and Sabernomics Blogger

by Alex Remington on November 13, 2009

J.C. Bradbury is an economist at Kennesaw State University and author of the popular blog Sabernomics. He uses economic principles for baseball analysis, which often makes him an iconoclast in sabermetric circles. We haven't spoken to him since April, so it's good to have him back on the blog!

1. It's great to have you back in the blogger community after a long hiatus. Are there any special tidbits you can share with us from your research?

Thanks! Glad to be back. I appreciate the readers who have come back to follow me. I did not realize how much I needed the break at the time I stepped away. I knew writing had become difficult, but I thought it had more to do with outside obligations that were zapping my time and energy. When it came time to start blogging again, I was almost dreading it; but I had promised to come back, so I did. However, I have been surprised how easy it has been to write. Sometimes, a vacation is really in order. I will probably take periodic vacations to avoid the rut I'd gotten into after five years of blogging.

During my hiatus, I spent most of my time working on a book project. Although it was not my original intention, the book is completely about valuing players. In The Baseball Economist, I used a simple system that I had developed to value players. It was actually developed as a side project, but I thought it might be of interest to readers so I included it in the book at the last minute. I'm glad I did, because when I used it to value players on Sabernomics, I received a lot of feedback, especially questions. So, I decided to lay it all out there; and when I was done, I had a 200-page manuscript instead of the single chapter I had planned on writing. I completely tore down and rebuilt my model, which allowed me to improve the model and correct past mistakes. The book is at the editing stage, and I'm just beginning to circulate drafts to publishers right now. The plan is to publish a year from now. But, with the hot stove league upon us, I plan to use much of the work in the book to evaluate deals this offseason.

2. Your recent post "Are General Managers Myopic?" reflects a sentiment not often seen in the internet baseball community: the humility that GMs know more than we do. But there will always be GMs who are ahead of the curve and GMs who are behind. Do you see any qualities which appear to be undervalued by the market?

I think some GMs are better than others just as some economists are better than others. But, at some level all economists agree on basic core principles. GMs are no different. My contention isn't that they are all equally bright or that they don't make mistakes, but that they are not making the boneheaded moves that people often claim that they are. The stupidity required to not understand that performance fluctuates from year to year is off the charts. No GM thinks that a player is exactly what he is at the moment, especially when his recent performance is so different from his career. Certainly, it's bad policy to expect GMs to be fooled by hot and cold performance. The baseball insiders I've talked to don't believe in this widespread ignorance either.

3. The biggest question for the Braves right now is what to do with our overstocked pitching staff, and the two that Braves fans would be gladdest to be rid of are unsurprisingly the two least valuable, Kenshin Kawakami and Derek Lowe. How can the Braves best maximize their resources? All embarrassment aside, would a straight salary dump be a good idea?

What a nice problem to have. The Braves have too much of something that the rest of the league desperately wants while needing to replace Adam LaRoche, Mike Gonzalez, and Rafael Soriano. I might be tempted to use Kawakami in relief, but I'm not sure how he would adjust to the switch, though he did pitch some out of the pen in 2009. I think the Braves will move a starter for the best bat that they can get; and they'd be smart to do so, because I believe the market may be overvaluing starting pitchers right now. They'd like to move Lowe, and wouldn't mind dealing Kawakami or Vazquez if the package is right. I even think that they'd trade Jurrjens, because he's still young and cheap enough to bring a good return. Hanson appears to be off-limits. I don't think Kawakami would be a salary dump. He pitched at a level similar to his compensation last year, and I have to believe that bodes well for a year when he's making a major life transition.

Lowe will be harder to move, but I do think he is better than he performed last year. I'm all for a Lowe-for-Milton Bradley swap. Hell, it couldn't make the Braves less interesting. This team is about as vanilla as it comes, and I'd welcome a little excitement. Bad personalities didn't do anything to hurt the Falcons. They're still complaining about DeAngelo Hall and Mike Vick; and there are still plenty of people who love them. Trading one good overpaid player for another isn't such a bad idea. Bradley is a very good offensive player who will bounce back from his 2009. If any team can handle his volatile personality, it's the Braves; remember, Gary Sheffield was a perfect teammate here.

4. Another big question which we'll need to answer in the next few days: whether to re-sign Adam LaRoche. At this point in his career, we know that Adam is an extreme second-half player who provides moderate power production from first base but is prone to slumps, mental mistakes, and molasses-slow baserunning. Compared to what's available on the open market, is Adam worth it?

I don't think Adam is really a second-half player. He's had some good second halves, but the statistical evidence is that such splits are random (see Albert and Bennett's Curve Ball). On occasion, it's going to happen that a player has an extreme split one way or the other. Maybe it's something that's unique to his style, but I doubt it. Having said this, I'll take Adam with his warts. He's a good hitter that the Braves can use. There has been a lot of talk about the Braves needing a right-handed power bat. But, between Diaz, Church, McLouth, and Heyward/Schafer the outfield should be fine. The Braves need a bat, and I don't care what position he plays. Why not get that pop at first base? If not Adam, I think the Braves would be wise to upgrade their offense at first.

5. What are some of the biggest misconceptions in the internet stat community? If you could do away with one trendy cliche or assumption, what would it be?

Well, I think the community's main problem is hubris combined with a groupthink attitude. Ken Rosenthal discussed this in a recent column and the sabermetric community chose to chastise rather than heed his point. This is compounded by the fact that those conducting and consuming the analysis aren't adequately familiar with the employed analytical methods and so bad analysis sometimes becomes part of the group belief.

For example, the other day at Baseball Think Factory I pointed out an excellent study by economists Jahn Hakes and Skip Sauer that examined baseball's labor market. A commenter responded "Tangotiger hates the Hakes & Sauer paper"; I guess I was supposed to defer to him. Anyway, I followed the link and the analysis conducted by the pseudonymous sabermetric icon doesn't refute the findings at all. He's plugging extreme values into a model to make absurd predictions outside the sample for a model that the authors are acknowledging is out-of-whack with what should be. For some reason, this damns the model. I have seen Jahn and Skip present their work several times in front of many economists who are well-versed in the techniques used. It's been vetted by skilled referees and editors and published in respected academic journals. I've read their work closely and talked to them about it.

Yet, what bothers me is not that someone reaches an erroneous conclusion, but that the commenters wholeheartedly embrace the flawed critique, which is later parroted across the Internet. No attempt is made to contact the authors, or submit a response to the journal that published the article--a common practice when flaws are discovered after publication. That's not what this is about; it's some sort of status game--chest thumping at a safe distance. Sabermetrics (with a big S) has become a club focused on rhetoric, not a serious research program.


{ 20 comments… read them below or add one }

ekogan November 13, 2009 at 2:58 pm

I’m the poster that JC is complaining about.

First a disclaimer, I’m neither a statistician nor an economist, so my understanding of the issues may be faulty.

I have two objections to Hakes & Sauer. One, they, like all other academics writing about baseball, ignore all research done by amateur sabermetricians. Two, they throw both rate stats and counting stats like PA, and correlated stats like OBP and SLG into the same regression. Maybe JC is happy that H&S try to reinvent the wheel with sloppy work like that and would call it “avoiding groupthink”, but I think it just makes H&S’s conclusions worthless.

JC claimed that my objections to H&S were simple appeals to authority of Tangotiger. In this case, I (as far as I can tell) actually understood Tangotiger’s objections and agreed with them. If I did have to appeal to authority, though, JC Bradbury would not be one I would trust to be correct if I didn’t understand his reasoning.

I’ve been trying to figure out why JC was offended by a blogpost comment. After all, he’s been writing on the Web for a long time, and so can’t be a thin-skinned person. I think it is because he is an economist and thus is infected with the arrogance of the economics profession. In recent years, economists started believing that they could take their statistical toolbox and apply it to analyze any kind of data without first learning about the subject domain. They believe that they can thus reach valid conclusions, even overriding the opinion of domain experts, but since they didn’t bother to learn about the subject first, they often just make fools of themselves. But, god forbid, someone point that out, they are just perpetuating hidebound groupthink. Regression says so!

Here are some examples of economists blundering about with sloppy statistical analysis on non-economic topics:
Kovash/Levitt wrote a paper saying that pitchers throw too many fastballs, but basically consider balls or called or swinging strikes to be of no account.
In Freakonomics, Levitt claims that drunk driving is better than drunk walking. Here’s why he’s wrong
And speaking of Freakonomics, let’s not forget their misleading and sloppy global warming chapter.

Sorry JC, being an economist doesn’t mean that you get to ignore research of other people.


Alex Remington November 13, 2009 at 3:36 pm

Ekogan, thanks for coming to our site and responding. I’ll let JC speak for himself. However, I’m a bit puzzled by your last two paragraphs. It looks like you’ve found a couple links of non-economists criticizing the work of economists. Are you asserting that the work of economists is inherently suspect?


Nick Steiner November 13, 2009 at 3:58 pm

That’s absolutely not what he was saying. He said that being an economist doesn’t mean that you are always right. He gave examples of very good economists who were wrong. That reinforces his belief, and mine, that the Sauer study in question was “wrong”.


ekogan November 13, 2009 at 3:59 pm

Are you asserting that the work of economists is inherently suspect?

Inherently? I don’t know if I’d go that far. However, statistical analysis of data has a lot of subtle ways to screw up, and needs to be applied correctly to get valid results. It might seem that a paper by an academic with a lot of math formulas in it has to be correct, but that paper might contain quite elementary mistakes which are hard for a lay reader to see under all that math. Those mistakes would have been easy to correct if the authors would consult with the subject matter experts before publishing, but the economists often don’t bother. Perhaps they think they always know better. I can read the Kovash/Levitt and Hakes&Sauer papers and then read Tangotiger’s criticisms of them and decide that Tangotiger is right for myself.
Then, when I learn that Levitt wrote that legalizing abortion leads to a drop in crime 18 years later, and I have no prior knowledge about the subject, should that make me more suspicious that he’s wrong somehow? I’d say it should. My rule of thumb would be that if I hear that a subject matter expert and an economist disagree, and I don’t understand the arguments well enough to decide for myself, I’ll believe the subject matter expert.


Alex Remington November 13, 2009 at 4:17 pm

Nick and ekogan, I agree that being an economist certainly shouldn’t entail an automatic presumption of rectitude. That’s why the peer review process exists. And I certainly appreciate that statistical analysis is an art that too often gets treated as a science. You can’t blindly examine a contextless wall of data and come up with meaningful conclusions.

But the reason I asked my question is that I’m not sure I understand the connection Ekogan draws between the assertions in Superfreakonomics concerning global warming, and the work of J.C. Bradbury or of the economists he cites, except insofar as they are all economists. Can you help me understand?


Anonymous November 13, 2009 at 4:06 pm

Alex,
His point is that economists have a reputation for building models that are irrelevant to the real world because the economists have not taken the time to understand the domain. This is a common criticism that pops up anytime economists work in interdisciplinary contexts in the academic world.
It’s unfair because there *are* thoughtful economists out there, but Bradbury digging in his heels to defend the problems in Hakes and Sauer’s work does nothing to refute that reputation.


J.C. Bradbury November 13, 2009 at 4:21 pm

ekogan,

I do apologize for singling you out. I don’t recall interacting with you before, you just happened to provide an example recently that I could use, and I went with it.

First, let me say that I was not offended by your post. I’m not seeing any obvious negative emotion in my response. If you are interpreting this, I want to make it clear that offense wasn’t a response I felt. I just found your deference to an authority who was making an erroneous point to be an example of a transmission of mistaken knowledge. Had I found nothing but the initial post at the site, I wouldn’t have thought much of it. I was more concerned that the comments seemed to view the critique positively.

Now, I will reply to your responses.

they, like all other academics writing about baseball, ignore all research done by amateur sabermetricians.

What specific piece of sabermetric research is missing? I have seen exactly one study done on the subject (by Pizza Cutter), and it was posted last week. The references in their first paper include Bill James, Doug Pappas, George Lindsey, and Jay Bennett. Both are Bill James fans. Jahn has a rather large sabermetric library. The Clemson economics department conference room is stocked with books of baseball statistics, including sabermetric authors.

Two, they throw both rate stats and counting stats like PA, and correlated stats like OBP and SLG into the same regression. Maybe JC is happy that H&S try to reinvent the wheel with sloppy work like that and would call it “avoiding groupthink”, but I think it just makes H&S’s conclusions worthless.

It’s easy to pick on any model as being imperfect when we live in an imperfect world. Variables chosen are often less than ideal, but must be used because they are the best available. I have seen them asked about and defend their variable choices. They have conducted robustness tests to make sure they are not getting spurious correlations. Editors and referees do this type of thing. A final specification that seems arbitrary is actually the product of deliberate thought, and is certainly not sloppy.

Now, let me test your knowledge. Do you know what a correlation between included variables causes? Multicollinearity. What is the negative consequence? High standard errors, not biased coefficients. What’s the rule when t-stats are >2? Put both in the model! And considering that they were trying to replicate a hypothesis put forth in Moneyball using these two variables, it made sense to use them. Furthermore, in a follow up paper, which I linked to in the initial BTF thread, they break out the stats into their components and the results still hold. The lesson here: imperfection doesn’t damn the model. By calling the conclusions worthless (which is done out of ignorance on your part) you have thrown out the baby with the bathwater.
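[Editorial illustration, not part of the original comment.] The claim above, that correlation between included regressors inflates standard errors without biasing the coefficients, can be checked with a small simulation. The sketch below is a minimal one under stated assumptions: the data are synthetic, both true coefficients are set to 1.0, and the names `fit_two_regressors` and `simulate` are hypothetical helpers invented for this example. It has nothing to do with the Hakes and Sauer data or specification.

```python
import random
import statistics


def fit_two_regressors(x1, x2, y):
    """OLS for y = b1*x1 + b2*x2 (no intercept), solved from the normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simulate(rho, n_obs=100, n_reps=500, seed=0):
    """Return the mean and spread of the estimated b1 across repeated samples.

    rho is the population correlation between x1 and x2 (assumed, synthetic).
    The true model is y = 1.0*x1 + 1.0*x2 + noise.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        x1 = [rng.gauss(0, 1) for _ in range(n_obs)]
        # x2 has unit variance and correlation rho with x1.
        x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        b1, _ = fit_two_regressors(x1, x2, y)
        estimates.append(b1)
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for rho in (0.0, 0.95):
        mean_b1, sd_b1 = simulate(rho)
        print(f"rho={rho}: mean of b1 estimates={mean_b1:.3f}, spread={sd_b1:.3f}")
```

Under these assumptions the average estimate should stay close to the true value of 1.0 in both cases, while the spread of the estimates should be noticeably wider when the regressors are highly correlated.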

And finally, if you think there is a better way, then do a superior study. Science doesn’t work by pointing out imperfections, it progresses by replacing the inferior with the superior. Seize the opportunity and do the study. Should it yield different conclusions, I assure you that Hakes and Sauer would agree that you have surpassed their work.

In this case, I (as far as I can tell) actually understood Tangotiger’s objections and agreed with them. If I did have to appeal to authority, though, JC Bradbury would not be one I would trust to be correct if I didn’t understand his reasoning.

You don’t have to trust me or any other economist. I disagree with many economists on various issues, sometimes siding with non-economists over economists. Though, I am curious. After you acknowledge that you have no experience or training in economics or statistics, why don’t you trust the many economists who regularly do this type of analysis who vetted the work? The referees were chosen because of their specific expertise for this type of analysis (and yes, this includes a knowledge of baseball). The Journal of Economic Perspectives is a major publication outlet that will only publish papers that meet rigorous standards. Just look at the list of editors who are risking their reputations by allowing such a paper through (http://www.aeaweb.org/jep/contact.php) (though the editors have changed since the initial paper was published). Instead, you defer to a guy who is a prolific writer, but doesn’t even use his real name. The only people reviewing his work are another group of people who may or may not be qualified to evaluate the argument.

Being an economist or any kind of professor doesn’t make one right. I have too much experience with horrible researchers working as professors. If I thought the criticism was valid (and I do have significant experience doing and teaching econometrics) I would agree that the findings were wrong. But I think the Hakes and Sauer paper’s findings are right. The paper has been widely circulated and cited. Both authors have a long history of good empirical work. On top of this, the paper circulated widely before it was finally published. After passing all of these checks, it’s getting less and less likely that these guys are at fault. This all fits with my own personal impression that the econometric work is sound. And if it is unsound for some reason that I have not identified, it is not because of what Tangotiger said.

Deferring to experts is not always about credentialism or being snooty. Credentials can cause credentialism, which I abhor, but they also reflect useful information that should not be ignored. We defer to experts for a reason: they tend to have superior knowledge. I don’t want people to defer to me because of credentials. I have a blog, I communicate with many readers to engage their arguments. I change my mind when I see that I have erred. But, at the end of the day, I must have my work vetted by other experts (sometimes non-economists, as is the case with my research on aging) or I’ll lose my job. I can’t tell my dean, Yoda27 thinks my study is nifty, nor does he care that FatBoi34 thinks my work is hogwash. Instead, we find a group with an established reputation (normally an academic journal) that signals its quality. It does so by associating with other good scholars. These people are human and make mistakes, but they are less likely to err than interested anonymous parties cruising the net.

In recent years, economists started believing that they could take their statistical toolbox and apply it to analyze any kind of data without first learning about the subject domain. They believe that they can thus reach valid conclusions, even overriding the opinion of domain experts, but since they didn’t bother to learn about the subject first, they often just make fools of themselves. But, god forbid, someone point that out, they are just perpetuating hidebound groupthink. Regression says so!

Is that really what happened here? There was no economic theory involved? There was no formal model developed or tested? This was just some stepwise-regression, throw-everything-at-the-wall-and-see-what-sticks exercise? No. They deliberately set out to test a hypothesis put forth by someone else. You acknowledge that you have no background in statistics, yet you poke fun at two experienced empirical researchers for misusing their technique.

Sorry JC, being an economist doesn’t mean that you get to ignore research of other people.

Yeah, because I ignore the good research in sabermetrics. Let’s see, I’ve written an academic paper defending DIPS, giving full credit to Voros McCracken for his discovery. I wrote a chapter on how researchers in the social sciences must use the proper performance metrics in their regressions to prevent omitted variable bias (which includes a positive citation of tangotiger’s FIP) when using sports as a laboratory. I have a book that promotes the use of saber-approved stats, though I prefer the less complicated ones for parsimony. And finally, I would like to add a paragraph that I wrote from a forthcoming academic paper.

“we must acknowledge that sabermetricians are responsible for many important discoveries that should inform sports economists in their research. In fact, we have observed instances where economists have made elementary mistakes in understanding sports that would not have been made if they had paid attention to sabermetricians’ findings. It is our hope that academic researchers using sports as their laboratory will review the analysis from all relevant outlets, including the sabermetrician community.”

What am I missing? Am I being more ignorant of sabermetrics, or are you being more ignorant of econometrics?


Nick Steiner November 13, 2009 at 9:01 pm

JC – You’ve demonstrated that you don’t consider Saberists subject matter experts. That’s a big problem, because they very much are.

When Tango, who is the very best Saberist out there (consults for major league teams, has written a book like yourself, etc.) has a problem with a study, you should listen to it. You don’t have to agree with it, but you should give it just as much respect and authority as you would a peer review from one of the best Economists out there.

Now onto the actual criticism that he had of the Sauer-Hakes paper. His main point was that a regression needs to be backed up by causal conjecture before it can be considered valid, not the other way around, especially given the somewhat small sample of data that Sauer-Hakes worked with. One way to show that Sauer-Hakes’ model didn’t make sense was to plug numbers into it.

The numbers he plugged into it were that of a player with 600 plate appearances and the production of a pitcher, and a player with 400 plate appearances and the production of a league average player. He found that they were practically even in terms of value. Doesn’t that fudge the model right there? You say you can break any model by taking it to the extreme; however, even when you don’t:

No, the point is how far I have to go to find an equivalent to a .340/.400 player with 400 PA, if I give the other guy 600 PA. Basically, I can’t even find a guy so bad that he would be valued worse than the .340/.400 400 PA guy. It’s not that it breaks down at the extremes, but it simply breaks down period.

So, when a guy, who is one of the foremost subject experts on valuing players through stats, has a problem with a study and backs it up with several examples – you should probably agree with him.

Now, if you have numbers that can counteract Tango’s, then disagree with him on the merits of those. However, you CAN’T just appeal to the fact that Sauer and Hakes are Economists and formally published their paper. That, in itself, doesn’t make them right.


JC Bradbury November 13, 2009 at 9:56 pm

Nick,

You’re missing what I am saying. Tango has an opinion that I think is seriously flawed, and I state exactly what his problem is: “He’s plugging in extreme values into a model to make absurd predictions outside the sample for a model that the authors are acknowledging is out-of-whack with what should be.” How on earth this dooms their entire paper, I have no idea. That’s all I need, because I understand how multiple regression analysis estimates the coefficients. But, maybe someone who doesn’t understand econometrics is confused by this. So, I then argue that the guys with significant training and experience who have submitted their analysis to outside scrutiny by other qualified researchers probably know what they are doing. It’s possible that they don’t, but the available evidence indicates that they do.


tangotiger November 13, 2009 at 10:14 pm

As I posted at Primer:

[quote]
Here’s another example of the Hakes/Sauer model, if you use their equation from Table 2, years 2001 (which you can see in post 29 above).

Make someone a free agent, with 600 PA, a .330 OBP, .400 SLG outfielder. His salary is 3.2MM. Make him a SS or C, and it’s 3.4MM.

Does that make sense to anyone?

What if we give him 500 PA instead, with a .500 SLG and .400 OBP. Pretty attractive, right? Well, it’s the same 3.2MM$ salary for the OF, and the same 3.4MM$ salary for the SS.

Now, if they are suggesting their empirical model, based on actual performance data of 2000 that says that an OF with 600 PA, a .330 OBP and .400 SLG is making in free agency as much as the SS with 500 PA, with a .400 OBP and .500 SLG, then should I accept that? Do I need to go further to disprove them? Even if they can somehow justify it, isn’t it more plausible to think that this is some sort of anomaly whereby the uncertainty in the data is so large as to make such a conclusion insignificant?

You have to have SOME logical underpinnings, don’t you? You can’t just throw everything into the regression grinder, and think that you will get a filet mignon, or even a kielbasa? Sometimes, you get dog food.
[/quote]

Walt Davis also provided anecdotal evidence in that same thread:
[quote]
But I’ll agree those results look fishy. About the lowest possible realistic FA-eligible salary — a starting SS posting a 330/350 line, say — is estimated at about $3 M. In 2001, this, roughly speaking, would have been Royce Clayton ($4.5 M), Pat Meares ($3.8 M) or Tony Womack ($4 M and not yet an FA) so it’s looking pretty badly off at that end. (For 2B, you’ve got Mike Lansing at $6.25 and McLemore at $2; at 3B, Brosius at $5.25.)

Meanwhile a 400/600 FA-eligible OF is predicted to make less than $6 M. We’re talking Barry Bonds ($10.3), Sheffield ($10), Sosa ($12.5), Alou ($5.25), Giles ($7.3), Burks ($5.7), Edmonds ($6.3). So it fits pretty well for 3 of the 8 but is generally way low.

So it’s a model which fits the middle of the data but not either end of the data. Pretty much all models do that. The fact that Mike Lansing made more than what the model would project for Gary Sheffield is a pretty good indicator that the model didn’t fit the data well.
[/quote]


Nick Steiner November 14, 2009 at 1:05 am

JC – I know how a multiple regression works. It’s not that hard. I learned how to do multiple regression in 10th grade – I have done it many times on things related to baseball. So has Tango, I’m sure, and half of the other Saberists you are disparaging for not understanding Econometrics.

However, when dealing with a small sample size of data, as Sauer and Hakes were, and running the regression against a variable like salary (which is influenced by many things other than the parameters that they inputted), you may very well get wacky results. Mainly, because a regression is unaware of logical inputs – it just works with what it sees.

You should read this:

http://www.insidethebook.com/ee/index.php/site/comments/the_sauer_hakes_moneyball_spreadsheet/#comments

Tell me if you see any “extreme values” in those examples.

As to this:

So, I then argue that the guys with significant training and experience who have a submitted their analysis to outside scrutiny by other qualified researchers probably know what they are doing.

That is exactly what I am arguing AGAINST. Knowing how to value players takes knowledge of statistics, proper regression techniques and economic training. It also takes knowledge about baseball and how it works. The economists don’t believe that Sabermetrics is all that hard, for some reason, so they don’t consult a subject matter expert, like Tango, and just do the standard regressions ignoring all logical factors. And that’s exactly what you see with the Sauer and Hakes paper.

Now, I know that you, JC, are different than that. You have done some good work in the Sabermetric field and have a respect for the Saberists out there. That is why your unflinching support of economists who clearly don’t know as much about baseball as you is surprising.


JC Bradbury November 14, 2009 at 7:57 am

Nick,

The subject is far too complicated for a tenth-grader to learn. Someone may have introduced you to the basic idea, but you did not learn it. Now, if I’m wrong (and I doubt that), please tell me who the author of your econometrics text was and which software package you used. Or I’d like to see some examples of past regression analyses that you have conducted. Right now it doesn’t seem to me that you do understand the subject. Excuse me for using my experience to make this judgment (Oh but now I’m going to be accused of credentialism, because credentials are only about making other people feel bad).

What I find strange is that you think econometrics is the easy part while the baseball component is the area where expertise is needed. Econometrics requires years of training and practice. What does sabermetrics require? Watching a lot of baseball? Thinking about how the game works? Reading Bill James or other sabermetricians? Did it ever occur to you that these are things that economists do in addition to being an economist? As I stated above (I get really irritated when people accuse me of not responding when those who respond to me don’t bother to read what I have written), both Skip and Jahn are highly knowledgeable of sabermetrics and life-long baseball fans. Jahn has an impressive library of sabermetric books that would make most sabermetricians jealous. Jahn and Skip have been involved with sabermetrics longer than I have. Part of the reason I know them is that we were connected after learning of our mutual research interest. Why do you think they wrote the paper?

Nick Steiner November 14, 2009 at 1:30 pm

JC – I’m not implying that I am a better statistician or econometrician than you or Sauer-Hakes, or even close to the same level. Simply that I have a rough understanding of what they did. If I were looking to do a regression model of something, I would certainly consult a subject matter expert like you or Sauer-Hakes before I proceeded to run the regression.

Conversely, if I was a pure statistician or an econometrician and didn’t understand how baseball works as well as others, I would consult a subject matter expert on whatever I was running the regression on to make sure it made sense.

In my opinion, Sauer-Hakes did not do that. Based on the results of their study, and the conclusions they draw from it, it appears that they did not properly understand the subject that they were running the regression on.

I am not implying that Sabermetrics is harder than econometrics, although they definitely have similar attributes; it’s most likely a lot easier. However, that doesn’t mean that you don’t need to learn and study it to understand it, and it definitely takes a lot more than just watching a lot of baseball or reading some books.

If you are not an expert in a certain field, you need to consult someone who is to get the best results on whatever experiment that you are doing. I don’t see how you could disagree with that. Sauer-Hakes are not experts in the field of Sabermetrics, thus they should have consulted one before they ran the study.

And when a Sabermetrician criticizes that study, not for the math or the economics, but for the baseball related part of it, then yes, you should defer to him.

Coach (2010- Mr. Overrated retires) November 13, 2009 at 4:24 pm

Bradbury’s Lowe-for-Milton Bradley swap has some merit but unfortunately Bobby Cox would never permit it. As for his take on Adam LaRoche, I agree that Adam is not appreciated nearly enough. People just do not realize how valuable a great defensive first baseman really is. Defense doesn’t show up in the box score and few do it better than LaRoche.

But I would appreciate Bradbury’s take on the reasoning behind overpaying for Derek Lowe one year ago and now attempting to trade him in part because Atlanta simply could not afford his contract in the first place. Frank Wren will be doing a one-eighty about face if Lowe is traded.

Shek November 13, 2009 at 11:05 pm

Here’s another example of the Hakes/Sauer model, if you use their equation from Table 2, year 2001 (which you can see in post 29 above).

Make someone a free agent, with 600 PA, a .330 OBP, .400 SLG outfielder. His salary is 3.2MM. Make him a SS or C, and it’s 3.4MM.

Does that make sense to anyone?

It certainly can make sense. You can’t field a team with 9 outfielders, and so the supply and demand for different positions will yield slightly different salaries even if offensive statistics are the exact same.

What if we give him 500 PA instead, with a .500 SLG and .400 OBP. Pretty attractive, right? Well, it’s the same 3.2MM$ salary for the OF, and the same 3.4MM$ salary for the SS.

Now, if they are suggesting their empirical model, based on actual performance data of 2000 that says that an OF with 600 PA, a .330 OBP and .400 SLG is making in free agency as much as the SS with 500 PA, with a .400 OBP and .500 SLG, then should I accept that? Do I need to go further to disprove them? Even if they can somehow justify it, isn’t it more plausible to think that this is some sort of anomaly whereby the uncertainty in the data is so large as to make such a conclusion insignificant?

This result is not implausible either. Once again, you cannot field a team with 9 outfielders, and so there will be variation in salary based upon position. Also, a player with significantly more PAs is likely someone who played more games, meaning that the team didn’t have to substitute a lesser player. Is there not value in that?

I don’t have my Stata up and running, and so I had to do a down-and-dirty filter of the Lahman Database in Excel (and so I’m happy to be corrected if I fat-fingered something and spit out incorrect calculations/filters), but I couldn’t find any players from 1999-2005 (the years of their original study) that actually exist with these kinds of numbers. Thus, they aren’t just outliers from within the actual data, but are instead outliers beyond the outliers of the actual data.

Shek November 13, 2009 at 11:12 pm

Sorry, but the hypertext tags I used were the incorrect ones, and so the blockquotes dropped off of my reply. Bottom line, salary differentials based on position and plate appearances can certainly make a difference, and so the examples given do not invalidate the model prima facie. Also, given how the results are reported, the calculations provided may not represent the actual results well (for example, the PA coefficient could very well be .00025, which leads to a different predicted salary than .0003). The fact that the PA coefficient isn’t carried out to more digits isn’t important for the thrust of the model, since PA is used only as a control variable and is not a variable of interest for the hypothesis that is being tested by the model.
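
The rounding point can be made concrete with a small sketch. It assumes, purely for illustration, a log-linear specification in which ln(salary) depends linearly on PA; the comment does not state the functional form. The function name and the 600 PA figure are hypothetical choices, and only the two coefficient values (.00025 and .0003) come from the comment above.

```python
import math


def salary_ratio(pa, coef_reported, coef_actual):
    """Ratio of predicted salaries when only the PA coefficient differs.

    Assumes a log-linear model, ln(salary) = ... + coef * PA, so every
    other term cancels in the ratio. The functional form is an
    illustrative assumption, not something stated in the thread.
    """
    return math.exp((coef_reported - coef_actual) * pa)


# Hypothetical full-time player with 600 plate appearances.
ratio = salary_ratio(600, 0.0003, 0.00025)
print(f"Predicted salary using .0003 is {ratio:.3f}x that using .00025")
```

Under these assumptions the two coefficient values move the predicted salary by roughly 3 percent at 600 PA, which is small compared with the position and playing-time differences under discussion.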

JC Bradbury November 14, 2009 at 7:54 am

1) The goal of Hakes and Sauer was to test the Moneyball hypothesis that OBP was undervalued relative to SLG; hence, the title of the paper.

2) This test must include OBP and SLG in the model. The concept can be broken down and tested further, which they did, but what is interesting is whether this central tenet of Moneyball is true. The exercise is not about designing the perfect model for predicting salaries. I vividly recall discussing this fact with the authors at the time the paper was written when I asked them about alternate specifications of the model. They responded that they had done this and this analysis would be a part of another paper, which it was, but they were focused on Moneyball itself for this exercise. This then creates the problem of adjusting for playing time. This could be controlled for in ways other than plate appearances (e.g., interaction terms), but the authors ultimately decided the parsimony of their specification made it the right choice. Adding in the impact of all sectors of the labor market is another tough issue. Ideally, you would like to separate the labor classifications, but they are trying to estimate the market price for the entire labor market, and reserved and arbitration-eligible players are a part of that market. So, they include dummies to act as a control. Again, interaction terms or some other correction could have been used, but they felt that their final specification was best. And they were able to convince many other economists (colleagues, editors, and referees) at different levels of review that what they produced was the best choice.

3) The goal of the study was to determine whether the market was out of whack at the time the book was written. The findings indicate that the pre-Moneyball models don’t predict as well as the post-Moneyball sample based on what we would expect them to be. That is a point in favor of the paper, not an objection. Furthermore, in 2001 the labor market was especially out of whack, and I find it odd that it was the specification chosen for close examination. The regression equation was designed to pick up information from real-world data; the values are not something presupposed by the authors. The coefficient on OBP is negative: higher OBP lowers your salary. You don’t need to plug in any values to see that this is counter-intuitive. Part of the reason why the salaries remain so stable when Tangotiger adjusts the inputs is that the higher value for OBP cuts into the impact of SLG. As Hakes and Sauer acknowledge in the text, the coefficients on OBP are not even statistically significant; the market appeared to be ignoring the relevance of OBP at the time. That’s their argument.

4) So, the Hakes and Sauer papers may be imperfect, joining the ranks of every other empirical study ever written. If you think you can do better, here is a solution. Take the freely available data and run alternate specifications. As it stands, the critique is that the perfect is the enemy of the good. If further testing reveals the labor market was not out of whack, then we have an argument.

Guy November 14, 2009 at 10:33 am

The question isn’t whether the model is perfect, but whether it is accurate enough to make a rather subtle distinction in how the labor market was valuing two highly correlated skills, OBP and SLG, over short periods of time. Their claim, as you say, is that the market was assigning no value to OBP in 2001. But we know this is wrong. Their own later paper showed (see table 3) that the market was valuing both batting avg and the ability to draw BBs long before Moneyball, from 1986 thru 2003. So by definition, it was valuing OBP. And the relative value of BA and BBs was actually not far from what H-S say is correct. You can argue from their data that power was overvalued (though I think their power metric is too flawed to be sure), but their data proves conclusively that OBP and BBs were valued.

I agree we shouldn’t cherry-pick one year with an odd model to criticize. But the real point there is that these coefficients fluctuate wildly at the annual level — see table 5 in the 2nd paper. One-year models just don’t work — but H-S insist on drawing conclusions from individual years.

But let’s not neglect the other major claim of the paper, that in one year, 2004, OBP went from being severely under-valued to properly valued. However, this cannot possibly be true, because at least 80% of the players in their sample were either in a multi-year FA contract signed before 2004 or were still subject to arbitration decisions by arbitrators whose decisions are governed entirely by precedents set in 2003 or before. It is literally impossible for player salaries to adjust like this in a single year. Putting aside the details of whether Hakes and Sauer’s model was or was not properly specified, the simple fact is that they cannot possibly be right about this; it’s an economic and mathematical impossibility. Doesn’t that matter? To choose to believe a model over what we know about the baseball salary market makes no sense. Econometric tools can be powerful, but still need to be used within the constraints of good judgement and common sense.

David November 15, 2009 at 10:07 am

You know what’s funny here? You don’t even need to have the slightest bit of knowledge about econometrics or statistics to know that JC is wrong. That’s the beauty of this argument. The H&S model does not work and saying it does over and over will not make it true.

Ken November 16, 2009 at 11:39 am

Can you define when an empirical analysis “works”? The authors set out to test a specific point, and then presented their results in as simple a way as possible. That point was simply that OBP was under-valued relative to SLG. I can guarantee that they have run a large number of different regressions in preparing this paper. The testing of alternative specifications is common, but as is also common they don’t all make it into the final paper – if those additional regressions don’t show anything that overturns the basic point then it makes sense to leave them out.

The H&S model works to show that OBP was undervalued relative to SLG, and that this issue has corrected itself. It does not claim to account for everything that affects salary – it doesn’t include age effects, it doesn’t include anything about defense – but these issues are not central to the main argument of the paper and are therefore not important here.
