For various reasons, I didn't watch many Major League Baseball games this past year. Of the few I did watch, the team I saw most was the Texas Rangers, surely thanks in large part to their signing of Japanese legend Yu Darvish. Even so, I didn't catch many of their games in 2012, and almost none after the All-Star break, but every time I watched Darvish pitch I suspected he was being robbed by umpires on called strikes. Each time, I went to the box score to check who his batterymate was that day, and whenever it turned out to be Mike Napoli, I heaved a deep sigh. Napoli looked atrocious in every aspect of catcher defense, framing included, and seemed to stand in the way of Darvish getting strikes called in his favor. Few would dispute that Napoli is not a good, or even decent, defensive catcher overall, but how did other catchers compare at framing in 2012?
If you follow the sabermetric community online, you have probably already read a couple of articles on catcher framing, notably by Max Marchi and Mike Fast. My aim in this post is neither to duplicate nor to disparage their work. Rather, I try to quantify catcher framing from a slightly different perspective and share the results with those of you reading my blog.
To compute framing skill, I used five years' worth of Pitch f/x data, briefly described in my previous post. For those who haven't read it, the dataset is corrected for park-to-park differences in velocity, movement, and location. Pitch types have not been reclassified yet, but that will be implemented in the future. For this post, I use called pitches only (i.e., called strikes and balls, intentional or unintentional) to estimate framing contribution and ability for every catcher in the Pitch f/x era. I also omitted all starting pitchers as batters, but unfortunately couldn't remove relievers from the population. Relievers rarely come to the plate, so this shouldn't have any meaningful influence.
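The filtering step above can be sketched as follows. This is a minimal illustration, not my actual code: the `Pitch` record, its field names, and the Gameday-style description strings in `CALLED_DESCRIPTIONS` are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical pitch record for illustration; field names are assumptions,
# not the schema of the dataset described in the post.
@dataclass
class Pitch:
    year: int
    pitcher: str
    batter: str
    catcher: str
    umpire: str
    batter_hand: str                  # "L" or "R"
    balls: int
    strikes: int
    description: str                  # assumed Gameday-style outcome text
    batter_is_starting_pitcher: bool

# Assumed description strings for taken pitches (called strikes and balls,
# intentional or not). Adjust to whatever labels your data actually uses.
CALLED_DESCRIPTIONS = {"Called Strike", "Ball", "Ball In Dirt", "Intent Ball"}

def called_pitches(pitches):
    """Keep called pitches only, dropping plate appearances by starting pitchers.

    Relievers batting are left in, matching the limitation described above.
    """
    return [
        p for p in pitches
        if p.description in CALLED_DESCRIPTIONS and not p.batter_is_starting_pitcher
    ]
```

Swinging strikes, fouls, and balls in play are dropped because the umpire never judges them, so they carry no framing information.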
For each combination of batter handedness and count (or 'plate count', as Tango likes to call it; I'm personally fond of the term 'batting count' or 'hitting count'), I computed the league-average called strike rate for each pitch actually thrown, separately by year (as I briefly explained in my last post, the strike zone actually varies from season to season). Then I computed each pitcher's, batter's, and umpire's net called strike rate, the mean difference between actual and expected calls on the pitches each was involved in, which helps control for the bias each participant brings to the process. Here is one example of how this works internally. Since I mentioned Darvish and Napoli at the beginning of this article, let me use that battery to illustrate. Darvish faces Mike Trout with Napoli behind the plate and Bob Davidson (oh, the nemesis of all Japanese fans!) calling the game. Darvish throws a 93 mph fastball a bit far and away, Trout keeps the bat on his shoulder, and Bob calls it a ball. According to the naive model, that 1-0 pitch to a right-handed batter in 2012 would be called a strike 87.0% of the time, and this probability serves as the reference point. The estimate is then adjusted by the hitter's, pitcher's, and umpire's own rates. Darvish's net called strike rate is -0.1%, meaning his pitches are very slightly less likely to be called strikes. Likewise, Trout's net rate is -0.5% and Bob's is +1.7%, so the naive 87.0% becomes 88.1% (87.0 - 0.1 - 0.5 + 1.7): exactly the same pitch, to a same-handed hitter, in the same count and year, is now estimated to be called a strike 88.1% of the time. Because Bob is a pitcher-friendly umpire, that battery should have benefited from his wider zone.
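The adjustment in the Darvish example is additive, so it can be sketched in a few lines. This is a minimal sketch under stated assumptions: each row is a hypothetical dict holding the 0/1 call (`is_strike`), the naive model's probability (`p_naive`), and the agent ids, and each agent's net rate is taken simply as his mean residual against the naive model, which is one plain reading of "mean differences".

```python
from collections import defaultdict

def net_rates(rows, key):
    """Net called strike rate per agent: mean of (actual call - naive probability).

    rows: hypothetical dicts with 'is_strike' (0/1), 'p_naive', and agent ids
    such as 'pitcher', 'batter', 'umpire'; key picks which agent to group by.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for r in rows:
        total[r[key]] += r["is_strike"] - r["p_naive"]
        count[r[key]] += 1
    return {k: total[k] / count[k] for k in total}

def adjusted_probability(p_naive, pitcher_rate, batter_rate, umpire_rate):
    """Shift the naive probability by each agent's net rate, clipped to [0, 1]."""
    p = p_naive + pitcher_rate + batter_rate + umpire_rate
    return min(1.0, max(0.0, p))

# The Darvish / Trout / Davidson pitch from the text.
print(round(adjusted_probability(0.870, -0.001, -0.005, 0.017), 3))  # 0.881
```

Clipping keeps the adjusted value a valid probability when several favorable rates stack on a pitch that is already a near-certain strike.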
Nonetheless, the pitch was called a ball, so Napoli is charged -0.09 runs (-0.881 times 0.102 runs, the run value of a 1-0 count) for the result. The same procedure is applied to every called pitch in every year. The count-based run values are my own calculation: I computed linear weights through the count for 2008 to 2012, excluding all intentional walks, bunts, and pitchers as batters. I also made a slight correction for the quality of the hitters who reach each specific count, since the likes of Pujols, Fielder, Mauer, et al. find themselves in 3-0 counts a lot. Here's the chart of run values for every count.
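Turning the expected probability into a run credit, as in the Napoli example, can be sketched like this. It is a minimal sketch, assuming rows are hypothetical dicts with `catcher`, `is_strike`, `p_expected` (the adjusted probability), and `run_value` (the count's value of a strike over a ball, as I read the 1-0 figure); summing per catcher gives framing runs.

```python
from collections import defaultdict

def framing_credit(is_strike, p_expected, run_value):
    """Runs credited to the catcher on one called pitch.

    Positive when the call beats expectation (an unexpected strike),
    negative when an expected strike is called a ball.
    """
    return (is_strike - p_expected) * run_value

def framing_runs_by_catcher(rows):
    """Sum per-pitch credits for each catcher over hypothetical row dicts."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["catcher"]] += framing_credit(r["is_strike"], r["p_expected"], r["run_value"])
    return dict(totals)

# The 1-0 ball from the text: 88.1% expected strike, called a ball, 0.102 runs at 1-0.
print(round(framing_credit(0, 0.881, 0.102), 3))  # -0.09
```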
With the methodology described, here are the results. Take a quick look at the table below, then move on to the next paragraph.
Our best friend Jose Molina is the King of Framers, also leading this past year with 22 runs saved. Some familiar names follow him, such as Jonathan Lucroy, Russell Martin, and Yorvit Torrealba. At the bottom, you can see some brutal framers such as Ryan Doumit and Gerald Laird, also familiar if you've already read Max's and/or Mike's work. In fact, my model correlates well with Mike's at the career level, at r = 0.92, despite a slight mismatch in the years covered (I can't compare season by season with Mike's, and Max doesn't seem to have published his results, so I didn't compare mine to his). How about the amount of regression? To make a fair comparison, I followed Mike's standard: each catcher's 2009 and 2011 seasons are paired against his 2008 and 2010 seasons, and a catcher is used only if both samples include more than 6,000 called pitches. My model produced a correlation of around r = 0.81, implying that around 2,700 pitches are needed for signal and noise to account for the same amount of variance. For comparison, Mike reported around r = 0.7 and about 4,500 pitches for regression purposes. That looks like a huge leap from Mike's result, but is it a fluke? I tried different minimum numbers of pitches caught, different pairings of years, even three-year intervals (like 2008 and 2011 vs. 2009 and 2012), consecutive single years, non-adjacent single years (like 2010 vs. 2012), both ordinary and weighted correlation, and both arithmetic and harmonic means, and most of the results came in at around 1,600 to 2,500 pitches of regression. The first pairing actually produced one of the worst outcomes. Of course, if you raise the minimum to, say, 12,000 pitches in both samples, the amount of regression needed skyrockets, but very few catchers meet that criterion in the first place. So 2,500 to 3,000 pitches may be a realistic suggestion.
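The link between a split-half correlation and the "regression amount" can be sketched with the standard reliability relation r = n / (n + k), where n is the sample size per half and k is the number of pitches at which signal and noise variances are equal. This is a sketch, not my exact procedure: using the harmonic mean of a catcher's two sample sizes as the weight in the weighted correlation is an assumption for illustration. Working backward, the figures above imply samples of roughly 10,500 pitches for Mike's numbers and roughly 11,500 for mine; that is an inference from the formula, not something either of us reported.

```python
import math

def weighted_pearson(x, y, w):
    """Pearson correlation of paired samples x and y with observation weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)

def harmonic_mean(a, b):
    """Harmonic mean of two sample sizes (one possible choice of weight or n)."""
    return 2 * a * b / (a + b)

def regression_amount(r, n):
    """Pitches at which signal and noise variances are equal.

    Rearranges r = n / (n + k) into k = n * (1 - r) / r.
    """
    return n * (1 - r) / r

# r = 0.81 at about 11,500 pitches per half (inferred, see above).
print(round(regression_amount(0.81, 11500), -2))  # 2700.0
```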
As for the spread of framing skill, the standard deviation of single-season framing runs, weighted by the number of called pitches each catcher actually caught, is about 7 or 8 runs, meaning roughly 95% of catchers fall within +/- 15 runs of framing value in a single season. I can't analyze team-switchers right now, since I don't have that data at hand. The next part is a bit technical, so if you'd rather not get caught in that swamp, feel free to skip to the "One More Thing" heading.
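The spread figure can be sketched as a weighted standard deviation with called pitches as the weights, and the 95% band as about two standard deviations under a normality assumption. This is a minimal sketch; the function names are hypothetical.

```python
import math

def weighted_sd(values, weights):
    """Weighted population standard deviation (weights = called pitches caught)."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return math.sqrt(var)

def envelope(sd, z=1.96):
    """Half-width of the band holding about 95% of catchers, assuming normality."""
    return z * sd
```

With an SD of 7.5 runs, `envelope(7.5)` gives about 14.7 runs, which is where the +/- 15 figure comes from.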
[Drawback of WOWY]
While playing around with my framing computations, however, I realized that some catchers are significantly affected by a notorious bias inherent in WOWY (With Or Without You): their values are unduly docked because their own pitchers appear skilled at persuading umpires to call strikes, and, more importantly, the effect is far larger than I had ever imagined. Brian McCann was the Braves' regular catcher for all of the years covered. He caught three quarters of all called pitches Braves pitchers threw during the period, and another one fifth were caught by David Ross, Atlanta's main backup. Fortunately for Braves fans but unfortunately for analysts, Ross is also estimated to be a competent framer, and 78% of all the called pitches he caught came as a member of the Braves. To make matters more vexing, many of Atlanta's pitchers pitched only for the team. Tim Hudson, Jair Jurrjens, Tommy Hanson, Mike Minor, Kris Medlen, Kenshin Kawakami, and Brandon Beachy pitched for no organization other than Atlanta. In the end, the only pitchers who threw at least 1,000 called pitches (roughly equivalent to 400 TBF) both as a Brave and for other organizations were Derek Lowe, Javier Vazquez, Jo-Jo Reyes, and Michael Gonzalez. The image below gives a good sense of how large and serious the effect is.
Among all pitchers who threw more than 2,000 called pitches over the past five years, at least one of them as a Brave, this graph shows each pitcher's net called strike rate, labeled with the share of his pitches that he threw as a Brave. Take Derek Lowe, the fifth pitcher from the left. He looks great at getting extra called strikes in his favor: his point sits more than 0.05 above average (meaning he gets five extra called strikes per 100 pitches compared to average, a tremendous amount). The 0.64 attached to his point means 64% of all the pitches he threw came as a Brave.
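The number attached to each pitcher in the graph can be sketched as a simple share calculation. This is an illustration, assuming one hypothetical dict per called pitch with `pitcher` and `team` keys; the 2,000-pitch floor mirrors the graph's cutoff.

```python
from collections import Counter

def team_share(rows, team, min_pitches=2000):
    """Share of each pitcher's called pitches thrown for `team`.

    rows: one dict per called pitch with 'pitcher' and 'team' keys (hypothetical layout).
    Returns only pitchers with at least one pitch for `team` and at least
    `min_pitches` called pitches overall.
    """
    total = Counter(r["pitcher"] for r in rows)
    for_team = Counter(r["pitcher"] for r in rows if r["team"] == team)
    return {
        p: for_team[p] / total[p]
        for p in for_team
        if total[p] >= min_pitches
    }
```

A share of 1 marks a pitcher whose every called pitch was caught by the same team's catchers, which is exactly the case where WOWY has nothing to compare against.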
So here's the critical point. Many points are colored green and labeled 1, clustered at around a 0.02 CS rate. To put that number in perspective, a true +.02 catcher can accumulate more than 20 runs in a year by framing alone, as much as or more than the best fielder saves in a single year. In effect, then, the model throws away everything that happened when Hudson, Jurrjens, Hanson, Minor, etc. were on the mound. And the end result? Before the adjustment, McCann was estimated to have saved an eye-popping 160 runs, by far the best figure and about 50 runs more than the runners-up, Martin and Lucroy. Ross also looked like a good framer, estimated to have helped his team by nearly 70 runs. But because the bias cripples both Atlanta catchers, Ross is now estimated at 15 runs and McCann at... negative 4 runs! That is an unreasonably huge drop. I'm not claiming that all of the surplus value of Hudson et al. should be credited to Atlanta's catchers, but the bias certainly exists, and to a tremendous extent. If you sort all catchers in descending order by the absolute difference between adjusted and unadjusted runs, ten of the top 20 caught for only one team, and another four caught more than four fifths of their called pitches for a single team. And while there's no doubt that Jose Molina is a very talented framer, one reason his excellence stands out in the ranking is that he caught for three teams without a disproportionate share of playing time with any one of them, and two of those three teams employed many pitchers who changed teams at least once in the past five years. In summary, WOWY is a great approach, useful in many scientific fields, and one of my favorite methods in sabermetric analysis.
It's very effective at the career level (Tango did some catcher analysis back in 2008 in the THT Annual), and still good enough over shorter spans (five years in this case), but the bias erodes the estimates more and more as the span shrinks, and the damage is far greater than I initially expected. I hope to tackle this issue in the future, but for now I can't go further into it.
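A toy example makes the absorption effect concrete. This sketch is hypothetical and deliberately simplified: it adjusts for pitchers only (the real model also adjusts for batters and umpires), uses noise-free residuals, and invents two catchers, "C" with a true +0.02 framing effect and "X" with none. Pitcher "H" throws only to C, like Hudson and company; pitcher "L" splits his pitches between C and X, like Lowe.

```python
from collections import defaultdict

def mean_by(rows, key_index, value_fn):
    """Average value_fn(row) within each group identified by row[key_index]."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        k = row[key_index]
        sums[k] += value_fn(row)
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

def catcher_credit_after_pitcher_adjustment(rows):
    """rows: (pitcher, catcher, residual), residual = actual call - naive probability.

    Step 1: pitcher net rate = mean residual over his pitches.
    Step 2: catcher credit = mean of (residual - pitcher net rate) over his pitches.
    """
    pitcher_rate = mean_by(rows, 0, lambda r: r[2])
    return mean_by(rows, 1, lambda r: r[2] - pitcher_rate[r[0]])

# Residuals equal the catcher's true effect; the pitchers have no skill of their own.
rows = ([("H", "C", 0.02)] * 100
        + [("L", "C", 0.02)] * 50
        + [("L", "X", 0.0)] * 50)
credit = catcher_credit_after_pitcher_adjustment(rows)
# H's rate soaks up all of C's +0.02, so C earns nothing on H's pitches;
# C ends up near +0.0033 and X at -0.01 instead of +0.02 and 0.
```

Only the pitches from the team-switcher carry any information about C, which is the same reason McCann's estimate collapses once Hudson and company are effectively thrown out.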
*** Actually, McCann is an interesting subject in yet another respect. Did anyone notice that Mike's model and mine take starkly different views of his contribution, despite a similar approach? The reason is that his contribution in run-value terms is skewed: he tended to fail in the counts where the impact of the call matters most. This may be easier to grasp by analogy with the timing of events in ERA, but unlike sequencing in ERA, we (or I?) know little or nothing about how much of the timing of framing is signal and how much is noise, and I think it should be researched further down the road. For now, if I set the run values of all counts to a single uniform value, McCann is instead estimated to have saved 10 runs. That said, most catchers are not subject to such a steep swing. In fact, the only other catcher whose value moves by a similar magnitude is Miguel Montero, in the same direction. Taking the mean absolute difference across all catchers, weighted by called pitches, the calculation gives slightly less than 5 runs, with a weighted standard deviation of a bit less than 4 runs.
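The uniform-versus-count-specific comparison can be sketched as follows. This is a hypothetical illustration: rows are dicts with `count`, `is_strike`, and `p_expected`, and using the pitch-weighted average of the count values as the "uniform" value is my assumption about one sensible choice, not necessarily the one behind the McCann figure.

```python
def pitch_weighted_average(rows, run_value_by_count):
    """One possible uniform value: the count run values averaged over all called pitches."""
    return sum(run_value_by_count[r["count"]] for r in rows) / len(rows)

def framing_runs(rows, run_value_by_count=None, uniform_value=None):
    """Sum of (is_strike - p_expected) * run value over a catcher's called pitches.

    Uses count-specific values unless a single uniform_value is supplied.
    """
    total = 0.0
    for r in rows:
        rv = uniform_value if uniform_value is not None else run_value_by_count[r["count"]]
        total += (r["is_strike"] - r["p_expected"]) * rv
    return total

def weighted_mean_abs_diff(a, b, weights):
    """Mean |a - b| across catchers, weighted by called pitches."""
    return sum(w * abs(x - y) for x, y, w in zip(a, b, weights)) / sum(weights)
```

A catcher who loses calls mostly in high-leverage counts (such as 3-2) and wins them in low-leverage ones looks worse under count-specific values than under a uniform value, which is the pattern described for McCann.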
[One More Thing]
Having dived deep into the technical part, are you exhausted? Then let's return to the topic from the introduction. As I said at the beginning, I felt that Darvish was badly robbed by umpires on called strikes early in the year (I suspect I watched only a few of his starts after the first three months). So did my gut hold true? Or was I the one fooled by the umpires?
In part 2, I plan to examine framing in a variety of situations, such as park, base-out state, home/away, run differential, pitch type, etc. It won't be out for at least two weeks, so stay tuned and wait patiently for the next installment.