Tuesday, December 25, 2012

Catcher Framing Runs, Part 1


 For various reasons, I didn't watch many Major League Baseball games this past year. Out of the few games I did watch, the team I saw most often was the Texas Rangers, surely thanks in large part to their signing of Japanese legend Yu Darvish. Even that didn't get me to watch many of their games in 2012, and almost none after the All-Star Game, but every time I saw Darvish pitch I suspected that umpires were shortchanging him and that he was losing some called strikes. Each time I had that impression I went to the box score to check who his batterymate was, and whenever I found that it was Mike Napoli, I heaved a deep sigh. Napoli looked atrocious in every aspect of defense behind the plate, framing included, and stood in the way of Darvish getting his pitches called strikes. Few would dispute that Napoli isn't a good, or even decent, defensive catcher overall, but how did other catchers compare in framing in 2012?

 If you follow the sabermetric corner of the Internet, you have very likely already read a couple of articles on catcher framing, notably by Max Marchi and Mike Fast. What I mean to do in this post is neither duplicate nor disparage their work. Rather, I try to quantify catcher framing skill from a slightly different perspective and report the results here to share with you.

 To compute framing skill I used five years' worth of Pitch f/x data, which I described briefly in the previous post. For those who haven't read it, my dataset is corrected between parks for velocity, movement, and location. Pitch types have not been reclassified at this time, but that will be implemented in the future. For the specific purpose of this post, I use called pitches only (i.e. called strikes and balls, intentional or unintentional) to estimate framing contribution and ability for all catchers during the Pitch f/x era. I also omitted all starting pitchers as batters, but sadly couldn't take relievers out of the population. Relievers rarely come to the plate, so this shouldn't have any meaningful influence.

 For each batter handedness and pitch count (or 'plate count', as Tango likes to call it; I'm personally fond of 'batting count' or 'hitting count'), I computed the league-average called strike rate for each actual pitch thrown, separately for each year (as I explained briefly in my last post, the strike zone actually varies from season to season). Then I also worked out each pitcher's, batter's, and umpire's net called strike rate over the actual pitches they were involved in and took the mean differences, which controls for some of the bias inherent in each party taking part in the process.

 To show how this works internally, let me use the duo from the beginning of this article. Darvish faces Mike Trout with Napoli behind the plate and Bob Davidson (oh, the nemesis of all Japanese fans!) calling the game. Darvish throws a 93 mph fastball a bit away to Trout, Trout keeps the bat on his shoulder, and Bob calls the pitch 'Ball!'. In 2012, that 1-0 pitch to a right-handed batter is estimated to be called a strike 87.0% of the time by the naive model, and this probability serves as the reference point. The estimated probability is then adjusted by the hitter's, pitcher's, and umpire's own rates. Yu Darvish's net called strike rate is -0.1%, meaning every pitch he throws is very slightly less likely to be called a strike. Likewise, Trout's net called strike rate is -0.5% and Bob's is +1.7%, so the 87.0% that the naive model spat out is adjusted: exactly the same pitch to a same-handed hitter in the same count in the same year is now estimated to be called a strike 88.1% of the time. Since Bob is a pitcher-friendly umpire, the battery could have benefited from his wider zone. Nonetheless, the pitch was called a ball, so Napoli is debited -.09 runs (-.881 times 0.102 runs, the run value of a 1-0 count) for this result. The same procedure is applied to all called pitches in all years.

 The count-based run values are my own calculation: I derived linear weights by count for 2008 to 2012, weeding out all intentional walks, bunts, and pitchers as batters. I made a slight correction for the quality of hitters in each specific count, since the likes of Pujols, Fielder, and Mauer find themselves in 3-0 counts a lot. Here's the chart of run values for every count.

Ball  Strike  Run Value
0     0       0.071
0     1       0.079
0     2       0.195
1     0       0.102
1     1       0.107
1     2       0.231
2     0       0.152
2     1       0.165
2     2       0.319
3     0       0.179
3     1       0.256
3     2       0.575
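
 The Darvish-to-Trout example above can be reproduced in a few lines of Python. This is only a minimal sketch, not the code used for this post: the function name `framing_credit` is hypothetical, the three net rates are assumed to be simply added to the naive probability (which is what the 87.0% to 88.1% arithmetic suggests), and the credit for a called strike is assumed to mirror the debit for a called ball.

```python
# Count-based run values, (balls, strikes) -> runs, copied from the chart above.
RUN_VALUE = {
    (0, 0): 0.071, (0, 1): 0.079, (0, 2): 0.195,
    (1, 0): 0.102, (1, 1): 0.107, (1, 2): 0.231,
    (2, 0): 0.152, (2, 1): 0.165, (2, 2): 0.319,
    (3, 0): 0.179, (3, 1): 0.256, (3, 2): 0.575,
}


def framing_credit(naive_prob, net_rates, balls, strikes, called_strike):
    """Runs credited to the catcher for one called pitch.

    naive_prob: league-average called-strike probability for this pitch.
    net_rates:  net called-strike rates of the pitcher, batter and umpire.
    Assumptions (not confirmed by the post): the adjustments are additive,
    and a called strike earns (1 - p) * run value, mirroring the
    -p * run value debit for a called ball.
    """
    p = min(max(naive_prob + sum(net_rates), 0.0), 1.0)
    outcome = 1.0 if called_strike else 0.0
    return (outcome - p) * RUN_VALUE[(balls, strikes)]


# Darvish (-0.1%), Trout (-0.5%), Davidson (+1.7%), 1-0 count, called a ball.
credit = framing_credit(0.870, [-0.001, -0.005, 0.017], 1, 0, False)
print(round(credit, 2))  # -0.09
```

 Summing this credit over every called pitch a catcher receives gives his framing runs under these assumptions.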

 With the methodology described, here are the results. Take a casual look at the table below, then go on to the next paragraph.

Name                  Pitches  Runs  2012  2011  2010  2009  2008
Jose Molina             23363    86    22    11    18    17    19
Russell Martin          44929    58     6    18     9    12    14
Alex Avila              26622    41    14    19     6     3    NA
Yorvit Torrealba        27770    39     5     4    13     6    10
Jonathan Lucroy         21419    36     6    22     9    NA    NA
Brayan Pena             13862    34     8    15     5     6    NA
Jeff Mathis             28164    27     7     4     3     9     5
Jarrod Saltalamacchia   26637    27    12    12    -1     5    -1
Geovany Soto            39270    26     6     4     9     5     3
Gregg Zaun              13078    25    NA    NA     5    15     4
Joe Mauer               33903    24     6     8     4     4     1
Matt Wieters            37193    22     6     7     7     2    NA
Miguel Olivo            33782    17     6     4     3     9    -4
Taylor Teagarden         9392    17    -2     1     3    13     1
Ryan Hanigan            24471    16     5     4     2     8    -3
Ivan Rodriguez          27083    15    NA    -1     9     2     6
David Ross              15660    15    -1     0     1     7     8
Chris Snyder            25146    14    -2     6    11     1    -2
Bobby Wilson            10046    14    10    -1     4     1     0
Miguel Montero          34938    12    -3     2     3     9     0
Eli Whiteside           10915    10     0     2     5     3    NA
Josh Thole              18244    10    14     1    -3    -2    NA
Yadier Molina           45474     9     4    -5     4     0     7
Paul Bako                9122     8    NA    NA    NA     3     4
Francisco Cervelli      11378     8     0     2     4     1     0
Wilson Ramos            10676     7     2     5     0    NA    NA
A.J. Pierzynski         45980     6    -4     6    -3     9    -2
John Baker              17167     6    -2     0     2     2     3
Henry Blanco            13667     5    -3     0    -1     6     4
Brian Schneider         18636     5     0     3    -2     1     4
Victor Martinez         20095     5    NA    -1     7    -3     2
Ronny Paulino           18625     5    -1    -7     1     4     7
Humberto Quintero       22020     4     0    -8     6     2     3
Kurt Suzuki             49933     2    -6     3     1    -5    10
Jesus Flores            14991     0     7     2    NA    -2    -6
Bengie Molina           26324    -1    NA    NA     3     4    -8
Josh Bard               13151    -1    NA     0     7    -6    -3
Kelly Shoppach          25856    -1     1    -3    -1    -1     3
Drew Butera             11762    -3     1    -1    -3    NA    NA
Ramon Castro            10166    -4    NA     0    -1     2    -5
Wil Nieves              14370    -4    -1    -2    -3    -2     4
Brian McCann            45845    -4     0     2   -10     0     4
Lou Marson              19314    -4     0     0     0    -4     0
Buster Posey            16529    -6    -2    -3    -1     0    NA
Rod Barajas             33616   -11    -8   -12    -3     8     3
Carlos Santana          17674   -11    -9    -6     4    NA    NA
Dioner Navarro          24181   -13    -1    -2     2    -1   -11
Matt Treanor            18052   -15    -3   -11     3     1    -4
Kenji Johjima           12984   -15    NA    NA    NA    -9    -6
Jason Varitek           24643   -16    NA    -9    -7     1    -1
J.P. Arencibia          17510   -16    -2   -12    -2    NA    NA
John Jaso               14189   -17    -1    -8    -8    NA     0
Jason Castro             9923   -17   -15    NA    -2    NA    NA
John Buck               37301   -21    -4    -6   -17     1     5
George Kottaras         13145   -21    -3    -7    -9    -2     0
Rob Johnson             15465   -21    -2    -7   -11    -1     0
A.J. Ellis              14250   -22   -10    -5    -8     1    -1
Mike Napoli             27429   -23   -10     0    -3    -4    -5
Jason Kendall           30160   -28    NA    NA    -5   -19    -4
Koyie Hill              14538   -29    -2   -10    -4   -11    -1
Nick Hundley            25302   -31    -7    -9     0    -3   -12
Ramon Hernandez         29737   -32    -5   -14    -8    -5     0
Carlos Ruiz             37880   -33     0   -16    -4    -5    -7
Chris Iannetta          29701   -35    -7    -8    -1    -4   -15
Gerald Laird            28067   -39    -8    -4    -7   -11   -10
Jorge Posada            15817   -44    NA     0   -24   -14    -6
Ryan Doumit             27070   -69   -16    -8    -2   -10   -33

 Our best friend Jose Molina is the King of Framers, and he also led this past year with 22 runs saved. There are some familiar names behind Jose, such as Jonathan Lucroy, Russell Martin, and Yorvit Torrealba. Among the laggards you can see some brutal framers such as Ryan Doumit and Gerald Laird, who will also be familiar if you have read Max's and/or Mike's stuff. In fact, my model correlates well with Mike's, at r = 0.92 career-wise, despite the slight mismatch in the years covered (I can't do a season-by-season comparison with Mike's, and it seems that Max didn't publish his results, so I didn't compare mine to his).

 How about regression amounts? To make a fair comparison I followed Mike's setup: each catcher is split into two two-year samples (2009 and 2011 vs. 2008 and 2010) and is used only if both samples include more than 6,000 called pitches. My model spat out a correlation of around r = 0.81, which means around 2,700 pitches are needed for signal and noise to contribute the same amount of variance. For reference, Mike reported around r = 0.7 and about 4,500 pitches for regression purposes. That looks like a huge leap from Mike's, so is it a fluke? I wrangled with different minimum numbers of pitches caught, different permutations of years, three-year gaps (like 2008 and 2011 vs. 2009 and 2012), consecutive single years, non-consecutive single years (like 2010 vs. 2012), ordinary and weighted correlations, and arithmetic and harmonic means, and most of the results came back with regression amounts of around 1,600 to 2,500 pitches. The first sample, in fact, produced one of the worst outcomes. Of course, if you raise the minimum requirement to, say, 12,000 pitches for both samples, the regression amount skyrockets, but very few catchers meet that criterion in the first place. So 2,500 to 3,000 pitches may be a realistic suggestion.
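
 For readers wondering how a split-sample correlation turns into a "regression amount", here is a hedged sketch. It assumes the common sabermetric relation r = n / (n + k), where n is the number of called pitches in each sample and k is the number of league-average pitches to add; the post does not spell out the exact formula used, and the 11,500-pitch sample size below is purely hypothetical.

```python
def regression_pitches(r, n):
    """League-average pitches to add, assuming r = n / (n + k)."""
    if not 0.0 < r <= 1.0:
        raise ValueError("r must be in (0, 1]")
    return n * (1.0 - r) / r


def regress_to_mean(observed_rate, n, k, league_rate=0.0):
    """Shrink an observed net called-strike rate toward the league rate."""
    weight = n / (n + k)
    return league_rate + weight * (observed_rate - league_rate)


# Hypothetical: samples averaging 11,500 called pitches with r = 0.81.
k = regression_pitches(0.81, 11500)
print(round(k))

# A catcher who shows +0.02 over 6,000 called pitches, regressed with that k.
print(regress_to_mean(0.02, 6000, k))
```

 Under this relation, a lower correlation at the same sample size implies a larger k, which is why r = 0.7 and r = 0.81 lead to such different regression amounts.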

 As to the spread of framing skill, the standard deviation of yearly framing runs, weighted by the number of called pitches each catcher actually caught, is about 7 or 8 runs, meaning roughly 95% of all catchers fall within +/- 15 runs in a single year. I can't do any analysis of team-switchers right now, since I don't have such data on hand. The next part is a bit technical, so if you don't want to get caught in that swamp, feel free to skip to the "One More Thing" headline.

[Drawback of WOWY]
 While I was playing around with my computation of framing skill, however, I realized that some catchers are significantly affected by the notorious bias inherent in WOWY: their values are unduly reduced by their own pitchers' apparent skill at persuading umpires to call strikes, and, more importantly, the effect is far larger than I had ever imagined. Take Brian McCann, the Braves' regular catcher for all of the years covered here. He caught three quarters of all called pitches Braves pitchers threw during the period, and another one fifth were caught by David Ross, Atlanta's main backup, who (fortunately for Braves fans but unfortunately for practitioners) is also estimated to be a competent framer and who caught 78% of his called pitches as a member of the Braves. To make matters more vexing, lots of Atlanta's pitchers pitched only for that team during the period: Tim Hudson, Jair Jurrjens, Tommy Hanson, Mike Minor, Kris Medlen, Kenshin Kawakami, and Brandon Beachy didn't pitch for any organization but Atlanta. In the end, the only pitchers who threw at least 1,000 called pitches (roughly equivalent to 400 TBF) both as a Brave and for other organizations were Derek Lowe, Javier Vazquez, Jo-Jo Reyes, and Michael Gonzalez. The image below gives a good sense of how large and serious the effect is.


 For all pitchers who threw more than 2,000 called pitches during the past five years, at least one of them as a member of the Braves, this graph shows each pitcher's net called strike rate, labeled with the share of his total pitches that he threw as a Brave. Take a look at Derek Lowe, the fifth pitcher from the left. He looks great at getting extra called strikes, as you can see from the location of his number, more than 0.05 above average (meaning he receives five extra strikes per 100 called pitches compared to average, a tremendous amount). The 0.64 attached to his point means he threw 64% of his pitches as a Brave.

 So here's the critical point. There are lots of points colored green and labeled 1 at around a 0.02 CS rate. To put that number into perspective, a true +.02 catcher can accumulate more than 20 runs in a year solely by framing, as good as or better than what the best fielder saves in a single year. In effect, we throw the record away completely whenever Hudson, Jurrjens, Hanson, Minor, etc. are on the mound. And the end result? Before the adjustment, McCann is estimated to have saved an eye-popping 160 runs, by far the best figure and about 50 runs more than runners-up Martin and Lucroy. Ross is also a good framer, estimated to have helped his team by nearly 70 runs. But since the bias is crushing the two Atlanta catchers, Ross is now credited with 15 runs and McCann with... negative 4 runs! An unreasonably huge drop. I'm not saying that all of the surplus value of Hudson et al. should be credited to Atlanta's catchers. However, the bias certainly exists, and to a tremendous extent. If you sort all catchers in descending order of the absolute difference between adjusted and unadjusted runs, ten of the top 20 caught for only one team and another four caught more than four fifths of their called pitches for a single team. And while there's no doubt that Jose Molina is a very talented framer, one reason his excellence shines in the ranking is that he caught for three teams without a very lopsided split of playing time, and two of those three teams employed lots of pitchers who changed teams at least once in the past five years.

 In summary, WOWY is a great approach, useful in lots of scientific fields, and one of my favorite methods in sabermetric analysis. It's very effective for career-level analysis (Tango did some catcher analysis back in 2008 in the THT Annual), and still good enough for fewer years (five in this case), but the bias erodes the results more and more, and its magnitude is far greater than I initially expected. I hope to tackle this issue in the future, but I can't go further into it at this time.
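
 To make the bias concrete, here is a toy illustration in Python. It is a deliberately simplified sketch, not my actual pipeline: it adjusts for the pitcher only (no batter or umpire), and the field names and the data are made up. A pitcher who throws every called pitch to a single catcher absorbs the whole surplus into his own net rate, so nothing is left to credit to the catcher.

```python
from collections import defaultdict


def net_rates(pitches, key):
    """Mean of (called strike - expected strike), grouped by one field."""
    total = defaultdict(float)
    count = defaultdict(int)
    for p in pitches:
        total[p[key]] += p["strike"] - p["expected"]
        count[p[key]] += 1
    return {k: total[k] / count[k] for k in total}


def catcher_credit(pitches):
    """Catcher's net rate after removing each pitcher's own net rate."""
    pitcher_net = net_rates(pitches, "pitcher")
    total = defaultdict(float)
    count = defaultdict(int)
    for p in pitches:
        resid = p["strike"] - p["expected"] - pitcher_net[p["pitcher"]]
        total[p["catcher"]] += resid
        count[p["catcher"]] += 1
    return {c: total[c] / count[c] for c in total}


# Made-up data: pitcher "A" throws 100 called pitches, all to catcher "X",
# and gets 52 strikes where 50 were expected (+0.02 per pitch).
pitches = [
    {"pitcher": "A", "catcher": "X", "expected": 0.5,
     "strike": 1.0 if i < 52 else 0.0}
    for i in range(100)
]

print(net_rates(pitches, "pitcher"))   # pitcher "A" gets the whole +0.02
print(catcher_credit(pitches))         # catcher "X" is left with about 0
```

 Whether the +0.02 really belongs to the pitcher or to the catcher cannot be told from data like this, which is exactly the problem with one-team batteries.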

 *** Actually, McCann is an interesting subject in yet another respect. Did anyone notice that Mike's model and mine take starkly different views of his contribution, despite taking a similar approach? The reason is that his contribution is skewed in terms of run values: he failed to get calls in the counts where the call matters most. This is easier to grasp by analogy with the timing of events in ERA, but unlike sequencing in ERA, we (or I?) have little or no knowledge of how much signal there is relative to noise in the timing of framing, and I think it should be researched further down the road. For the time being, note that if I set the run values in all counts to a uniform value, McCann is instead estimated to have saved 10 runs. Not all catchers are subject to such a steep swing, though. The only other catcher whose value changes by a similar magnitude is Miguel Montero, in the same direction. If you take the mean absolute difference over all catchers, weighted by the number of called pitches, the calculator spits out slightly less than 5 runs, and the weighted standard deviation is a bit less than 4 runs.

[One More Thing]
 Having dived so far into the technical part, are you exhausted? Then let's come back to the topic of the introduction. As I said at the beginning, I felt that Darvish was being badly shortchanged by umpires on called strikes early in the year (I think I watched only a few of his starts after the first three months). So did my gut hold true? Or was it I who was fooled by the umpires?

 There was lots of variation throughout the year, but you can notice some remarkable patterns. First, Darvish certainly fared badly early in the year. Second, almost all of the red bars point downward, and they are not short. Third, he benefited more often when Torrealba caught him. I should add that in the game on May 11th, Torrealba started but left after taking a blow to the head from Albert Pujols' splintered bat in the first inning, catching only one inning before Napoli took over for the next five, so I decided to list Napoli as Darvish's partner for that day. Overall, Napoli is disproportionately responsible for the framing runs lost in games Darvish started, which is no surprise at all given his poor defensive reputation. And now that Napoli is gone to the Red Sox, we may be less irritated every time we see Darvish (and all the other Rangers pitchers) take the hill in 2013 and beyond. Am I too harsh on Napoli?

 In part 2, I plan to examine framing in a variety of situations, such as parks, base-out states, home/away, run differential, pitch types, etc. It won't be out for at least two weeks, so stay tuned and wait patiently for the next release.

Tuesday, December 4, 2012

NPB park factors for 2006-2012


 One of my followers asked me yesterday on Twitter whether I had published any park factors for NPB. Actually, I computed park factors, and consequently so-called "advanced" statistics, for all players after the Japan Series last November (I don't like the idea of wOBA, WAR, FIP, etc. being classified as "advanced", but that's not what this post is about). Yesterday, however, I noticed that my script somehow hadn't grabbed the raw data correctly and had stored mistaken values in my file. I didn't realize this until he asked me, because a script raises no error when it stores a wrong value in place of the true one, and the two values differed by too little to catch with a quick and dirty eye test. I'd also like to point out that it was not my code that caused the pain but either a parsing module or the site itself, though even if that is the case I'm not going to disparage them. Since this seemed like a good time to write a generalized script for computing park factors that I can reuse down the road, I decided to redo my park calculation and upload the results to this blog.

 As far as I know, all the park factors you can find on the Web, even in Japanese (and I don't necessarily think you would do better searching in Japanese than in English for NPB-related data), are raw factors, computed simply from the actual runs scored in a specific park in a specific year, and we all know they should be tweaked a bit before being applied to actual players' statistics. I won't go into the internal structure of the calculation much, since most of you don't have much interest in NPB unless some player is about to head to MLB, let alone in its park factors, so here's a quick explanation. I took all runs scored in home games and compared them to the league context. Then I scaled the result so that the league mean is 1.00, because some games are played in regional stadiums, almost all of which are rather small and hence hitters' parks. Next I averaged each park's values over up to five years, with some small year-dependent weights, and finally regressed a bit to finish the calculation. The data below list year, team (I don't specify the park name, though I accounted for the Hiroshima Carp's change of park before 2009), league, the number of home games, raw PF, and true, regressed PF, in that order. Note that the team abbreviation YB means Yokohama BayStars and DB means DeNA BayStars, the same team after a name change due to a change of ownership. Forgive me if the way I show the PFs below looks messy, but it makes them easier to copy and paste into your own file.


year,team,league,N_home,raw_pf,true_pf
2006,Bs,pl,34,1.01,1
2006,C,cl,65,1.15,1.09
2006,D,cl,70,0.84,0.91
2006,E,pl,62,1.06,1.02
2006,F,pl,57,0.94,0.94
2006,G,cl,64,0.98,1
2006,H,pl,65,0.81,0.93
2006,L,pl,65,0.98,1.03
2006,M,pl,68,0.88,0.96
2006,T,cl,60,0.92,0.91
2006,YB,cl,66,1.12,1.12
2006,Ys,cl,65,1.26,1.12
2007,Bs,pl,48,0.89,1
2007,C,cl,66,1.07,1.09
2007,D,cl,67,0.98,0.91
2007,E,pl,67,0.94,1.01
2007,F,pl,58,0.88,0.94
2007,G,cl,63,0.91,1
2007,H,pl,70,1,0.93
2007,L,pl,70,1.08,1.04
2007,M,pl,72,1.01,0.96
2007,T,cl,62,0.92,0.91
2007,YB,cl,66,1.05,1.12
2007,Ys,cl,68,1.19,1.12
2008,Bs,pl,48,0.89,1
2008,C,cl,66,1.1,1.09
2008,D,cl,67,1,0.91
2008,E,pl,70,1.04,1.02
2008,F,pl,59,0.88,0.94
2008,G,cl,63,1.05,1
2008,H,pl,68,0.94,0.93
2008,L,pl,68,1.04,1.04
2008,M,pl,72,1.09,0.96
2008,T,cl,61,0.76,0.91
2008,YB,cl,65,1.12,1.12
2008,Ys,cl,65,0.98,1.11
2009,Bs,pl,49,1.07,0.98
2009,C,cl,67,0.97,0.98
2009,D,cl,67,0.81,0.9
2009,E,pl,70,1.04,1
2009,F,pl,60,1.02,0.94
2009,G,cl,63,0.95,0.99
2009,H,pl,69,0.92,0.95
2009,L,pl,69,1.07,1.07
2009,M,pl,72,0.88,0.99
2009,T,cl,60,0.97,0.92
2009,YB,cl,65,1.22,1.14
2009,Ys,cl,69,1.07,1.1
2010,Bs,pl,51,1.14,1
2010,C,cl,68,0.96,0.98
2010,D,cl,69,0.83,0.86
2010,E,pl,68,0.97,1.02
2010,F,pl,58,0.91,0.95
2010,G,cl,64,1.06,1.01
2010,H,pl,69,0.91,0.92
2010,L,pl,67,0.98,1.07
2010,M,pl,72,0.89,1.01
2010,T,cl,60,0.89,0.9
2010,YB,cl,65,1.14,1.15
2010,Ys,cl,66,1.09,1.11
2011,Bs,pl,58,0.81,0.99
2011,C,cl,69,1.04,0.98
2011,D,cl,70,0.75,0.86
2011,E,pl,63,0.96,1.02
2011,F,pl,61,0.88,0.95
2011,G,cl,63,0.93,1.01
2011,H,pl,68,0.89,0.91
2011,L,pl,67,1.16,1.07
2011,M,pl,72,1.02,1.02
2011,T,cl,61,0.95,0.9
2011,YB,cl,64,1.17,1.15
2011,Ys,cl,63,1.18,1.12
2012,Bs,pl,58,1,1
2012,C,cl,68,0.85,0.97
2012,D,cl,67,0.77,0.86
2012,E,pl,68,1.04,1.02
2012,F,pl,58,0.94,0.95
2012,G,cl,64,0.99,1.01
2012,H,pl,68,0.79,0.91
2012,L,pl,68,1.06,1.07
2012,M,pl,72,1.15,1.02
2012,T,cl,60,0.76,0.89
2012,YB,cl,63,1.1,1.15
2012,Ys,cl,66,1.24,1.12
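
 For those who want to play with the numbers above, the rescaling, multi-year averaging, and regression steps can be sketched as follows. This is only an illustration under stated assumptions: the function names, the year weights, and the regression amount of 100 games are all hypothetical, and the output is not meant to reproduce the true_pf column exactly.

```python
def rescale(raw_pfs):
    """Scale one league-season of raw park factors so the mean is 1.00."""
    mean = sum(raw_pfs.values()) / len(raw_pfs)
    return {team: pf / mean for team, pf in raw_pfs.items()}


def smoothed_pf(yearly, weights, regress_games=100):
    """Weighted multi-year park factor, regressed toward 1.00.

    yearly:        list of (raw_pf, home_games), one entry per season.
    weights:       one weight per season (hypothetical values).
    regress_games: hypothetical number of league-average games to add.
    """
    num = sum(w * g * pf for (pf, g), w in zip(yearly, weights))
    den = sum(w * g for (pf, g), w in zip(yearly, weights))
    return (num + 1.0 * regress_games) / (den + regress_games)


# 2012 Central League raw factors from the listing above.
cl_2012 = {"C": 0.85, "D": 0.77, "G": 0.99, "T": 0.76, "YB": 1.10, "Ys": 1.24}
print(rescale(cl_2012))

# Hanshin (T), seasons 2012, 2011 and 2010 from the listing above,
# with made-up weights that favor the most recent season.
tigers = [(0.76, 60), (0.95, 61), (0.89, 60)]
print(smoothed_pf(tigers, weights=[1.0, 0.9, 0.8]))
```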


Monday, December 3, 2012

What kind of pitches are harder or easier to bunt?


 The sacrifice bunt has long been a fixture of the sabermetric blogosphere, discussed for its relative lack of value in a baseball game and often criticized when a manager abuses it dreadfully in a given game. Even if you're a sabermetric newcomer who just happened upon this blog, you have probably already read at least a couple of articles elsewhere on the validity of bunting that cite run expectancy or that kind of stuff, as long as you have a decent passion for baseball and are willing to cultivate your own insights. The sacrifice bunt has been decidedly one of the most controversial topics among baseball fans over the past couple of decades, but fewer articles have been published on the relative difficulty of executing a successful bunt against different types of pitches. What I'm going to write about today is the pitcher-batter confrontation on bunt attempts within the at-bat.

 Let me first show some data before cutting to the chase, since this is an area I was a bit interested in. Remember that most hitters feel more comfortable facing opposite-handed pitchers and vice versa, which is usually known as the platoon split. Managers construct their lineups with some attention to the opposing starter's handedness, and once the game is on and they have to call for a pinch hitter or a new reliever, the platoon advantage is always somewhere in their minds, even though they often treat all players as having identical splits and ignore players' individual differences, or stick excessively to the results of recent matchups and ignore any warning uttered by the Regression God. But is there any such tendency when it comes to the sacrifice bunt, and if so, how large is it?

 From 1993 to 2011, I took all events that I define as sacrifice bunt attempts. My definition covers all situations with 1) a runner on 1st and fewer than 2 outs, 2) a runner on 2nd and fewer than 2 outs, or 3) runners on 1st and 2nd and fewer than 2 outs. In those situations, respectively, I regard a bunt as successful if 1) the runner on 1st advanced to 2nd or further, 2) the runner on 2nd advanced to 3rd or further, or 3) both runners advanced at least one base and no more than one out was recorded on the play. All other outcomes, along with any pitches that batters attempted to bunt but missed or fouled off (i.e. a swinging strike, foul tip, or foul bunt on a bunt attempt), are considered failed bunts. I should also point out that my definition works pitch by pitch, so if a bunter fouled off two straight pitches on bunt attempts but finally succeeded in advancing the runner with a bunt on the 0-2 pitch, he is credited with one success and debited for two failures.

 However, pitches where the bunter squared around but the pitch was out of the zone and called a ball are not included in my analysis, since I can't tell them apart from ordinary non-bunt approaches. The same goes for called strikes where the batter squared to bunt but pulled the bat back during the pitch's flight. Then, for each batter, I computed the bunt success rate against right- and left-handed pitchers separately and calculated the differences, weighting each batter by the smaller of his two attempt counts. OK, are you with me so far? Let's check out the results.
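
 The weighting by "a lesser side" of each batter's attempts may be easier to follow in code. Here is a minimal sketch; the function name, the field names, and the two batters are made up for illustration, and only the weighting rule itself comes from the description above.

```python
def weighted_platoon_gap(batters):
    """Mean of (success rate vs LHP - success rate vs RHP) over batters.

    Each batter is weighted by the smaller of his two attempt counts,
    so a batter with 40 attempts vs righties but only 10 vs lefties
    counts as 10. Batters with no attempts on one side are skipped.
    """
    num = 0.0
    den = 0.0
    for b in batters:
        if b["att_r"] == 0 or b["att_l"] == 0:
            continue
        weight = min(b["att_r"], b["att_l"])
        gap = b["suc_l"] / b["att_l"] - b["suc_r"] / b["att_r"]
        num += weight * gap
        den += weight
    return num / den if den else 0.0


# Made-up sample of two batters (pitch-by-pitch attempts and successes).
sample = [
    {"att_r": 40, "suc_r": 20, "att_l": 10, "suc_l": 6},   # gap +0.10, weight 10
    {"att_r": 20, "suc_r": 10, "att_l": 30, "suc_l": 15},  # gap  0.00, weight 20
]
print(weighted_platoon_gap(sample))
```

 Weighting by the smaller count keeps a batter with a handful of attempts on one side from dominating the average.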

 According to my research, right-handed hitters successfully sacrificed against righties 47.7% of the time, whereas against lefties their success rate rises to 48.9%, a difference of only 1.2 percentage points. How about left-handed hitters? Their success rates against righties and lefties are 46.6% and 46.4% respectively, a difference of only 0.1 percentage points (rounding aside). So basically, success on a sacrifice bunt has little or nothing to do with the platoon advantage.

 Let me point out one more thing before going into details. How much is successful sacrifice bunting affected by the opposing pitcher's batted-ball tendency? I computed GB% (the number of ground balls divided by the total number of batted balls a pitcher allowed in play) for each pitcher-season from 1993 on, restricted to pitchers who allowed at least 200 balls in play in that year (all bunt events are excluded). Then I took the top and bottom 15% of the data set and defined them as the GB and FB groups respectively. For reference, the mean ground-ball rates for those two groups are 53.4% and 38.2% respectively, while among all pitchers it's 45.4%. Bunters succeeded 44.2% of the time against the GB group and 47.6% of the time against the FB group, a difference of 3.4 percentage points. Does this result from a difference in the quality of bunters each group faced? Bunters who faced GB-group pitchers succeeded 47.1% of the time against the remaining pitchers (i.e. those belonging to neither the GB nor the FB group). How about the bunters the FB group faced? Their success rate against the neutral group was also 47.1%. So basically, there is no bias in the quality of bunters each batted-ball group faced. But how about the quality of the pitchers within each group? Pitchers in the GB group allowed a .328 wOBA, and those at the other end of the spectrum allowed a .341 wOBA. As you would expect, GB pitchers as a whole were definitely better hurlers in terms of overall performance. So I took a brute-force approach, removing the poorest performers from the fly-ball group until its performance as a group matched the better one. Even so, bunters still have a bit more trouble bunting against worm burners, as the gap between the two groups narrowed by only 0.4 percentage points. I'll take a deeper look at this later, but keep it in mind for the time being.
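
 The grouping and the brute-force trimming described above could look something like this in Python. Treat it as a sketch only: the field names are hypothetical, weighting the group wOBA by batters faced is my assumption for the example, and the pitcher-seasons are fabricated just to make the code run.

```python
def split_groups(pitcher_seasons, share=0.15):
    """Return (GB group, FB group): top and bottom `share` by GB rate."""
    ranked = sorted(pitcher_seasons, key=lambda p: p["gb_rate"])
    k = max(1, int(len(ranked) * share))
    return ranked[-k:], ranked[:k]


def group_woba(group):
    """wOBA allowed by a group, weighted by batters faced (an assumption)."""
    tbf = sum(p["tbf"] for p in group)
    return sum(p["woba"] * p["tbf"] for p in group) / tbf


def trim_to_match(group, target_woba):
    """Drop the worst pitchers until the group's wOBA is at or below target."""
    kept = sorted(group, key=lambda p: p["woba"])
    while len(kept) > 1 and group_woba(kept) > target_woba:
        kept.pop()  # remove the current poorest performer (highest wOBA)
    return kept


# Fabricated pitcher-seasons: higher GB rate goes with lower wOBA allowed.
seasons = [
    {"gb_rate": 0.30 + 0.01 * i, "woba": 0.350 - 0.001 * i, "tbf": 500}
    for i in range(20)
]
gb_group, fb_group = split_groups(seasons)
trimmed_fb = trim_to_match(fb_group, group_woba(gb_group))
print(len(gb_group), len(fb_group), len(trimmed_fb))
print(group_woba(fb_group), group_woba(trimmed_fb))
```

 In this toy data the fly-ball group can never fully match the ground-ball group, so the loop stops at a single pitcher; with real data it would stop as soon as the two groups' wOBA agree.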

 So with that knowledge in mind, let's ask the always awesome Pitch f/x.

 Actually, this is my first post using Pitch f/x, so let me digress a bit to describe my data. I use the complete dataset from 2008 on, with pre-season, post-season, and All-Star games omitted but called games kept in. I also park-adjust location, movement, and velocity for all pitches at all stadiums where Pitch f/x cameras are installed. As for pitch types, I haven't done any reclassification at this time, but likely will in the future. The strike zone is based on my own definition and computation, with the borderline set at a 50% probability of a called strike. It agrees very well with Mike's horizontally but differs slightly on the vertical borders, as I make use of the raw sz_top and sz_bot parameters as well as the batter's own height. However, the slight discrepancy seems to come much more from the fact that I include more recent years (the strike zone has actually been expanding a little, especially at the bottom, in recent years, 2012 in particular) than from the extra parameters fed to the equation. And even so the difference is tiny, around one inch wider at the bottom if the three parameters are set to their average values.


[Pitch Type]
 First of all, which pitch types are easier or harder to bunt? From here through the rest of this article I classify all pitches as either fastballs (four-seamers, two-seamers, sinkers, and cutters), breaking balls (sliders, curveballs, and knuckle-curves), or changeups (change-ups and splitters) and analyse them through these three categories. Here's the result.

Pitch type  Success rate  N
Breaking    35.9%         2,648
Changeups   41.6%         1,677
Fastballs   44.6%         14,736

Fastballs turn out to be the easiest pitch to bunt, and breaking balls lie at the other end of the spectrum. As to platoon effects by pitch type, almost no difference can be detected except on fastballs, where a gap of 2 to 3 percentage points can be seen, and on right-handed pitchers' breaking balls to right-handed hitters, where the rate drops to 33.7% on 1,413 pitches. If you wonder why so many fastballs are thrown on bunt attempts (remember that the numbers cover all pitches, not just those resulting in successful bunts), it is always a good idea to suspect selection bias. Here's a wholly unrealistic example: if Miguel Cabrera steps to the plate while Prince Fielder is dozing off on 1st base, and you somehow notice (yeah, somehow!) that Miggy is going to bunt with no intention of pulling the bat back to swing away, isn't the only thing to do in that situation to throw an 80-something mph fastball down the middle and let him bunt? This is deliberately far-fetched for the purpose of illustration (and we know Jim Leyland is "smart enough" not to make Cabrera bunt), but keep that sort of bias in mind.

 Looking inside each bin, four-seamers are easier to bunt than the other pitches in their bin (by around a couple of percentage points; this is one reason bunting against GB pitchers is tougher, as you saw at the start of this article).
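
 The three-way classification used in this section can be written down as a simple lookup. This is a sketch that assumes the data carry the usual two-letter Pitch f/x pitch type labels (FF, FT, SI, FC, SL, CU, KC, CH, FS); the function name and the sample pitches are made up.

```python
# Assumed mapping from Pitch f/x pitch type labels to the three bins.
CATEGORY = {
    "FF": "fastball", "FT": "fastball", "SI": "fastball", "FC": "fastball",
    "SL": "breaking", "CU": "breaking", "KC": "breaking",
    "CH": "changeup", "FS": "changeup",
}


def success_by_category(bunt_pitches):
    """bunt_pitches: iterable of (pitch_type_label, succeeded) pairs.

    Returns {category: (success_rate, n)}; labels outside the three
    bins (knuckleballs, for example) are ignored.
    """
    tally = {}
    for label, ok in bunt_pitches:
        cat = CATEGORY.get(label)
        if cat is None:
            continue
        n, s = tally.get(cat, (0, 0))
        tally[cat] = (n + 1, s + int(ok))
    return {cat: (s / n, n) for cat, (n, s) in tally.items()}


# Made-up bunt attempts, one tuple per pitch.
sample = [("FF", True), ("FF", False), ("SL", False), ("CH", True), ("KN", True)]
print(success_by_category(sample))
```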

[Location]

 When an opposing batter looks like he is going to lay down a sac bunt, where in the zone should a pitcher throw? The graph above shows that the bunt success rate peaks down the middle horizontally, while the farther a pitch is from the hitter's body, the harder it is to bunt; vertically, the closer the ball is to the ground, the more bunts fail. This trend holds true whatever type of pitch is thrown and, both vertically and horizontally, regardless of the opposing pitcher's handedness.


[Velocity]
 How about velocity? Conventional wisdom says that the faster a fastball is thrown, the harder it is to bunt (or hit, for that matter). Does this hold true?


 The graph above supports this theory. The bunt success rate drops as fastballs get faster, though you should be aware that there is great uncertainty at 95 mph or more because pitches that fast are not seen often. Breaking balls are relatively constant in the relationship between velocity and bunt success rate (or show a slight downward trend). On changeups, you can find a somewhat awkward dip in the low 80s mph. I'll explain more on this later.
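
 To see why the high-velocity end is so uncertain, it helps to put a standard error next to each rate. Here is a minimal sketch using velocity bins and a plain binomial standard error; the 2 mph bin width and the sample pitches are made up, and this is not necessarily how the graph itself was drawn.

```python
import math


def success_by_velocity(pitches, bin_width=2.0):
    """pitches: iterable of (velocity_mph, succeeded) pairs.

    Returns {bin_start: (rate, n, standard_error)}, where the standard
    error is the binomial sqrt(rate * (1 - rate) / n).
    """
    bins = {}
    for mph, ok in pitches:
        start = math.floor(mph / bin_width) * bin_width
        n, s = bins.get(start, (0, 0))
        bins[start] = (n + 1, s + int(ok))
    out = {}
    for start, (n, s) in sorted(bins.items()):
        rate = s / n
        out[start] = (rate, n, math.sqrt(rate * (1.0 - rate) / n))
    return out


# Made-up data: 400 fastballs around 90 mph, only 20 around 96 mph.
sample = [(90.5, i < 180) for i in range(400)] + [(96.3, i < 8) for i in range(20)]
for start, (rate, n, se) in success_by_velocity(sample).items():
    print(start, n, round(rate, 3), round(se, 3))
```

 With only a handful of pitches in the fastest bin, the standard error is several times larger than in the well-populated bins, so an apparent drop there should be read with caution.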

[Movement]


 As to vertical movement, it is generally harder to bunt a ball thrown with more downward movement. For fastballs, you can see the highest success rate around a value of +9, where most four-seam fastballs are found. Change-ups also peak at the point where most pitches of that ilk are thrown (check out the density estimate on the left side of the plot). Breaking balls, however, appear to decline consistently regardless of the distribution of pitches. This is because lots of pitches in the upper part of the distribution that are classified as sliders are actually cutters, and remember that cutters are much easier to bunt than breaking balls.

 For an obvious reason, I restricted my sample to righty-vs-righty match-ups for horizontal movement. However, you won't see much of a meaningful result. My sample consists of only 7,986 pitches, it is then sliced into three separate pitch categories, and on top of that the movement values vary widely (mostly from -10 to 10), so you cannot draw any conclusion from the graph with decent certainty. Still, attempting to detect some patterns even from a less helpful dataset is better than doing nothing. So what kind of patterns can you see?


 As in the vertical movement plot, the success rate is highest where most pitches are thrown within each bin, except for breaking balls, which hit their trough there instead. As I stated in the last paragraph, I extracted only right-handed hitters and pitchers, though a similar tendency can be seen if the restriction on hitter handedness is lifted. I doubt most pitchers, if any, can control the movement of their pitches at will (at least until some comprehensive analyses are conducted and published), so you would be better off just discriminating by pitch type rather than digging too far into movement values.

 And what brings about the blip in change-ups that you see in the velocity graph? I dug through the data with some code and noticed that in the low 80s mph, change-ups are more likely to be thrown when the pitcher is ahead in the count, with slightly more vertical movement (in the downward direction, of course) and slightly lower in the zone. I'm never going to say that these are the sole factors driving the result, but I suspect that pitchers throwing more put-away pitches in that velocity range leads to the phenomenon. Also, at around 90 mph you would see more misclassified pitches, which may raise the success rate a bit.

[One more thing]
 We all know that, with a runner on and fewer than two outs, a pitcher's job as a batter is mostly to advance the runner and set the table for the top of the order. In those situations, pitchers as batters are likely to sacrifice themselves by bunting the runner over. So are those bunters able to move the runner successfully?

Pitch type   Hitters   Pitchers   Diff (pct. points)
Breaking     39.1%     29.8%      9.3
Changeups    41.7%     41.6%      0.1
Fastballs    45.9%     42.3%      3.6
Overall      44.6%     40.6%      4.0
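
 For anyone who wants to double-check the Diff column, here is a small sketch that recomputes it from the two rate columns in the table above. Note that the gap is in percentage points (Hitters minus Pitchers), not a relative percentage. The variable and function names are mine, chosen only for this illustration.

```python
# Success rates (percent) copied from the table above: (Hitters, Pitchers)
rates = {
    "Breaking":  (39.1, 29.8),
    "Changeups": (41.7, 41.6),
    "Fastballs": (45.9, 42.3),
    "Overall":   (44.6, 40.6),
}

def diff_points(hitters_pct, pitchers_pct):
    """Gap in percentage points, rounded to one decimal like the table."""
    return round(hitters_pct - pitchers_pct, 1)

diffs = {pitch: diff_points(h, p) for pitch, (h, p) in rates.items()}

for pitch, d in diffs.items():
    print(f"{pitch:<10}{d:>5.1f} points")
```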

 Actually, Pitchers consists of starters only, and relievers are included in Hitters, but who cares? If a reliever somehow comes to the plate with a runner already on base, managers most often pull him for a pinch-hitter. Ah, okay, okay, forgive my laziness and check out the above table, where you can notice an interesting finding: pitchers as batters are indefensibly awful when they are fed breaking balls. I suspect pitchers in general, even those hard-working enough to practice bunting, do so against fastballs only and not against breaking balls. So why not pick on them with sliders, curves, and more sliders? And for the pedants out there, this is hugely statistically significant (after all, Pitchers account for more than one third of the total records). That said, the count is a bigger driver of this effect, since there must be reasons why opposing pitchers bother to throw breaking balls to a pitcher at the plate in the first place. The data bear out that pitchers as batters are more likely to be fed breaking balls when behind in the count than early in the count or when ahead, but even after accounting for that effect, Pitchers still fare poorly, to the tune of 7.5 percentage points below Hitters.
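
 One possible way to "account for" the count is to compare the two groups within each count state and then weight those gaps by how often Pitchers actually see each state. The sketch below does that with toy numbers; the count labels, the counts themselves, and the function names are all made up for illustration, and this is not necessarily the exact adjustment behind the 7.5 figure.

```python
def raw_gap(hitters, pitchers):
    """Unadjusted gap in success rate (hitters minus pitchers)."""
    def rate(group):
        return sum(ok for ok, _ in group.values()) / sum(n for _, n in group.values())
    return rate(hitters) - rate(pitchers)

def adjusted_gap(hitters, pitchers):
    """Count-adjusted gap in success rate (hitters minus pitchers).

    Each argument maps a count state to (successes, attempts). The per-state
    gaps are weighted by the pitchers' share of attempts in that state, so
    both groups are compared on the same mix of counts.
    """
    total = sum(n for _, n in pitchers.values())
    gap = 0.0
    for state, (p_ok, p_n) in pitchers.items():
        h_ok, h_n = hitters[state]
        gap += (p_n / total) * (h_ok / h_n - p_ok / p_n)
    return gap

# Toy numbers, NOT from the post's dataset: count state -> (successes, attempts)
toy_hitters = {"ahead": (90, 200), "even": (120, 300), "behind": (30, 100)}
toy_pitchers = {"ahead": (20, 50), "even": (45, 150), "behind": (50, 200)}

print(f"raw gap:      {raw_gap(toy_hitters, toy_pitchers):.1%}")
print(f"adjusted gap: {adjusted_gap(toy_hitters, toy_pitchers):.1%}")
```

 In this toy example the adjustment shrinks the gap without erasing it, which is the same shape as the result described above: part of the raw difference comes from the count, and part of it remains.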

[Conclusion]
 We dove into Pitch f/x to check whether there are patterns that make batters fail at the sacrifice bunt. We learned that pitchers control many of the factors that decide whether a bunt succeeds or fails. Breaking balls are tougher to bunt than fastballs, and change-ups lie in the middle. Location also has an influence: pitches thrown down and away are harder to bunt. Velocity has a voice too, but it is actually a small voice. Movement is a bit tricky, but basically, if you can add more downward movement to your weapons (like picking sinkers instead of four-seamers), you can tilt the odds a bit in your favor. Finally, pitchers as batters are hopeless bunters when attacked with breaking balls, so if you are able to earn strikes with breaking balls, there is no reason whatsoever to turn only to your fastballs.