One of my followers asked me on Twitter yesterday whether I had published any park factors for NPB. In fact, I computed park factors, and from them the so-called "advanced" statistics, for all players after the Japan Series last November. (I don't like the idea of wOBA, WAR, FIP, etc. being classified as "advanced", but that's not what this post is about.) Yesterday, however, I noticed that my script had somehow failed to grab the raw data correctly and had stored wrong values in my file. I didn't realize this until he asked, because a script raises no error when it simply stores a slightly different value in place of the true one, and the two values were close enough to pass a quick-and-dirty eye test. I'd also like to point out that it was not my code that caused the problem but either a parsing module or the site itself, though even so I'm not going to disparage them. Since this also seemed like a good time to write a generalized script for computing park factors that I can reuse down the road, I decided to redo my park factor calculation and upload the results to this blog.
As far as I know, all the park factors you can find on the Web, even in Japanese (and I don't necessarily think you would do better searching in Japanese than in English for NPB-related data), are raw factors, that is, factors computed only from the actual runs scored in a specific park in a specific year, and we all know those should be adjusted a bit before being applied to players' statistics. I'd rather not go deep into the internals of the calculation, since most of you don't have much interest in NPB unless some player is about to head to MLB, let alone in park factors, so here's a quick explanation. I totaled all the runs scored in each team's home games and compared that to the league context. Because some home games are played in regional stadiums, almost all of which are rather small and therefore hitter-friendly, I then rescaled the factors so that the league mean is 1.00. Next I averaged each park's values over up to five years, with small weights that depend on the year, and finally regressed the result a bit to finish the calculation.
The data below lists, in order: year, team (I don't give the park name, though I did account for the Hiroshima Carp's park change prior to 2009), league, number of home games, raw PF, and true (regressed) PF. Note that in the team abbreviations YB means Yokohama BayStars and DB means DeNA BayStars; it is the same team, renamed after an ownership change. Forgive me if the way I present the PFs below looks too messy, but it makes it easier for you to copy and paste them into your own file.
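For anyone who wants to try something similar, the steps described above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the actual script: the function names, the year weights, and the regression constant are all placeholders, and the raw factor here is simply runs per home game divided by league runs per game.

```python
from typing import Dict, List

# Hypothetical settings: the actual weights and amount of regression are not
# given in this post, so these numbers are placeholders.
YEAR_WEIGHTS = [5.0, 4.0, 3.0, 2.0, 1.0]  # most recent season first
REGRESSION_GAMES = 72.0                    # larger value = stronger pull to 1.00


def raw_park_factors(home_runs: Dict[str, int],
                     home_games: Dict[str, int]) -> Dict[str, float]:
    """Runs per home game (both teams combined) relative to league runs per game."""
    league_rpg = sum(home_runs.values()) / sum(home_games.values())
    return {team: (home_runs[team] / home_games[team]) / league_rpg
            for team in home_runs}


def rescale_to_league_mean(factors: Dict[str, float]) -> Dict[str, float]:
    """Divide by the plain average so the league mean of the factors is 1.00."""
    mean = sum(factors.values()) / len(factors)
    return {team: pf / mean for team, pf in factors.items()}


def weighted_multi_year(yearly_pf: List[float]) -> float:
    """Weighted average of up to five seasons, most recent season first."""
    recent = yearly_pf[:len(YEAR_WEIGHTS)]
    weights = YEAR_WEIGHTS[:len(recent)]
    return sum(w * pf for w, pf in zip(weights, recent)) / sum(weights)


def regress_to_mean(pf: float, games: float,
                    k: float = REGRESSION_GAMES) -> float:
    """Shrink the factor toward 1.00; fewer games means more shrinkage."""
    return 1.0 + (pf - 1.0) * games / (games + k)


if __name__ == "__main__":
    # Made-up totals for a three-team league, for illustration only.
    runs = {"A": 660, "B": 540, "C": 500}
    games = {"A": 72, "B": 72, "C": 60}
    scaled = rescale_to_league_mean(raw_park_factors(runs, games))
    for team, pf in scaled.items():
        print(team, round(regress_to_mean(pf, games[team]), 3))
```

The team totals in the example are invented; with real data you would compute one scaled factor per team per season, pass each park's seasons (most recent first) to `weighted_multi_year`, and regress the result.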