Lately I’ve rediscovered the Lahman package for R. Since I’ve got a Lahman database on my localhost, I normally use a db connection in R to grab the data I need. In the process it’s easy to forget how quick and easy the Lahman package makes it to gather baseball data.
No Love for the Pitching Stats
One nice feature of the Lahman package is the “battingStats” function, which returns some common batting stats after setting the NA values to zero. While “tweaking” the battingStats function, I realized there was nothing similar in the Lahman R package for pitching stats, so I whipped one up.
To add standard pitching metrics to the Lahman R package, try the function below. Call it with “pstats <- pitchingStats()” and it will create a new data frame with the extended pitching stats. The cool thing about this function is that it can also be used if you're calling the Lahman data from your own database, as long as your data frame has the standard Lahman column names. The function returns common pitching metrics such as IP, WHIP, pitcher BABIP, K's per 9, BB per 9, strikeout percentage and walk percentage.
pitchingStats <- function(data = Lahman::Pitching,
                          idvars = c("playerID", "yearID", "stint", "teamID", "lgID"),
                          cbind = TRUE) {
  require('plyr')

  NA2zero <- function(x) {
    # Takes a column vector and replaces NAs with zeros
    x[is.na(x)] <- 0
    x
  }

  # Set needed variables for calculations
  vars <- c('IPouts', 'BB', 'H', 'HR', 'BFP', 'SO')
  d3 <- apply(data[, vars], 2, NA2zero)
  d3 <- if (is.vector(d3)) {
    as.data.frame(as.list(d3))
  } else {
    as.data.frame(d3)
  }

  # Derived metrics; each is NA when the pitcher recorded no outs
  d3 <- plyr::mutate(d3,
    IP    = IPouts / 3,
    WHIP  = ifelse(IP > 0, round((BB + H) / IP, 3), NA),
    BABIP = ifelse(IP > 0, round((H - HR) / (BFP - SO - BB - HR), 3), NA),
    K_9   = ifelse(IP > 0, round((SO * 9) / IP, 3), NA),
    BB_9  = ifelse(IP > 0, round((BB * 9) / IP, 3), NA),
    Kpct  = ifelse(IP > 0, round(SO / BFP, 3), NA),
    BBpct = ifelse(IP > 0, round(BB / BFP, 3), NA)
  )

  # Keep only the newly computed columns
  d3 <- d3[, (length(vars) + 1):ncol(d3)]
  if (cbind) data.frame(data, d3) else data.frame(data[, idvars], d3)
}
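To make the formulas concrete, here is a minimal sketch that applies the same calculations to a tiny hand-made data frame using only base R, so it runs without Lahman or plyr installed. The data frame name `toy`, the player IDs and every number in it are made up for illustration; only the column names follow the standard Lahman layout.

```r
# Hypothetical toy data with standard Lahman column names.
# The values are invented for illustration, not real pitching lines.
toy <- data.frame(
  playerID = c("pitchera01", "pitcherb01"),
  IPouts   = c(600, 0),    # second pitcher never recorded an out
  BB       = c(50, 1),
  H        = c(180, 2),
  HR       = c(20, NA),    # NA gets replaced with zero below
  BFP      = c(820, 3),
  SO       = c(200, 0)
)

# Replace NAs with zeros in the counting columns
vars <- c("IPouts", "BB", "H", "HR", "BFP", "SO")
toy[vars] <- lapply(toy[vars], function(x) {
  x[is.na(x)] <- 0
  x
})

# Same formulas as in pitchingStats(), written out in base R
toy$IP    <- toy$IPouts / 3
has_ip    <- toy$IP > 0
toy$WHIP  <- ifelse(has_ip, round((toy$BB + toy$H) / toy$IP, 3), NA)
toy$BABIP <- ifelse(has_ip,
                    round((toy$H - toy$HR) /
                            (toy$BFP - toy$SO - toy$BB - toy$HR), 3),
                    NA)
toy$K_9   <- ifelse(has_ip, round(toy$SO * 9 / toy$IP, 3), NA)
toy$BB_9  <- ifelse(has_ip, round(toy$BB * 9 / toy$IP, 3), NA)
toy$Kpct  <- ifelse(has_ip, round(toy$SO / toy$BFP, 3), NA)
toy$BBpct <- ifelse(has_ip, round(toy$BB / toy$BFP, 3), NA)

print(toy)
```

For the first row, 600 outs works out to 200 IP, a 1.15 WHIP, a .291 BABIP, 9 K's per 9 and 2.25 BB per 9. The second row has zero innings pitched, so every rate stat comes back as NA instead of dividing by zero.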
I can’t take credit for the concept of the code; I borrowed that from the “battingStats” function that was already in the package. Nonetheless, if you like quick access to a few standard metrics, this function and the Lahman package are great places to start.
As an afterthought, I also added a couple of metrics to the standard battingStats function of the package. My fork and associated code can be found on my GitHub repository here.