A couple of months back I wrote an article on why batting average is obsolete as an offensive metric in baseball. The argument was based on a linear regression analysis of several other advanced metrics and used the coefficient of determination, or “R²”, to measure the results.
I finally got around to properly visualizing the results. The viz uses the same data set as my original analysis in RStudio. The full R code can be found on my GitHub page here and is also at the bottom of this post.
The scatter plot below shows the relationship of various batting metrics to actual runs scored. The tighter the grouping, the higher the correlation. I also printed the different measures on the viz. The data cover all MLB teams from 2001 to 2013 (including post-season), courtesy of the Lahman database.
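To make the "tighter grouping, higher correlation" idea concrete, here is a minimal sketch in R. It uses made-up numbers, not the Lahman data, and the names `tight_metric`, `loose_metric`, and `r_squared` are hypothetical helpers introduced only for illustration. It also shows that, for a regression with a single predictor, R² is simply the square of the correlation that `cor()` returns.

```r
# Toy illustration only: synthetic numbers, NOT Lahman data.
set.seed(42)
runs <- seq(600, 900, length.out = 30)

# Hypothetical metric that tracks runs closely (small noise)
tight_metric <- 0.250 + 0.0002 * runs + rnorm(30, sd = 0.005)
# Hypothetical metric that tracks runs loosely (large noise)
loose_metric <- 0.250 + 0.0002 * runs + rnorm(30, sd = 0.03)

# R-squared of a simple linear regression of runs on one metric
r_squared <- function(metric, runs) {
  summary(lm(runs ~ metric))$r.squared
}

tight_r2 <- r_squared(tight_metric, runs)
loose_r2 <- r_squared(loose_metric, runs)

# With one predictor, R-squared equals the squared Pearson correlation
print(all.equal(tight_r2, cor(tight_metric, runs)^2))
print(c(tight = tight_r2, loose = loose_r2))
```

Swapping the toy vectors for `Batting$R` and one of the metric columns from the script at the bottom of this post should give the corresponding values for the real data.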
And the Winner is…
OPS and wOBA were pretty close. In the regression, both metrics had a 0.95 coefficient of determination. It’s important to note that “intentional base on balls” (IBB) was not included in this particular wOBA calculation because IBB isn’t a team stat in the Lahman database. In the wOBA calculation here, IBB was captured as BB, but in many wOBA calculations BB is broken down into “intentional” and “unintentional” with IBB being weighted slightly higher. With the inclusion of IBB, I believe wOBA to be the clear winner.
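For reference, the wOBA variant used in this post can be written out as below. This is transcribed from the SQL query in the R code at the bottom of the post, not from an official definition. The weights (w_BB, w_HBP, and so on) are the yearly values from the Guts table, and every walk is counted as BB because team-level IBB is not available.

```latex
\[
\mathrm{wOBA} =
\frac{w_{BB}\,BB + w_{HBP}\,HBP + w_{1B}\,(H - 2B - 3B - HR)
      + w_{2B}\,2B + w_{3B}\,3B + w_{HR}\,HR}
     {AB + BB + SF + HBP}
\]
```

The term H - 2B - 3B - HR is the number of singles, since the Lahman Teams table stores total hits rather than singles.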
library(RMySQL)

# Connect to a local MySQL copy of the Lahman database
drv = dbDriver("MySQL")
con = dbConnect(drv, user = "root", password = "password", dbname = "lahman")

## SQL to get the data
## Note the join on the table "Guts". This is a custom table that includes yearly wOBA values.
## The Guts table is only required for wOBA; you can delete the join and the wOBA calculation, or
## you can go to Fangraphs.com and download the Guts table to add to your own database.
teams = dbSendQuery(con, "
  SELECT t.yearID, t.teamID, t.name, t.AB, t.H, t.2B, t.3B, t.HR, t.R, t.SB,
    t.H / t.AB AS BA,
    (t.H + t.BB + t.HBP) / (t.AB + t.BB + t.HBP + t.SF) AS OBP,
    ((t.H + t.BB + t.HBP) / (t.AB + t.BB + t.HBP + t.SF))
      + (((t.H-t.2B-t.3B-t.HR) + (2 * t.2B) + (3 * t.3B) + (4 * t.HR))/t.AB) AS OPS,
    (t.H + t.2B + 2 * t.3B + 3 * t.HR) / t.AB AS SLG,
    (g.wBB * (t.BB) + g.wHBP * t.HBP + g.w1B * (t.H-t.2B-t.3B-t.HR)
      + g.w2B * t.2B + g.w3B * t.3B + g.wHR * t.HR)
      / (t.AB + t.BB + t.SF + t.HBP) AS wOBA
  FROM Teams t
  JOIN Guts g ON g.yearID = t.yearID
  WHERE t.yearID > 2000
")

# Pull all rows into a data frame, then release the result set and the connection
Batting = fetch(teams, n = -1)
dbClearResult(teams)
dbDisconnect(con)

# For each metric: scatter plot against runs, fitted regression line, and correlation

# Batting average
plot(Batting$R, Batting$BA, xlab = "Runs", ylab = "BA", pch = 23, col = 'red')
abline(lm(Batting$BA ~ Batting$R))
cor(Batting$BA, Batting$R)

# OBP
plot(Batting$R, Batting$OBP, xlab = "Runs", ylab = "OBP", pch = 23, col = 'purple')
abline(lm(Batting$OBP ~ Batting$R))
cor(Batting$R, Batting$OBP)

# Slugging
plot(Batting$R, Batting$SLG, xlab = "Runs", ylab = "SLG", pch = 23, col = 'green')
abline(lm(Batting$SLG ~ Batting$R))
cor(Batting$R, Batting$SLG)

# OPS
plot(Batting$R, Batting$OPS, xlab = "Runs", ylab = "OPS", pch = 23, col = 'blue')
abline(lm(Batting$OPS ~ Batting$R))
cor(Batting$R, Batting$OPS)

# wOBA
# Note: this wOBA calculation DOES NOT account for IBB; it applies the Guts weights
# over an OBP-style denominator, with every walk counted as BB.
plot(Batting$R, Batting$wOBA, xlab = "Runs", ylab = "wOBA", pch = 23, col = 'brown')
abline(lm(Batting$wOBA ~ Batting$R))
cor(Batting$R, Batting$wOBA)