I recently migrated my Lahman and PitchFx databases from MySQL to PostgreSQL, with phenomenal results. The verdict: Postgres blows MySQL away, and here’s why…
Nulls Treated as Zero
As far as sabermetrics and MySQL go, the main thing you have to know is that MySQL (at least in non-strict SQL mode) stores blank values and empty strings as zeros when they are loaded into numeric columns, instead of keeping them as nulls. Why is this a problem? Consider a MySQL query for wOBA. Since the missing values have become zeros, it returns some bad math.
Consider the following table pulled from a MySQL instance of Lahman (sorted by year ascending). Guys like George Wright and Al Spalding shouldn’t have wOBA values per my formula, because things like sacrifice flies, hit-by-pitches, and intentional walks weren’t tracked back then.
So MySQL is telling me that Shoeless Joe Jackson, whose intentional walks weren’t counted, supposedly had zero IBBs in his career. Then MySQL has the audacity to calculate a wOBA based on those zeros(!). No way, that’s just bad math!
By the way, the wOBA calculation that I used for this is at the bottom of the post and uses data from Fangraphs to calculate the wOBA score for each year.
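To make the null-versus-zero difference concrete, here is a minimal sketch. It assumes an in-memory SQLite database (via Python's standard library) as a stand-in for a real server, a hypothetical batting table, and a made-up stat line. It only computes the wOBA denominator, but the same rule applies to the full formula: when the untracked stats are stored as nulls, the result is null, and when they are stored as zeros, you get a number that looks legitimate.

```python
import sqlite3

# SQLite stands in for the real database here (an assumption for this sketch).
# Like PostgreSQL, it follows the SQL rule that arithmetic with NULL gives NULL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batting (label TEXT, AB INT, BB INT, IBB INT, SF INT, HBP INT)"
)

# The same made-up season stored twice: once with the untracked stats as NULL,
# once with them stored as 0.
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("untracked_as_null", 500, 40, None, None, None),
        ("untracked_as_zero", 500, 40, 0, 0, 0),
    ],
)

# The wOBA denominator from the query at the bottom of the post.
pa_by_label = dict(
    conn.execute("SELECT label, AB + BB - IBB + SF + HBP FROM batting")
)

print(pa_by_label["untracked_as_null"])  # None: the database admits it cannot know
print(pa_by_label["untracked_as_zero"])  # 540: a confident-looking number
```

With nulls preserved, a player from the era before these stats were tracked simply gets no wOBA, which is the honest answer.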
Extensions (R and Python)
This one is a gimme. If you like advanced statistics, you probably use R, and maybe Python. Postgres has extensions for both (PL/R and PL/Python), which basically act as “wrappers” that let you write database functions in either language. You may not use the PL/R extension for the heavy lifting, but it’s nice to have it built right in!
Common Table Expressions
CTEs are also available in MS SQL Server, but not in MySQL before version 8.0. This syntax is normally used to break much larger queries into named pieces, and CTEs are really handy because they support recursion, which allows them to refer to themselves, unlike sub-selects. Therefore, you are able to cut down on syntax and often on system cost as well.
This may not be a big deal in the Lahman instance but when dealing with a large amount of data in a PitchFx database, it’s a huge help.
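As a rough illustration of both uses, here is a small sketch. It assumes an in-memory SQLite database as a stand-in for Postgres (the WITH syntax shown here is the same in both), and the batting table, player IDs, and numbers are made up. The first query names a result once and references it twice; the second uses a recursive CTE, one that refers to itself, to generate a list of seasons.

```python
import sqlite3

# In-memory SQLite as a stand-in for Postgres; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batting (playerID TEXT, yearID INT, HR INT)")
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?)",
    [
        ("player_a", 2001, 10),
        ("player_a", 2002, 20),
        ("player_b", 2001, 30),
        ("player_b", 2003, 5),
    ],
)

# Plain CTE: define the career totals once, then reference them twice
# (once for the rows, once for the average they are compared against).
above_average = conn.execute("""
    WITH career AS (
        SELECT playerID, SUM(HR) AS hr FROM batting GROUP BY playerID
    )
    SELECT playerID, hr FROM career
    WHERE hr > (SELECT AVG(hr) FROM career)
    ORDER BY playerID
""").fetchall()

# Recursive CTE: "seasons" refers to itself to count from 2001 up to 2003.
hr_by_season = conn.execute("""
    WITH RECURSIVE seasons(yearID) AS (
        SELECT 2001
        UNION ALL
        SELECT yearID + 1 FROM seasons WHERE yearID < 2003
    )
    SELECT s.yearID, COALESCE(SUM(b.HR), 0)
    FROM seasons s LEFT JOIN batting b ON b.yearID = s.yearID
    GROUP BY s.yearID
    ORDER BY s.yearID
""").fetchall()

print(above_average)
print(hr_by_season)
```

Doing the first query with sub-selects would mean writing the career totals twice.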
Full Joins
MySQL also lacks a FULL JOIN clause. MySQL can emulate a FULL JOIN by combining a LEFT JOIN and a RIGHT JOIN with a UNION, but that requires more syntax and more system time. While you may not use this clause as much in your Lahman instance, it can be very valuable elsewhere.
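Here is a sketch of what that emulation looks like, again assuming an in-memory SQLite database with two hypothetical tables and made-up rows. One variation from the description above: the second half is written as a LEFT JOIN with the tables swapped, which is equivalent to a RIGHT JOIN, because older SQLite builds support neither RIGHT JOIN nor FULL JOIN. It also filters out the already-matched rows so that UNION ALL can be used.

```python
import sqlite3

# In-memory SQLite with hypothetical tables; all names and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batting  (playerID TEXT, HR INT);
    CREATE TABLE pitching (playerID TEXT, SO INT);
    INSERT INTO batting  VALUES ('hitter_only', 30), ('two_way', 5);
    INSERT INTO pitching VALUES ('two_way', 100), ('pitcher_only', 200);
""")

# Emulated full join: every batting row with its pitching match (if any),
# plus the pitching rows that have no batting match at all.
rows = conn.execute("""
    SELECT b.playerID, b.HR, p.playerID, p.SO
    FROM batting b LEFT JOIN pitching p ON p.playerID = b.playerID
    UNION ALL
    SELECT b.playerID, b.HR, p.playerID, p.SO
    FROM pitching p LEFT JOIN batting b ON b.playerID = p.playerID
    WHERE b.playerID IS NULL
""").fetchall()

# Sort in Python by whichever playerID is present, for a stable display order.
emulated_full_join = sorted(rows, key=lambda r: r[0] or r[2])
for row in emulated_full_join:
    print(row)
```

In Postgres, the whole query collapses to a single `batting b FULL JOIN pitching p ON p.playerID = b.playerID`.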
wOBA Calculation using Fangraphs’ yearly wOBA Scores (Guts table)
# wOBA values
SELECT CONCAT(m.nameFirst, ' ', m.nameLast) AS Name,
       b.playerID, b.yearID, b.AB, b.IBB, b.SF,
       (g.wBB * (b.BB - b.IBB) + g.wHBP * b.HBP
        + g.w1B * (b.H - b.`2B` - b.`3B` - b.HR)
        + g.w2B * b.`2B` + g.w3B * b.`3B` + g.wHR * b.HR)
       / (b.AB + b.BB - b.IBB + b.SF + b.HBP) AS wOBA
FROM Batting b
JOIN Guts g ON g.yearID = b.yearID
JOIN Master m ON m.playerID = b.playerID
WHERE b.AB > 300  # the AB > 300 filter gets rid of pitchers
GROUP BY b.playerID
ORDER BY wOBA DESC;