Lahman Database in Postgres: An Updated Guide

I recently wrote a post on the advantages of PostgreSQL for statistical analysis of the Lahman database, so I thought it would be wise to follow that up with a short “how to.”

First, go grab my table schema from here, and the CSV version of Lahman 2014 from here.

EDIT: The above schema is now good for the 2015 version.
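Once the schema has been created in your database, the CSV files still have to be loaded into the tables. Here is a minimal sketch of one way to do that: it builds a psql `\copy` command for each file. The function name `copy_command` is made up for illustration, and I'm assuming each table is simply the lowercase version of the CSV file name (e.g. `Batting.csv` goes into `batting`), so check that against the schema before running anything.

```python
from pathlib import Path


def copy_command(csv_path: str) -> str:
    # Assumption: the target table is the lowercase file name without ".csv".
    path = Path(csv_path)
    table = path.stem.lower()
    # HEADER true tells Postgres to skip the first line of the file;
    # the remaining columns are matched to the table by position.
    return f"\\copy {table} FROM '{path.as_posix()}' WITH (FORMAT csv, HEADER true)"


# Example: print commands that can be pasted into a psql session.
for name in ["Batting.csv", "Master.csv"]:
    print(copy_command(name))
```

Because the header line is skipped rather than read, the column order in the schema has to match the column order in the CSV.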

Changes in Postgres

1| Converting all table and column names to lowercase (Postgres folds unquoted names to lowercase, so this avoids having to double-quote every identifier)
2| Converting any column name that begins with a digit to begin with a letter, i.e. “2B” in the Batting table becomes “h2b” in the Postgres “batting” table.
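The two renaming rules above can be sketched as a small helper, which is handy if you need to map original Lahman column names to the Postgres ones in a script. The function name `pg_column_name` is hypothetical, and the “h” prefix is inferred from the single “2B” to “h2b” example; other digit-leading columns may be named differently in the actual schema.

```python
def pg_column_name(name: str, digit_prefix: str = "h") -> str:
    # Rule 1: lowercase everything.
    lowered = name.lower()
    # Rule 2: a name that starts with a digit gets a letter in front.
    # Assumption: the prefix is "h", based on the "2B" -> "h2b" example.
    if lowered[:1].isdigit():
        return digit_prefix + lowered
    return lowered


print(pg_column_name("2B"))
print(pg_column_name("playerID"))
```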

For Lahman 2014

Sean Lahman has (to date) released the 2014 version of his database in “beta” form. There may be changes down the road, but I keep my DB up to date and will be sure to post any changes. The biggest changes in the 2014 beta version are deletions from certain tables. The most notable is the “playerNick” column in the “master” table, though I don’t know of anyone who refers to Babe Ruth as “George” in any event…

That’s about it for now. Like I said, I tend to keep my Lahman database current, so if you have a different version, check back and I’ll make updates. Since I’m not an open-source jerk, I have to give credit to Brent Nycum, who did the original SQL tables (although his table structure is a bit old).

Happy hacking!


