Data! Getcha Data Here!

Today’s Braves fans have it easy. When I was a kid, being a Braves fan was hard. It was a character-building experience. Young Braves fans in the 1970s learned to become optimists and to see the silver lining in even the darkest of clouds. We had to. The team didn’t give us much to work with.

I didn’t go to many games, but I remember flipping through the program at a game once. There was a page where fans could score the game, marking each play as it occurred and helping us to recall afterward the reasons we had hope for the next game.

Baseball, a Statistician’s Game

Baseball aficionados have long kept detailed records of the games played. Statistics exist going all the way back to 1871. Want to know which batter led the league in strikeouts in 1929? You can. It was the appropriately named Hack Wilson with 83. Who had the most doubles in a single season? That’s Earl Webb in 1931 with 67. Which pitcher has had the worst ERA with at least 5 games played? Phillies fans may remember Patrick Schuster’s 2016 performance, when he had a whopping ERA of 45 in 6 games.

With all this data readily available, Major League Baseball gives IT professionals a fertile playing field for learning analytics and statistics.

Learning Data Science with Baseball

Want to learn statistical analysis with R or Python? Or perhaps you would like to explore Power BI? MLB gives us plenty of data with which to play and learn. Want to learn how to join tables and write subqueries? Again, MLB has a rich dataset for you.
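To give a feel for the kind of join and subquery practice I mean, here is a minimal sketch using Python’s built-in sqlite3 module and a tiny in-memory table. The table and column names (People, Batting, playerID, yearID, SO) are my assumptions, loosely modeled on the downloadable baseball data, so check them against the files you actually download. The player IDs are made up, and every row except Hack Wilson’s 83 strikeouts in 1929 is placeholder data.

```python
import sqlite3

# Tiny in-memory database. Table and column names are assumptions for
# illustration; verify them against the real download before reusing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People  (playerID TEXT PRIMARY KEY, nameFirst TEXT, nameLast TEXT);
CREATE TABLE Batting (playerID TEXT, yearID INTEGER, SO INTEGER);
""")

# Player IDs are made up. Only the Hack Wilson row reflects a real
# statistic (83 strikeouts in 1929); the rest is placeholder data.
conn.executemany("INSERT INTO People VALUES (?, ?, ?)", [
    ("hack_wilson", "Hack", "Wilson"),
    ("placeholder", "Sample", "Player"),
])
conn.executemany("INSERT INTO Batting VALUES (?, ?, ?)", [
    ("hack_wilson", 1929, 83),
    ("placeholder", 1929, 40),
    ("placeholder", 1930, 90),
])


def strikeout_leaders(connection, year):
    """Return (first name, last name, strikeouts) for the season's leader(s)."""
    query = """
        SELECT p.nameFirst, p.nameLast, b.SO
        FROM Batting AS b
        JOIN People AS p ON p.playerID = b.playerID      -- join stats to names
        WHERE b.yearID = ?
          AND b.SO = (SELECT MAX(SO)                      -- subquery: season high
                      FROM Batting
                      WHERE yearID = ?)
    """
    return connection.execute(query, (year, year)).fetchall()


print(strikeout_leaders(conn, 1929))
```

The subquery finds the highest strikeout total for the season, and the join attaches a name to it. Ties would come back as multiple rows. The same query shape should carry over to SQL Server with little or no change once you have the real data loaded.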

Here’s a source of data that I’ve used for testing and for creating presentations when I speak at conferences and SQLSaturdays: SeanLahman.com. The site provides free downloads of MLB data going back to 1871. It’s available as CSV, Microsoft Access, and SQL Server files.

Download the data and begin experimenting today. And if you want to develop a deeper sense of empathy, just look at the stats for the 1978 Braves, who had a record of 69-93 and gave up 150 more runs than they scored. Now that’s character building.

Looking for some humorous examples of data analysis gone wrong? Check out Beware of Spurious Correlations when Analyzing Your “Big Data.”
