<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Science and Analytics Archives - The SERO Group</title>
	<atom:link href="https://theserogroup.com/category/data-science-and-analytics/feed/" rel="self" type="application/rss+xml" />
	<link>https://theserogroup.com/category/data-science-and-analytics/</link>
	<description>SQL Servers Healthy, Secure, And Reliable</description>
	<lastBuildDate>Wed, 14 Oct 2020 15:40:20 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>

<image>
	<url>https://theserogroup.com/wp-content/uploads/2024/07/cropped-Canister-only-1-32x32.png</url>
	<title>Data Science and Analytics Archives - The SERO Group</title>
	<link>https://theserogroup.com/category/data-science-and-analytics/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">121220030</site>	<item>
		<title>Data! Getcha Data Here!</title>
		<link>https://theserogroup.com/career-development/data-getcha-data-here/</link>
					<comments>https://theserogroup.com/career-development/data-getcha-data-here/#comments</comments>
		
		<dc:creator><![CDATA[Joe Webb]]></dc:creator>
		<pubDate>Tue, 18 Sep 2018 21:08:08 +0000</pubDate>
				<category><![CDATA[Career Development]]></category>
		<category><![CDATA[Data Science and Analytics]]></category>
		<category><![CDATA[SQL Server]]></category>
		<guid isPermaLink="false">http://theserogroup.com/?p=2133</guid>

					<description><![CDATA[<p>Today&#8217;s Braves fans have it easy. When I was a kid, being a Braves fan was hard. It was a character-building experience. Young Braves fans in the 1970s learned to become optimists and to see the silver lining in even the darkest of clouds. We had to. The team didn&#8217;t give us much to&#8230; <br /> <a class="read-more" href="https://theserogroup.com/career-development/data-getcha-data-here/">Read more</a></p>
<p>The post <a href="https://theserogroup.com/career-development/data-getcha-data-here/">Data! Getcha Data Here!</a> appeared first on <a href="https://theserogroup.com">The SERO Group</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Today&#8217;s Braves fans have it easy. When I was a kid, being a Braves fan was hard. It was a character-building experience. Young Braves fans in the 1970s learned to become optimists and to see the silver lining in even the darkest of clouds. We had to. The team didn&#8217;t give us much to work with.</p>



<p>I didn&#8217;t go to many games, but I remember flipping through the program at a game once. There was a page where fans could <a href="https://www.artofmanliness.com/articles/how-to-score-a-baseball-game-with-pencil-and-paper/" target="_blank" rel="noopener noreferrer">score the game</a>, marking each play as it occurred and helping us to recall afterward the reasons we had hope for the next game.</p>



<h3 class="wp-block-heading">Baseball, a Statistician&#8217;s Game</h3>



<p>Baseball aficionados have long kept detailed records about games played. Statistics exist going all the way back to 1871. Want to know which batter led the league in strikeouts in 1929? You can. It was the appropriately named Hack Wilson with 83. Who had the most doubles in a single season? That&#8217;s Earl Webb in 1931 with 67. Which pitcher has had the worst ERA with at least 5 games played? Phillies fans may remember Patrick Schuster&#8217;s 2016 performance, when he had a whopping 45 ERA in 6 games.</p>



<p>With all this data readily available, Major League Baseball gives IT professionals a fertile playing field for learning analytics and statistics.</p>



<h3 class="wp-block-heading">Learning Data Science with Baseball</h3>



<p>Want to learn statistical analysis with R or Python? Or perhaps you would like to explore Power BI? MLB gives us plenty of data with which to play and learn. Want to learn how to join tables and write subqueries? Again, MLB has a rich dataset for you.</p>
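
<p>As a rough sketch of the join-and-subquery practice mentioned above, the Python snippet below builds two tiny in-memory SQLite tables and finds the 1929 strikeout leader. The table names, column names, player IDs, and the second player row are hypothetical stand-ins for illustration, not the layout of any real download. Only Hack Wilson and his 83 strikeouts come from the figures cited earlier.</p>

```python
import sqlite3

# Hypothetical two-table layout, kept in memory so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (player_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE batting (player_id TEXT, year INTEGER, strikeouts INTEGER);
    INSERT INTO people VALUES ('p1', 'Hack Wilson'), ('p2', 'Sample Player');
    -- The second batting row is invented purely to give the query something to compare.
    INSERT INTO batting VALUES ('p1', 1929, 83), ('p2', 1929, 40);
    """
)

# The join attaches a name to each batting row; the subquery finds the season high.
query = """
    SELECT p.name, b.strikeouts
    FROM batting AS b
    JOIN people AS p ON p.player_id = b.player_id
    WHERE b.year = 1929
      AND b.strikeouts = (SELECT MAX(strikeouts) FROM batting WHERE year = 1929)
"""
leader = conn.execute(query).fetchall()
print(leader)
```

<p>The subquery finds the highest strikeout total for the season, and the join turns the matching player ID into a readable name.</p>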



<p>Here&#8217;s a source of data that I&#8217;ve used for testing and for creating presentations when I speak at conferences and SQLSaturdays: <a href="http://www.seanlahman.com/baseball-archive/statistics/" target="_blank" rel="noopener noreferrer">SeanLahman.com</a>. The site provides free downloads of MLB data going back to 1871. It&#8217;s available in CSV, Microsoft Access, and SQL Server formats.</p>



<p>Download the data and begin experimenting today. And if you want to develop a deeper sense of empathy, just look at the stats for the 1978 Braves, who had a record of 69-93 and gave up 150 more runs than they scored. Now that&#8217;s character building.</p>
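
<p>To make the 1978 numbers concrete, here is a small Python sketch that computes a winning percentage and run differential from CSV text. The column names (yearID, teamID, W, L, R, RA) are assumed to follow the Teams file in the download, and the inline row is a stand-in for the real file. The runs scored and runs allowed are illustrative values chosen to match the 150-run gap described above.</p>

```python
import csv
import io

# Stand-in for a Teams CSV export; the column names are an assumption and the
# R and RA values are illustrative, picked to reproduce a 150-run deficit.
SAMPLE_TEAMS_CSV = """yearID,teamID,W,L,R,RA
1978,ATL,69,93,600,750
"""


def team_summary(csv_text, year, team):
    """Return winning percentage and run differential for one team season."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["yearID"]) == year and row["teamID"] == team:
            wins, losses = int(row["W"]), int(row["L"])
            return {
                "win_pct": round(wins / (wins + losses), 3),
                "run_diff": int(row["R"]) - int(row["RA"]),
            }
    return None  # No matching team season in the data.


summary = team_summary(SAMPLE_TEAMS_CSV, 1978, "ATL")
print(summary)
```

<p>Swapping the inline text for the contents of a downloaded file should leave the function unchanged, provided the column names match.</p>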



<p>Looking for some humorous examples of data analysis gone wrong? <a href="http://theserogroup.com/2018/01/15/beware-of-spurious-correlations-when-analyzing-your-big-data/">Check out Beware of Spurious Correlations when Analyzing Your &#8220;Big Data.&#8221;</a></p>



]]></content:encoded>
					
					<wfw:commentRss>https://theserogroup.com/career-development/data-getcha-data-here/feed/</wfw:commentRss>
			<slash:comments>6</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">2133</post-id>	</item>
		<item>
		<title>Beware of Spurious Correlations when Analyzing Your &#8220;Big Data&#8221;</title>
		<link>https://theserogroup.com/management/beware-of-spurious-correlations-when-analyzing-your-big-data/</link>
		
		<dc:creator><![CDATA[Joe Webb]]></dc:creator>
		<pubDate>Mon, 15 Jan 2018 16:28:15 +0000</pubDate>
				<category><![CDATA[Data Science and Analytics]]></category>
		<category><![CDATA[Management]]></category>
		<guid isPermaLink="false">http://theserogroup.com/?p=1651</guid>

					<description><![CDATA[<p>Machine Learning. Artificial Intelligence. Data Science. Deep Learning. Big Data Analytics. These terms, and many like them, have been in the news a lot recently. And with good reason. Many organizations are taking their first tentative steps toward sifting through the vast amounts of data collected in disparate systems in search of hidden nuggets of insight to&#8230; <br /> <a class="read-more" href="https://theserogroup.com/management/beware-of-spurious-correlations-when-analyzing-your-big-data/">Read more</a></p>
<p>The post <a href="https://theserogroup.com/management/beware-of-spurious-correlations-when-analyzing-your-big-data/">Beware of Spurious Correlations when Analyzing Your &#8220;Big Data&#8221;</a> appeared first on <a href="https://theserogroup.com">The SERO Group</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Machine Learning. Artificial Intelligence. Data Science. Deep Learning. Big Data Analytics.</p>
<p>These terms, and many like them, have been in the news a lot recently. And with good reason. Many organizations are taking their first tentative steps toward sifting through the vast amounts of data collected in disparate systems in search of hidden nuggets of insight to improve operations or some other aspect of business. Rushing headlong into a statistical analysis of their systems, they hope to find the next big thing.</p>
<h3>Statistics for Statistics&#8217; Sake?</h3>
<p>But statistics for statistics&#8217; sake is not very useful. A deep analysis of data is most interesting and useful when the results can be used to predict future outcomes based on prior correlations with a relatively high degree of certainty.</p>
<p>Baseball, for example, is an industry rife with relatively meaningless statistics. The traditional metrics of batting average and earned run average have proven to add little value when evaluating a player&#8217;s contribution to a team&#8217;s chance of winning. Yet they have been given significant weight both on the field and in the back office. These statistics have been used to negotiate contracts, determine batting order, and influence the starting pitching rotation.</p>
<h3>True Key Performance Indicators</h3>
<p>Over the past ten years or so, many other statistics have been identified and found to be far more predictive of successful outcomes. For example, on-base percentage and the ability to draw walks are more indicative of a player&#8217;s offensive contributions. From a pitching perspective, the ability to induce ground balls translates into more outs, and hence more team wins, making it more telling than the traditional ERA metric. If you are interested in sports and data analytics, I would highly recommend the following two books: <a href="http://jwebb.me/16EKw80" target="_blank" rel="noopener noreferrer">Moneyball: The Art of Winning an Unfair Game by Michael Lewis</a> and <a href="http://jwebb.me/2zbmDfv" target="_blank" rel="noopener noreferrer">Big Data Baseball by Travis Sawchik</a>. Both are fascinating and entertaining reads, especially for the data-minded sports fan.</p>
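<p>To see why on-base percentage can separate hitters that batting average treats as equals, here is a short Python sketch using the standard formula, OBP = (H + BB + HBP) / (AB + BB + HBP + SF). The two stat lines are made-up examples, not real players, and the function and variable names are only for illustration.</p>

```python
def batting_average(hits, at_bats):
    """Hits divided by at-bats; walks are ignored entirely."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Times on base divided by plate appearances that count toward OBP."""
    reached = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sac_flies
    return reached / chances


# Two hypothetical hitters with identical hits and at-bats but different walk totals.
free_swinger = {"hits": 150, "walks": 20, "hit_by_pitch": 0, "at_bats": 500, "sac_flies": 0}
patient_hitter = {"hits": 150, "walks": 80, "hit_by_pitch": 0, "at_bats": 500, "sac_flies": 0}

for label, line in (("free swinger", free_swinger), ("patient hitter", patient_hitter)):
    avg = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(**line)
    print(f"{label}: AVG {avg:.3f}, OBP {obp:.3f}")
```

<p>Both hitters have the same batting average, yet the one who draws more walks reaches base noticeably more often, which is the kind of difference the traditional metric hides.</p>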
<p>Like Major League Baseball, many organizations are combing through copious amounts of data looking for ways to improve operations, sales, manufacturing, etc. As an example, Target can identify pregnant patrons based on subtle changes in their buying habits, often before they have shared their good news with relatives. Disney has used RFID in its theme parks to track queue length, visitor travel patterns, etc., so it can adjust and improve operations.</p>
<h3>Analyzing Your &#8220;Big Data&#8221;</h3>
<p>As your organization begins to evaluate using data analytics to unearth hidden correlations that may influence your strategic initiatives, remember that to be truly relevant, statistics alone cannot tell the whole story. Subject matter experts from operations, HR, engineering, manufacturing, and other areas within the business should be involved with the project. They can offer great insight and help you to ask better questions. Without their expertise, meaningless correlations can be identified and inadvertently given credence.</p>
<h3>Some Spurious Examples</h3>
<p>Some humorously extreme examples of these spurious correlations can be found at <a href="http://www.tylervigen.com/spurious-correlations" target="_blank" rel="noopener noreferrer">TylerVigen.com</a>. For example, the divorce rate in Maine has an uncanny correlation to the amount of margarine consumed per capita annually. Does spreading margarine on a biscuit cause a nasty breakup in Maine? Of course not, but the statistics imply otherwise.</p>
<p><div style="width: 690px" class="wp-caption aligncenter"><img fetchpriority="high" decoding="async" class="wp-image-1654 size-full" src="http://theserogroup.com/wp-content/uploads/2017/12/SC_Maine_Margerine.png" alt="" width="680" height="268" srcset="https://theserogroup.com/wp-content/uploads/2017/12/SC_Maine_Margerine.png 680w, https://theserogroup.com/wp-content/uploads/2017/12/SC_Maine_Margerine-300x118.png 300w" sizes="(max-width: 680px) 100vw, 680px" /><p class="wp-caption-text">Chart courtesy of TylerVigen.com.</p></div></p>
<p>Likewise, the number of drivers killed annually in collisions with trains closely parallels the amount of crude oil imported into the U.S. from Norway. If the U.S. wants to see the death toll drop to zero, should lawmakers ban importing crude oil from Norway? That&#8217;s preposterous. Well, to most of us, anyway. I don&#8217;t want to predict what Congress may do.</p>
<p><div id="attachment_1655" style="width: 690px" class="wp-caption aligncenter"><img decoding="async" aria-describedby="caption-attachment-1655" class="wp-image-1655 size-full" src="http://theserogroup.com/wp-content/uploads/2017/12/SC_Oil_Trains.png" alt="" width="680" height="268" srcset="https://theserogroup.com/wp-content/uploads/2017/12/SC_Oil_Trains.png 680w, https://theserogroup.com/wp-content/uploads/2017/12/SC_Oil_Trains-300x118.png 300w" sizes="(max-width: 680px) 100vw, 680px" /><p id="caption-attachment-1655" class="wp-caption-text">Chart courtesy of TylerVigen.com.</p></div></p>
<p>Obviously, these correlations do not imply causation. And that&#8217;s the point. Without the right subject matter experts in the room, an analytics team may draw incorrect inferences and lead the organization on a wild goose chase.</p>
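<p>A quick way to see how easily this happens is to compute a correlation coefficient for two series that have nothing to do with each other. The Python sketch below uses a hand-written Pearson correlation and two invented yearly series that both happen to decline. The numbers are hypothetical and are not taken from the charts above.</p>

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Two made-up, unrelated measurements recorded over the same six years.
series_a = [10, 9, 8, 7, 6, 5]
series_b = [50, 46, 41, 37, 30, 26]

r = pearson_r(series_a, series_b)
print(f"Pearson r = {r:.3f}")
```

<p>The coefficient comes out very close to 1 simply because both series trend downward over time. Nothing in the calculation knows or cares whether one could plausibly cause the other, which is why someone who understands the business has to weigh in.</p>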
<h3>Asking the Right Questions</h3>
<p>Asking the right questions is key. And who better to ask great questions than subject matter experts, end users, etc., when analyzing your &#8220;Big Data&#8221;?</p>
<p>Looking for some data to experiment with? Check out <a href="http://theserogroup.com/2018/09/18/data-getcha-data-here/">Data! Getcha Data Here!</a></p>
<p>Are you inviting the right people to your project&#8217;s table?</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1651</post-id>	</item>
	</channel>
</rss>
