Jekyll2022-08-18T00:48:38+00:00https://sokolj.com/feed.xmlsokolj.comAll Things Data ScienceJohn Sokol, MSSuper Bowl Tableau Dashboard2021-02-28T00:00:00+00:002021-02-28T00:00:00+00:00https://sokolj.com/Super-Bowl-Tableau-Dashboard<!-- <img src="/assets/Super-Bowl-Dashboard/foles.jpg" > -->
<iframe src="https://public.tableau.com/views/SuperBowlWinProbabilities/SuperBowl44-53?:showVizHome=no&:embed=true" width="1030" height="830"></iframe>
<!-- https://public.tableau.com/views/SuperBowlWinProbabilities/SuperBowl44-53?:embed=y&:display_count=yes&publish=yes -->
<p class="notice--info">The dashboard was updated on 2/28/2021 with the addition of Super Bowl 55 and Super Bowl 34 - 43 data. The original post revolved around the Philadelphia Eagles winning Super Bowl 52.</p>
<h2 id="introduction">Introduction</h2>
<p>As a Giants fan, it was painful to watch the Philadelphia Eagles win their first Super Bowl. Gone are the days
of using the argument-ending question “How many Super Bowl rings do the Eagles have?”
The idea is anathema to any football fan who has a disdain for the birdgang (Giants, Cowboys, Redskins, Patriots, etc.).</p>
<p>But as someone who loves the game of football, I found Super Bowl LII an incredibly entertaining matchup that produced several milestones:</p>
<ul>
<li>1,151 total offensive yards (Patriots 613 and Eagles 538), more than 200 yards above any previous Super Bowl.</li>
<li>The Eagles are the first team in NFL history (regular season or postseason) to win a game despite allowing more than 600 yards.</li>
<li>The Eagles are the 4th team in NFL history to win the Super Bowl after having a losing record the year before.
<br />
Source: <a href="http://www.sportingnews.com/nfl/news/super-bowl-52-eagles-patriots-stats-fast-facts-records-milestones/1kbmcltvjrukzzty6cpb8796y">Sportingnews</a>.</li>
</ul>
<p>Considering how entertaining and important this game was, the following question arose:</p>
<blockquote>
<p>What were the key plays, or the turning points, that had the most impact on the game? How did those plays affect the likelihood of either team winning?</p>
</blockquote>
<p>As with any data science project, the workflow begins with a good question.</p>
<p>I created a Tableau dashboard to visually answer this question. The time remaining in the game (independent variable) is plotted against win probability (dependent variable) to show each team’s likelihood of winning after each play.</p>
<!--The true value of Tableau is the ability to swap between Super Bowls in one view, and accessing each play description using tooltips. -->
<h2 id="data-source">Data Source</h2>
<p>Determining the right data source wasn’t easy. American football does not have the luxury of a statistically mature infrastructure like baseball. But I eventually found the open-source R package <a href="https://github.com/maksimhorowitz/nflscrapR">nflscrapR</a>, written by Maksim Horowitz and Ron Yurko, which scrapes data from the official NFL API.</p>
<h3 id="nflscrapr">nflscrapR</h3>
<p>The nflscrapR GitHub page provides several examples of querying the official NFL API to help new users hit the ground running. Two important aspects of the package are the following:</p>
<ul>
<li>Game level analysis.</li>
<li>Data available from the 2009 season onward.</li>
</ul>
<p>Discovering the data goes back to 2009 made me reassess the scope of my project: why not visualize key plays and win probabilities from <em>every</em> Super Bowl between 2009 and present day?</p>
<table>
<thead>
<tr>
<th>Super Bowl</th>
<th>Date</th>
<th>Away team</th>
<th>Away team score</th>
<th>Home team</th>
<th>Home team score</th>
</tr>
</thead>
<tbody>
<tr>
<td>Super Bowl LIII (53)</td>
<td>3 February 2019</td>
<td><strong>New England Patriots</strong></td>
<td><strong>13</strong></td>
<td>Los Angeles Rams</td>
<td>3</td>
</tr>
<tr>
<td>Super Bowl LII (52)</td>
<td>4 February 2018</td>
<td><strong>Philadelphia Eagles</strong></td>
<td><strong>41</strong></td>
<td>New England Patriots</td>
<td>33</td>
</tr>
<tr>
<td>Super Bowl LI (51)</td>
<td>5 February 2017</td>
<td><strong>New England Patriots</strong></td>
<td><strong>34</strong></td>
<td>Atlanta Falcons</td>
<td>28</td>
</tr>
<tr>
<td>Super Bowl 50</td>
<td>7 February 2016</td>
<td>Carolina Panthers</td>
<td>10</td>
<td><strong>Denver Broncos</strong></td>
<td><strong>24</strong></td>
</tr>
<tr>
<td>Super Bowl XLIX (49)</td>
<td>1 February 2015</td>
<td><strong>New England Patriots</strong></td>
<td><strong>28</strong></td>
<td>Seattle Seahawks</td>
<td>24</td>
</tr>
<tr>
<td>Super Bowl XLVIII (48)</td>
<td>2 February 2014</td>
<td><strong>Seattle Seahawks</strong></td>
<td><strong>43</strong></td>
<td>Denver Broncos</td>
<td>8</td>
</tr>
<tr>
<td>Super Bowl XLVII (47)</td>
<td>3 February 2013</td>
<td><strong>Baltimore Ravens</strong></td>
<td><strong>34</strong></td>
<td>San Francisco 49ers</td>
<td>31</td>
</tr>
<tr>
<td>Super Bowl XLVI (46)</td>
<td>5 February 2012</td>
<td><strong>New York Giants</strong></td>
<td><strong>21</strong></td>
<td>New England Patriots</td>
<td>17</td>
</tr>
<tr>
<td>Super Bowl XLV (45)</td>
<td>6 February 2011</td>
<td>Pittsburgh Steelers</td>
<td>25</td>
<td><strong>Green Bay Packers</strong></td>
<td><strong>31</strong></td>
</tr>
<tr>
<td>Super Bowl XLIV (44)</td>
<td>7 February 2010</td>
<td><strong>New Orleans Saints</strong></td>
<td><strong>31</strong></td>
<td>Indianapolis Colts</td>
<td>17</td>
</tr>
</tbody>
</table>
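<p>Because nflscrapR is queried by season rather than by Super Bowl number, it can help to map one onto the other before pulling any data. The following is a minimal sketch in base R, assuming the usual convention that Super Bowl N concludes the season that began in year 1965 + N (so Super Bowl 52 ends the 2017 season); the data frame and column names are illustrative and not part of nflscrapR.</p>

```r
# Map the Super Bowls covered in the table above to NFL season years.
# Assumption: Super Bowl N concludes the season that began in 1965 + N.
super_bowl_number <- 44:53

super_bowls <- data.frame(
  super_bowl = super_bowl_number,
  # as.roman() ships with base R (utils package)
  numeral    = as.character(utils::as.roman(super_bowl_number)),
  season     = 1965L + super_bowl_number,
  stringsAsFactors = FALSE
)

print(super_bowls)
```

<p>The <code>season</code> column holds the kind of value that is passed as the first argument to <code>extracting_gameids()</code> in the retrieval code in the data cleaning section.</p>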
<h2 id="data-cleaning-in-r">Data cleaning in R</h2>
<p>This step is extremely important in the project workflow. Solid data manipulation skills can save hours of work, time that can instead be devoted to interpreting the data and applying good data visualization practices.</p>
<p>Install the nflscrapR package directly from GitHub in RStudio:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># need 'devtools' to download packages from Github</span><span class="w">
</span><span class="n">install.packages</span><span class="p">(</span><span class="s1">'devtools'</span><span class="p">)</span><span class="w">
</span><span class="n">devtools</span><span class="o">::</span><span class="n">install_github</span><span class="p">(</span><span class="n">repo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"maksimhorowitz/nflscrapR"</span><span class="p">)</span><span class="w">
</span><span class="c1"># load the package</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">nflscrapR</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>The following is R code for retrieving the win probability statistics for Super Bowl LII, and a quick ggplot line graph for exploratory data analysis. The complete R script is available <a href="https://github.com/sokolj1/Super-Bowl-Data-Visualization/blob/master/data_visualization_nflscrapR.R">here</a>.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># import additional libraries for data viz and manipulation</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w">
</span><span class="c1"># extract the statistics for the last game of the 2017 season (Super Bowl LII)</span><span class="w">
</span><span class="n">super_bowl52</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">game_play_by_play</span><span class="p">(</span><span class="n">GameID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tail</span><span class="p">(</span><span class="n">extracting_gameids</span><span class="p">(</span><span class="m">2017</span><span class="p">,</span><span class="w">
</span><span class="n">playoffs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">),</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="c1"># queries time remaining after each play, home team win probability, away team win probability, and play description </span><span class="w">
</span><span class="n">eagles_pats</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">super_bowl52</span><span class="o">$</span><span class="n">TimeSecs</span><span class="p">,</span><span class="n">super_bowl52</span><span class="o">$</span><span class="n">Home_WP_post</span><span class="p">,</span><span class="w">
</span><span class="n">super_bowl52</span><span class="o">$</span><span class="n">Away_WP_post</span><span class="p">,</span><span class="w"> </span><span class="n">super_bowl52</span><span class="o">$</span><span class="n">desc</span><span class="p">)</span><span class="w">
</span><span class="c1"># omit erroneous instances where home team win probability == away team win probability</span><span class="w">
</span><span class="n">eagles_pats_final</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">na.omit</span><span class="p">(</span><span class="n">eagles_pats</span><span class="p">[</span><span class="o">!</span><span class="p">(</span><span class="n">eagles_pats</span><span class="o">$</span><span class="n">super_bowl52.Home_WP_post</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">eagles_pats</span><span class="o">$</span><span class="n">super_bowl52.Away_WP_post</span><span class="p">),])</span><span class="w">
</span><span class="c1"># rename columns</span><span class="w">
</span><span class="n">colnames</span><span class="p">(</span><span class="n">eagles_pats_final</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"time_remaining"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Home"</span><span class="p">,</span><span class="w">
</span><span class="s2">"Away"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Play Description"</span><span class="p">)</span><span class="w">
</span><span class="c1"># ggplot of Super Bowl LII for EDA</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">eagles_pats_final</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">time_remaining</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Home</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_line</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">time_remaining</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Home</span><span class="p">,</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#c60c30"</span><span class="p">),</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.7</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_line</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">time_remaining</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Away</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#004953"</span><span class="p">),</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.7</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_reverse</span><span class="p">(</span><span class="n">breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">3600</span><span class="p">,</span><span class="w"> </span><span class="m">3300</span><span class="p">,</span><span class="w"> </span><span class="m">3000</span><span class="p">,</span><span class="w"> </span><span class="m">2700</span><span class="p">,</span><span class="w"> </span><span class="m">2400</span><span class="p">,</span><span class="w"> </span><span class="m">2100</span><span class="p">,</span><span class="w"> </span><span class="m">1800</span><span class="p">,</span><span class="w"> </span><span class="m">1500</span><span class="p">,</span><span class="w"> </span><span class="m">1200</span><span class="p">,</span><span class="w"> </span><span class="m">900</span><span class="p">,</span><span class="w"> </span><span class="m">600</span><span class="p">,</span><span class="w"> </span><span class="m">300</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">),</span><span class="w">
</span><span class="n">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"Kickoff"</span><span class="p">,</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="s2">"End of Q1"</span><span class="p">,</span><span class="s2">""</span><span class="p">,</span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="s2">"Halftime"</span><span class="p">,</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="s2">""</span><span class="p">,</span><span class="s2">"End of Q3"</span><span class="p">,</span><span class="s2">""</span><span class="p">,</span><span class="s2">""</span><span class="p">,</span><span class="s2">"End of Regulation"</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">scales</span><span class="o">::</span><span class="n">percent</span><span class="p">,</span><span class="w"> </span><span class="n">limits</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0.10</span><span class="p">,</span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ylab</span><span class="p">(</span><span class="s2">"Win Probability"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">xlab</span><span class="p">(</span><span class="s2">""</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">ggtitle</span><span class="p">(</span><span class="s2">"Super Bowl LII Win Probability Chart"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_color_manual</span><span class="p">(</span><span class="n">values</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="s2">"#004953"</span><span class="p">,</span><span class="w"> </span><span class="s2">"#c60c30"</span><span class="p">),</span><span class="w"> </span><span class="n">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"PHI"</span><span class="p">,</span><span class="w"> </span><span class="s2">"NE"</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Source: nflscrapR"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">theme</span><span class="p">(</span><span class="n">panel.background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">axis.line.x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_line</span><span class="p">(</span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#DCDCDC"</span><span class="p">),</span><span class="w">
</span><span class="n">panel.grid.major.y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_line</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="m">.1</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="o">=</span><span class="s2">"#DCDCDC"</span><span class="p">),</span><span class="w"> </span><span class="n">axis.ticks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">())</span><span class="w">
</span></code></pre></div></div>
<p><img src="../assets/Super-Bowl-Dashboard/sb_52_5_19_2019_ai.jpg" align="center" /></p>
<p>One noticeable observation is that the Eagles held the greater win probability for the majority of the game, suggesting they were in the driver’s seat except for a few minutes in the first quarter and the final minutes of the game. Although this is a useful visualization, the graphic doesn’t provide context for the data itself, such as which play caused each change in win probability. Providing that context requires another layer of data, preferably with plot interactivity. I chose Tableau to build an interactive data visualization.</p>
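<p>One rough way to surface candidate turning points before moving to Tableau is to rank plays by how much the win probability changed relative to the previous play. The sketch below uses a small toy data frame with hypothetical values (not real Super Bowl LII numbers); the column names mirror the renamed columns above, and <code>swing</code> and <code>key_plays</code> are names introduced here for illustration.</p>

```r
# Toy win-probability data: hypothetical values for illustration only
wp <- data.frame(
  time_remaining = c(3600, 3300, 2900, 2400, 1800, 900, 120, 0),
  Home = c(0.55, 0.48, 0.40, 0.35, 0.30, 0.45, 0.20, 0.00),
  desc = paste("Play", 1:8),
  stringsAsFactors = FALSE
)

# Order plays from kickoff (3600 seconds left) to the end of regulation
wp <- wp[order(-wp$time_remaining), ]

# Change in home win probability relative to the previous play;
# the first play has no predecessor, so its swing is NA
wp$swing <- c(NA, diff(wp$Home))

# Keep the three plays with the largest absolute swing (NA sorts last)
key_plays <- head(wp[order(-abs(wp$swing)), ], 3)
print(key_plays)
```

<p>Ranking by absolute change treats swings toward either team equally; whether three plays is the right cutoff is a judgment call that depends on the game.</p>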
<p>One attribute of the nflscrapR game_play_by_play data is the description of each play, which is ideal for giving the user context for each change in win probability. After extracting the win probabilities and play descriptions, the individual dataframes were concatenated using rbind(), then written to a <a href="https://github.com/sokolj1/sokolj1.github.io/blob/master/assets/Super-Bowl-Dashboard/super_bowl46_52.csv">csv file</a>.</p>
<p>Perhaps the biggest challenge was extracting and tabulating the team scores. nflscrapR only provides the scores of the possession team and the defensive team, so the scores had to be organized by possession and defense for each team, then joined together on the common fields of TimeRemaining and Super Bowl.</p>
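<p>The concatenation step can be sketched as follows. This is a simplified illustration with two tiny hypothetical data frames, assuming every per-game frame shares the same column names; the <code>super_bowl</code> label column and the output file name are assumptions made here so the combined rows can be told apart, not necessarily what the original script does.</p>

```r
# Two hypothetical per-game data frames with identical columns
sb51 <- data.frame(
  time_remaining = c(3600, 1800, 0),
  Home = c(0.5, 0.2, 0.0),
  Away = c(0.5, 0.8, 1.0),
  desc = c("play 1", "play 2", "play 3"),
  stringsAsFactors = FALSE
)
sb52 <- data.frame(
  time_remaining = c(3600, 1800, 0),
  Home = c(0.5, 0.4, 0.0),
  Away = c(0.5, 0.6, 1.0),
  desc = c("play 1", "play 2", "play 3"),
  stringsAsFactors = FALSE
)

# Label each game so rows stay distinguishable after stacking
sb51$super_bowl <- "Super Bowl 51"
sb52$super_bowl <- "Super Bowl 52"

# rbind() requires matching column names; do.call() scales to many games
all_games <- do.call(rbind, list(sb51, sb52))

# Write to a temporary location to avoid overwriting real files
out_file <- file.path(tempdir(), "super_bowl_wp_example.csv")
write.csv(all_games, out_file, row.names = FALSE)

# Read the file back to confirm it round-trips
round_trip <- read.csv(out_file, stringsAsFactors = FALSE)
```

<p>Setting <code>row.names = FALSE</code> keeps R's row index out of the file, which avoids an unnamed extra column when the csv is loaded into Tableau.</p>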
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># PHI Score</span><span class="w">
</span><span class="c1"># filters by possession team </span><span class="w">
</span><span class="n">sb52_phi_pos</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">super_bowl52</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">super_bowl52</span><span class="o">$</span><span class="n">posteam</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"PHI"</span><span class="p">)</span><span class="w">
</span><span class="c1"># queries time remaining and scores when Philly possessed the ball </span><span class="w">
</span><span class="n">sb52_phi_pos</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">sb52_phi_pos</span><span class="o">$</span><span class="n">TimeSecs</span><span class="p">,</span><span class="w"> </span><span class="n">sb52_phi_pos</span><span class="o">$</span><span class="n">PosTeamScore</span><span class="p">)</span><span class="w">
</span><span class="c1"># rename columns</span><span class="w">
</span><span class="n">colnames</span><span class="p">(</span><span class="n">sb52_phi_pos</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"TimeRemaining"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Score"</span><span class="p">)</span><span class="w">
</span><span class="c1"># filters by defensive team </span><span class="w">
</span><span class="n">sb52_phi_def</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">super_bowl52</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">super_bowl52</span><span class="o">$</span><span class="n">DefensiveTeam</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"PHI"</span><span class="p">)</span><span class="w">
</span><span class="c1"># queries time remaining and scores when Philly played defense</span><span class="w">
</span><span class="n">sb52_phi_def</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">sb52_phi_def</span><span class="o">$</span><span class="n">TimeSecs</span><span class="p">,</span><span class="w"> </span><span class="n">sb52_phi_def</span><span class="o">$</span><span class="n">DefTeamScore</span><span class="p">)</span><span class="w">
</span><span class="c1"># rename columns</span><span class="w">
</span><span class="n">colnames</span><span class="p">(</span><span class="n">sb52_phi_def</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"TimeRemaining"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Score"</span><span class="p">)</span><span class="w">
</span><span class="c1"># join both possession and defensive dataframes by common field TimeRemaining</span><span class="w">
</span><span class="n">sb52_phi_merge</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">merge</span><span class="p">(</span><span class="n">sb52_phi_pos</span><span class="p">,</span><span class="w"> </span><span class="n">sb52_phi_def</span><span class="p">,</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"TimeRemaining"</span><span class="p">,</span><span class="w"> </span><span class="n">all</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
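<p>The merged dataframe has one score column per role, with NA wherever the Eagles were not in that role, so the two columns are collapsed by taking the row-wise maximum while ignoring NA values. The rows are then reversed so that the data runs from the beginning of the game down to the end.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># collapses the possession and defense score columns into one via a row-wise max (ignoring NA values), then reverses the rows so TimeRemaining is in decreasing order</span><span class="w">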
</span><span class="n">sb52_phi_scores</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">sb52_phi_merge</span><span class="p">[</span><span class="m">1</span><span class="p">],</span><span class="w"> </span><span class="n">mycol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span class="n">sb52_phi_merge</span><span class="p">[</span><span class="m">-1</span><span class="p">],</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">max</span><span class="p">,</span><span class="w"> </span><span class="n">na.rm</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w">
</span><span class="n">sb52_phi_scores</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sb52_phi_scores</span><span class="p">[</span><span class="nf">dim</span><span class="p">(</span><span class="n">sb52_phi_scores</span><span class="p">)[</span><span class="m">1</span><span class="p">]</span><span class="o">:</span><span class="m">1</span><span class="p">,]</span><span class="w">
</span><span class="c1"># rename columns</span><span class="w">
</span><span class="n">colnames</span><span class="p">(</span><span class="n">sb52_phi_scores</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"TimeRemaining"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Away"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
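<p>To make the merge-and-collapse step easier to follow, here is a toy version with hypothetical scores and timestamps. It assumes, as in the code above, that each row of the merged frame has a score in at least one of the two columns; the object names are illustrative.</p>

```r
# Hypothetical scores for one team while on offense and on defense
pos <- data.frame(TimeRemaining = c(3600, 3000, 1800), Score = c(0, 3, 9))
def <- data.frame(TimeRemaining = c(3300, 2400, 900),  Score = c(0, 3, 9))

# Full outer join: unmatched rows get NA in the other Score column.
# merge() names the clashing columns Score.x and Score.y and sorts
# the result by TimeRemaining in increasing order.
merged <- merge(pos, def, by = "TimeRemaining", all = TRUE)

# Row-wise max over the two score columns, ignoring the NA side
scores <- cbind(merged[1], Score = apply(merged[-1], 1, max, na.rm = TRUE))

# Reverse the rows so the data runs from kickoff to the final play
scores <- scores[nrow(scores):1, ]
print(scores)
```

<p>For two columns, <code>pmax(merged$Score.x, merged$Score.y, na.rm = TRUE)</code> is a vectorized alternative to <code>apply()</code> that should give the same result here.</p>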
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># merge both PHI and NE Scores in one dataframe</span><span class="w">
</span><span class="n">sb52_scores</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">merge</span><span class="p">(</span><span class="n">sb52_phi_scores</span><span class="p">,</span><span class="w"> </span><span class="n">sb52_ne_scores</span><span class="p">,</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"TimeRemaining"</span><span class="p">)</span><span class="w">
</span><span class="n">sb52_scores</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">sb52_scores</span><span class="p">[</span><span class="nf">dim</span><span class="p">(</span><span class="n">sb52_scores</span><span class="p">)[</span><span class="m">1</span><span class="p">]</span><span class="o">:</span><span class="m">1</span><span class="p">,],</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="s2">"Super Bowl 52"</span><span class="p">,</span><span class="w"> </span><span class="n">nrow</span><span class="p">(</span><span class="n">sb52_scores</span><span class="p">)))</span><span class="w">
</span><span class="n">colnames</span><span class="p">(</span><span class="n">sb52_scores</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"TimeRemaining"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Home"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Away"</span><span class="p">,</span><span class="s2">"Super Bowl"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>Most of the code above covers tabulating the scores for the Philadelphia Eagles only. Unfortunately, not all the data was clean. I cross-checked the scores after each significant play against ESPN, and for a few games the scores were incorrect. A notable example was Super Bowl 50, where I had to manually correct the scores in the dataframe at the appropriate TimeRemaining values. Albeit a tedious process, the result of rigorous data cleaning was a separate <a href="https://github.com/sokolj1/sokolj1.github.io/blob/master/assets/Super-Bowl-Dashboard/super_bowl_scores.csv">csv file</a> that contains the time remaining, home and away scores, and corresponding Super Bowl. Now this data is ready for visualization.</p>
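<p>A manual correction of this kind could look roughly like the sketch below. All values are hypothetical and do not reproduce the actual Super Bowl 50 fixes; <code>corrections</code> and <code>is_non_decreasing</code> are names introduced here for illustration.</p>

```r
# Hypothetical score table, ordered from kickoff to the end of the game
scores <- data.frame(
  TimeRemaining = c(3600, 2700, 1800, 900, 0),
  Home = c(0, 3, 3, 10, 10),
  Away = c(0, 0, 7, 7, 7)
)

# Hypothetical corrections taken from a trusted reference box score
corrections <- data.frame(TimeRemaining = 1800, Home = 6)

# Locate the rows to fix and fail loudly if a timestamp is missing
idx <- match(corrections$TimeRemaining, scores$TimeRemaining)
stopifnot(!anyNA(idx))
scores$Home[idx] <- corrections$Home

# Sanity check: a team's score should never go down as the game progresses
is_non_decreasing <- function(x) all(diff(x) >= 0)
stopifnot(is_non_decreasing(scores$Home), is_non_decreasing(scores$Away))
```

<p>Keeping the corrections in their own small data frame leaves a record of what was changed, which is easier to audit than editing values in place.</p>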
<h2 id="tableau-dashboard">Tableau Dashboard</h2>
<p>The end product is the dashboard embedded using Tableau Public. The workbook can be downloaded by clicking on the ‘download’ icon in the bottom right corner of the dashboard.</p>
<p>Since I created this dashboard for a data visualization course, I incorporated the three major concepts learned during the course: Tufte’s Principles, Kosslyn’s Principles, and Cairo’s Wheel. A few examples of how these ideas were implemented are described below:</p>
<ul>
<li>Tufte’s Principles:
<ol>
<li>Above all else, show the data. The focus should be on the content of the data.</li>
<li>Data-ink / total ink. One should maximize the data-ink ratio, within reason.</li>
<li>Emphasize important data elements using color.
<br /></li>
</ol>
</li>
<li>Kosslyn’s Principles:
<ol>
<li>The color scheme should be cogent and simple to understand in the context of the visualized data.</li>
<li>Understanding breaks down when too much information is presented at once.</li>
<li>Provide neither too much nor too little information.
<br /></li>
</ol>
</li>
<li>Cairo’s Wheel:
<ol>
<li>Functionality (0.9) to Decoration (0.1): Measures the embellishment/decorative detail incorporated into the visualization.</li>
<li>Multidimensionality (0.7) to Familiarity (0.2): Assesses the structure of the visualization and the dimensions used to access additional data.</li>
<li>Density (0.7) to Lightness (0.3): Measures the granularity of the data.</li>
</ol>
</li>
</ul>
<p>I hope this post serves as an example of how powerful Tableau dashboards can be for visualizing data.</p>
John Sokol, MS
Dashboard visualization of Super Bowl XXXIV - LV win probabilities and play descriptions
MGMT 3125 - Introduction to Data Visualization
2019-01-12T00:00:00+00:00
https://sokolj.com/MGMT-3125-Introduction-to-Data-Visualization
<h2 id="course-information">Course Information</h2>
<h3 id="description">Description</h3>
<p>Introduction to Data Visualization provides an overview of business analytics, including the process of business intelligence and dashboard design, principles of data visualization, and effectively communicating data stories. The course uses Excel, Tableau and RStudio.</p>
<p>This is the first course with a comprehensive overview of the fundamental concepts and tools of business analytics to improve decision making and performance. It is a hands-on course designed to introduce both the principles of data visualization and industry-standard data visualization software. Students will learn visual representation methods and techniques that increase the understanding of complex data models. Emphasis is placed on the identification of patterns, trends and notable differences in various datasets. Excel, Tableau and R are used to apply these data visualization principles.</p>
<h3 id="syllabus">Syllabus</h3>
<ul>
<li><a href="/assets/mgmt_3125/MGMT3125_Spring2019_.pdf">Download here</a></li>
</ul>
<h3 id="textbooks">Textbooks</h3>
<ul>
<li>
<p>Storytelling with Data: A Data Visualization Guide for Business Professionals, by Cole Nussbaumer Knaflic, 1st edition, Wiley. ISBN-13: 978-1119002253.</p>
</li>
<li>
<p>Data Visualization: A Practical Introduction, by Kieran Healy. Princeton. ISBN-13: 978-0691181622. <a href="http://socviz.co">Available online</a>.</p>
</li>
</ul>
<h3 id="datasets">Datasets</h3>
<p>Datasets applicable to each assignment are provided under the respective assignment header, and all datasets are indexed here for reference.</p>
<ul>
<li><a href="/assets/mgmt_3125/week1/IT tickets by location, month (2017).xlsx">IT tickets by month, location</a></li>
<li><a href="/assets/mgmt_3125/week3/Life Expectancy Final.xlsx">World life expectancies</a></li>
<li><a href="/assets/mgmt_3125/week7/Walmart Retail Sales 2012-2015.csv">Walmart retail sales</a></li>
<li><a href="/assets/mgmt_3125/week7/titanic_clean_.csv">RMS Titanic</a></li>
<li><a href="/assets/mgmt_3125/super_bowl44_52.csv">Super Bowl 44 - 52 win probability</a></li>
<li><a href="/assets/mgmt_3125/week1/denver_crime.csv">Denver crime rates</a></li>
<li><a href="/assets/mgmt_3125/week2/Sokol Heart Rate Data.csv">Sokol heart rate</a></li>
<li><a href="/assets/mgmt_3125/week3/marriage_by_age.zip">Marriage by age</a></li>
<li><a href="/assets/mgmt_3125/week5/American University data.csv">American university data</a></li>
</ul>
<h2 id="week-1---123">Week 1 - 1/23</h2>
<h3 id="chapter-1-the-importance-of-context">Chapter 1: The importance of context</h3>
<p>Consider the context before building any visualization. Ask yourself the following questions:</p>
<ul>
<li>Who is your audience?</li>
<li>What do you need them to know or do?</li>
<li>How will you communicate? What is the medium of choice?</li>
</ul>
<h4 id="exploratory-vs-explanatory-analysis">Exploratory vs. explanatory analysis</h4>
<ul>
<li>Exploratory: understanding the data to determine what is noteworthy to show the audience</li>
<li>Explanatory: the specific story about the data to tell the audience</li>
</ul>
<p>Too often the visualization author shows the exploratory analysis instead of the important explanatory analysis the audience needs to know.</p>
<h4 id="the-who-what-and-how-of-context">The who, what and how of context</h4>
<p>Who: It is very important to understand the intended audience and how they perceive you. This will shape how you approach creating the visualization.</p>
<ul>
<li>Helps you identify common ground for relaying your findings to your audience</li>
<li>The more specifically you identify the audience, the better. Just saying clients/key stakeholders usually isn’t specific enough</li>
</ul>
<p>What: What do you need your audience to know?</p>
<ul>
<li>Should be clear how you want your audience to react</li>
<li>Take into account overall tone of visualization and/or presentation</li>
<li>If you are analyzing and communicating the data, you are considered the ‘subject matter expert’</li>
<li>Be confident with your findings! Don’t let fear of public speaking decrease your credibility</li>
</ul>
<p>How: Mechanism of conveying the information</p>
<ul>
<li>Live presentations should have little detail considering you’re available to answer questions</li>
<li>Static documents demand generous detail to cover all the bases for possible client/stakeholder questions</li>
</ul>
<h4 id="the-big-idea">The big idea</h4>
<p>A single-sentence summary of the study’s background, the issue the study is trying to solve, and the outcome of the study. Three components of The Big Idea:</p>
<ul>
<li>The idea must articulate your unique point of view</li>
<li>The idea must convey what is at stake</li>
<li>The idea must be a complete sentence</li>
</ul>
<h4 id="storyboarding">Storyboarding</h4>
<p>Method of organizing ideas by creating a visual outline for the content of the visualization.</p>
<ul>
<li>Our minds do not think linearly; rather, our thoughts are scattered. Time and effort must be put forth to organize them properly.</li>
<li>Pen and paper work great for this task. Alternatively, if you use a Mac or an iOS device, I recommend <a href="https://mindnode.com">MindNode</a></li>
</ul>
<h3 id="assignment-1---excel-graphs">Assignment 1 - Excel graphs</h3>
<ul>
<li><a href="/assets/mgmt_3125/week1/Assignment 1 - Excel Graphs_s19.pdf">Assignment 1</a></li>
<li><a href="/assets/mgmt_3125/week1/IT tickets by location, month (2017).xlsx">IT tickets</a></li>
</ul>
<h3 id="week-1-lecture-slides">Week 1 lecture slides</h3>
<ul>
<li><a href="/assets/mgmt_3125/week1/Week1_1_23_lecture.pdf">Week 1 lecture slides</a></li>
</ul>
<h2 id="week-2---130">Week 2 - 1/30</h2>
<h3 id="chapter-2-choosing-an-effective-visualization">Chapter 2: Choosing an effective visualization</h3>
<p>The right visualization is typically chosen from eight traditional types. Knaflic elaborates on the scenarios that best fit each one.</p>
<h4 id="the-eight-common-visualizations">The eight common visualizations</h4>
<ol>
<li>Simple text
<ul>
<li>Works great for just a number or two.</li>
<li>Refrain from taking up excessive space with a bar graph when simple text will suffice to compare two values.</li>
<li>Be creative - make the font of the number you’re trying to convey extremely large; also use color to your advantage by highlighting key parts of the study.</li>
</ul>
</li>
<li>Table
<ul>
<li>Useful to look at the raw data in the build phase, especially if the client/stakeholder requests it.</li>
<li>Use light or minimal table borders so the observer can concentrate on the data</li>
<li>Do not use tables for presentations; a well designed graph will convey insights much faster than a table with solely numerical values</li>
</ul>
</li>
<li>Heatmap (a table with color-coded cells; not the same as a treemap)
<ul>
<li>Method of visualizing table data</li>
<li>Use color saturation to create visual cues, shortening the amount of time it takes to get from initial observation to data insight</li>
</ul>
</li>
<li>Scatterplot
<ul>
<li>Effective for showing a relationship between two fields</li>
<li>Include baseline statistics such as the median or average values so the observer can compare the plotted values with these summary statistics</li>
</ul>
</li>
<li>Line graph
<ul>
<li>Optimal for visualizing time series data and also more than one series of data (more than one line).</li>
<li>I like visualizing summary statistics within a line graph, such as the average representing the average of the time series data, then the max and min values as a range of values that reach above and below the average.</li>
<li>Line graphs do not need a zero baseline, because the viewer compares the data points relative to one another. Nonetheless, it is good practice to let the audience know your baseline is non-zero.</li>
</ul>
</li>
<li>Slopegraph
<ul>
<li>Great for visualizing two time periods or points of comparison. Shows increases or decreases between two data points.</li>
<li>Use a slopegraph if you want to show increasing or decreasing values without showing what occurred in the middle of a particular timeframe.</li>
<li>Employ color to emphasize a trend that will be the focal point of your storytelling discussion.</li>
</ul>
</li>
<li>Vertical bar graph
<ul>
<li>Go-to visualization for plotting categorical data (the information is organized into groups).</li>
<li>Viewers are so familiar with bar charts that they are among the quickest charts for conveying valuable data insights to the people who matter.</li>
<li>Always needs to have a zero baseline, or else there will be misconceptions about the graph.</li>
<li>Common decision is to preserve the axis labels or eliminate them entirely to label the data points. If you’re focusing on big picture trends, then leave the axis labels. If you’re focused on the specific numerical values, then label the data points directly.</li>
</ul>
</li>
</ol>
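<p>The zero-baseline rule for bar graphs can be checked with simple arithmetic. The sketch below is illustrative only: the sales figures and the helper name <code>bar_height_ratio</code> are hypothetical, and it computes how much taller one bar is drawn than another for a given axis baseline.</p>

```python
def bar_height_ratio(a, b, baseline=0.0):
    """Return how many times taller bar `a` is drawn than bar `b`."""
    return (a - baseline) / (b - baseline)

# Hypothetical values, for illustration only.
sales_a, sales_b = 105.0, 100.0

honest = bar_height_ratio(sales_a, sales_b)                    # axis starts at 0
truncated = bar_height_ratio(sales_a, sales_b, baseline=95.0)  # axis starts at 95

print(f"zero baseline: bar A is drawn {honest:.2f}x as tall as bar B")
print(f"baseline at 95: bar A is drawn {truncated:.2f}x as tall as bar B")
```

<p>With these assumed numbers, a 5% difference in the data is drawn as a bar twice as tall once the axis starts at 95, which is the misconception the zero baseline prevents.</p>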
<!-- 8. Horizontal bar graph -->
<h4 id="graphs-to-be-avoided">Graphs to be avoided</h4>
<p>Area Graphs</p>
<ul>
<li>Avoid area graphs, as human eyes are not good at attributing quantitative value to two-dimensional space.</li>
<li>Same principle applies with pie charts; replace pie charts with horizontal bar charts.</li>
</ul>
<p>Secondary y-axis: generally not a good idea</p>
<ul>
<li>Avoid the use of a secondary or right hand y-axis</li>
<li>Don’t show the second y-axis. Instead, label the data points that belong on this axis directly</li>
<li>Pull the graphics apart vertically and have a separate y-axis for each (both along the left) but leverage the same x axis across both</li>
</ul>
<p>There isn’t a single correct visual display. Rather, there are different types of visuals that could meet a given need. The most important aspect of visual build/design is <em>What do you need your audience to know?</em> Choose a visual display to make this abundantly clear.</p>
<h3 id="assignment-2---introduction-to-tableau">Assignment 2 - Introduction to Tableau</h3>
<ul>
<li><a href="/assets/mgmt_3125/week2/Assignment 2 - Intro Tableau s19.pdf">Assignment 2</a></li>
</ul>
<h3 id="week-2-lecture-slides">Week 2 lecture slides</h3>
<ul>
<li><a href="/assets/mgmt_3125/week2/Week2_1_30_lecture.pdf">Week 2 lecture slides</a></li>
</ul>
<h3 id="week-2-videos">Week 2 Videos</h3>
<h4 id="tableau-fundamentals">Tableau fundamentals</h4>
<iframe id="ytplayer" type="text/html" width="640" height="360" src="https://www.youtube.com/embed/4VzmdekIw00" frameborder="0" allowfullscreen=""></iframe>
<h4 id="changing-tableau-data-sources">Changing Tableau data sources</h4>
<iframe id="ytplayer" type="text/html" width="640" height="360" src="https://www.youtube.com/embed/yXqC_cGpIMg" frameborder="0" allowfullscreen=""></iframe>
<h2 id="week-3---26">Week 3 - 2/6</h2>
<h3 id="chapter-3-clutter-is-your-enemy">Chapter 3: Clutter is your enemy!</h3>
<h4 id="cognitive-load--clutter">Cognitive load &amp; clutter</h4>
<ul>
<li>Cognitive load: Every single element added to a page takes up cognitive load on the part of the audience.</li>
<li>Identify anything that doesn’t add informative value. Remove these elements.</li>
<li>
<p>Maximize the data ink ratio or signal to noise ratio to relieve cognitive load.</p>
</li>
<li>Clutter: Visual elements that take up space but don’t increase understanding.</li>
<li>Why reduce clutter? Clutter makes visualizations more complicated than necessary</li>
</ul>
<p>Employ the Gestalt Principles of Visual Perception to identify communicative elements and noisy elements.</p>
<h4 id="gestalt-principles-of-visual-perception">Gestalt Principles of visual perception</h4>
<ul>
<li>Proximity: Tendency to think of physically close objects as belonging to a group.</li>
<li>Similarity: Objects that are of similar color, shape, size, or orientation are perceived as related or belonging to a group.</li>
<li>Enclosure: Objects that are physically enclosed together are perceived as belonging to a group.</li>
<li>Closure: When parts of a whole are missing, our eyes fill in the gap. This principle renders chart borders and background shading unnecessary.</li>
<li>Continuity: When looking at objects, our eyes seek the smoothest path and naturally create continuity in what we see even where it may not explicitly exist.</li>
<li>Connection: Objects that are physically connected are perceived as part of a group. The connective property typically has a stronger associative value than similar color, size, or shape.</li>
</ul>
<p>Without obvious visual cues, the audience will typically start at the top left of a visualization, then move their eyes in a “Z” shape across the page or screen as they take in information.</p>
<ul>
<li>Because of this, align text (title, axis labels, legends, etc.) to the upper left.</li>
<li>Generally, diagonal elements such as lines connecting text to visual attributes should be avoided.</li>
<li>White space is akin to pauses in sentences; use white space strategically to draw attention to the parts of the page that are not white space.</li>
<li>Contrast: The more things we make different, the lesser the degree to which any of them stand out.</li>
</ul>
<h4 id="decluttering-strategies">Decluttering strategies</h4>
<ul>
<li>Remove chart border</li>
<li>Remove gridlines</li>
<li>Remove data markers</li>
<li>Clean up axis labels</li>
<li>Label data directly</li>
<li>Leverage consistent color</li>
</ul>
<p>Anytime you put information in front of an audience, you are creating cognitive load and asking them to use their brain power to process that information. Visual clutter creates excessive cognitive load that can hinder the transmission of your message. The Gestalt principles help identify and remove unnecessary visual elements. Leverage alignment of elements and maintain white space to help make the interpretation of visuals a comfortable experience for the audience.</p>
<h3 id="assignment-3---tableau-line-and-bar-graphs">Assignment 3 - Tableau Line and bar graphs</h3>
<ul>
<li><a href="/assets/mgmt_3125/week3/Assignment 3 - Line, bar graphs s19.pdf">Assignment 3</a></li>
</ul>
<h3 id="week-3-lecture-slides">Week 3 lecture slides</h3>
<ul>
<li><a href="/assets/mgmt_3125/week3/Week3_2_6_lecture.pdf">Week 3 lecture slides</a></li>
</ul>
<h3 id="week-3-videos">Week 3 Videos</h3>
<h4 id="tableau-bar-graphs">Tableau bar graphs</h4>
<iframe id="ytplayer" type="text/html" width="640" height="360" src="https://www.youtube.com/embed/ey42xjtdtHw" frameborder="0" allowfullscreen=""></iframe>
<h4 id="tableau-line-graphs">Tableau line graphs</h4>
<iframe id="ytplayer" type="text/html" width="640" height="360" src="https://www.youtube.com/embed/ThTu-47Qd68" frameborder="0" allowfullscreen=""></iframe>
<h2 id="week-4---213">Week 4 - 2/13</h2>
<h3 id="chapter-4-focus-your-audiences-attention">Chapter 4: Focus your audience’s attention</h3>
<h4 id="actively-engage-the-audience">Actively engage the audience</h4>
<p>Pre-attentive attributes</p>
<ul>
<li>Visualization attributes such as size, color, and position that are strategically employed to highlight important aspects of the data.</li>
<li>Also used to create a visual hierarchy of elements to lead the audience through the information I want to communicate.</li>
</ul>
<p>Understanding how our audience sees and processes information puts us in a better position to communicate effectively.</p>
<p>There are three types of memory that are important to understand for designing effective visual communication:</p>
<ol>
<li>Iconic memory
<ul>
<li>Information that stays in memory for a fraction of a second before you subconsciously decide to discard it or move it to short-term memory</li>
</ul>
</li>
<li>Short-term memory
<ul>
<li>People can keep about four chunks of visual information in their short-term memory at once</li>
<li>Emphasizing a large amount of information on a visualization places an unnecessary burden on our audience, and we thereby lose our ability to communicate effectively.</li>
</ul>
</li>
<li>Long-term memory
<ul>
<li>Short term memory either goes to long term memory or our brain removes it. Aggregate of visual and verbal memory.</li>
<li>Built up over a lifetime and is vitally important for pattern recognition and general cognitive processing.</li>
</ul>
</li>
</ol>
<blockquote>
<p>By using pre-attentive attributes strategically, we can enable our audience to see what we want them to see before they even know they’re seeing it!</p>
</blockquote>
<p>When used sparingly, pre-attentive attributes can be extremely useful for:</p>
<ul>
<li>Drawing your audience’s attention quickly to where you want them to look</li>
<li>Creating a visual hierarchy of information</li>
</ul>
<p>Some pre-attentive attributes, such as color and size, grab attention strongly, whereas others, such as italics, achieve a milder emphasis.</p>
<h4 id="create-a-visual-hierarchy-of-information">Create a visual hierarchy of information</h4>
<p>There are variations within specific pre-attentive attributes that draw attention with more or less strength.</p>
<ul>
<li>For example, a bright blue will draw more attention than a muted blue. Both will draw more attention than light gray</li>
<li>Leverage this variance to make the visuals scannable by emphasizing some components and de-emphasizing others. This establishes implicit instructions for the audience</li>
</ul>
<p>Size:</p>
<ul>
<li>Relative size denotes relative importance</li>
<li>If visualization attributes are equally important, then make them equally big. Otherwise, if one thing is really important, then make that BIG</li>
</ul>
<p>Color:</p>
<ul>
<li>Don’t make an attribute colorful just to make it colorful. Leverage color selectively as a strategic tool to highlight the important parts of the visualization. Use of color should always be intentional.</li>
<li>Start out by creating a visualization with all shades of gray, then pick a single bold color to draw attention to where you want it.</li>
<li>Don’t use black as a base color, as color stands out more against gray than black.</li>
<li>Use color sparingly and consistently</li>
<li>Design with the colorblind in mind (blue and orange instead of green and red)</li>
<li>Consider leveraging brand colors</li>
</ul>
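<p>The “all gray, then one bold color” approach can be sketched as a simple color mapping. This is a minimal illustration, not a Tableau or R feature: the hex codes, the category names, and the helper <code>assign_colors</code> are all assumptions made for the example.</p>

```python
GRAY = "#9E9E9E"       # assumed neutral base color
HIGHLIGHT = "#1F77B4"  # assumed bold blue, a colorblind-friendly choice

def assign_colors(categories, focus):
    """Map every category to gray except the single one to emphasize."""
    return {name: (HIGHLIGHT if name == focus else GRAY) for name in categories}

# Hypothetical category names.
colors = assign_colors(["North", "South", "East", "West"], focus="East")
print(colors)
```

<p>Keeping the mapping in one place also helps use the same highlight color consistently across every chart in a dashboard.</p>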
<p>Position:</p>
<ul>
<li>Without other visual cues, most members of an audience will start at the top-left and scan with their eyes in a zig-zag motion across the screen.</li>
<li>The typical scanning order is top-left, top-right, bottom-left, then bottom-right.</li>
</ul>
<h3 id="assignment-4---calculated-fields-and-parameters">Assignment 4 - Calculated fields and parameters</h3>
<ul>
<li><a href="/assets/mgmt_3125/week4/Assignment 4 - Calculated Fields s19.docx .pdf">Assignment 4</a></li>
</ul>
<h3 id="week-4-lecture-slides">Week 4 lecture slides</h3>
<ul>
<li><a href="/assets/mgmt_3125/week4/Week4_2_13_lecture.pdf">Week 4 lecture slides</a></li>
</ul>
<h3 id="week-4-videos">Week 4 Videos</h3>
<h4 id="tableau-calculated-fields">Tableau calculated fields</h4>
<iframe id="ytplayer" type="text/html" width="640" height="360" src="https://www.youtube.com/embed/DMgbqUQYiDo" frameborder="0" allowfullscreen=""></iframe>
<h4 id="tableau-parameters">Tableau parameters</h4>
<iframe id="ytplayer" type="text/html" width="640" height="360" src="https://www.youtube.com/embed/lmkA_pzcQ6Q" frameborder="0" allowfullscreen=""></iframe>
<h2 id="week-6---227">Week 6 - 2/27</h2>
<h3 id="chapter-5-think-like-a-designer">Chapter 5: Think like a designer</h3>
<p>This chapter explains how traditional design considerations are applied to create visualizations
and persuade an audience to accept your visual designs.</p>
<h4 id="highlight-the-important-stuff">Highlight the important stuff</h4>
<p>One main theme of the textbook is to employ preattentive attributes to draw attention to
specific points of the visualization. This section sheds light on how much of the visual should
be highlighted: as the number of highlighted items in the visual increases, the power of highlighting to draw
the audience to a specific talking point decreases.</p>
<blockquote>
<p>It is recommended that at most 10% of the visual design be highlighted - Universal Principles of Design (Lidwell, Holden, and Butler, 2003).</p>
</blockquote>
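<p>As a rough way to apply the 10% guideline quoted above, one could count how many elements of a visual are highlighted. The sketch below assumes a hypothetical chart described as a list of true/false flags, one per visual element; the helper name <code>highlighted_share</code> is invented for the example.</p>

```python
def highlighted_share(flags):
    """Return the fraction of visual elements marked as highlighted."""
    return sum(flags) / len(flags)

# Hypothetical chart: 25 elements, 2 of them highlighted.
flags = [True, True] + [False] * 23
share = highlighted_share(flags)
within_guideline = 0.10 >= share

print(f"{share:.0%} highlighted; within the 10% guideline: {within_guideline}")
```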
<p><strong>Bold</strong>, <em>italics</em>, and <u>underline</u></p>
<ul>
<li>Use for titles, labels, captions and short word sequences. Bold is preferred over italics, as it adds minimal noise to the visual.</li>
</ul>
<p>Typeface</p>
<ul>
<li>Avoid using different fonts, as this can easily disrupt aesthetics.</li>
</ul>
<p>Color</p>
<ul>
<li>Highly effective when used strategically and sparingly to draw your audience to a visual attribute you want to emphasize. It is wise to make all other attributes gray, while making your important attribute(s) a color that stands in clear contrast such as bright blue, orange, or red.</li>
</ul>
<p>Size</p>
<ul>
<li>Size is another great preattentive attribute for drawing your audience’s attention.</li>
</ul>
<h4 id="eliminate-distractions">Eliminate distractions</h4>
<p>Make spartan decisions as to which visual attributes are expendable.</p>
<ul>
<li>
<p>Not all data is equally important.</p>
</li>
<li>
<p>Consider if summarizing is appropriate instead of including extraneous detail.</p>
</li>
<li>
<p>Ask yourself, would eliminating this attribute change anything? If the answer is no, then remove it!</p>
</li>
<li>
<p>Push less important information to the background. Let your prominent attributes stand out.</p>
</li>
</ul>
<h4 id="dont-overcomplicate">Don’t overcomplicate</h4>
<blockquote>
<p>The more complicated a visualization looks, the more time the audience perceives it will take to understand and the less likely they are to spend time to understand it… between simple and complicated, favor simple. - Cole Knaflic</p>
</blockquote>
<p>Make your visualization legible</p>
<ul>
<li>Use consistent, easily readable font.</li>
</ul>
<p>Use simple language</p>
<ul>
<li>Refrain from using convoluted jargon, and when in doubt, define any word or phrase that can’t be assumed to be common knowledge.</li>
</ul>
<h4 id="text-is-your-friend">Text is your friend</h4>
<p>Use text in your visualization to provide context by labeling data points, introducing the visual, reinforcing key concepts, and ultimately to tell a cohesive story.</p>
<p>Text elements in visuals that are necessary</p>
<ul>
<li>Title.</li>
<li>Axis labels (unless the title conveys this in a straightforward manner).</li>
<li>Short conclusive interpretive statement (see below)</li>
</ul>
<blockquote>
<p>Don’t assume the audience will interpret the visual the same way you do. If there is a conclusion you want the audience to reach, state it plainly!</p>
</blockquote>
<h4 id="aesthetics">Aesthetics</h4>
<p>Beauty is one of the most important considerations for your audience to accept the message of your visual. Taking time to create visually appealing design means the audience will have more patience with the visual.</p>
<p>Color</p>
<ul>
<li>We have already discussed the importance of wise color use. Remember, use color sparingly and strategically.</li>
</ul>
<p>Alignment</p>
<ul>
<li>Create clean vertical and horizontal lines that denote attribute organization.</li>
</ul>
<p>White Space</p>
<ul>
<li>Preserve margins and don’t add unnecessary elements just to fill in extra space.</li>
</ul>
<h3 id="assignment-5-dashboard-design-and-tableau-parameters">Assignment 5: Dashboard design and Tableau parameters</h3>
<ul>
<li><a href="/assets/mgmt_3125/week6/Assignment 5 - Dashboard, parameters s19_final_.pdf">Assignment 5</a></li>
</ul>
<h3 id="week-6-lecture-slides">Week 6 lecture slides</h3>
<ul>
<li><a href="/assets/mgmt_3125/week6/Week6_2_27_lecture.pdf">Week 6 lecture slides</a></li>
</ul>
<h3 id="week-6-videos">Week 6 Videos</h3>
<h4 id="building-tableau-dashboards">Building Tableau dashboards</h4>
<iframe id="ytplayer" type="text/html" width="640" height="360" src="https://www.youtube.com/embed/5Em1ZTzfnPE" frameborder="0" allowfullscreen=""></iframe>
<h2 id="week-7---36">Week 7 - 3/6</h2>
<h3 id="mid-term-exam">Mid-term exam</h3>
<ul>
<li><a href="/assets/mgmt_3125/mid_term/Mid-term study guide_.pdf">Mid-term study guide</a></li>
</ul>
<h2 id="week-9---320">Week 9 - 3/20</h2>
<h3 id="chapter-6-dissecting-model-visuals">Chapter 6: Dissecting model visuals</h3>
<p>This chapter provides several examples of implementing the data visualization concepts explained
above by discussing the thought process behind why each visualization is effective.</p>
<p>As emphasized throughout this book, it is strongly encouraged to consider your audience. How will they process the information? What attributes of the visualization should be emphasized and de-emphasized? The following are very important to think about:</p>
<ul>
<li>Choice of visual</li>
<li>Color</li>
<li>Size</li>
<li>Relative ordering of data</li>
<li>Alignment and positioning of elements</li>
<li>Use of words</li>
</ul>
<p>Examining each model visual will help reinforce the concepts of chapters 1 - 5.</p>
<h4 id="model-visual-1-line-graph">Model visual #1: Line graph</h4>
<p><img src="/assets/mgmt_3125/week6/c06f001.png" /></p>
<p>Words are used appropriately. The graph title and the vertical and horizontal axis titles are all present. The two lines are directly labeled so there is no need for a legend. The viewer’s eyes are drawn to the “Progress to date” trend due to the following pre-attentive attributes:</p>
<ul>
<li>Notable line size contrast</li>
<li>Dark blue color</li>
<li>Presence of data marker on final point</li>
<li>Size of text</li>
</ul>
<p>Notice how the author establishes a visual hierarchy of most significant to least significant with color: dark blue, light blue, dark gray, light gray.</p>
<p>The author judged that reading numbers expressed in thousands isn’t intuitive, so the zeros in the y-axis labels are preserved.</p>
<h4 id="model-visual-2-annotated-line-graph-with-forecast">Model visual #2: Annotated line graph with forecast</h4>
<p><img src="/assets/mgmt_3125/week6/c06f002.png" /></p>
<p>The difference between solid and dotted line distinguishes between actual data and forecast data. Clear labeling of actual and forecast helps clarify this distinction.</p>
<p>Everything has been pushed to the background with the exception of the graph title, dates in the text annotation, data, data markers, and numeric data labels. Only the forecast data points are labeled, not the historical ones, to give the viewer a clear understanding of forward-looking expectations.</p>
<h4 id="model-visual-3-100-stacked-bars">Model visual #3: 100% stacked bars</h4>
<p><img src="/assets/mgmt_3125/week6/c06f003.png" /></p>
<p>The graph title, legend, and vertical y-axis title are all aligned in the upper-left-most position, creating a clean sense of organization on the left side of the graph. On the right side, the text at the top and the final red bars of data are aligned together.</p>
<p>Red is used as the single attention-grabbing color, while dark to light shades of gray push all other data to the background. The change over time in the percentage of projects that meet their goals is more difficult to compare because there is no consistent baseline at either the top or bottom of the graph, but given that this is a lower-priority comparison, that is acceptable.</p>
<p>The graph uses supercategories (years to quarters) for organization and reference. The words at the top right reinforce what we should be paying attention to (from Q3 onward a significant amount of missed goals).</p>
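<p>A 100% stacked bar plots each segment as a share of its bar’s total rather than as a raw count. The sketch below shows that conversion for one bar; the counts, the segment labels, and the helper name <code>to_percent_shares</code> are hypothetical and are not taken from the figure.</p>

```python
def to_percent_shares(counts):
    """Convert raw counts for one bar into percentages that sum to 100."""
    total = sum(counts.values())
    return {label: 100 * value / total for label, value in counts.items()}

# Hypothetical project counts for a single quarter.
q3 = {"Exceeded": 5, "Met": 10, "Missed": 5}
shares = to_percent_shares(q3)
print(shares)
```

<p>Because every bar is rescaled to 100%, only the bottom and top segments share a consistent baseline across bars, which is why the middle segment is harder to compare over time.</p>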
<h4 id="model-visual-5-horizontal-stacked-bars">Model visual #5: Horizontal stacked bars</h4>
<p><img src="/assets/mgmt_3125/week6/c06f005.png" /></p>
<p>A stacked bar graph makes sense due to the 3 categories of survey priorities (most important, 2nd most important, 3rd most important). The author uses a horizontal bar graph so the category names are easy to read. The categories are organized vertically in descending order by total % of development priorities. Words are used well, as all graph elements are titled and labeled. The x-axis is eliminated altogether.</p>
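<p>Ordering the categories by their total share, as described above, amounts to a sort on the sum of the stacked segments. The following sketch uses made-up survey percentages and a hypothetical helper <code>order_by_total</code> to illustrate the idea.</p>

```python
# Hypothetical survey results: percent ranking each priority 1st, 2nd, and 3rd.
priorities = {
    "Education": (15, 12, 10),
    "Health": (20, 14, 9),
    "Jobs": (10, 9, 8),
}

def order_by_total(data):
    """Return category names sorted by total share, largest first."""
    return sorted(data, key=lambda name: sum(data[name]), reverse=True)

ordered = order_by_total(priorities)
print(ordered)
```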
<h4 id="in-closing">In closing</h4>
<p>To build the habit of applying sound data visualization concepts in everyday workflow, we can learn from
effective graphs and consider the design choices made to create them. This chapter reviewed the following concepts:</p>
<ul>
<li>Choice of graph</li>
<li>Ordering of data</li>
<li>Emphasize and de-emphasize graph components through color, thickness and size</li>
<li>Alignment and positioning of elements</li>
<li>Appropriate use of text</li>
</ul>
<p>It is important to critically analyze the visualizations that you encounter to consider if each element/attribute in the visualization has purpose and is intentionally implemented.</p>
<h3 id="chapter-7-lessons-in-storytelling">Chapter 7: Lessons in storytelling</h3>
<p>Great storytelling is an incredibly underrated and powerful skill. The author can devote a great amount of time to creating the perfect visualization, but if the author fails to communicate a compelling story, the message won’t be as impactful, or worse, it will be obscured.</p>
<p>This chapter is all about leveraging the power of storytelling to communicate effectively with data.</p>
<h3 id="week-9-lecture-slides">Week 9 lecture slides</h3>
<ul>
<li><a href="/assets/mgmt_3125/week9/Week9_3_20_lecture.pdf">Week 9 lecture slides</a></li>
</ul>
<h3 id="week-9-videos">Week 9 videos</h3>
<h4 id="tableau-text-tables">Tableau Text Tables</h4>
<iframe id="ytplayer" type="text/html" width="640" height="360" src="https://www.youtube.com/embed/HNRdr_RvDWY" frameborder="0" allowfullscreen=""></iframe>
<h2 id="week-10---327">Week 10 - 3/27</h2>
<h3 id="r-line-graphs">R line graphs</h3>
<h3 id="week-10-lecture-slides">Week 10 lecture slides</h3>
<ul>
<li><a href="/assets/mgmt_3125/week10/Week10_3_27_lecture.pdf">Week 10 lecture slides</a></li>
</ul>
<h3 id="assignment-6-r-line-graphs">Assignment 6: R line graphs</h3>
<ul>
<li><a href="/assets/mgmt_3125/week10/Assignment 6 - R line graphs.pdf">Assignment 6</a></li>
</ul>
<h2 id="week-12---410">Week 12 - 4/10</h2>
<h3 id="r-bar-graphs-and-scatter-plots">R bar graphs and scatter plots</h3>
<h3 id="week-12-lecture-slides">Week 12 lecture slides</h3>
<ul>
<li><a href="/assets/mgmt_3125/week12/Week12_4_10_lecture.pdf">Week 12 lecture slides</a></li>
</ul>
<h3 id="assignment-7-rstudio-bar-graphs-and-scatter-plots">Assignment 7: RStudio bar graphs and scatter plots</h3>
<ul>
<li><a href="/assets/mgmt_3125/week12/Assignment 7 - R bar graphs scatter.pdf">Assignment 7</a></li>
</ul>
<h2 id="week-13---417">Week 13 - 4/17</h2>
<h3 id="week-13-lecture-slides">Week 13 lecture slides</h3>
<ul>
<li><a href="/assets/mgmt_3125/week13/Week13_4_17.pdf">Week 13 lecture slides</a></li>
</ul>
<h3 id="uploading-dashboards-to-tableau-public">Uploading dashboards to Tableau Public</h3>
<ul>
<li><a href="/assets/mgmt_3125/week11/How to Publish to Tableau Public.pdf">Publish to Tableau Public directions</a></li>
</ul>
<h2 id="week-15---51">Week 15 - 5/1</h2>
<h3 id="final-project-presentations">Final project presentations</h3>John Sokol, MSI currently teach this course. Repository for notes, lecture videos, and assignments