Data Analysis and Visualization


For today's workshop we will use the pandas, matplotlib, and seaborn libraries, and we will read data from the web with pandas-datareader. By the end of the workshop, participants should be able to use Python to tell a story about a dataset they build from an open data source.

GOALS:

  • Understand the basic functionality of the pandas DataFrame
  • Use Matplotlib to visualize data
  • Use Seaborn to explore data
  • Import data from the web with pandas-datareader and compare development indicators from the World Bank

Introduction to Jupyter Notebook and Python

  • Markdown cells, markdown, and some HTML
  • Python as calculator
  • Libraries, importing, abbreviating, standard functions
  • Basic Plots with matplotlib

Markdown Cells

Markdown cells are an important component of Jupyter notebooks. These cells allow you to write text and mathematics, and even render a number of HTML tags. We will use Markdown cells to describe our work in the Jupyter notebook.

To change a cell to a Markdown cell, you can either use the menu bar or the keyboard shortcut Ctrl + M, M. From there, you can type Markdown syntax, @@0@@, HTML, and even more code styles. We will typically use the features demonstrated below:

# Header 1
## Header 2
### Header 3
*italic*
**bold**
![](image/filepath.png)
@@1@@
@@2@@

Here are two cheatsheets to help you with markdown syntax and @@3@@ symbols.

It is important to keep the files in our notebook directories organized. We will use the convention of having a data subdirectory for our datasets and an images subdirectory for our images. Thus, if we have a picture of a dog in our images folder, we can display it with

![](images/dog.png)

Basics of Python

  • Lists
  • Functions
  • Plots
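The original notebook cells for this part were not preserved. As a small illustration of Python lists, here is a sketch of creating, indexing, slicing, appending to, and summing a list. The values are purely illustrative.

```python
# A list holds an ordered collection of items (illustrative values)
numbers = [3, 1, 4, 1, 5]

first = numbers[0]     # indexing starts at 0, so this is 3
last = numbers[-1]     # negative indices count from the end, so this is 5
middle = numbers[1:4]  # slicing takes positions 1, 2, 3 -> [1, 4, 1]

numbers.append(9)      # add an item to the end of the list
total = sum(numbers)   # built-in function applied to a list -> 23
count = len(numbers)   # number of items -> 6

print(first, last, middle, total, count)
```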

Functions


Functions are a very important idea for us. A function takes some input and returns an output. We can define functions to do whatever we want, so let's examine a mathematical and a non-mathematical example.
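Here is a minimal sketch of the two kinds of function. The names square and greet are hypothetical examples, not the ones used in the original notebook.

```python
def square(x):
    """Mathematical example: f(x) = x**2."""
    return x ** 2


def greet(name):
    """Non-mathematical example: build a greeting from a name."""
    return f"Hello, {name}!"


print(square(4))        # 16
print(greet("Python"))  # Hello, Python!
```

Both use def, take one input, and return an output. The only difference is what the body computes.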


Today we will examine two different libraries for plotting with Python. The first is the standard matplotlib library. We will keep coming back to matplotlib. It is a very powerful library that can do most things you'd like, although harnessing that power sometimes requires a deep understanding of it. In the Jupyter notebook, we will do three things. We use a magic command so that the plots stay in the notebook, import the pyplot module under the abbreviation plt, and import the numpy library under the abbreviation np.

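# magic command: keep interactive plots inside the notebook
%matplotlib notebook
import matplotlib.pyplot as plt  # pyplot plotting interface, abbreviated plt
import numpy as np  # numerical arrays and functions, abbreviated np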
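With these imports in place, a basic plot takes only a few lines. The sketch below evaluates the example function y = x**2 on an evenly spaced grid from np.linspace and draws it with pyplot. The grid range and labels are illustrative choices. It repeats the imports so it runs on its own. In the notebook, the %matplotlib magic above should also have been run.

```python
import matplotlib.pyplot as plt
import numpy as np

# 100 evenly spaced x values from -2 to 2 (both endpoints included)
x = np.linspace(-2, 2, 100)
y = x ** 2  # numpy squares every element of the array

fig, ax = plt.subplots()  # one figure containing one set of axes
ax.plot(x, y, label="y = x**2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A basic matplotlib plot")
ax.legend()
plt.show()
```

This uses matplotlib's object-oriented style (fig, ax). Calling plt.plot(x, y) directly would draw the same curve on the current axes.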

Now, when we use these libraries, we prefix each function with plt. or np. (for example, plt.plot or np.arange). Here are cheatsheets for each of the libraries:


To find out more about each of these functions, we can use the notebook's built-in help. Appending a question mark to a function name and executing the cell displays its documentation, for example:

np.random.randn?
np.arange?
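For reference, here is a short sketch of what these two functions return. np.arange(start, stop, step) produces evenly spaced values up to, but not including, stop. np.random.randn(n) draws n samples from the standard normal distribution. The seed is an addition made only so the random draws are reproducible. Outside the notebook, Python's built-in help() shows the same documentation as the ? suffix.

```python
import numpy as np

# arange: evenly spaced values; the stop value (10) is excluded
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# randn: samples from the standard normal distribution (mean 0, std 1)
np.random.seed(0)  # fix the seed so the draws are reproducible
samples = np.random.randn(1000)
print(samples.shape)                  # (1000,)
print(samples.mean(), samples.std())  # close to 0 and 1, respectively

# Plain-Python equivalent of `np.arange?`:
# help(np.arange)
```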

Accessing data through API


The pandas-datareader package, a companion to pandas, can access certain remote data sources and return the results as pandas DataFrames. We will use pandas_datareader to investigate data from the World Bank. For more information, please see the documentation:

http://pandas-datareader.readthedocs.io/en/latest/remote_data.html

We will explore other examples with the datareader later, but to start, let's access the World Bank's data. For a full description of the available data, see the World Bank's World Development Indicators catalog:

https://data.worldbank.org/data-catalog/world-development-indicators
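Here is a minimal sketch of fetching and comparing one indicator. It assumes pandas-datareader is installed and the World Bank API is reachable. The indicator code NY.GDP.PCAP.KD (GDP per capita in constant US dollars) and the country codes follow the example in the pandas-datareader documentation. The helper names fetch_indicator and to_wide are hypothetical. wb.download returns a DataFrame indexed by (country, year) with one column per indicator. Unstacking the country level therefore gives one column per country, which makes side-by-side comparison easy.

```python
import pandas as pd


def fetch_indicator(indicator, countries, start, end):
    """Download one World Bank indicator (requires network access)."""
    # Imported inside the function so to_wide can be used without the package
    from pandas_datareader import wb

    return wb.download(indicator=indicator, country=countries, start=start, end=end)


def to_wide(data: pd.DataFrame, indicator: str) -> pd.DataFrame:
    """Reshape (country, year)-indexed data to one column per country."""
    wide = data[indicator].unstack(level="country")
    return wide.sort_index()  # order the rows by year


# Example usage (needs network access):
# code = "NY.GDP.PCAP.KD"
# dat = fetch_indicator(code, ["US", "CA", "MX"], 2005, 2008)
# print(dat[code].groupby(level="country").mean())  # average per country
# print(to_wide(dat, code))                         # years x countries
```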
