Pandas

Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. The name Pandas is derived from "panel data", an econometrics term for multidimensional data sets.

Imports

Let's start with the imports. Customarily, we import as follows:
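
A minimal sketch of the customary imports, assuming the conventional aliases pd and np:

```python
import pandas as pd  # conventional alias for pandas
import numpy as np   # NumPy is usually imported alongside it (e.g. for np.nan)
```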

Dataframe

A DataFrame is a two-dimensional data structure: data is aligned in a tabular fashion in rows and columns.
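
As a small illustration of that tabular layout, here is a sketch that builds a DataFrame by hand. The three rows and the column names are sample values chosen for the example, not the tutorial's dataset:

```python
import pandas as pd

# Each dict key becomes a column; each list holds that column's values.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Fire", "Water"],
    "Speed": [45, 65, 43],
})

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column labels
```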

Dataset

For this tutorial we will be using the Pokemon dataset. You can download it for your own purposes here

Google Drive

Since this notebook is hosted on Google Colab, I am using my Google Drive to load my dataset.
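
A rough sketch of how Drive is typically mounted in Colab. The data_dir variable and the fallback to the current directory are assumptions added so the snippet also runs outside Colab:

```python
# Mount Google Drive when running inside Colab; otherwise fall back locally.
try:
    from google.colab import drive   # only importable inside Google Colab
    drive.mount("/content/drive")    # asks for authorization in the notebook
    data_dir = "/content/drive/MyDrive"
except ImportError:
    data_dir = "."                   # hypothetical local fallback

print(data_dir)
```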

Loading the dataset to a dataframe

To read the CSV, instead of using Python's built-in csv module like this:

import csv
with open('employee_birthday.txt') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')

We will just use the pandas read_csv function. This loads the CSV directly into a dataframe. Other such functions include read_json, read_sql, read_html, etc.
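
A minimal sketch of read_csv in action. To keep it self-contained, it reads from an in-memory string instead of a file; the rows and column names are assumptions about the dataset's shape:

```python
import io
import pandas as pd

# Stand-in for the real file: a few rows in the assumed shape of the dataset.
csv_text = """Name,Type 1,Type 2,Speed,Legendary
Bulbasaur,Grass,Poison,45,False
Charmander,Fire,,65,False
Mewtwo,Psychic,,130,True
"""

# read_csv accepts a path, a URL, or a file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

With a real file you would pass its path instead, for example pd.read_csv('Pokemon.csv') (file name assumed). Empty fields, such as the missing Type 2 values here, are read as NaN.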

Learning about the dataset
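
A sketch of common ways to take a first look at a dataframe, assuming head, info, and describe are the tools wanted here. The rows are sample values:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Mewtwo"],
    "Speed": [45, 65, 43, 130],
    "Legendary": [False, False, False, True],
})

first_rows = df.head(2)   # first two rows
summary = df.describe()   # summary statistics for the numeric columns

print(first_rows)
df.info()                 # column names, dtypes, and non-null counts
print(summary)
```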

Pokemon Types
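
One way to list the distinct types, sketched with unique and nunique. The "Type 1" column name and the rows are assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Grass", "Fire", "Water"],
})

types = df["Type 1"].unique()     # distinct values, in order of appearance
n_types = df["Type 1"].nunique()  # number of distinct values
print(types, n_types)
```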

If you want the count of each Speed value, use the value_counts() function.
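
A small sketch of value_counts on a Speed column. The values below are made up for illustration and are not taken from the dataset:

```python
import pandas as pd

# Illustrative Speed values only.
speeds = pd.Series([45, 50, 65, 50, 43, 50, 65], name="Speed")

counts = speeds.value_counts()  # index: Speed value, data: how often it occurs
most_common = counts.idxmax()   # the Speed value with the highest count
print(counts)
print(most_common)
```

The result is sorted by count, so the most frequent value comes first.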

As the counts show, 50 is the most common Speed value among pokemon.

Fastest pokemon

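To get the row index of the fastest pokemon, we use:

df['Speed'].idxmax()

This returns only the row index. To get the entire row, we use:

df.loc[df['Speed'].idxmax()]

This is basically fetching the row by its index, much like getting an element of an array with array[0]. Since idxmax() returns an index label, loc is the matching accessor; with the default integer index, iloc returns the same row.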
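
The two steps can be sketched end to end on a tiny sample frame (rows chosen for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Mewtwo", "Squirtle"],
    "Speed": [45, 65, 130, 43],
})

idx = df["Speed"].idxmax()  # index label of the row with the largest Speed
fastest = df.loc[idx]       # the whole row, returned as a Series
print(idx)
print(fastest["Name"])
```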

The fastest pokemon is Deoxys in its Speed Forme.

Greatest attack

We can do the same to find the pokemon with the greatest attack power, which is Mega Mewtwo X.
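
A sketch of the same idea applied to Attack, assuming the column is named "Attack". It also shows nlargest as an alternative that returns a dataframe instead of a single row:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Mewtwo", "MewtwoMega Mewtwo X", "MewtwoMega Mewtwo Y"],
    "Attack": [49, 110, 190, 150],
})

strongest = df.loc[df["Attack"].idxmax()]  # row with the highest Attack
top_two = df.nlargest(2, "Attack")         # the two highest-Attack rows
print(strongest["Name"])
print(top_two)
```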

Legendary pokemon

  • You might have noticed a column called Legendary in the dataset.
  • The value of that column is True if the particular pokemon is legendary.
  • Let's select only the legendary pokemon.

Comparing the Legendary column to True gives a Series of True and False values telling us which rows are legendary and which are not. To get the actual dataframe for just the legendary pokemon, we use that Series as a mask to index the dataframe.
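
A minimal sketch of the mask-then-index pattern, using a few sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Articuno", "Squirtle", "Mewtwo"],
    "Legendary": [False, True, False, True],
})

is_legendary = df["Legendary"] == True  # boolean Series, one value per row
legendary = df[is_legendary]            # keeps only the rows where the mask is True
print(is_legendary)
print(legendary)
```

Because the column is already boolean, df[df["Legendary"]] gives the same result.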

Deleting a column

  • We can use drop with axis=1 to delete a column.
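
A short sketch of dropping a column. Which column the tutorial drops is not shown, so "Total" here is only an example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "Total": [318, 309],
    "Speed": [45, 65],
})

# drop returns a new dataframe; the original is left unchanged.
without_total = df.drop("Total", axis=1)  # same as df.drop(columns="Total")
print(without_total.columns.tolist())
```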

Deleting a row

Use drop with axis=0 to delete rows by their index labels.
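
A sketch of row deletion on a sample frame with the default integer index:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Speed": [45, 65, 43],
})

without_first = df.drop(0, axis=0)  # drop the row labelled 0
without_two = df.drop([1, 2])       # axis=0 is the default; a list drops several rows
print(without_first)
print(without_two)
```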

Select

By column
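
A sketch of column selection, assuming the usual bracket syntax is what is meant here:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Fire", "Water"],
    "Speed": [45, 65, 43],
})

names = df["Name"]              # a single label returns a Series
subset = df[["Name", "Speed"]]  # a list of labels returns a DataFrame
print(names)
print(subset)
```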

By index
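
A sketch of row selection by index, assuming iloc (position-based) and loc (label-based) are the accessors intended:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Speed": [45, 65, 43],
})

first_row = df.iloc[0]       # one row by position, returned as a Series
first_two = df.iloc[0:2]     # a slice of rows by position (end excluded)
one_cell = df.loc[1, "Name"] # a single value by row label and column label
print(first_row)
print(first_two)
print(one_cell)
```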

Filtering

  • To filter data in a dataframe, you can combine conditions with logical operators such as & (and), | (or), and ~ (not).
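
A minimal filtering sketch. The condition (Fire types faster than 60) is an example chosen here, not necessarily the tutorial's own filter:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Charizard", "Squirtle"],
    "Type 1": ["Grass", "Fire", "Fire", "Water"],
    "Type 2": ["Poison", np.nan, "Flying", np.nan],
    "Speed": [45, 65, 100, 43],
})

# Each condition needs its own parentheses because & binds tighter than == and >.
fast_fire = df[(df["Type 1"] == "Fire") & (df["Speed"] > 60)]
print(fast_fire)
```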

Replacing NaN values

  • As you can see in the above filter, some of the values in Type 2 are not available and are labelled as NaN
  • Let's replace the NaN values in Type 2 with the values in Type 1
  • pandas primarily uses the value np.nan to represent missing data.
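
One way to do this replacement is fillna with another column as the fill source. A sketch on sample rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Charizard"],
    "Type 1": ["Grass", "Fire", "Fire"],
    "Type 2": ["Poison", np.nan, "Flying"],
})

# fillna aligns on the index, so each missing Type 2 takes its own row's Type 1.
df["Type 2"] = df["Type 2"].fillna(df["Type 1"])
print(df)
```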

Let's try the filter again after the replacement

Groupby

  • The groupby function groups rows using a mapper (a dict or key function) or by one or more columns, applies a given function to each group, and returns the combined result.
  • Let us group our pokemon by Generation and Type
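
A sketch of grouping by two columns. The aggregations (group size and mean Speed) are assumptions about what one might compute, and the rows are sample values:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Charmander", "Charizard", "Squirtle", "Cyndaquil"],
    "Type 1": ["Fire", "Fire", "Water", "Fire"],
    "Generation": [1, 1, 1, 2],
    "Speed": [65, 100, 43, 65],
})

grouped = df.groupby(["Generation", "Type 1"])
counts = grouped.size()               # number of pokemon in each group
mean_speed = grouped["Speed"].mean()  # average Speed in each group
print(counts)
print(mean_speed)
```

The result is indexed by (Generation, Type 1) pairs, so a single group can be looked up with counts.loc[(1, "Fire")].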

Adding a column

  • You might have noticed a slight issue with the Mega pokemon names.
  • Things like PidgeotMega or BlastoiseMega Blastoise.
  • If you didn't, look at the output above.
  • Let us create a new column called fixed names.
  • It converts names like BlastoiseMega Blastoise to just Mega Blastoise.
  • For this we will be using regex.
  • Alternatively, we can use a lambda function to do the above task.
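
Both approaches can be sketched as follows. The regular expression, the lambda, and the new column names are assumptions written for this example, not necessarily the ones used in the notebook:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "BlastoiseMega Blastoise", "MewtwoMega Mewtwo X", "Meganium"],
})

# Regex: remove everything before "Mega " when at least one character precedes it.
df["Fixed Name"] = df["Name"].str.replace(r"^.+?(?=Mega )", "", regex=True)

# Lambda: keep only the part after "Mega " and put the prefix back.
df["Fixed Name 2"] = df["Name"].apply(
    lambda name: "Mega " + name.split("Mega ", 1)[1] if "Mega " in name else name
)
print(df)
```

Requiring the trailing space in "Mega " keeps ordinary names such as Meganium untouched.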

Sorting

  • Let's list out the strongest pokemon
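
A sketch using sort_values, assuming "strongest" means the highest value in a Total column:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Mewtwo", "Squirtle"],
    "Total": [318, 309, 680, 314],
})

# ascending=False puts the largest Total first.
strongest_first = df.sort_values("Total", ascending=False)
print(strongest_first.head(3))
```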

Strongest pokemon for each type

  • With the rows sorted from strongest to weakest, drop all duplicates for each type and keep only the topmost row.
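
A sketch of the sort-then-deduplicate pattern, assuming strength is measured by a Total column and type by "Type 1":

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Charmander", "Charizard", "Squirtle", "Blastoise", "Mewtwo"],
    "Type 1": ["Fire", "Fire", "Water", "Water", "Psychic"],
    "Total": [309, 534, 314, 530, 680],
})

# After sorting, the first row seen for each type is its strongest pokemon.
strongest_per_type = (
    df.sort_values("Total", ascending=False)
      .drop_duplicates(subset="Type 1", keep="first")
)
print(strongest_per_type)
```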