Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. The name Pandas is derived from "panel data", an econometrics term for multidimensional data.



Let's start with the imports. Customarily, we import as follows:
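The notebook's import cell did not survive the export. The sketch below assumes it followed the usual community convention of aliasing pandas as `pd` and NumPy as `np`. Later sections rely on `np.nan`.

```python
# Customary aliases used throughout the pandas ecosystem
import numpy as np
import pandas as pd

print(pd.__version__)  # quick check that pandas is installed
```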



A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.
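Here is a minimal sketch of building a DataFrame by hand from a dict. Each key becomes a column, and each list holds that column's values in row order. The rows are illustrative and were not taken from the tutorial's file.

```python
import pandas as pd

# Each dict key becomes a column; each list supplies that column's values row by row
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Fire", "Water"],
    "Speed": [45, 65, 43],
})

print(df)
print(df.shape)  # (rows, columns)
```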




For the purpose of this tutorial, we will be using the Pokemon dataset. You can download it for your own use here


Google Drive


Since this notebook is hosted on Google Colab, I am using my Google Drive to load the dataset.
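In Colab, Drive is usually made visible to the notebook with `google.colab.drive.mount`. The sketch below guards the import so that the cell also runs outside Colab. The mount point `/content/drive` is Colab's usual location. The CSV's exact path inside your Drive is not given in the original and would be your own.

```python
# google.colab only exists inside a Colab runtime, so guard the import
try:
    from google.colab import drive
except ImportError:
    drive = None

if drive is not None:
    # Makes your Drive files available under /content/drive (asks for authorization)
    drive.mount("/content/drive")
    IN_COLAB = True
else:
    IN_COLAB = False

print("Running in Colab:", IN_COLAB)
```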

Loading the dataset into a DataFrame


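To read the CSV file, instead of writing

import csv
# Manual approach: open the file and iterate over the parsed rows yourself
with open('employee_birthday.txt') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        print(row)  # each row is a list of strings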

We will just use the pandas read_csv function, which loads the CSV directly into a DataFrame. Other such functions include read_json, read_sql, read_html, etc.
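Here is a small sketch of `pd.read_csv`. It reads from an in-memory `io.StringIO` instead of the real file so that it runs anywhere. The column names are assumed to resemble the Pokemon dataset's, and the rows are illustrative. With the actual file you would pass its path instead, for example a hypothetical `pd.read_csv("pokemon.csv")`.

```python
import io
import pandas as pd

# Stand-in for the real file: a few illustrative rows with assumed column names
csv_text = """Name,Type 1,Type 2,HP,Attack,Speed,Legendary
Bulbasaur,Grass,Poison,45,49,45,False
Charmander,Fire,,39,52,65,False
Mewtwo,Psychic,,106,110,130,True
"""

# read_csv accepts a path, a URL, or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
print(df.dtypes)  # empty Type 2 cells become NaN; True/False parse as booleans
```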

Learning about the dataset
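The exploration cells for this section were not preserved. A common first look at a DataFrame uses `shape`, `columns`, `head()` and `describe()`. Whether the original used exactly these is an assumption. The sample rows below are illustrative.

```python
import pandas as pd

# Tiny illustrative sample standing in for the real dataset
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pikachu"],
    "Type 1": ["Grass", "Fire", "Water", "Electric"],
    "Attack": [49, 52, 48, 55],
    "Speed": [45, 65, 43, 90],
})

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column names
print(df.head(2))           # first rows
print(df.describe())        # count, mean, std, min, quartiles, max of numeric columns
```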


Pokemon Types


If you want to count how often each speed value occurs, use the value_counts() function.
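Here is a minimal sketch of `value_counts()` on a made-up Speed column. In the notebook this would be the dataset's own `Speed` column. The result is sorted by frequency, so the first index entry is the most common value.

```python
import pandas as pd

# Hypothetical Speed values standing in for the dataset's Speed column
df = pd.DataFrame({"Speed": [50, 45, 50, 65, 50, 45, 90]})

counts = df["Speed"].value_counts()  # most frequent value first
print(counts)
print("Most common speed:", counts.index[0])
```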


As you can see, a speed of 50 is the most common value among the Pokemon.

Fastest pokemon


To get the row index of the fastest Pokemon, we look up the index of the maximum Speed value.


This returns only the row index. To get the entire row, we pass that index back to the DataFrame.


It's basically fetching the row by its index, just as we get the first element of an array with array[0].
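The two steps can be sketched with `Series.idxmax()` and `DataFrame.loc`. Whether the original cell used exactly these calls is an assumption, and the rows are illustrative. `idxmax()` returns the index label of the first maximum, and `.loc` fetches the whole row for that label.

```python
import pandas as pd

# Illustrative rows only
df = pd.DataFrame({
    "Name": ["Pikachu", "Deoxys", "Jolteon"],
    "Speed": [90, 180, 130],
})

# Step 1: the index label of the row with the highest Speed
fastest_idx = df["Speed"].idxmax()
print(fastest_idx)

# Step 2: use that label to get the entire row (returned as a Series)
fastest = df.loc[fastest_idx]
print(fastest["Name"], fastest["Speed"])
```

With the default 0, 1, 2, … index, the label equals the row number. After filtering or dropping rows, labels and positions can diverge. `.loc` works with labels and `.iloc` works with positions.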


[Image: Deoxys (Speed), the fastest pokemon]

Greatest attack


We can do the same to find the Pokemon with the greatest attack power, which is Mewtwo X.
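Here is a sketch of the same idea applied to Attack, assuming an `Attack` column and using illustrative rows. It also shows `nlargest`, a real pandas method that is convenient when you want the top few rows rather than only the maximum.

```python
import pandas as pd

# Illustrative rows only
df = pd.DataFrame({
    "Name": ["Mewtwo", "Mewtwo X", "Groudon"],
    "Attack": [110, 190, 150],
})

# Same pattern as for Speed: label of the max, then the full row
strongest_attacker = df.loc[df["Attack"].idxmax()]
print(strongest_attacker["Name"])

# nlargest(n, column) returns the top-n rows ordered by that column
print(df.nlargest(2, "Attack"))
```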

[Image: Mewtwo X]


Legendary pokemon



  • You might have noticed a column called Legendary in the dataset.
  • The value of that column is True if the particular Pokemon is legendary.
  • Let's select only the legendary Pokemon.

This gives us a Series of True and False values telling us which rows are legendary and which are not. To get the actual DataFrame containing just the legendary Pokemon, we use that Series to index the DataFrame.
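Here is a minimal sketch of boolean masking, assuming a boolean `Legendary` column and using illustrative rows. Indexing a DataFrame with a boolean Series keeps only the rows marked True, and their original index labels are preserved.

```python
import pandas as pd

# Illustrative rows only; the real dataset has many more columns
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Articuno", "Pidgey", "Mewtwo"],
    "Legendary": [False, True, False, True],
})

# Legendary already holds booleans, so it can serve directly as a mask.
# A comparison such as df["Name"] == "Mewtwo" produces the same kind of mask.
is_legendary = df["Legendary"]
print(is_legendary)

# Keep only the rows where the mask is True
legendary = df[is_legendary]
print(legendary)
```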


Deleting a column

  • We can use drop() for this (axis=1 selects columns).
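Here is a sketch of `DataFrame.drop` for columns. The `#` column here is an illustrative assumption. `drop` returns a new DataFrame and leaves the original unchanged, so reassign the result if you want to keep it.

```python
import pandas as pd

df = pd.DataFrame({
    "#": [1, 2, 3],
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur"],
    "Speed": [45, 60, 80],
})

# axis=1 means "drop along columns"; columns=[...] is the equivalent keyword form
without_number = df.drop("#", axis=1)
same_thing = df.drop(columns=["#"])

print(without_number.columns.tolist())
```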

Deleting a row


Use axis=0 to delete rows.
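Here is a sketch of dropping rows, with illustrative data. The first call drops a row by its index label. The second drops every row that matches a condition: it selects those rows' labels and passes them to `drop`.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Caterpie", "Weedle", "Pidgey"],
    "Type 1": ["Grass", "Bug", "Bug", "Normal"],
})

# axis=0 (the default) drops rows by index label
without_first = df.drop(0, axis=0)

# Drop all rows matching a condition: take the matching labels, then drop them
without_bugs = df.drop(df[df["Type 1"] == "Bug"].index)

print(without_first)
print(without_bugs)
```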



By column


By index



  • To filter data in a DataFrame, you can combine conditions with logical operators.
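Here is a sketch of filtering with pandas' element-wise operators `&` (and), `|` (or) and `~` (not), using illustrative rows. Python's own `and`/`or` raise an error on a Series. Each comparison needs parentheses because `&` and `|` bind more tightly than `==` and `>`.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Vulpix"],
    "Type 1": ["Grass", "Fire", "Water", "Fire"],
    "HP": [45, 39, 44, 38],
})

# Parenthesize each comparison before combining with & | ~
fire_with_hp_over_38 = df[(df["Type 1"] == "Fire") & (df["HP"] > 38)]
grass_or_water = df[(df["Type 1"] == "Grass") | (df["Type 1"] == "Water")]
not_fire = df[~(df["Type 1"] == "Fire")]

print(fire_with_hp_over_38)
print(grass_or_water)
print(not_fire)
```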

Replacing NaN values

  • As you can see in the above filter, some of the values in Type 2 are not available and are labelled as NaN
  • Let's replace the NaN values in Type 2 with the values in Type 1
  • pandas primarily uses the value np.nan to represent missing data.
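One way to do this replacement is `Series.fillna` with another Series. Each NaN is then filled from the same row of that Series. This is a sketch with illustrative rows, and whether the original cell used `fillna` is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Pidgey"],
    "Type 1": ["Grass", "Fire", "Normal"],
    "Type 2": ["Poison", np.nan, "Flying"],
})

print(df["Type 2"].isna().sum())  # number of missing Type 2 values

# Fill each missing Type 2 with that row's Type 1
df["Type 2"] = df["Type 2"].fillna(df["Type 1"])
print(df)
```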

Let's try the filter again after the replacement



  • The groupby() function groups a DataFrame using a mapper (a dict or a key function) or by one or more columns; a function can then be applied to each group and the results combined into a Series or DataFrame.
  • Let's group our Pokemon by Generation and Type.
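Here is a sketch of grouping by two columns, with illustrative rows and an assumed `Type 1` column standing in for "Type". `size()` counts the rows in each (Generation, Type 1) pair. Any other aggregation, such as a mean, is applied per group in the same way.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Oddish", "Charmander", "Chikorita", "Cyndaquil", "Totodile"],
    "Generation": [1, 1, 1, 2, 2, 2],
    "Type 1": ["Grass", "Grass", "Fire", "Grass", "Fire", "Water"],
    "Attack": [49, 50, 52, 49, 52, 65],
})

# Row counts per (Generation, Type 1) pair; the result has a MultiIndex
counts = df.groupby(["Generation", "Type 1"]).size()
print(counts)

# Mean Attack per group
mean_attack = df.groupby(["Generation", "Type 1"])["Attack"].mean()
print(mean_attack)
```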

Adding a column

  • You might have noticed a slight issue with the Mega Pokemon names.
  • For example, PidgeotMega or BlastoiseMega Blastoise.
  • If you didn't, look at the output above.
  • Let's create a new column of fixed names that converts names like BlastoiseMega Blastoise to just Mega Blastoise.
  • For this we will be using a regex.
  • Alternatively, we can use a lambda function to do the same task.
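Here is a sketch of both approaches on illustrative names. The regex version uses `Series.str.replace` to delete everything before "Mega". The lookahead `(?=Mega)` keeps "Mega" itself. The lambda version slices each name from where "Mega" starts. The exact pattern and the column name `Fixed Name` are assumptions, not the notebook's own. Names that start with "Mega", such as Meganium, come through unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "BlastoiseMega Blastoise", "VenusaurMega Venusaur", "Meganium"],
})

# Regex: remove any prefix before "Mega"; names without "Mega" are untouched
df["Fixed Name"] = df["Name"].str.replace(r"^.*(?=Mega)", "", regex=True)

# Lambda alternative: slice from the first occurrence of "Mega"
df["Fixed Name (lambda)"] = df["Name"].apply(
    lambda name: name[name.index("Mega"):] if "Mega" in name else name
)

print(df)
```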


  • Let's list out the strongest pokemon
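Here is a sketch of ranking by strength. It assumes "strongest" means the highest `Total` (the sum of base stats) and uses illustrative rows. `sort_values` with `ascending=False` puts the largest totals first.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Pikachu", "Mewtwo", "Snorlax", "Dragonite"],
    "Total": [320, 680, 540, 600],
})

# Largest Total first
strongest = df.sort_values("Total", ascending=False)
print(strongest.head(3))
```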

Strongest pokemon for each type

  • Starting from the list sorted by strength, drop duplicate types and keep only the top-most row for each type.
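Here is a sketch of this step. It again assumes strength is measured by `Total` and that the type column is `Type 1`, and the rows are illustrative. After sorting strongest-first, `drop_duplicates(subset="Type 1", keep="first")` keeps the first, and therefore strongest, row for each type.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Charizard", "Arcanine", "Blastoise", "Gyarados", "Venusaur"],
    "Type 1": ["Fire", "Fire", "Water", "Water", "Grass"],
    "Total": [534, 555, 530, 540, 525],
})

# Sort strongest-first, then keep the first row seen for each type
strongest_per_type = (
    df.sort_values("Total", ascending=False)
      .drop_duplicates(subset="Type 1", keep="first")
)
print(strongest_per_type)
```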