Modules for Data Science Basics


Let us start by importing the modules needed for our lesson
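The original import cell is not preserved here, so the following is only a minimal sketch. It assumes pandas is installed and uses `pd`, the conventional alias; any other modules the lesson may have imported are unknown and are left out.

```python
# Assumption: pandas is the only module this section relies on.
import pandas as pd

# Printing the version is a quick way to confirm the import worked.
print(pd.__version__)
```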


Pandas is a module used for organizing and manipulating large data structures. However, there are some rules that we must follow when working with the pandas module:

  • Pandas has its own specific data type called "DataFrame", i.e. if you run type(df) and df was created with pandas.DataFrame, its type will be DataFrame.
  • Inside a DataFrame you can have multiple types of variables! Also, when a column has dtype object, it usually just holds strings.
  • To call a column, use brackets with the column name in quotes: df["columnname"]
  • Lastly, to create a DataFrame, the data you give it needs to be iterable! Lists, dictionaries, arrays, etc.

So, let us start by creating a DataFrame
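The original cell is not available, so here is one way it might look. The column names and values are made-up sample data, not the lesson's actual dataset; the DataFrame is built from a dictionary of lists, which is one of the iterable inputs mentioned above.

```python
import pandas as pd

# Hypothetical sample data: keys become column names, lists become columns.
data = {
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "age": [23, 35, 41, 29],
}
df = pd.DataFrame(data)

print(type(df))    # the DataFrame class
print(df.dtypes)   # one dtype per column (text and integers side by side)
print(df["name"])  # bracket access returns a single column as a Series
print(df.head())   # first rows of the table
```

The exact dtype reported for the text column depends on the pandas version, so it is worth checking `df.dtypes` on your own install.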


Awesome! We just made our first DataFrame with two columns. However, to really understand it, you should know the programming hierarchy going on behind the scenes:

  1. Pandas is a module, duh.
  2. DataFrame is a class defined in pandas; think of it as a blueprint for building something.
  3. Our df is an instance of that class, a concrete manifestation of what a DataFrame looks like.
  4. Instances have access to the methods defined on the class (that's why we can call .head() and so on).
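The hierarchy can be checked directly in Python. This is a small sketch with a made-up DataFrame; only standard pandas and built-in calls are used.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"name": ["Ana", "Ben", "Cara"], "age": [23, 35, 41]})

print(type(pd))                        # <class 'module'>
print(isinstance(pd.DataFrame, type))  # True: DataFrame is a class
print(isinstance(df, pd.DataFrame))    # True: df is an instance of that class
print(df.head(2))                      # head() is a method defined on the class
```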

OK, so now we know what's happening underneath, but what if I want only certain columns or rows?
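The original cells are not preserved, so the selections below are only a sketch of common ways to pick columns and rows, using made-up data. Which of these the lesson actually demonstrated is an assumption.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "age": [23, 35, 41, 29],
})

ages = df["age"]              # one column name -> Series
subset = df[["name", "age"]]  # list of column names -> DataFrame
first_two = df[0:2]           # slice -> rows by position
row_by_pos = df.iloc[1]       # one row by integer position
cell = df.loc[2, "name"]      # row label 2, column "name"

print(ages)
print(subset)
print(first_two)
print(row_by_pos)
print(cell)
```

As a rule of thumb, `.iloc` selects by integer position while `.loc` selects by label; with the default index the two happen to line up.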


There is also another neat feature of DataFrames called boolean filtering. You can keep only the values that satisfy the conditions you give it.
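A condition applied to a column produces a Series of True/False values, one per row. The sketch below uses made-up data and an arbitrary threshold of 30.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "age": [23, 35, 41, 29],
})

# Comparing a column to a value is evaluated row by row.
mask = df["age"] > 30
print(mask)
```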


To get the matching rows, put the condition inside the brackets that index the DataFrame.
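Here is a minimal sketch of that pattern, again with made-up data. Conditions can be combined with `&` (and) or `|` (or), as long as each one is wrapped in parentheses.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "age": [23, 35, 41, 29],
})

# Single condition: keep rows where age is above 30.
over_30 = df[df["age"] > 30]

# Two conditions combined with &; each needs its own parentheses.
over_25_not_cara = df[(df["age"] > 25) & (df["name"] != "Cara")]

print(over_30)
print(over_25_not_cara)
```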
