Three Pandas Tips For Noobs


If you're new to Pandas, these tips will help with your data engineering and analysis tasks. You could dig them out of the documentation or Stack Overflow posts, but I'm consolidating them here for you.

Here's what's covered:

  • Ensuring changes you make to DataFrames stick
  • Applying a function with no arguments to a DataFrame
  • Applying a function with arguments to a DataFrame

Here's the link to the original dataset we're using:

Additional Resources


This and much more is covered in my upcoming book: Python Business Intelligence Cookbook, now available for pre-order from Packt Publishing.

Import The Data


The first thing we need to do is import the data into a DataFrame. I suggest using the Pandas read_csv() function for this.
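The original notebook cell isn't reproduced here, so here's a minimal sketch of the import. The column names (Date, Units, Price) and the inline CSV text are made up for illustration; with the real dataset you'd pass its file path to read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the dataset file; the columns are assumptions
csv_text = """Date,Units,Price
2015-01-05,3,9.99
2015-01-06,,4.50
2015-01-07,2,
"""

# read_csv() accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Peek at the first few rows; empty fields show up as NaN
print(df.head())
```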


1. Ensuring Your Changes Stick


There are many ways to fill in missing (NaN) values in a DataFrame; some people use the mean of the column, others enter 0. You can do whatever you want. However, just because you tell Pandas to fill in the missing values doesn't mean the change will stick.

Let's use the fillna() method of the DataFrame and see what happens.
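As a sketch, here's the call on a small made-up DataFrame (the Units and Price columns are assumptions, not the article's dataset), filling the gaps with 0:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a couple of missing values
df = pd.DataFrame({"Units": [3, np.nan, 2], "Price": [9.99, 4.50, np.nan]})

# Fill every NaN with 0 and look at what comes back
result = df.fillna(0)
print(result)
```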


Hrm, it looks like the DataFrame is updated, but is it? I think not!
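To check, look at the DataFrame itself again after the call. A sketch with the same kind of made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a couple of missing values
df = pd.DataFrame({"Units": [3, np.nan, 2], "Price": [9.99, 4.50, np.nan]})

# Call fillna() but throw away what it returns
df.fillna(0)

# Inspect the original DataFrame and count the NaN values still in it
print(df)
remaining = int(df.isna().sum().sum())
print(remaining)
```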


What the heck?! The missing values haven't actually been updated. That's because fillna() returns a new DataFrame by default and leaves the original alone. So how do we make the change stick? By using the inplace=True argument, like so...
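A minimal sketch, again on made-up data rather than the article's dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a couple of missing values
df = pd.DataFrame({"Units": [3, np.nan, 2], "Price": [9.99, 4.50, np.nan]})

# inplace=True modifies df itself instead of handing back a copy
df.fillna(0, inplace=True)
print(df)
```

Assigning the result back, as in df = df.fillna(0), is another common way to get the same outcome.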


Success! The DataFrame has now been updated.

2. Applying a Function With No Arguments to a DataFrame


One of the reasons Pandas rocks is that you can apply a function to either a single column of a DataFrame or an entire DataFrame, using the apply() method. You'll be using this often, so here's how.
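Before applying anything, it helps to see how Pandas has typed each column. A sketch on made-up data (the column names are assumptions):

```python
import pandas as pd

# Hypothetical data: dates arrive as plain text, as they would from a CSV
df = pd.DataFrame({
    "Date": ["2015-01-05", "2015-01-06", "2015-01-07"],
    "Units": [3, 1, 2],
})

# List the dtype of every column; Date is stored as text, not as a datetime
print(df.dtypes)
```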


According to Pandas, the Date column has the object dtype, meaning Pandas doesn't actually see it as a date. Let's change that.
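Here's one way to do it, as a sketch: pass pd.to_datetime to apply() on the Date column. Note there are no parentheses after the function name; apply() calls it once per value. The data is made up for illustration.

```python
import pandas as pd

# Hypothetical data: dates stored as text
df = pd.DataFrame({
    "Date": ["2015-01-05", "2015-01-06", "2015-01-07"],
    "Units": [3, 1, 2],
})

# Hand the function itself to apply(); it runs on each value in the column
df["Date"] = df["Date"].apply(pd.to_datetime)

# Date should now be reported as a datetime dtype
print(df.dtypes)
```

For large columns, calling pd.to_datetime(df["Date"]) directly on the whole column is usually faster than going through apply().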


Voila! Our Date column is now a datetime.

3. Applying a Function With Arguments to a DataFrame


Along with applying a function to a single column, another common task is to create an additional column based on the values in two or more columns. In order to do that, we need to create a function that takes multiple parameters, and then apply it to the DataFrame.

We'll be using the same apply() method we used in the previous tip, plus a little lambda magic.
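A sketch of the pattern: line_total() is a hypothetical two-parameter function, and Units and Price are made-up columns. The lambda receives each row (that's what axis=1 does) and hands the right values to the function.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"Units": [3, 1, 2], "Price": [9.99, 4.50, 20.00]})


def line_total(units, price):
    """Return the total for one row (illustrative helper, not from the article)."""
    return units * price


# axis=1 passes one row at a time to the lambda,
# which unpacks the two columns into the function's parameters
df["Total"] = df.apply(lambda row: line_total(row["Units"], row["Price"]), axis=1)
print(df)
```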


Lambda functions can easily throw you for a loop. For more information on what they are and how to use them, check out the Python Tutorial: Lambda, Filter, Reduce and Map.

And Go!


With these three tips you're well on your way to data engineering your day away.