Ecommerce Email Marketing Data Project


By: Will Ro

Loading output library...



email - columns are unique ID's for users who recieved the emails. These have duplicates and these duplicates will be important in finding the timeline of actions taken in the action column.

event_date - has a detailed time of emails sent or actions taken. While email send times are quite systematic, purchase times are directly from user behavior, therefore will vary. Also, it's important to note that these are all in the same month.

event_type - delivered vs. halted will tell me if the user is BC or Control group; purchase actions will be key to defining metrics and exploration for this exercise.

action - will be a good feature to use for exploration

total - the total value has some missing values, which will be covered and cleaned.

event_type and action are categorical variables.

Below, I will make a new dataframe with only the features I want to keep, and dummifying the categorical variables. This will allow pandas to start identifying this information as ints and floats, allowing me to make visualizations and clean the data further for the next part of the assignment.

Loading output library...
Loading output library...
Loading output library...

Standardized sum of events by date (Above)


Above is a visualization of how each events fluctuated during the time period. I used the zscore of these values, which will allow me to display the events' change relative to itself, and put these three events on the same measuring board. Dotted Line at y = 0 is where the mean is, therefore falling below that line means the events were happening less than average, and vice versa.

Did certain events happen more frequently at certain times of the month?


As we can see from the visualization, emails were much more frequently sent out during the beginning of the month. Interestingly, purchases have sharper spikes during the very beginning and near the 11th but still somewhat follow the email trend.

Note: Delivered (Red) and Halted (Blue) follow very similar patterns, by nature

Loading output library...
Loading output library...

Standardized Sum of Events by Time of the Day


This is the same standardized sum of events like above, but this time by the hour of the day.

Are people more likely to purchase during certain times of the day?


Yes! At 15:00 (3:00 PM) we see an immense spike!

Loading output library...
Loading output library...

Purchase Patterns by Day of the Week


Above displays which day of the week people purchased the most. Tuesday and Wednesday look significantly higher while Saturday is especially low.

Part 2


Part 2 of the assignment is as follows:

A table that breaks down the following metrics, per action:

purchase-conversion rate for BC cohort and control cohort, % lift in purchase-conversion rate

customer-conversion rate for BC cohort and control cohort, % lift in customer-conversion rate

email-conversion rate for BC cohort and control cohort, % lift in email-conversion rate

attributed revenue per email for BC cohort and control cohort, % lift in attributed revenue per email



This whole part is possible by identifying credited emails (or, emails that were sent within 5 days before a purchase). My idea is to create 2 new columns credit_event_type_delivered and credit_event_type_halted, where it is either 0 (False) or 1 (True) if an email was sent(halted) and a purchase was made within 5 days.

Code Summary


1) make new dataframe called converted_df which has all the actions of only the email ids that made a purchase at one point.

2) Make a function (with nested for-loop) that:

Iteration through unique ids that made a purchase -> the days that they were sent an email -> checking if the purchase day was within the 5-day period -> give credit if it is

3) Use credit values to define which emails were sent within 5 days before a purchase, build metrics table with defined measurements

Loading output library...
Loading output library...

Metrics defined


credit features made above identifies emails that were sent 5 days before a purchase was made.

purchase-conversion rate = sum of purchases made after credit emails / number of unique ids that had a deliver(halt)

customer-conversions rate = unique ids that had a credited email / number of unique ids with a deliver(halt)

email-conversion rate = sum of credit / total sum of delivered(halt)

attributed revenue per email = sum of total per customer / sum of total credit per customer

Loading output library...
Loading output library...

Metrics table


Above is the metrics table, where different measurement is divided up by experimental and control groups to distinguish lift in the two groups.

The conversion rate metrics actually do not see a positive lift. However, the attributed revenue per email is significantly higher than the control group, which I can assume means that the emails do help customers spend more.