Profitable App Profiles for the App Store and Google Play Markets


This project analyzes what profitable apps have in common on the App Store and Google Play, the two major app marketplaces. As of 2018, the two stores together listed about 4 million apps. Analyzing all of them would take significant time and money, so we work with a sample instead:

  • 10,000 Android apps from Google Play
  • 7,000 iOS apps from App Store

The goal of the project is to identify the traits of profitable apps and to define a strategy for building one.

Data Cleansing


We only analyze free apps aimed at an English-speaking audience, so we remove:

  • Non-English apps.
  • Apps that aren't free.

Duplicate Data


Defining duplicate data


As the Instagram example shows, the data contains duplicates: there is only one Instagram app, but it has four records. We want to keep only the most recent record for each app, and we use the number of reviews to identify it. The assumption is that the more reviews a record has, the more recent it is.

The steps to identify duplicates:

  • Create two empty lists, one for duplicate apps and one for unique apps.
  • Loop through the data and extract the name of each app.
  • If the name is already in the unique apps list, append it to the duplicate apps list.
  • Otherwise, append it to the unique apps list.
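
The steps above can be sketched as follows. This is a minimal illustration rather than the notebook's original cell: the toy rows, the column layout (`[name, category, rating, n_reviews]`), and the function name `find_duplicates` are all assumptions made for the example.

```python
# Toy rows in a hypothetical layout: [name, category, rating, n_reviews].
# The values are made up for illustration; they are not taken from the data set.
android = [
    ["Instagram", "SOCIAL", "4.5", "100"],
    ["Instagram", "SOCIAL", "4.5", "250"],
    ["Instagram", "SOCIAL", "4.5", "250"],
    ["Instagram", "SOCIAL", "4.5", "90"],
    ["Notes", "PRODUCTIVITY", "4.1", "1200"],
]


def find_duplicates(rows, name_index=0):
    """Split app names into first occurrences and repeated occurrences."""
    unique_apps = []
    duplicate_apps = []
    for row in rows:
        name = row[name_index]
        if name in unique_apps:
            # Seen before, so this row is a duplicate record.
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
    return unique_apps, duplicate_apps


unique_apps, duplicate_apps = find_duplicates(android)
print("Unique apps:", len(unique_apps))
print("Duplicate records:", len(duplicate_apps))
```

On a data set of this size a list lookup is fast enough; a `set` for the names already seen would scale better on larger inputs.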

The steps to identify the most recent record:

  • Create an empty dictionary for the maximum review counts.
  • Loop through the data.
  • Assign the name and the review count to variables. Because every value is stored as a string, convert the review count to a float.
  • If the app is already in the dictionary and its stored maximum is smaller than the current review count, replace the stored maximum with the current review count.
  • If the app is not in the dictionary yet, simply add it with the current review count.
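
A minimal sketch of these steps, assuming the same hypothetical row layout `[name, category, rating, n_reviews]`; the toy values and the function name `max_reviews_by_name` are illustrative, not the notebook's own.

```python
# Toy rows in a hypothetical layout: [name, category, rating, n_reviews].
android = [
    ["Instagram", "SOCIAL", "4.5", "100"],
    ["Instagram", "SOCIAL", "4.5", "250"],
    ["Instagram", "SOCIAL", "4.5", "250"],
    ["Instagram", "SOCIAL", "4.5", "90"],
    ["Notes", "PRODUCTIVITY", "4.1", "1200"],
]


def max_reviews_by_name(rows, name_index=0, reviews_index=3):
    """Map each app name to the highest review count found among its records."""
    reviews_max = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])  # values are stored as strings
        if name in reviews_max and reviews_max[name] < n_reviews:
            # A record with more reviews than the stored maximum.
            reviews_max[name] = n_reviews
        elif name not in reviews_max:
            # First time we see this app.
            reviews_max[name] = n_reviews
    return reviews_max


reviews_max = max_reviews_by_name(android)
print(reviews_max)
```

The number of keys in `reviews_max` should equal the number of unique apps, which gives a quick sanity check against the duplicate count.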

The steps to store the data without duplicates:

  • Create two empty lists, one for the clean data and one for the names of apps already added.
  • Loop through the data and assign the name and the review count to variables.
  • If the name is not in the already added list, check whether the review count matches the maximum review count for that app.
  • If it does, append the full record to the clean list and the name to the already added list.
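
These steps could look like the sketch below. It rebuilds the maximum review counts inline so that it runs on its own; the row layout, the toy values, and the name `remove_duplicates` are assumptions for illustration. The toy data deliberately contains two records tied at the maximum, which is the case the already added list exists to handle.

```python
# Toy rows in a hypothetical layout: [name, category, rating, n_reviews].
android = [
    ["Instagram", "SOCIAL", "4.5", "100"],
    ["Instagram", "SOCIAL", "4.5", "250"],
    ["Instagram", "SOCIAL", "4.5", "250"],  # tied with the row above
    ["Instagram", "SOCIAL", "4.5", "90"],
    ["Notes", "PRODUCTIVITY", "4.1", "1200"],
]


def remove_duplicates(rows, name_index=0, reviews_index=3):
    """Keep one record per app: the first one holding the highest review count."""
    # Pass 1: highest review count per app name.
    reviews_max = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Pass 2: keep the record that matches the maximum, once per app.
    clean = []
    already_added = []
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in already_added and n_reviews == reviews_max[name]:
            clean.append(row)           # the full record
            already_added.append(name)  # the name only
    return clean


android_clean = remove_duplicates(android)
for row in android_clean:
    print(row)
```

Without the `already_added` check, both tied Instagram records would pass the maximum test and the duplicate would survive.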

Removing the data that are not in English


Filtering free apps only


The goal is to determine which app profiles are popular on both platforms, because the more users an app attracts, the more likely it is to be profitable. To minimize risk and overhead, the validation strategy for an app is:

  • Build a minimal Android version of the app, and add it to Google Play
  • If the app has a good response from users, develop it further
  • If the app is profitable after six months, build an iOS version of the app and add it to App Store

Both data sets appear to provide a genre or category column.

Frequency Table


From the result above we can say:

  • The most common genre in the App Store is Games.
  • Looking at the other genres as well, most apps are built for entertainment: Games and Entertainment together make up about half of the apps.
  • Generally speaking, there are more entertainment apps because that is what people want. However, it also means the entertainment space is already crowded, so an app in an entertainment category is not guaranteed to be profitable.
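
Shares like the ones quoted above come from a frequency table expressed in percentages. The sketch below shows one way to build such a table; the toy rows, the `[name, genre]` layout, and the function names `freq_table` and `display_table` are assumptions for illustration, and the printed numbers describe the toy data only.

```python
# Toy rows in a hypothetical layout: [name, genre].
ios = [
    ["App A", "Games"],
    ["App B", "Games"],
    ["App C", "Entertainment"],
    ["App D", "Education"],
]


def freq_table(rows, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = {}
    for row in rows:
        key = row[index]
        counts[key] = counts.get(key, 0) + 1
    total = len(rows)
    if total == 0:
        return {}
    return {key: count / total * 100 for key, count in counts.items()}


def display_table(rows, index):
    """Print the frequency table sorted from most to least common."""
    table = freq_table(rows, index)
    ordered = sorted(table.items(), key=lambda item: item[1], reverse=True)
    for key, percentage in ordered:
        print(f"{key}: {percentage:.2f}%")


genre_table = freq_table(ios, 1)
display_table(ios, 1)
```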

Compared to the iOS data, the Android data is much messier to navigate. It also seems more detailed: genres such as Role Playing, Strategy, and Adventure would most likely all be categorized as Games on iOS.

This data is tidier. What we see is that:

  • Games are also one of the most common genres on Android.
  • Family ranks first, but what qualifies an app as a family app is vague.
  • From the analysis so far, games are among the most common app types in both markets.

One way to estimate how many users an app has is to look at its number of installs. The iOS data does not include this figure, so the next best proxy is the total number of ratings.
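
As a rough sketch of this idea, the average number of ratings per genre can be computed by accumulating a total and a count for each genre. The `[name, genre, rating_count]` layout, the toy values, and the function name `average_by_group` are assumptions for illustration.

```python
# Toy rows in a hypothetical layout: [name, genre, rating_count].
ios = [
    ["Game A", "Games", "100"],
    ["Game B", "Games", "300"],
    ["Chat A", "Social Networking", "1000"],
]


def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column} for each group."""
    totals = {}
    counts = {}
    for row in rows:
        group = row[group_index]
        value = float(row[value_index])  # values are stored as strings
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


avg_ratings = average_by_group(ios, 1, 2)
for genre, average in sorted(avg_ratings.items(), key=lambda item: item[1], reverse=True):
    print(f"{genre}: {average:.0f}")
```

An average like this can be dominated by a handful of very large apps, so it is worth inspecting the individual apps in a genre before drawing conclusions.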

Interestingly, although there are many games available, the genre people use most in the App Store is Social Networking.

Now we look at the Android data. It does include the number of installs, but the values are not precise: they are given in the format 100+, 1000+, 5000+, etc., so we don't know the exact numbers. 5000+ could mean 6000, 7000, or 9999. For our purposes, though, this is precise enough.
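
To average these values they first have to be turned into numbers. The sketch below treats each bucket as its lower bound, so 5000+ becomes 5000. It also strips commas on the assumption that the raw strings may contain thousands separators; the `[name, category, installs]` layout, the toy values, and the function names are likewise illustrative.

```python
# Toy rows in a hypothetical layout: [name, category, installs].
android = [
    ["Game A", "GAME", "1,000+"],
    ["Game B", "GAME", "5,000+"],
    ["Chat A", "SOCIAL", "100+"],
]


def parse_installs(text):
    """Convert a bucket such as '5,000+' or '100+' to its lower bound."""
    return float(text.replace(",", "").replace("+", ""))


def average_installs(rows, category_index=1, installs_index=2):
    """Return {category: mean lower-bound install count}."""
    totals = {}
    counts = {}
    for row in rows:
        category = row[category_index]
        installs = parse_installs(row[installs_index])
        totals[category] = totals.get(category, 0.0) + installs
        counts[category] = counts.get(category, 0) + 1
    return {category: totals[category] / counts[category] for category in totals}


avg_installs = average_installs(android)
for category, average in avg_installs.items():
    print(f"{category}: {average:.0f}")
```

Because every bucket is rounded down, these averages underestimate the true install counts, but the ranking between categories is what matters here.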


The result is the same as iOS. The most popular catego