In this project, we'll pretend we're working as data analysts for a company that builds Android and iOS mobile apps. We make our apps available on Google Play and in the App Store.
We only build apps that are free to download and install, and our main source of revenue is in-app ads. This means that our revenue from any given app is largely determined by its number of users. Our goal is to analyze data to help our developers understand what types of apps are likely to attract more users.
First, we'll download and read in data about apps on Google Play and the Apple App Store.
```python
# Download the Google Play store data (top-level await works in Pyodide)
from pyodide.http import pyfetch

google = await pyfetch("https://dq-marketing-site.s3.amazonaws.com/googleplaystore.csv")
with open("googleplaystore.csv", "wb") as f:
    f.write(await google.bytes())

# Download the Apple App Store data
apple = await pyfetch("https://dq-marketing-site.s3.amazonaws.com/AppleStore.csv")
with open("AppleStore.csv", "wb") as f:
    f.write(await apple.bytes())
```
```python
# Load the data into pandas
import pandas as pd

google = pd.read_csv("googleplaystore.csv")
apple = pd.read_csv("AppleStore.csv")
```
```python
google.head()
```
| | App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19M | 10,000+ | Free | 0 | Everyone | Art & Design | January 7, 2018 | 1.0.0 | 4.0.3 and up |
| 1 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up |
| 2 | U Launcher Lite – FREE Live Cool Themes, Hide ... | ART_AND_DESIGN | 4.7 | 87510 | 8.7M | 5,000,000+ | Free | 0 | Everyone | Art & Design | August 1, 2018 | 1.2.4 | 4.0.3 and up |
| 3 | Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644 | 25M | 50,000,000+ | Free | 0 | Teen | Art & Design | June 8, 2018 | Varies with device | 4.2 and up |
| 4 | Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967 | 2.8M | 100,000+ | Free | 0 | Everyone | Art & Design;Creativity | June 20, 2018 | 1.1 | 4.4 and up |
```python
apple.head()
```
| | id | track_name | size_bytes | currency | price | rating_count_tot | rating_count_ver | user_rating | user_rating_ver | ver | cont_rating | prime_genre | sup_devices.num | ipadSc_urls.num | lang.num | vpp_lic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 284882215 |  | 389879808 | USD | 0.0 | 2974676 | 212 | 3.5 | 3.5 | 95.0 | 4+ | Social Networking | 37 | 1 | 29 | 1 |
| 1 | 389801252 |  | 113954816 | USD | 0.0 | 2161558 | 1289 | 4.5 | 4.0 | 10.23 | 12+ | Photo & Video | 37 | 0 | 29 | 1 |
| 2 | 529479190 | Clash of Clans | 116476928 | USD | 0.0 | 2130805 | 579 | 4.5 | 4.5 | 9.24.12 | 9+ | Games | 38 | 5 | 18 | 1 |
| 3 | 420009108 | Temple Run | 65921024 | USD | 0.0 | 1724546 | 3842 | 4.5 | 4.0 | 1.6.2 | 9+ | Games | 40 | 5 | 1 | 1 |
| 4 | 284035177 | Pandora - Music & Radio | 130242560 | USD | 0.0 | 1126879 | 3594 | 4.0 | 4.5 | 8.4.1 | 12+ | Music | 37 | 4 | 1 | 1 |
Before beginning our analysis, we need to make sure the data we analyze is accurate; otherwise, the results of our analysis will be wrong. So we'll start by cleaning the data: removing invalid ratings and duplicate entries.
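A quick way to check a column that should be numeric is to look for entries that pandas cannot parse as numbers. The sketch below is a hypothetical helper (`non_numeric_values` is our own name, not a pandas function) built on `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into `NaN`. The sample values are made up for illustration.

```python
import pandas as pd


def non_numeric_values(series: pd.Series) -> pd.Series:
    """Return the entries of `series` that cannot be parsed as numbers.

    Hypothetical helper: missing values (NaN/None) are ignored, so only
    entries that are present but unparseable are reported.
    """
    parsed = pd.to_numeric(series, errors="coerce")
    # Suspicious = present in the original column but NaN after parsing
    return series[parsed.isna() & series.notna()]


# Made-up review counts stored as text, with one malformed entry
reviews = pd.Series(["159", "967", "3.0M", "87510"])
print(non_numeric_values(reviews).tolist())  # ['3.0M']
```

Running a check like this on a column such as `google["Reviews"]` right after loading is one way to spot rows whose values are shifted or malformed.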
First, let's remove any apps with a rating above 5, the highest rating either store allows, since such values can only be data errors.
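```python
# Keep only ratings of at most 5; rows with a missing rating (NaN)
# also fail this comparison and are dropped
google = google[google["Rating"] <= 5]
apple = apple[apple["user_rating"] <= 5]
```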
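Because any comparison with a missing value (`NaN`) evaluates to `False`, this filter removes apps with no rating as well as apps with an impossible one. The toy example below, using made-up values, shows both effects. It also shows `Series.between`, a stricter check that additionally rejects negative ratings (both bounds are inclusive by default).

```python
import numpy as np
import pandas as pd

# Made-up ratings: valid, missing, impossible, and exactly the maximum
ratings = pd.DataFrame({"Rating": [4.1, np.nan, 19.0, 5.0]})

# Same filter as above: NaN <= 5 is False, so the NaN row is dropped too
kept = ratings[ratings["Rating"] <= 5]
print(kept["Rating"].tolist())  # [4.1, 5.0]

# Stricter range check that would also reject negative ratings
kept_between = ratings[ratings["Rating"].between(0, 5)]
print(kept_between["Rating"].tolist())  # [4.1, 5.0]
```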
Next, let's remove duplicate apps. Some apps appear more than once in the data, so for each app we'll keep only the entry with the most reviews, since it is likely the most recent.
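```python
# Reviews is read as text (a malformed row held a non-numeric value); make it numeric so sorting isn't alphabetical
google = google.assign(Reviews=pd.to_numeric(google["Reviews"]))
google = google.groupby("App").apply(lambda x: x.sort_values("Reviews", ascending=False).iloc[0, :])
apple = apple.groupby("track_name").apply(lambda x: x.sort_values("rating_count_tot", ascending=False).iloc[0, :])
```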
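The groupby-and-apply pattern keeps one row per app. The same rule can be written more compactly by sorting on the review count and calling `drop_duplicates` with `keep="first"`. The sketch below uses made-up rows to show this. It also shows why the review counts need to be numeric: text is compared character by character, so `"99"` sorts above `"1000"`. Unlike the groupby version, this approach keeps the original row labels instead of indexing rows by app name.

```python
import pandas as pd

# Made-up rows: "Chess" appears twice, with review counts stored as text
apps = pd.DataFrame({
    "App": ["Chess", "Chess", "Notes"],
    "Reviews": ["99", "1000", "50"],
})

# Sorting text compares character by character, so "99" beats "1000"
text_first = apps.sort_values("Reviews", ascending=False).drop_duplicates("App")
print(text_first.loc[text_first["App"] == "Chess", "Reviews"].item())  # prints 99

# Convert to numbers first, then keep the most-reviewed row for each app
apps = apps.assign(Reviews=pd.to_numeric(apps["Reviews"]))
deduped = apps.sort_values("Reviews", ascending=False).drop_duplicates("App", keep="first")
deduped = deduped.sort_values("App")
print(deduped["App"].tolist(), deduped["Reviews"].tolist())  # ['Chess', 'Notes'] [1000, 50]
```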
Recall that at our company, we only build apps that are free to download and install, and we design them for an English-speaking audience. This means we also need to remove non-English apps and paid apps from both data sets.
Let's start by removing non-English apps. As a rough proxy for non-English characters, we'll count the characters outside the ASCII range (code point above 127) and remove any app whose name contains 3 or more of them. We allow one or two because some English app names contain a few such characters, like emoji.
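```python
# Count characters outside the ASCII range (code point > 127) in each name,
# and keep only apps with fewer than 3 of them
google = google[google["App"].apply(lambda name: sum(ord(c) > 127 for c in name) < 3)]
apple = apple[apple["track_name"].apply(lambda name: sum(ord(c) > 127 for c in name) < 3)]
```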
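The same filter can be wrapped in a named function, which makes it easier to test. `is_probably_english` and its `max_non_ascii` parameter are hypothetical names chosen for this sketch; with the default of 2, it applies the same rule as the code above. The examples include names that appear in the filtered outputs, and they show a limitation of the heuristic: a short name written entirely in another script, such as 豆瓣, has only two non-ASCII characters and therefore passes.

```python
def is_probably_english(name: str, max_non_ascii: int = 2) -> bool:
    """Heuristic: treat a name as English if it contains at most
    `max_non_ascii` characters outside the ASCII range (code point > 127)."""
    non_ascii = sum(ord(ch) > 127 for ch in name)
    return non_ascii <= max_non_ascii


examples = [
    "Temple Run",                                      # pure ASCII
    "💘 WhatsLov: Smileys of love, stickers and GIF",  # one emoji
    "豆瓣",                                            # two CJK characters: slips through
    "铁路12306",                                       # two CJK characters: slips through
    "中文应用程序",                                    # hypothetical six-character name
]

for name in examples:
    print(name, is_probably_english(name))
```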
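Finally, we'll remove all paid apps, so that we're left only with free apps like the ones our company builds. Google Play stores `Price` as text (free apps have the string `"0"`), whereas the App Store's `price` column is numeric, so the two comparisons differ.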
```python
google = google[google["Price"] == "0"]
apple = apple[apple["price"] == 0.0]
```
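The `== "0"` check on the Google Play data relies on every free app being stored as exactly that string. A slightly more defensive sketch, assuming paid prices are formatted like `"$4.99"`, strips the leading dollar sign and compares numerically; filtering on the `Type` column is another option. The rows below are made up for illustration.

```python
import pandas as pd

# Made-up rows in the same format as the Google Play data
prices = pd.DataFrame({
    "App": ["Free Notes", "Pro Notes", "Photo Fun"],
    "Type": ["Free", "Paid", "Free"],
    "Price": ["0", "$4.99", "0"],
})

# Strip a leading "$", parse as a number, and keep apps that cost nothing
numeric_price = pd.to_numeric(prices["Price"].str.lstrip("$"))
free_by_price = prices[numeric_price == 0]

# Alternative: rely on the Type column instead of the price
free_by_type = prices[prices["Type"] == "Free"]

print(free_by_price["App"].tolist())  # ['Free Notes', 'Photo Fun']
print(free_by_type["App"].tolist())   # ['Free Notes', 'Photo Fun']
```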
```python
google
```
| App | App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| +Download 4 Instagram Twitter | +Download 4 Instagram Twitter | SOCIAL | 4.5 | 40467 | 22M | 1,000,000+ | Free | 0 | Everyone | Social | August 2, 2018 | 5.03 | 4.1 and up |
| - Free Comics - Comic Apps | - Free Comics - Comic Apps | COMICS | 3.5 | 115 | 9.1M | 10,000+ | Free | 0 | Mature 17+ | Comics | July 13, 2018 | 5.0.12 | 5.0 and up |
| .R | .R | TOOLS | 4.5 | 259 | 203k | 10,000+ | Free | 0 | Everyone | Tools | September 16, 2014 | 1.1.06 | 1.5 and up |
| /u/app | /u/app | COMMUNICATION | 4.7 | 573 | 53M | 10,000+ | Free | 0 | Mature 17+ | Communication | July 3, 2018 | 4.2.4 | 4.1 and up |
| 058.ba | 058.ba | NEWS_AND_MAGAZINES | 4.4 | 27 | 14M | 100+ | Free | 0 | Everyone | News & Magazines | July 6, 2018 | 1.0 | 4.2 and up |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Аim Training for CS | Аim Training for CS | GAME | 3.6 | 2328 | 12M | 100,000+ | Free | 0 | Everyone | Action | October 25, 2014 | 1.8 | 2.3 and up |
| 【Miku AR Camera】Mikuture | 【Miku AR Camera】Mikuture | FAMILY | 4.4 | 36268 | 41M | 1,000,000+ | Free | 0 | Teen | Entertainment | April 25, 2017 | 3.0.15 | 4.2 and up |
| 漫咖 Comics - Manga,Novel and Stories | 漫咖 Comics - Manga,Novel and Stories | COMICS | 4.1 | 12088 | 21M | 1,000,000+ | Free | 0 | Mature 17+ | Comics | July 6, 2018 | 2.3.1 | 4.0.3 and up |
| 💘 WhatsLov: Smileys of love, stickers and GIF | 💘 WhatsLov: Smileys of love, stickers and GIF | SOCIAL | 4.6 | 22098 | 18M | 1,000,000+ | Free | 0 | Everyone | Social | July 24, 2018 | 4.2.4 | 4.0.3 and up |
| 🔥 Football Wallpapers 4K \| Full HD Backgrounds 😍 | 🔥 Football Wallpapers 4K \| Full HD Backgrounds 😍 | ENTERTAINMENT | 4.7 | 11661 | 4.0M | 1,000,000+ | Free | 0 | Everyone | Entertainment | July 14, 2018 | 1.1.3.2 | 4.0.3 and up |
7551 rows × 13 columns
```python
apple
```
| track_name | id | track_name | size_bytes | currency | price | rating_count_tot | rating_count_ver | user_rating | user_rating_ver | ver | cont_rating | prime_genre | sup_devices.num | ipadSc_urls.num | lang.num | vpp_lic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ! OH Fantastic Free Kick + Kick Wall Challenge | 883539642 | ! OH Fantastic Free Kick + Kick Wall Challenge | 162557952 | USD | 0.0 | 0 | 0 | 0.0 | 0.0 | 4.0 | 4+ | Games | 40 | 5 | 2 | 1 |
| *Solitaire* | 1140586546 | *Solitaire* | 124961792 | USD | 0.0 | 460 | 10 | 4.5 | 4.5 | 2.3.1 | 4+ | Games | 37 | 5 | 1 | 1 |
| . Calculator . | 662749248 | . Calculator . | 59232256 | USD | 0.0 | 1525 | 352 | 4.5 | 5.0 | 3.15 | 4+ | Utilities | 37 | 5 | 31 | 1 |
| 1+2=3 | 953831664 | 1+2=3 | 21727232 | USD | 0.0 | 2816 | 37 | 4.0 | 3.5 | 9.13.0 | 4+ | Games | 37 | 4 | 1 | 1 |
| 1-Bit Rogue: A dungeon crawler RPG! | 1128070374 | 1-Bit Rogue: A dungeon crawler RPG! | 64439296 | USD | 0.0 | 378 | 106 | 4.5 | 4.5 | 1.3 | 12+ | Games | 38 | 3 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 秒速 | 1095603248 | 秒速 | 13033472 | USD | 0.0 | 0 | 0 | 0.0 | 0.0 | 1.0.1 | 4+ | Games | 38 | 5 | 2 | 1 |
| 花札Online | 1078812538 | 花札Online | 209218560 | USD | 0.0 | 0 | 0 | 0.0 | 0.0 | 1.1.20 | 12+ | Games | 40 | 5 | 1 | 1 |
| 豆瓣 | 907002334 | 豆瓣 | 109557760 | USD | 0.0 | 407 | 0 | 3.5 | 0.0 | 4.18.1 | 12+ | Social Networking | 37 | 5 | 2 | 1 |
| 铁路12306 | 564818797 | 铁路12306 | 28961792 | USD | 0.0 | 177 | 0 | 2.0 | 0.0 | 2.80 | 4+ | Travel | 38 | 0 | 1 | 1 |
| 飞猪 | 453691481 | 飞猪 | 148888576 | USD | 0.0 | 154 | 0 | 4.0 | 0.0 | 8.2.2 | 17+ | Travel | 37 | 0 | 1 | 1 |
3201 rows × 16 columns
We're now left with 7,551 free Google Play apps and 3,201 free App Store apps, which we can analyze to determine what types of apps are likely to attract the most users.