In this article we will see how to download hundreds of applications from the Google Play Store. If you are asking why someone would want to download a high volume of APKs, I can't really find a lot of answers for that, but in my case, as part of an experiment I am involved in, our goal was to get a large set of Android applications (> 10k) and do some static and dynamic analysis on them. An amazing project offering such capabilities is Androzoo, which contains a vast number of applications from different marketplaces, ranked according to antivirus analysis. The reason why Androzoo was not enough as a source of APKs was that it provided random applications, which reflect the average state of applications in the marketplace of choice. I also wanted the experiment to include a snapshot of the top apps currently used around the world. These would be the top apps sorted by downloads, as the Google Play Store presents them!
Another awesome GitHub project is PlaystoreDownloader, a command-line tool that lets you download an APK simply by specifying its package name! The idea was to list the package names of the top N applications and then use the tool to download all of them. Luckily, a script that lists the top applications from different categories was already available within the tool.
Setting up the tool requires some one-time preparation, which I will not go through in this article, but you can find it on the GitHub page of the tool. The basic steps are to add your account to an emulator and then extract the Google Services Framework (GSF) ID.
Preparing the APK list
So, let's get started! The first thing to do is to generate a list of APKs that will be downloaded. We are going to do this by using the aforementioned script, crawl_top_apps_by_category.py.
Feel free to edit the number of top applications retrieved per category; in my case I have set it to 1:
top_num = 1
for cat in store_categories:
    doc = api.list_app_by_category(cat, "apps_topselling_free", top_num).doc
    for app in doc.child if doc.docid else doc.child.child:
        downloads = app.details.appDetails.numDownloads
        rating = app.aggregateRating.starRating
The command to run from the root of the project is the following, shown together with example output:
➜ PlaystoreDownloader git:(master) ✗ python3 -m scripts.crawl_top_apps_by_category
com.google.android.apps.messaging|COMMUNICATION|1,000,000,000+ downloads|4.492577075958252
org.findmykids.app|PARENTING|10,000,000+ downloads|4.601987361907959
de.wetteronline.wetterapp|WEATHER|50,000,000+ downloads|4.490137577056885
com.boranuonline.mydates2|DATING|1,000,000+ downloads|4.476698398590088
gr.mobile.myodos|AUTO_AND_VEHICLES|10,000+ downloads|4.179999828338623
[...]
As we can see, the script returns a list of apps, one per line, with the package name, the category the app belongs to, the number of downloads, and the rating. The package name alone is enough to download the app.
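If you want to work with the other fields too, the listing is easy to parse. The following is a minimal sketch that assumes the pipe-separated format shown in the example output; the `AppEntry` type and the `parse_listing` helper are hypothetical names of mine and are not part of PlaystoreDownloader.

```python
from typing import NamedTuple


class AppEntry(NamedTuple):
    package: str
    category: str
    downloads: str
    rating: float


def parse_listing(lines):
    """Parse 'package|CATEGORY|downloads|rating' lines, skipping malformed ones."""
    entries = []
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) != 4:
            continue  # not a listing line (empty line, stray log output, ...)
        package, category, downloads, rating = parts
        try:
            entries.append(AppEntry(package, category, downloads, float(rating)))
        except ValueError:
            continue  # rating field was not a number
    return entries


# Two lines copied from the example output above.
sample = [
    "com.google.android.apps.messaging|COMMUNICATION|1,000,000,000+ downloads|4.492577075958252",
    "org.findmykids.app|PARENTING|10,000,000+ downloads|4.601987361907959",
]
for entry in parse_listing(sample):
    print(entry.package, entry.category)
```

Keeping the download count as a string avoids guessing how to interpret values such as "10,000,000+ downloads".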
Download APKs script
So, time for the actual step of downloading the APKs. For this I wrote the following Python script, which I placed in the root of the project:
#!/usr/bin/env python3
import glob
import subprocess
import sys
from random import randint
from time import sleep

apk_downloads_path = "Downloads/"
apk_list_to_download = "exampleAPKlist.txt"  # as created by the crawl_ py files

# Check existing apks
existing_apk_list = []
for file in glob.glob(apk_downloads_path + "*.apk"):
    # create a list with only the package names from the filenames
    existing_apk_list.append(file.split('-')[-1][:-4].replace(" ", ""))
print("Already existing " + str(len(existing_apk_list)))

# Create a list with the package names to be downloaded.
# The package name is the first '|'-separated field of each line.
with open(apk_list_to_download, 'r') as apk_list_file:
    apk_lines = apk_list_file.readlines()
apk_list = []
for line in apk_lines:
    if line.strip():
        apk_list.append(line.split('|')[0].strip())
apk_list = list(set(apk_list))  # drop duplicate package names

# The list to be used for downloading:
final_list = [x for x in apk_list if x not in existing_apk_list]
print("In Total " + str(len(apk_list)) + " apks.\nTo be downloaded: " + str(len(final_list)))

for apkfile in final_list:
    try:
        print("Downloading " + str(apkfile))
        ret = subprocess.check_output(['python3', 'download.py', apkfile], stderr=subprocess.STDOUT)
        if 'Server busy, please try again later' in str(ret):
            print("sleeping for long now....")
            sleep(200)
        # short random pause between downloads
        sleep(randint(1, 10))
    except KeyboardInterrupt:
        sys.exit("exiting")
    except Exception:
        print("Something went wrong with the APK: " + str(apkfile))
        sleep(60)
apk_list_to_download points to the output of the script crawl_top_apps_by_category. The basic flow of the script is as follows: first it builds a list of the APKs already downloaded, so that nothing is downloaded twice; then it builds a list of unique package names to download; and as a final step before starting, it removes from that list every package that is already downloaded. Then, using subprocess, it invokes the download tool once for each package. Someone might argue that importing the tool into this script would be more Pythonic, but that would also require some modifications to the tool itself, and I did not like that. I preferred the road of least effort here, which is to simply call the tool. You will notice some calls to sleep() in the script. I found them necessary because the server cut me off after some time downloading APKs, while with these values I managed to download > 3K APKs without interruption.
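The pacing logic can also be pulled out into a small function, which makes the delays easier to tune and to test without actually sleeping. This is only a sketch using the same values as the script (a random pause of 1 to 10 seconds, an extra 200 seconds when the server reports it is busy, and 60 seconds after a failure); `next_delay` is a hypothetical helper name, and the values are simply the ones that worked for me, not documented limits.

```python
import random

# Message text as checked for in the download script.
BUSY_MESSAGE = "Server busy, please try again later"


def next_delay(output, failed=False, rng=random):
    """Return how many seconds to wait before starting the next download."""
    if failed:
        return 60  # back off after an error
    delay = rng.randint(1, 10)  # short random pause between downloads
    if BUSY_MESSAGE in output:
        delay += 200  # long pause when the server asks us to slow down
    return delay


print(next_delay("", failed=True))
```

In the download loop, the result would be passed to `sleep()` after each call to the tool.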
First, create an example list of APKs:
➜ PlaystoreDownloader git:(master) ✗ python3 -m scripts.crawl_top_apps_by_category > exampleAPKlist.txt
➜ PlaystoreDownloader git:(master) ✗ cat exampleAPKlist.txt
com.fontskeyboard.fonts|PERSONALIZATION|10,000,000+ downloads|4.581567287445068
com.boranuonline.mydates2|DATING|1,000,000+ downloads|4.476698398590088
de.wetteronline.wetterapp|WEATHER|50,000,000+ downloads|4.490135669708252
Then adjust the script with the name of our list, exampleAPKlist.txt, and simply run it:
➜ PlaystoreDownloader git:(master) ✗ python3 bulk_downloader.py
Already existing 0
In Total 3 apks.
To be downloaded: 3
Downloading de.wetteronline.wetterapp
Downloading com.fontskeyboard.fonts
Downloading com.boranuonline.mydates2
➜ PlaystoreDownloader git:(master) ✗ python3 bulk_downloader.py
Already existing 3
In Total 3 apks.
To be downloaded: 0
The first run downloads the APKs, while the second run detects that they are already downloaded and skips them.
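The skip works by recovering the package name from each downloaded file name and comparing it against the requested packages. Here is a minimal sketch of that step on its own, assuming (as the slicing in the script implies) that downloaded files are named so that the package name follows the last hyphen and precedes `.apk`. The helper names and the sample file name are made up for illustration.

```python
import os


def package_from_filename(path):
    """Extract the package name from a file named '<anything> - <package>.apk'."""
    name = os.path.basename(path)
    stem = name[:-len(".apk")] if name.endswith(".apk") else name
    # Package names contain no hyphens, so the last hyphen-separated part is the package.
    return stem.split("-")[-1].strip()


def pending_packages(wanted, downloaded_files):
    """Return the wanted packages with no matching file yet, keeping their order."""
    existing = {package_from_filename(f) for f in downloaded_files}
    seen = set()
    pending = []
    for package in wanted:
        if package not in existing and package not in seen:
            seen.add(package)
            pending.append(package)
    return pending


# Made-up file name following the assumed naming pattern.
files = ["Downloads/Some Title by Some Dev - de.wetteronline.wetterapp.apk"]
wanted = ["de.wetteronline.wetterapp", "com.fontskeyboard.fonts", "com.fontskeyboard.fonts"]
print(pending_packages(wanted, files))  # ['com.fontskeyboard.fonts']
```

Unlike `list(set(...))`, this keeps the packages in their original order, which is handy if the list is sorted by popularity.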
PlaystoreDownloader is an awesome project, which very conveniently allows us to download an APK for the architecture we want, based on the ID of the device on which we registered our account. Using the script shown above, we can automate the process and download hundreds of APKs with just a few lines of code. Good luck with your experiments, and I hope this has been helpful.