The AndroidManifest.xml file inside an .apk package appears to be in a binary XML format. What is this format, and how can it be parsed programmatically (as opposed to using the aapt dump tool in the SDK)?
What about calling the Android Asset Packaging Tool (aapt) from the Android SDK inside a Python (or any other) script?
With aapt (http://elinux.org/Android_aapt) you can retrieve information about an .apk package and its AndroidManifest.xml file. In particular, the 'dump' sub-command extracts the values of individual elements of the package. For example, you can list the uses-permission entries of the AndroidManifest.xml inside an .apk package like this:
$ aapt dump permissions package.apk
where package.apk is the path to your .apk file.
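You can also pipe the output through standard Unix tools to clean it up. For example, the following drops the first line (the package name) and prints only the last field of each remaining line: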
$ aapt dump permissions package.apk | sed 1d | awk '{ print $NF }'
Here is a Python script that does the same programmatically:
```python
import os
import shlex
import subprocess

# Directory of this script and the path of the APK next to it:
curpath = os.path.dirname(os.path.realpath(__file__))
filepath = os.path.join(curpath, "package.apk")

# Extract the AndroidManifest.xml permissions (aapt must be on the PATH).
# shell=True is needed for the sed/awk pipe; shlex.quote protects paths with spaces.
command = "aapt dump permissions " + shlex.quote(filepath) + " | sed 1d | awk '{ print $NF }'"
process = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True, text=True)
permissions = process.communicate()[0]
print(permissions)
```
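If you would rather not depend on sed and awk (for example on Windows), the same filtering can be done in Python. The sketch below mirrors the sed 1d | awk '{ print $NF }' pipeline: it drops the first line of the aapt output and keeps the last whitespace-separated field of each remaining line. The function name parse_permissions and the sample text are illustrative assumptions, not output captured from a real package.

```python
def parse_permissions(aapt_output):
    """Mimic `sed 1d | awk '{ print $NF }'` on the output of `aapt dump permissions`."""
    permissions = []
    # sed 1d: skip the first line (the "package: ..." header).
    for line in aapt_output.splitlines()[1:]:
        fields = line.split()
        # awk would print an empty line for blank input; here blank lines are skipped.
        if fields:
            # awk '{ print $NF }': keep the last whitespace-separated field.
            permissions.append(fields[-1])
    return permissions


# Hypothetical sample output, for illustration only:
sample = """package: com.example.app
uses-permission: android.permission.INTERNET
uses-permission: android.permission.CAMERA"""
print(parse_permissions(sample))
```

To use it, drop the sed/awk part from the shell command and pass the raw aapt output to parse_permissions.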
In a similar fashion, you can extract other information from the AndroidManifest.xml (e.g. the package name, version name and app label) with the 'badging' sub-command:
```python
import subprocess

def findBetween(s, prefix, suffix):
    # Return the substring of s between prefix and the next suffix ("" if absent).
    try:
        start = s.index(prefix) + len(prefix)
        end = s.index(suffix, start)
        return s[start:end]
    except ValueError:
        return ""

# Extract the APK package info (filepath is the .apk path from the previous script).
# Passing a list avoids the shell, so no quoting is needed.
result = subprocess.run(["aapt", "dump", "badging", filepath],
                        stdout=subprocess.PIPE, text=True)
apkInfo = result.stdout.splitlines()
for info in apkInfo:
    # Package info:
    if info.startswith("package:"):
        print("App Package: " + findBetween(info, "name='", "'"))
        print("App Version: " + findBetween(info, "versionName='", "'"))
        continue
    # App name:
    if info.startswith("application:"):
        print("App Name: " + findBetween(info, "label='", "'"))
```
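Rather than calling findBetween once per attribute, you can collect every key='value' pair on a badging line into a dictionary with a regular expression. This is a sketch that assumes attributes look like name='value' (the same assumption the findBetween calls make) and that values never contain a single quote; the helper parse_badging_line and the sample line are hypothetical.

```python
import re

# Matches key='value' pairs such as name='com.example.app'.
# Assumption: values never contain a single quote.
_PAIR = re.compile(r"([\w-]+)='([^']*)'")

def parse_badging_line(line):
    """Split an aapt badging line into its tag and a dict of its key='value' attributes."""
    tag, _, rest = line.partition(":")
    return tag, dict(_PAIR.findall(rest))


# Hypothetical badging line, for illustration only:
line = "package: name='com.example.app' versionCode='3' versionName='1.2'"
tag, attrs = parse_badging_line(line)
print(tag, attrs["name"], attrs["versionName"])
```

Lines of the form tag:'value' (with no key, e.g. sdkVersion:'21') produce an empty dict with this pattern, so handle those separately if you need them.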
If you instead want to parse the entire AndroidManifest XML tree, use the 'xmltree' sub-command in a similar way:
$ aapt dump xmltree package.apk AndroidManifest.xml
Using Python as before:
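```python
import subprocess

# Extract the AndroidManifest XML tree (filepath is the .apk path from above):
result = subprocess.run(["aapt", "dump", "xmltree", filepath, "AndroidManifest.xml"],
                        stdout=subprocess.PIPE, text=True)
xmlTree = result.stdout
# aapt prints each element as "E: <tag> (line=N)", so match element lines rather than
# every occurrence of the bare word (which would also hit "activity-alias", class names, etc.):
print("Number of Activities: " + str(xmlTree.count("E: activity ")))
print("Number of Services: " + str(xmlTree.count("E: service ")))
print("Number of BroadcastReceivers: " + str(xmlTree.count("E: receiver ")))
```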
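Counting substrings is quick but coarse. A slightly more robust sketch tallies every element in the xmltree dump by tag name with collections.Counter. It assumes aapt prints each element on its own line as E: <tag> (line=N), which is how the dump is usually formatted; count_elements and the sample dump are hypothetical.

```python
import re
from collections import Counter

# Assumption: each element line looks like "    E: activity (line=12)".
_ELEMENT = re.compile(r"^[ \t]*E: ([\w.:-]+) \(line=\d+\)", re.MULTILINE)

def count_elements(xml_tree):
    """Count the elements in `aapt dump xmltree` output, keyed by tag name."""
    return Counter(_ELEMENT.findall(xml_tree))


# Hypothetical excerpt of an xmltree dump, for illustration only:
sample = """N: android=http://schemas.android.com/apk/res/android
  E: manifest (line=2)
    E: application (line=5)
      E: activity (line=6)
        E: intent-filter (line=7)
      E: activity-alias (line=10)
      E: service (line=12)
      E: receiver (line=13)
      E: activity (line=14)"""
counts = count_elements(sample)
print(counts["activity"], counts["service"], counts["receiver"])
```

Because the tag is matched as a whole token, activity-alias is counted separately from activity, and any other tag (provider, uses-permission, ...) is available from the same Counter.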