Parsing a Wikipedia dump
For example using this Wikipedia dump: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=lebron%20james&rvprop=content&redirects=true&format=xmlfm Is there an existing library for Python that I can use to create an array with the mapping of subjects and values? For example: {height_ft,6},{nationality, American} It looks like you really want to be able to parse MediaWiki markup. There is a python library designed for this purpose called mwlib . You can use python's built-in XML packages to extract the page content from the API's response, then pass that content into mwlib's