How do I grab just the parsed infobox of a Wikipedia article?

萌比男神i 2020-12-16 04:05

I'm still stuck on my problem of trying to parse articles from Wikipedia. Actually, I wish to parse the infobox section of Wikipedia articles, i.e. my application has re

8 Answers
  • 2020-12-16 04:25

    There are a number of semantic data providers from which you can extract structured data instead of trying to parse it manually:

    • DBpedia - as already mentioned, provides a SPARQL endpoint which can be used for data queries. There are a number of libraries available for multiple platforms, including PHP.

    • Freebase - another Creative Commons data provider. The initial dataset is based on parsed Wikipedia data, but some information is taken from other sources. The dataset can be edited by anyone and, in contrast to Wikipedia, you can add your own data into your own namespace using a custom-defined schema. It uses its own query language called MQL, which is based on JSON. Data has WebID links back to the corresponding Wikipedia articles. Freebase also provides a number of downloadable data dumps and has a number of client libraries, including PHP.

    • GeoNames - a database of geographical locations. It has an API which provides country and region information for given coordinates, as well as nearby locations (e.g. city, railway station, etc.)

    • OpenStreetMap - a community-built map of the world. It has an API that allows querying for objects by location and type.

    • Wikimapia API - another location service
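
As an illustration of the DBpedia option above, here is a minimal Python sketch of querying the public SPARQL endpoint for all properties of one resource. The endpoint URL `https://dbpedia.org/sparql`, the `format` parameter value, and the helper names (`build_query`, `build_url`, `parse_bindings`, `fetch_properties`) are assumptions made for this example and are not taken from the answer; the response is assumed to follow the standard SPARQL JSON results layout.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; availability and rate limits are not guaranteed.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_query(resource: str) -> str:
    """Return a SPARQL query listing property/value pairs of one DBpedia resource."""
    return (
        "SELECT ?property ?value WHERE { "
        f"<http://dbpedia.org/resource/{resource}> ?property ?value "
        "} LIMIT 100"
    )


def build_url(resource: str) -> str:
    """Encode the query as a GET request URL asking for JSON results."""
    params = urllib.parse.urlencode(
        {
            "query": build_query(resource),
            "format": "application/sparql-results+json",
        }
    )
    return f"{DBPEDIA_ENDPOINT}?{params}"


def parse_bindings(payload: dict) -> list[tuple[str, str]]:
    """Flatten a SPARQL JSON result into (property, value) string pairs."""
    return [
        (row["property"]["value"], row["value"]["value"])
        for row in payload["results"]["bindings"]
    ]


def fetch_properties(resource: str) -> list[tuple[str, str]]:
    """Run the query over the network and return the parsed pairs."""
    with urllib.request.urlopen(build_url(resource), timeout=30) as response:
        return parse_bindings(json.load(response))


if __name__ == "__main__":
    # Example: print the first few properties of the resource for Germany.
    for prop, value in fetch_properties("Germany")[:10]:
        print(prop, "->", value)
```

The parsing step is kept separate from the network call so it can be checked against a saved response without hitting the endpoint.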

  • 2020-12-16 04:25

    To update this a bit: a lot of the data in Wikipedia infoboxes is now taken from Wikidata, which is a free database of structured information. See the data page for Germany, for example, and https://www.wikidata.org/wiki/Wikidata:Data_access for information on how to access the data programmatically.
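
One way to read such data is sketched below in Python, under stated assumptions: it fetches an item's JSON through the `Special:EntityData` URL and reads the values of one property from the `claims` section. The item ID `Q183` (Germany) and property `P36` (capital) are used only as an example, and the helper names (`entity_url`, `claim_values`, `fetch_entity`) and the `User-Agent` string are hypothetical, not part of any library.

```python
import json
import urllib.request


def entity_url(item_id: str) -> str:
    """Build the JSON export URL for one Wikidata item, e.g. Q183."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{item_id}.json"


def claim_values(payload: dict, item_id: str, property_id: str) -> list:
    """Return the main values of every statement for one property.

    Item-valued statements are reduced to the target item ID (e.g. "Q64");
    other value types are returned as stored in the JSON.
    """
    statements = payload["entities"][item_id]["claims"].get(property_id, [])
    values = []
    for statement in statements:
        snak = statement["mainsnak"]
        if snak.get("snaktype") != "value":
            continue  # skip "no value" / "unknown value" statements
        value = snak["datavalue"]["value"]
        if isinstance(value, dict) and "id" in value:
            values.append(value["id"])
        else:
            values.append(value)
    return values


def fetch_entity(item_id: str) -> dict:
    """Download the item JSON; the User-Agent string here is a placeholder."""
    request = urllib.request.Request(
        entity_url(item_id),
        headers={"User-Agent": "infobox-example/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    data = fetch_entity("Q183")
    print(data["entities"]["Q183"]["labels"]["en"]["value"])
    print(claim_values(data, "Q183", "P36"))
```

The values returned for item-valued properties are themselves item IDs, so a second lookup would be needed to turn them into human-readable labels.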
