How to build the Wikipedia category hierarchy?

孤者浪人 submitted on 2019-12-05 06:09:47
kane

Yes, it turns out this Stack Overflow answer was right. It referenced the right datasets, but I was too dense to understand how to relate them to each other.

Thanks to @svick for leading me through the individual steps in a private chat.

For the benefit of others, I've written up the relationships between the datasets and the exact steps to traverse the graph in a blog post, which summarizes our private chat.

Parsing Wikipedia Page Hierarchy
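
The way the dump tables relate can be sketched roughly as follows. This is an illustration only, not the code from the blog post: the sample rows and the helper name `build_subcategory_map` are made up. It assumes the usual MediaWiki layout, where `categorylinks.cl_from` holds the `page_id` of the member page, `cl_to` holds the title of the parent category, `cl_type` is `'subcat'` for subcategory links, and category pages live in namespace 14.

```python
from collections import defaultdict

# Hypothetical sample rows; real data would come from the page and
# categorylinks dumps.
# page: (page_id, page_namespace, page_title); namespace 14 = category pages.
pages = [
    (1, 14, "Science"),
    (2, 14, "Physics"),
    (3, 14, "Chemistry"),
    (4, 0, "Isaac_Newton"),
]
# categorylinks: (cl_from, cl_to, cl_type)
# cl_from is the member's page_id, cl_to is the parent category's title.
categorylinks = [
    (2, "Science", "subcat"),
    (3, "Science", "subcat"),
    (4, "Physics", "page"),  # an article, not a subcategory
]


def build_subcategory_map(pages, categorylinks):
    """Map each parent category title to the titles of its direct subcategories."""
    # Resolve page_id -> title, keeping category pages only.
    title_by_id = {pid: title for pid, ns, title in pages if ns == 14}
    children = defaultdict(list)
    for cl_from, cl_to, cl_type in categorylinks:
        if cl_type == "subcat" and cl_from in title_by_id:
            children[cl_to].append(title_by_id[cl_from])
    return dict(children)


subcats = build_subcategory_map(pages, categorylinks)
print(subcats)
```

Each `'subcat'` row is one edge of the category graph, so the resulting map is an adjacency list that can be traversed from any root category.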

I ran into the same problem with the Japanese Wikipedia.

I solved this problem as follows:

  • Download the SQL dumps for the category, categorylinks and page tables and import them into a MySQL (or MariaDB) server.
  • Run the following query to get the direct subcategories of '学問'.
    MariaDB [wikipedia]> select page.page_title
        -> from categorylinks
        -> join page on page.page_id = categorylinks.cl_from
        -> join category on categorylinks.cl_to = category.cat_title
        -> where categorylinks.cl_type = 'subcat'
        -> and category.cat_title = '学問';
+-----------------------------------+
| page_title                        |
+-----------------------------------+
| 学問の分野                        |
| 科学                              |
| 学問スタブ                        |
| 架空の思想・学問                  |
| 学者                              |
| 学術出版                          |
| 学術称号                          |
| 学術団体                          |
| 学生                              |
| 学派                              |
| 学問の賞                          |
| 研究                              |
| 高等教育                          |
| 知識                              |
| 問題                              |
| ルネサンス・ユマニスム            |
+-----------------------------------+
16 rows in set (0.00 sec)
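
The query above returns only one level. To get the whole hierarchy, the same lookup can be repeated for each subcategory it returns. Below is a minimal sketch of that idea, under several assumptions: it uses an in-memory SQLite database with a simplified stand-in schema and toy rows instead of the real dump, the function name `walk_subcategories` and the `max_depth` parameter are hypothetical, and it skips the join on `category` because `cl_to` already holds the parent title. Category links can form cycles, so the sketch tracks visited titles. With a MySQL/MariaDB driver the placeholder syntax would differ.

```python
import sqlite3

# Simplified stand-in for the imported dump tables (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table page (page_id integer primary key, page_namespace integer, page_title text);
    create table categorylinks (cl_from integer, cl_to text, cl_type text);
""")
# Toy data, not real Wikipedia content. The last link closes a cycle on purpose.
conn.executemany("insert into page values (?, ?, ?)", [
    (1, 14, "Root"), (2, 14, "A"), (3, 14, "B"), (4, 14, "A1"), (5, 0, "Some_article"),
])
conn.executemany("insert into categorylinks values (?, ?, ?)", [
    (2, "Root", "subcat"), (3, "Root", "subcat"), (4, "A", "subcat"),
    (5, "A", "page"), (1, "A1", "subcat"),
])

# Same lookup as the one-level query, parameterized on the parent title.
SUBCATS_SQL = """
    select page.page_title
    from categorylinks
    join page on page.page_id = categorylinks.cl_from
    where categorylinks.cl_type = 'subcat' and categorylinks.cl_to = ?
    order by page.page_title
"""


def walk_subcategories(conn, root, max_depth=3):
    """Breadth-first walk below `root`; returns a list of (depth, title) pairs."""
    seen = {root}          # guards against cycles in the category graph
    frontier = [root]
    found = []
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for parent in frontier:
            for (title,) in conn.execute(SUBCATS_SQL, (parent,)):
                if title not in seen:
                    seen.add(title)
                    found.append((depth, title))
                    next_frontier.append(title)
        frontier = next_frontier
        if not frontier:
            break
    return found


tree = walk_subcategories(conn, "Root")
for depth, title in tree:
    print("  " * depth + title)
```

Limiting the depth keeps the walk manageable, since a broad root category can reach a very large part of the graph after a few levels.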