Question
I looked at the Heritrix documentation website (http://crawler.archive.org/articles/developer_manual/arcs.html), which lists a Python .ARC file reader. However, the link returns a 404 Not Found when I click on it.
Does anyone know of a Heritrix ARC reader written in Python?
(I asked this question before, but closed it because it was inaccurate.)
Answer 1:
Nothing a little Googling can't find: http://archive-access.cvs.sourceforge.net/viewvc/archive-access/archive-access/projects/hedaern/
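If that link also goes stale, a small reader is not hard to write by hand. Below is a minimal sketch, not the hedaern code, and it assumes the ARC version 1 layout: each record starts with a header line of five space-separated fields (URL, IP address, archive date, content type, length), followed by exactly that many bytes of content and a trailing newline. The names `ArcRecord`, `iter_arc_records` and `open_arc` are made up for this example, and it assumes none of the header fields contain spaces.

```python
import gzip
import io
from collections import namedtuple

# Hypothetical record type for this sketch; fields follow the ARC v1 header line.
ArcRecord = namedtuple("ArcRecord", "url ip date content_type length body")


def iter_arc_records(stream):
    """Yield ArcRecord tuples from a binary stream of uncompressed ARC v1 data."""
    while True:
        header = stream.readline()
        if not header:
            return  # end of stream
        header = header.strip()
        if not header:
            continue  # blank separator line between records
        # Assumption: five space-separated fields, none containing spaces.
        url, ip, date, content_type, length = (
            header.decode("utf-8", "replace").rsplit(" ", 4)
        )
        length = int(length)
        body = stream.read(length)  # raw content, e.g. the HTTP response
        yield ArcRecord(url, ip, date, content_type, length, body)


def open_arc(path):
    """Open an .arc or .arc.gz file for binary reading."""
    # gzip.open reads through concatenated gzip members, which is how
    # compressed ARC files are normally written (one member per record).
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")
```

Usage would look something like `for rec in iter_arc_records(open_arc("crawl.arc.gz")): print(rec.url, rec.length)`, where `crawl.arc.gz` is a placeholder file name. The first record in a file is normally the `filedesc://` record describing the archive itself, so you may want to skip it.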
Source: https://stackoverflow.com/questions/1575442/how-to-read-arc-files-from-the-heritrix-crawler-using-python