C# web and FTP crawler library
Question

I need a library (preferably in C#) that works as a web crawler and can access files over both HTTP and FTP. For now, reading HTML is enough, but I want to extend it to PDF, Word, and other formats later. An open-source starter project would be ideal; failing that, any pointers to documentation would help.

Answer 1:

Check out the NCrawler project. It is a simple and very efficient multithreaded web crawler with pipeline-based processing, written in C#. It includes HTML, text, PDF, and IFilter document processors, as well as language detection (Google).