Simple web crawler in C#

囚心锁ツ 2020-12-04 18:55

I have created a simple web crawler, but I want to add recursion so that, for every page that is opened, I can get the URLs in that page. I have no idea how I can do this.

4 Answers
  •  粉色の甜心
    2020-12-04 19:20

    I fixed your GetContent method as follows to get the new links from a crawled page:

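    public ISet<string> GetNewLinks(string content)
    {
        // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
        // NOTE: the original pattern was cut off after "(?<=" when this page was extracted.
        // The pattern below is an assumed reconstruction: it matches the quoted value
        // of the href attribute inside <a ...> tags.
        Regex regexLink = new Regex("(?<=<a\\s[^>]*?href\\s*=\\s*[\"'])[^\"']*(?=[\"'])");

        ISet<string> newLinks = new HashSet<string>();
        foreach (Match match in regexLink.Matches(content))
        {
            // HashSet.Add ignores duplicates, so no Contains check is needed.
            newLinks.Add(match.Value);
        }

        return newLinks;
    }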
    

    Updated

    Fixed: regex should be regexLink. Thanks @shashlearner for pointing this out (my mistype).
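
To connect this back to the recursion asked about in the question, here is a minimal sketch of how a link extractor like GetNewLinks could drive a crawl. It is not the asker's or the answerer's actual code: the class name CrawlerSketch, the Crawl method, the fetch delegate, and the maxPages limit are all hypothetical, and the link pattern is an assumed one. It uses a queue instead of direct recursion so that deep link chains cannot overflow the stack, and a visited set so that no page is fetched twice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CrawlerSketch
{
    // Assumed pattern: captures the quoted href value of <a ...> tags in group 1.
    private static readonly Regex LinkRegex = new Regex(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']",
        RegexOptions.IgnoreCase);

    public static ISet<string> GetNewLinks(string content)
    {
        ISet<string> newLinks = new HashSet<string>();
        foreach (Match match in LinkRegex.Matches(content))
            newLinks.Add(match.Groups[1].Value);
        return newLinks;
    }

    // Breadth-first crawl starting at an absolute URL.
    // fetch is a hypothetical delegate returning the page HTML, or null on failure.
    // Returns the set of URLs that were visited (at most maxPages).
    public static ISet<string> Crawl(string startUrl, Func<string, string> fetch, int maxPages)
    {
        var visited = new HashSet<string>();
        var queue = new Queue<string>();
        queue.Enqueue(startUrl);

        while (queue.Count > 0 && visited.Count < maxPages)
        {
            string url = queue.Dequeue();
            if (!visited.Add(url))
                continue; // already crawled

            string content = fetch(url);
            if (content == null)
                continue; // page could not be downloaded

            foreach (string link in GetNewLinks(content))
            {
                // Resolve relative links against the current page and keep only http(s).
                if (Uri.TryCreate(new Uri(url), link, out Uri absolute)
                    && (absolute.Scheme == Uri.UriSchemeHttp
                        || absolute.Scheme == Uri.UriSchemeHttps))
                {
                    // Drop the #fragment so the same page is not queued twice.
                    string next = absolute.GetLeftPart(UriPartial.Query);
                    if (!visited.Contains(next))
                        queue.Enqueue(next);
                }
            }
        }

        return visited;
    }
}
```

Passing the downloader in as a delegate keeps the sketch independent of how pages are fetched. With WebClient, for example, a call might look like `CrawlerSketch.Crawl(startUrl, url => new WebClient().DownloadString(url), 100)`, ideally with the download wrapped in a try/catch that returns null on errors.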
