Where's the bug in this tree traversal code?

风流意气都作罢 提交于 2019-12-05 19:05:37

First, note that everything goes according to plan as long as we're iterating over elements that don't have any sibling. As soon as we hit element <c>, things start going haywire. It's also interesting that <c>, <d> and <e> all think they contain <f>'s children.

Let's take a closer look at your call to SelectMany():

_context.SelectMany(n => n.ChildNodes)

That iterates through the items in _context and accumulates the child nodes of each item. Let's have a look at _context. Everything should be okay, since it's a List of length 1. Or is it?

I suspect your SharpQuery(string) constructor stores sibling elements inside the same list. In that case, _context may not be of length 1 anymore and, remember, SelectMany() accumulates the child nodes of each item in the list.

For example, if _context is a list containing <c>, <d>, <e> and <f>, only <f> has children, and SelectMany() is called once for each element, it will accumulate and return the children of <f> four times.

I think that's your bug.

EDIT: Since you've posted the full class, I don't have to guess anymore. Look at the sequence of operations when you iterate over <b> (iterators replaced by lists for better comprehension):

  1. Call Children() on <b>, which returns <c>, <d>, <e> and <f>,
  2. Call Traverse() on the resulting list:
    1. Call Children() on <c>, but _context actually contains <c>, <d>, <e> and <f>, not only <c>, so that returns <g>, <h> and <i>,
    2. Call Children() on <d>, same thing since _context is the same for both <c> and <d> (and <e>, and <f>),
    3. Lather, rinse, repeat.

Children() is broken, it selects children of siblings as well. I'd rewrite to something like this:

public IEnumerable<HtmlNode> Traverse(IEnumerable<HtmlNode> nodes)
{
    foreach (var node in nodes)
    {
        yield return node;
        foreach (var child in Traverse(ChildNodes(node)))
            yield return child;
    }
}

private IEnumerable<HtmlNode> ChildNodes(HtmlNode node)
{
    return node.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Element);
}


public SharpQuery(IEnumerable<HtmlNode> nodes, SharpQuery previous = null)
{
    if (nodes == null) throw new ArgumentNullException("nodes");
    _previous = previous; // Is this necessary?
    _context = new List<HtmlNode>(nodes);
}

Shouldn't this be enough?

public IEnumerable<HtmlNode> Traverse()
{
    foreach (var node in _context)
    {
        Console.WriteLine(node.Name);
        yield return node;
        foreach (var child in Children().Traverse())
            {} //yield return child;
    }
}

I can't tell exactly, but you can see the pattern is that once your stuff encounter the "/" of the "c", it start to consider the "g,h,i" to be part of every subsequent letter until it encounter the "/" of the "f".

Most likely you have an extra loop that you should get ride of.

Or for some reason, it doesnt compute correctly the "/>" parts.

I would do something like this and see if it fixes the issue:

public IEnumerable<HtmlNode> Traverse()
{
foreach (var node in _context)
{
    Console.WriteLine(node.Name);
    if(!node.HasChildren) {
     yield return node;
    }
    else {
    foreach (var child in Children().Traverse())
        yield return child;
    }
    }
}
}
mpen
public IEnumerable<HtmlNode> All()
{
    var queue = new Queue<HtmlNode>(_context);

    while (queue.Count > 0)
    {
        var node = queue.Dequeue();
        yield return node;
        foreach (var next in node.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Element))
            queue.Enqueue(next);
    }
}

courtesy link

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!