I'm using this code to change the href attribute of the links in an HTML stream.
First I download the full HTML page using this code (URL is the web page address):
HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse myHttpWebResponse =
(HttpWebResponse)myHttpWebRequest.GetResponse();
Stream s = myHttpWebResponse.GetResponseStream();
Then I process it:
HtmlDocument doc = new HtmlDocument();
doc.Load(s);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("/a"))
{
string att = link.Attributes["href"].Value;
link.Attributes["href"].Value = "http://ahmadalli.somee.com/default.aspx?url=" + att;
}
doc.Save(s);
s is the HTML stream.
But I get an exception that says doc.DocumentNode is null!
I tried many sites, but doc.DocumentNode is null for all of them too.
This works for me.
using(WebClient client = new WebClient())
{
client.Encoding = System.Text.Encoding.UTF8;
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(client.DownloadString("http://www.google.com?q=stackoverflow"));
foreach (var href in doc.DocumentNode.Descendants("a").Select(x => x.Attributes["href"]))
{
if (href == null) continue;
href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
}
StringWriter writer = new StringWriter();
doc.Save(writer);
var finalHtml = writer.ToString();
}
Also note the HttpUtility.UrlEncode call: it lets you get the original URL back correctly. Otherwise, characters such as & and ? in the original URL may be misread as part of the wrapping URL's own query string.
Use HttpUtility.UrlDecode to decode it.
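To illustrate the encode/decode round trip described above, here is a minimal sketch. The class name UrlRoundTrip, the helpers Wrap and Unwrap, and the example.com URL are made up for illustration; only HttpUtility.UrlEncode and HttpUtility.UrlDecode come from the answer. On .NET Framework this assumes a reference to System.Web.

```csharp
using System;
using System.Web;

class UrlRoundTrip
{
    // Prefix taken from the code in this thread.
    const string Prefix = "http://ahmadalli.somee.com/default.aspx?url=";

    // Hypothetical helper: wrap a URL so it can travel safely as one query value.
    public static string Wrap(string original)
    {
        return Prefix + HttpUtility.UrlEncode(original);
    }

    // Hypothetical helper: recover the original URL from a wrapped one.
    public static string Unwrap(string wrapped)
    {
        return HttpUtility.UrlDecode(wrapped.Substring(Prefix.Length));
    }

    static void Main()
    {
        // The & and ? here would otherwise be read as part of the outer query string.
        string original = "http://example.com/search?q=html agility&page=2";
        string wrapped = Wrap(original);

        Console.WriteLine(wrapped);
        Console.WriteLine(Unwrap(wrapped) == original);
    }
}
```

If the target page reads the value through Request.QueryString["url"], ASP.NET normally decodes it already, so an explicit UrlDecode is only needed when you parse the raw URL yourself.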
Try using //a instead of /a.
In XPath, //a means "give me every a element anywhere in the document", whereas /a means "give me only the a elements that sit directly at the document root".
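The difference between the two expressions can be seen in a small sketch. It uses System.Xml instead of HtmlAgilityPack purely so that it needs no external package; the class name XPathDemo, the Count helper, and the sample markup are assumptions for illustration. One difference to keep in mind: XmlDocument.SelectNodes returns an empty list when nothing matches, while HtmlAgilityPack's SelectNodes returns null.

```csharp
using System;
using System.Xml;

class XPathDemo
{
    // Hypothetical helper: count the nodes an XPath expression selects
    // in a small, well-formed sample document.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<html><body><a href=\"one\">1</a><p><a href=\"two\">2</a></p></body></html>");
        return doc.SelectNodes(xpath).Count;
    }

    static void Main()
    {
        Console.WriteLine(Count("/a"));           // 0: the root element is html, not a
        Console.WriteLine(Count("//a"));          // 2: every a element at any depth
        Console.WriteLine(Count("/html/body/a")); // 1: only the a directly under body
    }
}
```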
Update:
The following code works fine:
var myHttpWebRequest = (HttpWebRequest)WebRequest.Create("http://google.com");
var myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
var s = myHttpWebResponse.GetResponseStream();
var doc = new HtmlDocument();
doc.Load(s);
foreach (var link in doc.DocumentNode.SelectNodes("//a"))
{
var att = link.Attributes["href"].Value;
link.Attributes["href"].Value = "http://ahmadalli.somee.com/default.aspx?url=" + att;
Console.WriteLine(link.Attributes["href"].Value);
}
Here is your answer: HTML Agility Pack Null Reference.
Try using the below code:
HtmlDocument htmlDoc = new HtmlDocument
{
    OptionAddDebuggingAttributes = false,
    OptionAutoCloseOnEnd = true,
    OptionFixNestedTags = true,
    OptionReadEncoding = true
};
try
{
    // The response stream cannot seek, so read it once from its current position.
    using (Stream reader = myHttpWebResponse.GetResponseStream())
    {
        htmlDoc.Load(reader, true);
    }
    HtmlNode node = htmlDoc.DocumentNode;
    if (node != null)
    {
        foreach (var href in node.Descendants("a").Select(x => x.Attributes["href"]))
        {
            if (href == null) continue; // anchor without an href attribute
            href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
        }
    }
}
catch (Exception ex)
{
    // Report the failure instead of silently swallowing it.
    Console.WriteLine(ex.Message);
}
I am using HtmlAgility pack version: 1.4.0
Solved your problem? If no, please comment. Else mark as answer.
The anchor tag reference uses the wrong XPath expression:
...doc.DocumentNode.SelectNodes("/a") //incorrect
...doc.DocumentNode.SelectNodes("//a") //correct
...doc.DocumentNode.SelectNodes(@"//a") //also correct (same XPath as a verbatim string)
The original code selects no nodes, so SelectNodes returns null and the foreach throws. Check for null to avoid failing on, say, a document with no links at all (however unlikely that is :)
var anchors = doc.DocumentNode.SelectNodes("//a");
if (anchors != null)
{
foreach (HtmlNode link in anchors)
{
/*do stuff*/
}
}
Source: https://stackoverflow.com/questions/9139156/why-html-agility-pack-htmldocument-documentnode-is-null