Question
I wanted to use the HTMLDocument object from the mshtml library. I was trying to assign HTML to the document:
var doc = new mshtml.HTMLDocument();
var html = File.ReadAllText(@"path_to_html_file");
doc.body.innerHTML = html; // <-- this line throws error
However, I get an error on the third line:
System.NullReferenceException: 'Object reference not set to an instance of an object.'
mshtml.DispHTMLDocument.body.get returned null.
I also tried creating the document dynamically, but that didn't work either:
dynamic doc = Activator.CreateInstance(Type.GetTypeFromProgID("htmlfile"));
In this case I get the following error:
Microsoft.CSharp.RuntimeBinder.RuntimeBinderException:
'Cannot perform runtime binding on a null reference'
Is there a way to overcome this problem? Thanks!
UPDATE: VBA code
Sub GetData()
Dim doc As MSHTML.HTMLDocument
Dim fso As FileSystemObject, txt As TextStream
Set doc = New MSHTML.HTMLDocument
Set fso = New FileSystemObject
Set txt = fso.OpenTextFile("path_to_html_file")
doc.body.innerHTML = txt.ReadAll() '// <-- No error here
txt.Close
End Sub
Answer 1:
You could cast the mshtml.HTMLDocument to the IHTMLDocument2 interface, so that the document's main properties and methods are available:
var doc = (IHTMLDocument2)new mshtml.HTMLDocument();
Or create an HTMLDocumentClass instance using Activator.CreateInstance() with the type's GUID, then cast it to the IHTMLDocument2 interface.
IHTMLDocument2 doc =
(IHTMLDocument2)Activator.CreateInstance(
Type.GetTypeFromCLSID(new Guid("25336920-03F9-11CF-8FD0-00AA00686F13")));
It's more or less the same thing. I'd prefer the first one, mainly for this reason.
Then you can write whatever you want to the HTMLDocument. For example:
doc.write(File.ReadAllText(@"[Some Html Page]"));
Console.WriteLine(doc.body.innerText);
To create an HTMLDocument, a skeleton HTML page is enough, something like this:
string html = "<!DOCTYPE html><html><head></head><Body><p></body></html>";
doc.write(html);
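Note: before a document is created (that is, before anything has been written to it), all elements in the page will be null. This is why doc.body is null in the question's code.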
Afterwards, you can set body.innerHTML to something else:
doc.body.innerHTML = "<P>Some Text</P>";
Console.WriteLine(doc.body.innerText);
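Putting the steps above together for the question's case, a minimal sketch might look like the following. It assumes the project has a COM reference to the Microsoft HTML Object Library (mshtml), and it reuses the question's placeholder path as-is. The call to doc.close() after doc.write() is an addition of this sketch to end the write stream; the answer's snippets do not call it.

```csharp
using System;
using System.IO;
using mshtml; // assumption: COM reference to "Microsoft HTML Object Library"

class Program
{
    [STAThread]
    static void Main()
    {
        // Cast to IHTMLDocument2 to reach write(), close() and body.
        var doc = (IHTMLDocument2)new HTMLDocument();

        // Write a skeleton page first; before this, doc.body is null.
        doc.write("<!DOCTYPE html><html><head></head><body></body></html>");
        doc.close(); // assumption of this sketch: close the stream opened by write()

        // Placeholder path taken verbatim from the question.
        string html = File.ReadAllText(@"path_to_html_file");

        // The body element now exists, so assigning innerHTML is expected to work.
        doc.body.innerHTML = html;
        Console.WriteLine(doc.body.innerText);
    }
}
```

If the file already contains a complete HTML page, writing it directly with doc.write(), as in the earlier example, may be the simpler choice.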
Note that if you need to work with the HTML document more extensively, you'll have to cast to a higher-level interface: IHTMLDocument3 to IHTMLDocument8 (as of now), depending on the system version.
The classic getElementById, getElementsByName and getElementsByTagName methods are available in the IHTMLDocument3 interface.
For example, use getElementsByTagName() to retrieve the innerText of an HTML element using its tag name:
string innerText =
(doc as IHTMLDocument3).getElementsByTagName("body")
.OfType<IHTMLElement>().First().innerText;
Note:
If you can't find the IHTMLDocument6, IHTMLDocument7 and IHTMLDocument8 interfaces (and possibly other interfaces referenced in the MSDN Docs), then you probably have an old type library in the \Windows\Assembly\ GAC. Follow Hans Passant's advice to create a new Interop.mshtml library:
How to get mshtml.IHTMLDocument6 or mshtml.IHTMLDocument7?
Source: https://stackoverflow.com/questions/54388912/cant-assign-html-to-htmldocument-body