Question
I need to upload a file to a customer site automatically, and the site is protected by login credentials. Now I have a really big problem, because the login page (and probably the rest of the site) has malformed HTML. How can I handle these pages? It seems that CasperJS is not able to handle the malformed HTML.
Malformed HTML example (this is the site's page, cleaned up a bit but with the original problems, such as tr or td elements that are not closed):
<html>
<head>
<title>TEST Login Page</title>
</head>
<body>
<div>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<form name="loginForm" method="post" action="test.do">
<tr>
<input type="username" name="username" size="12" value=""></td>
<input type="password" name="password" size="12" value=""></td>
<input type="submit" value="Login" class="submit"></td>
</tr>
</form>
<tr>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
CLEANED HTML
<!DOCTYPE html>
<html>
<head>
<title>TEST Login Page</title>
</head>
<body>
<div>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<form name="loginForm" method="post" action="test.do" id="loginForm">
<input type="username" name="username" size="12" value="" />
<input type="password" name="password" size="12" value="" />
<input type="submit" value="Login" class="submit" />
</form>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
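CasperJS example:
var casper = require('casper').create();
// serverName is assumed to hold the URL of the login page
casper.start(serverName, function(){
    // Print the markup of the login form, if the selector matches
    this.echo(this.getHTML('form[name="loginForm"]'));
});
casper.run();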
With the malformed code, nothing is returned, but with the cleaned one everything works fine!
Is there a way to handle this problem?
Answer 1:
If the HTML is malformed then it is undefined how PhantomJS will parse and handle it. PhantomJS breaks the page completely (sample):
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<input type="username" name="username" size="12" value=""><input type="password" name="password" size="12" value=""><input type="submit" value="Login" class="submit"><table>
<tbody>
<form name="loginForm" method="post" action="test.do"></form>
<tr>
</tr>
<tr>
</tr>
</tbody>
</table>
It may still be salvageable by
- downloading the page in question with __utils__.sendAJAX,
- fixing it first with plain JavaScript and string operations (this is the tricky part), and
- then assigning this fixed string to casper.page.content.
This will essentially be an about:blank page with your markup, so you will need to start CasperJS with the --local-to-remote-url-access=true flag.
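The three steps could be wired together roughly like this. It is only a sketch under a few assumptions: repairLoginMarkup is a hypothetical helper written for the sample page above (it pulls the form and its inputs into a single table cell), the serverName value is a placeholder, and the regular expressions will not cope with arbitrary broken markup. The CasperJS part only runs when the script is executed by CasperJS (where the phantom global exists), so the helper can be tried on its own.

```javascript
// Hypothetical helper, tailored to the sample login page: moves the form
// into one table cell and keeps only its <input> tags, dropping the stray
// <tr> and </td> tags that confuse the parser.
function repairLoginMarkup(html) {
    var formPattern = /(<form\b[^>]*>)([\s\S]*?)<\/form>/i;
    return html.replace(formPattern, function (match, openTag, body) {
        // Collect every <input ...> tag found between <form> and </form>
        var inputs = body.match(/<input\b[^>]*>/gi) || [];
        return '<tr><td>' + openTag + inputs.join('') + '</form></td></tr>';
    });
}

// Only wire up CasperJS when running under PhantomJS/CasperJS
if (typeof phantom !== 'undefined') {
    var casper = require('casper').create();
    var serverName = 'http://example.com/login'; // placeholder URL (assumption)

    casper.start(serverName);

    casper.then(function () {
        // Step 1: fetch the raw page source with a synchronous AJAX request
        var raw = this.evaluate(function (url) {
            return __utils__.sendAJAX(url, 'GET', null, false);
        }, serverName);
        // Steps 2 and 3: repair the string and assign it to the page
        this.page.content = repairLoginMarkup(raw);
    });

    casper.then(function () {
        // The form selector should now match something
        this.echo(this.getHTML('form[name="loginForm"]'));
    });

    casper.run();
}
```

Saved as, say, login.js, it would be started with casperjs --local-to-remote-url-access=true login.js. The string repair is the fragile part and has to be adapted to whatever the real page actually contains.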
If you are not bound to PhantomJS, you may try SlimerJS (http://slimerjs.org/) as the engine for CasperJS. It uses the Gecko engine of the installed Firefox, which might handle broken HTML better. It can be run in headless mode through xvfb.
Source: https://stackoverflow.com/questions/25260155/casperjs-how-to-handle-malformed-html