问题
I don't care what the library is, but I need a way to extract <.script.> elements from the <.body.> of a page (as string). I then want to insert the extracted <.script.>s just before <./body.>.
Ideally, I'd like to extract the <.script.>s into 2 types;
1) External (those that have the src attribute)
2) Embedded (those with code between <.script.><./script.>)
So far I've tried with phpDOM, Simple HTML DOM and Ganon.
I've had no luck with any of them (I can find links and remove/print them - but fail with scripts every time!).
Alternative to
https://stackoverflow.com/questions/23414887/php-simple-html-dom-strip-scripts-and-append-to-bottom-of-body
(Sorry to repost, but it's been 24 Hours of trying and failing, using alternative libs, failing more etc.).
Based on the lovely RegEx answer from @alreadycoded.com, I managed to botch together the following;
$output = "<html><head></head><body><!-- Your stuff --></body></html>"
$content = '';
$js = '';
// 1) Grab <body>
preg_match_all('#(<body[^>]*>.*?<\/body>)#ims', $output, $body);
$content = implode('',$body[0]);
// 2) Find <script>s in <body>
preg_match_all('#<script(.*?)<\/script>#is', $content, $matches);
foreach ($matches[0] as $value) {
    $js .= '<!-- Moved from [body] --> '.$value;
}
// 3) Remove <script>s from <body>
$content2 = preg_replace('#<script(.*?)<\/script>#is', '<!-- Moved to [/body] -->', $content); 
// 4) Add <script>s to bottom of <body>
$content2 = preg_replace('#<body(.*?)</body>#is', '<body$1'.$js.'</body>', $content2);
// 5) Replace <body> with new <body>
$output = str_replace($content, $content2, $output);
Which does the job, and isn't that slow (fraction of a second)
Shame none of the DOM stuff was working (or I wasn't up to wading through naffed objects and manipulating).
回答1:
$js = "";
$content = file_get_contents("http://website.com");
preg_match_all('#<script(.*?)</script>#is', $content, $matches);
foreach ($matches[0] as $value) {
    $js .= $value;
}
$content = preg_replace('#<script(.*?)</script>#is', '', $content); 
echo $content = preg_replace('#<body(.*?)</body>#is', '<body$1'.$js.'</body>', $content);
回答2:
To select all script nodes with a src-attribute
$xpathWithSrc = '//script[@src]';
To select all script nodes with content:
$xpathWithBody = '//script[string-length(text()) > 1]';
Basic usage(Replace the query with your actual xpath-query):
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
foreach($xpath->query('//body//script[string-length(text()) > 1]') as $queryResult) {
    // access the element here. Documentation:
    // http://www.php.net/manual/de/class.domelement.php
}
回答3:
Try https://github.com/fabpot/goutte it's intuitive and easy to use.
回答4:
If you're really looking for an easy lib for this, I can recommend this one:
$dom = str_get_html($html);
$scripts = $dom->find('script')->remove;
$dom->find('body', 0)->after($scripts);
echo $dom;
There's really no easier way to do things like this in PHP.
来源:https://stackoverflow.com/questions/23429182/php-parse-html-extract-script-tags-from-body-and-inject-before-body