Extracting data from website with XSLT

Submitted by 落爺英雄遲暮 on 2019-12-11 12:56:16

Question


I'm trying to learn XSLT and I've run into a problem. I would like to extract some data from a website, transform it with XSLT templates, and finally show it in my own XHTML page.

Let's say I have an XML file (this will be my XHTML site):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<?xml-stylesheet type="text/xsl" href="myXSLTFile.xsl"?>


<!--here I want to have markup produced by xslt file-->

The question is: how can I achieve this? I want my XSLT file to work on nodes from a particular website (for example http://www.example.com) and write the result into my own XML file.

If you find my explanation confusing, please ask and I'll try to explain the problem better.

EDIT. I'll give an example. Let's say we have this page: http://www.w3.org/TR/xhtml1/. I want to write an XSLT document that extracts the titles of chapters and sections from the full table of contents and puts them into a table in my own XML file. What I'm having trouble with is how to reference the page http://www.w3.org/TR/xhtml1/ in my XSLT file so that it works on that page's nodes (the page is written in XHTML, so I don't have to worry about transforming HTML to XML).

EDIT2. After further research, it seems that Thomas W.'s answer is the solution to the problem, but you have to deal with the browser's cross-origin restrictions (tips in LarsH's answer).


Answer 1:


In theory, you can do something like

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="test.xsl"?>
<page href="http://www.w3.org/TR/xslt/index.htm"/>

and have a stylesheet like

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:h="http://www.w3.org/1999/xhtml">

  <xsl:template match="/">
    <html>
      <head></head>
      <body>
        <xsl:for-each select="document(*/@href)//h:h2">
          <xsl:copy-of select="."/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>

But this doesn't really work across browsers (it seems to work in Chrome only). One reason might be the browser's same-origin policy, a security feature that blocks loading a page from a foreign origin.




Answer 2:


There are a couple of ways to get around the cross-origin restrictions (see "AJAX and Cross-Site Scripting to Read the Header"):

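  • Add a local PHP or other server-side page that proxies requests to the other website.
  • Use CORS (this requires the other site to opt in by sending an Access-Control-Allow-Origin header).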
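
The proxy option in the first bullet could look roughly like the following. This is only a minimal sketch using Python's standard library instead of PHP; the upstream URL, the /proxy path, the port, and the names build_response and ProxyHandler are all assumptions made for illustration. It also assumes the upstream page is well-formed XML, since it is relabelled as application/xhtml+xml so that document() can parse it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Assumption: the only page this proxy is allowed to fetch.
# Hard-coding it avoids turning the server into an open proxy.
UPSTREAM = "http://www.w3.org/TR/xhtml1/"


def build_response(fetch=urlopen):
    """Fetch the upstream page and return (status, headers, body).

    `fetch` is injectable so the function can be exercised without
    network access; by default it is urllib's urlopen.
    """
    with fetch(UPSTREAM) as resp:
        body = resp.read()
    headers = {
        # Relabel as XML so an XSLT document() call can parse it.
        "Content-Type": "application/xhtml+xml",
        "Content-Length": str(len(body)),
    }
    return 200, headers, body


class ProxyHandler(BaseHTTPRequestHandler):
    """Serves the upstream page from the local origin at /proxy."""

    def do_GET(self):
        if self.path != "/proxy":
            self.send_error(404)
            return
        try:
            status, headers, body = build_response()
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses.
            self.send_error(502, "Upstream fetch failed")
            return
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it, one would start the server with HTTPServer(("127.0.0.1", 8000), ProxyHandler).serve_forever() and point the href attribute of the page element at /proxy instead of the remote URL. A real setup would also have to serve the XML document and the stylesheet from that same host and port, because otherwise the request to the proxy is itself cross-origin.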


Source: https://stackoverflow.com/questions/14202489/extracting-data-from-website-with-xslt
