IMPORTXML XPath_Query for Google Sheets

Posted by 瘦欲@ on 2020-01-13 17:07:41

Question


I'm using the Google Sheets IMPORTXML function to retrieve data for the calendar date one year earlier, or for the closest year-ago date for which data are available.

This is a sample of the data (full data source is here):

 <entry>
    <id>http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(6794)</id>
    <title type="text"></title>
    <updated>2018-02-06T22:05:38Z</updated>
    <author>
      <name />
    </author>
    <link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6794)" />
    <category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
    <content type="application/xml">
      <m:properties>
        <d:Id m:type="Edm.Int32">6794</d:Id>
        <d:NEW_DATE m:type="Edm.DateTime">2017-02-24T00:00:00</d:NEW_DATE>
        <d:BC_1MONTH m:type="Edm.Double">0.4</d:BC_1MONTH>
        <d:BC_3MONTH m:type="Edm.Double">0.52</d:BC_3MONTH>
        <d:BC_6MONTH m:type="Edm.Double">0.65</d:BC_6MONTH>
        <d:BC_1YEAR m:type="Edm.Double">0.8</d:BC_1YEAR>
        <d:BC_2YEAR m:type="Edm.Double">1.12</d:BC_2YEAR>
        <d:BC_3YEAR m:type="Edm.Double">1.38</d:BC_3YEAR>
        <d:BC_5YEAR m:type="Edm.Double">1.8</d:BC_5YEAR>
        <d:BC_7YEAR m:type="Edm.Double">2.12</d:BC_7YEAR>
        <d:BC_10YEAR m:type="Edm.Double">2.31</d:BC_10YEAR>
        <d:BC_20YEAR m:type="Edm.Double">2.69</d:BC_20YEAR>
        <d:BC_30YEAR m:type="Edm.Double">2.95</d:BC_30YEAR>
        <d:BC_30YEARDISPLAY m:type="Edm.Double">2.95</d:BC_30YEARDISPLAY>
      </m:properties>
    </content>
  </entry>
  <entry>
    <id>http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(6795)</id>
    <title type="text"></title>
    <updated>2018-02-06T22:05:38Z</updated>
    <author>
      <name />
    </author>
    <link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6795)" />
    <category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
    <content type="application/xml">
      <m:properties>
        <d:Id m:type="Edm.Int32">6795</d:Id>
        <d:NEW_DATE m:type="Edm.DateTime">2017-02-27T00:00:00</d:NEW_DATE>
        <d:BC_1MONTH m:type="Edm.Double">0.44</d:BC_1MONTH>
        <d:BC_3MONTH m:type="Edm.Double">0.5</d:BC_3MONTH>
        <d:BC_6MONTH m:type="Edm.Double">0.68</d:BC_6MONTH>
        <d:BC_1YEAR m:type="Edm.Double">0.81</d:BC_1YEAR>
        <d:BC_2YEAR m:type="Edm.Double">1.2</d:BC_2YEAR>
        <d:BC_3YEAR m:type="Edm.Double">1.46</d:BC_3YEAR>
        <d:BC_5YEAR m:type="Edm.Double">1.87</d:BC_5YEAR>
        <d:BC_7YEAR m:type="Edm.Double">2.18</d:BC_7YEAR>
        <d:BC_10YEAR m:type="Edm.Double">2.36</d:BC_10YEAR>
        <d:BC_20YEAR m:type="Edm.Double">2.72</d:BC_20YEAR>
        <d:BC_30YEAR m:type="Edm.Double">2.98</d:BC_30YEAR>
        <d:BC_30YEARDISPLAY m:type="Edm.Double">2.98</d:BC_30YEARDISPLAY>
      </m:properties>
    </content>
  </entry>
  <entry>

This is the XPath query I'm currently using to retrieve data for 2017 Feb 27:

//*[local-name() = 'NEW_DATE'][text() = '2017-02-27T00:00:00']/..

This is the result that displays:

6795    2017-02-27T00:00:00 0.44    0.5 0.68    0.81    1.2 1.46    1.87    2.18    2.36    2.72    2.98    2.98

Is there a way to:

  1. Retrieve all data displayed except the "d:Id" element ("6795" above) and
  2. Fall back to the next available forward date (e.g. 2017 Feb 27) when searching for a date that has no data (e.g. 2017 Feb 25, which returns a "#N/A" error because the query matches nothing)?

I'm avoiding using the IF function in order to make fewer IMPORTXML calls.


Answer 1:


How about this answer?

For your 1st question

Sample :

=TRANSPOSE(IMPORTXML(A1, "//*[local-name() = 'NEW_DATE'][text() = '2017-02-27T00:00:00']/../*[local-name()!='Id']"))
  • Cell "A1" contains the URL http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData?$filter=year(NEW_DATE)%20eq%202017.
  • Your query //*[local-name() = 'NEW_DATE'][text() = '2017-02-27T00:00:00']/.. selects the parent element. Appending /*[local-name()!='Id'] selects each of its children except <d:Id m:type="Edm.Int32">6795</d:Id>.
  • IMPORTXML outputs each matched node to its own row, so the result is transposed to put the values in a single row.
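
To see what the appended step selects, here is a minimal Python sketch of the same idea outside Google Sheets. It is not how IMPORTXML works internally. Python's xml.etree.ElementTree does not support local-name(), so the filter is written as ordinary Python instead of XPath. The SAMPLE feed is a trimmed copy of the question's data, and the xmlns declarations, the helper local_name and the function row_for_date are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: the sample in the question omits the xmlns
# declarations, so these are supplied here only to make the XML parseable.
NS = {
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}

# Trimmed copy of the two entries shown in the question.
SAMPLE = """\
<feed xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
  <entry><content type="application/xml"><m:properties>
    <d:Id m:type="Edm.Int32">6794</d:Id>
    <d:NEW_DATE m:type="Edm.DateTime">2017-02-24T00:00:00</d:NEW_DATE>
    <d:BC_1MONTH m:type="Edm.Double">0.4</d:BC_1MONTH>
    <d:BC_3MONTH m:type="Edm.Double">0.52</d:BC_3MONTH>
  </m:properties></content></entry>
  <entry><content type="application/xml"><m:properties>
    <d:Id m:type="Edm.Int32">6795</d:Id>
    <d:NEW_DATE m:type="Edm.DateTime">2017-02-27T00:00:00</d:NEW_DATE>
    <d:BC_1MONTH m:type="Edm.Double">0.44</d:BC_1MONTH>
    <d:BC_3MONTH m:type="Edm.Double">0.5</d:BC_3MONTH>
  </m:properties></content></entry>
</feed>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def row_for_date(xml_text, date_text):
    """Return the values next to the matching NEW_DATE, skipping Id."""
    root = ET.fromstring(xml_text)
    for props in root.iter("{%s}properties" % NS["m"]):
        date = props.find("d:NEW_DATE", NS)
        if date is not None and date.text == date_text:
            # Same effect as /../*[local-name()!='Id'] in the formula.
            return [c.text for c in props if local_name(c.tag) != "Id"]
    return []  # no entry for that date (Sheets shows #N/A in this case)


print(row_for_date(SAMPLE, "2017-02-27T00:00:00"))
# ['2017-02-27T00:00:00', '0.44', '0.5']
```

Calling row_for_date(SAMPLE, "2017-02-25T00:00:00") returns an empty list, which corresponds to the #N/A case for a date with no entry.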

For your 2nd question

When =TRANSPOSE(IMPORTXML(A1, "//*[local-name() = 'NEW_DATE'][text() = '2017-02-25T00:00:00']/../*[local-name()!='Id']")) is used, the result is #N/A, because the data contain no entry for that date.

If I misunderstand your question, I'm sorry.




Answer 2:


Because Google Sheets (GS) appears to support only XPath 1.0 (neither its documentation nor its product forum confirms which version is supported as of this writing), XPath 2.0 features such as if-then-else expressions cannot be used. Instead, the retrieved XML data can be filtered with native GS functions.

QUESTION 1

Per @Tanaike's proposed solution, all child nodes of an element except one can be retrieved with the XPath "!=" (not equal) operator, applied below to the children of the parent element of "NEW_DATE" so that the child element "Id" is left out. IMPORTXML returns each matched node on its own row, so TRANSPOSE is used to lay the values out across a single row. (A1 is the cell containing the source XML URL in the question.)

=TRANSPOSE(IMPORTXML(A1, "//*[local-name() = 'NEW_DATE'][text() = '2017-02-27T00:00:00']/../*[local-name()!='Id']"))

QUESTION 2

Finding the date one year ago, or the closest forward date when data for that date are missing, requires nested GS functions. An inner IMPORTXML call retrieves all "NEW_DATE" values, SORT puts them in descending order, and MATCH with a search type of -1 returns the position of the smallest value greater than or equal to the target, i.e. the closest available forward date. INDEX then reads that date from the same descending list, and the date is inserted into the XPath of the formula above. CONCATENATE, TEXT and TODAY are used only to build the year-ago date in a format compatible with the XML data. The formula is below.

=TRANSPOSE(IMPORTXML(A1,CONCATENATE("//*[local-name() = 'NEW_DATE'][text() = '",INDEX(SORT(IMPORTXML(A1, "//*[local-name() = 'NEW_DATE']"),1,FALSE),MATCH(CONCATENATE(TEXT(TODAY()-365,"YYYY-MM-DD"),"T00:00:00"),SORT(IMPORTXML(A1, "//*[local-name() = 'NEW_DATE']"),1,FALSE),-1)),"']/../*[local-name() != 'Id']")))
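
The date lookup inside this formula can be followed more easily outside the spreadsheet. The Python sketch below is an illustration under stated assumptions, not the GS implementation. The function match_descending imitates the documented behaviour of MATCH with a search type of -1 on a list sorted in descending order, and year_ago_key imitates the CONCATENATE/TEXT/TODAY part. Both names, and closest_forward_date, are hypothetical helpers, and the list of available dates contains only the two dates shown in the question.

```python
from datetime import date, timedelta


def year_ago_key(today):
    """Build the search key like TEXT(TODAY()-365,"YYYY-MM-DD") & "T00:00:00"."""
    return (today - timedelta(days=365)).strftime("%Y-%m-%d") + "T00:00:00"


def match_descending(search_key, descending_values):
    """Imitate MATCH(search_key, range, -1) on a descending list.

    Returns the 1-based position of the smallest value that is greater
    than or equal to search_key, or None where Sheets would show #N/A.
    """
    position = None
    for index, value in enumerate(descending_values, start=1):
        if value >= search_key:
            position = index  # later values are smaller, so keep going
        else:
            break
    return position


def closest_forward_date(search_key, available_dates):
    """Imitate INDEX(SORT(dates, 1, FALSE), MATCH(key, SORT(...), -1))."""
    descending = sorted(available_dates, reverse=True)
    position = match_descending(search_key, descending)
    return None if position is None else descending[position - 1]


# The two NEW_DATE values shown in the question. ISO 8601 strings sort
# chronologically, so plain string comparison is sufficient here.
available = ["2017-02-24T00:00:00", "2017-02-27T00:00:00"]

key = year_ago_key(date(2018, 2, 25))        # '2017-02-25T00:00:00'
print(closest_forward_date(key, available))  # 2017-02-27T00:00:00
```

A key later than every available date yields None, which is the case where the GS formula would still return #N/A.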



Source: https://stackoverflow.com/questions/48795106/importxml-xpath-query-for-google-sheets
