<div> tags inside <div> using importXML Xpath query, in Google Spreadsheet

為{幸葍}努か 提交于 2019-12-06 14:54:28

问题


I'm using Xpath in Google docs to get the text inside <div>. I want to save the text inside <div id="job_description"> in one cell of Google doc spreadsheet, but it shows each <div> in separate cell.

<div id="job_description">
    <div>
        <strong>
            Basic Purpose:
        </strong>
        <br></br>
    </div>
    <div>
        Work closely with developers, product owners and Q…
        <br></br>
    </div>
    <div>
        The Test Analyst is accountable for the developmen…
        <br></br>
    </div>
    <div>
        <strong>
            Duties and Responsibilities:
        </strong>
    </div>
    <ul>
        <li></li>
        <li></li>
    </ul>
    <div>
        <strong>
            Requirements:
        </strong>
        <br></br>
    </div>
    <ul>
        <li></li>
        <li></li>
    </ul>
</div>

Image: http://i.stack.imgur.com/K0mAY.png

and this is the code I wrote:

=IMPORTXML(E4,"//div[@id='job_description']")

May you help me to put all of the text (including <div> <ul> ...) inside the <div id="job_description"> in only one cell ?


回答1:


Using JOIN is a good start, but you can make it a single operation.

You did not show the URL to the page you're importing, so I can only give you an example with another page. For instance, if you are importing www.w3.org and looking for a div where @class='event closed expand_block', use

=JOIN(CHAR(10),IMPORTXML("http://www.w3.org/","//div[@class='event closed expand_block']//text()"))

Notice that I also modified the XPath expression: //text() makes sure only descendant text nodes are retrieved, that is, all the text.


EDIT: Responding to your comment:

May I know what is CHAR(10) referring to?

Yes, of course. CHAR returns a character and takes a number as input. In the case of CHAR(10), a newline character is returned (I assume because of &#10;).

In the formula, CHAR(10) is used as the first argument of JOIN, which is the delimiter of the objects that are to be joined.




回答2:


For now I found a solution , I'll put it here so that others can know my answer, but if there is any other solution please let us know

I used JOIN to put the separate cells (L3:X3) into one single cell

=Trim(JOIN(" ",L3:X3)) 



回答3:


you can also use regexreplace to remove the line breaks, with

=REGEXREPLACE(IMPORTXML(E4,"//div[@id='job_description']"),"\n","")

this should wrap it all into one cell for you.



来源:https://stackoverflow.com/questions/28265221/div-tags-inside-div-using-importxml-xpath-query-in-google-spreadsheet

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!