tags inside using importXML Xpath query, in Google Spreadsheet

问题

I'm using Xpath in Google docs to get the text inside <div>. I want to save the text inside <div id="job_description"> in one cell of Google doc spreadsheet, but it shows each <div> in separate cell.

<div id="job_description">
    <div>
        <strong>
            Basic Purpose:
        </strong>
        <br></br>
    </div>
    <div>
        Work closely with developers, product owners and Q…
        <br></br>
    </div>
    <div>
        The Test Analyst is accountable for the developmen…
        <br></br>
    </div>
    <div>
        <strong>
            Duties and Responsibilities:
        </strong>
    </div>
    <ul>
        <li></li>
        <li></li>
    </ul>
    <div>
        <strong>
            Requirements:
        </strong>
        <br></br>
    </div>
    <ul>
        <li></li>
        <li></li>
    </ul>
</div>

Image: http://i.stack.imgur.com/K0mAY.png

and this is the code I wrote:

=IMPORTXML(E4,"//div[@id='job_description']")

May you help me to put all of the text (including <div> <ul> ...) inside the <div id="job_description"> in only one cell ?

回答1:

Using JOIN is a good start, but you can make it a single operation.

You did not show the URL to the page you're importing, so I can only give you an example with another page. For instance, if you are importing www.w3.org and looking for a div where @class='event closed expand_block', use

=JOIN(CHAR(10),IMPORTXML("http://www.w3.org/","//div[@class='event closed expand_block']//text()"))

Notice that I also modified the XPath expression: //text() makes sure only descendant text nodes are retrieved, that is, all the text.

EDIT: Responding to your comment:

May I know what is CHAR(10) referring to?

Yes, of course. CHAR returns a character and takes a number as input. In the case of CHAR(10), a newline character is returned (I assume because of 
).

In the formula, CHAR(10) is used as the first argument of JOIN, which is the delimiter of the objects that are to be joined.

回答2:

For now I found a solution , I'll put it here so that others can know my answer, but if there is any other solution please let us know

I used JOIN to put the separate cells (L3:X3) into one single cell

=Trim(JOIN(" ",L3:X3))

回答3:

you can also use regexreplace to remove the line breaks, with

=REGEXREPLACE(IMPORTXML(E4,"//div[@id='job_description']"),"\n","")

this should wrap it all into one cell for you.

来源：https://stackoverflow.com/questions/28265221/div-tags-inside-div-using-importxml-xpath-query-in-google-spreadsheet

标签

html

xpath

spreadsheet

google-docs

<div> tags inside <div> using importXML Xpath query, in Google Spreadsheet

问题

回答1:

回答2:

回答3: