How to optimize many importxml from the same website?

不羁的心 提交于 2020-12-07 03:40:23

问题


I have a Google Sheet where rows are products and columns are information about these products (Material, Color, Price, ...).

These information are retrieved using the function IMPORTXML with the corresponding xpath query. In the following example, the column D is the URL of this product. These functions retrieve information (Glass, Color and Type) for the product on row 3:

=IMPORTXML(D3,"//table[@class='info-table']/tr[th/text()[contains(.,'Glass')]]/td")
=IMPORTXML(D3,"//table[@class='info-table']/tr[th/text()[contains(.,'Color')]]/td")
=IMPORTXML(D3,"//table[@class='info-table']/tr[th/text()[contains(.,'Type')]]/td")

For each product, there are around 10 columns that are retrieved and thus 10 different functions IMPORTXML for the same URL.

It works properly when there are few products but when there are too many products only some information is retrieved and the other cells stay frozen with the value "Loading...".

The only workaround I found is manually copy/pasting the retrieved values (Ctrl+C, Ctrl+Maj+v) and then some other cells are unblocked and start import the values. In the end, I need to do this for all cells and it takes a lot of time.

  1. Why do I have cells that stay frozen in "Loading..."? Is there a limit imposed by Google?
  2. Should I adapt my functions to optimize the process and if possible reduce the number of IMPORTXML functions?
  3. Do you have a better solution than my time consuming workaround?
  4. The xpath query is always very similar. Is it possible to use a list of predefined elements. And then use other Google Sheet functions to obtain the same result but with only one IMPORTXML function

For example:

=IMPORTXML(D3,"//table[@class='info-table']/tr[th/text()[contains(.,{'Glass', 'Color', 'Type'})]]/td")

回答1:


This issue can be fixed by adapting the xpath query and using a combination of different Google Spreadsheet formulas.

  1. It is possible to combine several xpath queries using the character |
  2. To only get the first result, you can surround the xpath query by (YOUR_XPATH)[1]
  3. The TRANSPOSE function will move the several results on the same row

For example:

=TRANSPOSE(IMPORTXML(D2,"
(//table[@class='info-table']/tr[th/text()[contains(.,'Diameter')]]/td)[1] | 
(//table[@class='info-table']/tr[th/text()[contains(.,'Material')]]/td)[1] | 
(//table[@class='info-table']/tr[th/text()[contains(.,'Glass')]]/td)[1] | 
(//table[@class='info-table']/tr[th/text()[contains(.,'W/R')]]/td)[1] | 
(//table[@class='info-table']/tr[th/text()[contains(.,'Caliber')]]/td)[1] | 
(//table[@class='info-table']/tr[th/text()[contains(.,'Type')]]/td)[1]"))

By doing so, there is only one importXML per product.



来源:https://stackoverflow.com/questions/50604523/how-to-optimize-many-importxml-from-the-same-website

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!