Scrapy Splash click button doesn't work

Posted by 喜欢而已 on 2021-02-07 09:10:48

Question


What I'm trying to do

On avito.ru (a Russian real estate site), a person's phone number is hidden until you click on it. I want to collect the phone number using Scrapy and Splash.

Example URL: https://www.avito.ru/moskva/kvartiry/2-k_kvartira_84_m_412_et._992361048

After you click the button, a pop-up is displayed and the phone number is visible.

I'm using the Splash execute API with the following Lua script:

function main(splash)
    splash:go(splash.args.url)
    splash:wait(10)
    splash:runjs("document.getElementsByClassName('item-phone-button')[0].click()")
    splash:wait(10)
    return splash:png()
end
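
For context, here is a minimal sketch of how a script like this could be submitted to Splash from Python using only the standard library. It assumes Splash is reachable at http://localhost:8050 (its default port). The names build_execute_request and fetch_screenshot are hypothetical helpers, not part of the linked repo. The JSON fields lua_source and url are what the execute endpoint reads, and url is what the script sees as splash.args.url.

```python
import json
import urllib.request

# The Lua script from the question, unchanged.
LUA_SOURCE = """
function main(splash)
    splash:go(splash.args.url)
    splash:wait(10)
    splash:runjs("document.getElementsByClassName('item-phone-button')[0].click()")
    splash:wait(10)
    return splash:png()
end
"""

# Assumption: Splash is listening on its default port on localhost.
SPLASH_EXECUTE_URL = "http://localhost:8050/execute"


def build_execute_request(page_url, lua_source=LUA_SOURCE, endpoint=SPLASH_EXECUTE_URL):
    """Build (but do not send) a POST request for Splash's execute endpoint."""
    payload = {"lua_source": lua_source, "url": page_url}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_screenshot(page_url, timeout=60):
    """Send the request and return the raw PNG bytes produced by the script."""
    request = build_execute_request(page_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Because the script returns splash:png() directly, the response body is the PNG itself and can be written to a file as-is. The script spends at least 20 seconds in its two waits, so Splash's own timeout setting may also need to be raised.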

Problem

The button is not clicked and the phone number is not displayed. It should be a trivial task, and I have no explanation for why it doesn't work.

Clicking works fine for another element on the same page if we replace item-phone-button with js-show-stat. So JavaScript works in general, and the blue "Display phone" button must be special somehow.

What I've tried

To isolate the problem, I created a repo with a minimal example script and a docker-compose file for Splash: https://github.com/alexanderlukanin13/splash-avito-phone

The JavaScript code is valid; you can verify it in the JavaScript console in Chrome or Firefox:

document.getElementsByClassName('item-phone-button')[0].click()

I've tried it with Splash versions 3.0, 3.1, and 3.2; the result is the same.

Update

I've also tried:

  • @Lore's suggestions, including the simulateClick() approach (see the simulate_click branch)

  • mouseDown/mouseUp events, as described in "Simulating a mousedown, click, mouseup sequence in Tampermonkey?" (see the trigger_mouse_event branch)


Answer 1:


The following script works for me:

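function main(splash, args)
  -- The site needs private mode turned off (see issue 2 below).
  splash.private_mode_enabled = false
  assert(splash:go(args.url))
  -- Lua tables are 1-based, so [2] is the second matching element.
  local btn = splash:select_all('.item-phone-button')[2]
  btn:mouse_click()
  -- Outline the element so the screenshot shows which one was matched.
  btn.style.border = "5px solid black"
  assert(splash:wait(0.5))
  return {
    num = #splash:select_all('.item-phone-button'),
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end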

There were 2 issues with the original solution:

  1. There are 2 elements with the 'item-phone-button' class, and the button of interest is the second one. I checked which element was matched by setting btn.style.border = "5px solid black".
  2. This website requires private mode to be disabled, likely because it uses localStorage. See http://splash.readthedocs.io/en/stable/faq.html#website-is-not-rendered-correctly for other common suggestions.
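
Because the script above returns a table rather than a bare screenshot, Splash sends the result back as JSON, with the binary png value base64-encoded. The following is a minimal sketch of unpacking such a response on the Python side. The function name unpack_splash_result is hypothetical, and the field names simply mirror the keys of the returned table.

```python
import base64
import json


def unpack_splash_result(body):
    """Split the JSON returned by the script into its parts.

    `body` is the raw response body (bytes or str) from Splash's execute endpoint.
    """
    result = json.loads(body)
    return {
        "num": result["num"],                    # how many elements matched the selector
        "html": result["html"],                  # page HTML after the click
        "png": base64.b64decode(result["png"]),  # screenshot as raw PNG bytes
        "har": result["har"],                    # network log in HAR format
    }
```

A num of 2 here would confirm that the selector matches two elements, which is the first issue described above.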



Answer 2:


I don't know how your implementation works, but I suggest renaming main to parse, the default function called by spiders on start.

If this isn't the problem, the first thing to do is to check, using JavaScript with a CSS selector, that you have picked the right element of that class. Maybe another element with the item-phone-button class exists and you are clicking in the wrong place.

If all of the above is correct, I suggest two options that worked for me:
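
The same check can also be made offline on HTML that Splash has already rendered. This is a minimal sketch using only Python's standard library. ClassMatchCounter and find_by_class are hypothetical names. Any markup passed to it for a trial run would have to be made up, since the real avito.ru markup is not shown here.

```python
from html.parser import HTMLParser


class ClassMatchCounter(HTMLParser):
    """Collect the start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = []  # (tag, attribute dict) per matching element, in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Compare whole class tokens, so "item-phone-button-wrapper" does not match.
        classes = (attributes.get("class") or "").split()
        if self.class_name in classes:
            self.matches.append((tag, attributes))


def find_by_class(html, class_name):
    """Return every element in `html` that carries `class_name`."""
    counter = ClassMatchCounter(class_name)
    counter.feed(html)
    counter.close()
    return counter.matches
```

If find_by_class(html, "item-phone-button") returns more than one element, index [0] in JavaScript clicks only the first of them. Note that JavaScript collections are 0-based, while Lua tables are 1-based.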

  • Use Splash mouse_click and Splash wait (the latter I see you have already used). If that doesn't work, try a double click by substituting the following in your code:
    local button = splash:select('.item-phone-button')
    button:mouse_click()
    button:mouse_click()
    

  • Use Splash wait_for_resume, which runs JavaScript code until it calls splash.resume() and then resumes the Lua script. Your code will become simpler too:
    function main(splash)
        splash:go(splash.args.url)
        splash:wait_for_resume([[
            function main(splash) {
                document.getElementsByClassName('item-phone-button')[0].click();
                splash.resume();
            }
        ]])
        return splash:png()
    end
    

    EDIT: it seems better to use dispatchEvent instead of click(), as in this example:

    function simulateClick() {
      var event = new MouseEvent('click', {
        view: window,
        bubbles: true,
        cancelable: true
      });
      var cb = document.getElementById('checkbox'); 
      var cancelled = !cb.dispatchEvent(event);
      if (cancelled) {
        // A handler called preventDefault.
        alert("cancelled");
      } else {
        // None of the handlers called preventDefault.
        alert("not cancelled");
      }
    }
    


    Source: https://stackoverflow.com/questions/49276401/scrapy-splash-click-button-doesnt-work
