Question
I have been happily scraping a website with OkHttp3 for quite some time. However, some components of the website have been upgraded and now use additional OpenID bearer authentication.
I am 99.9% positive my requests are failing because of this bearer token: when I check with Chrome dev tools, the bearer token shows up only for these parts. Moreover, a couple of requests go to URLs that end with ".well-known/openid-configuration". In addition, when I hardcode the bearer token from my browser in my OkHttp3 code, everything works. Without it, I get a 401 Unauthorized response.
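For reference, the hardcoded-token experiment amounts to something like the sketch below. It uses the JDK's `java.net.http` API instead of OkHttp3 only to stay dependency-free, and it builds the request without sending it. The class name, helper names, URLs, and token value are all placeholders, not taken from the real site.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class BearerRequestSketch {

    // Builds the discovery URL for an issuer, following the
    // ".well-known/openid-configuration" convention seen in dev tools.
    // The issuer value is a hypothetical placeholder.
    static URI discoveryUri(String issuer) {
        String base = issuer.endsWith("/")
                ? issuer.substring(0, issuer.length() - 1)
                : issuer;
        return URI.create(base + "/.well-known/openid-configuration");
    }

    // Builds a GET request that carries the token in the Authorization header.
    // The token here is the value copied by hand from the browser.
    static HttpRequest withBearer(String url, String token) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder URL and token; nothing is sent over the network.
        HttpRequest request = withBearer("https://example.com/api/data", "PLACEHOLDER_TOKEN");
        System.out.println(request.headers().firstValue("Authorization").orElse("(none)"));
        System.out.println(discoveryUri("https://login.example.com/"));
    }
}
```

The OkHttp3 equivalent would add the same `Authorization` header to the request builder; the open problem is obtaining the token programmatically rather than copying it from the browser.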
I figured that my browser emulation was not close enough to the real situation, so I decided to use a headless browser setup that executes JavaScript. Since I am using Java, I chose HtmlUnit. With this tool I quickly got to the point where I could successfully scrape parts of the website (just as with OkHttp3), but it again failed on the newly updated parts. I checked but couldn't find the bearer token in any of the responses (neither in the headers nor in the cookies).
Is there any chance this approach (using a headless browser) could work? Or are there alternative approaches I could try?
Source: https://stackoverflow.com/questions/58973021/retrieve-openid-bearer-token-using-headless-browser-setup