Python 64 bit not storing as long of string as 32 bit python

≯℡__Kan透↙ submitted on 2021-01-29 03:11:43

Question


I have two computers, both running 64-bit Windows 7. One machine has python 32-bit, one is running python 64-bit. Both machines have 8GB of RAM.

I'm using BeautifulSoup to scrape a webpage, but I've been running into issues on my 64-bit Python machine. I've narrowed it down to this: len(str(BeautifulSoup(requests.get("http://www.sampleurl.com").text))) returns only 92520 characters under 64-bit Python, but for the same static site on my 32-bit Python machine it returns 135000 characters.

At some point in the past on my python64-bit machine I had python32-bit, but uninstalled it to install python64-bit because I was having issues installing scipy using pip install (turns out that wasn't the issue).

Anyway, I'm unsure as to why my 64bit python machine isn't returning the entire html string and I was wondering if anyone can help me understand what is going on and how can I fix it.


Answer 1:


This is not a 32-bit vs. 64-bit issue. You are most likely seeing a parser issue: one machine using lxml and the other using html.parser, for example.

Different parsers deal differently with broken HTML, and lxml is the default only when installed.

See for example:

  • Beautiful Soup findAll doen't find them all
  • Beautiful Soup 4 find_all don't find links that Beautiful Soup 3 finds
  • BeautifulSoup fails to parse long view state
  • Beautifulsoup lost nodes
  • Missing parts on Beautiful Soup results

etc.

Run import lxml on both machines to verify. When you replaced your Python installation on one machine with a 64-bit version, you likely didn't include a compatible lxml version.
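As a minimal sketch of that check, the snippet below reports which parser backends can be imported and parses with an explicitly named parser, so both machines take the same code path. The helper names installed_parsers and soup_length are hypothetical, introduced only for illustration, and the second helper assumes Beautiful Soup 4 (the bs4 package) is installed.

```python
import importlib.util


def installed_parsers():
    """Report which Beautiful Soup parser backends can be imported here."""
    # html.parser ships with Python; lxml and html5lib are separate installs.
    return {
        "html.parser": True,
        "lxml": importlib.util.find_spec("lxml") is not None,
        "html5lib": importlib.util.find_spec("html5lib") is not None,
    }


def soup_length(html, parser="html.parser"):
    """Parse html with an explicitly named parser and return len(str(soup))."""
    # Imported lazily so installed_parsers() still works without bs4.
    from bs4 import BeautifulSoup

    # Naming the parser avoids depending on whichever one happens to be installed.
    return len(str(BeautifulSoup(html, parser)))
```

Running installed_parsers() on both machines should show whether lxml is present on only one of them. Calling soup_length(requests.get(url).text, "html.parser") on both, with the same parser name, should then make the two lengths comparable.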



Source: https://stackoverflow.com/questions/28616558/python-64-bit-not-storing-as-long-of-string-as-32-bit-python
