Nokogiri adds characters during parsing on Heroku

Posted by 假装没事ソ on 2019-12-08 06:05:15

Question


It seems that Nokogiri has a problem with UTF-8 conversion of the nbsp character. From what I've gathered, this issue is related to libxml2. Nokogiri recommends upgrading libxml2 to 2.7.7, rather than the 2.7.6 that's running on Heroku.

Does anyone know how I can use libxml2 2.7.7 (or higher) on Heroku?

The problem is as follows:

doc = Nokogiri::HTML("<html><p>Hi Hello</p></html>")
doc.inner_html
=> "<html><body><p>Hi Hello</p></body></html>"

doc.inner_html = "<p>Hello&nbsp;World</p>"
=> "<p>Hello&nbsp;World</p>"

doc.inner_html
=> "<p>Hello World</p>"

Looks like this is related: https://github.com/sparklemotion/nokogiri/issues/306
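
For illustration only, here is a minimal sketch of one common way stray characters can appear around a non-breaking space. It assumes (this is not confirmed anywhere in the question) that the UTF-8 bytes of the nbsp character get re-read as ISO-8859-1 at some point. It uses only core Ruby and does not involve Nokogiri or libxml2 at all.

```ruby
# U+00A0 is the character that the &nbsp; entity decodes to.
nbsp = "\u00A0"

# In UTF-8 this single character is stored as two bytes: 0xC2 0xA0.
utf8_bytes = nbsp.bytes

# Assumption for illustration: the same two bytes are mistakenly treated as
# ISO-8859-1, where every byte is its own character, then converted to UTF-8.
misread = nbsp.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts utf8_bytes.map { |b| format("0x%02X", b) }.join(" ")
puts misread.codepoints.map { |c| format("U+%04X", c) }.join(" ")
```

Under that assumption the one-character nbsp turns into two characters, U+00C2 followed by U+00A0, which is how an extra visible character could end up in the output.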

This doesn't happen on my local machine. Rails has 'utf-8' set as config.encoding, and the rendered page has a utf-8 charset meta tag.

On my local machine I'm running Nokogiri 1.6 with libxml2 2.8.0, and on Heroku I'm running Nokogiri 1.6 with libxml2 2.7.6.
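
To compare the two environments, a rough sketch like the following could be run in each one. The helper name libxml2_outdated? and the constant MINIMUM_LIBXML2 are made up for this example, and the "libxml" / "loaded" keys of Nokogiri::VERSION_INFO are assumed to be present (they may be missing, for instance on JRuby), so the lookup is guarded.

```ruby
require "rubygems" # provides Gem::Version

# Minimum libxml2 release named in the question as the recommended upgrade.
MINIMUM_LIBXML2 = "2.7.7"

# Hypothetical helper: true when the given libxml2 version string is older
# than the minimum.
def libxml2_outdated?(version, minimum = MINIMUM_LIBXML2)
  Gem::Version.new(version) < Gem::Version.new(minimum)
end

begin
  require "nokogiri"
  # VERSION_INFO is a Hash; the nested keys are treated as optional here.
  info = Nokogiri::VERSION_INFO
  loaded = info["libxml"] && info["libxml"]["loaded"]
  if loaded
    puts "libxml2 #{loaded} loaded; outdated: #{libxml2_outdated?(loaded)}"
  else
    puts "libxml2 version not reported by this Nokogiri build"
  end
rescue LoadError
  puts "nokogiri is not installed in this environment"
end
```

With the versions mentioned above, the helper would treat 2.7.6 as outdated and 2.8.0 as fine.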

Thanks.


Answer 1:


Unfortunately Heroku doesn't support installing additional libraries or binaries to stacks. The best workaround is to vendor these into your project. You'll need to use 64-bit Linux versions to make them work on Heroku; compiling statically can also help ensure that any dependencies needed are included. Similarly, for gems that depend on external libraries, we recommend compiling the gem statically and vendoring it into your project.

If you do wish to try to vendor your binary, library, or gem, you can use Heroku as your build environment. One of Heroku's engineers created a build server that allows you to upload source code, run the compilation step, and then download the resulting binary. You can find this project on GitHub under the name "Vulcan".

Here's a link with more instructions: https://devcenter.heroku.com/articles/buildpack-binaries



Source: https://stackoverflow.com/questions/17980737/nokogiri-adds-characters-during-parsing-on-heroku
