Ads filtering server side [closed]

Submitted by 我怕爱的太早我们不能终老 on 2019-12-07 03:55:26

Question


I'm working on a web application where I display HTML from other websites. Before displaying the final version I'd like to get rid of the ads.

Any ideas or suggestions on how to accomplish this? It doesn't need to be a super efficient filtering tool. I was thinking of porting some of the filters defined by Adblock Plus to Ruby and returning the parsed document with some help from Nokogiri.

Let's say I use the catch-all wildcard filter "ad". That's not an official Adblock Plus filter, but for simplicity I'll use it here. The idea would then be to remove every element that has any attribute matching the filter, e.g. src="http://ad.foo.com?my-ad.gif", href="http://ad.foo.com", class="annoying-ad", etc.

The Nokogiri command for this filter would be:

doc.xpath("//*[@*[contains(., 'ad')]]").each { |element| element.remove }

I applied the filter to this page:

And the result was:

Not that bad. Note, however, that the global wildcard filter also removed valid elements such as headers, because they have attributes like id="masthead".
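One way to reduce false positives like id="masthead" might be to match "ad" only as a whole delimiter-separated token instead of as a substring. The following is a minimal sketch in plain Ruby. The helper name ad_token? is hypothetical, and the choice of delimiters (anything that is not a letter or digit) is an assumption, not something Adblock Plus defines.

```ruby
# Hypothetical helper: true when `value` contains `token` as a whole word,
# where words are separated by any run of non-alphanumeric characters.
def ad_token?(value, token = 'ad')
  value.to_s.downcase.split(/[^a-z0-9]+/).include?(token)
end

ad_token?('annoying-ad')                 # => true  (class="annoying-ad")
ad_token?('http://ad.foo.com?my-ad.gif') # => true  (src from the example above)
ad_token?('masthead')                    # => false (no standalone "ad" token)

# With a Nokogiri document (assumed to be `doc`, as in the one-liner above),
# the predicate could be applied like this:
#   doc.xpath('//*').each do |el|
#     el.remove if el.attribute_nodes.any? { |attr| ad_token?(attr.value) }
#   end
```

The trade-off is that values such as "ads" or "adbanner" no longer match, so a few extra tokens would probably have to be added for real pages.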

So I think this approach is OK for my case. The question now is which filters to use. Adblock Plus has a huge list of filters, and I don't feel like iterating over all of them. I'm thinking of grabbing the top 10-20 and filtering the docs based on those. Is there a list out there of the most popular ones? If so, I haven't been able to find it.
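Whichever filters end up on the shortlist, porting them could look roughly like the sketch below, which turns a simplified Adblock Plus URL filter into a Ruby Regexp. It is an illustration under assumptions: only the "||" domain anchor, "|" start/end anchors, the "*" wildcard and the "^" separator are handled; filter options ("$..."), exception rules ("@@") and element-hiding rules ("##") are ignored. The function names and the two sample filters are made up for this example and are not taken from any published filter list.

```ruby
# Convert a simplified Adblock Plus URL filter into a Regexp (subset of the syntax only).
def abp_filter_to_regexp(filter)
  # "^" matches a separator: anything except a letter, digit, "_", "-", "." or "%",
  # or the end of the address.
  separator = '(?:[^\w\-.%]|\z)'
  body = filter.dup
  prefix = ''
  suffix = ''

  if body.start_with?('||')
    # "||" anchors at the start of a domain name, allowing any scheme and subdomain.
    prefix = '\A[a-z][a-z0-9+.\-]*://(?:[^/?#]*\.)?'
    body = body[2..-1]
  elsif body.start_with?('|')
    prefix = '\A'
    body = body[1..-1]
  end

  if body.end_with?('|')
    suffix = '\z'
    body = body[0...-1]
  end

  pattern = body.chars.map do |ch|
    case ch
    when '*' then '.*'
    when '^' then separator
    else Regexp.escape(ch)
    end
  end.join

  Regexp.new(prefix + pattern + suffix, Regexp::IGNORECASE)
end

# True when any of the compiled filters matches the URL.
def ad_url?(url, regexps)
  regexps.any? { |re| re.match?(url) }
end

# Made-up sample filters, for illustration only.
regexps = ['||ad.foo.com^', '/banners/*.gif'].map { |f| abp_filter_to_regexp(f) }
ad_url?('http://ad.foo.com?my-ad.gif', regexps)        # => true
ad_url?('http://example.com/images/logo.gif', regexps) # => false
```

With Nokogiri, ad_url? could then be checked against the src and href attributes of each element before removing it, which keeps the URL filters separate from any class- or id-based rules.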

Source: https://stackoverflow.com/questions/18564924/ads-filtering-server-side
