Chrome extension - Get html from a separate page of a website in the background

Submitted by 百般思念 on 2019-12-31 04:51:22

Question


I have made an extension that tracks which manga a person reads on a manga site and lists the last chapter they read for each one in their favorites page. I've recently come up with a feature that would make the extension a bit better: I would like to give the user the option to track only manga that they have favorited on the site. As they read, the extension would check in the background whether the current manga is in their favorites, and save it only if it is.

The website has a favorites page that lists all of the manga a person has favorited. I would like to keep grabbing the names of the manga listed on that page in the background, hidden from the user.

So my question is: is there any way to fetch the HTML of a specific page in the background and repeatedly extract specific data from it, such as the text of certain elements, into an array, without the user having to actually be on the favorites page?

Edit: Solution

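var barray = [];

// Fetch the bookmarks page in the background and pass its HTML to callback
// (or null if the request fails).
function getbm(callback) {
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function() {
        if (xhr.readyState === 4) {
            // Only a 200 response carries the page HTML.
            callback(xhr.status === 200 ? xhr.responseText : null);
        }
    };
    var url = 'http://mangafox.me/bookmark/index.php?status=all';
    xhr.open('GET', url, true);
    xhr.send();
}

// Extract the bookmarked manga titles from the HTML and store them.
function res(data) {
    if (data === null) {
        return; // Request failed; keep whatever list was stored before.
    }
    barray = []; // Reset so repeated calls don't accumulate duplicates.
    // Wrap the parsed nodes in a detached <div> so find() can search all of them.
    var parsed = $('<div />').append($.parseHTML(data));
    parsed.find('h2.title').each(function() {
        var bmanga = $(this).children('a.title').text();
        barray.push({"manga": bmanga});
    });
    chrome.storage.local.set({'bData': barray});
}

getbm(res);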
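
With the list stored under 'bData', the "save only if favorited" check described in the question could be sketched roughly as follows. The helper name isInFavorites is hypothetical, and trimming plus case-insensitive matching is an assumption about how titles should be compared, not something the site guarantees.

```javascript
// Hypothetical helper: returns true when `title` appears in the stored list.
// `bData` has the shape saved by res(): [{ manga: 'Some Title' }, ...].
// Assumption: titles are compared after trimming and lower-casing.
function isInFavorites(title, bData) {
    var wanted = title.trim().toLowerCase();
    return (bData || []).some(function (entry) {
        return entry.manga.trim().toLowerCase() === wanted;
    });
}

// Possible usage inside the extension (chrome.* only exists there;
// `currentTitle` stands for whatever title the tracker already has):
// chrome.storage.local.get('bData', function (items) {
//     if (isInFavorites(currentTitle, items.bData)) { /* save progress */ }
// });
```

Keeping the comparison in a small function of its own means it can be exercised without the chrome.* APIs.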

Answer 1:


It heavily depends on how the page in question is constructed.

If the page is static (the HTTP response includes the data you need), then scraping the page via XMLHttpRequest is the way to go.

If the page is dynamic (no data initially, and JavaScript on the page then queries the server to fill it in), then fetching the page's HTML via XHR will not work. You can try to observe the network requests made by that page and replicate them.
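
For the dynamic case, replaying the page's own data request might look something like the sketch below. Everything specific here is an assumption for illustration: the endpointUrl placeholder stands for whatever URL appears in the DevTools Network tab, and the { bookmarks: [{ title }] } payload shape is invented, so the real field names will differ.

```javascript
// Pure helper: pull titles out of an already-parsed JSON payload.
// The { bookmarks: [{ title: ... }] } shape is hypothetical.
function titlesFromPayload(payload) {
    if (!payload || !Array.isArray(payload.bookmarks)) {
        return [];
    }
    return payload.bookmarks.map(function (entry) {
        return { manga: entry.title };
    });
}

// Replays the request the page itself makes and hands the titles to callback
// (or null on failure). `endpointUrl` is a placeholder, not a real API.
function getFavoritesJson(endpointUrl, callback) {
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function () {
        if (xhr.readyState !== 4) { return; }
        if (xhr.status !== 200) { callback(null); return; }
        var payload;
        try {
            payload = JSON.parse(xhr.responseText);
        } catch (e) {
            callback(null); // Response was not valid JSON.
            return;
        }
        callback(titlesFromPayload(payload));
    };
    xhr.open('GET', endpointUrl, true);
    xhr.send();
}
```

The result uses the same { manga: ... } entries as the scraping solution in the question, so the rest of the extension would not need to care which route produced the list.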

Of note: while it's unlikely, check whether the site has a public API. That would save you the reverse-engineering effort and let you avoid the grey area of automated data scraping.


Also, see if you can tell from the page you're normally tracking whether the item is favourited. That would be easier than scraping another page.
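
If the reading page does expose such a marker, the check from a content script could be as small as the sketch below. Whether the site has any such element is unknown; the '.bookmarked' selector is made up and would have to be replaced by whatever inspecting the page reveals. The helper name hasFavoriteMarker is also hypothetical.

```javascript
// `root` is anything with querySelector (normally `document` in a content script).
// `selector` identifies the element that only appears for favourited items.
function hasFavoriteMarker(root, selector) {
    return root.querySelector(selector) !== null;
}

// Possible usage in a content script on the chapter page
// ('.bookmarked' is a made-up selector):
// if (hasFavoriteMarker(document, '.bookmarked')) { /* track this manga */ }
```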



Source: https://stackoverflow.com/questions/27290161/chrome-extension-get-html-from-a-separate-page-of-a-website-in-the-background
