Using R, I am trying to scrape a web page and save the text, which is in Japanese, to a file. Ultimately this needs to scale to hundreds of pages on a daily basis.
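For context, a minimal sketch of what this might look like for a single page, assuming the rvest package can detect the page encoding (the output file name is a placeholder):

library(rvest)

url <- "http://stocks.finance.yahoo.co.jp/stocks/detail/?code=7203"

# read_html() detects the encoding from the HTTP headers / <meta> tag;
# pass encoding = "EUC-JP" (or whatever the page declares) if detection fails.
page <- read_html(url)

# Extract the visible text from the page body.
jp_text <- html_text(html_node(page, "body"))

# Write as UTF-8 so the Japanese characters survive on any platform.
con <- file("yahoo_7203.txt", open = "w", encoding = "UTF-8")
writeLines(jp_text, con)
close(con)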
Hi, I have written a scraping engine that can extract data from web pages deeply embedded within a main listing page. I wonder if it might be helpful to use it as an aggregator for your web data prior to importing into R.
The engine is located here: http://ec2-204-236-207-28.compute-1.amazonaws.com/scrap-gm
The sample parameters I created to scrape the page you have in mind are below:
{
    origin_url: 'http://stocks.finance.yahoo.co.jp/stocks/detail/?code=7203',
    columns: [{
        col_name: 'links_name',
        dom_query: 'a'
    }, {
        col_name: 'links',
        dom_query: 'a',
        required_attribute: 'href'
    }]
};
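As a rough sketch of the aggregator idea, assuming the engine exposes the scraped columns as a JSON document (the result URL and field names below are illustrative assumptions, not a documented interface), pulling the result into R might look like:

# Hypothetical sketch: the result URL and the JSON field names are
# assumptions for illustration only.
library(jsonlite)

result <- fromJSON("http://ec2-204-236-207-28.compute-1.amazonaws.com/scrap-gm/result.json")

# The two configured columns would come back as parallel vectors;
# combine them into a data frame for further work in R.
links <- data.frame(name = result$links_name,
                    href = result$links,
                    stringsAsFactors = FALSE)
head(links)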