web-mining

Good dataset for sentiment analysis? [closed]

Submitted by 有些话、适合烂在心里 on 2019-12-31 08:07:54
Question (closed as off-topic on Stack Overflow last year; no longer accepting answers). I am working on sentiment analysis, using the dataset from this link: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html. I have divided my dataset in a 50:50 ratio: 50% are used as test samples and 50% as training samples. Features are extracted from the training samples and classification is performed with a Weka classifier.
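The 50:50 split described above can be sketched in a few lines of Python. This is a minimal illustration only; the `(text, label)` pairs are hypothetical stand-ins for reviews from the Multi-Domain Sentiment Dataset, not actual records from it:

```python
import random

def split_half(samples, seed=42):
    """Shuffle a labeled dataset and split it into equal train/test halves."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Hypothetical (text, label) pairs standing in for the dataset's reviews.
data = [("great product", "pos"), ("terrible battery", "neg"),
        ("loved it", "pos"), ("waste of money", "neg")]
train, test = split_half(data)
```

Fixing the random seed keeps the split reproducible across runs, which matters when comparing classifier accuracies.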

Web mining or scraping or crawling? What tool/library should I use? [closed]

Submitted by 最后都变了- on 2019-12-18 14:02:45
Question (closed as off-topic on Stack Overflow 4 years ago; no longer accepting answers). I want to crawl and save some webpages as HTML. Say, crawl into hundreds of popular websites and simply save their front pages and the "About" pages. I've looked into many questions, but didn't find an answer to this in either the web-crawling or the web-scraping questions. What library or tool should I use to build the solution? Or is there even an existing tool that can handle this?
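One concrete sub-step of this task, finding the "About" link on a front page, can be sketched with Python's standard-library HTML parser (no third-party crawler assumed; the sample page here is made up for illustration):

```python
from html.parser import HTMLParser

class AboutLinkFinder(HTMLParser):
    """Collect hrefs of anchors whose link text contains 'about'."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "about" in "".join(self._text).lower():
                self.links.append(self._href)
            self._href = None

# Hypothetical front-page fragment.
page = '<a href="/about">About Us</a> <a href="/contact">Contact</a>'
finder = AboutLinkFinder()
finder.feed(page)
```

In a real crawler the page source would come from an HTTP fetch, and the collected hrefs would need to be resolved against the page's base URL before being fetched in turn.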

Programmatically look up a ticker symbol in R

Submitted by 别等时光非礼了梦想. on 2019-12-04 14:22:30
Question: I have a field of data containing company names, such as company <- c("Microsoft", "Apple", "Cloudera", "Ford"), and so on. The package tm.plugin.webmining allows you to query data from Yahoo! Finance based on ticker symbols: require(tm.plugin.webmining); results <- WebCorpus(YahooFinanceSource("MSFT")). I'm missing the in-between step: how can I query ticker symbols programmatically based on company names? Answer 1: I couldn't manage to do this with the tm.plugin.webmining package, but I came up with a rough solution.

Programmatically look up a ticker symbol in R

Submitted by 旧城冷巷雨未停 on 2019-12-03 09:54:41
I have a field of data containing company names, such as company <- c("Microsoft", "Apple", "Cloudera", "Ford"), and so on. The package tm.plugin.webmining allows you to query data from Yahoo! Finance based on ticker symbols: require(tm.plugin.webmining); results <- WebCorpus(YahooFinanceSource("MSFT")). I'm missing the in-between step: how can I query ticker symbols programmatically based on company names? I couldn't manage to do this with the tm.plugin.webmining package, but I came up with a rough solution: pulling & parsing data from
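The "rough solution" pattern of pulling and parsing data from a finance site boils down to extracting a symbol from a search response. A hedged Python sketch of just the parsing half is below; the JSON layout mirrors what Yahoo Finance's symbol-search endpoint is commonly reported to return, so treat the field names as an assumption rather than a documented contract:

```python
import json

def first_symbol(search_json):
    """Return the first ticker symbol in a symbol-search JSON response.

    Assumes a response shaped like {"quotes": [{"symbol": ...}, ...]};
    this layout is an assumption about the search endpoint, not a spec.
    """
    payload = json.loads(search_json)
    quotes = payload.get("quotes", [])
    return quotes[0]["symbol"] if quotes else None

# Canned response standing in for a live query for "Microsoft".
sample = '{"quotes": [{"symbol": "MSFT", "shortname": "Microsoft Corporation"}]}'
ticker = first_symbol(sample)
```

The fetching half would be an ordinary HTTP GET with the company name as the query parameter; only the parsing is shown because the endpoint's exact URL and rate limits are not part of the question.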

Good dataset for sentiment analysis? [closed]

Submitted by 懵懂的女人 on 2019-12-02 15:17:36
I am working on sentiment analysis, using the dataset from this link: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html. I have divided my dataset in a 50:50 ratio: 50% are used as test samples and 50% as training samples. Features are extracted from the training samples and classification is performed with a Weka classifier, but my prediction accuracy is only about 70-75%. Can anybody suggest other datasets that could help me improve the results? I have used unigrams, bigrams and POS tags as my features. Answer (doxav): There are many sources for sentiment analysis datasets: huge
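The unigram and bigram features mentioned above can be sketched directly; this is a minimal presence-based extractor, not the questioner's actual Weka pipeline, and POS tags are omitted since they require a tagger:

```python
def ngram_features(text, n_values=(1, 2)):
    """Extract unigram and bigram presence features from one review."""
    tokens = text.lower().split()
    feats = set()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats

feats = ngram_features("not a good phone")
```

Bigrams such as "not a" are exactly what lets a classifier pick up simple negation that unigrams alone miss, which is one common reason accuracy plateaus around 70-75% with bag-of-words features.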

How to extract textual contents from a web page? [closed]

Submitted by 我怕爱的太早我们不能终老 on 2019-11-30 10:39:29
I'm developing an application in Java which can take textual information from different web pages and summarize it into one page. For example, suppose I have a news story on different web pages such as The Hindu, Times of India, The Statesman, etc. My application is supposed to extract the important points from each of these pages and put them together as a single news story. The application is based on concepts of web content mining. As a beginner to this field, I can't understand where to start. I have gone through research papers which explain noise removal as the first step in building this application. So
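The noise-removal first step the papers describe can be sketched compactly. The question is about Java, but the same idea in Python's standard library keeps the sketch self-contained: skip non-visible blocks like scripts and styles, keep the rest of the text. Real boilerplate removal (navigation, ads, footers) needs heuristics well beyond this:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> blocks."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # >0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

# Hypothetical article fragment.
html_doc = "<html><script>var x=1;</script><p>Main story text.</p></html>"
ex = TextExtractor()
ex.feed(html_doc)
text = " ".join(ex.chunks)
```

In Java the analogous route would be an HTML parsing library rather than hand-rolled regexes, for the same reason: HTML nesting is not reliably regular.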

Web mining or scraping or crawling? What tool/library should I use? [closed]

Submitted by 别说谁变了你拦得住时间么 on 2019-11-30 10:38:30
I want to crawl and save some webpages as HTML. Say, crawl into hundreds of popular websites and simply save their front pages and the "About" pages. I've looked into many questions, but didn't find an answer to this in either the web-crawling or the web-scraping questions. What library or tool should I use to build the solution? Or is there even an existing tool that can handle this? Answer: There really is no good solution here. You are right to suspect that Python is probably the best way to start, because of its incredibly strong support for regular expressions. In order to implement something like
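One small piece any "save hundreds of pages" script needs is mapping each URL to a filesystem-safe filename for the saved HTML copy. A sketch of that step, with the sanitization rule chosen here as one reasonable convention rather than a standard:

```python
import re
from urllib.parse import urlparse

def page_filename(url):
    """Turn a URL into a safe .html filename for its saved copy."""
    parsed = urlparse(url)
    stem = parsed.netloc + parsed.path
    # Replace anything outside a conservative character set.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem).strip("_")
    return (stem or "index") + ".html"

name = page_filename("https://example.com/about/")
```

Dropping the query string, as this sketch does, means two URLs differing only in parameters collide; a production crawler would hash the full URL instead.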

How to extract textual contents from a web page? [closed]

Submitted by 泪湿孤枕 on 2019-11-29 15:51:58
Question (closed 7 years ago as ambiguous, vague, incomplete, or overly broad). I'm developing an application in Java which can take textual information from different web pages and summarize it into one page. For example, suppose I have a news story on different web pages such as The Hindu, Times of India