web-mining

Good dataset for sentiment analysis? [closed]

Submitted by 有些话、适合烂在心里 on 2019-12-31 08:07:54
Question (closed as off-topic on Stack Overflow last year; no longer accepting answers). I am working on sentiment analysis, using the dataset from this link: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html. I have divided my dataset in a 50:50 ratio: 50% are used as test samples and 50% as training samples. Features are extracted from the training samples and classification is performed with a Weka classifier.
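The 50:50 split described above can be sketched in a few lines of Python. This is a minimal illustration only; the `(text, label)` pairs are hypothetical stand-ins for reviews from the Multi-Domain Sentiment Dataset, not actual records from it:

```python
import random

def split_half(samples, seed=42):
    """Shuffle a labeled dataset and split it into equal train/test halves."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Hypothetical (text, label) pairs standing in for the dataset's reviews.
data = [("great product", "pos"), ("terrible battery", "neg"),
        ("loved it", "pos"), ("waste of money", "neg")]
train, test = split_half(data)
```

Fixing the random seed keeps the split reproducible across runs, which matters when comparing classifier accuracies.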

Web mining or scraping or crawling? What tool/library should I use? [closed]

Submitted by 最后都变了- on 2019-12-18 14:02:45
Question (closed as off-topic on Stack Overflow 4 years ago; no longer accepting answers). I want to crawl and save some webpages as HTML. Say, crawl into hundreds of popular websites and simply save their front pages and the "About" pages. I've looked into many questions, but didn't find an answer to this in either the web-crawling or the web-scraping questions. What library or tool should I use to build the solution? Or is there even an existing tool that can handle this?
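One concrete sub-step of this task, finding the "About" link on a front page, can be sketched with Python's standard-library HTML parser (no third-party crawler assumed; the sample page here is made up for illustration):

```python
from html.parser import HTMLParser

class AboutLinkFinder(HTMLParser):
    """Collect hrefs of anchors whose link text contains 'about'."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "about" in "".join(self._text).lower():
                self.links.append(self._href)
            self._href = None

# Hypothetical front-page fragment.
page = '<a href="/about">About Us</a> <a href="/contact">Contact</a>'
finder = AboutLinkFinder()
finder.feed(page)
```

In a real crawler the page source would come from an HTTP fetch, and the collected hrefs would need to be resolved against the page's base URL before being fetched in turn.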

Programmatically look up a ticker symbol in R

Submitted by 别等时光非礼了梦想. on 2019-12-04 14:22:30
Question: I have a field of data containing company names, such as company <- c("Microsoft", "Apple", "Cloudera", "Ford"), and so on. The package tm.plugin.webmining allows you to query data from Yahoo! Finance based on ticker symbols: require(tm.plugin.webmining); results <- WebCorpus(YahooFinanceSource("MSFT")). I'm missing the in-between step: how can I query ticker symbols programmatically based on company names? Answer 1: I couldn't manage to do this with the tm.plugin.webmining package, but I came up with a rough solution.

Programmatically look up a ticker symbol in R

Submitted by 旧城冷巷雨未停 on 2019-12-03 09:54:41
I have a field of data containing company names, such as company <- c("Microsoft", "Apple", "Cloudera", "Ford"), and so on. The package tm.plugin.webmining allows you to query data from Yahoo! Finance based on ticker symbols: require(tm.plugin.webmining); results <- WebCorpus(YahooFinanceSource("MSFT")). I'm missing the in-between step: how can I query ticker symbols programmatically based on company names? I couldn't manage to do this with the tm.plugin.webmining package, but I came up with a rough solution: pulling & parsing data from
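The "rough solution" pattern of pulling and parsing data from a finance site boils down to extracting a symbol from a search response. A hedged Python sketch of just the parsing half is below; the JSON layout mirrors what Yahoo Finance's symbol-search endpoint is commonly reported to return, so treat the field names as an assumption rather than a documented contract:

```python
import json

def first_symbol(search_json):
    """Return the first ticker symbol in a symbol-search JSON response.

    Assumes a response shaped like {"quotes": [{"symbol": ...}, ...]};
    this layout is an assumption about the search endpoint, not a spec.
    """
    payload = json.loads(search_json)
    quotes = payload.get("quotes", [])
    return quotes[0]["symbol"] if quotes else None

# Canned response standing in for a live query for "Microsoft".
sample = '{"quotes": [{"symbol": "MSFT", "shortname": "Microsoft Corporation"}]}'
ticker = first_symbol(sample)
```

The fetching half would be an ordinary HTTP GET with the company name as the query parameter; only the parsing is shown because the endpoint's exact URL and rate limits are not part of the question.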

Good dataset for sentiment analysis? [closed]

Submitted by 懵懂的女人 on 2019-12-02 15:17:36
I am working on sentiment analysis, using the dataset from this link: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html. I have divided my dataset in a 50:50 ratio: 50% are used as test samples and 50% as training samples. Features are extracted from the training samples and classification is performed with a Weka classifier, but my prediction accuracy is only about 70-75%. Can anybody suggest other datasets that could help me improve the results? I have used unigrams, bigrams and POS tags as my features. Answer (doxav): There are many sources for sentiment analysis datasets: huge
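The unigram and bigram features mentioned above can be sketched directly; this is a minimal presence-based extractor, not the questioner's actual Weka pipeline, and POS tags are omitted since they require a tagger:

```python
def ngram_features(text, n_values=(1, 2)):
    """Extract unigram and bigram presence features from one review."""
    tokens = text.lower().split()
    feats = set()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats

feats = ngram_features("not a good phone")
```

Bigrams such as "not a" are exactly what lets a classifier pick up simple negation that unigrams alone miss, which is one common reason accuracy plateaus around 70-75% with bag-of-words features.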

How to extract textual contents from a web page? [closed]

Submitted by 我怕爱的太早我们不能终老 on 2019-11-30 10:39:29
I'm developing an application in Java which can take textual information from different web pages and summarize it into one page. For example, suppose I have a news story on different web pages such as The Hindu, Times of India, The Statesman, etc. My application is supposed to extract the important points from each of these pages and put them together as a single news story. The application is based on concepts of web content mining. As a beginner to this field, I can't understand where to start. I have gone through research papers which explain noise removal as the first step in building this application. So
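The noise-removal first step the papers describe can be sketched compactly. The question is about Java, but the same idea in Python's standard library keeps the sketch self-contained: skip non-visible blocks like scripts and styles, keep the rest of the text. Real boilerplate removal (navigation, ads, footers) needs heuristics well beyond this:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> blocks."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # >0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

# Hypothetical article fragment.
html_doc = "<html><script>var x=1;</script><p>Main story text.</p></html>"
ex = TextExtractor()
ex.feed(html_doc)
text = " ".join(ex.chunks)
```

In Java the analogous route would be an HTML parsing library rather than hand-rolled regexes, for the same reason: HTML nesting is not reliably regular.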

Web mining or scraping or crawling? What tool/library should I use? [closed]

Submitted by 别说谁变了你拦得住时间么 on 2019-11-30 10:38:30
I want to crawl and save some webpages as HTML. Say, crawl into hundreds of popular websites and simply save their front pages and the "About" pages. I've looked into many questions, but didn't find an answer to this in either the web-crawling or the web-scraping questions. What library or tool should I use to build the solution? Or is there even an existing tool that can handle this? Answer: There really is no good solution here. You are right to suspect that Python is probably the best way to start, because of its incredibly strong support for regular expressions. In order to implement something like
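One small piece any "save hundreds of pages" script needs is mapping each URL to a filesystem-safe filename for the saved HTML copy. A sketch of that step, with the sanitization rule chosen here as one reasonable convention rather than a standard:

```python
import re
from urllib.parse import urlparse

def page_filename(url):
    """Turn a URL into a safe .html filename for its saved copy."""
    parsed = urlparse(url)
    stem = parsed.netloc + parsed.path
    # Replace anything outside a conservative character set.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem).strip("_")
    return (stem or "index") + ".html"

name = page_filename("https://example.com/about/")
```

Dropping the query string, as this sketch does, means two URLs differing only in parameters collide; a production crawler would hash the full URL instead.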

How to extract textual contents from a web page? [closed]

Submitted by 泪湿孤枕 on 2019-11-29 15:51:58
Question (closed 7 years ago as ambiguous, vague, incomplete, or overly broad). I'm developing an application in Java which can take textual information from different web pages and summarize it into one page. For example, suppose I have a news story on different web pages such as The Hindu, Times of India