data-mining

cspade() R Error

自闭症网瘾萝莉.ら submitted on 2019-12-06 01:38:46
I am trying to mine rules from the events of cable modems. Linked is one file of thousands. When I try to run the cspade algorithm on the merged file of all devices (12 million rows), it spends hours chewing through RAM until it has used all 64 GB I have available. So I attempted to run the algorithm on the linked file for just one device, and I see exactly the same thing happen. Since this subsample is only 2,190 rows, I thought this was strange. Can someone explain why I'm not seeing results in a timely manner on this small data set? https://drive.google.com/file/d/0B6VvhxxLVGccVnhDNmVKUE0yaEk/view?usp

Intelligently grab first paragraph/starting text

随声附和 submitted on 2019-12-06 00:09:06
I'd like to have a script where I can input a URL and it will intelligently grab the first paragraph of the article... I'm not sure where to begin other than just pulling text from within <p> tags. Do you know of any tips/tutorials on how to do this kind of thing? Update: For further clarification, I'm building a section of my site where users can submit links, like on Facebook; it will grab an image from the linked site as well as text to go with the link. I'm using PHP and trying to determine the best method of doing this. I say "intelligently" because I'd like to try to get content on that page that
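
The usual approach is to parse the fetched HTML and take the first <p> whose text is long enough to be body copy rather than boilerplate. Below is a minimal sketch in Python with requests and BeautifulSoup (the OP is on PHP, where DOMDocument/DOMXPath plays the same role; the length threshold is an arbitrary heuristic, not something from the question):

```python
# Sketch: fetch a page and return the first substantial <p> block.
# The libraries (requests, BeautifulSoup) and the 80-char cutoff are
# assumptions for illustration, not part of the original question.
import requests
from bs4 import BeautifulSoup

def first_paragraph(url, min_length=80):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Skip boilerplate: ignore <p> tags too short to be article text.
    for p in soup.find_all("p"):
        text = p.get_text(strip=True)
        if len(text) >= min_length:
            return text
    return None

print(first_paragraph("https://en.wikipedia.org/wiki/Data_mining"))
```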

Java open-source projects for medical diagnosis & data mining [closed]

我怕爱的太早我们不能终老 submitted on 2019-12-05 20:01:20
Closed. This question is off-topic and is not currently accepting answers. Closed 4 years ago. I'm looking for open-source Java engines for medical disease diagnosis: engines that take query input from a user describing patient symptoms, and return suggestions of potential diseases matching those symptoms. Do such engines exist somewhere? I would prefer a Java open-source engine in this field if one exists. Any suggestions or ideas? Thanks. It sounds like you are looking for a

How to handle huge sparse matrices construction using Scipy?

老子叫甜甜 submitted on 2019-12-05 18:31:29
So, I am working on a Wikipedia dump to compute the PageRanks of around 5,700,000 pages, give or take. The files are preprocessed and hence are not in XML. They are taken from http://haselgrove.id.au/wikipedia.htm and the format is: from_page(1): to(12) to(13) to(14) ... from_page(2): to(21) to(22) ... from_page(5,700,000): to(xy) to(xz) and so on. So basically it's the construction of a [5,700,000 x 5,700,000] matrix, which would simply break my 4 GB of RAM. Since it is very, very sparse, it is easier to store using scipy.sparse.lil_matrix or scipy.sparse.dok_matrix. Now my issue is: how on earth do I
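
A minimal sketch of the sparse construction with SciPy, assuming the literal "from_page(i): to(j) ..." format shown in the question and 1-based page ids (the file name is a placeholder):

```python
# Build the link matrix incrementally in LIL format (cheap element writes),
# then convert to CSR for the fast row operations a PageRank iteration needs.
import re
import numpy as np
from scipy.sparse import lil_matrix

N = 5_700_000
A = lil_matrix((N, N), dtype=np.float32)

with open("links.txt") as f:                # hypothetical file name
    for line in f:
        ids = [int(x) for x in re.findall(r"\d+", line)]
        if not ids:
            continue
        src, targets = ids[0] - 1, ids[1:]  # assuming 1-based page ids
        for t in targets:
            A[src, t - 1] = 1.0

A = A.tocsr()  # memory scales with the number of links, not with N*N
```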

Implementation of k-means clustering algorithm

独自空忆成欢 submitted on 2019-12-05 10:11:44
Question: In my program, I'm taking k=2 for the k-means algorithm, i.e., I want only 2 clusters. I have implemented it in a very simple and straightforward way, but I'm still unable to understand why my program gets into an infinite loop. Can anyone please guide me to where I'm making a mistake? For simplicity, I have hard-coded the input in the program itself. Here is my code: import java.io.*; import java.lang.*; class Kmean { public static void main(String args[]) { int N=9; int arr[]={2,4,10,12,3,20,30,11,25};
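
A common cause of this kind of infinite loop is terminating only when the means stop changing exactly, which can oscillate between two states. Below is a sketch of the same 1-D, k=2 clustering in Python (the OP's code is Java; the point here is the loop structure, not the language): terminate when the cluster assignments stop changing, with a hard iteration cap as a safety net.

```python
# Sketch of 1-D k-means on the OP's data, converging on stable assignments.
def kmeans_1d(points, k=2, max_iter=100):
    centers = points[:k]          # naive initialization: first k points
    assign = None
    for _ in range(max_iter):     # hard cap guards against oscillation
        new_assign = [min(range(k), key=lambda c: abs(p - centers[c]))
                      for p in points]
        if new_assign == assign:  # nothing moved: converged, stop looping
            break
        assign = new_assign
        centers = [sum(p for p, a in zip(points, assign) if a == c) /
                   max(1, sum(1 for a in assign if a == c))
                   for c in range(k)]
    return centers, assign

print(kmeans_1d([2, 4, 10, 12, 3, 20, 30, 11, 25]))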

R, DMwR package, SMOTE function won't work

时间秒杀一切 submitted on 2019-12-05 10:10:25
I need to apply the SMOTE algorithm to a data set, but can't get it to work. Example: x <- c(12,13,14,16,20,25,30,50,75,71) y <- c(0,0,1,1,1,1,1,1,1,1) frame <- data.frame(x,y) library(DMwR) smotedobs <- SMOTE(y ~ ., frame, perc.over=300) This gives the following error: Error in scale.default(T, T[i, ], ranges) : subscript out of bounds In addition: Warning messages: 1: In FUN(newX[, i], ...) : no non-missing arguments to max; returning -Inf 2: In FUN(newX[, i], ...) : no non-missing arguments to min; returning Inf Would appreciate any kind of help or hints. I don't have the full answer. I can
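
The usual culprit with this particular error is that DMwR's SMOTE() expects the class column to be a factor, so converting it first (frame$y <- as.factor(frame$y)) typically makes the call above work. For comparison, here is a minimal sketch of the same oversampling in Python with imbalanced-learn (a different library, assumed available; its balancing semantics differ from perc.over):

```python
# SMOTE on the OP's toy data via imbalanced-learn. With only two minority
# samples (class 0), k_neighbors must be reduced to 1 or the call fails.
import numpy as np
from imblearn.over_sampling import SMOTE

X = np.array([12, 13, 14, 16, 20, 25, 30, 50, 75, 71],
             dtype=float).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1])

X_res, y_res = SMOTE(k_neighbors=1, random_state=42).fit_resample(X, y)
print(np.bincount(y_res))  # both classes now have equal counts
```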

Search twitter and obtain tweets by hashtag, maximizing number of returned search results

蓝咒 submitted on 2019-12-05 08:34:34
I am attempting to compile a corpus of all tweets related to the World Cup from the Twitter API, using the twitteR package in R. I am using the following code for a single hashtag (for example). However, my problem is that I appear to be 'authorized' to access only a limited set of the tweets (in this case, only the 32 most recent). library(twitteR) reqURL <- "https://api.twitter.com/oauth/request_token" accessURL <- "https://api.twitter.com/oauth/access_token" authURL <- "http://api.twitter.com/oauth/authorize" #consumerKey <- Omitted #consumerSecret <- Omitted twitCred <- OAuthFactory
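
The standard Search API only indexes roughly the last week of tweets and returns results one page at a time, so a single call hands back just the first page; getting more means paging through results. A sketch of that paging in Python with tweepy (a different library from twitteR; the method name search_tweets and the handler name assume tweepy >= 4.5, and the credentials are placeholders):

```python
# Cursor pages through search results (advancing max_id internally), so
# you are not capped at the first page the way a single call is.
import tweepy

auth = tweepy.OAuth1UserHandler("CONSUMER_KEY", "CONSUMER_SECRET",
                                "ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

tweets = [t.text for t in
          tweepy.Cursor(api.search_tweets, q="#worldcup", count=100)
                .items(1000)]
print(len(tweets))
```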

Similarity matrix -> feature vectors algorithm?

烂漫一生 submitted on 2019-12-05 07:42:16
If we have a set of M words, and know the similarity of the meaning of each pair of words in advance (i.e., have an M x M matrix of similarities), which algorithm can we use to make one k-dimensional bit vector for each word, so that each pair of words can be compared just by comparing their vectors (e.g., taking the absolute difference of the vectors)? I don't know what this particular problem is called. If I knew, it would be much easier to find it among the bunch of algorithms with similar descriptions that do something else. Additional observation: I think this algorithm would have to produce one, in
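
What's described is essentially classical multidimensional scaling (MDS): convert similarities to dissimilarities, double-center the matrix, and keep the top k eigenvectors as coordinates. A minimal numpy sketch, assuming S is symmetric with ones on the diagonal (note the coordinates come out real-valued; producing literal bit vectors would need an extra binarization step):

```python
# Classical MDS: embed an M x M similarity matrix into k dimensions so
# that vector distances approximate the original dissimilarities.
import numpy as np

def similarity_to_vectors(S, k):
    D2 = (1.0 - S) ** 2                  # squared dissimilarity (one convention)
    n = S.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ D2 @ J                # double centering
    w, V = np.linalg.eigh(B)             # eigh since B is symmetric
    idx = np.argsort(w)[::-1][:k]        # top-k eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
print(similarity_to_vectors(S, k=2))
```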

Beginner question on investigating samples in Weka

走远了吗. submitted on 2019-12-05 07:08:54
Question: I've just used Weka to train my SVM classifier under the "Classify" tab. Now I want to further investigate which data samples are misclassified; I need to study their pattern, but I don't know where to look for this in Weka. Could anyone give me some help please? Thanks in advance. Answer 1: You can enable the option from the classifier evaluation options (screenshot not preserved). You will get the following instance predictions:
=== Predictions on test split ===
inst#  actual      predicted   error  prediction
1      2:Iris-ver  2:Iris-ver         0.667
...
16     3:Iris-vir  2:Iris-ver
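
In the Explorer's Classify tab, that option lives under "More options..." (Output predictions). For a programmatic equivalent of the same investigation, here is a sklearn sketch (a different toolkit from Weka, shown only to illustrate the idea of listing misclassified instances):

```python
# Train an SVM on a train/test split and print every misclassified test
# instance with its actual and predicted labels.
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33,
                                          random_state=0)
pred = svm.SVC().fit(X_tr, y_tr).predict(X_te)

for i, (actual, predicted) in enumerate(zip(y_te, pred)):
    if actual != predicted:
        print(f"instance {i}: actual={actual} predicted={predicted}")
```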

Can RapidMiner extract XPaths from a list of URLs, instead of first saving the HTML pages?

落花浮王杯 submitted on 2019-12-05 06:02:31
Question: I've recently discovered RapidMiner, and I'm very excited about its capabilities. However, I'm still unsure whether the program can help me with my specific needs. I want the program to scrape XPath matches from a URL list I've generated with another program (it has more options than the 'Crawl Web' operator in RapidMiner). I've seen the following tutorials from Neil Mcguigan: http://vancouverdata.blogspot.com/2011/04/web-scraping-rapidminer-xpath-web.html. But the websites I try to scrape have
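
For comparison, outside RapidMiner the same job is a few lines of Python with requests and lxml: fetch each URL from the list and evaluate the XPath directly in memory, with no intermediate HTML files (the URL list and XPath below are placeholders):

```python
# Evaluate one XPath expression against each page in a URL list.
import requests
from lxml import html

urls = ["https://example.com/page1", "https://example.com/page2"]
xpath = "//h1/text()"

for url in urls:
    tree = html.fromstring(requests.get(url, timeout=10).content)
    for match in tree.xpath(xpath):
        print(url, match.strip())
```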