data-mining

Newbie: where to start given a problem to predict future success or not

好久不见. submitted on 2019-12-06 13:44:43
Question: We have a production web-based product that lets users predict the future value (or demand) of goods. The historical data contains about 100k examples, each with about 5 parameters. Consider a class of data called a prediction:

    prediction {
        id: int
        predictor: int
        predictionDate: date
        predictedProductId: int
        predictedDirection: byte (0 for decrease, 1 for increase)
        valueAtPrediciton: float
    }

and a paired result class that measures the result of the prediction:
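One possible starting point is a per-predictor hit rate, computed before any model is trained. The sketch below is only an illustration: the excerpt cuts off before the result class is shown, so `later_value` is a hypothetical stand-in for whatever that class records, and the snake_case field names are adaptations of the question's field names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Prediction:
    # Fields mirror the question's prediction class (renamed to snake_case).
    id: int
    predictor: int
    prediction_date: date
    predicted_product_id: int
    predicted_direction: int  # 0 for decrease, 1 for increase
    value_at_prediction: float


def hit_rate_by_predictor(pairs):
    """Return {predictor: fraction of correct direction calls}.

    pairs: iterable of (Prediction, later_value). later_value is a
    hypothetical stand-in for the result class, which the excerpt omits.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pred, later_value in pairs:
        actual_direction = 1 if later_value > pred.value_at_prediction else 0
        totals[pred.predictor] += 1
        hits[pred.predictor] += int(actual_direction == pred.predicted_direction)
    return {p: hits[p] / totals[p] for p in totals}


sample = [
    (Prediction(1, 7, date(2019, 1, 1), 42, 1, 10.0), 12.0),  # said up, went up
    (Prediction(2, 7, date(2019, 1, 2), 42, 0, 12.0), 13.0),  # said down, went up
    (Prediction(3, 9, date(2019, 1, 3), 43, 0, 5.0), 4.0),    # said down, went down
]
rates = hit_rate_by_predictor(sample)
```

A baseline like this gives any later classifier something concrete to beat.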

retrieve information from a url

ⅰ亾dé卋堺 submitted on 2019-12-06 12:52:13
I want to write a program that retrieves some information from a URL. For example, I give the URL below, from LibraryThing. How can I retrieve all the words below the "TAGS" tab, like Black Library fantasy Thanquol & Boneripper Thanquol and Bone Ripper Warhammer? I am thinking of using Java and designing a data mining wrapper, but I am not sure how to start. Can anyone give me some advice? EDIT: You gave me excellent help, but I want to ask something else. For every tag, we can see how many times it has been used when we press the "number" button. How can I retrieve that number as well? You
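The general idea is to fetch the page and pull the text out of the elements that hold the tags. The asker mentions Java; the sketch below uses Python's standard library only to show the shape of the approach. The markup is an assumption: the container `<div class="tags">` and the sample HTML are hypothetical, since the real page structure is not shown in the excerpt.

```python
from html.parser import HTMLParser


class TagExtractor(HTMLParser):
    """Collects the text of every <a> inside <div class="tags">.

    The class name "tags" is a hypothetical placeholder; inspect the real
    page source to find the actual container.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0        # <div> nesting depth inside the tag container
        self.in_link = False
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.depth > 0:
                self.depth += 1
            elif ("class", "tags") in attrs:
                self.depth = 1
        elif tag == "a" and self.depth > 0:
            self.in_link = True
            self.tags.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.tags[-1] += data  # link text may arrive in several chunks


# Hypothetical markup standing in for the downloaded page.
sample_html = """
<div class="tags">
  <a href="#">Black Library</a>
  <a href="#">fantasy</a>
  <a href="#">Thanquol &amp; Boneripper</a>
</div>
<div class="other"><a href="#">not a tag</a></div>
"""

parser = TagExtractor()
parser.feed(sample_html)
parser.close()
tags = [t.strip() for t in parser.tags]
```

For a live page, the HTML string would come from something like `urllib.request.urlopen(url).read().decode()`. If the usage counts are loaded by a script only after the "number" button is pressed, they may not be in the initial HTML at all.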

How to score a linear model using PMML file and Augustus on Python

a 夏天 submitted on 2019-12-06 09:47:35
Question: I am new to Python, PMML and Augustus, so this is something of a newbie question. I have a PMML file that I want to use for scoring after every new iteration of data. I have to use Python with Augustus only to complete this exercise. I have read various articles; some are worth mentioning because they are good. (http://augustusdocs.appspot.com/docs/v06/model_abstraction/augustus_and_pmml.html , http://augustus.googlecode.com/svn-history/r191/trunk/augustus/modellib/regression/producer/Producer.py) I have read

Python multinomial logit with statsmodels module: Change base value of mlogit regression

一个人想着一个人 submitted on 2019-12-06 09:17:32
Question: I have a small problem that I am stuck on. I am building a multinomial logit model with Python statsmodels and want to reproduce an example given in a textbook. So far so good, but I am struggling to set a different target value as the base value for the regression. Can somebody help?

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import matplotlib.pyplot as plt
    #import data
    df = pd.read_excel('C:/.../diabetes.xlsx')
    #split the data in dependent and independent
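One common workaround is to recode the dependent variable so that the desired base category sorts first. This relies on an assumption worth checking against the statsmodels documentation: that `MNLogit` takes the lowest-valued category of the dependent variable as the reference. The helper `recode_with_base` below is hypothetical (not part of statsmodels), and the sample labels are made up.

```python
def recode_with_base(labels, base):
    """Map labels to integer codes so that `base` gets code 0.

    The remaining categories keep their sorted order and get codes 1, 2, ...
    Returns (codes, ordered_categories).
    """
    categories = sorted(set(labels))
    if base not in categories:
        raise ValueError(f"base {base!r} is not among the observed labels")
    ordered = [base] + [c for c in categories if c != base]
    code_of = {c: i for i, c in enumerate(ordered)}
    return [code_of[v] for v in labels], ordered


y = [1, 2, 3, 2, 1, 3]          # made-up target values
codes, order = recode_with_base(y, base=3)
# codes is what would be passed as the dependent variable,
# e.g. sm.MNLogit(codes, X), with X being the independent variables.
```

Here `order` records which original label each code stands for, so the fitted coefficients can be mapped back to the textbook's categories.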

how to get all terminal nodes - weight & response prediction 'ctree' in r

这一生的挚爱 submitted on 2019-12-06 08:28:51
Question: Here's what I can use to list the weight for all terminal nodes: but how can I add some code to get the response prediction as well as the weight for each terminal node ID: say I want my output to look like this -- Here is what I have so far to get the weight: nodes(airct, unique(where(airct))) Thank you. Answer 1: The binary tree is a big S4 object, so it is sometimes difficult to extract the data. But the plot method for a BinaryTree object has an optional panel function of the form function(node)

Algorithm for clustering people with similar interests

六眼飞鱼酱① submitted on 2019-12-06 06:50:21
Question: I want to cluster people into groups based on their interests. For example, people who like machine learning and graphs may be placed in one group, and people who are interested in mathematics, economics, etc. may be placed in a different group. The algorithm should be able to decide which people have the most closely matching interests and create clusters from that. It should also be able to list the other people in the group in which a particular person is placed. Answer 1: This does
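To make the problem concrete, here is a minimal sketch of one possible approach, not necessarily the one the truncated answer goes on to describe: treat each person's interests as a set, measure overlap with Jaccard similarity, and merge people whose similarity reaches a threshold. The names, the sample data and the 0.5 threshold are all made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_interests(interests, threshold=0.5):
    """Group people whose interest sets are linked by Jaccard >= threshold.

    Single-link grouping: two people share a cluster if a chain of
    sufficiently similar pairs connects them.
    """
    people = list(interests)
    parent = {p: p for p in people}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for i, p in enumerate(people):
        for q in people[i + 1:]:
            if jaccard(interests[p], interests[q]) >= threshold:
                parent[find(p)] = find(q)

    clusters = {}
    for p in people:
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())


def groupmates(person, clusters):
    """Return the other members of the cluster that contains `person`."""
    for members in clusters:
        if person in members:
            return members - {person}
    return set()


interests = {
    "ann": {"machine learning", "graphs"},
    "bob": {"machine learning", "graphs", "statistics"},
    "cat": {"mathematics", "economics"},
    "dan": {"economics", "mathematics"},
}
clusters = cluster_by_interests(interests)
```

The pairwise loop is quadratic in the number of people, so this is only suitable for small data sets; a real system would likely need a vector representation and an off-the-shelf clustering algorithm.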

Web mining -classification algorithms

给你一囗甜甜゛ submitted on 2019-12-06 06:37:58
Question: My senior project is determining the dominant category of a web page. I crawled dmoz, and now I am trying to build an ARFF file. After that I will use some feature extraction methods and classification algorithms. Do you know which feature extraction method performs well with any classification algorithm for web mining? Answer 1: uClassify uses Bayesian Networks and claims to be able to categorize web pages. uClassify is a free web service where you can easily create your own text classifiers. Examples: Spam
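As a rough illustration of how bag-of-words features combine with a Bayesian text classifier, here is a small multinomial naive Bayes sketch. It is not uClassify's implementation, and the training sentences and category names are made up; real pages would need HTML stripping and far more training data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Bag-of-words features: lowercase runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of documents
        self.vocab = set()

    def train(self, text, category):
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def predict(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for cat in self.doc_counts:
            total_words = sum(self.word_counts[cat].values())
            # Log prior plus summed log likelihoods of each word.
            score = math.log(self.doc_counts[cat] / total_docs)
            for w in words:
                # Add-one smoothing avoids log(0) for unseen words.
                count = self.word_counts[cat][w] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = cat, score
        return best


clf = NaiveBayes()
clf.train("football match score goal league", "sports")
clf.train("team wins the championship game", "sports")
clf.train("python code compiler software bug", "computers")
clf.train("linux kernel software release", "computers")
```

Calling `clf.predict(page_text)` returns the category with the highest posterior score. Swapping raw counts for TF-IDF weights or adding stop-word removal are typical next experiments.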

how to write output from rapidminer to a txt file?

流过昼夜 submitted on 2019-12-06 05:38:26
I am using RapidMiner 5.3. I took a small document containing around three English sentences, tokenized it, and filtered the tokens by word length. I want to write the output into a different Word document. I tried the Write Document utility, but it is not working: it simply writes the original document into the new one. However, when I write the output to the console, it gives me the expected answer. Something seems to be wrong with the Write Document utility. Here is my process: READ DOCUMENT --> TOKENIZE --> FILTER TOKENS --> WRITE DOCUMENT Try the following Cut Document (with (\S+)

Principal Component Analysis on Weka

笑着哭i submitted on 2019-12-06 04:55:36
Question: I have just computed PCA on a training set, and Weka returned the new attributes along with the way in which they were selected and computed. Now I want to build a model using these data and then use the model on a test set. Do you know if there is a way to automatically transform the test set to match the new attributes? Answer 1: Do you need the principal components for analysis or just to feed into the classifier? If not, just use the Meta->FilteredClassifier classifier. Set the filter to