Question
I'm about to start a data mining project. Before I jump in, I wanted to look into data mining tools (preferably open source) that allow web-based reporting. In my scenario the data will be provided to me, so I won't need to crawl for it.
In a nutshell, I am looking for a tool that offers data analysis, web-based reporting, some kind of dashboard, and mining features.
I have worked with Microsoft Analysis Services and BOXI, and of late I have been looking at Pentaho, which seems to be a good option.
Please share your experiences with any such tools you know of.
cheers
Answer 1:
I believe WEKA is the best open source data mining software out there.
Check it out: http://www.cs.waikato.ac.nz/ml/weka/
Answer 2:
Weka is great, but you might want to try the Orange Data Mining toolkit instead.
http://www.ailab.si/orange/
Edit: And as of November 2010, I must say I really like KNIME.
Answer 3:
R has a lot of excellent packages related to data mining. In particular, look at:
- The machine learning view on CRAN.
- The natural language processing view on CRAN.
It also ties into Weka (see the RWeka package), and it can be integrated with either .NET (through COM) or Python (through RPy or RPy2).
I would agree regarding Pentaho for a reporting platform, although it's a very large project depending upon what you're using it for.
Answer 4:
You should also check out Apache Mahout. It can be quite useful for some large-scale machine learning tasks such as user clustering.
Answer 5:
RapidMiner is my preferred data mining tool.
Answer 6:
I would try the new Google tools.
- First, you need to get the API ID for Google Storage, which is where you will store and manipulate the data you are going to analyze.
- Then you need to get the API ID for the Google Prediction API (http://code.google.com/apis/predict/docs/getting-started.html), which, from what I saw, is a fantastic outsourced data mining processor. The Prediction API allows you to get more from your data and makes its patterns more accessible. Besides traditional numeric and nominal data, you can also use text data, which this API can use, for example, to categorize emails by language.
- Finally, you can use BigQuery, which allows you to perform ad hoc analysis, standardized reporting, data exploration, and app prototyping (http://code.google.com/apis/bigquery/).
Answer 7:
KEEL (http://keel.es) is written in Java and is good for using evolutionary computation for data mining.
Answer 8:
Have a look at the list of open source software for machine learning maintained by JMLR. You can find it here:
http://mloss.org/software/
http://jmlr.csail.mit.edu/mloss/
They represent the state of the art!
My issue with Weka is that a number of algorithms in it are outdated.
Answer 9:
I believe RapidMiner is an excellent tool that should be added to this list.
Answer 10:
WEKA (already mentioned), Orange (http://orange.biolab.si/), and Tanagra (http://data-mining-tutorials.blogspot.com, where you can find good tutorials) are very good tools for data mining.
Answer 11:
You could check out my software, the SPMF data mining framework.
It is open-source Java software that offers more than 70 algorithms for:
- frequent itemset mining,
- association rule mining,
- sequential pattern mining,
- sequential rule mining,
- and more.
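To make "frequent itemset mining" a bit more concrete, here is a minimal Python sketch of the idea. It is not SPMF's code or API (SPMF itself is Java); the function name `frequent_itemsets`, the `min_support` parameter, and the sample baskets are made up for illustration, and the brute-force subset enumeration is only reasonable for tiny datasets.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset that occurs in at least
    min_support transactions (hypothetical helper, brute force)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Count each non-empty subset of this transaction exactly once.
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Made-up sample data: each inner list is one transaction.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(baskets, min_support=2)
for itemset, count in sorted(
    result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
):
    print(sorted(itemset), count)
```

Real implementations (Apriori, FP-Growth, and similar) prune the search space instead of enumerating every subset, which is why a dedicated framework is worth using on real data.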
Answer 12:
Pentaho is a very professional solution. Definitely a very good choice.
Answer 13:
You can look at Data Mining SDK and its blog.
Answer 14:
Some open source data mining tools are listed here: http://dataminingtools.net/browse.php
Answer 15:
Eclipse BIRT http://www.eclipse.org/birt/phoenix/project/description.php
Answer 16:
I believe KNIME deserves to join this list as well.
Answer 17:
Weka is strong for classification and /machine learning/. Many consider this to be more a part of artificial intelligence than of actual data mining. RapidMiner is largely along the same lines, but with a much nicer UI. As far as I can tell, Pentaho provides the professional support for Weka.
You might want to have a look at ELKI (http://elki.dbs.ifi.lmu.de/), a comparable project that focuses on clustering algorithms and outlier detection, two other key tasks of data mining.
Answer 18:
You can take a look at the data mining tool Weka.
Here is a collection of tutorials and videos on WEKA:
Tutorials: http://www.dataminingtools.net/browsetutorials.php?tag=weka
Videos: http://www.dataminingtools.net/videos.php?id=6
Answer 19:
Along with the tools, I would strongly suggest learning Python and R. These languages help a lot during analysis, and they let you run custom analyses on large datasets. You might also create your own custom dashboard using JavaScript (check out the numerous charting and visualization libraries).
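As a rough sketch of what such a custom analysis might look like in Python, the snippet below groups CSV rows by one column and emits JSON that a JavaScript charting library could consume. The `summarize` helper, the column names, and the sample data are all assumptions made for illustration; only the standard library is used.

```python
import csv
import io
import json
from collections import defaultdict


def summarize(csv_text, group_col, value_col):
    """Group rows by group_col; return the count and mean of value_col
    for each group (hypothetical helper)."""
    totals = defaultdict(lambda: [0, 0.0])  # group -> [count, running sum]
    for row in csv.DictReader(io.StringIO(csv_text)):
        bucket = totals[row[group_col]]
        bucket[0] += 1
        bucket[1] += float(row[value_col])
    return {
        group: {"count": n, "mean": total / n}
        for group, (n, total) in totals.items()
    }


# Made-up sample data standing in for the dataset you are given.
sample = """region,sales
north,100
south,50
north,300
south,150
"""
summary = summarize(sample, "region", "sales")
dashboard_json = json.dumps(summary, sort_keys=True)
print(dashboard_json)
```

For anything beyond a toy like this, a library such as pandas (Python) or the equivalent R packages would probably be the more practical choice.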
Answer 20:
I am a Python-er myself and I have to say:
Yes! All of that can be done in Python.
I last played around with Beautiful Soup [0]. It's a really simple-to-use module that lets you grab/mine data from HTML and XML (excellent for 'screen scraping').
If you don't know Python... well, it's really easy to learn.
[0] http://www.crummy.com/software/BeautifulSoup/
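To give a feel for what grabbing data from HTML looks like, here is a small sketch that pulls link targets out of a page. Note that it deliberately uses the standard library's `html.parser` as a stand-in rather than Beautiful Soup itself, so that it runs without installing anything; Beautiful Soup offers a more convenient interface for the same kind of task. The `LinkCollector` class, the `extract_links` helper, and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample page.
page = (
    '<html><body><a href="/a">A</a><p>text</p>'
    '<a href="/b">B</a><a>no href</a></body></html>'
)
links = extract_links(page)
print(links)
```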
Source: https://stackoverflow.com/questions/835754/data-mining-open-source-tools