regression

setting values for ntree and mtry for random forest regression model

耗尽温柔 posted on 2019-12-29 10:28:32
Question: I'm using the R package randomForest to do a regression on some biological data. My training data size is 38772 x 201. What would be a good value for the number of trees, ntree, and the number of variables per level, mtry? Is there an approximate formula for finding such parameter values? Each row in my input data is a 200-character string representing an amino acid sequence, and I want to build a regression model that uses these sequences to predict the distances between the proteins.
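As a rough starting point rather than a definitive answer: the randomForest documentation gives the default mtry for regression as max(floor(p/3), 1), where p is the number of predictors, and the default ntree as 500. The sketch below only reproduces that arithmetic. It assumes that 200 of the 201 columns are predictors and one is the response, which the question does not state explicitly. The helper name default_mtry_regression is hypothetical, and Python is used purely for illustration.

```python
def default_mtry_regression(n_predictors: int) -> int:
    """Default mtry documented for regression in R's randomForest: max(floor(p/3), 1)."""
    if n_predictors < 1:
        raise ValueError("need at least one predictor")
    return max(n_predictors // 3, 1)


# Assumption: 201 columns = 200 predictors + 1 response column.
n_columns = 201
n_predictors = n_columns - 1

mtry = default_mtry_regression(n_predictors)
ntree = 500  # documented default number of trees in randomForest

print(f"p={n_predictors}, mtry={mtry}, ntree={ntree}")
```

Both values are best treated as starting points for tuning, for instance by comparing out-of-bag error across a few mtry values (the package's tuneRF function does a search of this kind).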

Add Regression Plane to 3d Scatter Plot in Plotly

拜拜、爱过 posted on 2019-12-29 04:45:06
Question: I am looking to take advantage of the awesome features in Plotly, but I am having a hard time figuring out how to add a regression plane to a 3d scatter plot. Here is an example of how to get started with the 3d plot. Does anyone know how to take it to the next step and add the plane? library(plotly) data(iris) iris_plot <- plot_ly(my_df, x = Sepal.Length, y = Sepal.Width, z = Petal.Length, type = "scatter3d", mode = "markers") petal_lm <- lm(Petal.Length ~ 0 + Sepal.Length + Sepal.Width, data =

PCA first or normalization first?

只谈情不闲聊 posted on 2019-12-29 03:36:08
Question: When doing regression or classification, what is the correct (or better) way to preprocess the data?
1. Normalize the data -> PCA -> training
2. PCA -> normalize PCA output -> training
3. Normalize the data -> PCA -> normalize PCA output -> training
Which of the above is more correct, or is the "standardized" way to preprocess the data? By "normalize" I mean standardization, linear scaling, or some other technique.
Answer 1: You should normalize the data before doing PCA. For example, consider the
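A small illustration of the answer's point, separate from the example the answer itself begins: when two features have very different scales, the first principal component of the raw data tends to follow the large-scale feature almost exclusively, whereas after standardization both features contribute. The sketch below uses made-up toy data (the income and age values are invented for illustration) and hypothetical helper names, and it uses the closed-form eigenvector of a 2x2 covariance matrix instead of a PCA library.

```python
import math
import statistics


def standardize(values):
    """Rescale values to zero mean and unit sample standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]


def leading_direction(xs, ys):
    """Unit eigenvector for the largest eigenvalue of the 2x2 sample covariance matrix."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    if sxy == 0:
        # Uncorrelated features: the principal axis is the higher-variance one.
        return (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]].
    lam = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)
    # (sxy, lam - sxx) satisfies (sxx - lam) * vx + sxy * vy = 0.
    vx, vy = sxy, lam - sxx
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm


# Made-up toy data: one feature in the tens of thousands, one in the tens.
income = [20000, 35000, 50000, 65000, 80000]
age = [25, 30, 41, 48, 60]

raw_pc1 = leading_direction(income, age)
scaled_pc1 = leading_direction(standardize(income), standardize(age))

print("first PC loadings, raw data:         ", raw_pc1)
print("first PC loadings, standardized data:", scaled_pc1)
```

On the raw data the loading on income is close to 1 and the loading on age is close to 0; after standardization the two loadings have equal magnitude.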

Python natural smoothing splines

北城以北 posted on 2019-12-28 11:46:18
Question: I am trying to find a Python package that offers an option to fit natural smoothing splines with a user-selectable smoothing factor. Is there an implementation for that? If not, how would you use what is available to implement it yourself? By natural spline I mean that the second derivative of the fitted function at the endpoints should be zero (linear). By smoothing spline I mean that the spline should not be 'interpolating' (passing through all the data points). I

What does the capital letter “I” in R linear regression formula mean?

我与影子孤独终老i posted on 2019-12-28 02:35:07
Question: I haven't been able to find an answer to this question, largely because googling anything with a standalone letter (like "I") causes issues. What does the "I" do in a model like this? data(rock) lm(area~I(peri - mean(peri)), data = rock) Considering that the following does NOT work: lm(area ~ (peri - mean(peri)), data = rock) and that this does work: rock$peri - mean(rock$peri) Any keywords on how to research this myself would also be very helpful.
Answer 1: I isolates or insulates the contents

Machine learning for Java developers, Part 1: Algorithms for machine learning

大城市里の小女人 posted on 2019-12-27 12:32:36
Self-driving cars, face detection software, and voice controlled speakers all are built on machine learning technologies and frameworks--and these are just the first wave. Over the next decade, a new generation of products will transform our world, initiating new approaches to software development and the applications and products that we create and use. As a Java developer, you want to get ahead of this curve, especially because tech companies are beginning to seriously invest in machine learning. What you learn today, you can build on over the next five