data-analysis

Plot pandas dataframe containing NaNs

喜欢而已 submitted on 2019-12-18 03:56:03
Question: I have GPS data of ice speed from three different GPS receivers. The data are in a pandas DataFrame indexed by Julian day (counted incrementally from the start of 2009). This is a subset of the data (the main dataset has 3487235 rows...):

                     R2          R7         R8
1235.000000  116.321959  100.805197  96.519977
1235.000116         NaN  100.771133  96.234957
1235.000231         NaN  100.584559  97.249262
1235.000347  118.823610  100.169055  96.777833
1235.000463         NaN   99.753551  96.598350
1235.000579         NaN   99.338048  95.283989
1235.000694  113
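
The excerpt is cut off before the actual plotting problem is stated. A common symptom with data like this is that a line plot of R2 shows gaps, or nothing at all, because matplotlib does not connect points across NaN. The sketch below uses only the six complete rows shown above and illustrates two possible workarounds: dropping the NaNs per column before plotting, or interpolating over the index. The helper name `plot_receivers` and the axis labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# The six complete rows from the excerpt; the index is the Julian day.
df = pd.DataFrame(
    {
        "R2": [116.321959, np.nan, np.nan, 118.823610, np.nan, np.nan],
        "R7": [100.805197, 100.771133, 100.584559, 100.169055, 99.753551, 99.338048],
        "R8": [96.519977, 96.234957, 97.249262, 96.777833, 96.598350, 95.283989],
    },
    index=[1235.000000, 1235.000116, 1235.000231, 1235.000347, 1235.000463, 1235.000579],
)

# Option 1: drop NaNs column by column, so each receiver keeps its own valid samples.
r2_valid = df["R2"].dropna()

# Option 2: fill the gaps by linear interpolation, weighted by the index values.
r2_filled = df["R2"].interpolate(method="index")


def plot_receivers(frame):
    """Hypothetical helper: plot each column with its NaNs removed."""
    import matplotlib.pyplot as plt  # imported here so the data prep runs without it

    fig, ax = plt.subplots()
    for name in frame.columns:
        frame[name].dropna().plot(ax=ax, label=name)
    ax.set_xlabel("Julian day")
    ax.set_ylabel("Ice speed")
    ax.legend()
    return ax
```

Option 1 keeps only measured values, which is usually the safer choice for a first look; option 2 invents values between samples, so it is only reasonable when the gaps are short.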

What does selecting the largest eigenvalues and eigenvectors in the covariance matrix mean in data analysis?

那年仲夏 submitted on 2019-12-17 18:33:40
Question: Suppose there is a matrix B of size 500*1000, stored as doubles (here, 500 is the number of observations and 1000 is the number of features). sigma is the covariance matrix of B, and D is a diagonal matrix whose diagonal elements are the eigenvalues of sigma. Assume A contains the eigenvectors of the covariance matrix sigma. I have the following questions: I need to select the first k = 800 eigenvectors corresponding to the eigenvalues with the largest magnitude to rank the
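
The question is truncated, but the selection step it describes can be sketched in NumPy. The sizes below are scaled-down stand-ins (50 observations, 100 features, k = 10), the data are random, and the names `A_k`, `scores` and `explained` are introduced here for illustration only. One point worth checking against the original setup: a sample covariance matrix built from 500 observations has rank at most 499, so among k = 800 eigenvectors at least 301 would belong to eigenvalues that are zero up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_feat, k = 50, 100, 10            # scaled-down stand-ins for 500, 1000 and k
B = rng.normal(size=(n_obs, n_feat))      # rows = observations, columns = features

sigma = np.cov(B, rowvar=False)           # (n_feat, n_feat) covariance matrix
eigvals, A = np.linalg.eigh(sigma)        # eigh handles symmetric matrices; ascending order

order = np.argsort(eigvals)[::-1]         # reorder so the largest eigenvalue comes first
eigvals, A = eigvals[order], A[:, order]

A_k = A[:, :k]                            # eigenvectors of the k largest eigenvalues
B_centered = B - B.mean(axis=0)
scores = B_centered @ A_k                 # observations projected onto those k directions

explained = eigvals[:k].sum() / eigvals.sum()   # share of total variance retained
rank = np.linalg.matrix_rank(sigma)             # cannot exceed n_obs - 1
```

Selecting the eigenvectors with the largest eigenvalues keeps the directions along which the data vary most, which is the usual reading of this step in principal component analysis.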

Python: pandas merge multiple dataframes

老子叫甜甜 submitted on 2019-12-17 04:52:24
Question: I have different dataframes and need to merge them together based on the date column. If I only had two dataframes, I could use df1.merge(df2, on='date'). To do it with three dataframes, I use df1.merge(df2.merge(df3, on='date'), on='date'). However, it becomes really complex and unreadable with multiple dataframes. All dataframes have one column in common - date, but they don't have the same number of rows or columns, and I only need those rows in which each date is common to every
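
One way to avoid the nested calls, sketched here with three small made-up frames, is to fold `pd.merge` over a list with `functools.reduce`. An inner join keeps only the dates present in every frame, which appears to be what the question asks for (the sentence is cut off, so that reading is an assumption).

```python
from functools import reduce

import pandas as pd

# Made-up example frames: different row counts, different columns, shared 'date'.
df1 = pd.DataFrame({"date": ["2019-01-01", "2019-01-02", "2019-01-03"], "a": [1, 2, 3]})
df2 = pd.DataFrame({"date": ["2019-01-02", "2019-01-03", "2019-01-04"], "b": [10, 20, 30]})
df3 = pd.DataFrame({"date": ["2019-01-03", "2019-01-02"], "c": [0.5, 0.7], "d": ["x", "y"]})

frames = [df1, df2, df3]

# Merge left to right; how="inner" keeps only dates found in both sides at each step.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="date", how="inner"),
    frames,
)
```

Adding a fourth or fifth dataframe then only means appending it to `frames`.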

Fitting polynomial model to data in R

血红的双手。 submitted on 2019-12-17 04:40:05
Question: I've read the answers to this question and they are quite helpful, but I need help particularly in R. I have an example data set in R as follows:
x <- c(32,64,96,118,126,144,152.5,158)
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)
I want to fit a model to these data so that y = f(x). I want it to be a 3rd order polynomial model. How can I do that in R? Additionally, can R help me to find the best fitting model?
Answer 1: To get a third order polynomial in x (x^3), you can do lm(y ~ x + I(x^2) + I(x
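
The R answer is cut off above. As a cross-check only, not a substitute for the R solution, the same cubic least-squares fit can be sketched in Python with NumPy on the data from the question. The variable names `coeffs`, `cubic` and `r_squared` are introduced here for illustration.

```python
import numpy as np

x = np.array([32, 64, 96, 118, 126, 144, 152.5, 158])
y = np.array([99.5, 104.8, 108.5, 100, 86, 64, 35.3, 15])

# Least-squares fit of y = c3*x^3 + c2*x^2 + c1*x + c0 (coefficients: highest power first).
coeffs = np.polyfit(x, y, deg=3)
cubic = np.poly1d(coeffs)

# Coefficient of determination as a rough goodness-of-fit measure.
fitted = cubic(x)
ss_res = float(np.sum((y - fitted) ** 2))
ss_tot = float(np.sum((y - y.mean()) ** 2))
r_squared = 1.0 - ss_res / ss_tot
```

With only eight points, a higher degree will always lower the residual, so comparing models on fit alone says little about which one is "best".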

Assigning Values to the neighbors same value in MATLAB

断了今生、忘了曾经 submitted on 2019-12-13 20:17:08
Question: I am having a small issue, but I cannot tell where I am going wrong. Can someone please point me in the right direction? Thanks in advance. What I have done: my code finds the local maxima, brings each local maximum down to a certain point, and assigns the value of the lowered point to the neighbors that are greater than the lowered value. Small example:
X = [1 0 1 4.3 4.5 5 4.3 4.2 0 0 0 2 6.2 6.3 7 6.2 7.4 8 7.2 1 2 3 4 2];
The local maxima are 5, 7, 8, and 4. Go down to a certain point, like 4, 6, 7, 3. Assign

In pandas can you aggregate by mean and round that mean to the nearest int?

蹲街弑〆低调 submitted on 2019-12-13 13:51:47
Question: I have 169 columns that have been encoded as 1 for yes and 0 for no. Now I need to aggregate the 2 million rows by mean and then round the result to the nearest int. How can I do that? The image just shows that the values in each column are either 0 or 1.
Answer 1: If data is your dataframe, you can get the mean of all the columns as integers simply with:
data.mean().astype(int) # Truncates mean to integer, e.g. 1.95 -> 1
or, as of version 0.17.0:
data.mean().round(0) # Rounds mean
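
The difference between the two calls in the answer is easy to see on a tiny made-up frame (three 0/1 columns instead of 169). Note that `round` follows NumPy's convention of rounding an exact 0.5 to the nearest even value, so a mean of exactly 0.5 becomes 0.

```python
import pandas as pd

# Made-up stand-in for the 0/1 encoded columns.
data = pd.DataFrame({
    "col_a": [1, 1, 1, 0],   # mean 0.75
    "col_b": [0, 0, 1, 0],   # mean 0.25
    "col_c": [1, 1, 1, 1],   # mean 1.0
})

means = data.mean()

truncated = means.astype(int)          # drops the fractional part: 0.75 -> 0
rounded = means.round(0).astype(int)   # rounds first, then casts: 0.75 -> 1
```

For 0/1 columns, the rounded mean amounts to a majority vote per column, which is why truncation alone gives the wrong answer for any mean between 0.5 and 1.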

searching if anyone of word is present in the another column of a dataframe or in another data frame using python

做~自己de王妃 submitted on 2019-12-13 09:42:31
Question: Hi, I have two DataFrames like the ones below.

DF1:

Alpha | Numeric | Special
and   | 1       | @
or    | 2       | $
      | 3       | &
      | 4       |
      | 5       |

and DF2, with a single column:

Content
boy or girl
school @ morn

I want to check whether any column in DF1 has any of its keywords in the Content column of DF2. The output should be in a new DataFrame, output_DF:

output_column
Alpha
Special

Can someone help me with this?

Answer 1: I have a method that is not very good.
df1 = pd.DataFrame([[['and', 'or'],['1', '2','3','4','5'],['@', '$','&']]
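
The answer's code is cut off, so the following is an independent sketch rather than a reconstruction of it. It assumes that keywords must match whole whitespace-separated tokens, that the Numeric values are stored as strings, and that empty cells in DF1 are missing values; the helper name `matching_columns` is made up.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Alpha": ["and", "or", None, None, None],
    "Numeric": ["1", "2", "3", "4", "5"],
    "Special": ["@", "$", "&", None, None],
})
df2 = pd.DataFrame({"Content": ["boy or girl", "school @ morn"]})

# One set of keywords per DF1 column, ignoring the empty cells.
keywords = {col: set(df1[col].dropna()) for col in df1.columns}


def matching_columns(text):
    """Return the names of the DF1 columns that share a token with `text`."""
    tokens = set(text.split())
    hits = [col for col, words in keywords.items() if tokens & words]
    return ", ".join(hits)


output_df = pd.DataFrame({"output_column": df2["Content"].apply(matching_columns)})
```

On the sample data this yields "Alpha" for the first row and "Special" for the second, matching the output in the question. Substring matching instead of token matching would need a different test, since "or" also occurs inside "morn".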

How does unicodecsv.DictReader represent a csv file

我只是一个虾纸丫 submitted on 2019-12-13 07:41:35
Question: I'm currently going through the Udacity course on data analysis in Python, and we've been using the unicodecsv library. More specifically, we've written the following code, which reads a csv file and converts it into a list. Here is the code:
def read_csv(filename):
    with open(filename, 'rb') as f:
        reader = unicodecsv.DictReader(f)
        return list(reader)
In order to get my head around this, I'm trying to figure out how the data is represented in the dictionary and the list, and I'm very confused. Can
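
To see the structure without the course file, here is a small sketch using the standard library's `csv.DictReader` on an in-memory sample. It assumes that `unicodecsv.DictReader` mirrors the standard-library class (unicodecsv is designed as a drop-in replacement for the Python 2 `csv` module); the sample data and column names are made up.

```python
import csv
import io

# Made-up stand-in for the course file: a header row followed by two records.
sample = "name,age\nAda,36\nGrace,45\n"


def read_csv(file_obj):
    reader = csv.DictReader(file_obj)   # the first row supplies the dictionary keys
    return list(reader)                 # one dictionary per remaining row


rows = read_csv(io.StringIO(sample))

first = dict(rows[0])   # mapping from column name to the value in that row
```

The result is a list with one entry per data row, and each entry maps the header names to that row's values. All values are strings, so numeric columns such as `age` still need an explicit conversion.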

How to plot dates as dates (not numbers or character) on x axis of ggplot?

醉酒当歌 submitted on 2019-12-13 04:35:05
Question: I have a huge data set containing bacteria samples (4 types of bacteria) from 10 water resources from 2010 until 2019. Some values are missing, so they need to be excluded from the plot and the analysis. I want to plot a time series for each type of bacteria for each resource for all years. What is the best way to do that?
library("ggplot2")
BactData = read.csv('Råvannsdata_Bergen_2010_2018a.csv', sep='\t', header=TRUE)
summary(BactData, na.rm = TRUE)
df$Date = as.Date(df$Date, '%d/%m/%Y') #require

Pandas Pivot table without aggregating

可紊 submitted on 2019-12-13 04:03:24
Question: I have a dataframe df as:

Acct_Id  Acct_Nm  Srvc_Id  Phone_Nm  Phone_plan_value  Srvc_Num
51       Roger    789      Pixel     30                1
51       Roger    800      iPhone    25                2
51       Roger    945      Galaxy    40                3
78       Anjay    100      Nokia     50                1
78       Anjay    120      Oppo      30                2
32       Rafa     456      HTC       35                1

I want to transform the dataframe so I can have 1 row per Acct_Id and Acct_Nm as:

Acct_Id  Acct_Nm  Srvc_Num_1                           Srvc_Num_2                           Srvc_Num_3
                  Srvc_Id  Phone_Nm  Phone_plan_value  Srvc_Id  Phone_Nm  Phone_plan_value  Srvc_Id  Phone_Nm  Phone_plan_value
51       Roger    789      Pixel     30                800      iPhone    25
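
Since every (Acct_Id, Acct_Nm, Srvc_Num) combination occurs only once in the sample, the reshape needs no aggregation. One possible sketch uses `set_index` and `unstack`, then reorders the columns so that the service number is the top header level. The names `wide`, `fields` and `nums` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Acct_Id": [51, 51, 51, 78, 78, 32],
    "Acct_Nm": ["Roger", "Roger", "Roger", "Anjay", "Anjay", "Rafa"],
    "Srvc_Id": [789, 800, 945, 100, 120, 456],
    "Phone_Nm": ["Pixel", "iPhone", "Galaxy", "Nokia", "Oppo", "HTC"],
    "Phone_plan_value": [30, 25, 40, 50, 30, 35],
    "Srvc_Num": [1, 2, 3, 1, 2, 1],
})

fields = ["Srvc_Id", "Phone_Nm", "Phone_plan_value"]
nums = sorted(df["Srvc_Num"].unique())

# Move Srvc_Num from the rows into the columns; nothing is aggregated.
wide = df.set_index(["Acct_Id", "Acct_Nm", "Srvc_Num"]).unstack("Srvc_Num")

# Columns are now (field, Srvc_Num); put Srvc_Num on top and group the fields under it.
wide = wide.swaplevel(0, 1, axis=1)
wide = wide.reindex(columns=pd.MultiIndex.from_product([nums, fields]))
```

Accounts with fewer than three services get NaN in the unused slots, and the integer columns become floats as a side effect. This approach raises an error if an account repeats a Srvc_Num, which is a useful check that no aggregation is being hidden.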