data-analysis | 易学教程

Plot pandas dataframe containing NaNs

Question: I have GPS data of ice speed from three different GPS receivers. The data are in a pandas dataframe with an index of Julian day (incremental from the start of 2009). This is a subset of the data (the main dataset is 3487235 rows...):

                         R2          R7         R8
    1235.000000  116.321959  100.805197  96.519977
    1235.000116         NaN  100.771133  96.234957
    1235.000231         NaN  100.584559  97.249262
    1235.000347  118.823610  100.169055  96.777833
    1235.000463         NaN   99.753551  96.598350
    1235.000579         NaN   99.338048  95.283989
    1235.000694  113
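The question text is cut off here, so the exact problem is not visible. One common difficulty with data like this is that matplotlib breaks a line wherever it meets a NaN, which leaves a sparse column such as R2 showing isolated points or nothing at all. The sketch below assumes that is the issue and drops the NaNs per column before plotting. The helper names `series_without_gaps` and `plot_without_gaps` are hypothetical, and only the six complete rows shown above are used.

```python
import numpy as np
import pandas as pd

# The six complete rows from the subset above (the truncated seventh row is omitted).
df = pd.DataFrame(
    {
        "R2": [116.321959, np.nan, np.nan, 118.823610, np.nan, np.nan],
        "R7": [100.805197, 100.771133, 100.584559, 100.169055, 99.753551, 99.338048],
        "R8": [96.519977, 96.234957, 97.249262, 96.777833, 96.598350, 95.283989],
    },
    index=[1235.000000, 1235.000116, 1235.000231, 1235.000347, 1235.000463, 1235.000579],
)


def series_without_gaps(frame):
    """Return one NaN-free Series per column (hypothetical helper)."""
    return {name: col.dropna() for name, col in frame.items()}


def plot_without_gaps(frame):
    """Plot each column against its own valid index values (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily; assumed to be installed

    fig, ax = plt.subplots()
    for name, col in series_without_gaps(frame).items():
        ax.plot(col.index, col.to_numpy(), marker=".", label=name)
    ax.set_xlabel("Julian day")
    ax.set_ylabel("ice speed")
    ax.legend()
    return ax


clean = series_without_gaps(df)
```

Dropping NaNs per column (rather than calling `df.dropna()` on the whole frame) keeps every valid R7 and R8 reading even where R2 is missing.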

What does selecting the largest eigenvalues and eigenvectors in the covariance matrix mean in data analysis?

Question: Suppose there is a matrix B of size 500×1000 (doubles), where 500 is the number of observations and 1000 is the number of features. sigma is the covariance matrix of B, and D is a diagonal matrix whose diagonal elements are the eigenvalues of sigma. Assume A holds the eigenvectors of the covariance matrix sigma. I have the following questions: I need to select the first k = 800 eigenvectors corresponding to the eigenvalues with the largest magnitude to rank the
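A minimal NumPy sketch of the selection step described in the question: compute the covariance matrix, sort the eigenpairs by descending eigenvalue, and keep the first k eigenvectors. The sizes are scaled-down stand-ins (50×100 with k = 10 instead of 500×1000 with k = 800), the data are random, and names such as `A_k` and `scores` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_feat, k = 50, 100, 10           # stand-ins for 500, 1000, 800
B = rng.normal(size=(n_obs, n_feat))     # rows = observations, columns = features

sigma = np.cov(B, rowvar=False)          # (n_feat, n_feat) covariance matrix
eigvals, eigvecs = np.linalg.eigh(sigma) # eigh: symmetric input, ascending eigenvalues

order = np.argsort(eigvals)[::-1]        # indices for descending eigenvalues
D = eigvals[order]                       # sorted eigenvalues (diagonal of D)
A = eigvecs[:, order]                    # matching eigenvectors as columns

A_k = A[:, :k]                           # the k directions of largest variance
scores = (B - B.mean(axis=0)) @ A_k      # observations projected onto them

rank = np.linalg.matrix_rank(sigma)      # at most n_obs - 1 after centering
```

Each column of `scores` has sample variance equal to the corresponding eigenvalue, which is why keeping the largest eigenvalues keeps the directions that explain the most variance. Note that with fewer observations than features the covariance matrix has rank at most n_obs - 1, so for a 500×1000 matrix at most 499 eigenvalues are nonzero and a selection of 800 eigenvectors necessarily includes directions with (numerically) zero variance.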

Python: pandas merge multiple dataframes

Question: I have different dataframes and need to merge them together based on the date column. If I only had two dataframes, I could use df1.merge(df2, on='date'). To do it with three dataframes, I use df1.merge(df2.merge(df3, on='date'), on='date'), but this becomes really complex and unreadable with more dataframes. All dataframes have one column in common, date, but they don't have the same number of rows or columns, and I only need those rows in which each date is common to every
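One way to avoid the nested calls, sketched here under the assumption that an inner join on date is what is wanted, is to fold the list of frames with `functools.reduce`. The three small frames and their column names (`a`, `b`, `c`, `d`) are made up for illustration.

```python
from functools import reduce

import pandas as pd

# Illustrative frames: different row counts and columns, one shared 'date' column.
df1 = pd.DataFrame({"date": ["2021-01-01", "2021-01-02", "2021-01-03"], "a": [1, 2, 3]})
df2 = pd.DataFrame({"date": ["2021-01-02", "2021-01-03", "2021-01-04"], "b": [10, 20, 30]})
df3 = pd.DataFrame({"date": ["2021-01-03", "2021-01-02"], "c": [0.5, 0.7], "d": ["x", "y"]})

frames = [df1, df2, df3]

# Merge pairwise from left to right; how="inner" keeps only dates present in every frame.
merged = reduce(
    lambda left, right: left.merge(right, on="date", how="inner"),
    frames,
)
```

Adding a fourth or fifth dataframe only means appending it to `frames`. If non-key columns share names across frames, `merge` will add `_x`/`_y` suffixes, so it may be worth renaming them first.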

Fitting polynomial model to data in R

Question: I've read the answers to this question and they are quite helpful, but I need help particularly in R. I have an example data set in R as follows:

    x <- c(32,64,96,118,126,144,152.5,158)
    y <- c(99.5,104.8,108.5,100,86,64,35.3,15)

I want to fit a model to these data so that y = f(x). I want it to be a 3rd order polynomial model. How can I do that in R? Additionally, can R help me to find the best fitting model?

Answer 1: To get a third order polynomial in x (x^3), you can do lm(y ~ x + I(x^2) + I(x

Assigning Values to the neighbors same value in MATLAB

Question: I am having a small issue, but I am clueless about where I am at fault. Can someone please guide me the right way? Thanks in advance.

What I have done: my code finds the local maxima, brings each one down from the local maximum to a certain point, and assigns the value of that downsized point to the neighbors that are greater than the downsized value.

Small example:

    X = [1 0 1 4.3 4.5 5 4.3 4.2 0 0 0 2 6.2 6.3 7 6.2 7.4 8 7.2 1 2 3 4 2];

The local maxima are 5, 7, 8, and 4. Go down to a certain point, like 4, 6, 7, 3. Assign

In pandas can you aggregate by mean and round that mean to the nearest int?

Question: I have 169 columns that have been recoded so that 1 = yes and 0 = no. Now I need to aggregate the 2 million rows by mean and then round the result to the nearest int. How can I do that? The image just shows that the values per column are either 0 or 1.

Answer 1: If data is your dataframe, you can get the mean of all the columns as integers simply with:

    data.mean().astype(int)  # Truncates mean to integer, e.g. 1.95 -> 1

or, as of version 0.17.0:

    data.mean().round(0)  # Rounds mean
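The two calls from the answer can be combined so that the result is both rounded and stored as an integer. The sketch below uses a tiny made-up 0/1 frame (columns `q1` to `q3` are hypothetical) in place of the 169 columns and 2 million rows.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the 0/1 survey columns.
data = pd.DataFrame({
    "q1": [1, 1, 0, 1],   # mean 0.75
    "q2": [0, 0, 0, 1],   # mean 0.25
    "q3": [1, 0, 1, 0],   # mean 0.50
})

means = data.mean()

# round() uses NumPy rounding, which sends exact halves to the nearest even integer.
rounded = means.round(0).astype(int)

# Alternative for non-negative means if exact halves should round up instead.
rounded_half_up = np.floor(means + 0.5).astype(int)
```

The only difference between the two results is the column whose mean is exactly 0.5, which is a realistic case for 0/1 data with an even number of rows.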

searching if anyone of word is present in the another column of a dataframe or in another data frame using python

Question: Hi, I have two DataFrames like below.

DF1:

    Alpha | Numeric | Special
    and   | 1       | @
    or    | 2       | $
          | 3       | &
          | 4       |
          | 5       |

and DF2, with a single column:

    Content
    boy or girl
    school @ morn

I want to check whether any column in DF1 has any of its keywords in the Content column of DF2. The output should be in a new DataFrame, output_DF:

    output_column
    Alpha
    Special

Can someone help me with this?

Answer 1: I have a method that is not very good. df1 = pd.DataFrame([[['and', 'or'],['1', '2','3','4','5'],['@', '$','&']]
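The answer above is cut off, so the following is an independent sketch rather than the answerer's method. It assumes keywords should match whole whitespace-separated tokens in Content (so that "or" does not match inside "morn") and that empty DF1 cells are stored as empty strings. The variable names `tokens` and `matching` are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Alpha": ["and", "or", "", "", ""],
    "Numeric": ["1", "2", "3", "4", "5"],
    "Special": ["@", "$", "&", "", ""],
})
df2 = pd.DataFrame({"Content": ["boy or girl", "school @ morn"]})

# Collect every whitespace-separated token that appears anywhere in Content.
tokens = set()
for text in df2["Content"]:
    tokens.update(text.split())

# Keep each DF1 column that has at least one non-empty keyword among the tokens.
matching = [
    col for col in df1.columns
    if any(kw and kw in tokens for kw in df1[col])
]

output_df = pd.DataFrame({"output_column": matching})
```

If substring matching is wanted instead, the membership test would need to be replaced with a check against each Content string, at the cost of false positives like the "or" in "morn".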

How does unicodecsv.DictReader represent a csv file

Question: I'm currently going through the Udacity course on data analysis in Python, and we've been using the unicodecsv library. More specifically, we've written the following code, which reads a csv file and converts it into a list:

    def read_csv(filename):
        with open(filename, 'rb') as f:
            reader = unicodecsv.DictReader(f)
            return list(reader)

In order to get my head around this, I'm trying to figure out how the data is represented in the dictionary and the list, and I'm very confused. Can
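To make the representation concrete, here is a small sketch that swaps in the standard-library `csv.DictReader` for `unicodecsv.DictReader`, on the assumption that the two expose the same interface (unicodecsv is intended as a drop-in replacement for `csv`). The CSV text and its column names are made up, and an in-memory file stands in for a file on disk.

```python
import csv
import io

# Made-up CSV content: one header line, then one line per record.
text = "name,age\nAda,36\nLinus,28\n"


def read_csv(fileobj):
    """Return the rows of a CSV file as a list of dictionaries."""
    reader = csv.DictReader(fileobj)
    return [dict(row) for row in reader]


rows = read_csv(io.StringIO(text))

first = rows[0]            # one dict per data row
first_name = first["name"] # keys come from the header line
```

The result is a list with one dictionary per data row. Each dictionary maps a header name to that row's value, and every value is a string, so numeric columns such as `age` still need converting.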

How to plot dates as dates (not numbers or character) on x axis of ggplot?

Question: I have a huge data set containing bacteria samples (4 types of bacteria) from 10 water resources from 2010 until 2019. Some values are missing, so we need to exclude them from the plot and the analysis. I want to plot a time series for each type of bacteria for each resource for all years. What is the best way to do that?

    library("ggplot2")
    BactData = read.csv('Råvannsdata_Bergen_2010_2018a.csv', sep='\t', header=TRUE)
    summary(BactData, na.rm = TRUE)
    df$Date = as.Date(df$Date, '%d/%m/%Y') #require

Pandas Pivot table without aggregating

Question: I have a dataframe df as:

    Acct_Id  Acct_Nm  Srvc_Id  Phone_Nm  Phone_plan_value  Srvc_Num
    51       Roger    789      Pixel     30                1
    51       Roger    800      iPhone    25                2
    51       Roger    945      Galaxy    40                3
    78       Anjay    100      Nokia     50                1
    78       Anjay    120      Oppo      30                2
    32       Rafa     456      HTC       35                1

I want to transform the dataframe so I can have 1 row per Acct_Id and Acct_Nm as:

                      Srvc_Num_1                          Srvc_Num_2                          Srvc_Num_3
    Acct_Id  Acct_Nm  Srvc_Id Phone_Nm Phone_plan_value   Srvc_Id Phone_Nm Phone_plan_value   Srvc_Id Phone_Nm Phone_plan_value
    51       Roger    789     Pixel    30                 800     iPhone   25
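Because each (Acct_Id, Acct_Nm, Srvc_Num) combination occurs only once in the input, the reshape needs no aggregation. One possible sketch uses `set_index` followed by `unstack`, then reorders the two column levels so that the service number sits on top. This is an assumption about the desired layout based on the partial target table, and the name `wide` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Acct_Id": [51, 51, 51, 78, 78, 32],
    "Acct_Nm": ["Roger", "Roger", "Roger", "Anjay", "Anjay", "Rafa"],
    "Srvc_Id": [789, 800, 945, 100, 120, 456],
    "Phone_Nm": ["Pixel", "iPhone", "Galaxy", "Nokia", "Oppo", "HTC"],
    "Phone_plan_value": [30, 25, 40, 50, 30, 35],
    "Srvc_Num": [1, 2, 3, 1, 2, 1],
})

fields = ["Srvc_Id", "Phone_Nm", "Phone_plan_value"]
nums = sorted(df["Srvc_Num"].unique().tolist())

# Move Srvc_Num from the row index into the columns; each cell is filled at most once.
wide = df.set_index(["Acct_Id", "Acct_Nm", "Srvc_Num"]).unstack("Srvc_Num")

# Columns are now (field, Srvc_Num); put Srvc_Num on top and group the fields under it.
wide = wide.swaplevel(0, 1, axis=1)
wide = wide.reindex(columns=pd.MultiIndex.from_product([nums, fields]))
```

Accounts with fewer than three services get NaN in the unused slots, which also turns the integer columns into floats. `unstack` raises an error if the index contains duplicates, so this approach only works while each account has at most one row per Srvc_Num.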