data-analysis | 易学教程

package “fdapace” (R) - create a functional plot of the first principal component

Question: My question is about functional principal component analysis in R. I am working with a multi-dimensional time series looking something like this: My goal is to reduce the dimensions by applying functional PCA and then plot the first principal component like this: I have already used the FPCA function of the fdapace package on the dataset. Unfortunately, I don't understand how to interpret the resulting matrix of the FPCA estimates (xiEst). In my understanding the values of the Principal

Column names shift to left on read.table or read.csv

Question: Originally I have this TSV file (sample): name type qty cxfm 1C 0 d2 H50 2 g3g 1G 2 hb E37 1 nlx E45 4 So I am using read.csv to read data from a .tsv file, but I always get this output: name type qty 1 cxfm 1C 0 2 d2 H50 2 3 g3g 1G 2 4 hb E37 1 5 nlx E45 4 instead of getting this one: name type qty 1 cxfm 1C 0 2 d2 H50 2 3 g3g 1G 2 4 hb E37 1 5 nlx E45 4 Any ideas about this? This is what I am using to read the files: file_list<-list.files() for (file in file_list){ if (!exists("dataset")){ dataset

how to split values in a datacolumn and adding it to a new column with a condition in pandas

Question: I have a df, name Value Sri is a cricketer Sri,is Ram player Ram Ravi is a singer is cricket and foot is ball and,is,foot and a list, my_list=["is", "foot"] I am trying to split df["Value"] on commas and add each value to a new column if it exists in my_list. My expected output is name Value my_list Sri is a cricketer Sri is Ram player Ram Ravi is a singer is cricket and foot is ball and is,foot Please help me achieve this, thanks in advance. Answer 1: Use str.findall with str.join: my_list=[
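
The answer excerpt above is cut off, so the following is only a minimal sketch of what a `str.findall` plus `str.join` approach might look like, not the original answer's code. The row values are re-typed from the flattened table in the question (an assumption about how the columns split), and the `pattern` variable is a name introduced here for illustration.

```python
import re

import pandas as pd

# Sample data re-typed from the question; the exact column split is assumed.
df = pd.DataFrame({
    "name": [
        "Sri is a cricketer",
        "Ram player",
        "Ravi is a singer",
        "cricket and foot is ball",
    ],
    "Value": ["Sri,is", "Ram", "is", "and,is,foot"],
})
my_list = ["is", "foot"]

# Build an alternation of the wanted words, bounded by \b so that
# only whole comma-separated tokens match (not substrings of longer words).
pattern = r"\b(?:" + "|".join(re.escape(word) for word in my_list) + r")\b"

# findall returns a list of matches per row; join turns it back into text.
df["my_list"] = df["Value"].str.findall(pattern).str.join(",")
```

Rows with no matching token end up with an empty string in `my_list`. Removing the matched tokens from `Value` itself, which the expected output seems to show, would be a separate step not covered here.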

Enriching DataStream using static DataSet in Flink streaming

Question: I am writing a Flink streaming program in which I need to enrich a DataStream of user events using some static data set (information base, IB). For example, let's say we have a static data set of buyers and an incoming clickstream of events; for each event we want to add a boolean flag indicating whether the doer of the event is a buyer or not. An ideal way to achieve this would be to partition the incoming stream by user id, have the buyers set available in a DataSet partitioned again by

Sorting datetime objects by hour to a Pandas dataframe, then visualize to histogram with Matplotlib

Question: I need to sort viewers by hour into a histogram. I have some experience using Matplotlib to do that, but I can't work out the most pragmatic way to sort the dates by hour. First I read the data from a JSON file, then store the two relevant datatypes in a pandas DataFrame, like this: data = pd.read_json('data/data.json') session_duration = pd.to_datetime(data.session_duration, unit='s').dt.time time = pd.to_datetime(data.time, format='%H:%M:%S').dt.time viewers = [] for x, y in zip(time,
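
As a rough illustration of grouping timestamps by hour, here is a minimal sketch that assumes the times arrive as `%H:%M:%S` strings, as in the question's `pd.to_datetime` call. The sample values and the names `times`, `hours`, and `counts` are made up for this example and do not come from the asker's JSON file.

```python
import pandas as pd

# Hypothetical sample times in the same '%H:%M:%S' format the question parses.
times = pd.Series(["09:15:00", "09:45:10", "13:05:00", "23:59:59"])

# Parse the strings and keep only the hour component (0-23).
hours = pd.to_datetime(times, format="%H:%M:%S").dt.hour

# Count entries per hour, then reindex so hours with no viewers show up as 0.
counts = hours.value_counts().reindex(range(24), fill_value=0)
```

With Matplotlib installed, `counts.plot.bar()` would draw one bar per hour of the day.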

comparing two Dataframe columns to check if they have same value in python

Question: I have two dataframes. new1: Name city 0 sri won chn 1 pechi won pune 2 Ram won mum 0 pec won kerala new3: req 0 pec 1 mut I tried, mask=new1.Name.str.contains("|".join(new3.req.values.tolist())) new1[mask] I am getting, new1[mask] Name city 1 pechi won pune 0 pec won kerala As "pechi" contains "pec", it took this value, but I want an exact match between the values, not "contains". My desired output is, new1[mask] Name city 0 pec won kerala Answer 1: You need \b that means "word boundary": a = r'\b('
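
The answer excerpt stops mid-expression, so what follows is only a sketch of the word-boundary idea it names, not a reconstruction of the original code. The frames are re-typed from the question, except that a default integer index is used instead of the repeated `0` label shown there (an assumption made to keep the example simple); `pattern` and `result` are illustrative names.

```python
import re

import pandas as pd

# Data re-typed from the question, with a default RangeIndex (assumption).
new1 = pd.DataFrame({
    "Name": ["sri won", "pechi won", "Ram won", "pec won"],
    "city": ["chn", "pune", "mum", "kerala"],
})
new3 = pd.DataFrame({"req": ["pec", "mut"]})

# \b on both sides forces each alternative to match a whole word,
# so "pec" no longer matches inside "pechi".
pattern = r"\b(?:" + "|".join(re.escape(req) for req in new3["req"]) + r")\b"

mask = new1["Name"].str.contains(pattern)
result = new1[mask]
```

The non-capturing group `(?:...)` is used so that `str.contains` does not warn about match groups.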

Phonetic filter factory for Hindi

Question: I am working with Apache Solr and trying to use the phonetic filter factory. I have tried all the encoders that are available with solr.PhoneticFilterFactory, but none of them supports Indian languages. Is there any other filter or method available so that I can get a phonetic representation for Indian languages, e.g. Hindi, Tamil, Bengali? If not, how can we modify the existing filters to support these languages? Answer 1: Have you tried the new Beider Morse Filter Factory, which was just added in

How can I translate a script I wrote in SQL server to compute the odds ratios to Postgresql?

Question: In SQL Server, I wrote the following script to calculate the odds ratios based on the probabilities of my test group divided by those of my control group. The script is as follows: --Compute the odds ratios from the model select a.column1, a.uvs as testuvs, b.uvs as controluvs , [odds]=case when b.uvs>0 then a.puvs/b.puvs else Null end into unique_visitor_odds from control_probabilties b inner join test_probabilities a on a.column1=b.column2 where a.uvs>24 and b.uvs>24 order by [odds] desc I am not sure

Merge DataFrames and discard duplicates values

Question: I'm collecting time-indexed data coming from various files, but sometimes there is some overlap: df1 = pd.DataFrame([1, -1, -3], columns=['A'], index=pd.date_range('2000-01-01', periods=3)) df2 = pd.DataFrame([-3, 10, 1], columns=['A'], index=pd.date_range('2000-01-03', periods=3)) pd.concat([df1, df2]) A 2000-01-01 1 2000-01-02 -1 2000-01-03 -3 A 2000-01-03 -3 2000-01-04 10 2000-01-05 1 A 2000-01-01 1 2000-01-02 -1 2000-01-03 -3 2000-01-03 -3 2000-01-04 10 2000-01-05 1 1) How to clean
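
The question text is truncated, so this is one possible reading of it: keep a single row per timestamp after concatenating. The sketch below reuses `df1` and `df2` exactly as given and drops repeated index labels with `Index.duplicated`; keeping the first occurrence is an assumption, and `combined` and `deduped` are names introduced for the example.

```python
import pandas as pd

# The two overlapping frames from the question.
df1 = pd.DataFrame([1, -1, -3], columns=["A"],
                   index=pd.date_range("2000-01-01", periods=3))
df2 = pd.DataFrame([-3, 10, 1], columns=["A"],
                   index=pd.date_range("2000-01-03", periods=3))

combined = pd.concat([df1, df2])

# Index.duplicated marks every repeat of an index label after the first;
# inverting the mask keeps exactly one row per timestamp.
deduped = combined[~combined.index.duplicated(keep="first")]
```

This compares index labels only. If two files could report different values for the same timestamp, the choice between `keep="first"` and `keep="last"` decides which file wins.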

Error while creating a Timeseries plot in R: Error in plot.window(xlim, ylim, log, …) : need finite 'ylim' values

Question: Here's a sample of my single-column data set: Lines 141,523 146,785 143,667 65,560 88,524 148,422 I read this file as a .csv file, convert it into a ts object and then plot it: ##Read the actual number of lines CSV file Aclines <- read.csv(file.choose(), header=T, stringsAsFactors = F) Aclinests <- ts(Aclines[,1], start = c(2013), end = c(2015), frequency = 52) plot(Aclinests, ylab = "Actual_Lines", xlab = "Time", col = "red") I get the following error message: Error in plot.window(xlim, ylim