data-analysis

package “fdapace” (R) - create a functional plot of the first principal component

拟墨画扇 submitted on 2019-12-13 03:46:22
Question: My question is about functional principal component analysis in R. I am working with a multi-dimensional time series looking something like this: My goal is to reduce the dimensions by applying functional PCA and then plot the first principal component like this: I have already used the FPCA function of the fdapace package on the dataset. Unfortunately, I don't understand how to interpret the resulting matrix of the FPCA estimates (xiEst). In my understanding the values of the Principal

Column names shift to left on read.table or read.csv

故事扮演 submitted on 2019-12-13 02:36:04
Question: Originally I have this TSV file (sample): name type qty cxfm 1C 0 d2 H50 2 g3g 1G 2 hb E37 1 nlx E45 4 I am using read.csv to read data from a .tsv file, but I always get this output: name type qty 1 cxfm 1C 0 2 d2 H50 2 3 g3g 1G 2 4 hb E37 1 5 nlx E45 4 instead of this one: name type qty 1 cxfm 1C 0 2 d2 H50 2 3 g3g 1G 2 4 hb E37 1 5 nlx E45 4 Any ideas about this? This is what I am using to read the files: file_list<-list.files() for (file in file_list){ if (!exists("dataset")){ dataset

how to split values in a datacolumn and adding it to a new column with a condition in pandas

╄→гoц情女王★ submitted on 2019-12-12 21:04:42
Question: I have a df: name Value Sri is a cricketer Sri,is Ram player Ram Ravi is a singer is cricket and foot is ball and,is,foot and a list, my_list=["is", "foot"]. I am trying to split df["Value"] on commas and add each value to a new column if it exists in my_list. My expected output is: name Value my_list Sri is a cricketer Sri is Ram player Ram Ravi is a singer is cricket and foot is ball and is,foot Please help me achieve this. Thanks in advance. Answer 1: Use str.findall with str.join: my_list=[
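The answer excerpt above is cut off, so the following is only a minimal sketch of what "str.findall with str.join" could look like, not the answerer's actual code. It assumes the flattened table reads as a `name` column plus a comma-separated `Value` column, and it adds word boundaries and `re.escape` as a precaution; the `pattern` variable is illustrative.

```python
import re

import pandas as pd

# Assumed reading of the flattened table in the question.
df = pd.DataFrame({
    "name": ["Sri is a cricketer", "Ram player",
             "Ravi is a singer", "cricket and foot is ball"],
    "Value": ["Sri,is", "Ram", "is", "and,is,foot"],
})
my_list = ["is", "foot"]

# One alternation of whole-word patterns, e.g. r"\bis\b|\bfoot\b".
pattern = "|".join(r"\b{}\b".format(re.escape(word)) for word in my_list)

# findall returns a list of matches per row; str.join glues each list back
# into a comma-separated string (empty string when nothing matched).
df["my_list"] = df["Value"].str.findall(pattern).str.join(",")
```

This sketch only fills the new `my_list` column. The expected output in the question also seems to remove the matched words from `Value`, which would need a further step.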

Enriching DataStream using static DataSet in Flink streaming

ⅰ亾dé卋堺 submitted on 2019-12-12 19:06:08
Question: I am writing a Flink streaming program in which I need to enrich a DataStream of user events using some static data set (information base, IB). For example, say we have a static data set of buyers and an incoming clickstream of events; for each event we want to add a boolean flag indicating whether the doer of the event is a buyer or not. An ideal way to achieve this would be to partition the incoming stream by user id, have the buyers set available in a DataSet partitioned again by

Sorting datetime objects by hour to a Pandas dataframe, then visualize to histogram with Matplotlib

南笙酒味 submitted on 2019-12-12 12:30:42
Question: I need to sort viewers by hour into a histogram. I have some experience using Matplotlib to do that, but I can't work out the most pragmatic way to sort the dates by hour. First I read the data from a JSON file, then store the two relevant datatypes in a pandas DataFrame, like this: data = pd.read_json('data/data.json') session_duration = pd.to_datetime(data.session_duration, unit='s').dt.time time = pd.to_datetime(data.time, format='%H:%M:%S').dt.time viewers = [] for x, y in zip(time,
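The question is cut off before any answer, so this is a hedged sketch of one possible way to bucket times by hour, not the asker's or an answerer's code. It assumes the `time` values are `%H:%M:%S` strings as in the excerpt. The sample values and the names `times` and `viewers_per_hour` are made up for illustration.

```python
import pandas as pd

# Made-up sample times in the same '%H:%M:%S' format as the question.
times = pd.Series(["09:15:00", "09:47:10", "13:05:33", "23:59:59"])

# Keep the parsed datetimes (rather than .dt.time) so .dt.hour is available.
hours = pd.to_datetime(times, format="%H:%M:%S").dt.hour

# Count rows per hour and fill in the hours that never occur with zero,
# so the result always has 24 bins in order.
viewers_per_hour = hours.value_counts().reindex(range(24), fill_value=0)
```

The resulting Series is indexed by hour (0 to 23), so it could be handed to a bar plot directly, for example with `viewers_per_hour.plot(kind="bar")`.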

comparing two Dataframe columns to check if they have same value in python

别等时光非礼了梦想. submitted on 2019-12-12 06:51:14
Question: I have two dataframes: new1 Name city 0 sri won chn 1 pechi won pune 2 Ram won mum 0 pec won kerala new3 req 0 pec 1 mut I tried: mask=new1.Name.str.contains("|".join(new3.req.values.tolist())) new1[mask] I am getting: new1[mask] Name city 1 pechi won pune 0 pec won kerala Because "pechi" contains "pec", it picked up this value, but I want an exact match between the values, not "contains". My desired output is: new1[mask] Name city 0 pec won kerala Answer 1: You need \b, which means "word boundary": a = r'\b('
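The answer excerpt stops right after the opening of the pattern, so what follows is a sketch of the word-boundary idea under stated assumptions rather than the answerer's exact code. The frames are rebuilt from the flattened tables in the question (including the repeated index 0). The non-capturing group and `re.escape` are additions made here for safety.

```python
import re

import pandas as pd

# Assumed reconstruction of the two frames shown in the question.
new1 = pd.DataFrame(
    {"Name": ["sri won", "pechi won", "Ram won", "pec won"],
     "city": ["chn", "pune", "mum", "kerala"]},
    index=[0, 1, 2, 0],
)
new3 = pd.DataFrame({"req": ["pec", "mut"]})

# \b on both sides makes each alternative match only as a whole word,
# so "pec" no longer matches inside "pechi".
pattern = r"\b(?:" + "|".join(map(re.escape, new3["req"])) + r")\b"

mask = new1["Name"].str.contains(pattern)
result = new1[mask]
```

With these inputs only the `pec won` / `kerala` row survives the mask, which is the desired output stated in the question.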

Phonetic filter factory for Hindi

a 夏天 submitted on 2019-12-12 05:51:44
Question: I am working with Apache Solr and trying to use the phonetic filter factory. I have tried all the encoders that are available with solr.PhoneticFilterFactory, but none of them supports Indian languages. Is there any other filter or method available so that I can get a phonetic representation for Indian languages, e.g. Hindi, Tamil, Bengali, etc.? If not, how can we modify the existing filters to support these languages? Answer 1: Have you tried the new Beider Morse Filter Factory, which was just added in

How can I translate a script I wrote in SQL server to compute the odds ratios to Postgresql?

╄→尐↘猪︶ㄣ submitted on 2019-12-12 04:48:21
Question: In SQL Server, I wrote the following script to calculate the odds ratios based on the probabilities of my test group divided by those of my control group. The script is as follows: --Compute the odds ratios from the model select a.column1, a.uvs as testuvs, b.uvs as controluvs , [odds]=case when b.uvs>0 then a.puvs/b.puvs else Null end into unique_visitor_odds from control_probabilties b inner join test_probabilities a on a.column1=b.column2 where a.uvs>24 and b.uvs>24 order by [odds] desc I am not sure

Merge DataFrames and discard duplicates values

怎甘沉沦 submitted on 2019-12-12 04:16:27
Question: I'm collecting time-indexed data coming from various files, but sometimes there is some overlap: df1 = pd.DataFrame([1, -1, -3], columns=['A'], index=pd.date_range('2000-01-01', periods=3)) df2 = pd.DataFrame([-3, 10, 1], columns=['A'], index=pd.date_range('2000-01-03', periods=3)) pd.concat([df1, df2]) A 2000-01-01 1 2000-01-02 -1 2000-01-03 -3 A 2000-01-03 -3 2000-01-04 10 2000-01-05 1 A 2000-01-01 1 2000-01-02 -1 2000-01-03 -3 2000-01-03 -3 2000-01-04 10 2000-01-05 1 1) How to clean
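The question is truncated at its first sub-question, so the exact request is unknown. As a minimal sketch, assuming the goal is to keep one row per timestamp after concatenating, the duplicated index entries can be masked out. The names `combined` and `deduped`, and the choice of `keep="first"`, are assumptions made here.

```python
import pandas as pd

# The two overlapping frames from the question.
df1 = pd.DataFrame([1, -1, -3], columns=["A"],
                   index=pd.date_range("2000-01-01", periods=3))
df2 = pd.DataFrame([-3, 10, 1], columns=["A"],
                   index=pd.date_range("2000-01-03", periods=3))

combined = pd.concat([df1, df2])

# Index.duplicated marks the second and later occurrences of each timestamp;
# negating the mask keeps only the first row for every date.
deduped = combined[~combined.index.duplicated(keep="first")]
```

This drops rows based on the timestamp alone. If overlapping files could disagree on the value for the same date, which row to keep would need a separate decision.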

Error while creating a Timeseries plot in R: Error in plot.window(xlim, ylim, log, …) : need finite 'ylim' values

空扰寡人 submitted on 2019-12-12 03:39:20
Question: Here's a sample of my single-column data set: Lines 141,523 146,785 143,667 65,560 88,524 148,422 I read this file as a .csv file, convert it into a ts object and then plot it: ##Read the actual number of lines CSV file Aclines <- read.csv(file.choose(), header=T, stringsAsFactors = F) Aclinests <- ts(Aclines[,1], start = c(2013), end = c(2015), frequency = 52) plot(Aclinests, ylab = "Actual_Lines", xlab = "Time", col = "red") I get the following error message: Error in plot.window(xlim, ylim