data-analysis

how to append two or more dataframes in pandas and do some analysis

試著忘記壹切 submitted on 2019-12-20 03:33:15
Question: I have 3 DataFrames:
    df1 = pd.DataFrame({"Name": ["one", "two", "three"], "value": [4, 5, 6]})
    df2 = pd.DataFrame({"Name": ["four", "one", "three"], "value": [8, 6, 2]})
    df3 = pd.DataFrame({"Name": ["one", "four", "six"], "value": [1, 1, 1]})
I can append them one by one, but I want to append all three DataFrames at once and then do some analysis. For each name, I am trying to compute the number of DataFrames that contain the name divided by the total number of DataFrames. My desired output is:
    Name value Count
    one 11 1
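A minimal sketch of one way to do this, assuming that `value` in the desired output is the sum across frames (4 + 6 + 1 = 11 for "one") and that `Count` is the share of frames containing the name (3 / 3 = 1 for "one"). Both readings are inferred from the single visible output row; the `source` column and the `frames` list are names introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["one", "two", "three"], "value": [4, 5, 6]})
df2 = pd.DataFrame({"Name": ["four", "one", "three"], "value": [8, 6, 2]})
df3 = pd.DataFrame({"Name": ["one", "four", "six"], "value": [1, 1, 1]})

frames = [df1, df2, df3]

# Tag each row with the position of the frame it came from (hypothetical
# "source" column), then stack all frames in a single concat call.
combined = pd.concat(
    [frame.assign(source=i) for i, frame in enumerate(frames)],
    ignore_index=True,
)

# Sum the values per name and count the distinct frames each name appears in.
result = (
    combined.groupby("Name")
    .agg(value=("value", "sum"), Count=("source", "nunique"))
    .reset_index()
)

# Turn the frame count into a fraction of all frames.
result["Count"] = result["Count"] / len(frames)
print(result)
```

Using `nunique` on the source tag rather than a plain row count keeps the fraction at or below 1 even if a name occurs twice within the same frame.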

Pandas error with basemap/proj for map plotting

a 夏天 submitted on 2019-12-20 02:57:11
Question: I ran the Python code below, which is the example "Plotting Maps: Visualizing Haiti Earthquake Crisis Data" from the book Python for Data Analysis, pages 242-246. The code is supposed to plot a map of Haiti, but I got the following error:
    Traceback (most recent call last):
      File "Haiti.py", line 74, in <module>
        x, y = m(cat_data.LONGITUDE, cat_data.LATITUDE)
      File "/usr/local/lib/python2.7/site-packages/mpl_toolkits/basemap/__init__.py", line 1148, in __call__
        xout,yout = self.projtran(x,y

join two tables without losing relevant values

十年热恋 submitted on 2019-12-19 11:49:05
Question: I have two tables representing a database of a customer's products and its competitors' products:
    tmp_match - from_product_id and to_product_id represent a match between a customer product and a competitor product, respectively.
    tmp_price_history - shows the price of each product per date.
I am trying to write a query that lists all dates from the table tmp_price_history. For each date I want to see the customer product price vs. the competitor product price according to the product match pairs in table

Missing Value in Data Analysis

人走茶凉 submitted on 2019-12-18 17:28:26
Question: I have a data set in which the variable GENDER, which has two levels, Male (M) and Female (F), has a lot of missing values. How do I deal with the missing values? What are the different methods for handling them? Any help would be appreciated.
Answer 1: There are several techniques for estimating a missing value. I've been writing a paper for a university project on such methods. I will briefly explain 5 commonly used missing-data imputation techniques. Hereinafter we will consider a
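As an illustration only, here is a minimal pandas sketch of two simple ways to handle missing values in a categorical column such as GENDER: filling with the most frequent level (mode imputation) and keeping missingness as its own category. The sample data and the new column names are hypothetical, and these two options are not necessarily among the five techniques the answer goes on to describe.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: GENDER with two levels and some missing entries.
df = pd.DataFrame({"GENDER": ["M", "F", np.nan, "F", np.nan, "F", "M"]})

# Share of missing values; worth checking before choosing any method.
missing_share = df["GENDER"].isna().mean()

# Option 1: mode imputation, i.e. fill gaps with the most frequent level.
most_frequent = df["GENDER"].mode(dropna=True).iloc[0]
df["GENDER_mode"] = df["GENDER"].fillna(most_frequent)

# Option 2: treat "missing" as a separate level instead of guessing.
df["GENDER_flagged"] = df["GENDER"].fillna("Unknown")

print(f"missing share: {missing_share:.2f}")
print(df)
```

Mode imputation inflates the majority level, so with a large share of missing values the separate "Unknown" level is often the more cautious choice.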

Matplotlib: Formatting dates on the x-axis in a 3D Bar graph

烂漫一生 submitted on 2019-12-18 13:08:31
Question: Given this 3D bar graph sample code, how would you convert the numerical data on the x-axis to formatted date/time strings? I've tried the ax.xaxis_date() function without success. I also tried plot_date(), which doesn't appear to work for 3D bar graphs. Here is a modified version of the sample code to illustrate what I am trying to do:
    from mpl_toolkits.mplot3d import Axes3D
    import matplotlib.pyplot as plt
    import numpy as np
    import matplotlib.dates as dates
    dates = [dates

R: Cross validation on a dataset with factors

三世轮回 submitted on 2019-12-18 11:56:09
Question: I often want to run a cross-validation on a dataset that contains some factor variables, and after running for a while the cross-validation routine fails with the error: factor x has new levels Y. For example, using the package boot:
    library(boot)
    d <- data.frame(x=c('A', 'A', 'B', 'B', 'C', 'C'), y=c(1, 2, 3, 4, 5, 6))
    m <- glm(y ~ x, data=d)
    m.cv <- cv.glm(d, m, K=2) # Sometimes succeeds
    m.cv <- cv.glm(d, m, K=2) # Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =

how to match a word in a datacolumn with a list of values and applying ignorecase in pandas in python

╄→尐↘猪︶ㄣ submitted on 2019-12-18 09:40:31
Question: I have a DataFrame df:
    Name
    Ram is one of the key ram
    Kumar is playing cricket
    Ravi is playing and ravi is a good player
and a list my_list = ["Ram", "ravi"]. My desired DataFrame, desired_df, is:
    Name                                       Match  Count
    Ram is one of the key ram                  Ram    1
    Kumar is playing cricket
    Ravi is playing and ravi is a good player  ravi   1
I tried
    extracted = df["Name"].str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE).apply(set)
but I am getting:
    Match
    Ram,ram
    Ravi,ravi
and I cannot achieve my desired output,
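One possible approach, sketched under the assumption that `Match` should show each list entry in its original spelling from `my_list` and that `Count` is the number of distinct list entries found in the row (case-insensitively). The helper `distinct_entries` and the `canonical` lookup are names introduced here; rows without a match get an empty string and a count of 0, which may differ from the blanks in the desired output.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Name": [
        "Ram is one of the key ram",
        "Kumar is playing cricket",
        "Ravi is playing and ravi is a good player",
    ]
})
my_list = ["Ram", "ravi"]

# Map the lower-cased form of each list entry back to its original spelling.
canonical = {word.lower(): word for word in my_list}

# Whole-word, case-insensitive pattern built from the escaped list entries.
pattern = r"\b(" + "|".join(re.escape(word) for word in my_list) + r")\b"
found = df["Name"].str.findall(pattern, flags=re.IGNORECASE)


def distinct_entries(matches):
    """Collapse case variants to the list spelling, keeping first-seen order."""
    seen = []
    for match in matches:
        entry = canonical[match.lower()]
        if entry not in seen:
            seen.append(entry)
    return seen


distinct = found.apply(distinct_entries)
df["Match"] = distinct.str.join(",")
df["Count"] = distinct.str.len()
print(df)
```

Collapsing through the lower-cased lookup is what removes the "Ram,ram" duplication that a plain `set` leaves in place.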

How to merge two large numpy arrays if slicing doesn't resolve memory error?

谁说胖子不能爱 submitted on 2019-12-18 04:27:07
Question: I have two NumPy arrays, container1 and container2, where container1.shape = (900, 4000) and container2.shape = (5000, 4000). Merging them with vstack results in a MemoryError. After searching through the old questions posted here, I tried to merge them using slicing like this:
    mergedContainer = numpy.vstack((container1, container2[:1000]))
    mergedContainer = numpy.vstack((mergedContainer, container2[1000:2500]))
    mergedContainer = numpy.vstack((mergedContainer, container2[2500:3000]))
but after
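Repeated `vstack` calls allocate a new, larger array on every step, so slicing alone does not lower the peak memory use. A hedged alternative is to allocate the result once and copy the inputs into it in chunks, as sketched below. The function name `vstack_preallocated` and the `chunk_rows` parameter are hypothetical, and the demo arrays are scaled down from the shapes in the question. The result still has to fit in memory, so this is not guaranteed to avoid the MemoryError.

```python
import numpy as np


def vstack_preallocated(first, second, chunk_rows=1000):
    """Stack two 2-D arrays row-wise into one buffer allocated up front."""
    total_rows = first.shape[0] + second.shape[0]
    merged = np.empty(
        (total_rows, first.shape[1]),
        dtype=np.result_type(first, second),
    )

    # Copy the first array into the top of the buffer.
    merged[: first.shape[0]] = first

    # Copy the second array chunk by chunk; no intermediate stacked copies
    # are created, unlike repeated vstack calls.
    offset = first.shape[0]
    for start in range(0, second.shape[0], chunk_rows):
        stop = min(start + chunk_rows, second.shape[0])
        merged[offset + start : offset + stop] = second[start:stop]
    return merged


# Scaled-down stand-ins for the (900, 4000) and (5000, 4000) arrays.
container1 = np.arange(9 * 40, dtype=np.float64).reshape(9, 40)
container2 = np.arange(50 * 40, dtype=np.float64).reshape(50, 40)

merged_container = vstack_preallocated(container1, container2, chunk_rows=10)
print(merged_container.shape)
```

If even the single result buffer is too large, a smaller dtype or a disk-backed array such as `numpy.memmap` may be worth considering.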

Plot pandas dataframe containing NaNs

走远了吗. submitted on 2019-12-18 03:56:10
Question: I have GPS data of ice speed from three different GPS receivers. The data are in a pandas DataFrame indexed by Julian day (counted incrementally from the start of 2009). This is a subset of the data (the main dataset has 3487235 rows...):
                 R2          R7         R8
    1235.000000  116.321959  100.805197  96.519977
    1235.000116  NaN         100.771133  96.234957
    1235.000231  NaN         100.584559  97.249262
    1235.000347  118.823610  100.169055  96.777833
    1235.000463  NaN         99.753551   96.598350
    1235.000579  NaN         99.338048   95.283989
    1235.000694  113
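The question text is cut off, so the following is only a sketch under an assumption: that the goal is to draw each receiver as a continuous line even though some columns (R2 here) are mostly NaN. Dropping the NaNs per column before plotting leaves one series per receiver with only its valid points. The dictionary name `series_without_gaps` is hypothetical, and the data are the six complete rows shown above.

```python
import numpy as np
import pandas as pd

# The six complete rows from the subset in the question.
df = pd.DataFrame(
    {
        "R2": [116.321959, np.nan, np.nan, 118.823610, np.nan, np.nan],
        "R7": [100.805197, 100.771133, 100.584559, 100.169055, 99.753551, 99.338048],
        "R8": [96.519977, 96.234957, 97.249262, 96.777833, 96.598350, 95.283989],
    },
    index=[1235.000000, 1235.000116, 1235.000231, 1235.000347, 1235.000463, 1235.000579],
)

# Drop NaNs column by column, so each receiver keeps all of its own valid
# samples instead of losing every row where any other receiver has a gap.
series_without_gaps = {name: df[name].dropna() for name in df.columns}

for name, series in series_without_gaps.items():
    print(name, len(series), "valid points")
```

Each resulting series can then be plotted on a shared axis, for example with `series.plot(ax=ax, label=name)` when matplotlib is installed, and the line will connect the remaining valid points.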