data-analysis | 易学教程

reverse dataframe's rows' order with pandas [duplicate]

This question already has answers here: Right way to reverse pandas.DataFrame? (2 answers). Closed 8 months ago.

How can I reverse the order of the rows in my pandas DataFrame? I've looked everywhere, and the only things people are talking about are sorting the columns and reversing the order of the columns.

What I want is simple. If my DataFrame looks like this:

    A     B    C
    ------------------
    LOVE  IS   ALL
    THAT  MAT  TERS

I want it to become this:

    A     B    C
    ------------------
    THAT  MAT  TERS
    LOVE  IS   ALL

I know I can iterate over my data in reverse order, but that's not what I want.

Check out http://pandas ...
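One common way to reverse row order is to slice with a negative step. The following is a minimal sketch that rebuilds the small example from the question; the variable names are illustrative, and resetting the index is optional.

```python
import pandas as pd

# Rebuild the two-row example from the question.
df = pd.DataFrame({"A": ["LOVE", "THAT"], "B": ["IS", "MAT"], "C": ["ALL", "TERS"]})

# Slice with a step of -1 to walk the rows from last to first.
reversed_df = df.iloc[::-1]

# Optionally renumber the index so it starts at 0 again.
reversed_df = reversed_df.reset_index(drop=True)

print(reversed_df)
```

Without `reset_index(drop=True)` the rows keep their original index labels, which can be useful if the labels carry meaning.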

What to do with missing values when plotting with seaborn?

I replaced the missing values with NaN using the following lambda function:

    data = data.applymap(lambda x: np.nan if isinstance(x, basestring) and x.isspace() else x)

where data is the DataFrame I am working on. Afterwards, I tried to plot one of its attributes, alcconsumption, using seaborn.distplot as follows:

    seaborn.distplot(data['alcconsumption'], hist=True, bins=100)
    plt.xlabel('AlcoholConsumption')
    plt.ylabel('Frequency(normalized 0->1)')

It gives me the following error:

    AttributeError: max must be larger than min in range parameter.

ZicoNuna

You can use the following line to ...
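A likely cause of this error is that NaN values reach the histogram, so its range cannot be computed. The sketch below is not the truncated answer above; it is one possible approach, assuming the column holds numeric strings mixed with blanks. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: numeric strings mixed with blank entries.
data = pd.DataFrame({"alcconsumption": ["0.03", " ", "7.29", "", "10.4"]})

# Turn empty or whitespace-only strings into NaN.
alc = data["alcconsumption"].replace(r"^\s*$", np.nan, regex=True)

# Convert the remaining strings to floats; anything unparseable becomes NaN.
alc = pd.to_numeric(alc, errors="coerce")

# Drop the NaN entries so the histogram range is well defined.
alc_clean = alc.dropna()

# alc_clean can now be passed to the plotting call, for example:
# seaborn.distplot(alc_clean, hist=True, bins=100)
print(alc_clean)
```

Dropping rows changes the sample size, so whether to drop or impute the missing values depends on the analysis.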

python pandas: how to calculate derivative/gradient

Question: Given that I have the following two vectors:

    In [99]: time_index
    Out[99]:
    [1484942413, 1484942712, 1484943012, 1484943312, 1484943612,
     1484943912, 1484944212, 1484944511, 1484944811, 1484945110]

    In [100]: bytes_in
    Out[100]:
    [1293981210388, 1293981379944, 1293981549960, 1293981720866, 1293981890968,
     1293982062261, 1293982227492, 1293982391244, 1293982556526, 1293982722320]

where bytes_in is an incremental-only counter and time_index is a list of Unix timestamps (epoch).

Objective: What I ...
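The stated objective is cut off, but the title asks for a derivative or gradient. As a sketch under that assumption, the rate of change of the counter (bytes per second) can be estimated with finite differences in NumPy, using the two vectors from the question.

```python
import numpy as np

time_index = [1484942413, 1484942712, 1484943012, 1484943312, 1484943612,
              1484943912, 1484944212, 1484944511, 1484944811, 1484945110]
bytes_in = [1293981210388, 1293981379944, 1293981549960, 1293981720866,
            1293981890968, 1293982062261, 1293982227492, 1293982391244,
            1293982556526, 1293982722320]

t = np.asarray(time_index, dtype=np.int64)
b = np.asarray(bytes_in, dtype=np.int64)

# Forward differences: bytes gained per second between consecutive samples.
# The result has one element fewer than the inputs.
rate = np.diff(b) / np.diff(t)

# np.gradient returns one estimate per sample, using central differences
# in the interior and one-sided differences at the two ends.
gradient = np.gradient(b.astype(float), t.astype(float))

print(rate)
print(gradient)
```

`np.diff` gives the average rate over each interval, while `np.gradient` keeps the original length, which is convenient when the result must line up with the timestamps.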

How to find the closest word to a vector using word2vec

Question: I have just started using Word2vec, and I was wondering how to find the closest word to a given vector. Suppose I have this vector, which is the average of a set of vectors:

    array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)

Is there a straightforward way to find the most similar word in my training data to this vector? Or is the only solution to calculate the cosine similarity between this vector and the vector of each word in my training data, then select the closest ...
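The brute-force approach described in the question can be sketched in a few lines of NumPy. The vocabulary and vectors below are hypothetical toy values, not output from a trained model, and `closest_word` is an illustrative helper name. gensim's `KeyedVectors` also appears to provide a `similar_by_vector` method for this purpose, which may be worth checking against its documentation.

```python
import numpy as np

# Hypothetical toy vocabulary; real vectors would come from the trained model.
words = ["king", "queen", "apple"]
vectors = np.array(
    [[0.9, 0.1, 0.0],
     [0.8, 0.3, 0.1],
     [0.0, 0.2, 0.9]],
    dtype=np.float32,
)

def closest_word(query, words, vectors):
    """Return the word whose vector has the highest cosine similarity to query."""
    # Normalise rows and the query so a dot product equals cosine similarity.
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    similarities = unit_vectors @ unit_query
    best = int(np.argmax(similarities))
    return words[best], float(similarities[best])

# Average of the first two word vectors, as in the question's setup.
query = vectors[:2].mean(axis=0)
word, score = closest_word(query, words, vectors)
print(word, score)
```

Normalising the whole matrix once and reusing it keeps each lookup down to a single matrix-vector product.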

Python Pandas join dataframes on index

I am trying to join two DataFrames on the same column, "Date". The code is as follows:

    import pandas as pd
    from datetime import datetime
    df_train_csv = pd.read_csv('./train.csv', parse_dates=['Date'], index_col='Date')
    start = datetime(2010, 2, 5)
    end = datetime(2012, 10, 26)
    df_train_fly = pd.date_range(start, end, freq="W-FRI")
    df_train_fly = pd.DataFrame(pd.Series(df_train_fly), columns=['Date'])
    merged = df_train_csv.join(df_train_fly.set_index(['Date']), on=['Date'], how='right', lsuffix='_x')

It complains that the DataFrame df_train_csv has no column named "Date". I'd like to set "Date" in both ...
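Because `index_col='Date'` moves "Date" out of the columns and into the index, passing `on=['Date']` has no column to look up. One possible fix, sketched here with a small made-up stand-in for train.csv (the "Sales" column is hypothetical), is to give both frames a "Date" index and let `join` align on it.

```python
import pandas as pd

# Hypothetical stand-in for train.csv, already indexed by "Date".
df_train_csv = pd.DataFrame(
    {"Sales": [10.0, 20.0]},
    index=pd.DatetimeIndex(["2010-02-05", "2010-02-19"], name="Date"),
)

# Weekly Friday dates used as the index of an otherwise empty frame.
fridays = pd.date_range("2010-02-05", "2010-02-26", freq="W-FRI", name="Date")
df_train_fly = pd.DataFrame(index=fridays)

# With no `on=` argument, join aligns the two frames on their indexes.
# how="right" keeps every Friday, filling missing weeks with NaN.
merged = df_train_csv.join(df_train_fly, how="right")

print(merged)
```

An equivalent route is `df_train_csv.reindex(fridays)`, which avoids building the second frame at all.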

Pandas compare each row with all rows in data frame and save results in list for each row

I am trying to compare each row with all other rows in a pandas DataFrame using fuzzywuzzy.fuzz.partial_ratio() >= 85 and write the results to a list for each row.

in:

    df = pd.DataFrame(
        {'id': [1, 2, 3, 4, 5, 6],
         'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})

Using a pandas function with the fuzzywuzzy library, I want to get this result:

out:

    id  name      match_id_list
    1   dog       [4, 5]
    2   cat       [3, ]
    3   mad cat   [2, ]
    4   good dog  [1, 5]
    5   bad dog   [1, 4]
    6   chicken   []

But I don't understand how to get this.

The first step would be to find the indices that match the condition for a given name. Since partial_ratio only takes strings, we apply ...
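The row-against-all-rows comparison can be sketched with the scoring function passed in as a parameter. To keep the example self-contained, it uses a simple stand-in scorer (`containment_score`, a hypothetical helper) instead of fuzzywuzzy; with fuzzywuzzy installed, `fuzz.partial_ratio` could be passed in its place, and the matches may then differ from the stand-in's. `match_id_lists` is also an illustrative name, not part of any library.

```python
import pandas as pd

def match_id_lists(df, scorer, threshold=85):
    """For each row, list the ids of the other rows whose name scores >= threshold."""
    matches = []
    for row_id, name in zip(df["id"], df["name"]):
        # Compare against every row except the current one.
        others = df[df["id"] != row_id]
        scores = others["name"].apply(lambda other: scorer(name, other))
        matches.append(others.loc[scores >= threshold, "id"].tolist())
    return df.assign(match_id_list=pd.Series(matches, index=df.index))

def containment_score(a, b):
    # Hypothetical stand-in scorer: 100 if one string contains the other, else 0.
    return 100 if a in b or b in a else 0

df = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5, 6],
     "name": ["dog", "cat", "mad cat", "good dog", "bad dog", "chicken"]})

result = match_id_lists(df, containment_score)
print(result)
```

This loop is quadratic in the number of rows, which is fine for small frames but may need blocking or indexing for large ones.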

How to plot two DataFrame on same graph for comparison

I have two DataFrames (trail1 and trail2) with the following columns: Genre, City, and Number Sold. Now I want to create a bar graph of both data sets for a side-by-side comparison of Genre vs. total Number Sold. For each genre, I want two bars: one representing trail 1 and the other representing trail 2. How can I achieve this using Pandas? I tried the following approach, which did NOT work:

    gf1 = df1.groupby(['Genre'])
    gf2 = df2.groupby(['Genre'])
    gf1Plot = gf1.sum().unstack().plot(kind='bar', stacked=False)
    gf2Plot = gf2.sum().unstack().plot(kind='bar', ax=gf1Plot, stacked=False)

I want to ...
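One way to get two bars per genre is to combine the per-genre totals into a single DataFrame with one column per data set, since pandas draws one bar per column within each row group. The sketch below uses made-up sample rows and leaves the plot call as a comment so it runs without matplotlib.

```python
import pandas as pd

# Hypothetical sample data with the columns named in the question.
trail1 = pd.DataFrame({
    "Genre": ["Rock", "Jazz", "Rock"],
    "City": ["Austin", "Boston", "Denver"],
    "Number Sold": [5, 3, 2],
})
trail2 = pd.DataFrame({
    "Genre": ["Rock", "Pop"],
    "City": ["Austin", "Boston"],
    "Number Sold": [4, 6],
})

# Total Number Sold per genre for each data set.
totals1 = trail1.groupby("Genre")["Number Sold"].sum()
totals2 = trail2.groupby("Genre")["Number Sold"].sum()

# Align the two Series on Genre; genres missing from one side get 0.
combined = pd.DataFrame({"trail1": totals1, "trail2": totals2}).fillna(0)

# With matplotlib installed, each genre becomes a group of two bars:
# ax = combined.plot(kind="bar")
print(combined)
```

Building one combined frame avoids overlaying two separate plots on the same axes, where the bars would be drawn on top of each other.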

How to capture raw signal from wireless router?

I have seen several projects now which derive novel spatial information from radio data collected from a typical wireless router:

    http://wisee.cs.washington.edu/
    http://www.extremetech.com/extreme/133936-using-wifi-to-see-through-walls

The idea of using a wireless router as a sort of passive radar is fantastic. I am very interested in experimenting with data collected from a wireless router myself, but there is little information on how to go about actually interfacing with a wireless router and getting a raw stream of information collected by the device. Similar questions have been asked on ...

Non-linear regression models in PostgreSQL using R

Background

I have climate data (temperature, precipitation, snow depth) for all of Canada between 1900 and 2009. I have written a basic website, and the simplest page allows users to choose category and city. They then get back a very simple report (without the parameters and calculations section).

The primary purpose of the web application is to provide a simple user interface so that the general public can explore the data in meaningful ways. (A list of numbers is not meaningful to the general public, nor is a website that provides too many inputs.) The secondary purpose of the application ...