pandas

How can I manage units in pandas data?

烈酒焚心 submitted on 2021-02-06 07:03:08

Question: I'm trying to figure out if there is a good way to manage units in my pandas data. For example, I have a DataFrame that looks like this:

       length (m)  width (m)  thickness (cm)
    0         1.2        3.4             5.6
    1         7.8        9.0             1.2
    2         3.4        5.6             7.8

Currently, the measurement units are encoded in the column names. Downsides include:

- column selection is awkward: df['width (m)'] vs. df['width']
- things will likely break if the units of my source data change

If I wanted to strip the units out of the column names, is there somewhere
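One workaround hinted at in the question is to strip the units out of the column names and keep them in a side mapping. A minimal sketch, assuming the units always appear as a trailing "(unit)" suffix; the helper name strip_units is hypothetical:

```python
import re
import pandas as pd

def strip_units(df):
    """Split 'name (unit)' column labels into bare names plus a units dict.

    Assumes every unit is a parenthesized suffix; columns without one are
    left untouched.
    """
    units = {}
    renamed = {}
    for col in df.columns:
        m = re.match(r"^(.*?)\s*\((.*?)\)$", col)
        if m:
            renamed[col] = m.group(1)
            units[m.group(1)] = m.group(2)
    return df.rename(columns=renamed), units

df = pd.DataFrame({"length (m)": [1.2, 7.8, 3.4],
                   "width (m)": [3.4, 9.0, 5.6],
                   "thickness (cm)": [5.6, 1.2, 7.8]})
clean, units = strip_units(df)
# Selection is now df-friendly: clean["width"] instead of df["width (m)"],
# and the units live in one place if the source data ever changes.
```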

Pandas to D3. Serializing dataframes to JSON

我与影子孤独终老i submitted on 2021-02-06 04:27:34

Question: I have a DataFrame with the following columns and no duplicates: ['region', 'type', 'name', 'value'], which can be seen as a hierarchy as follows:

    grouped = df.groupby(['region', 'type', 'name'])

I would like to serialize this hierarchy as a JSON object. If anyone is interested, the motivation behind this is to eventually put together a visualization like this one, which requires a JSON file. To do so, I need to convert grouped into the following:

    new_data['children'][i]['name'] = region
    new_data[
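One way to get the nested {'name': ..., 'children': [...]} shape that D3 hierarchy layouts expect is to recurse over the grouping levels. A sketch using the column names from the question; the leaf {'name', 'value'} shape is an assumption about the target visualization's format:

```python
import json
import pandas as pd

def to_hierarchy(df, levels):
    """Recursively nest a flat DataFrame into D3-style children lists.

    At the last level, emit {'name', 'value'} leaves from the 'name' and
    'value' columns; int() casts numpy integers so json.dumps accepts them.
    """
    if not levels:
        return [{"name": n, "value": int(v)}
                for n, v in zip(df["name"], df["value"])]
    return [{"name": key, "children": to_hierarchy(group, levels[1:])}
            for key, group in df.groupby(levels[0], sort=False)]

df = pd.DataFrame({"region": ["EU", "EU", "US"],
                   "type": ["a", "b", "a"],
                   "name": ["x", "y", "z"],
                   "value": [1, 2, 3]})
tree = {"name": "root", "children": to_hierarchy(df, ["region", "type"])}
payload = json.dumps(tree)
```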

Get Feeds from FeedParser and Import to Pandas DataFrame

孤街浪徒 submitted on 2021-02-05 20:36:45

Question: I'm learning Python. As practice, I'm building an RSS scraper with feedparser, putting the output into a pandas DataFrame, and trying to mine it with NLTK, but first I'm getting a list of articles from multiple RSS feeds. I used this post on how to pass multiple feeds and combined it with an answer I got previously to another question on how to get it into a pandas DataFrame. The problem is that I want to be able to see the data from all the feeds in my dataframe. Currently I'm only able to access
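The usual cause of seeing only one feed is rebuilding the DataFrame inside the loop instead of accumulating rows across all feeds first. A sketch of the accumulation pattern; the parsed_feeds dicts below are hypothetical stand-ins for what feedparser.parse(url) returns, so the example runs without network access:

```python
import pandas as pd

# Stand-ins for feedparser.parse(url) results; in real use, loop over your
# feed URLs and call feedparser.parse on each one.
parsed_feeds = [
    {"entries": [{"title": "A", "link": "http://a", "summary": "first"}]},
    {"entries": [{"title": "B", "link": "http://b", "summary": "second"}]},
]

def feeds_to_frame(feeds, fields=("title", "link", "summary")):
    """Flatten the entries of every feed into one row-per-article DataFrame.

    Key point: collect rows from ALL feeds into a single list, then build
    the DataFrame once at the end.
    """
    rows = []
    for feed in feeds:
        for entry in feed["entries"]:
            rows.append({f: entry.get(f, "") for f in fields})
    return pd.DataFrame(rows)

articles = feeds_to_frame(parsed_feeds)
```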

Using pandas to read text file with leading whitespace gives a NaN column

我们两清 submitted on 2021-02-05 20:35:46

Question: I am using pandas.read_csv to read a whitespace-delimited file. The file has a variable number of whitespace characters in front of every line (the numbers are right-aligned). When I read this file, it creates a column of NaN. Why does this happen, and what is the best way to prevent it?

Example text file:

     9.0  3.3  4.0
    32.3 44.3  5.1
     7.2  1.1  0.9

Command:

    import pandas as pd
    pd.read_csv("test.txt", delim_whitespace=True, header=None)

Output:

        0     1     2    3
    0 NaN   9.0   3.3  4.0
    1 NaN  32.3  44.3  5.1
    2 NaN   7.2
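In current pandas versions, sep=r'\s+' (and delim_whitespace=True, now deprecated in its favor) is treated as whitespace splitting that skips leading spaces, so no empty first field is produced; the NaN column described here was old-version behavior. A sketch, with a defensive dropna for older installs:

```python
from io import StringIO
import pandas as pd

text = " 9.0  3.3  4.0\n32.3 44.3  5.1\n 7.2  1.1  0.9\n"

# Whitespace splitting via sep=r'\s+' skips the leading spaces on each
# line, so the right-aligned numbers parse into exactly three columns.
df = pd.read_csv(StringIO(text), sep=r"\s+", header=None)

# Fallback for old pandas versions that still emitted an all-NaN first
# column: drop any column that is entirely NaN.
df = df.dropna(axis=1, how="all")
```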

ANOVA for groups within a dataframe using scipy

≯℡__Kan透↙ submitted on 2021-02-05 20:31:06

Question: I have a dataframe as follows, and I need to do ANOVA on it between three conditions. The dataframe looks like:

    data0 = pd.DataFrame({'Names': ['CTA15', 'CTA15', 'AC007', 'AC007', 'AC007', 'AC007'],
                          'value': [22, 22, 2, 2, 2, 5],
                          'condition': ['NON', 'NON', 'YES', 'YES', 'RE', 'RE']})

I need to run the ANOVA test between the condition pairs YES/NON, NON/RE and YES/RE, for each of the Names. I know I could do it like this:

    NON = df.query('condition == "NON" and Names == "CTA15"')
    no = df.value
    YES = df
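Rather than hand-writing one query per pair, the pairwise tests can be generated from a groupby. A sketch using scipy.stats.f_oneway over every pair of conditions present for each Name (pairs with a condition missing for that Name are simply skipped):

```python
from itertools import combinations
import pandas as pd
from scipy import stats

data0 = pd.DataFrame({'Names': ['CTA15', 'CTA15', 'AC007', 'AC007', 'AC007', 'AC007'],
                      'value': [22, 22, 2, 2, 2, 5],
                      'condition': ['NON', 'NON', 'YES', 'YES', 'RE', 'RE']})

results = {}
for name, sub in data0.groupby('Names'):
    # One-way ANOVA for every pair of conditions observed for this Name.
    for a, b in combinations(sorted(sub['condition'].unique()), 2):
        f, p = stats.f_oneway(sub.loc[sub.condition == a, 'value'],
                              sub.loc[sub.condition == b, 'value'])
        results[(name, a, b)] = p
```

In this toy data, CTA15 only has the NON condition, so only the AC007 RE-vs-YES pair produces a p-value.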

Apply different functions to different items in group object: Python pandas

半城伤御伤魂 submitted on 2021-02-05 20:30:31

Question: Suppose I have a dataframe as follows:

    In [1]: test_dup_df
    Out[1]:
                         exe_price  exe_vol flag
    2008-03-13 14:41:07       84.5      200  yes
    2008-03-13 14:41:37       85.0    10000  yes
    2008-03-13 14:41:38       84.5    69700  yes
    2008-03-13 14:41:39       84.5     1200  yes
    2008-03-13 14:42:00       84.5     1000  yes
    2008-03-13 14:42:08       84.5      300  yes
    2008-03-13 14:42:10       84.5    88100  yes
    2008-03-13 14:42:10       84.5    11900  yes
    2008-03-13 14:42:15       84.5     5000  yes
    2008-03-13 14:42:16       84.5     3200  yes

I want to group the duplicate data at time 14:42:10 and apply
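Applying a different function to each column of the duplicated-timestamp groups can be done with groupby(level=0) and a function that returns one Series per group. A sketch on a trimmed version of the data; the choice of volume-weighted mean for price, sum for volume, and first value for the flag is an assumption about what the poster wanted:

```python
import pandas as pd

idx = pd.to_datetime(['2008-03-13 14:42:08', '2008-03-13 14:42:10',
                      '2008-03-13 14:42:10', '2008-03-13 14:42:15'])
df = pd.DataFrame({'exe_price': [84.5, 84.5, 85.0, 84.5],
                   'exe_vol': [300, 88100, 11900, 5000],
                   'flag': ['yes'] * 4}, index=idx)

def collapse(group):
    # Different function per column: volume-weighted mean for the price,
    # sum for the volume, first value for the flag.
    vol = group['exe_vol'].sum()
    return pd.Series({'exe_price': (group['exe_price'] * group['exe_vol']).sum() / vol,
                      'exe_vol': vol,
                      'flag': group['flag'].iloc[0]})

deduped = df.groupby(level=0).apply(collapse)
```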

ValueError: cannot insert ID, already exists

妖精的绣舞 submitted on 2021-02-05 20:23:41

Question: I have this data:

    ID TIME
    1  2
    1  4
    1  2
    2  3

I want to group the data by ID and calculate the mean time and the size of each group:

    ID MEAN_TIME COUNT
    1  2.67      3
    2  3.00      1

If I run this code, I get the error "ValueError: cannot insert ID, already exists":

    result = df.groupby(['ID']).agg({'TIME': 'mean', 'ID': 'count'}).reset_index()

Answer 1: Use the parameter drop=True, which does not insert the grouping index back as a column but drops it instead:

    result = df.groupby(['ID']).agg({'TIME': 'mean', 'ID': 'count'}).reset_index(drop=True)
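An alternative to dropping the index is to avoid the name collision entirely with named aggregation (available since pandas 0.25), which also produces the exact output column names asked for. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2], 'TIME': [2, 4, 2, 3]})

# Named aggregation gives the output columns fresh names (MEAN_TIME, COUNT),
# so reset_index can reinsert ID without a "cannot insert ID" error.
result = (df.groupby('ID')
            .agg(MEAN_TIME=('TIME', 'mean'), COUNT=('TIME', 'size'))
            .reset_index())
```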