data-analysis

Count of a value in consecutive timestamp in pandas

和自甴很熟 submitted on 2021-02-20 04:45:28
Question:

    Hour              Site
    01/08/2020 00:00  A
    01/08/2020 00:00  B
    01/08/2020 00:00  C
    01/08/2020 00:00  D
    01/08/2020 01:00  A
    01/08/2020 01:00  B
    01/08/2020 01:00  E
    01/08/2020 01:00  F
    01/08/2020 02:00  A
    01/08/2020 02:00  E
    01/08/2020 03:00  C
    01/08/2020 03:00  G
    …..
    01/08/2020 04:00  x
    01/08/2020 04:00  s
    …..
    01/08/2020 23:00  G
    02/08/2020 00:00  G

I have a dataframe like the one above. I want to count how many consecutive hours each site appears in, along with the start and end timestamp of each run. Each hour contains multiple sites. For
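
One possible way to approach the question above is sketched here; it is not taken from the original thread. It assumes the `Hour` strings are day-first (`%d/%m/%Y %H:%M`), uses only the rows visible in the excerpt, and treats a "run" as a stretch where a site's timestamps are exactly one hour apart. The column names `run`, `start`, `end`, and `hours` are illustrative.

```python
import pandas as pd

# Rows visible in the excerpt (the elided "….." rows are left out).
df = pd.DataFrame({
    "Hour": ["01/08/2020 00:00"] * 4 + ["01/08/2020 01:00"] * 4
            + ["01/08/2020 02:00"] * 2 + ["01/08/2020 03:00"] * 2,
    "Site": list("ABCD") + list("ABEF") + list("AE") + list("CG"),
})

# Assumption: the dates are day-first.
df["Hour"] = pd.to_datetime(df["Hour"], format="%d/%m/%Y %H:%M")
df = df.sort_values(["Site", "Hour"]).reset_index(drop=True)

# A new run starts whenever the gap to the same site's previous row
# is not exactly one hour (the first row of each site has a NaT gap).
gap = df.groupby("Site")["Hour"].diff()
df["run"] = (gap != pd.Timedelta(hours=1)).cumsum()

# One output row per run: first hour, last hour, number of hours.
runs = (
    df.groupby(["Site", "run"])["Hour"]
      .agg(start="min", end="max", hours="count")
      .reset_index()
      .drop(columns="run")
)
print(runs)
```

With this sample, site A forms a single three-hour run, while site C, which appears at 00:00 and again at 03:00, is split into two one-hour runs.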

Python: Unstacked DataFrame is too big, causing int32 overflow

坚强是说给别人听的谎言 submitted on 2021-02-19 09:44:51
Question: I have a big dataset, and when I try to run this code I get a memory error.

    user_by_movie = user_items.groupby(['user_id', 'movie_id'])['rating'].max().unstack()

Here is the error:

    ValueError: Unstacked DataFrame is too big, causing int32 overflow

I ran it on another machine and it worked fine. How can I fix this error?

Answer 1: As it turns out, this was not an issue on pandas 0.21. I am using a Jupyter notebook and I need the latest version of pandas for the rest of the code. So I did this:
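
The answer's own snippet is cut off in this excerpt, and the block below is not a reconstruction of it. As a separate, hedged illustration: the error message suggests that the unstacked result would hold more cells than an int32 can count, so a rough pre-check is to multiply the number of distinct row keys by the number of distinct column keys. The helper name `unstack_would_overflow` and the tiny sample frame are hypothetical.

```python
import numpy as np
import pandas as pd

def unstack_would_overflow(frame, row_key, col_key):
    """Rough check: would rows x columns exceed the int32 range?"""
    n_rows = frame[row_key].nunique()
    n_cols = frame[col_key].nunique()
    return n_rows * n_cols > np.iinfo(np.int32).max

# Hypothetical miniature version of the question's user_items frame.
user_items = pd.DataFrame({
    "user_id": [1, 1, 2],
    "movie_id": [10, 11, 10],
    "rating": [4, 5, 3],
})

if not unstack_would_overflow(user_items, "user_id", "movie_id"):
    user_by_movie = (
        user_items.groupby(["user_id", "movie_id"])["rating"].max().unstack()
    )
```

If the check returns `True`, reducing the number of distinct users or movies before unstacking is one way to stay under the limit.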

read csv in a for loop using pandas

半城伤御伤魂 submitted on 2021-02-19 08:10:23
Question:

    inp_file=os.getcwd()
    files_comp = pd.read_csv(inp_file,"B00234*.csv", na_values = missing_values, nrows=10)
    for f in files_comp:
        df_calculated = pd.read_csv(f, na_values = missing_values, nrows=10)
        col_length=len(df.columns)-1

Hi folks, how can I read 4 CSV files in a for loop? I get an error when reading the CSVs in the format above. Kindly help me.

Answer 1: You basically need this: get a list of all target files,

    files=os.listdir(path)

and then keep only the filenames that start with your
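
The approach the answer outlines (list the directory, then filter by filename) might look like the sketch below. The prefix `B00234` and the `.csv` suffix come from the pattern in the question; the function name `read_matching_csvs`, the default `missing_values`, and the demo files are hypothetical.

```python
import os
import tempfile

import pandas as pd

def read_matching_csvs(folder, prefix="B00234", missing_values=("NA",), nrows=10):
    """Read every CSV in `folder` whose filename starts with `prefix`."""
    frames = {}
    for name in sorted(os.listdir(folder)):
        if name.startswith(prefix) and name.endswith(".csv"):
            path = os.path.join(folder, name)
            frames[name] = pd.read_csv(
                path, na_values=list(missing_values), nrows=nrows
            )
    return frames

# Demo with throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    for name in ["B00234_a.csv", "B00234_b.csv", "other.csv"]:
        with open(os.path.join(folder, name), "w") as fh:
            fh.write("x,y\n1,NA\n2,3\n")

    frames = read_matching_csvs(folder)
    # Same per-file calculation as in the question.
    col_lengths = {name: len(df.columns) - 1 for name, df in frames.items()}
```

Note that `pd.read_csv` takes one file at a time, which is why the wildcard pattern in the question cannot be passed to it directly.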

How to calculate time difference between two pandas column [duplicate]

久未见 submitted on 2021-02-16 22:40:30
Question: This question already has answers here: Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes (3 answers). Closed 2 years ago.

My df looks like this:

                     start                stop
    0  2015-11-04 10:12:00 2015-11-06 06:38:00
    1  2015-11-04 10:23:00 2015-11-05 08:30:00
    2  2015-11-04 14:01:00 2015-11-17 10:34:00
    4  2015-11-19 01:43:00 2015-12-21 09:04:00

    print(time_df.dtypes)
    start    datetime64[ns]
    stop     datetime64[ns]
    dtype: object

I am trying to find the time difference between stop and start.
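
Because both columns are already `datetime64[ns]`, subtracting them yields a timedelta column. A minimal sketch using the rows shown in the question follows; the new column names `diff` and `diff_hours` are illustrative.

```python
import pandas as pd

# The four rows shown in the question, with the same index labels.
time_df = pd.DataFrame(
    {
        "start": pd.to_datetime([
            "2015-11-04 10:12:00", "2015-11-04 10:23:00",
            "2015-11-04 14:01:00", "2015-11-19 01:43:00",
        ]),
        "stop": pd.to_datetime([
            "2015-11-06 06:38:00", "2015-11-05 08:30:00",
            "2015-11-17 10:34:00", "2015-12-21 09:04:00",
        ]),
    },
    index=[0, 1, 2, 4],
)

# Subtracting datetime columns gives a timedelta64[ns] column.
time_df["diff"] = time_df["stop"] - time_df["start"]

# Convert to a plain number of hours if a numeric value is needed.
time_df["diff_hours"] = time_df["diff"].dt.total_seconds() / 3600
print(time_df)
```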

How to do intersection of dataframes in pandas

女生的网名这么多〃 submitted on 2021-02-11 14:53:31
Question: I have a dataframe like the following:

       Title                                              ASIN        State        SellerSKU  Quantity  FBAStock  QuantityToShip
    1  Daedal crafters- Pack of Two Gajra (Orange and...  B075T64ZWJ  WEST BENGAL  DC216      1         0         1
    2  Daedal Dream Catchers - Intricate Web
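
The question is cut off before it says which frames are being intersected, so the following is only a sketch under an assumption: "intersection" means keeping the rows whose key appears in both frames. It uses an inner `merge` on `SellerSKU`; the second frame `stock` and every SKU other than `DC216` are hypothetical.

```python
import pandas as pd

# Hypothetical frames; only DC216 and the column names come from the question.
orders = pd.DataFrame({
    "SellerSKU": ["DC216", "DC217", "DC300"],
    "Quantity": [1, 2, 1],
})
stock = pd.DataFrame({
    "SellerSKU": ["DC216", "DC300", "DC999"],
    "FBAStock": [0, 5, 7],
})

# An inner merge keeps only the keys present in both frames.
common = orders.merge(stock, on="SellerSKU", how="inner")
print(common)
```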

Pandas parsing csv error - expected 1 fields found 9

拈花ヽ惹草 submitted on 2021-02-08 10:50:23
Question: I'm trying to parse a .csv file:

    planets = pd.read_csv("planets.csv", sep=',')

But I always end up with this error:

    ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 9

This is what the first few lines of my CSV file look like:

    # This file was produced by the test
    # Tue Apr 3 06:03:27 2018
    #
    # COLUMN pl_hostname: Host Name
    # COLUMN pl_discmethod: Discovery Method
    # COLUMN pl_pnum: Number of Planets in System
    # COLUMN pl_orbper: Orbital Period [days]
    # COLUMN pl
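
The file starts with `#`-prefixed description lines, and a likely cause of the error is that the parser infers a single field from those lines and then hits the real nine-column data. One hedged way around this is the `comment` parameter of `pd.read_csv`, which makes the parser skip lines that are entirely comments. The sketch below reads from an in-memory string instead of `planets.csv`; the header is shortened to two columns, and the data row is hypothetical.

```python
import io

import pandas as pd

# Shortened stand-in for planets.csv; the data row is made up.
raw = """# This file was produced by the test
# Tue Apr 3 06:03:27 2018
#
# COLUMN pl_hostname: Host Name
# COLUMN pl_discmethod: Discovery Method
pl_hostname,pl_discmethod
hypothetical-host,Transit
"""

# comment="#" tells the parser to ignore everything after a '#',
# so fully commented lines are skipped before the header is read.
planets = pd.read_csv(io.StringIO(raw), sep=",", comment="#")
print(planets)
```

On the real file, the equivalent call would be `pd.read_csv("planets.csv", sep=",", comment="#")`, assuming no data value itself contains a `#`.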
