pandas

Python location, show distance from closest other location

可紊 submitted on 2021-02-07 12:09:18
Question: I have a location in a dataframe, under the lat and lon columns. I want to show how far it is from the lat/lon of the nearest train station, which is stored in a separate dataframe. For example, I have a lat/lon of (37.814563 144.970267) and a list of other geospatial points, shown below. I want to find the closest point and then store the distance between the two points as an extra column in the suburbs dataframe. This is an example of the train dataset: <bound method NDFrame.to_clipboard
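One possible way to approach the nearest-station lookup is a pairwise haversine distance matrix in NumPy. This is a minimal sketch, not the asker's code: the column names `lat`, `lon`, `name`, the helper names, and the sample coordinates for the stations are all assumptions made for illustration, and the approach assumes the two tables are small enough for a full suburbs-by-stations matrix to fit in memory.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in decimal degrees, broadcastable."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against tiny floating-point overshoot outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def add_nearest_station(suburbs, stations):
    """Return a copy of `suburbs` with nearest-station name and distance."""
    # Distance matrix: one row per suburb, one column per station.
    dist = haversine_km(
        suburbs["lat"].to_numpy()[:, None], suburbs["lon"].to_numpy()[:, None],
        stations["lat"].to_numpy()[None, :], stations["lon"].to_numpy()[None, :],
    )
    nearest = dist.argmin(axis=1)
    out = suburbs.copy()
    out["nearest_station"] = stations["name"].to_numpy()[nearest]
    out["distance_km"] = dist[np.arange(len(suburbs)), nearest]
    return out


# Hypothetical sample data; only the first coordinate pair comes from the question.
suburbs = pd.DataFrame({"suburb": ["A"], "lat": [37.814563], "lon": [144.970267]})
stations = pd.DataFrame({
    "name": ["Station X", "Station Y"],
    "lat": [37.8183, 37.8100],
    "lon": [144.9671, 144.9630],
})
result = add_nearest_station(suburbs, stations)
print(result)
```

For large tables, a spatial index would likely scale better than the full matrix used here.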

Will passing ignore_index=True to pd.concat preserve index succession within dataframes that I'm concatenating?

╄→尐↘猪︶ㄣ submitted on 2021-02-07 12:05:49
Question: I have two dataframes:

    df1 =
      value
    0 a
    1 b
    2 c

    df2 =
      value
    0 d
    1 e

I need to concatenate them along the index, preserving the index of the first dataframe and continuing it in the second, like this:

    result =
      value
    0 a
    1 b
    2 c
    3 d
    4 e

My guess is that pd.concat([df1, df2], ignore_index=True) will do the job. However, I'm worried that for large dataframes the order of the rows may change and I'll end up with something like this (first two rows changed indices): result =
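A small check of the behaviour in question, using the two frames from the excerpt. `pd.concat` stacks its inputs in list order and keeps each frame's internal row order; `ignore_index=True` only replaces the resulting index with 0..n-1. The sketch below illustrates that on the sample data and is not a proof for every input.

```python
import pandas as pd

df1 = pd.DataFrame({"value": ["a", "b", "c"]})
df2 = pd.DataFrame({"value": ["d", "e"]})

# Rows keep their order; only the index labels are regenerated.
result = pd.concat([df1, df2], ignore_index=True)
print(result)
```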

'DataFrame' object has no attribute 'reshape'

依然范特西╮ submitted on 2021-02-07 11:54:32
Question: I want to reshape some data in a CSV file without a header, but I keep getting this error:

    AttributeError: 'DataFrame' object has no attribute 'reshape'

This is my script. I want to reshape the data in the 2nd column only:

    import pandas as pd
    df = pd.read_csv("test.csv", header=None, usecols=[1])
    start = 0
    for i in range(0, len(df.index)):
        if (i + 1)%10 == 0:
            result = df.iloc[start:i+1].reshape(2,5)
            start = i + 1
            print result

Here is the CSV:

    1,52.1
    2,32.2
    3,44.6
    3,99.1
    5,12.3
    3,43.2
    7,79.4
    8,45.5
    9,56
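The error arises because `reshape` is a NumPy array method, not a DataFrame method. One way around it, sketched here under assumptions, is to convert the column to an array first with `to_numpy()` and reshape each block of 10 values. The in-memory CSV is a hypothetical stand-in for `test.csv`, and dropping any trailing partial block is a choice made for this sketch.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv: 20 rows, no header.
csv_text = "\n".join(f"{i},{i * 1.5}" for i in range(1, 21))
df = pd.read_csv(io.StringIO(csv_text), header=None, usecols=[1])

# With header=None the second column is labelled 1.
values = df[1].to_numpy()

# Reshape each complete block of 10 values into 2 rows of 5.
full_length = len(values) - len(values) % 10
chunks = [values[start:start + 10].reshape(2, 5)
          for start in range(0, full_length, 10)]

for chunk in chunks:
    print(chunk)
```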

AttributeError: 'ElementTree' object has no attribute 'getiterator' when trying to import excel file

天涯浪子 submitted on 2021-02-07 11:29:35
Question: This is my code. I've just installed JupyterLab and added the Excel file there. I get the same error if I change the path to where the file is on my system. I can't seem to find anyone who had the same problem when simply importing an Excel file as a dataframe. The Excel file is a 3x26 table with studentnr, course, and result columns that have values like 101-105, A-D, and 1.0-9.9 respectively. Maybe the problem lies with the Excel file? Either way, I have no idea how to fix this. import pandas as pd

Pandas GroupBy and select rows with the minimum value in a specific column

梦想的初衷 submitted on 2021-02-07 11:24:26
Question: I am grouping my dataset by column A and would then like to take the minimum value in column B and the corresponding value in column C.

    data = pd.DataFrame({'A': [1, 2], 'B':[ 2, 4], 'C':[10, 4]})
    data
       A  B  C
    0  1  4  3
    1  1  5  4
    2  1  2  10
    3  2  7  2
    4  2  4  4
    5  2  6  6

and I would like to get:

       A  B  C
    0  1  2  10
    1  2  4  4

For the moment I am grouping by A and creating a value that indicates which rows I will keep in my dataset:

    a = data.groupby('A').min()
    a['A'] = a.index
    to_keep = [str(x[0]) + str(x[1]) for
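A common alternative to building string keys is to look up the row label of each group's minimum with `idxmin` and select those rows with `loc`. The sketch below uses the six-row table displayed in the excerpt (not the two-row constructor call, which does not match it) and assumes the index labels are unique.

```python
import pandas as pd

data = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": [4, 5, 2, 7, 4, 6],
    "C": [3, 4, 10, 2, 4, 6],
})

# Index label of the row holding the smallest B within each A group.
rows = data.groupby("A")["B"].idxmin()

# Select those whole rows, so C comes from the same row as the minimum B.
result = data.loc[rows].reset_index(drop=True)
print(result)
```

If a group has ties for the minimum, `idxmin` keeps only the first occurrence.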

Difference of elements in list in PySpark

白昼怎懂夜的黑 submitted on 2021-02-07 10:59:28
Question: I have a PySpark dataframe (df) with a column that contains lists of two elements. The two elements in each list are not sorted in ascending or descending order.

    +--------+----------+-------+
    | version| timestamp| list  |
    +--------+----------+-------+
    | v1     |2012-01-10| [5,2] |
    | v1     |2012-01-11| [2,5] |
    | v1     |2012-01-12| [3,2] |
    | v2     |2012-01-12| [2,3] |
    | v2     |2012-01-11| [1,2] |
    | v2     |2012-01-13| [2,1] |
    +--------+----------+-------+

I want to take the difference between the first and the

Replace certain values based on pattern and extract substring in pandas

流过昼夜 submitted on 2021-02-07 10:55:49
Question: I have a pandas dataframe with a column col1 that contains various dates:

    col1
    Q2 '20
    Q1 '21
    May '20
    June '20
    25/05/2020
    Q4 '20+Q1 '21
    Q2 '21+Q3 '21
    Q4 '21+Q1 '22

I want to replace the values in col1 that match a pattern. For values that contain two quarters joined by "+", I want to return a season name as a string plus the first year contained in the pattern. I want to leave the other values as they are. For example:

    1) Q4 '20+Q1 '21 should be 'Winter 20'
    2) Q2 '21+Q3 '21 should be 'Summer 21'
    3) Q4 '21+Q1 '22 should
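The replacement rule described above can be sketched with a regular expression and `Series.map`. The names `SEASONS`, `PAIR`, and `to_season` are hypothetical helpers introduced here. The quarter-to-season mapping covers only the two pairs the excerpt spells out (Q4+Q1 as Winter, Q2+Q3 as Summer); any other pair is left unchanged, which is an assumption. The expected output of the third example is cut off in the excerpt, so what the code produces for it simply follows from the same rule.

```python
import re

import pandas as pd

df = pd.DataFrame({"col1": [
    "Q2 '20", "Q1 '21", "May '20", "June '20", "25/05/2020",
    "Q4 '20+Q1 '21", "Q2 '21+Q3 '21", "Q4 '21+Q1 '22",
]})

# Only the quarter pairs named in the question are mapped (assumption).
SEASONS = {("Q4", "Q1"): "Winter", ("Q2", "Q3"): "Summer"}

# Matches "Qn 'yy+Qm 'yy" and captures both quarters and the first year.
PAIR = re.compile(r"^(Q[1-4]) '(\d{2})\+(Q[1-4]) '\d{2}$")


def to_season(value):
    """Return 'Season yy' for a known quarter pair, else the value unchanged."""
    match = PAIR.match(value)
    if match is None:
        return value
    season = SEASONS.get((match.group(1), match.group(3)))
    return f"{season} {match.group(2)}" if season else value


df["col1"] = df["col1"].map(to_season)
print(df)
```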

Connect to DB using LDAP with python cx_Oracle

北战南征 submitted on 2021-02-07 10:52:52
Question: I have a set of Python scripts that use cx_Oracle to connect to a remote DB. This is a large project, where these connections are used several times. Additionally, I produce an .exe file that is distributed and should be as self-contained as possible. In other words, if I send you the .exe, you should be able to run it without any extra tinkering (I use pyinstaller). Right now, I get a connection using:

    ip = 'myhost.example.pt'
    port = 1521
    SID = 'MYDB_PRD.EXAMPLE.PT'
    dsn_tns = cx_Oracle.makedsn