pandas | 易学教程

Python location, show distance from closest other location

Question: I have a location in a dataframe, stored under lat and lon columns. I want to show how far it is from the lat/lon of the nearest train station, which is held in a separate dataframe. For example, I have a lat/lon of (37.814563 144.970267) and a list of other geospatial points, shown below. I want to find the closest point and then add the distance between the two points as an extra column in the suburbs dataframe. This is an example of the train dataset: <bound method NDFrame.to_clipboard
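
One possible approach, sketched under assumptions: compute the great-circle (haversine) distance from every suburb to every station with NumPy broadcasting, then take the minimum per row. The station column `name`, the sample coordinates, and the helper names `haversine_km` and `add_nearest_station` are hypothetical; only the `lat` and `lon` column names come from the question.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs are degrees and may be arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def add_nearest_station(suburbs, stations):
    """Return a copy of suburbs with the nearest station and its distance."""
    # Distance matrix: one row per suburb, one column per station.
    dist = haversine_km(
        suburbs["lat"].to_numpy()[:, None], suburbs["lon"].to_numpy()[:, None],
        stations["lat"].to_numpy()[None, :], stations["lon"].to_numpy()[None, :],
    )
    nearest = dist.argmin(axis=1)  # column index of the closest station
    out = suburbs.copy()
    out["nearest_station"] = stations["name"].to_numpy()[nearest]
    out["distance_km"] = dist[np.arange(len(suburbs)), nearest]
    return out


# Made-up sample data for illustration only.
suburbs = pd.DataFrame({"lat": [37.814563], "lon": [144.970267]})
stations = pd.DataFrame({
    "name": ["Station A", "Station B"],
    "lat": [37.818, 37.900],
    "lon": [144.967, 145.100],
})
result = add_nearest_station(suburbs, stations)
print(result)
```

The full distance matrix needs memory proportional to suburbs times stations, so for very large inputs a spatial index would likely be a better fit than this brute-force sketch.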

Will passing ignore_index=True to pd.concat preserve index succession within dataframes that I'm concatenating?

Question: I have two dataframes: df1 = value 0 a 1 b 2 c df2 = value 0 d 1 e I need to concatenate them along the index, but I have to preserve the index of the first dataframe and continue it in the second dataframe, like this: result = value 0 a 1 b 2 c 3 d 4 e My guess is that pd.concat([df1, df2], ignore_index=True) will do the job. However, I'm worried that for large dataframes the order of the rows may change and I'll end up with something like this (the first two rows have swapped indices): result =
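
A small check of the behaviour asked about, using the two frames from the question. `pd.concat` stacks the frames in the order they are passed, and `ignore_index=True` only relabels the resulting rows 0..n-1, so row order is not affected.

```python
import pandas as pd

df1 = pd.DataFrame({"value": ["a", "b", "c"]})
df2 = pd.DataFrame({"value": ["d", "e"]})

# Rows of df1 come first, then rows of df2; the index is rebuilt as 0..4.
result = pd.concat([df1, df2], ignore_index=True)
print(result)
```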

'DataFrame' object has no attribute 'reshape'

Question: I want to reshape some data in a CSV file without a header, but I keep getting this error: AttributeError: 'DataFrame' object has no attribute 'reshape'. This is my script; I want to reshape the data in the 2nd column only: import pandas as pd df = pd.read_csv("test.csv", header=None, usecols=[1]) start = 0 for i in range(0, len(df.index)): if (i + 1)%10 == 0: result = df.iloc[start:i+1].reshape(2,5) start = i + 1 print result Here is the CSV: 1,52.1 2,32.2 3,44.6 3,99.1 5,12.3 3,43.2 7,79.4 8,45.5 9,56
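
A minimal sketch of one way around the error: `reshape` is a NumPy array method, not a DataFrame method, so the column can be converted with `to_numpy()` first. The in-memory frame below is a stand-in for the `read_csv` call, because the CSV excerpt is truncated; its values are made up.

```python
import numpy as np
import pandas as pd

# Stand-in for: pd.read_csv("test.csv", header=None, usecols=[1])
df = pd.DataFrame({1: np.arange(20, dtype=float)})

values = df[1].to_numpy()  # plain NumPy array, which does have .reshape

# Take each complete group of 10 values and reshape it to 2 rows x 5 columns.
# A trailing group with fewer than 10 values is skipped, as in the original loop.
complete = len(values) - len(values) % 10
chunks = [values[start:start + 10].reshape(2, 5)
          for start in range(0, complete, 10)]

for chunk in chunks:
    print(chunk)
```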

AttributeError: 'ElementTree' object has no attribute 'getiterator' when trying to import excel file

Question: This is my code. I've just installed JupyterLab and added the Excel file there. I get the same error if I change the path to where the file is on my system. I can't seem to find anyone who has had the same problem when simply importing an Excel file as a dataframe. The Excel file is a 3x26 table with studentnr, course, and result columns that have values like 101-105, A-D, and 1.0-9.9 respectively. Maybe the problem lies with the Excel file? Either way, I have no idea how to fix this. import pandas as pd

Pandas GroupBy and select rows with the minimum value in a specific column

Question: I am grouping my dataset by column A and would then like to take the minimum value in column B and the corresponding value in column C. data = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2], 'B': [4, 5, 2, 7, 4, 6], 'C': [3, 4, 10, 2, 4, 6]}) data A B C 0 1 4 3 1 1 5 4 2 1 2 10 3 2 7 2 4 2 4 4 5 2 6 6 and I would like to get: A B C 0 1 2 10 1 2 4 4 For the moment I am grouping by A and creating a value that tells me which rows to keep in my dataset: a = data.groupby('A').min() a['A'] = a.index to_keep = [str(x[0]) + str(x[1]) for
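
One common way to get this result, shown as a sketch on the six-row table printed in the question: `idxmin` returns, for each group, the index label of the row holding the smallest B, and `loc` then pulls those whole rows, so C stays aligned with B.

```python
import pandas as pd

# The six-row table printed in the question.
data = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": [4, 5, 2, 7, 4, 6],
    "C": [3, 4, 10, 2, 4, 6],
})

# Index label of the minimum-B row within each group of A.
idx = data.groupby("A")["B"].idxmin()

# Select those rows in full, then renumber them from 0.
result = data.loc[idx].reset_index(drop=True)
print(result)
```

If several rows tie for the minimum within a group, `idxmin` keeps only the first of them.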

Difference of elements in list in PySpark

Question: I have a PySpark dataframe (df) with a column that contains lists of two elements. The two elements in each list are not sorted in ascending or descending order. +--------+----------+-------+ | version| timestamp| list | +--------+----------+-------+ | v1 |2012-01-10| [5,2] | | v1 |2012-01-11| [2,5] | | v1 |2012-01-12| [3,2] | | v2 |2012-01-12| [2,3] | | v2 |2012-01-11| [1,2] | | v2 |2012-01-13| [2,1] | +--------+----------+-------+ I want to take the difference between the first and the

Replace certain values based on pattern and extract substring in pandas

Question: I have a pandas DataFrame with a column col1 that contains various dates: col1 Q2 '20 Q1 '21 May '20 June '20 25/05/2020 Q4 '20+Q1 '21 Q2 '21+Q3 '21 Q4 '21+Q1 '22 I want to replace the values in col1 that match a pattern. For values that contain two quarters joined by "+", I want to return a season name as a string plus the first year contained in the pattern. I want to leave the other values as they are. For example: 1) Q4 '20+Q1 '21 should be 'Winter 20' 2) Q2 '21+Q3 '21 should be 'Summer 21' 3) Q4 '21+Q1 '22 should
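
A hedged sketch of the replacement described above, using a regular expression and a lookup table. The mapping (Q4+Q1 to Winter, Q2+Q3 to Summer) is inferred from the two complete examples in the question; other quarter pairs are left unchanged because the question does not define them. The names `SEASONS`, `PATTERN`, and `to_season` are hypothetical.

```python
import re

import pandas as pd

# Season lookup inferred from the question's examples (an assumption).
SEASONS = {("Q4", "Q1"): "Winter", ("Q2", "Q3"): "Summer"}

# Matches values such as "Q4 '20+Q1 '21" and captures both quarters
# plus the first two-digit year.
PATTERN = re.compile(r"^(Q[1-4]) '(\d{2})\+(Q[1-4]) '\d{2}$")


def to_season(value):
    """Return 'Season YY' for a known quarter pair, else the value unchanged."""
    match = PATTERN.match(value)
    if match is None:
        return value
    first_q, first_year, second_q = match.groups()
    season = SEASONS.get((first_q, second_q))
    return f"{season} {first_year}" if season else value


df = pd.DataFrame({"col1": [
    "Q2 '20", "Q1 '21", "May '20", "June '20", "25/05/2020",
    "Q4 '20+Q1 '21", "Q2 '21+Q3 '21", "Q4 '21+Q1 '22",
]})
df["col1"] = df["col1"].map(to_season)
print(df)
```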

Connect to DB using LDAP with python cx_Oracle

Question: I have a set of Python scripts that use cx_Oracle to connect to a remote DB. This is a large project, where these connections are used several times. Additionally, I produce an .exe file that is distributed and should be as self-contained as possible. In other words, if I send you the .exe, you should be able to run it without any extra tinkering (I use pyinstaller). Right now, I get a connection using ip = 'myhost.example.pt' port = 1521 SID = 'MYDB_PRD.EXAMPLE.PT' dsn_tns = cx_Oracle.makedsn
