pandas

Pandas value_counts() returns non unique values

∥☆過路亽.° submitted on 2021-02-08 11:33:32
Question: I have a dataframe of surgical activity data with 58 columns and 200,000 records. Each row corresponds to a patient encounter, and one of the columns is 'treatment_specialty'. I want to see the relative contribution of the medical specialties. I have used df['treatment_specialty'].value_counts(normalize=True) to get the relative proportions of each specialty. The output below is returned (no errors). The specialties have codes, e.g. 150 is neurosurgery. df.head() 150 0
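The excerpt is cut off before the output, so the cause of the repeated values is not visible here. One common cause, assumed for this sketch, is that the same code is stored with different types or stray whitespace (for example the integer 150 and the string "150"). The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data: the same specialty code stored as int, as str,
# and as str with leading whitespace.
df = pd.DataFrame(
    {"treatment_specialty": [150, "150", 150, " 110", "110", 110]}
)

# On the raw column, 150 and "150" are counted as different values.
raw = df["treatment_specialty"].value_counts(normalize=True)

# Cast everything to stripped strings so identical codes collapse.
cleaned = df["treatment_specialty"].astype(str).str.strip()
proportions = cleaned.value_counts(normalize=True)

print(raw)
print(proportions)
```

If the codes are meant to be numeric, pd.to_numeric on the cleaned column is an alternative to keeping them as strings.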

DatabaseError : “not all arguments converted during string formatting” when I use pandas.io.sql.to_sql()

时光总嘲笑我的痴心妄想 submitted on 2021-02-08 11:26:59
Question: I have a table: I am trying to import this table with sqlalchemy; the code is: import pandas as pd import sqlalchemy as db import pandas.io.sql as sql username = 'root' password = 'root' host = 'localhost' port = '3306' database = 'classicmodels' engine = db.create_engine(f'mysql+pymysql://{username}:{password}@{host}:{port}/{database}') con = engine.raw_connection() # read into dataframe df = pd.read_sql(f'SELECT * FROM `{database}`.`offices`;', con) print(df[:2]) df_append = pd.DataFrame([{'officeCode': 8,

Multiply dataframe with values from other dataframe

回眸只為那壹抹淺笑 submitted on 2021-02-08 11:21:29
Question: I have two dataframes df1 = pd.DataFrame([[1,2],[3,4],[5,6],[7,8]], index = ['a','b','c', 'a'], columns = ['d','e']) d e a 1 2 b 3 4 c 5 6 a 7 8 df2 = pd.DataFrame([['a', 10],['b',20],['c',30],['f',40]]) 0 1 0 a 10 1 b 20 2 c 30 3 f 40 I want my final dataframe to multiply each row of df1 by the factor that df2 gives for that row's index label (e.g. 20 for b), so my output should look like d e a 10 20 b 60 80 c 150 180 a 70 80 Kindly provide a solution assuming df1 to be hundreds of rows in
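One possible approach, sketched with the two dataframes from the question: turn df2 into a lookup Series keyed by label, map it over df1's index, and multiply row-wise. The names lookup, factors and result are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    index=["a", "b", "c", "a"],
    columns=["d", "e"],
)
df2 = pd.DataFrame([["a", 10], ["b", 20], ["c", 30], ["f", 40]])

# Label -> factor lookup built from df2 (column 0 holds labels, column 1 factors).
lookup = df2.set_index(0)[1]

# One factor per row of df1, in df1's row order (duplicate labels are fine).
factors = df1.index.map(lookup).to_numpy()

# Multiply each row by its factor; axis=0 broadcasts down the rows.
result = df1.mul(factors, axis=0)
print(result)
```

Labels in df1 that are missing from df2 would map to NaN, so those rows would come out as NaN; extra labels in df2 (such as 'f') are simply unused.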

how to find values within a radius from a central position of latitude and longitude value

这一生的挚爱 submitted on 2021-02-08 11:16:28
Question: I am trying to find all the values that lie within a particular radius of a central lat/lon position. The code I am using is: import numpy as np import matplotlib.pylab as pl import netCDF4 as nc import haversine f = nc.Dataset('air_temp.nc') def haversine(lon1, lat1, lon2, lat2): # convert decimal degrees to radians lon1 = np.deg2rad(lon1) lon2 = np.deg2rad(lon2) lat1 = np.deg2rad(lat1) lat2 = np.deg2rad(lat2) # haversine formula dlon = lon2 - lon1 dlat = lat2 - lat1 a
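The question's code is cut off inside the haversine function, so the following is a standalone sketch of the general technique rather than the author's code: compute great-circle distances with the standard haversine formula and keep the grid points inside the radius with a boolean mask. The grid, the centre point, the radius and the 6371 km Earth radius are all assumptions; in practice the lat/lon arrays would come from the netCDF file.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km; inputs in decimal degrees, arrays allowed."""
    lon1, lat1, lon2, lat2 = map(np.deg2rad, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Hypothetical 1-degree grid standing in for the file's lat/lon variables.
lats = np.arange(-5.0, 6.0, 1.0)
lons = np.arange(70.0, 81.0, 1.0)
lon_grid, lat_grid = np.meshgrid(lons, lats)
values = np.random.default_rng(0).normal(size=lat_grid.shape)

# Hypothetical centre and radius.
center_lon, center_lat, radius_km = 75.0, 0.0, 150.0

dist = haversine_km(center_lon, center_lat, lon_grid, lat_grid)
mask = dist <= radius_km   # True where the grid point lies inside the radius
inside = values[mask]      # 1-D array of the selected values
print(mask.sum(), "grid points within", radius_km, "km")
```

Note that the original snippet imports a module named haversine and then defines a function with the same name, which shadows the module; using a distinct function name avoids that.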

how to find values within a radius from a central position of latitude and longitude value

家住魔仙堡 submitted on 2021-02-08 11:16:05
Question: I am trying to find all the values that lie within a particular radius of a central lat/lon position. The code I am using is: import numpy as np import matplotlib.pylab as pl import netCDF4 as nc import haversine f = nc.Dataset('air_temp.nc') def haversine(lon1, lat1, lon2, lat2): # convert decimal degrees to radians lon1 = np.deg2rad(lon1) lon2 = np.deg2rad(lon2) lat1 = np.deg2rad(lat1) lat2 = np.deg2rad(lat2) # haversine formula dlon = lon2 - lon1 dlat = lat2 - lat1 a

Pandas read Json - Trailing Data

余生长醉 submitted on 2021-02-08 11:14:11
Question: I am trying to read a large JSON file with pd.read_json, but it fails with: ValueError: Trailing data. My research here has not turned up a solution, so I would like to ask for your help. I ran the file through a JSON validator and the output is below. How can I fix this? Thank you. Answer 1: The error message you presented contains the precise location of the source of the problem: (At line #191), (At position #1) Look at the indicated place in your JSON file. Edit A weird detail in your file
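The answer above is cut off, so the actual defect in the asker's file is not known. One frequent cause of "Trailing data", assumed for this sketch, is a file that holds one JSON object per line (newline-delimited JSON) instead of a single JSON document; pd.read_json can parse that layout with lines=True. The sample data is hypothetical.

```python
import io

import pandas as pd

# Hypothetical newline-delimited JSON: one object per line.
ndjson = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

# Parsing it as a single JSON document fails.
error_message = None
try:
    pd.read_json(io.StringIO(ndjson))
except ValueError as exc:
    error_message = str(exc)
print("default parse failed with:", error_message)

# lines=True tells pandas to parse each line as a separate record.
df = pd.read_json(io.StringIO(ndjson), lines=True)
print(df)
```

If the file is not line-delimited, the position reported by the validator is still the place to inspect, as the answer suggests.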

Pandas read Json - Trailing Data

隐身守侯 submitted on 2021-02-08 11:12:07
Question: I am trying to read a large JSON file with pd.read_json, but it fails with: ValueError: Trailing data. My research here has not turned up a solution, so I would like to ask for your help. I ran the file through a JSON validator and the output is below. How can I fix this? Thank you. Answer 1: The error message you presented contains the precise location of the source of the problem: (At line #191), (At position #1) Look at the indicated place in your JSON file. Edit A weird detail in your file

Pandas Rolling Python to create new Columns

浪子不回头ぞ submitted on 2021-02-08 11:09:47
Question: I have a dataframe loaded from an Excel file with data like this: A B Sum A Sum B 0 1 235353.21333333332 2 89160.59999999999 3 188382.98666666663 4 104677.1466666667 5 207723.25333333333 6 170128.02666666667 7 165287.5 8 44863.200000000004 9 177096.72 10 97687.71666666666 655447.7167 824912.6467 11 113207.76333333334 533302.2667 824912.6467 12 195151.2 444141.6667 1020063.847 13 151408.4433333333 255758.68 1171472.29 14 50865.66999999999 255758.68 1117660.813 15 84536.19000000002 255758.68
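The excerpt ends before the question states how 'Sum A' and 'Sum B' are defined, so the sketch below does not reproduce those columns. It only illustrates the two generic pandas tools the title points at, a fixed-window rolling sum and a running total, on hypothetical data with a hypothetical window of 3.

```python
import pandas as pd

# Hypothetical values standing in for one numeric column of the sheet.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

# Sum over the current row and the two rows before it; the first two
# rows have no full window and are left as NaN.
df["rolling_sum_3"] = df["A"].rolling(window=3).sum()

# Running total from the first row to the current row.
df["running_total"] = df["A"].cumsum()

print(df)
```

Passing min_periods=1 to rolling() would fill the leading rows with partial-window sums instead of NaN.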

Plotly Dash: Why is my figure failing to show with a multi dropdown selection?

末鹿安然 submitted on 2021-02-08 11:08:10
Question: I am building a simple Python dashboard using dash and plotly. I am also new to Python (as is probably evident!) and I'm happy for any/all corrections. I would like to plot a time series of data from a pre-determined CSV file. I have added a dropdown selection box with which I would like to allow multiple different columns to be plotted. Sample data: "TOA5","HE605_RV50_GAF","CR6","7225","CR6.Std.07","CPU:BiSP5_GAF_v2d.CR6","51755","SensorStats" "TIMESTAMP","RECORD","BattV_Min","BattV_Avg",

Converting CSV file to HDF5 using pandas

三世轮回 submitted on 2021-02-08 11:07:15
Question: When I use pandas to convert CSV files to HDF5 files, the resulting file is extremely large. For example, a test CSV file (23 columns, 1.3 million rows) of 170 MB results in an HDF5 file of 2 GB. However, if pandas is bypassed and the HDF5 file is written directly (using pytables), it is only 20 MB. In the following code (which does the conversion in pandas), the values of the object columns in the dataframe are explicitly converted to string objects (to prevent pickling): # Open the csv file