pandas | 易学教程

Pandas value_counts() returns non-unique values

Question: I have a dataframe of surgical activity data with 58 columns and 200,000 records. Each row corresponds to a patient encounter, and one column, 'treatment_specialty', holds the treatment specialty. I want to see the relative contribution of each medical specialty, so I used df['treatment_specialty'].value_counts(normalize=True) to get the relative proportion of each specialty. The output below is returned (no errors). The specialties have codes, e.g. 150 is neurosurgery. df.head() 150 0
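One common reason value_counts() appears to list the same code more than once is that the column mixes types, for example the integer 150 alongside the string '150', or strings with stray whitespace. Whether that is what happens here is an assumption, since the truncated output does not show it. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data: the same specialty code stored as an int, as a string,
# and with trailing whitespace
df = pd.DataFrame({"treatment_specialty": [150, "150", 150, 101, "101 "]})

# value_counts treats 150, "150", 101 and "101 " as four distinct values
raw = df["treatment_specialty"].value_counts(normalize=True)
print(raw)

# Normalise every code to a stripped string before counting
cleaned = (
    df["treatment_specialty"]
    .astype(str)
    .str.strip()
    .value_counts(normalize=True)
)
print(cleaned)
```

If the real column turns out to be clean, this check simply returns the same proportions as before.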

DatabaseError: “not all arguments converted during string formatting” when I use pandas.io.sql.to_sql()

Question: I have a table, and I am trying to import it with SQLAlchemy. The code is: import sqlalchemy as db import pandas.io.sql as sql username = 'root' password = 'root' host = 'localhost' port = '3306' database = 'classicmodels' engine = db.create_engine(f'mysql+pymysql://{username}:{password}@{host}:{port}/{database}') con = engine.raw_connection() # read into dataframe df = pd.read_sql(f'SELECT * FROM `{database}`.`offices`;', con) print(df[:2]) df_append = pd.DataFrame([{'officeCode': 8,
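A frequently cited cause of this error is passing engine.raw_connection() to pandas. Given a plain DBAPI connection that is not sqlite3, pandas falls back to its SQLite code path, whose `?` placeholders PyMySQL cannot format, and the resulting TypeError surfaces as this DatabaseError. Passing the SQLAlchemy engine itself avoids that path. The sketch below assumes this is the cause; it uses an in-memory SQLite engine as a stand-in for the mysql+pymysql URL in the question, and the table contents are hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in engine; with MySQL this would be the mysql+pymysql:// URL instead
engine = create_engine("sqlite://")

# Hypothetical existing table contents
offices = pd.DataFrame([{"officeCode": 1, "city": "City A"}])
offices.to_sql("offices", engine, index=False)

# Append new rows: pass the engine, not engine.raw_connection()
df_append = pd.DataFrame([{"officeCode": 8, "city": "City B"}])
df_append.to_sql("offices", engine, if_exists="append", index=False)

# Reading back also works directly with the engine
df = pd.read_sql("SELECT * FROM offices", engine)
print(df)
```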

Multiply dataframe with values from other dataframe

Question: I have two dataframes:
df1 = pd.DataFrame([[1,2],[3,4],[5,6],[7,8]], index = ['a','b','c', 'a'], columns = ['d','e'])
   d  e
a  1  2
b  3  4
c  5  6
a  7  8
df2 = pd.DataFrame([['a', 10],['b',20],['c',30],['f',40]])
   0   1
0  a  10
1  b  20
2  c  30
3  f  40
I want to multiply each row of df1 by the factor that df2 assigns to that row's index label (e.g. 20 for b), so my output should look like:
     d    e
a   10   20
b   60   80
c  150  180
a   70   80
Kindly provide a solution assuming df1 to be hundreds of rows in
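One way to do this, sketched below as a suggestion rather than the thread's accepted answer, is to turn df2 into a label-to-factor lookup, map df1's index through it, and multiply row-wise with axis=0. Mapping through the index avoids relying on label alignment, which handles the duplicate 'a' label poorly, and it stays vectorised for hundreds of rows.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]],
                   index=["a", "b", "c", "a"], columns=["d", "e"])
df2 = pd.DataFrame([["a", 10], ["b", 20], ["c", 30], ["f", 40]])

# Lookup Series: label in column 0 -> factor in column 1
factor_by_label = df2.set_index(0)[1]

# One factor per row of df1, looked up by its index label (duplicates are fine;
# labels missing from df2 would become NaN)
row_factors = df1.index.map(factor_by_label).to_numpy()

# Multiply each row by its own factor; axis=0 broadcasts down the rows
result = df1.mul(row_factors, axis=0)
print(result)
```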

How to find values within a radius of a central latitude and longitude position

Question: I am trying to find all the values contained within a given radius of a central lat/lon position. The code I am using is: import numpy as np import matplotlib.pylab as pl import netCDF4 as nc import haversine f = nc.Dataset('air_temp.nc') def haversine(lon1, lat1, lon2, lat2): # convert decimal degrees to radians lon1 = np.deg2rad(lon1) lon2 = np.deg2rad(lon2) lat1 = np.deg2rad(lat1) lat2 = np.deg2rad(lat2) # haversine formula dlon = lon2 - lon1 dlat = lat2 - lat1 a
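A vectorised form of the haversine approach computes the distance from the centre to every grid point at once and keeps the points inside the radius with a boolean mask. The grid, field values, centre and radius below are hypothetical stand-ins for the variables read from air_temp.nc, and 6371 km is the usual mean Earth radius.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km; inputs in decimal degrees, broadcastable arrays."""
    lon1, lat1, lon2, lat2 = map(np.deg2rad, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Hypothetical regular grid standing in for the netCDF lat/lon variables
lats = np.arange(-10.0, 10.5, 0.5)
lons = np.arange(70.0, 90.5, 0.5)
lon2d, lat2d = np.meshgrid(lons, lats)  # shape (len(lats), len(lons))
field = np.random.default_rng(0).random(lon2d.shape)  # stand-in for air temperature

# Distance from the centre to every grid point, then a mask for the radius
center_lon, center_lat, radius_km = 80.0, 0.0, 300.0
dist = haversine(center_lon, center_lat, lon2d, lat2d)
mask = dist <= radius_km
values_in_radius = field[mask]
print(values_in_radius.size, "grid points within", radius_km, "km")
```

Note that in the question's code the function def haversine replaces the imported haversine module of the same name.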

Pandas read JSON - Trailing data

Question: I am trying to read a large JSON file with pandas pd.read_json, but it fails with this error: ValueError: Trailing data. My research here was not successful, so I would like to ask for your help. I ran a JSON validator, and its output is below. How can I fix this? Thank you. Answer 1: The error message you presented contains the precise location of the problem: (At line #191), (At position #1). Look at the indicated place in your JSON file. Edit: A weird detail in your file
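One frequent cause of Trailing data, assumed here rather than confirmed for this file, is that the file holds one JSON object per line (newline-delimited JSON) instead of a single JSON document. pd.read_json can parse that layout with lines=True. A sketch with hypothetical records:

```python
from io import StringIO

import pandas as pd

# Hypothetical newline-delimited JSON: one object per line
ndjson = '{"id": 1, "value": "a"}\n{"id": 2, "value": "b"}\n'

# Parsing it as a single JSON document fails
try:
    pd.read_json(StringIO(ndjson))
    failed_without_lines = False
except ValueError as exc:  # "Trailing data"
    failed_without_lines = True
    print("without lines=True:", exc)

# lines=True reads each line as one record
df = pd.read_json(StringIO(ndjson), lines=True)
print(df)
```

For a very large file, adding chunksize together with lines=True returns an iterator of smaller frames instead of loading everything at once.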

Pandas rolling in Python to create new columns

Question: I have a dataframe from an Excel file with data like this: A B Sum A Sum B 0 1 235353.21333333332 2 89160.59999999999 3 188382.98666666663 4 104677.1466666667 5 207723.25333333333 6 170128.02666666667 7 165287.5 8 44863.200000000004 9 177096.72 10 97687.71666666666 655447.7167 824912.6467 11 113207.76333333334 533302.2667 824912.6467 12 195151.2 444141.6667 1020063.847 13 151408.4433333333 255758.68 1171472.29 14 50865.66999999999 255758.68 1117660.813 15 84536.19000000002 255758.68
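Running-sum columns such as Sum A and Sum B are typically built with Series.rolling(...).sum(). The window length and data below are assumptions, since the question does not state how the sums in the sheet are computed; the first window - 1 rows come out as NaN because their window is incomplete.

```python
import pandas as pd

# Hypothetical data; the real sheet's values and window size are not known here
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [10.0, 20.0, 30.0, 40.0, 50.0],
})

WINDOW = 3  # assumed window length

# Each new column holds the sum of the current row and the WINDOW - 1 rows before it
df["Sum A"] = df["A"].rolling(window=WINDOW).sum()
df["Sum B"] = df["B"].rolling(window=WINDOW).sum()
print(df)
```

Passing min_periods=1 to rolling() would give partial sums for the first rows instead of NaN.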

Plotly Dash: Why is my figure failing to show with a multi dropdown selection?

Question: I am building a simple Python dashboard using Dash and Plotly. I am also new to Python (as is probably evident!) and I'm happy for any and all corrections. I would like to plot a time series of data from a predetermined CSV file. I have added a dropdown selection box that should allow multiple columns to be plotted. Sample data: "TOA5","HE605_RV50_GAF","CR6","7225","CR6.Std.07","CPU:BiSP5_GAF_v2d.CR6","51755","SensorStats" "TIMESTAMP","RECORD","BattV_Min","BattV_Avg",
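With dcc.Dropdown(multi=True) the callback receives a list of selected column names rather than a single string, so the figure needs one trace per selected column. The sketch below covers only that figure-building step and returns the plain dict form that dcc.Graph accepts for its figure property. The helper name build_figure and the sample values are hypothetical; the TIMESTAMP and BattV_* column names come from the sample header.

```python
import pandas as pd


def build_figure(df, selected):
    """Build a figure dict with one line trace per selected column (hypothetical helper)."""
    # multi=True delivers a list; it can be None if no value has been set yet,
    # or a plain string if the dropdown's initial value was given as a string
    if selected is None:
        selected = []
    elif isinstance(selected, str):
        selected = [selected]
    traces = [
        {
            "type": "scatter",
            "mode": "lines",
            "x": df["TIMESTAMP"].tolist(),
            "y": df[col].tolist(),
            "name": col,
        }
        for col in selected
        if col in df.columns
    ]
    return {"data": traces, "layout": {"title": {"text": ", ".join(selected)}}}


# Hypothetical sample rows using column names from the question's header
sample = pd.DataFrame({
    "TIMESTAMP": ["2020-01-01 00:00", "2020-01-01 00:10"],
    "BattV_Min": [12.1, 12.0],
    "BattV_Avg": [12.3, 12.2],
})

# Wiring in a Dash app (not run here):
# @app.callback(Output("graph", "figure"), Input("column-dropdown", "value"))
# def update_graph(selected):
#     return build_figure(df, selected)
fig = build_figure(sample, ["BattV_Min", "BattV_Avg"])
print(len(fig["data"]), "traces")
```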

Converting CSV file to HDF5 using pandas

Question: When I use pandas to convert CSV files to HDF5 files, the resulting file is extremely large. For example, a test CSV file (23 columns, 1.3 million rows) of 170 MB results in an HDF5 file of 2 GB. However, if pandas is bypassed and the HDF5 file is written directly (using PyTables), it is only 20 MB. In the following code (used to do the conversion in pandas), the values of the object columns in the dataframe are explicitly converted to string objects (to prevent pickling): # Open the csv file
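One commonly suggested change, offered here as an assumption rather than a diagnosis of the 2 GB file, is to write with format="table" plus compression. In the default fixed format, object columns are stored as pickled Python objects, while the table format stores strings as fixed-width columns. This requires PyTables to be installed; the frame and file path are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame with a string (object) column, standing in for the CSV data
df = pd.DataFrame({
    "id": range(1000),
    "label": ["spec_%d" % (i % 7) for i in range(1000)],
    "value": [i * 0.5 for i in range(1000)],
})

out_path = os.path.join(tempfile.mkdtemp(), "data.h5")

# format="table" stores strings as fixed-width columns instead of pickled objects;
# complevel/complib turn on zlib compression for the stored data
df.to_hdf(out_path, key="data", mode="w", format="table",
          complevel=9, complib="zlib")

roundtrip = pd.read_hdf(out_path, "data")
print(os.path.getsize(out_path), "bytes on disk")
```

When writing the CSV in chunks with append=True, the min_itemsize argument of to_hdf can reserve enough width for longer strings that appear in later chunks.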