Split float numbers in half with Python

Submitted by 可紊 on 2019-12-08 09:33:31

Question


I'm working with a CSV table converted from a PDF with Adobe Acrobat Pro. For some reason the software introduces a recurring error every 117 rows: it duplicates and concatenates the numbers. For example, a row

7307 1 87.1

is transformed into something of this sort:

73077307 11 87187.1

How can I correct these rows with Python? I would need to split each number in the middle and discard the first half.

I have read several threads about truncation, but most of them split floats at the decimal point or deal only with integers. The data type is float64 because I'm reading the CSV with the pandas read_csv function.

import pandas as pd

df = pd.read_csv('path/file.csv', sep=';', index_col='Rang', na_values=['NA'])
df.dropna(how="all", inplace=True)  # drop empty rows (an additional issue)
df[df.index > 10000]  # show rows whose index is implausibly large

EDIT1: Code added. I thought I could identify the bad rows because I have one row per hour of the year, so any row with an index greater than 365*24 = 8760 is wrong. But I now see that this is not sufficient. One could loop over the DataFrame and, if the index of row (i+1) minus the index of row (i) is greater than one, a correction is needed. But I'm a beginner in Python and not sure how to write that, and it's kind of a different problem anyway.
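
A minimal sketch of the loop described in EDIT1, written against a plain list of index values rather than the DataFrame itself. The function name `find_suspect_positions` and the sample values are hypothetical, and the sketch assumes the index normally increases by exactly one per row, so a legitimate gap in the data would also be flagged.

```python
def find_suspect_positions(index_values):
    """Return positions whose value exceeds the previous one by more than 1."""
    suspects = []
    for i in range(1, len(index_values)):
        # A duplicated-and-concatenated index (e.g. 73077307) jumps far
        # above its predecessor; the row after it drops back down, so the
        # difference there is negative and is not flagged.
        if index_values[i] - index_values[i - 1] > 1:
            suspects.append(i)
    return suspects


# Hypothetical index values around one corrupted row.
ranks = [7305, 7306, 73077307, 7308, 7309]
print(find_suspect_positions(ranks))  # [2]
```

With pandas, the list could be obtained from the loaded table with `list(df.index)`.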

I'm using Python 2.7.8 and pandas 0.14.1.

Many thanks!


Answer 1:


Read each space-delimited word of the row into a list of strings. For each item in the list, check whether the length of the word is even or odd. If it is even, replace the word with its right half. If it is odd (because the '.' survives only in the right-hand copy), take the right half rounded up (for example, the 5 right-most characters of a 9-character word). Convert each word to float64 when replacing it.
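
The steps above can be sketched roughly as follows. This is an illustration rather than a tested solution for the real file: the helper names `right_half` and `fix_row` are hypothetical, the row is assumed to be available as a raw space-delimited string (the file in the question uses ';' as its separator, so the split would need adjusting), and the function should only be applied to rows already known to be corrupted, since it halves every word it is given.

```python
def right_half(word):
    """Keep the right half of a duplicated token, rounding up for odd lengths."""
    keep = (len(word) + 1) // 2  # 8 chars -> 4, 7 chars -> 4, 9 chars -> 5
    return word[-keep:]


def fix_row(line):
    """Split a corrupted row on whitespace and repair each word as a float."""
    return [float(right_half(word)) for word in line.split()]


# The corrupted example row from the question.
print(fix_row("73077307 11 87187.1"))  # [7307.0, 1.0, 87.1]
```

Rounding up on odd lengths handles values such as 87187.1, where the decimal point is missing from the first copy, so the right-hand copy is one character longer than the left.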



Source: https://stackoverflow.com/questions/28444525/split-float-numbers-in-half-with-python
