Read .txt file with Python Pandas - strings and floats

Submitted by 微笑、不失礼 on 2021-02-10 07:14:57

Question


I would like to read a .txt file in Python (3.6.0) using Pandas. The first lines of the .txt file are shown below:

Text file to read

Location:           XXX
Campaign Name:      XXX
Date of log start:  2016_10_09
Time of log start:  04:27:28
Sampling Frequency: 1Hz
Config file:        XXX
Logger Serial:      XXX

CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000

I am using the simple line of code below:

Python Code

import pandas
df = pandas.read_csv("TextFile.txt", sep=";", header=[10])
print(df)

and then get the below output in the terminal:

Terminal Output

    Time msec Channel1 Channel2 Channel3 Channel4
0    NaN  NaN      NaN      NaN      NaN      NaN
1    NaN  NaN      NaN      NaN      NaN      NaN
2    NaN  NaN      NaN      NaN      NaN      NaN
..   ...  ...      ...      ...      ...      ...
599  NaN  NaN      NaN      NaN      NaN      NaN

My immediate thought is that Pandas does not "like" the first two columns. Do you have any suggestions for how I can get Pandas to read the .txt file without changing anything in the file itself?

Thank you in advance.


Answer 1:


Pass skiprows=11 and skipinitialspace=True to read_csv along with sep=';'. The file has 11 lines before the header row, and some values have spaces after the separator:

In [83]:
import io
import pandas as pd
t="""Location:           XXX
Campaign Name:      XXX
Date of log start:  2016_10_09
Time of log start:  04:27:28
Sampling Frequency: 1Hz
Config file:        XXX
Logger Serial:      XXX

CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000"""
df = pd.read_csv(io.StringIO(t), skiprows=11, sep=';', skipinitialspace=True)
df

Out[83]:
       Time  msec  Channel1  Channel2  Channel3  Channel4
0  04:30:00     0   0.01526  10.67903  10.58366       0.0
1  04:30:01     0   0.17090  10.68666  10.58518       0.0
2  04:30:02     0   0.25177  10.68284  10.58442       0.0

You can see the dtypes are now correct:

In [84]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 6 columns):
Time        3 non-null object
msec        3 non-null int64
Channel1    3 non-null float64
Channel2    3 non-null float64
Channel3    3 non-null float64
Channel4    3 non-null float64
dtypes: float64(4), int64(1), object(1)
memory usage: 224.0+ bytes
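
If the number of metadata lines can vary between log files, hardcoding skiprows=11 may be fragile. The following is a minimal sketch, not part of the original answer, that looks for the header row by its leading text and passes that index to skiprows. The helper name find_header_row and the assumption that the header always starts with "Time;" are illustrative only.

```python
import io
import pandas as pd

# Sample text copied from the question: metadata, a blank line, channel rows, then data.
sample = """Location:           XXX
Campaign Name:      XXX
Date of log start:  2016_10_09
Time of log start:  04:27:28
Sampling Frequency: 1Hz
Config file:        XXX
Logger Serial:      XXX

CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000"""


def find_header_row(text, prefix="Time;"):
    """Hypothetical helper: 0-based index of the first line starting with `prefix`."""
    for i, line in enumerate(text.splitlines()):
        if line.startswith(prefix):
            return i
    raise ValueError(f"no line starts with {prefix!r}")


# Assumption: the column header is the first line that begins with "Time;".
header_row = find_header_row(sample)

# Skip everything above the header; the header line itself becomes the column names.
df_auto = pd.read_csv(io.StringIO(sample), skiprows=header_row,
                      sep=";", skipinitialspace=True)
print(header_row)
print(df_auto)
```

For a file on disk, the same idea should apply by reading the text with open("TextFile.txt").read() first and then passing the file name to read_csv with the computed skiprows value.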

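You may also want to parse the times into datetimes. The Time column holds only a time of day, so the date part in the output below (2017-03-16) is filled in by the parser and does not come from the log itself: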

In [86]:    
df = pd.read_csv(io.StringIO(t), skiprows=11, sep=';', skipinitialspace=True, parse_dates=['Time'])
df

Out[86]:
                 Time  msec  Channel1  Channel2  Channel3  Channel4
0 2017-03-16 04:30:00     0   0.01526  10.67903  10.58366       0.0
1 2017-03-16 04:30:01     0   0.17090  10.68666  10.58518       0.0
2 2017-03-16 04:30:02     0   0.25177  10.68284  10.58442       0.0

In [87]:    
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 6 columns):
Time        3 non-null datetime64[ns]
msec        3 non-null int64
Channel1    3 non-null float64
Channel2    3 non-null float64
Channel3    3 non-null float64
Channel4    3 non-null float64
dtypes: datetime64[ns](1), float64(4), int64(1)
memory usage: 224.0 bytes
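
Since the parsed date above is not taken from the log, one possible refinement is to combine the "Date of log start" metadata line with the Time and msec columns. The sketch below is an illustration rather than part of the original answer. It assumes the date always has the form YYYY_MM_DD and that the recording does not run past midnight; the Timestamp column name is made up for this example.

```python
import io
import re
import pandas as pd

# Sample text copied from the question.
sample = """Location:           XXX
Campaign Name:      XXX
Date of log start:  2016_10_09
Time of log start:  04:27:28
Sampling Frequency: 1Hz
Config file:        XXX
Logger Serial:      XXX

CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000"""

# Pull the start date out of the metadata block (assumed format: YYYY_MM_DD).
match = re.search(r"^Date of log start:\s*(\d{4}_\d{2}_\d{2})", sample, flags=re.MULTILINE)
start_date = match.group(1)

# Read the data exactly as in the answer, leaving Time as plain strings.
df_ts = pd.read_csv(io.StringIO(sample), skiprows=11, sep=";", skipinitialspace=True)

# Build a full timestamp: log start date + time of day + millisecond offset.
# Assumption: every row belongs to the start date (no rollover past midnight).
df_ts["Timestamp"] = (
    pd.to_datetime(start_date + " " + df_ts["Time"], format="%Y_%m_%d %H:%M:%S")
    + pd.to_timedelta(df_ts["msec"], unit="ms")
)
print(df_ts[["Timestamp", "Channel1"]])
```

Passing an explicit format to to_datetime avoids relying on how the parser guesses a date for time-only strings.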


Source: https://stackoverflow.com/questions/42829733/read-txt-file-with-python-pandas-strings-and-floats
