Question
I would like to read a .txt file in Python (3.6.0) using Pandas. The first lines of the .txt file are shown below:
Text file to read
Location: XXX
Campaign Name: XXX
Date of log start: 2016_10_09
Time of log start: 04:27:28
Sampling Frequency: 1Hz
Config file: XXX
Logger Serial: XXX
CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000
I am using the simple line of code below:
Python Code
import pandas
df = pandas.read_csv("TextFile.txt", sep=";", header=[10])
print(df)
and then get the below output in the terminal:
Terminal Output
Time msec Channel1 Channel2 Channel3 Channel4
0 NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
.. ... ... ... ... ... ...
599 NaN NaN NaN NaN NaN NaN
My immediate thought is that Pandas does not "like" the first two columns. Do you have any suggestions for how I can get Pandas to read the .txt file without changing anything in the file itself?
Thank you in advance.
Answer 1:
You want to pass skiprows=11 and skipinitialspace=True to read_csv along with sep=';', as you have spaces along with your separator. Note that skiprows counts every line above the header row, blank lines included:
In [83]:
import io
import pandas as pd
t="""Location: XXX
Campaign Name: XXX
Date of log start: 2016_10_09
Time of log start: 04:27:28
Sampling Frequency: 1Hz
Config file: XXX
Logger Serial: XXX

CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000"""
df = pd.read_csv(io.StringIO(t), skiprows=11, sep=';', skipinitialspace=True)
df
Out[83]:
Time msec Channel1 Channel2 Channel3 Channel4
0 04:30:00 0 0.01526 10.67903 10.58366 0.0
1 04:30:01 0 0.17090 10.68666 10.58518 0.0
2 04:30:02 0 0.25177 10.68284 10.58442 0.0
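The skiprows value above depends on how many lines precede the header row in a given file. If that count varies between logs, one option is to locate the header row first. This is only a minimal sketch: it assumes the header row is the first line that starts with Time; and uses a shortened sample of the log; find_header_row is a hypothetical helper, not part of pandas.

```python
import io
import pandas as pd


def find_header_row(text, prefix="Time;"):
    """Return the 0-based index of the first line starting with `prefix`.

    Hypothetical helper for illustration; blank lines are counted too,
    which matches how an integer skiprows counts lines.
    """
    for i, line in enumerate(text.splitlines()):
        if line.startswith(prefix):
            return i
    raise ValueError("no line starting with %r found" % prefix)


# Shortened sample of the log (assumption: real files have more preamble).
sample = """Location: XXX
Campaign Name: XXX

CH Units;;mA;mA
Time;msec;Channel1;Channel2
04:30:00;000; 0.01526;10.67903
04:30:01;000; 0.17090;10.68666"""

header_row = find_header_row(sample)

# Skip everything above the header row, then parse as before.
log_df = pd.read_csv(
    io.StringIO(sample),
    skiprows=header_row,
    sep=";",
    skipinitialspace=True,
)
print(log_df)
```

For a file on disk, the same scan could be run over the lines of the opened file before calling read_csv with the path.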
You can see the dtypes are now correct:
In [84]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 6 columns):
Time 3 non-null object
msec 3 non-null int64
Channel1 3 non-null float64
Channel2 3 non-null float64
Channel3 3 non-null float64
Channel4 3 non-null float64
dtypes: float64(4), int64(1), object(1)
memory usage: 224.0+ bytes
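What skipinitialspace changes is easiest to see on text fields, where a space after the separator would otherwise stay in the value. A small sketch with made-up two-column data (the column names name and unit are assumptions for illustration, not taken from the log file):

```python
import io
import pandas as pd

# Made-up data: each value in the second column has a space after the ';'.
text = "name;unit\nChannel1; mA\nChannel2; mA"

# Default behaviour: the leading space is kept as part of the string.
plain = pd.read_csv(io.StringIO(text), sep=";")

# With skipinitialspace=True the space after the separator is dropped.
stripped = pd.read_csv(io.StringIO(text), sep=";", skipinitialspace=True)

print(repr(plain["unit"].iloc[0]))
print(repr(stripped["unit"].iloc[0]))
```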
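You may also want to parse the times into datetimes. Because the column holds only a time of day, the parser supplies the date itself (2017-03-16 in the output below):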
In [86]:
df = pd.read_csv(io.StringIO(t), skiprows=11, sep=';', skipinitialspace=True, parse_dates=['Time'])
df
Out[86]:
Time msec Channel1 Channel2 Channel3 Channel4
0 2017-03-16 04:30:00 0 0.01526 10.67903 10.58366 0.0
1 2017-03-16 04:30:01 0 0.17090 10.68666 10.58518 0.0
2 2017-03-16 04:30:02 0 0.25177 10.68284 10.58442 0.0
In [87]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 6 columns):
Time 3 non-null datetime64[ns]
msec 3 non-null int64
Channel1 3 non-null float64
Channel2 3 non-null float64
Channel3 3 non-null float64
Channel4 3 non-null float64
dtypes: datetime64[ns](1), float64(4), int64(1)
memory usage: 224.0 bytes
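If the real recording date is wanted instead, it could be taken from the "Date of log start" line in the preamble and combined with the Time and msec columns. A rough sketch, assuming the date always has the YYYY_MM_DD form shown in the question and that the log does not run past midnight; the Timestamp column name and the shortened sample are assumptions for illustration:

```python
import io
import pandas as pd

# Shortened sample of the log (assumption: two preamble lines only).
sample = """Date of log start: 2016_10_09
Time of log start: 04:27:28
Time;msec;Channel1
04:30:00;000; 0.01526
04:30:01;000; 0.17090"""

# Pull the date string out of the preamble.
log_date = None
for line in sample.splitlines():
    if line.startswith("Date of log start:"):
        log_date = line.split(":", 1)[1].strip()
        break

# Read the data, keeping Time as plain strings.
ts_df = pd.read_csv(io.StringIO(sample), skiprows=2, sep=";", skipinitialspace=True)

# Date + time of day + milliseconds -> full timestamps.
day = pd.to_datetime(log_date, format="%Y_%m_%d")
ts_df["Timestamp"] = (
    day
    + pd.to_timedelta(ts_df["Time"])
    + pd.to_timedelta(ts_df["msec"], unit="ms")
)
print(ts_df[["Timestamp", "Channel1"]])
```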
Source: https://stackoverflow.com/questions/42829733/read-txt-file-with-python-pandas-strings-and-floats