How to convert .dat to .csv using python?

Posted by 橙三吉。 on 2019-12-11 03:24:46

Question


I have a file.dat which looks like:

id       | user_id | venue_id | latitude  | longitude | created_at
---------+---------+----------+-----------+-----------+-----------------
984301   |2041916  |5222      |           |           |2012-04-21 17:39:01
984222   |15824    |5222      |38.8951118 |-77.0363658|2012-04-21 17:43:47
984315   |1764391  |5222      |           |           |2012-04-21 17:37:18
984234   |44652    |5222      |33.800745  |-84.41052  | 2012-04-21 17:43:43

I need to get a CSV file with the rows that have an empty latitude and longitude removed, like:

id,user_id,venue_id,latitude,longitude,created_at
984222,15824,5222,38.8951118,-77.0363658,2012-04-21T17:43:47
984234,44652,5222,33.800745,-84.41052,2012-04-21T17:43:43
984291,105054,5222,45.5234515,-122.6762071,2012-04-21T17:39:22
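The desired output also rewrites created_at from "2012-04-21 17:43:47" to "2012-04-21T17:43:47" (ISO 8601 style), which none of the approaches here do on their own. A minimal sketch of that conversion, assuming every timestamp follows the YYYY-MM-DD HH:MM:SS pattern seen in the sample; `to_iso` is a hypothetical helper name. Its result could replace the last field of each kept row before the row is written.

```python
from datetime import datetime


def to_iso(timestamp):
    """Convert 'YYYY-MM-DD HH:MM:SS' (possibly padded) to ISO 8601 with a 'T'.

    Hypothetical helper; assumes the exact format used in the sample data.
    """
    # strip() handles padding such as the leading space in ' 2012-04-21 17:43:43'
    parsed = datetime.strptime(timestamp.strip(), "%Y-%m-%d %H:%M:%S")
    # isoformat() on a naive datetime with no microseconds gives 'YYYY-MM-DDTHH:MM:SS'
    return parsed.isoformat()


print(to_iso(" 2012-04-21 17:43:43"))  # 2012-04-21T17:43:43
```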

I tried to do that using the following code:

import csv

with open('file.dat', 'r') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line.strip('|').split()
        newLines.append(newLine)

with open('file.csv', 'w') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)

But I still get a CSV file containing "|" symbols and the rows with empty latitude/longitude. Where is the mistake? Ultimately I need to load the resulting CSV file into a DataFrame, so maybe there is a way to reduce the number of steps.


Answer 1:


str.strip() removes leading and trailing characters from a string.
You want to split the lines on "|", then strip each element of the resulting list:

import csv

with open('file.dat') as dat_file, open('file.csv', 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file)

    for line in dat_file:
        row = [field.strip() for field in line.split('|')]
        if len(row) == 6 and row[3] and row[4]:
            csv_writer.writerow(row)
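To check the split-and-strip filter without touching the disk, the same logic can be wrapped in a generator and run on the question's sample text. A minimal sketch: `filter_rows` and its index parameters are hypothetical names, and `io.StringIO` stands in for the real input and output files.

```python
import csv
import io

# Sample input copied from the question, dashed ruler line included.
SAMPLE = """\
id       | user_id | venue_id | latitude  | longitude | created_at
---------+---------+----------+-----------+-----------+-----------------
984301   |2041916  |5222      |           |           |2012-04-21 17:39:01
984222   |15824    |5222      |38.8951118 |-77.0363658|2012-04-21 17:43:47
984315   |1764391  |5222      |           |           |2012-04-21 17:37:18
984234   |44652    |5222      |33.800745  |-84.41052  | 2012-04-21 17:43:43
"""


def filter_rows(lines, n_fields=6, lat_idx=3, lon_idx=4):
    """Yield stripped fields for lines that have both latitude and longitude."""
    for line in lines:
        row = [field.strip() for field in line.split('|')]
        # The dashed ruler has no '|', so it becomes a single field and is skipped.
        if len(row) == n_fields and row[lat_idx] and row[lon_idx]:
            yield row


out = io.StringIO()
csv.writer(out, lineterminator='\n').writerows(filter_rows(io.StringIO(SAMPLE)))
print(out.getvalue(), end='')
```

The header line passes the same test (its "latitude" and "longitude" cells are non-empty), which is why it ends up as the first CSV row.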



Answer 2:


Calling split() without arguments splits on runs of whitespace; for example, "test1 test2".split() returns ["test1", "test2"].

Instead, try this:

newLine = line.split("|")
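The difference is easy to see on one data line from the question. This sketch only compares the two calls; note that split("|") still leaves padding and the trailing newline on each field, so the fields also need strip().

```python
line = "984222   |15824    |5222      |38.8951118 |-77.0363658|2012-04-21 17:43:47\n"

# Splitting on whitespace keeps the pipes glued to neighbouring values
# and breaks the timestamp into two tokens.
by_whitespace = line.split()

# Splitting on '|' gives the six columns, but each one keeps its padding
# (and the last one keeps the newline).
by_pipe = line.split("|")

print(by_whitespace)
print(by_pipe)
print([field.strip() for field in by_pipe])
```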



Answer 3:


Maybe it's better to use map() instead of a list comprehension; it may be slightly faster, although that is worth measuring. Writing the CSV file is also easy with the csv module.

import csv

with open('file.dat', 'r') as fin:
    with open('file.csv', 'w', newline='') as fout:
        writer = csv.writer(fout)
        for line in fin:
            # list() is needed in Python 3, where map() returns a lazy iterator
            newline = list(map(str.strip, line.split('|')))
            if len(newline) == 6 and newline[3] and newline[4]:
                writer.writerow(newline)
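In Python 3, map() returns a lazy iterator rather than a list, so calling len() or indexing on it directly raises TypeError. This sketch shows that, confirms that list(map(...)) matches the list comprehension, and times both with timeit instead of assuming which is faster (timings vary by machine and interpreter, so no numbers are claimed here).

```python
import timeit

parts = "984234   |44652    |5222      |33.800745  |-84.41052  | 2012-04-21 17:43:43\n".split("|")

# A map object has no len() in Python 3.
try:
    len(map(str.strip, parts))
    len_works = True
except TypeError as exc:
    len_works = False
    print("TypeError:", exc)

# Wrapping map() in list() gives the same result as the comprehension.
via_map = list(map(str.strip, parts))
via_comprehension = [p.strip() for p in parts]
assert via_map == via_comprehension

# Measure rather than assume which form is faster.
t_map = timeit.timeit(lambda: list(map(str.strip, parts)), number=10_000)
t_comp = timeit.timeit(lambda: [p.strip() for p in parts], number=10_000)
print(f"map: {t_map:.4f}s  comprehension: {t_comp:.4f}s")
```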



Answer 4:


Use this:

import pandas as pd

data = pd.read_csv('file.dat', sep='|', header=0, skipinitialspace=True)
data.dropna(inplace=True)



Answer 5:


with open("filename.dat") as f:
    with open("filename.csv", "w") as f1:
        for line in f:
            f1.write(line)

This copies the contents of a .dat file unchanged into a .csv file. The result is only valid CSV if the .dat data is already comma-separated.
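A plain copy does not change the delimiter. A quick check with one line taken from the question: the csv module splits on commas by default, so a copied pipe-delimited line comes back as a single field.

```python
import csv
import io

# One pipe-delimited line from the question, as it would look after a plain copy.
copied = io.StringIO(
    "984222   |15824    |5222      |38.8951118 |-77.0363658|2012-04-21 17:43:47\n"
)

rows = list(csv.reader(copied))
# No commas in the line, so csv.reader returns it as one column.
print(len(rows[0]))
```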




Answer 6:


Combining the previous answers, I wrote this code for Python 2.7:

import csv

lat_index = 3
lon_index = 4
fields_num = 6
csv_counter = 0

with open("checkins.dat") as dat_file:
    with open("checkins.csv", "w") as csv_file:
        csv_writer = csv.writer(csv_file)
        for dat_line in dat_file:
            new_line = map(str.strip, dat_line.split('|'))
            if len(new_line) == fields_num and new_line[lat_index] and new_line[lon_index]:
                csv_writer.writerow(new_line)
                csv_counter += 1

print("Done. Total rows written: {:,}".format(csv_counter))



Answer 7:


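This has worked for me (list_for_names_of_columns is your own list of column names; a multi-character sep is treated as a regular expression, so engine='python' is passed explicitly):

import pandas as pd
data = pd.read_csv('file.dat', sep='::', names=list_for_names_of_columns, engine='python')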
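If you want to stay with the standard library instead, note that the csv module only accepts single-character delimiters, so '::' has to be split manually. A minimal sketch; the '::'-separated sample rows are invented for illustration.

```python
import csv
import io

# Hypothetical '::'-separated sample data (invented for this example).
raw = "1::1193::5::978300760\n1::661::3::978302109\n"

# csv rejects a multi-character delimiter when the reader is created.
try:
    csv.reader(io.StringIO(raw), delimiter="::")
    multi_char_ok = True
except TypeError as exc:
    multi_char_ok = False
    print("csv rejects '::':", exc)

# str.split() handles a multi-character separator directly.
rows = [line.split("::") for line in raw.splitlines()]
print(rows)
```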


Source: https://stackoverflow.com/questions/36845032/how-to-convert-dat-to-csv-using-python
