Read CSV files in Python

*爱你&永不变心* submitted on 2021-02-17 07:19:12

Question


I need to read one column from multiple CSV files in a folder and then extract the minimum and maximum dates from that column.

For example, if I have the folder path "/usr/abc/xyz/" and it contains multiple CSV files as below:

aaa.csv
bbb.csv
ccc.csv

and the files contain the following data.

aaa.csv contains:

name,address,dates
xxx,11111,20190101
yyy,22222,20190201
zzz,33333,20190101

bbb.csv contains:

name,address,dates
fff,11111,20190301
ggg,22222,20190501
hhh,33333,20190601

So I need to extract the minimum and maximum dates across the files; in the above case the date range should be 20190101 to 20190601.

Can anyone please help me extract the minimum and maximum dates from the files in Python?

I need to avoid pandas or any other package, as I need to read the CSV files directly in Python.


Answer 1:


import pandas as pd

# Load the file and take the largest and smallest value of the 'dates' column.
dt = pd.read_csv('your_csv.csv')
print(max(dt['dates']))
print(min(dt['dates']))

If you need to avoid pandas, you can do the following, although it is not recommended:

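dt = []
with open('your_csv.csv', 'r') as f:
    next(f, None)  # skip the header row
    for row in f:
        if row.strip():  # ignore blank lines
            # the date is the third comma-separated field
            dt.append(row.split(',')[2].strip())
# YYYYMMDD strings sort chronologically, so string comparison is enough here
print(max(dt))
print(min(dt))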
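
Since the question involves several files and a column identified by name, the same idea can be sketched with the standard-library csv module, which also copes with quoted fields that a plain split(',') would break on. The function name date_range and its default column name "dates" are assumptions made for illustration. The values are compared as strings, which is only valid because YYYYMMDD strings sort in chronological order.

```python
import csv
from pathlib import Path


def date_range(folder, column="dates"):
    """Return (min, max) of `column` across all *.csv files in `folder`.

    Hypothetical helper: values are compared as strings, which is only
    valid for fixed-width formats such as YYYYMMDD.
    """
    lowest = highest = None
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as fh:
            # DictReader uses the header row to name each field.
            for row in csv.DictReader(fh):
                value = (row.get(column) or "").strip()
                if not value:
                    continue  # skip rows with a missing or empty date
                if lowest is None or value < lowest:
                    lowest = value
                if highest is None or value > highest:
                    highest = value
    return lowest, highest
```

Calling date_range("/usr/abc/xyz/") would return the pair of strings, or (None, None) if no file contains a usable value.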



Answer 2:


A solution that uses only the standard library. It reads each file line by line instead of loading it into memory, so its memory footprint stays low and it also works with large files.

  • pathlib is used to get all the csv files
  • datetime is used to convert to dates
  • sys is used to read the folder path from the command line

Example run:

$ python3 date_min_max.py /usr/abc/xyz/
min date: 2019-01-01 00:00:00
max date: 2019-06-01 00:00:00

date_min_max.py:

from pathlib import Path
from datetime import datetime
import sys

# Folder to scan: first command-line argument, or the current directory.
p = sys.argv[1] if len(sys.argv) > 1 else "."

files = [x for x in Path(p).iterdir() if x.suffix == ".csv"]

date_format = "%Y%m%d"

# Sentinels: every real date is expected to fall between these two.
dt_max = datetime.strptime("19000101", date_format)
dt_min = datetime.strptime("30000101", date_format)
for file in files:
    with file.open("r") as fh:
        for i, line in enumerate(fh):
            if i == 0 or not line.strip():
                continue  # skip the header row and blank lines
            # The date is the third comma-separated field.
            d = datetime.strptime(line.strip().split(",")[2], date_format)
            dt_max = max(dt_max, d)
            dt_min = min(dt_min, d)

print("min date: {}\nmax date: {}".format(dt_min, dt_max))
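
One caveat with the 1900 and 3000 sentinel values: if the folder contains no data rows, the script reports those placeholders as if they were real results. A possible variation, sketched here under the assumption that the date is always the third field in YYYYMMDD form, streams the dates through a generator and returns None when nothing was found. The names iter_dates and min_max_dates are hypothetical and not part of the answer above.

```python
from datetime import datetime
from pathlib import Path


def iter_dates(folder, date_format="%Y%m%d"):
    """Yield one date per data row of every .csv file in `folder`."""
    for path in Path(folder).iterdir():
        if path.suffix != ".csv":
            continue
        with path.open("r") as fh:
            next(fh, None)  # skip the header line, if there is one
            for line in fh:
                fields = line.strip().split(",")
                # Ignore blank or short lines instead of raising IndexError.
                if len(fields) > 2 and fields[2]:
                    yield datetime.strptime(fields[2], date_format).date()


def min_max_dates(folder):
    """Return (min, max) as date objects, or None if no rows were found."""
    lowest = highest = None
    for d in iter_dates(folder):
        lowest = d if lowest is None else min(lowest, d)
        highest = d if highest is None else max(highest, d)
    return None if lowest is None else (lowest, highest)
```

Because the files are still read one line at a time, the low memory footprint of the original script is kept; using date objects rather than datetime also drops the 00:00:00 time part from the printed result.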


Source: https://stackoverflow.com/questions/57956289/read-csv-files-in-python
