Comma separated Matrix from txt files - continued

Posted by 不打扰是莪最后的温柔 on 2019-12-25 17:20:57

Question


I need to build a matrix from a list of text files, each containing the frequency distribution of a set of expressions. To do so, I created a list of all the text files (lof) in a directory and used it to build a matrix (thanks to gboffy). Each filename in that list follows the pattern CompanyName-SerialNumber_IssueDate_IFRS.txt (example: GoldmanSachs-123456_31.12.2014_IFRS.txt). The content of each file follows the same layout as well:

CompanyABC-123456_31.12.2012_IFRS.txt

Company ABC-123456_31.12.2012
financial statement:4
corporate-taxes:8
assets:2
available-for-sale property:0
auditors:213

Company123-789102_31.12.2012_IFRS.txt

Company123-789102_31.12.2012
financial statement:15
corporate-taxes:3
assets:8
available-for-sale property:2
auditors:23

My desired output is a single matrix file, written as txt, with one line per company file consisting of (CompanyName,SerialNumber,IssueDate,Frequency1,Frequency2,...,FrequencyN):

'CompanyABC','123456','31.12.2012','4','8','2','0','213' \n
'Company123','789102','31.12.2012','15','3','8','2','23' \n

Here is my code so far:

import os

def list_textfiles(directory, min_file_size):
    # Creates a list of all files stored in DIRECTORY ending in '.txt' with minimum file size
    textfiles = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            filename = os.path.join(root, name)
            if name.endswith('.txt') and os.stat(filename).st_size > min_file_size:
                textfiles.append(filename)
    return textfiles

directory = 'C:/CompanyFiles'
minimum_size = 30000
lof = list_textfiles(directory, minimum_size)

res = []

for f in lof:
    res += [[entry.split(':')[1] for entry in cdata ]
            for cdata in [data.splitlines() for data in open(f).read().split('\n\n')]]

with open('C:/CompanyFiles/Matrix.txt', 'wt') as outfile:
    outfile.write(str(res))

How can I modify my code to achieve the output as stated above?


Answer 1:


This should do the trick:

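import os

outFile = 'C:/CompanyFiles/Matrix.txt'
folder = 'C:/CompanyFiles'

with open(outFile, 'w') as wfp:
    for f in os.listdir(folder):
        # Skip non-text files and the output file itself, which lives in the same folder
        if not f.endswith('.txt') or f == os.path.basename(outFile):
            continue
        with open(os.path.join(folder, f), 'r') as rfp:
            # Drop blank lines so that tmp[0] is the header line
            tmp = [line.rstrip() for line in rfp if line.strip()]
        # Header line: CompanyName-SerialNumber_IssueDate
        arr = tmp[0].split('-')
        arr = [arr[0]] + arr[1].split('_')
        # Remaining lines: expression:frequency
        arr += [t.split(':')[1].strip() for t in tmp[1:]]
        wfp.write(','.join(["'" + e + "'" for e in arr]) + '\n')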

Note: I haven't tested it thoroughly.
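
As a variation on the approach above, the quoting can be left to the standard csv module instead of concatenating the quote characters by hand. The following is only a sketch, assuming Python 3 and the file layout shown in the question; the helper names build_row and rows_to_text are made up for illustration and do not come from the original answer.

```python
import csv
import io

def build_row(text):
    """Turn one file's text into [company, serial, date, freq1, ..., freqN]."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Header line: CompanyName-SerialNumber_IssueDate
    company, _, rest = lines[0].partition('-')
    serial, issue_date = rest.split('_')[:2]
    # Every other line is 'expression:frequency'; keep the part after the last colon
    freqs = [ln.rsplit(':', 1)[1].strip() for ln in lines[1:]]
    return [company, serial, issue_date] + freqs

def rows_to_text(rows):
    """Format rows as single-quoted, comma-separated lines."""
    buf = io.StringIO()
    writer = csv.writer(buf, quotechar="'", quoting=csv.QUOTE_ALL,
                        lineterminator='\n')
    writer.writerows(rows)
    return buf.getvalue()

# Sample content copied from the question
sample = """Company123-789102_31.12.2012
financial statement:15
corporate-taxes:3
assets:8
available-for-sale property:2
auditors:23
"""
print(rows_to_text([build_row(sample)]), end='')
# 'Company123','789102','31.12.2012','15','3','8','2','23'
```

To write the real matrix, the same csv.writer settings could be applied to a file opened with open(outFile, 'w', newline='') while looping over the input files.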




Answer 2:


Try this after building your list of files:

#your code

lof = list_textfiles(directory, minimum_size)

for path in lof:
    with open(path) as f:
        for header in f:
            # Header line: CompanyName-SerialNumber_IssueDate
            out_list = []
            split_to_out = header.split("-")
            out_list.append(split_to_out[0])
            out_list.append(split_to_out[1].split("_")[0])
            out_list.append(split_to_out[1].split("_")[1])
            # Consume the remaining "expression:frequency" lines of the file
            temp = next(f, None)
            while temp:
                out_list.append(temp.split(":")[-1])
                temp = next(f, None)
            out_list = [e.strip() for e in out_list]
            to_write = ",".join(out_list) + "\n"
            # Append one row per input file
            with open('/home/quadloops/Matrix.txt', 'a') as outfile:
                outfile.write(to_write)

$ cat Matrix.txt
Company ABC,123456,31.12.2012,4,8,2,0,213
Company123,789102,31.12.2012,15,3,8,2,23

Changing that line so that each element is wrapped in single quotes, for example to_write = ",".join(["'" + e + "'" for e in out_list]) + "\n", gives

$ cat Matrix.txt
'Company ABC','123456','31.12.2012','4','8','2','0','213'
'Company123','789102','31.12.2012','15','3','8','2','23'
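
The output above contains 'Company ABC' (with a space), because the header line inside that sample file is spelled differently from its filename. Since the question states that every filename follows CompanyName-SerialNumber_IssueDate_IFRS.txt, one possible workaround is to take the three identifying fields from the filename rather than from the file's first line. This is only a sketch under that assumption; parse_filename is a hypothetical helper, not part of the original answer.

```python
import os

def parse_filename(path):
    """Split 'CompanyName-SerialNumber_IssueDate_IFRS.txt' into its three parts."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Split on the last hyphen so a hyphenated company name stays intact
    company, _, rest = stem.rpartition('-')
    # rest looks like 'SerialNumber_IssueDate_IFRS'; the trailing tag is ignored
    serial, issue_date = rest.split('_')[:2]
    return company, serial, issue_date

print(parse_filename('C:/CompanyFiles/CompanyABC-123456_31.12.2012_IFRS.txt'))
# ('CompanyABC', '123456', '31.12.2012')
```

The returned values could replace the first three entries of out_list, with the frequencies still read from the file body.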


Source: https://stackoverflow.com/questions/29467731/comma-separated-matrix-from-txt-files-continued
