Question
Please help, I have a text file that looks something like this:
ID: 000001
Name: John Smith
Email: jsmith@ibm.com
Company: IBM
blah1: a
blah2: b
blah3: c
ID: 000002
Name: Jane Doe
Email: jdoe@ibm.com
Company: IBM
blah1: a
blah2: b
blah3: c
ID:000003
.
.
.
etc.
Notice that each customer's info is in 7 rows. The ID:000002 marks the start of the next customer, 000003 the next customer, so on and so forth.
I would like my output file to be like this (instead of each customer's data in the next rows, to have each ID and subsequent 7 rows to be transposed to columns):
ID: 000001,Name: John Smith,Email: jsmith@ibm.com,Company: IBM, blah1: a,blah2: b,blah3: c
ID: 000002,Name: Jane Doe,Email: jdoe@ibm.com,Company: IBM,blah1: a,blah2: b,blah3: c
I am not sure if this is the easiest technique. I tried using a list, but this doesn't seem to work for my purpose. I know my code is not elegant, but this is just for automating some data manipulation by myself and one other person. I don't really need anything that's stylish, as long as it works.
#!/usr/bin/python
# open file
input = open ("C:\Documents\Customer.csv","r")
#write to a new file
output = open("C:\Documents\Customer1.csv","w")
#Read whole file into data
data = input.readlines()
list = []
for line in data:
    if "User Id:" in line:
        list.append(line)
    if "User Email:" in line:
        list.append(line)
    if "Company:" in line:
        list.append(line)
    if "Contact Id:" in line:
        list.append(line)
    if "Contact Name:" in line:
        list.append(line)
    if "Contact Email:" in line:
        list.append(line)
print list
import os
output.write("\n".join(list))
# Close the file
input.close()
output.close()
My output file contains escape characters and some customers are added more than once.
Answer 1:
Think about what you are trying to accomplish, and how simple it really is.
You have a giant list of things, split into 7 lines apiece.
First and foremost, I would read everything into one giant list, just like you already did:
data = input.readlines()
Count them:
totalUsers = len(data) // 7  # it SHOULD be divisible by 7
This gives you how many iterations you need to go over everything. Now it's time to get slicey:
users = []
start = 0  # because we start on 0
end = 7    # slice end is exclusive, so [0:7] covers lines 0 through 6
for number in range(totalUsers):
    person = data[start:end]  # slicing, learn about it, it's cool stuff
    start += 7  # move start up 7
    end += 7    # move end up 7
    users.append(person)
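The slices above still have to be joined into one comma-separated row per customer. Here is a minimal Python 3 sketch of that last step, assuming every record has exactly 7 lines. The function name `transpose_records` and the inline sample are made up for illustration and are not part of the original answer:

```python
def transpose_records(lines, size=7):
    """Join every `size` consecutive lines into one comma-separated row."""
    # Drop line endings and skip blank lines so they do not shift the grouping.
    cleaned = [line.strip() for line in lines if line.strip()]
    rows = []
    for start in range(0, len(cleaned), size):
        chunk = cleaned[start:start + size]  # slice end is exclusive
        rows.append(",".join(chunk))
    return rows


# Sample input shaped like the result of readlines() on the file in the question.
sample = [
    "ID: 000001\n", "Name: John Smith\n", "Email: jsmith@ibm.com\n",
    "Company: IBM\n", "blah1: a\n", "blah2: b\n", "blah3: c\n",
    "ID: 000002\n", "Name: Jane Doe\n", "Email: jdoe@ibm.com\n",
    "Company: IBM\n", "blah1: a\n", "blah2: b\n", "blah3: c\n",
]
rows = transpose_records(sample)
print("\n".join(rows))
```

With a real file you would pass the list returned by `readlines()` instead of `sample`. If a record is ever shorter or longer than 7 lines, every record after it will be misaligned, so this only fits input that is strictly regular.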
Answer 2:
....
data = input.read()  # read it all in
# strip each chunk, then turn the remaining newlines into commas
people = [person.strip().replace("\n", ",") for person in data.split("ID:")]
data_new = "\nID: ".join(people)
output.write(data_new.strip())
First, read in your whole file as one big chunk.
Then split your data on "ID:" so that you have a list.
For each item, strip the surrounding whitespace and replace the newlines with commas.
Join your "people" list back together with "\nID: " to get one big block of text.
Write it back out to your output (and strip it so that you get rid of any extra leading \n's).
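The same split-on-marker idea can be tried on a plain string before touching any files. This is only a sketch in Python 3: the function name `records_from_text` and the sample text are hypothetical, and it assumes the text "ID:" never appears inside a field value (a field such as "Contact ID:" would split a record in the wrong place):

```python
def records_from_text(text, marker="ID:"):
    """Split on the record marker and flatten each record onto one line."""
    rows = []
    for chunk in text.split(marker):
        # One field per line; drop surrounding whitespace and empty lines.
        fields = [part.strip() for part in chunk.splitlines() if part.strip()]
        if not fields:
            continue  # the piece before the first marker is empty
        # split() removed the marker, so put it back in front of the ID value.
        fields[0] = f"{marker} {fields[0]}"
        rows.append(",".join(fields))
    return rows


text = (
    "ID: 000001\nName: John Smith\nEmail: jsmith@ibm.com\n"
    "ID:000002\nName: Jane Doe\n"
)
result = records_from_text(text)
print("\n".join(result))
```

Because the grouping follows the marker rather than a line count, records with a different number of fields still end up on their own row.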
Answer 3:
Why do your code and input file differ? You have "ID:" vs. "User Id:", "Email:" vs. "User Email:", etc. Anyway, you can do it like this:
#!/usr/bin/python
# open file
input = open(r"C:\Documents\Customer.csv", "r")
#write to a new file
output = open(r"C:\Documents\Customer1.csv", "w")
# read the whole file, split it per record, and turn newlines into commas
lines = [line.strip().replace('\n', ',') for line in input.read().split('ID:')]
output.write("\nID: ".join(lines)[1:])
# Close files
input.close()
output.close()
Or, if you want to filter for specific fields in case something else shows up in the file, do it like this:
#!/usr/bin/python
#import regex module
import re
# open input file
input = open(r"C:\Documents\Customer.csv", "r")
#open output file
output = open(r"C:\Documents\Customer1.csv", "w")
#create search string
search = re.compile(r"""
ID:\s\d+|
Name:\s\w+\s\w+|
Email:\s\w+\@\w+\.\w+|
Company:\s\w+|
blah1:\s\w+|
blah2:\s\w+|
blah3:\s\w+
""", re.X)
#write to output joining parts with ',' and adding Newline before IDs
output.write(",".join(search.findall(input.read())).replace(',ID:','\nID:'))
# Close files
input.close()
output.close()
Note that the last example doesn't require exactly 7 fields per person :)
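To see that behaviour without any files, here is a small Python 3 sketch of the same `findall` approach on an inline string. The sample text is invented, the pattern is shortened to four fields, and `ID:\s*\d+` is a small deviation from the pattern above so that a line written as "ID:000003" (no space) would also match:

```python
import re

# Verbose mode (re.X) ignores whitespace in the pattern, so each
# alternative can sit on its own line.
FIELDS = re.compile(r"""
    ID:\s*\d+            |
    Name:\s\w+\s\w+      |
    Email:\s\w+@\w+\.\w+ |
    Company:\s\w+
""", re.X)

# The second record has only two wanted fields plus one unrelated line.
text = (
    "ID: 000001\nName: John Smith\nEmail: jsmith@ibm.com\nCompany: IBM\n"
    "ID: 000002\nName: Jane Doe\nnote: not a wanted field\n"
)

matches = FIELDS.findall(text)
# Join everything with commas, then start a new row in front of each ID.
joined = ",".join(matches).replace(",ID:", "\nID:")
print(joined)
```

The unrelated line is silently dropped, and the shorter record simply produces a shorter row. One caveat: `Name:\s\w+\s\w+` expects exactly two words, so a single-word name could match across the line break into the next field label.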
And now with duplicates removed (order is not kept, and the complete record is compared):
#!/usr/bin/python
#import regex module
import re
# open input file
input = open(r"C:\Documents\Customer.csv", "r")
#open output file
output = open(r"C:\Documents\Customer1.csv", "w")
#create search string
search = re.compile(r"""
ID:\s\d+|
Name:\s\w+\s\w+|
Email:\s\w+\@\w+\.\w+|
Company:\s\w+|
blah1:\s\w+|
blah2:\s\w+|
blah3:\s\w+
""", re.X)
# create data joining parts with ',' and adding Newline before IDs
data = ",".join(search.findall(input.read())).replace(',ID:','\nID:')
# split data into list
# removing duplicates out of strings with set() and joining result back
# together for the output
output.write("\n".join(set(data.split('\n'))))
# Close files
input.close()
output.close()
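If the original order of the customers matters, `set()` can be swapped for a dictionary, whose keys are unique and keep insertion order in Python 3.7 and later. A minimal sketch, where the helper name `unique_in_order` and the sample rows are illustrative only:

```python
def unique_in_order(rows):
    """Remove duplicate rows while keeping the first occurrence of each."""
    # dict keys are unique and remember the order in which they were added.
    return list(dict.fromkeys(rows))


rows = [
    "ID: 000002,Name: Jane Doe",
    "ID: 000001,Name: John Smith",
    "ID: 000002,Name: Jane Doe",  # exact duplicate of the first row
]
deduped = unique_in_order(rows)
print("\n".join(deduped))
```

As with the `set()` version, the complete record is compared, so two rows for the same ID that differ in any field are both kept.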
Source: https://stackoverflow.com/questions/16725697/how-to-transpose-lines-to-column-for-only-7-rows-at-a-time-in-file