Please help, I have a text file that looks something like this:
ID: 000001
Name: John Smith
Email: jsmith@ibm.com
Company: IBM
blah1: a
blah2: b
blah3: c
ID: 000002
Name: Jane Doe
Email: jdoe@ibm.com
Company: IBM
blah1: a
blah2: b
blah3: c
ID:000003
.
.
.
etc.
Notice that each customer's info is in 7 rows. The ID:000002 marks the start of the next customer, 000003 the next customer, so on and so forth.
I would like my output file to look like this (instead of each customer's data running down the rows, each ID and the 6 rows that follow it should be transposed into columns):
ID: 000001,Name: John Smith,Email: jsmith@ibm.com,Company: IBM,blah1: a,blah2: b,blah3: c
ID: 000002,Name: Jane Doe,Email: jdoe@ibm.com,Company: IBM,blah1: a,blah2: b,blah3: c
I am not sure if this is the easiest technique. I tried using a list, but this doesn't seem to work for my purpose. I know my code is not elegant, but this is just for automating some data manipulation by myself and one other person. I don't really need anything that's stylish, as long as it works.
#!/usr/bin/python
# open file
input = open ("C:\Documents\Customer.csv","r")
#write to a new file
output = open("C:\Documents\Customer1.csv","w")
#Read whole file into data
data = input.readlines()
list = []
for line in data:
    if "User Id:" in line:
        list.append(line)
    if "User Email:" in line:
        list.append(line)
    if "Company:" in line:
        list.append(line)
    if "Contact Id:" in line:
        list.append(line)
    if "Contact Name:" in line:
        list.append(line)
    if "Contact Email:" in line:
        list.append(line)
print list
import os
output.write("\n".join(list))
# Close the file
input.close()
output.close()
My output file contains escape characters and some customers are added more than once.
Think about what you are trying to accomplish, and how simple it really is.
You have a giant list of things split into 7 lines apiece.
First and foremost, I would turn everything into a giant list, just like you already did:
data = input.readlines()
Then count the users:
totalUsers = len(data) // 7  # the line count SHOULD be divisible by 7
This gives you how many iterations you need to go over everything. Now it's time to get slicey:
users = []
start = 0  # because we start on 0
end = 7  # slice end is exclusive, so data[0:7] covers lines 0 through 6 (7 lines)
for number in range(totalUsers):
    person = data[start:end]  # slicing, learn about it, it's cool stuff
    start += 7  # move start up 7
    end += 7  # move end up 7
    users.append(person)
....
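A self-contained sketch of the slicing approach above, written so it runs on a small in-memory sample rather than the real file. The function name `chunk_records` and the `sample` list are illustrative assumptions, not part of the original answer, and it assumes every record really has exactly 7 lines.

```python
def chunk_records(lines, size=7):
    """Group a flat list of lines into comma-joined records of `size` lines each."""
    # Drop line endings and skip blank lines so they do not shift the grouping.
    cleaned = [line.strip() for line in lines if line.strip()]
    records = []
    # Step through the list `size` items at a time; the slice end is exclusive.
    for start in range(0, len(cleaned), size):
        records.append(",".join(cleaned[start:start + size]))
    return records


# Hypothetical sample mirroring the layout shown in the question.
sample = [
    "ID: 000001\n", "Name: John Smith\n", "Email: jsmith@ibm.com\n",
    "Company: IBM\n", "blah1: a\n", "blah2: b\n", "blah3: c\n",
    "ID: 000002\n", "Name: Jane Doe\n", "Email: jdoe@ibm.com\n",
    "Company: IBM\n", "blah1: a\n", "blah2: b\n", "blah3: c\n",
]

for record in chunk_records(sample):
    print(record)
```

If a record is missing a line, every record after it will be misaligned, so this only suits files where the 7-line layout is guaranteed.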
data = input.read()  # read it all in
# rstrip() drops each record's trailing newline so no record ends with a comma
people = [person.rstrip().replace("\n", ",") for person in data.split("ID:")]
data_new = "\nID:".join(people)
output.write(data_new.strip())
First, read in your whole file as one big chunk.
Then split your data on "ID:" so that you have a list.
For each item, strip the trailing newline and replace the remaining newlines with commas.
Join your "people" list back together with "\nID:" to get one big block of text.
Write it back out to your output (and strip it so that you get rid of any extra leading \n's).
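The split-and-join steps above can be tried without touching any files. This is a minimal sketch under the assumption that the text was read in text mode (so line endings are plain "\n"); `records_from_text` and `sample_text` are hypothetical names used only for illustration.

```python
def records_from_text(text):
    """Split the text on "ID:" and put each record on one comma-separated line."""
    records = []
    for chunk in text.split("ID:"):
        chunk = chunk.strip()
        if not chunk:
            # The piece before the first "ID:" is empty; skip it.
            continue
        # Re-attach the "ID: " prefix that split() removed.
        records.append("ID: " + chunk.replace("\n", ","))
    return "\n".join(records)


# Hypothetical sample mirroring the layout shown in the question.
sample_text = (
    "ID: 000001\nName: John Smith\nEmail: jsmith@ibm.com\nCompany: IBM\n"
    "blah1: a\nblah2: b\nblah3: c\n"
    "ID: 000002\nName: Jane Doe\nEmail: jdoe@ibm.com\nCompany: IBM\n"
    "blah1: a\nblah2: b\nblah3: c\n"
)

print(records_from_text(sample_text))
```

Note that splitting on the bare substring "ID:" would also cut inside any other field whose name ends in "ID:", so this relies on the field names shown in the sample.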
Why do your code and input file differ? You have "ID:" vs. "User Id:", "Email:" vs. "User Email:", etc. Anyway, you can do it like this:
#!/usr/bin/python
# open file
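input = open(r"C:\Documents\Customer.csv", "r")
#write to a new file
output = open(r"C:\Documents\Customer1.csv", "w")
# read the whole file, split it on 'ID:', and drop each record's trailing newline so no record ends with a comma
lines = [line.rstrip().replace('\n', ',') for line in input.read().split('ID:')]
output.write("\nID:".join(lines)[1:])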
# Close files
input.close()
output.close()
Or, if you want to filter for specific fields in case something else pops up, like this:
#!/usr/bin/python
#import regex module
import re
# open input file
input = open(r"C:\Documents\Customer.csv", "r")
#open output file
output = open(r"C:\Documents\Customer1.csv", "w")
#create search string
search = re.compile(r"""
ID:\s\d+|
Name:\s\w+\s\w+|
Email:\s\w+\@\w+\.\w+|
Company:\s\w+|
blah1:\s\w+|
blah2:\s\w+|
blah3:\s\w+
""", re.X)
#write to output joining parts with ',' and adding Newline before IDs
output.write(",".join(search.findall(input.read())).replace(',ID:','\nID:'))
# Close files
input.close()
output.close()
Note that the last example doesn't require exactly 7 fields per person :)
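The same field filtering can also be done line by line without a regular expression. The following is only a sketch of that alternative, not the answer's method: `WANTED`, `filter_records`, and the sample lines (including the made-up "Notes:" field) are assumptions for illustration. A new record starts whenever an "ID" line appears, so records may have any number of fields.

```python
# Field names to keep; anything else is dropped. Adjust to match the real file.
WANTED = ("ID", "Name", "Email", "Company", "blah1", "blah2", "blah3")


def filter_records(lines, wanted=WANTED):
    """Keep only wanted "field: value" lines and start a new record at each ID."""
    records = []
    for line in lines:
        line = line.strip()
        # Split "Name: John Smith" at the first colon into field name and rest.
        field, sep, _ = line.partition(":")
        field = field.strip()
        if not sep or field not in wanted:
            continue  # not a "field: value" line we care about
        if field == "ID" or not records:
            records.append([])  # an ID line opens a new record
        records[-1].append(line)
    return [",".join(record) for record in records]


# Hypothetical sample: one unwanted field and a record with fewer fields.
sample_lines = [
    "ID: 000001\n", "Name: John Smith\n", "Notes: ignore me\n",
    "Email: jsmith@ibm.com\n",
    "ID: 000002\n", "Name: Jane Doe\n",
]

for record in filter_records(sample_lines):
    print(record)
```

Matching on the exact field name avoids accidental hits inside longer names, at the cost of not validating the values the way the regular expression does.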
And now with duplicates removed (order is not kept, and the complete record is compared):
#!/usr/bin/python
#import regex module
import re
# open input file
input = open(r"C:\Documents\Customer.csv", "r")
#open output file
output = open(r"C:\Documents\Customer1.csv", "w")
#create search string
search = re.compile(r"""
ID:\s\d+|
Name:\s\w+\s\w+|
Email:\s\w+\@\w+\.\w+|
Company:\s\w+|
blah1:\s\w+|
blah2:\s\w+|
blah3:\s\w+
""", re.X)
# create data joining parts with ',' and adding Newline before IDs
data = ",".join(search.findall(input.read())).replace(',ID:','\nID:')
# split data into list
# removing duplicates out of strings with set() and joining result back
# together for the output
output.write("\n".join(set(data.split('\n'))))
# Close files
input.close()
output.close()
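If the original order matters, one alternative to set() is to remember which records have already been seen while walking the list once. This is a minimal sketch rather than the answer's method; `unique_in_order` and the sample records are hypothetical names and data chosen for illustration. As above, the complete record line is compared.

```python
def unique_in_order(records):
    """Return the records with exact duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)       # remember this record for later comparisons
            unique.append(record)  # keep only its first occurrence
    return unique


# Hypothetical records, already joined into one line each.
sample_records = [
    "ID: 000002,Name: Jane Doe,Company: IBM",
    "ID: 000001,Name: John Smith,Company: IBM",
    "ID: 000002,Name: Jane Doe,Company: IBM",
]

print("\n".join(unique_in_order(sample_records)))
```

In the script above, this would take the place of set(data.split('\n')) when the order of customers in the output file matters.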
Source: https://stackoverflow.com/questions/16725697/how-to-transpose-lines-to-column-for-only-7-rows-at-a-time-in-file