Question
Right now I have Python 3 code that takes a column of data in a CSV file, splits the phrases in each cell into individual words on spaces, and then exports the data to a new CSV file.
What I'm wondering is whether there is a way to tell Python to apply the formatting code only to the column with a particular header.
Here is what my source data looks like:
Keyword             Source  Number
Lions Tigers Bears  US      3
Dogs Zebra          Canada  5
Sharks Guppies      US      2
And here is my code, which splits the phrases in each cell into individual words on spaces:
with open(r'C:\Users\jk\Desktop\helloworld.csv', 'r') as datafile:
    data = []
    for row in datafile:
        # split the whole line on whitespace; every word becomes one item
        data.extend(item.strip() for item in row.split())

with open('test.csv', 'w') as a_file:
    for result in data:
        # write each word on its own line
        a_file.write(result + '\n')
        print(result)
This turns the source data into:
Keyword  Source  Number
Lions    US      3
Tigers
Bears
Dogs     Canada  5
etc.
In this case, I only need this code to apply to the one column with the heading Keyword. Ideally, I would also like to copy the values in the "Source" and "Number" columns to the newly created rows (Lions US 3, Tigers US 3, Bears US 3, etc.), but I haven't figured out that part yet!
I've been poking around the forum for a while trying to find an answer. I know you can tell Python to read the first line of the CSV file, where the headers are (headers = file.readline()), but beyond that I am lost. Would this be easier with the csv module's reader?
Answer 1:
Use the csv module to split your data into columns, and use a csv.DictReader() object so you can select a column by its header:
import csv

source = r'C:\Users\jk\Desktop\helloworld.csv'
dest = 'test.csv'

with open(source, newline='') as inf, open(dest, 'w', newline='') as outf:
    reader = csv.DictReader(inf)
    writer = csv.DictWriter(outf, fieldnames=reader.fieldnames)
    for row in reader:
        # split the Keyword cell into separate words
        words = row['Keyword'].split()
        # the first word stays on the original row, with Source and Number
        row['Keyword'] = words[0]
        writer.writerow(row)
        # one extra row per remaining word; the other columns are left empty
        writer.writerows({'Keyword': w} for w in words[1:])
DictReader() reads the first row of your file and uses it as the keys for the dictionaries it produces for each subsequent row, so a row looks like:
{'Keyword': 'Lions Tigers Bears', 'Source': 'US', 'Number': '3'}
Now you can address each column individually: update the dictionary with just the first word of the Keyword column, write it out, and then write an additional row (with the other columns left empty) for each remaining word.
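If you would rather stick with the header-line approach from the question, a plain csv.reader works too: read the header row with next(), find the position of 'Keyword' with list.index(), and use that position on every row. This is a minimal sketch, assuming a comma-separated file; split_keyword_rows is a hypothetical helper name, and StringIO stands in for the real input and output files so the example runs on its own.

```python
import csv
from io import StringIO


def split_keyword_rows(inf, outf, column='Keyword'):
    """Hypothetical helper: split one column on whitespace, one word per row.

    Finds the column by its position in the header line rather than by
    dictionary key, mirroring the headers = file.readline() idea.
    """
    reader = csv.reader(inf)
    writer = csv.writer(outf)
    header = next(reader)        # first line holds the column names
    idx = header.index(column)   # raises ValueError if the header is missing
    writer.writerow(header)
    for row in reader:
        # an empty cell still yields one output row
        words = row[idx].split() or ['']
        # the first word keeps the rest of the original row
        writer.writerow(row[:idx] + [words[0]] + row[idx + 1:])
        for word in words[1:]:
            # remaining words get empty cells in the other columns
            extra = [''] * len(row)
            extra[idx] = word
            writer.writerow(extra)


sample = StringIO('Keyword,Source,Number\n'
                  'Lions Tigers Bears,US,3\n'
                  'Dogs Zebra,Canada,5\n')
out = StringIO()
split_keyword_rows(sample, out)
print(out.getvalue())
```

Unlike the DictReader version above, this sketch also writes the header line to the output.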
I'm assuming here that your files are comma-separated. If they use a different delimiter, set the delimiter argument to that character; for a tab-separated format, for example:
reader = csv.DictReader(inf, delimiter='\t')
See the csv module documentation for the other options, including predefined format combinations called dialects.
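If you don't know in advance which delimiter a file uses, the csv module's Sniffer class can guess a dialect from a sample of the text. This is a minimal sketch, assuming the first few kilobytes are representative and that the delimiter is one of comma, tab or semicolon (restricting the candidates keeps the spaces inside the Keyword cells from being picked); detect_dialect is a hypothetical helper name.

```python
import csv
from io import StringIO


def detect_dialect(f, candidates=',\t;'):
    """Hypothetical helper: guess the CSV dialect from the start of an open file."""
    sample = f.read(4096)   # look at the first few KB only
    f.seek(0)               # rewind so the reader starts at the header line
    # sniff() raises csv.Error if it cannot decide on a delimiter
    return csv.Sniffer().sniff(sample, delimiters=candidates)


data = StringIO('Keyword\tSource\tNumber\n'
                'Lions Tigers Bears\tUS\t3\n'
                'Dogs Zebra\tCanada\t5\n')
dialect = detect_dialect(data)
rows = list(csv.DictReader(data, dialect=dialect))
print(repr(dialect.delimiter), rows[0]['Keyword'])
```

With a real file, open it with newline='' as in the answer and pass the handle in. Because sniffing is a heuristic, an explicit delimiter argument remains the more predictable choice when you do know the format.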
Demo:
>>> import csv
>>> from io import StringIO
>>> sample = StringIO('''\
... Keyword,Source,Number
... Lions Tigers Bears,US,3
... Dogs Zebra,Canada,5
... Sharks Guppies,US,2
... ''')
>>> output = StringIO()
>>> reader = csv.DictReader(sample)
>>> writer = csv.DictWriter(output, fieldnames=reader.fieldnames)
>>> for row in reader:
...     words = row['Keyword'].split()
...     row['Keyword'] = words[0]
...     writer.writerow(row)
...     writer.writerows({'Keyword': w} for w in words[1:])
...
12
15
13
>>> print(output.getvalue())
Lions,US,3
Tigers,,
Bears,,
Dogs,Canada,5
Zebra,,
Sharks,US,2
Guppies,,
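The demo leaves Source and Number empty on the extra rows, but the question also asked for those values to be repeated (Lions US 3, Tigers US 3, Bears US 3). One way to sketch that, assuming Python 3.5+ for the {**row, ...} dictionary unpacking, is to write a copy of the whole row for every word, changing only the Keyword cell; writer.writeheader() is added so the output keeps the header line. expand_keywords is a hypothetical helper name, and StringIO stands in for the real files.

```python
import csv
from io import StringIO


def expand_keywords(inf, outf, column='Keyword'):
    """Hypothetical helper: one output row per word, other columns copied."""
    reader = csv.DictReader(inf)
    writer = csv.DictWriter(outf, fieldnames=reader.fieldnames)
    writer.writeheader()  # keep the header line in the output
    for row in reader:
        # copy the whole row for each word, replacing only the Keyword cell
        writer.writerows({**row, column: w} for w in row[column].split())


sample = StringIO('Keyword,Source,Number\n'
                  'Lions Tigers Bears,US,3\n'
                  'Dogs Zebra,Canada,5\n'
                  'Sharks Guppies,US,2\n')
output = StringIO()
expand_keywords(sample, output)
print(output.getvalue())
```

With real files, open them as in the answer (newline='' on both) and pass the handles in. Note that in this sketch a row whose Keyword cell is empty produces no output at all.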
Source: https://stackoverflow.com/questions/25341417/find-a-specific-header-in-a-csv-file-using-python-3-code