Question
I am learning Python by writing some simple programs, and I am trying to do the following:
I have an .xlsx file in this format:
Team, Player
What I want to do is apply a filter to the field Team, then take a random subset of 3 players from EACH team.
So, for instance, my XLS looks like this:
Man Utd, Ryan Giggs
Man Utd, Paul Scholes
Man Utd, Paul Ince
Man Utd, Danny Pugh
Liverpool, Steven Gerrard
Liverpool, Kenny Dalglish
...
I want to end up with an XLS containing 3 random players from each team, and only 1 or 2 where a team has fewer than 3 (this is the part I am struggling with).
I've started like so:
import xlrd, random, csv
# First open the workbook
wb = xlrd.open_workbook('C:\\Users\\ADMIN\\Desktop\\1.xlsx')
# Then select the sheet.
sheet = wb.sheet_by_name('Sheet1')
# Then get the values of each column, skipping the first item, which is the header
team_names = sheet.col_values(0)[1:]
players = sheet.col_values(1)[1:]
filtered_teams = filter(lambda x: x[0] > 2, zip(team_names, players))
# Group players by team
teams = {}
for t,p in zip(team_names,players):
    if t in teams:
        teams[t].append(p)
    else:
        teams[t] = [p]
samples = [teams[t] + random.sample(teams[t],3) for t in teams]
myFile = open('C:\\Users\\ADMIN\\Desktop\\1.csv', 'wb')
wr = csv.writer(myFile, quoting=csv.QUOTE_ALL)
wr.writerow(samples)
The problems I am having:
wr.writerow(samples)
TypeError: a bytes-like object is required, not 'str'
Do I need some kind of explicit cast here? How can I fix this?
Also, when creating samples (the list of all the teams/players), if I use:
samples = [teams[t] + random.sample(teams[t],1) for t in teams]
it works, but if I use:
samples = [teams[t] + random.sample(teams[t],3) for t in teams]
I get an out-of-bounds exception, since some teams do not have 3 players associated with them (some have only 1). To be exact, I get:
raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
(which my simple Java brain has understood as an OOB).
How can I fix this, or just get it to move on to the next team at this point? Some kind of try{assignment} catch{move to next team} mechanism?
Can anyone offer any feedback/advice, please?
Thank you!
EDIT:
The errors being thrown were solved by Jean-François Fabre below; thank you very much. However, when I write to CSV now, it only returns 17 rows (there should be hundreds), and the format is completely wrong. I was hoping to write something like:
Man Utd, Ryan Giggs
Man Utd, Paul Scholes
Man Utd, Danny Pugh
Liverpool, Steven Gerrard
Liverpool, Kenny Dalglish
but it seems to be just the players getting returned, without any real ordering. Indeed, if I change it to random.sample(teams[t], min(2, len(teams[t]))) I still get 5 or 6 players returned per team...
Any idea what my logical error could be here ?
Answer 1:
Well, this is somehow 2 (now 3 :)) questions in one. Since I have answers to all of them, I'll jump in:
myFile = open('C:\\Users\\ADMIN\\Desktop\\1.csv', 'wb')
only works for Python 2. For Python 3 you have to open the file in text mode (and should also pass newline="" to avoid spurious blank lines):
myFile = open('C:\\Users\\ADMIN\\Desktop\\1.csv', 'w', newline="")
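To see why the mode matters, here is a small sketch (assuming Python 3) that writes the same row twice without touching the disk: `io.BytesIO` stands in for a file opened with 'wb', and `io.StringIO(newline="")` stands in for one opened with 'w', newline="". The `row` data is made up for illustration.

```python
import csv
import io

row = ["Man Utd", "Ryan Giggs"]

# Binary buffer, like a file opened with 'wb': csv.writer hands str to write(), which fails
binary_buffer = io.BytesIO()
try:
    csv.writer(binary_buffer, quoting=csv.QUOTE_ALL).writerow(row)
except TypeError as exc:
    binary_error = str(exc)  # same error as in the question
else:
    binary_error = None

# Text buffer, like a file opened with 'w', newline="": the write succeeds
text_buffer = io.StringIO(newline="")
csv.writer(text_buffer, quoting=csv.QUOTE_ALL).writerow(row)
text_output = text_buffer.getvalue()

print(binary_error)
print(repr(text_output))  # '"Man Utd","Ryan Giggs"\r\n'
```

So no cast is needed: the csv module always produces str, and the file just has to accept text.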
And for your other problem, just change:
random.sample(teams[t],3)
into
random.sample(teams[t], min(3, len(teams[t])))
so you're always within bounds.
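Since the question also asked about a try/catch-style fallback, here is a sketch comparing the `min()` clamp with a `try`/`except ValueError` that skips short teams. The `roster` dict is made-up data for illustration.

```python
import random

# Made-up data: Liverpool has fewer than 3 players
roster = {
    "Man Utd": ["Ryan Giggs", "Paul Scholes", "Paul Ince", "Danny Pugh"],
    "Liverpool": ["Steven Gerrard", "Kenny Dalglish"],
}

# Option 1: clamp the sample size, so small teams keep all of their players
clamped = {
    team: random.sample(players, min(3, len(players)))
    for team, players in roster.items()
}

# Option 2: try/except, which skips any team that cannot supply 3 players
skipped = {}
for team, players in roster.items():
    try:
        skipped[team] = random.sample(players, 3)
    except ValueError:
        continue  # "Sample larger than population or is negative"

print({team: len(p) for team, p in clamped.items()})  # {'Man Utd': 3, 'Liverpool': 2}
print(sorted(skipped))  # ['Man Utd']
```

The `min()` version matches the goal in the question (1 or 2 players for small teams); the `try`/`except` version drops those teams entirely.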
Now, about the trashed output: you're creating a list of lists, but you only write one row using writerow. This struck me at first, but then I forgot to mention it :) Use writerows instead, or you'll get only one line, with each list represented as a string with brackets, commas...
One last issue: the team information is missing from the file because you only generate player names. Also, teams[t] + random.sample(...) prepends each team's full player list to the sample, which is why you still see 5 or 6 players per team.
To sum up, I'd rewrite the whole thing like this, with some improvements:
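A minimal sketch of the writerow/writerows difference, writing the same list of lists to two in-memory buffers (`io.StringIO` stands in for the real file; the data is made up):

```python
import csv
import io

samples = [["Man Utd", "Ryan Giggs"], ["Liverpool", "Steven Gerrard"]]

# writerow: the outer list is ONE row, and each inner list is turned into a single field via str()
one_row = io.StringIO(newline="")
csv.writer(one_row).writerow(samples)

# writerows: each inner list becomes its own row
many_rows = io.StringIO(newline="")
csv.writer(many_rows).writerows(samples)

print(repr(one_row.getvalue()))    # one line of bracketed, quoted strings
print(repr(many_rows.getvalue()))  # 'Man Utd,Ryan Giggs\r\nLiverpool,Steven Gerrard\r\n'
```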
samples = [[team] + random.sample(players, min(3, len(players))) for team, players in teams.items()]
with open(r'C:\Users\ADMIN\Desktop\1.csv', 'w', newline='') as myFile:
    wr = csv.writer(myFile, quoting=csv.QUOTE_ALL)
    wr.writerows(samples)
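If you want exactly the layout from the question's EDIT (one Team, Player pair per line rather than one line per team), one variation is to emit one row per sampled player. This is a sketch under assumptions, not the code above: the `rows` list stands in for the spreadsheet data (in the question it comes from xlrd; note that xlrd 2.0 and later only read legacy .xls files, so .xlsx needs an older xlrd or another library such as openpyxl), and the output goes to an in-memory buffer instead of 1.csv.

```python
import csv
import io
import random
from collections import defaultdict

# Stand-in for the (team, player) pairs read from the spreadsheet
rows = [
    ("Man Utd", "Ryan Giggs"),
    ("Man Utd", "Paul Scholes"),
    ("Man Utd", "Paul Ince"),
    ("Man Utd", "Danny Pugh"),
    ("Liverpool", "Steven Gerrard"),
    ("Liverpool", "Kenny Dalglish"),
]

# Group players by team (same idea as the dict-building loop in the question)
teams = defaultdict(list)
for team, player in rows:
    teams[team].append(player)

# One [team, player] row per sampled player, at most 3 per team
samples = [
    [team, player]
    for team, players in teams.items()
    for player in random.sample(players, min(3, len(players)))
]

out = io.StringIO(newline="")  # stands in for open(..., 'w', newline='')
csv.writer(out, quoting=csv.QUOTE_ALL).writerows(samples)
print(out.getvalue())
```

Here Man Utd contributes three random players and Liverpool both of its players, and every line keeps the team next to the player.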
Source: https://stackoverflow.com/questions/42442955/handling-out-of-bounds-in-python-writing-to-csv