Question
From the list below, I can strip out the non-alphabetic characters, but the result still falls short of what I need. I want the Draw rows eliminated without affecting the desired outcome.
import pandas as pd

df = pd.DataFrame({'Teams': ['Lakefield United',
                             '101002 Castle FC pk, +½ 1.81 o 3.05 o Un 2 1.92 o',
                             '101003 Draw 3.00 o',
                             'Boms',
                             '101005 Riverside FC pk 2.11 o 2.86 o Un 2, 2½ 1.78 o',
                             '101006 Draw 3.10 o',
                             'Barmley',
                             '101011 Colsely Lakers -1, -1½ 2.04 o 1.46 o Un 2½, 3 1.83 o',
                             '101012 Draw 4.40 o']})
Desired elements: 'Lakefield United\nCastle FC', 'Boms\nRiverside FC', 'Barmley\nColsely Lakers', etc.
Answer 1:
As others have suggested, this requires constructing a termination list for determining when the second team's name ends. Below is one way to do it. You may need to add more items to your termination list, but there will be a limited number of them. I have also converted the dataframe to a list for ease of manipulation.
lst = df.to_numpy().tolist()  # convert the dataframe to a list of single-item rows
termination_list = [",", "p", "-"]  # characters that terminate the team name
termination_list += [str(n) for n in range(10)]  # add the digit characters
result = []
for i in range(len(lst)):
    if i % 3 == 0:  # row containing the first team
        next_string = lst[i + 1][0]  # row containing the second team
        next_team = ""
        for j in range(6, len(next_string)):  # skip the 6-digit ID prefix
            if next_string[j] in termination_list:
                break
            next_team += next_string[j]  # build the second team's name, character by character
        # append a tuple of (first team, second team), trimming surrounding whitespace
        result.append((lst[i][0], next_team.strip()))
print(result)
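Note that this code cuts short the name of any team that contains a termination character, such as a lowercase "p" (for example, "Liverpool" would be truncated to "Liver"), a hyphen, or a digit. You can fix this by refining the termination list.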
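As an alternative to a character-level termination list, a regular expression can capture the second team's name as a whole word sequence, which avoids truncating names that merely contain a "p". The following is only a sketch under stated assumptions, not part of the original answer: it assumes every odds row starts with a numeric ID, and that the team name is followed either by the token "pk" or by a (possibly signed) handicap such as "-1" or "+½". The names SECOND_TEAM and pair_teams are hypothetical, and the plain list stands in for df['Teams'].tolist().

```python
import re

# Stand-in for df['Teams'].tolist() from the question.
teams = [
    'Lakefield United',
    '101002 Castle FC pk, +½ 1.81 o 3.05 o Un 2 1.92 o',
    '101003 Draw 3.00 o',
    'Boms',
    '101005 Riverside FC pk 2.11 o 2.86 o Un 2, 2½ 1.78 o',
    '101006 Draw 3.10 o',
    'Barmley',
    '101011 Colsely Lakers -1, -1½ 2.04 o 1.46 o Un 2½, 3 1.83 o',
    '101012 Draw 4.40 o',
]

# Assumed row layout: numeric ID, team name, then either "pk" or a handicap/odds
# value that starts with an optional sign and a digit (or the "½" character).
SECOND_TEAM = re.compile(r"^\d+\s+(.+?)\s+(?:pk\b|[+-]?[\d½])")


def pair_teams(rows):
    """Join each first-team row with the team name from the next odds row."""
    pairs = []
    first_team = None
    for row in rows:
        if not row[:1].isdigit():  # rows without a leading ID hold the first team
            first_team = row.strip()
            continue
        match = SECOND_TEAM.match(row)
        # Skip Draw rows and any odds row that has no pending first team.
        if match and match.group(1) != "Draw" and first_team is not None:
            pairs.append(f"{first_team}\n{match.group(1)}")
            first_team = None
    return pairs


pairs = pair_teams(teams)
print(pairs)
```

Because the pattern stops at a whole token rather than a single character, a name such as "Liverpool" would survive intact. A team name that itself contains a standalone number would still be cut short, so the pattern may need adjusting for real data.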
Source: https://stackoverflow.com/questions/65659338/is-regex-or-replace-method-best-to-clean-up-list-re-pandas-environment