Question
This is closely related to a previous question, but I realised that my objective is considerably more complicated.
I have a sentence: "Forbes Asia 200 Best Under 500 Billion 2011"
I have tokens like:
oldTokens = [u'Forbes', u'Asia', u'200', u'Best', u'Under', u'500', u'Billion', u'2011']
I also have the indices at which a previous parser decided there should be location or number slots:
numberTokenIDs = {(7,): 2011.0, (2,): 200.0, (5,6): 500000000000.00}
locationTokenIDs = {(0, 1): u'Forbes Asia'}
The token IDs are the indices of the tokens that make up a location or a number. The objective is to obtain a new, reduced list of tokens like:
newTokens = [u'Asia', u'200', u'Best', u'Under', u'500', u'2011']
with new number and location token IDs, perhaps like this (to avoid index-out-of-bounds exceptions):
numberTokenIDs = {(5,): 2011.0, (1,): 200.0, (4,): 500000000000.00}
locationTokenIDs = {(0,): u'Forbes Asia'}
Essentially, I would like to go through the reduced list of tokens and ultimately build a new sentence:
"LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT"
by replacing the token at each token ID with either "LOCATION_SLOT" or "NUMBER_SLOT". If I did this with the current number and location token IDs, I would get:
"LOCATION_SLOT LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT NUMBER_SLOT".
How would I do this?
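To make the problem concrete, here is a minimal sketch of the naive replacement described above. The function name `naive_slots` and the snake_case variable names are only illustrative and are not part of the original code. Because every index inside a multi-token span is replaced, each span produces one slot per original token.

```python
old_tokens = ['Forbes', 'Asia', '200', 'Best', 'Under', '500', 'Billion', '2011']
number_token_ids = {(7,): 2011.0, (2,): 200.0, (5, 6): 500000000000.00}
location_token_ids = {(0, 1): 'Forbes Asia'}


def naive_slots(tokens, location_ids, number_ids):
    """Replace every indexed token in place (hypothetical helper)."""
    out = list(tokens)  # copy, so the original list is left untouched
    for span in location_ids:
        for i in span:
            out[i] = 'LOCATION_SLOT'
    for span in number_ids:
        for i in span:
            out[i] = 'NUMBER_SLOT'
    return ' '.join(out)


naive_sentence = naive_slots(old_tokens, location_token_ids, number_token_ids)
print(naive_sentence)
# LOCATION_SLOT LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT NUMBER_SLOT
```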
Another example is:
Location token IDs are: (0, 1)
Number token IDs are: (3, 4)
Old sampleTokens: [u'United', u'Kingdom', u'USD', u'1.240', u'billion']
Here I want to both delete tokens and update the location and number token IDs, so that I can replace tokens in the sentence like this:
sampleTokens[numberTokenID] = "NUMBER_SLOT"
sampleTokens[locationTokenID] = "LOCATION_SLOT"
so that the replaced tokens are [u'LOCATION_SLOT', u'USD', u'NUMBER_SLOT']
Answer 1:
Not a very elegant solution, but it works:
oldTokens = [u'Forbes', u'Asia', u'200', u'Best', u'Under', u'500', u'Billion', u'2011']
numberTokenIDs = {(7,): 2011.0, (2,): 200.0, (5,6): 500000000000.00}
locationTokenIDs = {(0, 1): u'Forbes Asia'}
newTokens = []
newnumberTokenIDs = {}
newlocationTokenIDs = {}
new_ind = 0
skip = False
for ind in range(len(oldTokens)):
    if skip:
        # Second element of a 2-token span: already handled, so drop it
        skip = False
        continue
    for loc_ind in locationTokenIDs.keys():
        if ind in loc_ind:
            # Keep the last token of the location span (u'Asia' here)
            newTokens.append(oldTokens[loc_ind[-1]])
            newlocationTokenIDs[(new_ind,)] = locationTokenIDs[loc_ind]
            new_ind += 1
            if len(loc_ind) > 1:  # Skip next position if there are 2 elements in a tuple
                skip = True
            break
    else:
        # Not part of a location span: check the number spans
        for num_ind in numberTokenIDs.keys():
            if ind in num_ind:
                # Keep the first token of the number span (u'500' here)
                newTokens.append(oldTokens[ind])
                newnumberTokenIDs[(new_ind,)] = numberTokenIDs[num_ind]
                new_ind += 1
                if len(num_ind) > 1:
                    skip = True
                break
        else:
            # Ordinary token: copy it over unchanged
            newTokens.append(oldTokens[ind])
            new_ind += 1
newTokens
Out[37]: [u'Asia', u'200', u'Best', u'Under', u'500', u'2011']
newnumberTokenIDs
Out[38]: {(1,): 200.0, (4,): 500000000000.0, (5,): 2011.0}
newlocationTokenIDs
Out[39]: {(0,): u'Forbes Asia'}
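The loop above only skips one extra position, so it assumes spans of at most two tokens. A more general variant could look like the sketch below. It is an illustration under stated assumptions, not a drop-in replacement: the helper names `collapse_spans` and `to_slots` are hypothetical, spans are assumed to be contiguous, sorted and non-overlapping, and the values `'United Kingdom'` and `1.24e9` in the second example are placeholders, since the question only gives the indices there.

```python
def collapse_spans(tokens, location_ids, number_ids):
    """Collapse each span to a single token and re-index both dicts.

    Assumes spans are contiguous, sorted tuples that do not overlap.
    Location spans keep their last token and number spans their first,
    mirroring the expected output in the question.
    """
    # Map the first index of every span to (span, kind).
    starts = {}
    for span in location_ids:
        starts[span[0]] = (span, 'loc')
    for span in number_ids:
        starts[span[0]] = (span, 'num')

    new_tokens, new_locs, new_nums = [], {}, {}
    i = 0
    while i < len(tokens):
        if i in starts:
            span, kind = starts[i]
            key = (len(new_tokens),)  # position in the reduced list
            if kind == 'loc':
                new_tokens.append(tokens[span[-1]])
                new_locs[key] = location_ids[span]
            else:
                new_tokens.append(tokens[span[0]])
                new_nums[key] = number_ids[span]
            i = span[-1] + 1  # jump past the whole span, whatever its length
        else:
            new_tokens.append(tokens[i])
            i += 1
    return new_tokens, new_locs, new_nums


def to_slots(tokens, location_ids, number_ids):
    """Replace each re-indexed token with its slot name."""
    out = list(tokens)
    for (i,) in location_ids:
        out[i] = 'LOCATION_SLOT'
    for (i,) in number_ids:
        out[i] = 'NUMBER_SLOT'
    return out


tokens1 = ['Forbes', 'Asia', '200', 'Best', 'Under', '500', 'Billion', '2011']
nums1 = {(7,): 2011.0, (2,): 200.0, (5, 6): 500000000000.00}
locs1 = {(0, 1): 'Forbes Asia'}
new1, new_locs1, new_nums1 = collapse_spans(tokens1, locs1, nums1)
sentence1 = ' '.join(to_slots(new1, new_locs1, new_nums1))
print(sentence1)
# LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT

# Second example from the question; the dict values are placeholders.
tokens2 = ['United', 'Kingdom', 'USD', '1.240', 'billion']
new2, new_locs2, new_nums2 = collapse_spans(
    tokens2, {(0, 1): 'United Kingdom'}, {(3, 4): 1.24e9})
slots2 = to_slots(new2, new_locs2, new_nums2)
print(slots2)
# ['LOCATION_SLOT', 'USD', 'NUMBER_SLOT']
```

Jumping to `span[-1] + 1` instead of toggling a `skip` flag is what lets spans of any length collapse to a single token.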
Source: https://stackoverflow.com/questions/38509239/need-to-remove-items-from-both-a-list-and-a-dictionary-of-tuple-value-pairs-at-s