Question
I have a list of tuples. Each tuple consists of a string and a dict, and each dict in turn holds a list of tuples under the key entities. The list has around 8K entries.
Sample data:
dataset = [
    ('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),
    ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (12, 15, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]}),
]
The expected output is:
dataset = [
    ('made of iron oxide', {'entities': [(17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),
    ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]}),
]
Note:
(12, 19, 'PRODUCT') is kept in the output because the difference between its start and end numbers is greater than that of (12, 16, 'PRODUCT'), which has the same start. PRODUCT is just a label and is inconsequential.
These numbers are indexes into the sentences whose entities are being listed. The sentences in the example are random and inconsequential; the operation only needs to act on the entities data. I want to remove overlapping numbers from my list and keep only those entities index values that have the greatest length, i.e., no two values in entities may have the same start or end number.
Answer 1:
class Solution(object):  # Ref. https://www.geeksforgeeks.org/merging-intervals/
    def merge(self, intervals):
        """
        :type intervals: List[List[int]]  (each item is [start, end])
        :rtype: List[List[int]]
        """
        if len(intervals) == 0:
            return []
        # Sort the intervals in place by their start value
        self.quicksort(intervals, 0, len(intervals) - 1)
        stack = []
        stack.append(intervals[0])
        for i in range(1, len(intervals)):
            last_element = stack[len(stack) - 1]
            if last_element[1] >= intervals[i][0]:
                # Overlaps the interval on top of the stack: extend its end
                last_element[1] = max(intervals[i][1], last_element[1])
                stack.pop(len(stack) - 1)
                stack.append(last_element)
            else:
                stack.append(intervals[i])
        return stack

    def partition(self, array, start, end):
        # Lomuto partition on the start value, using array[end] as the pivot
        pivot_index = start
        for i in range(start, end):
            if array[i][0] <= array[end][0]:
                array[i], array[pivot_index] = array[pivot_index], array[i]
                pivot_index += 1
        array[end], array[pivot_index] = array[pivot_index], array[end]
        return pivot_index

    def quicksort(self, array, start, end):
        if start < end:
            partition_index = self.partition(array, start, end)
            self.quicksort(array, start, partition_index - 1)
            self.quicksort(array, partition_index + 1, end)


for i in range(len(dataset)):  # Apply the merge to every entry of the dataset
    arr1 = []
    for item in dataset[i][1]['entities']:
        arr1.append([item[0], item[1]])  # lists, because merge() mutates the end value
    ob1 = Solution()
    arr2 = ob1.merge(arr1)
    arr3 = []
    for item in arr2:
        arr3.append((item[0], item[1], 'PRODUCT'))
    dataset[i][1]['entities'] = arr3
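Note that merging fuses overlapping spans, so for the first sample entry the code above yields a single (12, 24, 'PRODUCT') rather than the three spans listed in the expected output. If the rule from the question is meant literally (no two entities may share a start or an end number, and the longer span wins), a minimal sketch could look like the following. The function name drop_shared_boundaries and the tie-breaking choice (on equal length, the span that appears first wins) are assumptions of this sketch, not something stated in the question.

```python
def drop_shared_boundaries(entities):
    """Keep the longest spans so that no two kept spans share a start or an end.

    Hypothetical helper: greedy, longest span first.
    """
    # Indexes ordered from longest to shortest span; sorted() is stable,
    # so spans of equal length keep their original relative order.
    by_length = sorted(
        range(len(entities)),
        key=lambda i: entities[i][1] - entities[i][0],
        reverse=True,
    )
    used_starts, used_ends, keep = set(), set(), set()
    for i in by_length:
        start, end = entities[i][0], entities[i][1]
        if start in used_starts or end in used_ends:
            continue  # a longer (or earlier) span already claimed this boundary
        used_starts.add(start)
        used_ends.add(end)
        keep.add(i)
    # Return the survivors in their original order
    return [span for i, span in enumerate(entities) if i in keep]


dataset = [
    ('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'),
                                         (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),
    ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (12, 15, 'PRODUCT'),
                                           (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]}),
]

# Build a new list instead of mutating the tuples in place
cleaned = [
    (text, {**annotations, 'entities': drop_shared_boundaries(annotations['entities'])})
    for text, annotations in dataset
]
```

On the sample data this drops (12, 16, 'PRODUCT') and (12, 15, 'PRODUCT') and leaves the remaining spans in their original order. Unlike the merge approach it also keeps each span's own label instead of hard-coding 'PRODUCT'.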
Source: https://stackoverflow.com/questions/61466189/remove-overlapping-numbers-from-inside-a-tuple-in-python-such-that-no-2-tuples-h