Remove overlapping numbers from inside a tuple in Python such that no two tuples have the same starting or ending number

Question

I have a list of tuples. Each tuple consists of a string and a dict, and each dict in turn contains a list of tuples. The list has around 8K entries.

Sample data:

dataset = [
    ('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),
    ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (12, 15, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]}),
]

The expected output is:

dataset = [
    ('made of iron oxide', {'entities': [(17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),
    ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]}),
]

Note: (12, 19, 'PRODUCT') is kept in the output because the difference between its start and end numbers is greater than that of (12, 16, 'PRODUCT'). PRODUCT is just a label and is inconsequential.

These numbers are indexes into the sentence, marking where each entity starts and ends. The sentences in the example are random placeholders, since they are inconsequential and the operation only needs to act on the entities dict. I want to remove overlapping numbers from my list and keep only the entity index values with the greatest length, i.e., no two values in entities may have the same starting or ending number.

Answer 1:

class Solution(object):  # Ref. https://www.geeksforgeeks.org/merging-intervals/
    def merge(self, intervals):
        """
        Merge overlapping intervals into single intervals.

        :type intervals: List[List[int]], each item a mutable [start, end] pair
        :rtype: List[List[int]]
        """
        if len(intervals) == 0:
            return []
        # Sort in place by start number so overlapping intervals become adjacent.
        self.quicksort(intervals, 0, len(intervals) - 1)
        stack = [intervals[0]]
        for i in range(1, len(intervals)):
            last_element = stack[-1]
            if last_element[1] >= intervals[i][0]:
                # Overlap: extend the interval on top of the stack in place.
                last_element[1] = max(intervals[i][1], last_element[1])
            else:
                stack.append(intervals[i])
        return stack

    def partition(self, array, start, end):
        # Partition on the start number, using array[end] as the pivot.
        pivot_index = start
        for i in range(start, end):
            if array[i][0] <= array[end][0]:
                array[i], array[pivot_index] = array[pivot_index], array[i]
                pivot_index += 1
        array[end], array[pivot_index] = array[pivot_index], array[end]
        return pivot_index

    def quicksort(self, array, start, end):
        if start < end:
            partition_index = self.partition(array, start, end)
            self.quicksort(array, start, partition_index - 1)
            self.quicksort(array, partition_index + 1, end)

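# Your solution: apply the merge to every record in dataset (as defined in the question).
for i in range(len(dataset)):
    # merge() mutates its input, so copy each span into a fresh [start, end] list.
    arr1 = [[item[0], item[1]] for item in dataset[i][1]['entities']]
    arr2 = Solution().merge(arr1)
    # The label is hard-coded, since the question calls it inconsequential.
    dataset[i][1]['entities'] = [(item[0], item[1], 'PRODUCT') for item in arr2]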
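
Note that merge() above fuses overlapping spans into one, so the first sample record ends up with a single (12, 24, 'PRODUCT') entity rather than the three entities shown in the expected output. If the goal is to keep the longest span among those sharing a start or end number, a filter may fit better. Below is a minimal sketch, not a definitive implementation, assuming a longest-first greedy pass is acceptable and that ties in length go to the earlier entry. The names drop_shared_boundaries, sample, and cleaned are hypothetical.

```python
def drop_shared_boundaries(entities):
    """Keep the longest span among those sharing a start or an end number."""
    # Visit the longest spans first; sorted() is stable, so equal-length
    # spans are visited in their original order.
    by_length = sorted(
        range(len(entities)),
        key=lambda i: entities[i][1] - entities[i][0],
        reverse=True,
    )
    used_starts, used_ends, keep = set(), set(), set()
    for i in by_length:
        start, end = entities[i][0], entities[i][1]
        if start in used_starts or end in used_ends:
            continue  # a longer (or earlier) span already owns this boundary
        used_starts.add(start)
        used_ends.add(end)
        keep.add(i)
    # Return the survivors in their original order, labels untouched.
    return [entity for i, entity in enumerate(entities) if i in keep]


# Sample data copied from the question.
sample = [
    ('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'),
                                         (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),
    ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (12, 15, 'PRODUCT'),
                                           (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]}),
]

# Build new records instead of mutating the originals.
cleaned = [
    (text, {**annotations, 'entities': drop_shared_boundaries(annotations['entities'])})
    for text, annotations in sample
]
```

On the sample data this drops (12, 16, 'PRODUCT') and (12, 15, 'PRODUCT'), which matches the expected output in the question. Each record costs one sort plus set lookups, so 8K entries should not be a problem.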

Source: https://stackoverflow.com/questions/61466189/remove-overlapping-numbers-from-inside-a-tuple-in-python-such-that-no-2-tuples-h

Tags

python

list

dictionary

tuples

spacy