Remove overlapping numbers from inside a tuple in Python such that no two tuples have the same starting or ending number

三世轮回 posted on 2020-05-17 06:04:21

Question


I have a list of tuples. Each tuple consists of a string and a dict, and each dict in turn holds a list of tuples under the key 'entities'. The list has around 8K entries.

Sample data:

dataset = [
    ('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),
    ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (12, 15, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]}),
]

The expected output is:

dataset = [
    ('made of iron oxide', {'entities': [(17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),
    ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]}),
]

Note: (12, 19, 'PRODUCT') is kept in the output because its length (end minus start) is greater than that of (12, 16, 'PRODUCT'), which has the same starting number. PRODUCT is just a label and is inconsequential.

These numbers are the start and end indexes of the entities within each sentence. The sentences in the example are arbitrary placeholders, since they are inconsequential and the operation only needs to act on the entities dict. I want to remove overlapping entries from my list and keep only the entities with the greatest length, i.e., no two entries under entities may have the same starting or ending number.
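The rule described above can be read as "visit the entities from longest to shortest and drop any entity whose start or end number is already taken by a longer one". Below is a minimal sketch under that reading. The name `drop_shared_endpoints` is hypothetical, and the greedy longest-first order and the tie-breaking (the earlier entry wins when lengths are equal) are assumptions that the question does not spell out.

```python
def drop_shared_endpoints(entities):
    """Keep the longest entity among those sharing a start or an end number."""
    # Visit indexes from longest span to shortest. sorted() is stable, so
    # entities of equal length are visited in their original order.
    by_length = sorted(
        range(len(entities)),
        key=lambda i: entities[i][1] - entities[i][0],
        reverse=True,
    )
    used_starts, used_ends, keep = set(), set(), set()
    for i in by_length:
        start, end = entities[i][0], entities[i][1]
        if start in used_starts or end in used_ends:
            continue  # a longer entity already claimed this start or end
        used_starts.add(start)
        used_ends.add(end)
        keep.add(i)
    # Return the survivors in their original order.
    return [ent for i, ent in enumerate(entities) if i in keep]


first = [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]
second = [(10, 15, 'PRODUCT'), (12, 15, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]
print(drop_shared_endpoints(first))
# [(17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]
print(drop_shared_endpoints(second))
# [(10, 15, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]
```

On the two sample records this reproduces the expected output listed in the question, with the labels carried through unchanged.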


Answer 1:


class Solution(object):  # Ref. https://www.geeksforgeeks.org/merging-intervals/
    def merge(self, intervals):
        """
        Merge overlapping intervals.

        :type intervals: List[List[int]]  (mutable [start, end] pairs)
        :rtype: List[List[int]]
        """
        if len(intervals) == 0:
            return []
        # Sort in place by start value so overlapping intervals become adjacent.
        self.quicksort(intervals, 0, len(intervals) - 1)
        stack = [intervals[0]]
        for i in range(1, len(intervals)):
            last_element = stack[-1]
            if last_element[1] >= intervals[i][0]:
                # Overlap: extend the interval on top of the stack in place.
                last_element[1] = max(intervals[i][1], last_element[1])
            else:
                stack.append(intervals[i])
        return stack

    def partition(self, array, start, end):
        # Lomuto partition with array[end] as the pivot, comparing start values.
        pivot_index = start
        for i in range(start, end):
            if array[i][0] <= array[end][0]:
                array[i], array[pivot_index] = array[pivot_index], array[i]
                pivot_index += 1
        array[end], array[pivot_index] = array[pivot_index], array[end]
        return pivot_index

    def quicksort(self, array, start, end):
        if start < end:
            partition_index = self.partition(array, start, end)
            self.quicksort(array, start, partition_index - 1)
            self.quicksort(array, partition_index + 1, end)

for i in range(len(dataset)):  # Apply the merge to each record's entities
    arr1 = []
    for item in dataset[i][1]['entities']:
        arr1.append([item[0],item[1]])
    ob1 = Solution()
    arr2 = ob1.merge(arr1)
    arr3=[]
    for item in arr2:
        arr3.append((item[0],item[1], 'PRODUCT'))
    dataset[i][1]['entities'] = arr3
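For comparison, the same merge step can be sketched with the built-in `sorted()` instead of a hand-written quicksort. The name `merge_overlapping` is hypothetical and not part of the answer, and keeping the label of the first entity in each merged group is an assumption. Note that merging treats any overlapping ranges as one, so on the first sample record it collapses all four entities into a single range, which differs from the expected output listed in the question.

```python
def merge_overlapping(spans):
    """Merge (start, end, label) spans whose ranges overlap or touch."""
    merged = []
    for start, end, label in sorted(spans, key=lambda s: s[0]):
        if merged and merged[-1][1] >= start:
            # Overlap with the previous span: extend its end, keep its label.
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


first = [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]
second = [(10, 15, 'PRODUCT'), (12, 15, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]
print(merge_overlapping(first))   # [(12, 24, 'PRODUCT')]
print(merge_overlapping(second))  # [(10, 15, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]
```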


Source: https://stackoverflow.com/questions/61466189/remove-overlapping-numbers-from-inside-a-tuple-in-python-such-that-no-2-tuples-h
