How to prevent duplicate text in the output file while using a for loop

Question

I have this code, which compares a number (called item in my code) with the numbers in the domain range to see if it is already there. If it is, the code prints to the output file; if it is not, it should print only once.

Question: How can I make sure that a number that isn't within the domain range is printed only one time? (I used a True/False flag, but this doesn't work: when the flag is False, several duplicates are printed. In the code below, I am not sure how to make it print a number that is not in the domain range once instead of multiple times.)

for item in lookup[uniprotID]:
    for varain in wholelookup[uniprotID]:
        for names in wholeline[uniprotID]:
            statement = False
            if re.search(r'\d+', varain).group(0) == item and start <= int(item) <= end:
                result = str(int(item) - start + 1)
                if varain in names.split(' '):
                    statement = True
                    print ">{0} | at position {1} | start= {2}, end= {3} | description: {4} | {5}".format(uniprotID, result, start, end, varain, names)
                    if statement == True:
                        print(''.join(makeList[start-1:end]))

Answer 1:

Something based on this might work for you:

import sys

already_seen = set()  # lines that have already been written
for line in sys.stdin:
    # Write a line only the first time it appears.
    if line not in already_seen:
        already_seen.add(line)
        sys.stdout.write(line)

Note that if your files are large, you could end up consuming a lot of virtual memory doing this, because every distinct line is kept in the set. If so, look into anydbm (renamed dbm in Python 3) or a Bloom filter.
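The same idea can be wrapped in a generator so it works on any iterable of lines, not only sys.stdin. This is a minimal sketch: the function name unique_lines and the sample data are made up for illustration and are not part of the original answer.

```python
def unique_lines(lines):
    """Yield each line the first time it appears, preserving input order."""
    already_seen = set()
    for line in lines:
        if line not in already_seen:
            already_seen.add(line)
            yield line


# Hypothetical sample input standing in for the lines of a file.
sample = ["a\n", "b\n", "a\n", "c\n", "b\n"]
deduplicated = list(unique_lines(sample))
```

Because it is a generator, it can be fed an open file object directly and the results written out one line at a time. The set still grows with the number of distinct lines.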

Answer 2:

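Store the values that are not in the range in a dictionary keyed by value, so each one appears only once, together with a count of how often it was seen.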

stored_prints = {}  # out-of-range item -> number of times it was seen

if not (start <= int(item) <= end):
    try:
        stored_prints[item] += 1
    except KeyError:
        # First time this item is seen.
        stored_prints[item] = 1

print(stored_prints)

You will have to format it and adapt it to your needs, but this should do what you want if I understood your question correctly.
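As a rough sketch of how this could fit the loop in the question, the out-of-range items can be collected while looping and each one reported once. The function name split_by_range, the sample items, and the message format are assumptions for illustration; the real code would take its items from lookup[uniprotID] and use its own start and end.

```python
def split_by_range(items, start, end):
    """Return (in_range, out_of_range); each out-of-range item is listed once."""
    in_range = []
    out_of_range = []
    reported = set()  # out-of-range items already recorded
    for item in items:
        if start <= int(item) <= end:
            in_range.append(item)
        elif item not in reported:
            reported.add(item)
            out_of_range.append(item)
    return in_range, out_of_range


# Hypothetical positions; "12" and "40" repeat but are reported once each.
inside, outside = split_by_range(["5", "12", "40", "12", "40", "7"], start=1, end=10)
for item in outside:
    print("{0} is outside the range".format(item))
```

Doing the membership check on a set before printing avoids the duplicate output that a flag reset inside nested loops produces.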

Source: https://stackoverflow.com/questions/11638659/how-to-prevent-duplicate-text-in-the-output-file-while-using-for-loop

Tags

python

duplicates