Problem with range() function when used with readline() or counter - reads and processes only last line in files

Submitted by 微笑、不失礼 on 2021-02-11 15:37:56

Question


I have two simple 20-line text files. The current script below reads only line 20 of each file, runs the main 'context_input' process on it without errors, then exits. I need to apply the same process to every line, 1 through 20.

I get the same result when using a counter with import sys. The requirement is to read the lines as strings rather than build a list; readlines() causes errors. Any code snippets for setting up a proper loop to accomplish this would be appreciated.

# coding=utf-8

from src.model_use import TextGeneration
from src.utils import DEFAULT_DECODING_STRATEGY, MEDIUM
from src.flexible_models.flexible_GPT2 import FlexibleGPT2
from src.torch_loader import GenerationInput

from transformers import GPT2LMHeadModel, GPT2Tokenizer

def main():

    with open("data/test-P1-Multi.txt","r") as f:
         for i in range(20):
            P1 = f.readline()

    with open("data/test-P3-Multi.txt","r") as f:
         for i in range(20):
            P3 = f.readline()

    context_input = GenerationInput(P1=P1, P3=P3, size=MEDIUM)

    print("\n", "-"*100, "\n", "PREDICTION WITH CONTEXT WITHOUT SPECIAL TOKENS")
    model = GPT2LMHeadModel.from_pretrained('models/774M')
    tokenizer = GPT2Tokenizer.from_pretrained('models/774M')
    GPT2_model = FlexibleGPT2(model, tokenizer, DEFAULT_DECODING_STRATEGY)

    text_generator_with_context = TextGeneration(GPT2_model, use_context=True)

    predictions = text_generator_with_context(context_input, nb_samples=1)
    for i, prediction in enumerate(predictions):
        print('prediction n°', i, ': ', prediction)

    del model, tokenizer, GPT2_model

if __name__ == "__main__":
    main()

Answer 1:


So, your fix is going to be in main, by reorganizing the with and for statements so that the processing happens inside the read loop:

def main():
  with open("data/test-P1-Multi.txt","r") as f1, open("data/test-P3-Multi.txt","r") as f3:
    for i in range(20):
      P1 = f1.readline()
      P3 = f3.readline()

      context_input = GenerationInput(P1=P1, P3=P3, size=MEDIUM)

      print("\n", "-"*100, "\n", "PREDICTION WITH CONTEXT WITHOUT SPECIAL TOKENS")
      model = GPT2LMHeadModel.from_pretrained('models/774M')
      tokenizer = GPT2Tokenizer.from_pretrained('models/774M')
      GPT2_model = FlexibleGPT2(model, tokenizer, DEFAULT_DECODING_STRATEGY)

      text_generator_with_context = TextGeneration(GPT2_model, use_context=True)

      predictions = text_generator_with_context(context_input, nb_samples=1)
      for i, prediction in enumerate(predictions):
          print('prediction n°', i, ': ', prediction)

      del model, tokenizer, GPT2_model

Note: you might be able to pull some of that code out of the loop if it doesn't change between lines, so you don't re-initialize it over and over again; I'm not familiar with what you imported, though.
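For example, here is a minimal sketch of that refactor, assuming the model, tokenizer, and TextGeneration wrapper from your src.* modules behave the same regardless of which line is being processed (it also uses zip() to pair the lines instead of a hard-coded range(20)):

def main():
    # Initialize the model once, outside the loop: these objects do not
    # change from one line to the next, so re-loading them 20 times is
    # wasted work (and slow for a 774M-parameter model).
    model = GPT2LMHeadModel.from_pretrained('models/774M')
    tokenizer = GPT2Tokenizer.from_pretrained('models/774M')
    GPT2_model = FlexibleGPT2(model, tokenizer, DEFAULT_DECODING_STRATEGY)
    text_generator_with_context = TextGeneration(GPT2_model, use_context=True)

    with open("data/test-P1-Multi.txt", "r") as f1, open("data/test-P3-Multi.txt", "r") as f3:
        # zip() pairs up the lines of the two files and stops at the end of
        # the shorter one, so no hard-coded line count is needed.
        for i, (P1, P3) in enumerate(zip(f1, f3)):
            context_input = GenerationInput(P1=P1, P3=P3, size=MEDIUM)

            print("\n", "-"*100, "\n", "PREDICTION WITH CONTEXT WITHOUT SPECIAL TOKENS")
            predictions = text_generator_with_context(context_input, nb_samples=1)
            for j, prediction in enumerate(predictions):
                print('prediction n°', j, ': ', prediction)

    del model, tokenizer, GPT2_model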



Source: https://stackoverflow.com/questions/61506776/problem-with-range-function-when-used-with-readline-or-counter-reads-and-p
