Use Python to select rows with a particular range of values in one column

执笔经年 2020-12-30 18:00

I know this is simple, but I'm new to Python, so I'm having a bit of trouble here. I'm using Python 3, by the way.

I have multiple files that look something

3 answers
  • 2020-12-30 18:13

    I think you mean:

    with open("addressbook1.txt", 'r') as f:
        # with automatically closes
        file_data = ((line, line.split("\t")) for line in f)
        with open("college_age.txt", 'w') as g, open("adult_age.txt", 'w') as h:
            for line, (name, date, age, sex, color) in file_data:
                if int(age) < 23: # float() if it is not an integer...
                    g.write(line)
                else:
                    h.write(line)
    

    It might look as if the file data is iterated over several times. But file_data is a generator expression, so it only hands out the next line of the file when asked, and the for loop is what asks. Every item the for loop retrieves comes from the generator, which on request turns one line of the file into a tuple holding the complete line (for copying) and its split components (for testing).
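    To make the lazy behaviour concrete, here is a minimal sketch. It uses an in-memory io.StringIO with two made-up rows in place of addressbook1.txt; the sample data and the variable names other than file_data are assumptions for illustration only.

```python
import io

# Hypothetical sample rows standing in for addressbook1.txt (tab-separated).
sample = io.StringIO(
    "Ann\t2001-05-01\t19\tF\tblue\n"
    "Bob\t1980-02-03\t40\tM\tred\n"
)

# Creating the generator expression does not read anything yet.
file_data = ((line, line.split("\t")) for line in sample)
position_before = sample.tell()

# Each next() call pulls exactly one line through the generator.
first_line, first_fields = next(file_data)
position_after_one = sample.tell()

# The for loop in the answer simply keeps asking until the file is exhausted.
remaining = list(file_data)
```

    Here position_before is still at the start of the data, position_after_one has moved past only the first line, and remaining holds the single row that was left.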

    An alternative could be

    file_data = ((line, line.split("\t")) for line in iter(f.readline, ''))
    
    This is closer to readlines() than iterating over the file. Because readline() behaves slightly differently behind the scenes from iteration over the file, this form may sometimes be necessary.
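    The two-argument form of iter() used above calls the given function repeatedly and stops as soon as it returns the sentinel value. A small sketch, again with an in-memory io.StringIO and made-up rows standing in for the real file (both are assumptions):

```python
import io

# Hypothetical sample rows standing in for addressbook1.txt.
sample = io.StringIO(
    "Ann\t2001-05-01\t19\tF\tblue\n"
    "Bob\t1980-02-03\t40\tM\tred\n"
)

# iter(callable, sentinel) calls sample.readline() until it returns '',
# which is what readline() returns at end of file.
file_data = ((line, line.split("\t")) for line in iter(sample.readline, ''))

names = [fields[0] for line, fields in file_data]
```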

    (If you don't like functional programming, you could just as well write a generator function that calls readline() manually until an empty string is returned.

    And if you don't like nested generators at all, you can do

    with open("addressbook1.txt", 'r') as f, open("college_age.txt", 'w') as g, open("adult_age.txt", 'w') as h:
        for line in f:
            name, date, age, sex, color = line.split("\t")
            if int(age) < 23: # float() if it is not an integer...
                g.write(line)
            else:
                h.write(line)
    

    which does exactly the same.)
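    The generator function mentioned in the parenthesis could look roughly like this. The name read_lines and the sample rows are hypothetical, chosen only for this sketch.

```python
import io

def read_lines(f):
    """Yield lines from f by calling readline() until it returns ''."""
    while True:
        line = f.readline()
        if line == "":  # readline() returns an empty string at end of file
            return
        yield line

# Hypothetical sample rows standing in for addressbook1.txt.
sample = io.StringIO(
    "Ann\t2001-05-01\t19\tF\tblue\n"
    "Bob\t1980-02-03\t40\tM\tred\n"
)

# The age is the third tab-separated field, as in the answer's code.
ages = [int(line.split("\t")[2]) for line in read_lines(sample)]
```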

  • 2020-12-30 18:33

    The issue here is that you are using readlines() twice, which means that the data is read the first time, then nothing is left the second time.
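    A short sketch of that effect, using an in-memory io.StringIO as a stand-in for the real file (an assumption for the demonstration):

```python
import io

sample = io.StringIO("first line\nsecond line\n")

first_read = sample.readlines()   # reads everything up to end of file
second_read = sample.readlines()  # position is already at the end: nothing left

# Rewinding makes the data available again, at the cost of reading it twice.
sample.seek(0)
third_read = sample.readlines()
```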

    You can iterate directly over the file without using readlines() - in fact, this is the better way, as it doesn't read the whole file in at once.

    While you could do what you are trying to do by using str.split() as you have, the better option is to use the csv module, which is designed for the task.

    import csv
    
    # newline="" lets the csv module handle line endings itself.
    with open("addressbook1.txt", newline="") as source, \
         open("college_age.txt", "w", newline="") as college, \
         open("adult_age.txt", "w", newline="") as adult:
        reader = csv.DictReader(source, dialect="excel-tab")
        fieldnames = reader.fieldnames  # taken from the first row of the file
        writer_college = csv.DictWriter(college, fieldnames, dialect="excel-tab")
        writer_adult = csv.DictWriter(adult, fieldnames, dialect="excel-tab")
        writer_college.writeheader()
        writer_adult.writeheader()
        for row in reader:
            if int(row["Age"]) < 23:
                writer_college.writerow(row)
            else:
                writer_adult.writerow(row)
    

    So what are we doing here? First of all we use the with statement for opening files. It's not only more pythonic and readable but handles closing for you, even when exceptions occur.

    Next we create a DictReader that reads rows from the file as dictionaries, automatically using the first row as the field names. We then make writers to write back to our split files, and write the headers in. Using the DictReader is a matter of preference. It's generally used more where you access the data a lot (and when you don't know the order of the columns), but it makes the code nice and readable here. You could, however, just use a standard csv.reader().

    Next we loop through the rows in the file, checking the age (which we convert to an int so we can do a numerical comparison) to know which file to write to. The with statement closes our files for us.
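    For comparison, the plain csv.reader() variant mentioned above might be sketched like this. The function name split_by_age, its parameters, and the sample rows are assumptions for illustration; the Age column is assumed to be the third field, matching the Name, Date, Age, Sex, Color order.

```python
import csv
import io

def split_by_age(source, college, adult, age_index=2, cutoff=23):
    """Copy rows from source to college or adult depending on the age column."""
    reader = csv.reader(source, dialect="excel-tab")
    writer_college = csv.writer(college, dialect="excel-tab")
    writer_adult = csv.writer(adult, dialect="excel-tab")
    header = next(reader)  # first row holds the column names
    writer_college.writerow(header)
    writer_adult.writerow(header)
    for row in reader:
        # Rows are plain lists here, so the age is looked up by position.
        if int(row[age_index]) < cutoff:
            writer_college.writerow(row)
        else:
            writer_adult.writerow(row)

# In-memory stand-ins for the three files.
source = io.StringIO(
    "Name\tDate\tAge\tSex\tColor\n"
    "Ann\t2001-05-01\t19\tF\tblue\n"
    "Bob\t1980-02-03\t40\tM\tred\n"
)
college, adult = io.StringIO(), io.StringIO()
split_by_age(source, college, adult)
```

    The trade-off is that the code now depends on the column order, which DictReader avoids.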

    For multiple input files:

    import csv
    
    fieldnames = ["Name", "Date", "Age", "Sex", "Color"]
    filenames = ["addressbook1.txt", "addressbook2.txt", ...]
    
    # newline="" lets the csv module handle line endings itself.
    with open("college_age.txt", "w", newline="") as college, \
         open("adult_age.txt", "w", newline="") as adult:
        writer_college = csv.DictWriter(college, fieldnames, dialect="excel-tab")
        writer_adult = csv.DictWriter(adult, fieldnames, dialect="excel-tab")
        writer_college.writeheader()
        writer_adult.writeheader()
        for filename in filenames:
            with open(filename, newline="") as source:
                reader = csv.DictReader(source, dialect="excel-tab")
                for row in reader:
                    if int(row["Age"]) < 23:
                        writer_college.writerow(row)
                    else:
                        writer_adult.writerow(row)
    

    We just add a loop to work over multiple files. Please note that I also added a list of field names. Before, I just used the fields and their order from the file, but as we now have multiple files, I figured it would be more sensible to state them explicitly. An alternative would be to use the first file to get the field names.
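    That last alternative could be sketched as follows. The function name split_files, its parameters, and the temporary sample files are hypothetical; the sketch assumes every input file starts with the same header row.

```python
import csv
import os
import tempfile

def split_files(filenames, college_path, adult_path, cutoff=23):
    # Take the field names from the header row of the first input file.
    with open(filenames[0], newline="") as first:
        fieldnames = csv.DictReader(first, dialect="excel-tab").fieldnames
    with open(college_path, "w", newline="") as college, \
         open(adult_path, "w", newline="") as adult:
        writer_college = csv.DictWriter(college, fieldnames, dialect="excel-tab")
        writer_adult = csv.DictWriter(adult, fieldnames, dialect="excel-tab")
        writer_college.writeheader()
        writer_adult.writeheader()
        for filename in filenames:
            with open(filename, newline="") as source:
                for row in csv.DictReader(source, dialect="excel-tab"):
                    if int(row["Age"]) < cutoff:
                        writer_college.writerow(row)
                    else:
                        writer_adult.writerow(row)

# Demonstration with two made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    bodies = ["Ann\t2001-05-01\t19\tF\tblue\n", "Bob\t1980-02-03\t40\tM\tred\n"]
    for i, body in enumerate(bodies):
        path = os.path.join(tmp, f"addressbook{i + 1}.txt")
        with open(path, "w", newline="") as f:
            f.write("Name\tDate\tAge\tSex\tColor\n" + body)
        paths.append(path)
    college_path = os.path.join(tmp, "college_age.txt")
    adult_path = os.path.join(tmp, "adult_age.txt")
    split_files(paths, college_path, adult_path)
    with open(college_path, newline="") as f:
        college_text = f.read()
    with open(adult_path, newline="") as f:
        adult_text = f.read()
```

    If a later file has a column that the first file lacks, DictWriter raises a ValueError by default, so this approach only fits files that share one layout.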

  • 2020-12-30 18:36

    I think it is better to use the csv module for reading such files: http://docs.python.org/library/csv.html
