Extract data from lines of a text file

Posted by 女生的网名这么多〃 on 2019-12-04 06:12:14

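The following reads the whole file into a dictionary keyed by player name. The value for each player is itself a dictionary acting as a record: its named fields hold the items from that line, converted to types suitable for further processing (integers for the stats, a tuple of integers for the date). If a player appears on more than one line, the later line overwrites the earlier one.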

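info = {}
with open('scoring_info.txt') as input_file:
    for line in input_file:
        # Split on the first three hyphens only, so the hyphens inside the
        # date stay intact, then strip whitespace from each item.
        player, stats, outcome, date = (
            item.strip() for item in line.split('-', 3))
        # '12/4/5' -> {'kills': 12, 'deaths': 4, 'assists': 5}
        stats = dict(zip(('kills', 'deaths', 'assists'),
                          map(int, stats.split('/'))))
        # '2012-11-22' -> (2012, 11, 22)
        date = tuple(map(int, date.split('-')))
        info[player] = dict(zip(('stats', 'outcome', 'date'),
                                (stats, outcome, date)))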

print('info:')
for player, record in info.items():
    print('  player %r:' % player)
    for field, value in record.items():
        print('    %s: %s' % (field, value))

# sample usage
player = 'Fizz'
print('\n%s had %s kills in the game' % (player, info[player]['stats']['kills']))

Output (the order of players and fields may differ depending on your Python version):

info:
  player 'Shyvana':
    date: (2012, 11, 22)
    outcome: Loss
    stats: {'assists': 5, 'kills': 12, 'deaths': 4}
  player 'Miss Fortune':
    date: (2012, 11, 22)
    outcome: Win
    stats: {'assists': 3, 'kills': 12, 'deaths': 4}
  player 'Fizz':
    date: (2012, 11, 22)
    outcome: Win
    stats: {'assists': 5, 'kills': 12, 'deaths': 4}

Fizz had 12 kills in the game

Alternatively, holding most of the data in dictionaries makes nested-field access a little awkward, as in info[player]['stats']['kills']. You could instead use a slightly more advanced "generic" class to hold the fields, which lets you write info2[player].stats.kills.

To illustrate, here's almost the same thing using a class I've named Struct because it's somewhat like the C language's struct data type:

class Struct(object):
    """ Generic container object """
    def __init__(self, **kwds): # keyword args define attribute names and values
        self.__dict__.update(**kwds)

info2 = {}
with open('scoring_info.txt') as input_file:
    for line in input_file:
        player, stats, outcome, date = (
            item.strip() for item in line.split('-', 3))
        stats = dict(zip(('kills', 'deaths', 'assists'),
                          map(int, stats.split('/'))))
        victory = (outcome.lower() == 'win') # change to boolean T/F
        date = dict(zip(('year','month','day'), map(int, date.split('-'))))
        info2[player] = Struct(champ_name=player, stats=Struct(**stats),
                               victory=victory, date=Struct(**date))
print('info2:')
for rec in info2.values():
    print('  player %r:' % rec.champ_name)
    print('    stats: kills=%s, deaths=%s, assists=%s' % (
          rec.stats.kills, rec.stats.deaths, rec.stats.assists))
    print('    victorious: %s' % rec.victory)
    print('    date: %d-%02d-%02d' % (rec.date.year, rec.date.month, rec.date.day))

# sample usage
player = 'Fizz'
print('\n%s had %s kills in the game' % (player, info2[player].stats.kills))

Output:

info2:
  player 'Shyvana':
    stats: kills=12, deaths=4, assists=5
    victorious: False
    date: 2012-11-22
  player 'Miss Fortune':
    stats: kills=12, deaths=4, assists=3
    victorious: True
    date: 2012-11-22
  player 'Fizz':
    stats: kills=12, deaths=4, assists=5
    victorious: True
    date: 2012-11-22

Fizz had 12 kills in the game
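
If you would rather not write the Struct class yourself, the standard library's collections.namedtuple gives similar attribute-style access. This is only a sketch: the names Stats, Record and parse_record are made up for illustration, and the line layout (name - kills/deaths/assists - outcome - yyyy-mm-dd) is assumed to match the file used above.

```python
from collections import namedtuple

# Hypothetical record types; the field names mirror the ones used above.
Stats = namedtuple('Stats', 'kills deaths assists')
Record = namedtuple('Record', 'champ_name stats victory date')

def parse_record(line):
    """Parse one 'name - k/d/a - outcome - yyyy-mm-dd' line into a Record."""
    # Split on the first three hyphens only so the date stays in one piece.
    player, stats, outcome, date = (
        item.strip() for item in line.split('-', 3))
    return Record(champ_name=player,
                  stats=Stats(*map(int, stats.split('/'))),
                  victory=(outcome.lower() == 'win'),
                  date=tuple(map(int, date.split('-'))))

rec = parse_record('Fizz - 12/4/5 - Win - 2012-11-22')
print('%s had %s kills in the game' % (rec.champ_name, rec.stats.kills))
```

Unlike Struct instances, namedtuples are immutable, which suits records that are read once and never modified.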

Use split(' - ') to get the parts, then split again on '/' to get the numbers:

for line in yourfile:
    # strip() removes the trailing newline before splitting
    data = line.strip().split(' - ')
    nums = [int(x) for x in data[1].split('/')]

That should give you everything you need in data and nums. Alternatively, you can use the re module and write a regular expression for it, although this format doesn't seem complex enough to need one.
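
For comparison, here is roughly what the regular-expression route could look like. It is a sketch under assumptions: the pattern presumes lines such as Miss Fortune - 12/4/3 - Win - 2012-11-22, and the names LINE_RE and parse_with_regex are invented for this example.

```python
import re

# Assumed line layout: name - kills/deaths/assists - outcome - yyyy-mm-dd
LINE_RE = re.compile(
    r'^\s*(?P<name>.+?)\s+-\s+'
    r'(?P<kills>\d+)/(?P<deaths>\d+)/(?P<assists>\d+)\s+-\s+'
    r'(?P<outcome>\w+)\s+-\s+'
    r'(?P<date>\d{4}-\d{2}-\d{2})\s*$')

def parse_with_regex(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The three stats are captured as strings; convert them to integers.
    for key in ('kills', 'deaths', 'assists'):
        fields[key] = int(fields[key])
    return fields

parsed = parse_with_regex('Miss Fortune - 12/4/3 - Win - 2012-11-22\n')
print(parsed)
```

The one thing the regular expression offers over split is validation: a malformed line yields None instead of raising an exception partway through the unpacking.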

There are two ways to read the data out from your textfile example.

First method

You can use Python's csv module and specify - as the delimiter. Note that the hyphens inside the date are delimiters too, so the date arrives as three separate fields.

See http://www.doughellmann.com/PyMOTW/csv/
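
A minimal sketch of the csv approach follows. It assumes lines such as Fizz - 12/4/5 - Win - 2012-11-22 and uses an in-memory io.StringIO as a stand-in for the real file; the variable names are illustrative only.

```python
import csv
import io

# Stand-in for the real file; the layout is an assumption.
sample = io.StringIO(
    'Fizz - 12/4/5 - Win - 2012-11-22\n'
    'Miss Fortune - 12/4/3 - Win - 2012-11-22\n')

rows = []
for row in csv.reader(sample, delimiter='-'):
    # row looks like ['Fizz ', ' 12/4/5 ', ' Win ', ' 2012', '11', '22']
    name, score, result = (field.strip() for field in row[:3])
    # The date was split on its own hyphens, so glue it back together.
    date = '-'.join(row[3:]).strip()
    rows.append((name, score, result, date))

print(rows)
```

Because the delimiter must be a single character, the csv module cannot split on ' - ' directly, which is why the fields need stripping and the date needs rejoining.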

Second method

Alternatively, if you don't want to use the csv module, you can read each line of the file as a string and call its split method.

with open('myTextFile.txt', 'r') as f:   # the file is closed automatically
    lines = f.readlines()

for line in lines:
    # words is a list of strings from the line, split at every "-".
    # The hyphens inside the date are split as well.
    words = line.split("-")

So in your example above, champname is the first item in the words list, words[0]. Call words[0].strip() to drop the space left over before the hyphen.

# Iterates over the lines in the file.
with open('data_file.txt') as data_file:
    for line in data_file:
        # Strips the trailing newline, then splits the line into four
        # elements separated by ' - '. Each element is then unpacked to
        # the correct variable name.
        champname, score, winloss, timestamp = line.strip().split(' - ')

        # Since 'score' holds the string with the three values joined,
        # we need to split them again, this time using a slash as separator.
        # This results in a list of strings, so we apply the 'int' function
        # to each of them to convert to integer. These integers are
        # then unpacked into the kills, deaths and assists variables.
        kills, deaths, assists = map(int, score.split('/'))

        # Now you are free to use the variables however you want. Since
        # kills, deaths and assists are integers, you can add and multiply
        # them easily.

First, break the line into its fields

>>> name, score, result, date = "Fizz - 12/4/5 - Win - 2012-11-22".split(' - ')
>>> name
'Fizz'
>>> score
'12/4/5'
>>> result
'Win'
>>> date
'2012-11-22'

Second, parse your score

>>> k,d,a = map(int, score.split('/'))
>>> k,d,a
(12, 4, 5)

And finally, convert the date string into a date object (note the lowercase %m for the month; %M means minutes)

>>> from datetime import datetime
>>> datetime.strptime(date, '%Y-%m-%d').date()
datetime.date(2012, 11, 22)

Now you have all your parts parsed and normalized to data types.
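
The three steps can be combined into one small helper. This is a sketch rather than a definitive implementation: the function name parse_line and the dictionary keys are chosen for illustration, and it assumes every line follows the layout of the example string.

```python
from datetime import datetime

def parse_line(line):
    """Turn one line of the file into a dict of typed values."""
    # strip() drops the trailing newline that lines read from a file carry.
    name, score, result, date = line.strip().split(' - ')
    kills, deaths, assists = map(int, score.split('/'))
    return {
        'name': name,
        'kills': kills,
        'deaths': deaths,
        'assists': assists,
        'won': result == 'Win',
        # %m is the month; %M would be parsed as minutes.
        'date': datetime.strptime(date, '%Y-%m-%d').date(),
    }

record = parse_line('Fizz - 12/4/5 - Win - 2012-11-22\n')
print(record['name'], record['kills'], record['date'].isoformat())
```

A line with the wrong number of fields, or non-numeric stats, raises ValueError, so wrap the call in try/except if the file may contain malformed lines.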
