Question:

I'm looking for an equivalent to sscanf() in Python. I want to parse /proc/net/* files, in C I could do something like this:

int matches = sscanf(
    buffer,
    "%*d: %64[0-9A-Fa-f]:%X %64[0-9A-Fa-f]:%X %*X %*X:%*X %*X:%*X %*X %*d %*d %ld %*512s\n",
    local_addr, &local_port, rem_addr, &rem_port, &inode);

At first I thought of using str.split; however, it doesn't split on each of the given characters, but on the sep string as a whole:

>>> import string
>>> lines = open("/proc/net/dev").readlines()
>>> for l in lines[2:]:
...     cols = l.split(string.whitespace + ":")
...     print(len(cols))
...
1

Which should be returning 17, as explained above.

Is there a Python equivalent to sscanf (not RE), or a string splitting function in the standard library that splits on any of a range of characters that I'm not aware of?

Answer 1:

Python doesn't have an sscanf equivalent built-in, and most of the time it actually makes a whole lot more sense to parse the input by working with the string directly, using regexps, or using a parsing tool.

People have implemented sscanf for Python, which is probably most useful when translating C code; see, for example, this module: http://hkn.eecs.berkeley.edu/~dyoo/python/scanf/

In this particular case if you just want to split the data based on multiple split characters, re.split is really the right tool.
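As a minimal sketch of that suggestion, the snippet below splits one line on any run of whitespace or colons. The sample line is made up for illustration: it has the shape of a row of /proc/net/dev (an interface name followed by 16 counters) and is not taken from a real system.

```python
import re

# Hypothetical sample row in the shape of /proc/net/dev:
# an interface name, a colon, then 16 numeric counters.
sample = "  eth0: 1500 12 0 0 0 0 0 0 2400 15 0 0 0 0 0 0\n"

# Split on any run of whitespace or ':' characters. strip() comes first
# so leading/trailing whitespace does not produce empty strings.
cols = re.split(r"[\s:]+", sample.strip())

name = cols[0]
counters = [int(c) for c in cols[1:]]
print(len(cols), name, counters[0])
```

Because the character class also matches ":" with no following space, the same pattern should cope with rows where the counter is glued to the interface name.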

Answer 2:

When I'm in a C mood, I usually use zip and list comprehensions for scanf-like behavior. Like this:

# strtobool is a small conversion helper; see the definition below
input = '1 3.0 false hello'
(a, b, c, d) = [t(s) for t, s in zip((int, float, strtobool, str), input.split())]
print(a, b, c, d)

Note that for more complex format strings, you do need to use regular expressions:

import re

input = '1:3.0 false,hello'
# raw string so the backslashes reach the regex engine unchanged
match = re.search(r'^(\d+):([\d.]+) (\w+),(\w+)$', input)
(a, b, c, d) = [t(s) for t, s in zip((int, float, strtobool, str), match.groups())]
print(a, b, c, d)

Note also that you need conversion functions for all types you want to convert. For example, above I used something like:

strtobool = lambda s: {'true': True, 'false': False}[s]

Answer 3:

There is also the parse module.

parse() is designed to be the opposite of format() (the newer string formatting function in Python 2.6 and higher).

>>> from parse import parse
>>> parse('{} fish', '1')
>>> parse('{} fish', '1 fish')
<Result ('1',) {}>
>>> parse('{} fish', '2 fish')
<Result ('2',) {}>
>>> parse('{} fish', 'red fish')
<Result ('red',) {}>
>>> parse('{} fish', 'blue fish')
<Result ('blue',) {}>

Answer 4:

You can split on a range of characters using the re module.

>>> import re
>>> r = re.compile('[ \t\n\r:]+')
>>> r.split("abc:def  ghi")
['abc', 'def', 'ghi']

Answer 5:

You can parse with module re using named groups. It won't parse the substrings to their actual datatypes (e.g. int) but it's very convenient when parsing strings.

Given this sample line from /proc/net/tcp:

line="   0: 00000000:0203 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 335 1 c1674320 300 0 0 0"

An example mimicking your sscanf call, with the same variable names as group names, could be:

import re
from pprint import pprint

hex_digit_pattern = r"[\dA-Fa-f]"
# "HEX" is a placeholder that is replaced with the hex digit class below
pat = r"\d+: " + \
      r"(?P<local_addr>HEX+):(?P<local_port>HEX+) " + \
      r"(?P<rem_addr>HEX+):(?P<rem_port>HEX+) " + \
      r"HEX+ HEX+:HEX+ HEX+:HEX+ HEX+ +\d+ +\d+ " + \
      r"(?P<inode>\d+)"
pat = pat.replace("HEX", hex_digit_pattern)

values = re.search(pat, line).groupdict()

pprint(values)
# prints:
# {'inode': '335',
#  'local_addr': '00000000',
#  'local_port': '0203',
#  'rem_addr': '00000000',
#  'rem_port': '0000'}
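Since the named groups come back as strings, a conversion step is still needed to get what sscanf would have stored. The following is only a sketch: the `converters` mapping is a hypothetical helper, chosen to mirror the `%X` (hexadecimal) and `%ld` (decimal) conversions in the question's format string, and the `values` dict is copied from the output shown above.

```python
# The dict of strings produced by groupdict() in the example above.
values = {
    'inode': '335',
    'local_addr': '00000000',
    'local_port': '0203',
    'rem_addr': '00000000',
    'rem_port': '0000',
}

# Hypothetical field-to-converter mapping (an assumption for illustration):
# ports were read with %X in the C code, the inode with %ld.
converters = {
    'local_port': lambda s: int(s, 16),
    'rem_port': lambda s: int(s, 16),
    'inode': int,
}

# Fields without a converter (the addresses) stay as strings,
# as they do in the C version.
parsed = {key: converters.get(key, str)(text) for key, text in values.items()}
print(parsed['local_port'], parsed['rem_port'], parsed['inode'])
```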

Answer 6:

There is an ActiveState recipe which implements a basic scanf: http://code.activestate.com/recipes/502213-simple-scanf-implementation/

Answer 7:

You can replace the ":" with a space and then do the split, e.g.:

>>> f = open("/proc/net/dev")
>>> for line in f:
...     line = line.replace(":", " ").split()
...     print(len(line))

No regex needed (for this case).

Answer 8:

Upvoted orip's answer. I think it is sound advice to use the re module. The Kodos application is helpful when approaching a complex regexp task with Python.

http://kodos.sourceforge.net/home.html

Answer 9:

Update: The Python documentation for its regex module, re, includes a section on simulating scanf, which I found more useful than any of the answers above.

https://docs.python.org/2/library/re.html#simulating-scanf
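As a rough illustration of the approach described in that section, a scanf format such as "%s - %d errors, %d warnings" can be rewritten as a regular expression with one capture group per conversion. The variable names below are made up for this sketch, and the int() calls are added by hand because re only returns strings.

```python
import re

text = "/usr/sbin/sendmail - 0 errors, 4 warnings"

# scanf format being simulated: "%s - %d errors, %d warnings"
# %s becomes \S+ and %d becomes \d+ (unsigned here, for simplicity).
match = re.search(r"(\S+) - (\d+) errors, (\d+) warnings", text)

# re yields strings, so apply the conversions scanf would have done.
program = match.group(1)
n_errors = int(match.group(2))
n_warnings = int(match.group(3))
print(program, n_errors, n_warnings)
```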

Answer 10:

If the separators are ':', you can split on ':', and then use x.strip() on the strings to get rid of any leading or trailing whitespace. int() will ignore the spaces.
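A small sketch of this idea, using a made-up input string rather than a real /proc file:

```python
# Made-up input with ':' separators and stray whitespace around the fields.
text = " 12 : 34:56 \n"

fields = text.split(":")

# int() tolerates leading and trailing whitespace, so no strip() is needed
# for numeric fields.
numbers = [int(x) for x in fields]

# For fields that should stay strings, strip() removes the padding.
cleaned = [x.strip() for x in fields]
print(numbers, cleaned)
```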

Answer 11:

There is a Python 2 implementation by odiak.

Source: sscanf in Python

Tags

python

hex

sscanf