How to retrieve partial matches from a list of strings? [duplicate]

栀梦 2021-01-01 20:19

5 answers

情深已故 (original poster)

2021-01-01 20:45

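`str.startswith` and the `in` operator both return a Boolean.
The `in` operator is a test of membership; between two strings, it checks whether the left one is a substring of the right one.
Either test can be applied to every element of a list with a list-comprehension or with `filter`.
In the timings further down, a list-comprehension with `in` is the fastest implementation tested.
If case is not an issue, consider mapping all the words to lowercase first.
- `l = list(map(str.lower, l))`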
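Lowercasing the list in place discards the original spelling. If that matters, one possible variation (a sketch, not part of the tested code in this answer) is to fold case only for the comparison. The sample list and the `needle` name are made up for illustration.

```python
l = ['Ones', 'TWOS', 'Threes', 'three-fold']
wanted = 'THREE'

# casefold() is a more aggressive lower(), intended for caseless comparison
needle = wanted.casefold()

# Compare case-folded copies, but keep the original strings in the result
result = [v for v in l if needle in v.casefold()]

print(result)
# ['Threes', 'three-fold']
```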

`filter`:

`filter` returns a lazy filter object rather than a list, so `list()` is used to collect all the matching values into a list.

l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = list(filter(lambda x: x.startswith(wanted), l))

# using in
result = list(filter(lambda x: wanted in x, l))

print(result)
[out]:
['threes']
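As a small illustration of why `list()` is needed, the sketch below shows that a filter object is a one-shot iterator: once consumed, it yields nothing more. The variable names `matches`, `first_pass` and `second_pass` are only for this example.

```python
l = ['ones', 'twos', 'threes']
wanted = 'three'

matches = filter(lambda x: wanted in x, l)

# A filter object is an iterator: it is evaluated lazily and consumed once
first_pass = list(matches)
second_pass = list(matches)

print(first_pass)   # ['threes']
print(second_pass)  # []
```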

`list-comprehension`:

l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = [v for v in l if v.startswith(wanted)]

# using in
result = [v for v in l if wanted in v]

print(result)
[out]:
['threes']
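With the sample list above, both tests give the same result, but they are not equivalent: `startswith` only matches at the beginning of a string, while `in` matches anywhere. A minimal sketch with a made-up list (the strings are chosen purely to show the difference):

```python
l = ['threes', 'unthreaded', 'anthree', 'twos']
wanted = 'three'

# Matches only strings that begin with the wanted text
prefix_matches = [v for v in l if v.startswith(wanted)]

# Matches strings that contain the wanted text at any position
substring_matches = [v for v in l if wanted in v]

print(prefix_matches)     # ['threes']
print(substring_matches)  # ['threes', 'anthree']
```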

Which implementation is faster?

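The timings use the `words` corpus from `nltk`, with `wanted = 'three'` as before. Each timed statement also calls `words.words()`, so that cost is included in every measurement.
Words with 'three'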
- ['three', 'threefold', 'threefolded', 'threefoldedness', 'threefoldly', 'threefoldness', 'threeling', 'threeness', 'threepence', 'threepenny', 'threepennyworth', 'threescore', 'threesome']

from nltk.corpus import words

%timeit list(filter(lambda x: x.startswith(wanted), words.words()))
[out]:
47.4 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit list(filter(lambda x: wanted in x, words.words()))
[out]:
27 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [v for v in words.words() if v.startswith(wanted)]
[out]:
34.1 ms ± 768 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [v for v in words.words() if wanted in v]
[out]:
14.5 ms ± 63.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
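`%timeit` is an IPython magic. Outside IPython, a rough equivalent can be sketched with the standard-library `timeit` module. The synthetic `word_list` below is a hypothetical stand-in for the nltk corpus, built once so that list construction is not part of the measurement; absolute numbers will differ from the ones above.

```python
import timeit

# Hypothetical stand-in for the corpus, built once up front
word_list = [f'word{i}' for i in range(20_000)] + ['three', 'threefold', 'athree']
wanted = 'three'

def with_filter():
    return list(filter(lambda x: wanted in x, word_list))

def with_comprehension():
    return [v for v in word_list if wanted in v]

# Both approaches must agree before their speed is compared
assert with_filter() == with_comprehension()

for func in (with_filter, with_comprehension):
    # repeat() returns one total time per run; take the smallest as least noisy
    best = min(timeit.repeat(func, number=10, repeat=5))
    print(f'{func.__name__}: {best / 10 * 1000:.2f} ms per call')
```

Only the relative ordering on a given machine is meaningful here, since the data is synthetic.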
