What is the Pythonic way of slicing a set?

痴心易碎 submitted on 2019-12-19 10:38:10

Question


I have a list of data, for example:

some_data = [1, 2, 4, 1, 6, 23, 3, 56, 6, 2, 3, 5, 6, 32, 2, 12, 5, 3, 2]

I want to get a fixed number of unique values from it (I don't care which ones I get), and I want the result to be a set object.

I know that I can build a set from some_data, convert it to a list, crop it, and then convert it back to a set:

set(list(set(some_data))[:5])  # doesn't look very friendly

I understand that set has no __getitem__ method, so slicing it directly isn't possible, but is there a way to make this look better?

I also completely understand that a set is unordered, so it doesn't matter which elements end up in the final set.

Possible options are:

  • ordered-set
  • using a dict with None values:

    # keys() returns a view in Python 3, so it must be converted to a list before slicing
    set(list(dict(map(lambda x: (x, None), some_data)).keys())[:2])  # not that great
    

Answer 1:


Sets are iterable. If you really don't care which items from your set are selected, you can use itertools.islice to get an iterator that will yield a specified number of items (whichever ones come first in the iteration order). Pass the iterator to the set constructor and you've got your subset without using any extra lists:

import itertools

some_data = [1, 2, 4, 1, 6, 23, 3, 56, 6, 2, 3, 5, 6, 32, 2, 12, 5, 3, 2]
big_set = set(some_data)
small_set = set(itertools.islice(big_set, 5))

While this is what you've asked for, I'm not sure you should really use it. Sets often iterate in a predictable order, so if your data frequently contains similar values, you may end up selecting a very similar subset every time you do this. This is especially bad when the data consists of integers (as in the example), which hash to themselves. Consecutive integers will very frequently appear in order when iterating a set. With the code above, only 32 is out of order in big_set (using Python 3.5), so small_set is {32, 1, 2, 3, 4}. If you added 0 to your data, you'd almost always end up with {0, 1, 2, 3, 4} even if the dataset grew huge, since those values will always fill up the first five slots in the set's hash table.
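The effect described above can be checked with a small sketch. It assumes CPython, where small non-negative integers hash to themselves; the data (`range(1000)`) and the variable names are made up for illustration, and the iteration order of a set is an implementation detail rather than a language guarantee.

```python
import itertools

# In CPython, small non-negative integers hash to themselves.
print(all(hash(i) == i for i in range(100)))

# Hypothetical data: a large set of consecutive integers.
big_set = set(range(1000))

# Taking "the first five" twice from the same set gives the same subset,
# because iteration order does not change while the set is unmodified.
first_five = set(itertools.islice(big_set, 5))
again = set(itertools.islice(big_set, 5))
print(first_five == again)

# On CPython this is typically the five smallest values, not a varied sample.
print(sorted(first_five))
```

If the subset needs to vary between runs, islice is the wrong tool, which is the point of the caveat above.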

To avoid such deterministic sampling, you can use random.sample as suggested by jprockbelly.




Answer 2:


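You could sample the set:

import random
# random.sample requires a sequence (passing a set raises TypeError since Python 3.11)
set(random.sample(list(my_set), 5))

The advantage of this is that you'll get different numbers each time.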
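One thing to watch for: random.sample raises ValueError when asked for more items than the population holds. A minimal sketch of a helper that caps the sample size is shown below; the name `sample_subset` and its parameters are hypothetical, not part of any library.

```python
import random


def sample_subset(items, n):
    """Return a set of up to n distinct elements picked at random from items."""
    # Deduplicate first, then convert to a list because random.sample
    # expects a sequence rather than a set.
    unique = list(set(items))
    # Cap n so that small inputs do not raise ValueError.
    return set(random.sample(unique, min(n, len(unique))))


some_data = [1, 2, 4, 1, 6, 23, 3, 56, 6, 2, 3, 5, 6, 32, 2, 12, 5, 3, 2]
small_set = sample_subset(some_data, 5)
print(len(small_set))  # 5

# Asking for more elements than exist simply returns all unique values.
everything = sample_subset(some_data, 100)
print(everything == set(some_data))  # True
```

The extra list is a copy of the unique values only, so the cost is proportional to the number of distinct items rather than the length of the original data.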




Answer 3:


You could try a simple set comprehension:

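some_data = [1, 2, 4, 1, 6, 23, 3, 56, 6, 2, 3, 5, 6, 32, 2, 12, 5, 3, 2]
# keep only the first five elements in the set's iteration order
n = {x for i, x in enumerate(set(some_data)) if i < 5}
print(n)

Output (on CPython 3; the order in which a set iterates is an implementation detail):

{32, 1, 2, 3, 4}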



Source: https://stackoverflow.com/questions/40736681/what-is-pythononic-way-of-slicing-a-set
