Is there a standard way to represent a \"set\" that can contain duplicate elements.
As I understand it, a set has exactly one or zero of an element. I want functiona
What you're looking for is indeed a multiset (or bag), a collection of not necessarily distinct elements (whereas a set does not contain duplicates).
There's an implementation for multisets here: https://github.com/mlenzen/collections-extended (Pypy's collections extended module).
The data structure for multisets is called bag
. A bag
is a subclass of the Set
class from collections
module with an extra dictionary to keep track of the multiplicities of elements.
class _basebag(Set):
"""
Base class for bag and frozenbag. Is not mutable and not hashable, so there's
no reason to use this instead of either bag or frozenbag.
"""
# Basic object methods
def __init__(self, iterable=None):
"""Create a new basebag.
If iterable isn't given, is None or is empty then the bag starts empty.
Otherwise each element from iterable will be added to the bag
however many times it appears.
This runs in O(len(iterable))
"""
self._dict = dict()
self._size = 0
if iterable:
if isinstance(iterable, _basebag):
for elem, count in iterable._dict.items():
self._inc(elem, count)
else:
for value in iterable:
self._inc(value)
A nice method for bag
is nlargest
(similar to Counter
for lists), that returns the multiplicities of all elements blazingly fast since the number of occurrences of each element is kept up-to-date in the bag's dictionary:
>>> b=bag(random.choice(string.ascii_letters) for x in xrange(10))
>>> b.nlargest()
[('p', 2), ('A', 1), ('d', 1), ('m', 1), ('J', 1), ('M', 1), ('l', 1), ('n', 1), ('W', 1)]
>>> Counter(b)
Counter({'p': 2, 'A': 1, 'd': 1, 'm': 1, 'J': 1, 'M': 1, 'l': 1, 'n': 1, 'W': 1})
You can use a plain list
and use list.count(element)
whenever you want to access the "number" of elements.
my_list = [1, 1, 2, 3, 3, 3]
my_list.count(1) # will return 2
You are looking for a multiset.
Python's closest datatype is collections.Counter:
A
Counter
is adict
subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. TheCounter
class is similar to bags or multisets in other languages.
For an actual implementation of a multiset, use the bag class from the data-structures package on pypi. Note that this is for Python 3 only. If you need Python 2, here is a recipe for a bag
written for Python 2.4.
You can used collections.Counter
to implement a multiset, as already mentioned.
Another way to implement a multiset is by using defaultdict
, which would work by counting occurrences, like collections.Counter
.
Here's a snippet from the python docs:
Setting the
default_factory
toint
makes thedefaultdict
useful for counting (like a bag or multiset in other languages):>>> s = 'mississippi' >>> d = defaultdict(int) >>> for k in s: ... d[k] += 1 ... >>> d.items() [('i', 4), ('p', 2), ('s', 4), ('m', 1)]
Your approach with dict with element/count seems ok to me. You probably need some more functionality. Have a look at collections.Counter.
element in list
and list.count(element)
)counter.elements()
looks like a list with all duplicatesPython "set" with duplicate/repeated elements
This depends on how you define a set. One may assume that to the OP
Given these assumptions, the options reduce to two abstract types: a list or a multiset. In Python, these type usually translate to a list
and Counter
respectively. See the Details on some subtleties to observe.
Given
import random
import collections as ct
random.seed(123)
elems = [random.randint(1, 11) for _ in range(10)]
elems
# [1, 5, 2, 7, 5, 2, 1, 7, 9, 9]
Code
A list of replicate elements:
list(elems)
# [1, 5, 2, 7, 5, 2, 1, 7, 9, 9]
A "multiset" of replicate elements:
ct.Counter(elems)
# Counter({1: 2, 5: 2, 2: 2, 7: 2, 9: 2})
Details
On Data Structures
We have a mix of terms here that easily get confused. To clarify, here are some basic mathematical data structures compared to ones in Python.
Type |Abbr|Order|Replicates| Math* | Python | Implementation
------------|----|-----|----------|-----------|-------------|----------------
Set |Set | n | n | {2 3 1} | {2, 3, 1} | set(el)
Ordered Set |Oset| y | n | {1, 2, 3} | - | list(dict.fromkeys(el)
Multiset |Mset| n | y | [2 1 2] | - | <see `mset` below>
List |List| y | y | [1, 2, 2] | [1, 2, 2] | list(el)
From the table, one can deduce the definition of each type. Example: a set is a container that ignores order and rejects replicate elements. In contrast, a list is a container that preserves order and permits replicate elements.
Also from the table, we can see:
On Multisets
Some may argue that collections.Counter
is a multiset. You are safe in many cases to treat it as such, but be aware that Counter
is simply a dict (a mapping) of key-multiplicity pairs. It is a map of multiplicities. See an example of elements in a flattened multiset:
mset = [x for k, v in ct.Counter(elems).items() for x in [k]*v]
mset
# [1, 1, 5, 5, 2, 2, 7, 7, 9, 9]
Notice there is some residual ordering, which may be surprising if you expect disordered results. However, disorder does not preclude order. Thus while you can generate a multiset from a Counter
, be aware of the following provisos on residual ordering in Python:
Summary
In Python, a multiset can be translated to a map of multiplicities, i.e. a Counter
, which is not randomly unordered like a pure set. There can be some residual ordering, which in most cases is ok since order does not generally matter in multisets.
See Also
collections
*Mathematically, (according to N. Wildberger, we express braces {}
to imply a set and brackets []
to imply a list, as seen in Python. Unlike Python, commas ,
to imply order.