Python: Check if one dictionary is a subset of another larger dictionary

I'm trying to write a custom filter method that takes an arbitrary number of kwargs and returns a list containing the elements of a database-like list that contain those kwargs.

For example, suppose d1 = {'a':'2', 'b':'3'} and d2 = the same thing. d1 == d2 results in True. But suppose d2 = the same thing plus a bunch of other things. My method needs to be able to tell if d1 in d2, but Python can't do that with dictionaries.

Context:

I have a Word class, and each object has properties like word, definition, part_of_speech, and so on. I want to be able to call a filter method on the main list of these words, like Word.objects.filter(word='jump', part_of_speech='verb-intransitive'). I can't figure out how to manage these keys and values at the same time. But this could have larger functionality outside this context for other people.

Convert to item pairs and check for containment.

all(item in superset.items() for item in subset.items())

Optimization is left as an exercise for the reader.

augurar

In Python 3, you can use dict.items() to get a set-like view of the dict items. You can then use the <= operator to test if one view is a "subset" of the other:

d1.items() <= d2.items()

In Python 2.7, use the dict.viewitems() to do the same:

d1.viewitems() <= d2.viewitems()

In Python 2.6 and below you will need a different solution, such as using all():

all(key in d2 and d2[key] == d1[key] for key in d1)

Note for people that need this for unit testing: there's also an assertDictContainsSubset() method in Python's TestCase class.

http://docs.python.org/2/library/unittest.html?highlight=assertdictcontainssubset#unittest.TestCase.assertDictContainsSubset

It's however deprecated in 3.2, not sure why, maybe there's a replacement for it.

for keys and values check use: set(d1.items()).issubset(set(d2.items()))

if you need to check only keys: set(d1).issubset(set(d2))

For completeness, you can also do this:

def is_subdict(small, big):
    return dict(big, **small) == big

However, I make no claims whatsoever concerning speed (or lack thereof) or readability (or lack thereof).

>>> d1 = {'a':'2', 'b':'3'}
>>> d2 = {'a':'2', 'b':'3','c':'4'}
>>> all((k in d2 and d2[k]==v) for k,v in d1.iteritems())
True

context:

>>> d1 = {'a':'2', 'b':'3'}
>>> d2 = {'a':'2', 'b':'3','c':'4'}
>>> list(d1.iteritems())
[('a', '2'), ('b', '3')]
>>> [(k,v) for k,v in d1.iteritems()]
[('a', '2'), ('b', '3')]
>>> k,v = ('a','2')
>>> k
'a'
>>> v
'2'
>>> k in d2
True
>>> d2[k]
'2'
>>> k in d2 and d2[k]==v
True
>>> [(k in d2 and d2[k]==v) for k,v in d1.iteritems()]
[True, True]
>>> ((k in d2 and d2[k]==v) for k,v in d1.iteritems())
<generator object <genexpr> at 0x02A9D2B0>
>>> ((k in d2 and d2[k]==v) for k,v in d1.iteritems()).next()
True
>>> all((k in d2 and d2[k]==v) for k,v in d1.iteritems())
True
>>>

Alois Mahdal

My function for the same purpose, doing this recursively:

def dictMatch(patn, real):
    """does real dict match pattern?"""
    try:
        for pkey, pvalue in patn.iteritems():
            if type(pvalue) is dict:
                result = dictMatch(pvalue, real[pkey])
                assert result
            else:
                assert real[pkey] == pvalue
                result = True
    except (AssertionError, KeyError):
        result = False
    return result

In your example, dictMatch(d1, d2) should return True even if d2 has other stuff in it, plus it applies also to lower levels:

d1 = {'a':'2', 'b':{3: 'iii'}}
d2 = {'a':'2', 'b':{3: 'iii', 4: 'iv'},'c':'4'}

dictMatch(d1, d2)   # True

Notes: There could be even better solution which avoids the if type(pvalue) is dict clause and applies to even wider range of cases (like lists of hashes etc). Also recursion is not limited here so use at your own risk. ;)

RayLuo

This seemingly straightforward issue costs me a couple hours in research to find a 100% reliable solution, so I documented what I've found in this answer.

"Pythonic-ally" speaking, small_dict <= big_dict would be the most intuitive way, but too bad that it won't work. {'a': 1} < {'a': 1, 'b': 2} seemingly works in Python 2, but it is not reliable because the official documention explicitly calls it out. Go search "Outcomes other than equality are resolved consistently, but are not otherwise defined." in this section. Not to mention, comparing 2 dicts in Python 3 results in a TypeError exception.
The second most-intuitive thing is small.viewitems() <= big.viewitems() for Python 2.7 only, and small.items() <= big.items() for Python 3. But there is one caveat: it is potentially buggy. If your program could potentially be used on Python <=2.6, its d1.items() <= d2.items() are actually comparing 2 lists of tuples, without particular order, so the final result will be unreliable and it becomes a nasty bug in your program. I am not keen to write yet another implementation for Python<=2.6, but I still don't feel comfortable that my code comes with a known bug (even if it is on an unsupported platform). So I abandon this approach.
I settle down with @blubberdiblub 's answer (Credit goes to him):

def is_subdict(small, big): return dict(big, **small) == big

It is worth pointing out that, this answer relies on the == behavior between dicts, which is clearly defined in official document, hence should work in every Python version. Go search:
- "Dictionaries compare equal if and only if they have the same (key, value) pairs." is the last sentence in this page
- "Mappings (instances of dict) compare equal if and only if they have equal (key, value) pairs. Equality comparison of the keys and elements enforces reflexivity." in this page

Here's a general recursive solution for the problem given:

import traceback
import unittest

def is_subset(superset, subset):
    for key, value in subset.items():
        if key not in superset:
            return False

        if isinstance(value, dict):
            if not is_subset(superset[key], value):
                return False

        elif isinstance(value, str):
            if value not in superset[key]:
                return False

        elif isinstance(value, list):
            if not set(value) <= set(superset[key]):
                return False
        elif isinstance(value, set):
            if not value <= superset[key]:
                return False

        else:
            if not value == superset[key]:
                return False

    return True


class Foo(unittest.TestCase):

    def setUp(self):
        self.dct = {
            'a': 'hello world',
            'b': 12345,
            'c': 1.2345,
            'd': [1, 2, 3, 4, 5],
            'e': {1, 2, 3, 4, 5},
            'f': {
                'a': 'hello world',
                'b': 12345,
                'c': 1.2345,
                'd': [1, 2, 3, 4, 5],
                'e': {1, 2, 3, 4, 5},
                'g': False,
                'h': None
            },
            'g': False,
            'h': None,
            'question': 'mcve',
            'metadata': {}
        }

    def tearDown(self):
        pass

    def check_true(self, superset, subset):
        return self.assertEqual(is_subset(superset, subset), True)

    def check_false(self, superset, subset):
        return self.assertEqual(is_subset(superset, subset), False)

    def test_simple_cases(self):
        self.check_true(self.dct, {'a': 'hello world'})
        self.check_true(self.dct, {'b': 12345})
        self.check_true(self.dct, {'c': 1.2345})
        self.check_true(self.dct, {'d': [1, 2, 3, 4, 5]})
        self.check_true(self.dct, {'e': {1, 2, 3, 4, 5}})
        self.check_true(self.dct, {'f': {
            'a': 'hello world',
            'b': 12345,
            'c': 1.2345,
            'd': [1, 2, 3, 4, 5],
            'e': {1, 2, 3, 4, 5},
        }})
        self.check_true(self.dct, {'g': False})
        self.check_true(self.dct, {'h': None})

    def test_tricky_cases(self):
        self.check_true(self.dct, {'a': 'hello'})
        self.check_true(self.dct, {'d': [1, 2, 3]})
        self.check_true(self.dct, {'e': {3, 4}})
        self.check_true(self.dct, {'f': {
            'a': 'hello world',
            'h': None
        }})
        self.check_false(
            self.dct, {'question': 'mcve', 'metadata': {'author': 'BPL'}})
        self.check_true(
            self.dct, {'question': 'mcve', 'metadata': {}})
        self.check_false(
            self.dct, {'question1': 'mcve', 'metadata': {}})

if __name__ == "__main__":
    unittest.main()

NOTE: The original code would fail in certain cases, credits for the fixing goes to @olivier-melançon

NutCracker

I know this question is old, but here is my solution for checking if one nested dictionary is a part of another nested dictionary. The solution is recursive.

def compare_dicts(a, b):
    for key, value in a.items():
        if key in b:
            if isinstance(a[key], dict):
                if not compare_dicts(a[key], b[key]):
                    return False
            elif value != b[key]:
                return False
        else:
            return False
    return True

This function works for non-hashable values. I also think that it is clear and easy to read.

def isSubDict(subDict,dictionary):
    for key in subDict.keys():
        if (not key in dictionary) or (not subDict[key] == dictionary[key]):
            return False
    return True

In [126]: isSubDict({1:2},{3:4})
Out[126]: False

In [127]: isSubDict({1:2},{1:2,3:4})
Out[127]: True

In [128]: isSubDict({1:{2:3}},{1:{2:3},3:4})
Out[128]: True

In [129]: isSubDict({1:{2:3}},{1:{2:4},3:4})
Out[129]: False

If you don't mind using pydash there is is_match there which does exactly that:

import pydash

a = {1:2, 3:4, 5:{6:7}}
b = {3:4.0, 5:{6:8}}
c = {3:4.0, 5:{6:7}}

pydash.predicates.is_match(a, b) # False
pydash.predicates.is_match(a, c) # True

来源：https://stackoverflow.com/questions/9323749/python-check-if-one-dictionary-is-a-subset-of-another-larger-dictionary

标签

python

dictionary

filter

subset