setdefault vs get vs defaultdict

You have a Python dictionary and you want to get the value of a specific key in it. So far so good, right?

And then a KeyError –

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 1
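The sketch below reproduces the error. It assumes an empty dictionary and the integer key 1 from the message above; the variable names are only illustrative.

```python
# Assumption: an empty dict and the integer key 1 (the key shown in the traceback).
data = {}

try:
    data[1]  # square-bracket lookup raises KeyError for a missing key
except KeyError as err:
    missing_key = err.args[0]
    print("KeyError for key:", missing_key)

# The "in" operator checks for the key without raising.
print(1 in data)
```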

Hmmm, well, if the key does not exist in the dictionary I can use a default value instead, like None, 10 or an empty string. What are my options for doing so?

I can think of 3 –

  • get method
  • setdefault method
  • defaultdict data structure
get vs setdefault

Let’s investigate the first two –

key, value = "key", "value"
data = {}
x = data.get(key, value)
print(x, data)  # value {}
data = {}
x = data.setdefault(key, value)
print(x, data)  # value {'key': 'value'}

Well, we get almost the same result. x gets the same value in both cases, but get leaves data unchanged while setdefault inserts the key into it. When does this become a problem?
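Neither method overwrites a key that is already present. The default is only used when the key is missing. Here is a small sketch with illustrative keys:

```python
data = {"key": "existing"}

# The key is present, so both methods return the stored value
# and ignore the default argument.
x_get = data.get("key", "value")
x_setdefault = data.setdefault("key", "value")
print(x_get, x_setdefault, data)  # existing existing {'key': 'existing'}

# For a missing key, get() leaves the dict alone while setdefault() inserts it.
y_get = data.get("other", "value")
print(y_get, data)  # value {'key': 'existing'}
y_setdefault = data.setdefault("other", "value")
print(y_setdefault, data)  # value {'key': 'existing', 'other': 'value'}
```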

key, value = "key", "value"
data = {}
# append() returns None, so x is None in both cases
x = data.get(key, []).append(value)
print(x, data)  # None {}
data = {}
x = data.setdefault(key, []).append(value)
print(x, data)  # None {'key': ['value']}

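So with mutable default values the difference is clearer and more error prone. With get, append() modifies a temporary list that is never stored in data. setdefault stores the list in data first and then appends to it.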
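The sketch below uses the same illustrative key and value. It shows how to make the get version work: store the list back explicitly. It also shows that setdefault ignores the default object it is passed whenever the key already exists.

```python
key, value = "key", "value"

# With get(), the list must be stored back explicitly, otherwise
# append() modifies a temporary list that is then discarded.
data = {}
items = data.get(key, [])
items.append(value)
data[key] = items
print(data)  # {'key': ['value']}

# When the key already exists, setdefault() returns the stored list and
# the default object passed in is simply ignored (an expression like []
# written inside the call would still be evaluated on every call).
data2 = {key: ["first"]}
fresh = []
result = data2.setdefault(key, fresh)
result.append(value)
print(result is fresh, data2)  # False {'key': ['first', 'value']}
```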

When to use each? That mainly depends on whether the default should be stored in the dictionary, and on the content and size of your dictionary.
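Here is a rough rule of thumb, sketched with hypothetical data. get suits read-only lookups where the dictionary should not change. setdefault suits accumulating values under a key.

```python
settings = {"timeout": 30}  # hypothetical configuration dict

# Read-only lookup: get() supplies a fallback and leaves the dict unchanged.
retries = settings.get("retries", 3)
print(retries, settings)  # 3 {'timeout': 30}

# Accumulating values: setdefault() stores the default list the first time
# a key is seen, so later calls append to the same stored list.
positions = {}
for position, word in enumerate(["a", "b", "a"]):
    positions.setdefault(word, []).append(position)
print(positions)  # {'a': [0, 2], 'b': [1]}
```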

We could time get against setdefault as well, but that comparison does not really matter. They produce different output, and the difference was not significant in either direction anyway.

And for defaultdict –

from collections import defaultdict

key, value = "key", "value"
data = defaultdict(list)
print(data[key])  # []
data[key].append(value)
print(data[key])  # ['value']

setdefault sets the default value for the specific key we access. With defaultdict, the default is part of the dictionary's type: it applies to every missing key we access, and each such key gets a new value built by the factory (list here).
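Here is a short sketch of how the factory behaves, with illustrative keys. Each missing key gets its own new object. Only square-bracket access calls the factory; get() and the in operator do not insert anything.

```python
from collections import defaultdict

data = defaultdict(list)  # list() is called to build a value for each missing key

first = data["a"]
second = data["b"]
print(first is second)  # False: each missing key gets its own new list

# Reading a missing key with [] inserts it, even if you only wanted to look.
print(dict(data))  # {'a': [], 'b': []}

# get() and the "in" operator do not call the factory and do not insert.
print(data.get("c"), "c" in data)  # None False
print(len(data))  # 2

# Any zero-argument callable works as the factory, e.g. a constant default of 10.
tens = defaultdict(lambda: 10)
print(tens["x"], dict(tens))  # 10 {'x': 10}
```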

Since setdefault and defaultdict give roughly the same result, I timed both for several dictionary sizes (leftmost column), running each 1000 times (code below). Times are total seconds for the 1000 runs –

dict size  default value  setdefault (s)   defaultdict (s)
100        list           0.0229508876801  0.0204179286957
100        set            0.0209970474243  0.0194549560547
100        int            0.0236239433289  0.0225579738617
100        string         0.020693063736   0.0240340232849
10000      list           2.09283614159    2.31266093254
10000      set            2.12825512886    3.43549799919
10000      int            2.04997992516    1.87312483788
10000      string         2.05423784256    1.93679213524
100000     list           22.4799249172    29.7850298882
100000     set            23.5321040154    41.7523541451
100000     int            26.6693091393    23.1293339729
100000     string         26.4119689465    23.6694099903

Conclusions and summary –

  • In these runs, sets were more expensive than lists at the two larger sizes (10000 and 100000 keys), most noticeably with defaultdict. At 100 keys they were slightly cheaper.
  • As the dictionary size grows, the simple types (string and int) perform better with defaultdict than with setdefault, while set and list perform worse.
  • Main conclusion – choosing between defaultdict and setdefault also depends mainly on the type of the default value.
  • In this test I measured one particular use case: accessing each key twice. Other use cases / access patterns, such as assignment or accessing the same key over and over again, may behave differently.
  • There is no firm conclusion here; this is just an investigation of some of the interpreter's behavior.
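The sketch below makes the comparison concrete by computing the ratio defaultdict/setdefault for each row of the table. It takes no new measurements; it only does arithmetic on the numbers reported above.

```python
# Timings copied from the table above (total seconds for 1000 runs),
# stored as (setdefault, defaultdict) per (dict size, default type).
timings = {
    (100, "list"): (0.0229508876801, 0.0204179286957),
    (100, "set"): (0.0209970474243, 0.0194549560547),
    (100, "int"): (0.0236239433289, 0.0225579738617),
    (100, "string"): (0.020693063736, 0.0240340232849),
    (10000, "list"): (2.09283614159, 2.31266093254),
    (10000, "set"): (2.12825512886, 3.43549799919),
    (10000, "int"): (2.04997992516, 1.87312483788),
    (10000, "string"): (2.05423784256, 1.93679213524),
    (100000, "list"): (22.4799249172, 29.7850298882),
    (100000, "set"): (23.5321040154, 41.7523541451),
    (100000, "int"): (26.6693091393, 23.1293339729),
    (100000, "string"): (26.4119689465, 23.6694099903),
}

# Ratio > 1 means defaultdict was slower than setdefault in that row.
ratios = {k: dd / sd for k, (sd, dd) in timings.items()}
for (size, kind), ratio in sorted(ratios.items()):
    print(f"{size:>6} {kind:<6} defaultdict/setdefault = {ratio:.2f}")
```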

Code –

import timeit
from collections import defaultdict
from itertools import product

def measure_setdefault(n, defaultvalue):
    # Access each key twice. setdefault stores the defaultvalue object
    # itself (no copy), so every key shares that one object.
    data = {}
    for i in range(n):
        x = data.setdefault(i, defaultvalue)
    for i in range(n):
        x = data.setdefault(i, defaultvalue)

def measure_defaultdict(n, defaultvalue):
    # Access each key twice. The factory type(defaultvalue) is called to
    # build a new value the first time each key is accessed.
    data = defaultdict(type(defaultvalue))
    for i in range(n):
        x = data[i]
    for i in range(n):
        x = data[i]

if __name__ == '__main__':
    number = 1000
    dict_sizes = [100, 10000, 100000]
    defaultvalues = [[], 0, "", set()]
    for dict_size, defaultvalue in product(dict_sizes, defaultvalues):
        print("dict_size:", dict_size, "defaultvalue:", type(defaultvalue))
        print("\tsetdefault:", timeit.timeit("measure_setdefault(dict_size, defaultvalue)", setup="from __main__ import measure_setdefault, dict_size, defaultvalue", number=number))
        print("\tdefaultdict:", timeit.timeit("measure_defaultdict(dict_size, defaultvalue)", setup="from __main__ import measure_defaultdict, dict_size, defaultvalue", number=number))
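One property of the setdefault benchmark is easy to miss. setdefault stores the object it is given, so every key ends up pointing at the same defaultvalue object. The hypothetical variant below returns the dictionary so this becomes visible.

```python
# Hypothetical variant of the setdefault loop that returns the dict for inspection.
def fill_with_setdefault(n, defaultvalue):
    data = {}
    for i in range(n):
        data.setdefault(i, defaultvalue)  # the same object is stored under every key
    return data

data = fill_with_setdefault(3, [])
data[0].append("x")
print(data)  # {0: ['x'], 1: ['x'], 2: ['x']}
print(all(v is data[0] for v in data.values()))  # True
```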


2 thoughts on “setdefault vs get vs defaultdict”

  1. Thanks for posting the code!

    There is an important difference between `measure_setdefault()` and `measure_defaultdict()` which will skew the timing results.

    – `defaultdict()` constructs a new value object for each key accessed.
    – `dict.setdefault()` does not construct a new value object. It does _not_ copy the `defaultvalue` given; it keeps the same reference.

    This will make `setdefault()` look much faster for complex objects because `defaultdict` is spending time constructing new values while `setdefault()` is not.

    A proper comparison would be to do the same value construction, `type(defaultvalue)()`, before the first `setdefault()`. Really, we want to test default _types_, not default _values_.

    (I know this post is 8 years old, but Google is listing this as the first answer for `defaultdict vs setdefault`, so I felt I should comment.)
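Here is a sketch of the fairer comparison the comment suggests. It assumes we pass default types (list, int, str, set) and have the setdefault version build a new value with defaulttype() on every call, the way code like setdefault(key, []) does. The function names, sizes and repetition counts are illustrative, and no results are claimed here.

```python
import timeit
from collections import defaultdict

def measure_setdefault_fair(n, defaulttype):
    # Build a fresh default on every call, so each new key gets its own object,
    # as it does with defaultdict.
    data = {}
    for i in range(n):
        data.setdefault(i, defaulttype())
    for i in range(n):
        data.setdefault(i, defaulttype())
    return data

def measure_defaultdict_types(n, defaulttype):
    # defaultdict calls defaulttype() once per missing key.
    data = defaultdict(defaulttype)
    for i in range(n):
        data[i]
    for i in range(n):
        data[i]
    return data

if __name__ == "__main__":
    for defaulttype in (list, int, str, set):
        t_sd = timeit.timeit(lambda: measure_setdefault_fair(1000, defaulttype), number=100)
        t_dd = timeit.timeit(lambda: measure_defaultdict_types(1000, defaulttype), number=100)
        print(defaulttype.__name__, "setdefault:", t_sd, "defaultdict:", t_dd)
```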
