Having worked with Python on and off for years, I have had the… opportunity to fall victim to a number of pitfalls inherent to the language. Some of them caused weird bugs and hard-to-track issues, mostly because of gaps in my Python knowledge.
To help me remember them, and perhaps to help you too, reader, if you don’t know them yet, here are some of my all-time favorites!
The lines in code boxes are expressions I typed into the Python interpreter. You can copy-paste them into your own interpreter to check what I’m saying!
And the program tuple’d
How do you declare a tuple containing 3 integers in Python?
Easy: (1,2,3)!
Now how do you declare a one-element tuple of integers in Python?
Easy? (1)?
WRONG!
Because what defines a tuple is actually the commas separating the values. Parentheses are actually optional (sometimes):
    >>> a = 1,2,3
    >>> type(a)
    <type 'tuple'>
On the other hand, can you think of a use case where the tuple parentheses are necessary?
One simple example: a function call! If you omit them, the tuple elements are passed as separate arguments instead of as a single tuple argument.
It makes great sense when you think about it: since commas are widely used in the syntax of the language, in some contexts you still have to provide parentheses for the interpreter to understand you’re using a tuple.
    >>> type(1,2,3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: type() argument 1 must be string, not int
    >>> type((1,2,3))
    <type 'tuple'>
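To make the difference between the two calls concrete, here is a minimal sketch, assuming Python 3. The helper `count_arguments` is a hypothetical name I made up for illustration; it simply reports how many positional arguments it received.

```python
def count_arguments(*args):
    """Return how many positional arguments the call received."""
    return len(args)


# Without inner parentheses, the commas separate three arguments.
separate = count_arguments(1, 2, 3)

# With inner parentheses, the call receives a single tuple argument.
single_tuple = count_arguments((1, 2, 3))

print(separate, single_tuple)
```

The first call sees three integers, the second sees one tuple, which is the same distinction `type` complained about above.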
What makes it vicious is that, thanks to the language’s duck typing, the mistake may not be obvious at first: no error is raised, and code written with a specific goal in mind may happen not to crash, yet not do what you intended.
My example is iteration. I had a function meant to iterate over each element of a tuple, and I passed it what I thought was a tuple containing a single string: ("YOLO").
I couldn’t have been more wrong! Because there was no comma, the string was passed as is. But since a string is also an iterable in Python, the code I had written with a tuple in mind happened to “work perfectly” on a string too, iterating over each character… No type error whatsoever, the code ran fine.
So how do you declare a one-element tuple?
Easy: an element followed by just a comma, and nothing after it…
    >>> a = 1,
    >>> type(a)
    <type 'tuple'>
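To come back to my iteration bug, here is a minimal sketch of it, assuming Python 3. The function `collect_items` is a hypothetical stand-in, not my original code; it just gathers whatever iterating over its argument yields.

```python
def collect_items(items):
    """Return a list of the elements obtained by iterating over `items`."""
    return [item for item in items]


# Parentheses alone do not make a tuple: this is just the string "YOLO".
not_a_tuple = ("YOLO")

# The trailing comma is what makes a one-element tuple.
real_tuple = ("YOLO",)

print(collect_items(not_a_tuple))  # iterates over the characters
print(collect_items(real_tuple))   # iterates over the single string
```

Both calls run without any error, which is exactly why the mistake is so easy to miss.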
You may find it’s tricky, and perhaps that’s because it is. I hate tuples, and I think they hate me back just right…
And as a bonus, here’s another tuple-related issue that could happen to you:
Assert is a well-known construct in many programming languages, meant to ensure we manipulate sensible data, or to crash violently otherwise so that it is absolutely clear something is very wrong.
In Python, do you know what the problem would be with a line like this?
assert(2 + 2 == 5, "Houston we've got a problem")
And the problem is… *drum roll*
Even in Python 3, assert is still a statement, not a function, which means it doesn’t need parentheses.
What the assert above actually checks is a two-element tuple containing False (the result of 2 + 2 == 5) and "Houston we've got a problem". A non-empty tuple evaluates to True in a boolean context, so this assert always passes and doesn’t verify anything.
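Here is a small sketch of the trap, assuming Python 3 run without the -O flag (which disables assert statements). I build the tuple in a separate variable, `condition_and_message` (a name chosen for this example), so that what the parenthesized form really tests can be inspected.

```python
# What `assert(cond, msg)` really tests: a two-element tuple.
condition_and_message = (2 + 2 == 5, "Houston we've got a problem")

# A non-empty tuple is truthy, even though its first element is False.
tuple_is_truthy = bool(condition_and_message)
print(tuple_is_truthy)

# This assertion passes silently, which is exactly the trap.
assert condition_and_message

# The intended form: no parentheses around the condition/message pair.
try:
    assert 2 + 2 == 5, "Houston we've got a problem"
    assertion_failed = False
except AssertionError as error:
    assertion_failed = True
    print("AssertionError:", error)
```

Only the second form actually checks the condition and reports the message.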
On the deque
The problem I encountered with a deque is quite similar to the one-element tuple one.
deque is short for double-ended queue, an efficient data structure found in the collections module.
What I wanted was a deque of strings, starting with a single one. Something easy like this:
    >>> from collections import deque
    >>> d = deque("toto")
What is the error here?
When I wrote that, I wasn’t aware that a deque can be constructed by passing it an iterable, which it iterates over in order to populate itself.
Once again, since a string is an iterable in Python, I ended up with a very nice deque… of characters!
    >>> d
    deque(['t', 'o', 't', 'o'])
In other words, to do what I wanted I should have used d = deque(["toto"]).
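Side by side, the two constructions look like this. It is a minimal sketch assuming Python 3; the variable names `chars` and `words` are mine.

```python
from collections import deque

# A bare string is iterated character by character.
chars = deque("toto")

# Wrapping the string in a list gives a deque holding one string.
words = deque(["toto"])
words.append("tata")

print(list(chars))
print(list(words))
```

The second deque can then grow one whole string at a time, which is what I was after.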
Emptiness everywhere, justice nowhere
Searching for a substring in a larger string is trivial in Python using the in operator:
    >>> "choco" in "chocolat"
    True
    >>> "fox" in "trot"
    False
However, be extra careful with the contents of that substring.
Because, even if it can seem counterintuitive (but also logical in a certain way), the empty string is in any and every possible string!
    >>> "" in ""
    True
    >>> "" in "tata"
    True
What this means is that if the substring is empty, this test will return True when you may not expect it!
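One possible way to guard against this, sketched under the assumption that an empty search term should never count as a match (your use case may differ). The helper name `contains_nonempty` is hypothetical.

```python
def contains_nonempty(needle, haystack):
    """Return True only if `needle` is non-empty and occurs in `haystack`."""
    # An empty string is falsy, so the membership test is skipped for it.
    return bool(needle) and needle in haystack


print(contains_nonempty("choco", "chocolat"))
print(contains_nonempty("", "chocolat"))
```

The plain `in` test would answer True for the second call; the guarded version answers False.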
Equal but not the same
Equality is not identity.
You can have two different variables holding the same value: they’re equal, but not the same.
And you can have two different variables actually referencing the same address in memory.
This is what it’s all about: you have to keep in mind that in Python, variables are pretty much fancy labels, put on data stored God knows where.
In Python, you check the equality of 2 variables with ==, and the identity of two variables with is. (It’s a bit like writing “2 variables” and “two variables”… It’s nearly the same, but not quite.)
The tricky thing is that depending on the situation and the inner workings of the Python runtime, == and is may seem to behave the same and to be interchangeable, except that’s not the case.
Indeed, Python uses clever tricks such as small number optimization or string interning to be a little more efficient. With interning, Python keeps a single instance of a given string whenever it can, and every time the same string comes up, it reuses the existing reference instead of building a new one from scratch (which only works because strings are immutable).
This is an invaluable feature if e.g. you’re comparing a lot of long strings: when they’re interned, all Python has to do to check that two strings are equal is compare their identity instead of comparing each character one after another.
Usually, strings that need run-time evaluation are not interned, but it’s quite hard to infer the interning conditions without looking at the interpreter source code:
    >>> b = " ".join(["a", "b", "c", "d", "e"])
    >>> b
    'a b c d e'
    >>> a = "a b c d e"
    >>> a == b
    True
    >>> a is b
    False

    >>> a = "abcde"
    >>> d = "abcde"
    >>> a is d
    True

    >>> a = "a bcde"
    >>> d = "a bcde"
    >>> a is d
    False
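If you ever need identity to hold for equal strings, the standard library lets you request interning explicitly with `sys.intern`. This goes slightly beyond the session above, so take it as a minimal sketch assuming Python 3; the variable names are mine.

```python
import sys

# Built at run time, so not necessarily the same object as the literal.
built = " ".join(["a", "b", "c", "d", "e"])
literal = "a b c d e"

# Equality compares contents, so it holds regardless of interning.
print(built == literal)

# sys.intern returns the canonical interned copy of a string, so two
# equal strings passed through it yield the very same object.
interned_built = sys.intern(built)
interned_literal = sys.intern(literal)
print(interned_built is interned_literal)
```

Whether `built is literal` holds without the explicit call depends on the interpreter, which is why I don't show it here.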
As for small numbers, CPython keeps a cache of preallocated integer objects for the numbers between -5 and 256 (inclusive). This means that in some cases, when two numbers are equal, is will indeed return True, but at other times, even though the numbers are equal, is will return False!
This is quite easy to check (I test the range limits; check the numbers in between if you want, but I hope you get my point):
    >>> a = -5
    >>> a is -5
    True

    >>> a = -6
    >>> a is -6
    False

    >>> a = 255
    >>> a is 255
    True

    >>> a = 256
    >>> a is 256
    True

    >>> a = 257
    >>> a is 257
    False
    >>> a == 257
    True
Of course, this is all implementation-dependent and I’m talking about CPython here.
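A version of the same idea that does not depend on the implementation uses lists, since two separately built lists are always distinct objects. This is a minimal sketch assuming Python 3, with names chosen for the example.

```python
first = [1, 2, 3]
second = [1, 2, 3]   # equal contents, but a separate object
alias = first        # another label stuck on the same object

print(first == second)  # compares values
print(first is second)  # compares identities
print(first is alias)

# `is` is the right tool for singletons such as None.
result = None
print(result is None)

# For numbers and strings, compare values with ==.
big = int("257")
print(big == 257)
```

In practice, I keep `is` for checks like `is None` and use == everywhere a value comparison is meant.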
Do this loop… Or else!
This can happen when refactoring Python code and removing an “if” branch without removing the corresponding “else”.
Python has a little-known and quite unusual feature called the “for else”.
Yes, that may seem confusing at first if you’re coming from other languages, but Python for loops have else clauses!
And their purpose is a bit tricky to get right: the code in the else clause runs when the loop completes normally, which means when it didn’t hit any break. And if the iterable is empty, the “else” code still gets executed.
Frankly, I was astonished at first that it didn’t cause a syntax error, only to discover it was an actual, rarely-encountered feature of the language.
    >>> def toto(l):
    ...     for i in l:
    ...         print(i)
    ...     else:
    ...         print("this is the end")
    ...
    >>> toto([])
    this is the end
    >>> toto([1,22,3])
    1
    22
    3
    this is the end
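The session above never hits a break, so here is a minimal sketch of the other half of the behavior, assuming Python 3. The function `find_first_negative` is a hypothetical example of a search loop.

```python
def find_first_negative(numbers):
    """Return a message describing the first negative number, if any."""
    for number in numbers:
        if number < 0:
            result = "found {}".format(number)
            break  # leaving the loop via break skips the else clause
    else:
        # Reached only when the loop ran to completion without a break,
        # including when `numbers` is empty.
        result = "no negative number"
    return result


print(find_first_negative([1, -2, 3]))
print(find_first_negative([1, 2, 3]))
print(find_first_negative([]))
```

Read this way, the else clause behaves like a "no break happened" branch, which is handy for searches.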
Reference madness
And last but not least…
Here is one lesson I learned after being really puzzled by this strange issue: as far as Python is concerned, an expression is evaluated once and yields a reference to a single object in memory; there won’t be an implicit copy.
Let me explain.
I wanted to initialize a dictionary with a list of existing keys, and for each key I wanted to store a separate empty list by default, so that I wouldn’t have to check if the key was already present in the subsequent code.
So my instinctive approach was:
    d = dict.fromkeys(keyslist, [])
Can you guess what’s wrong with this line?
What I wanted to say was “create a dictionary from this list of keys and create a new empty list for each one”, but…
What I actually wrote meant something like “create a dictionary from this list of keys and make them all point to the same empty list”!
This led to this kind of funny situation:
    >>> a = {}
    >>> a
    {}
    >>> a = dict.fromkeys([1,2,3], [])
    >>> a
    {1: [], 2: [], 3: []}
    >>> a[1].append("toto")
    >>> a
    {1: ['toto'], 2: ['toto'], 3: ['toto']}
Now this is something I didn’t expect…
A better alternative I found was to use dict.setdefault, which returns the value of the given key if it exists and, if it doesn’t, registers the key with the default value you provide and returns that value:
    >>> b = {}
    >>> b
    {}
    >>> c = b.setdefault("yolo", "swag")
    >>> c
    'swag'
    >>> b
    {'yolo': 'swag'}
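Applied to my original problem (one separate list per key), this could look like the sketch below, assuming Python 3. The dict comprehension is an extra option I'm adding here next to setdefault; the variable names are made up for the example.

```python
keys = [1, 2, 3]

# Shared list: the [] is evaluated once, so every key points at it.
shared = dict.fromkeys(keys, [])
shared[1].append("toto")

# Dict comprehension: the [] expression is evaluated once per key.
separate = {key: [] for key in keys}
separate[1].append("toto")

# setdefault: the list is registered the first time a key is seen.
lazy = {}
lazy.setdefault("yolo", []).append("swag")
lazy.setdefault("yolo", []).append("toto")

print(shared)
print(separate)
print(lazy)
```

With the comprehension, only key 1 receives "toto"; with setdefault, I don't even need to know the keys in advance.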
That’s all for now. I still have quite a lot to write about Python and its various gotchas… But this article is already long enough, so that will be for the next one.
I hope it shows well enough that behind its apparent simplicity, Python can be a tricky beast to master.