List arguments and isinstance
More than a decade and a half ago, Kragen Javier Sitaker wrote a blog post isinstance() considered harmful, a lot of which I believe holds up pretty well today. It's well worth reading it in its entirety but the gist is that rather than testing a value against a specific type, you generally wish to check that a value confirms to a particular interface. Or if you prefer, using isinstance
is imparting nominative typing in a language where duck typing is the convention.
I came across what I initially thought was a reasonable use of isintance
and this post is about my attempts to remove the reliance on isinstance
.
The case
Writing a web application often involves checking the contents of the current page for specific content. Usually in the form of CSS selectors, or simply text. I had written a simple method to check whether specific texts were contained in any elements matching a given CSS selector. For example, you may have some way of flashing messages and you want to check that after a given request the resulting web page contains the list of messages that you expect.
However, the general idea can be condensed to the more simple task of checking whether a list of texts are present with a main text. So the general idea is something like the following:
def contains_texts(content, texts): for t in texts: if t not in content: return False return True test_content = "abcdefg qwerty" assert contains_texts(test_content, ['abcdefg', 'qwerty']) # Passes
This works fine, but this is in testing code, so we want to make sure that tests do not inadvertently pass. The problem here is that the texts
argument is intended to be a list, or more generally, a container of some strings. The potential problem is that it is easy enough to make the mistake of calling this method with a single string. Strings are iterable, so unfortunately the following assertion passes:
test_content = "abcdefg qwerty" assert contains_texts(test_content, 'gfedcba') # Passes
This is because each of the characters in the single string argument are themselves in the main content, although the string itself is not.
A potential fix for this is to use isinstance
on the argument:
def isinstance_contains_texts(content, texts): if isinstance(texts, str): return texts in content for t in texts: if t not in content: return False return True test_content = "abcdefg qwerty" assert isinstance_contains_texts(test_content, 'abcdefg') # Passes assert not isinstance_contains_texts(test_content, 'gfedcba') # Passes
In this case we do the right thing if the method is called with a single string as the argument, but you could of course instead raise an error. So this basically solves our problem, but at the cost of using isinstance
.
What happens if the content becomes a bytestring, and the input strings themselves are bytestrings:
assert isinstance_contains_texts(b'hello world', [b'hello', b'world']) # Passes assert isinstance_contains_texts(b'hello world', b'olleh') # Passes
We have the same problem as before. We could of course solve this by using if isintance(texts, str) or isinstance(texts, bytes)
, but that feels like we're not really solving the essence of the problem.
Without isinstance
So instead of using isinstance
we could check for what interface is supported by the argument. If we want to check that it is a singleton argument then we would be looking for something supported by strings and bytestrings, but not by, lists, sets, or other general containers:
>>> str_and_bytestring = [x for x in dir('') if x in dir(b'')] >>> stringy_but_not_listy = [x for x in str_and_bytestring if x not in dir([])] >>> stringy_but_not_listy ['__getnewargs__', 'capitalize', 'center', 'endswith', 'expandtabs', 'find', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
Okay so our test could be whether or not the texts
argument has an attribute capitalize
:
def interface_contains_texts(content, texts): if hasattr(texts, 'capitalize'): return texts in content for t in texts: if t not in content: return False return True test_content = "abcdefg qwerty" assert interface_contains_texts(test_content, 'abcdefg') # Passes assert not interface_contains_texts(test_content, 'gfedcba') # Passes assert interface_contains_texts(b'hello world', [b'hello', b'world']) assert not interface_contains_texts(b'hello world', b'olleh')
Sort of success, but this feels as if we're not really writing down what we mean. We don't really care if texts
has a capitalize
attribute at all. What we want to avoid is texts
being a single argument that happens to be iterable. We can check which attributes containers such as lists and sets have that strings or bytestrings do not:
>>> list_and_set = [x for x in dir([]) if x in dir(set())] >>> str_or_bytestring = dir('') + dir(b'') >>> listy_but_not_string = [x for x in list_and_set if x not in str_or_bytestring] >>> listy_but_not_string ['clear', 'copy', 'pop', 'remove']
So we can check if the first argument has a 'clear' attribute:
def interface_contains_texts(content, texts): if not hasattr(texts, 'clear'): return texts in content for t in texts: if t not in content: return False return True test_content = "abcdefg qwerty" assert interface_contains_texts(test_content, 'abcdefg') # Passes assert not interface_contains_texts(test_content, 'gfedcba') # Passes assert interface_contains_texts(b'hello world', [b'hello', b'world']) # Passes assert not interface_contains_texts(b'hello world', b'olleh') # Passes
Unfortunately, you now cannot pass a generator, the following fails with a type error because interface_contains_texts
attempts to check if the generator is in the given string.
assert interface_contains_texts(test_content, (x for x in test_content.split()))
Unfortunately there is no attribute that is on a list, set, and generator that is not on a string or bytestring. But generators do have a close
attribute so we could do:
def interface_contains_texts(content, texts): if not hasattr(texts, 'clear') and not hasattr(texts, 'close'): ...
This does feel close to what we want. We are saying "If the argument is not a list, set style container, or a generator, then we treat it as one singleton".
All of this could be avoided
Another way to do this would be to allow a variable number of arguments into the contains_texts
function:
def variable_args_contains_texts(content, *texts): for t in texts: if t not in content: return False return True test_content = "abcdefg qwerty" assert variable_args_contains_texts(test_content, 'abcdefg') # Passes assert not variable_args_contains_texts(test_content, 'gfedcba') # Passes assert variable_args_contains_texts(test_content, *['abcdefg', 'qwerty']) # Passes assert variable_args_contains_texts(b'hello world', b'hello', b'world') # Passes assert not variable_args_contains_texts(b'hello world', b'olleh') # Passes assert variable_args_contains_texts(test_content, *(x for x in test_content.split())) # Passes
This works for all our cases. You do have to star the argument if you already have a list or generator:
assert variable_args_contains_texts(test_content, *['abcdefg', 'qwerty']) # Passes assert variable_args_contains_texts(test_content, *(x for x in test_content.split())) # Passes
However, recall our original problem was that if you accidentally passed in a single string as the argument your test function may quietly, and erroneously, pass. However, if you forget the star to this function:
>>> variable_args_contains_texts(test_content, ['abcdefg', 'qwerty']) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 3, in variable_args_contains_texts TypeError: 'in <string>' requires string as left operand, not list
You get an error. That's quite alright with me, this will flag up immediately that I've written my test incorrectly and I can fix it immediately. Even if I had the original contains_texts
and update it to this variable args version, when I re-run my test-suite it will let me know of any places I've forgotten to update the call.
Conclusion
Its debateable whether or not the interface_contains_texts
version was better than the original isintance_contains_texts
version. I prefer it because I feel it is a bit more general. I definitely prefer the eventual variable_args_contains_texts
version. I guess that's the main conclusion for now, that checking your uses of isinstance
may lead to a better implementation.
Although I was able to use variable arguments to the solve the problem in this case, there will obviously be cases where that fix does not apply. However, the main takeaway is that you don't necessarily have to swap your isinstance
use for a hasattr
use, thereby directly translating the use of isinstance
. It could be that by having a slightly broader look at the general problem you're trying to solve, another solution is available.
Comments
Comments powered by Disqus