Conditional refactorings

I have been reading through some of the source code from the requests library. This is one of the more well-known, well-used, well-loved, and well-praised Python libraries I know of. I came across some conditional code which I personally would refactor. A first refactor I'm pretty confident about whilst a second I'm less certain is an improvement. I'll detail them both and welcome any comments anyone has.

A specific example of this kind of conditional comes in the adapters module specifically the cert_verify method. Here is the code in question, slightly snipped for brevity:

    def cert_verify(self, conn, url, verify, cert):
        """Verify a SSL certificate. This method should not be called from user
        ...
        :param verify: Either a boolean, in which case it controls whether we verify
        """
        if url.lower().startswith('https') and verify:

            cert_loc = None

            # Allow self-specified cert location.
            if verify is not True:
                cert_loc = verify

            if not cert_loc:
                cert_loc = DEFAULT_CA_BUNDLE_PATH

            if not cert_loc or not os.path.exists(cert_loc):
                raise IOError("Could not find a suitable TLS CA certificate bundle, "
                              "invalid path: {0}".format(cert_loc))

            conn.cert_reqs = 'CERT_REQUIRED'

            if not os.path.isdir(cert_loc):
                conn.ca_certs = cert_loc
            else:
                conn.ca_cert_dir = cert_loc
        else:
            conn.cert_reqs = 'CERT_NONE'
            conn.ca_certs = None
            conn.ca_cert_dir = None

The two refactors I would consider is the setting of cert_loc and the order of the two conditional branches, ie. the if and else branches.

Setting cert_loc

This kind of pattern is quite common where a variable is set to a particular value and then a condition is evaluated which may result in the variable being updated. In this particular example we have:

    cert_loc = None

    # Allow self-specified cert location.
    if verify is not True:
        cert_loc = verify

    if not cert_loc:
        cert_loc = DEFAULT_CA_BUNDLE_PATH

First of all note that verify cannot be Falsey. So at the end of these conditionals cert_loc will either be DEFAULT_CA_BUNDLE_PATH if verify is exactly True or it will be set to whatever verify is. So we can write this more succinctly as:

    cert_loc = DEFAULT_CA_BUNDLE_PATH if verify is True else verify

If you don't like the conditional expression you can do this as:

    if verify is True:
        cert_loc = DEFAULT_CA_BUNDLE_PATH
    else:
        cert_loc = verify

All this was only really possible because we knew that verify could not be Falsey. Still I consider the setting of a variable to a default before then considering whether or not to update it to be a very minor anti-pattern. So imagine that we could not be sure that verify is not Falsey, we could still re-write our conditional without the default setting as:

    if verify is True or not verify:
        cert_loc = DEFAULT_CA_BUNDLE_PATH
    else:
        cert_loc = verify

More generally where you see the pattern:

    x = None
    if cond1:
        x = val1
    if cond2:
        x = val2
    ...
    if not x:
        x = default_x

Then you can change this to be:

    if cond1 and val1:
        x = val1
    elif cond2 and val2:
        x = val2
    else:
        x = default_x

Sometimes this is awkward if one or more of the val_? values are interesting expressions.

One defence of the original style is that you know your variable will be set to something so you will never end up with an undefined variable error. However, if this were ever the case then you have not fully written down your logic, and hence the early error is likely a good thing.

Switch the if and else branches

A suggestion I often make when you have two branches, such that one is significantly shorter than the other, is to have the shorter branch first. The reason for this is to give the reader less of a "stack" to recall as they are reading through the code. Often when reading code you get to the end of the long branch, to find a shorter second branch but you've forgotten what the original condition was. So, it's tempting to suggest to modify the original condition as:

    if not (url.lower().startswith('https') and verify):
        conn.cert_reqs = 'CERT_NONE'
        conn.ca_certs = None
        conn.ca_cert_dir = None
        cert_loc = None
    else:
        # Allow self-specified cert location.
        if verify is not True:
            cert_loc = verify

        if not cert_loc:
            cert_loc = DEFAULT_CA_BUNDLE_PATH

        if not cert_loc or not os.path.exists(cert_loc):
            raise IOError("Could not find a suitable TLS CA certificate bundle, "
                          "invalid path: {0}".format(cert_loc))

        conn.cert_reqs = 'CERT_REQUIRED'

        if not os.path.isdir(cert_loc):
            conn.ca_certs = cert_loc
        else:
            conn.ca_cert_dir = cert_loc

I hesitate to suggest it in this case because the original if branch has code which depends upon the condition, namely that verify is not Falsey. However, more generally this can be something of a win for the person reading the code.

So, two pretty simple refactors, thoughts?

Scottish Local Elections 2017

I wanted to compare the manifestos for the local elections so I did a quick search but it came up with nothing. I was hoping for a simple page somewhere that simply listed where to find all of the parties' manifestos. To save you the bother, here is where to find the manifestos of the five parties listed on my postal ballot form (in the order that they are listed), all of the homepages were found with a simple google search:

Green Party

Their homepage has a large button stating clearly where the manifesto is, which goes to a web version, which is nice for reading on say a smartphone, as well as a PDF download. Good job in making it easy.

SNP

There homepage has one mention of the word manifesto but it relates to 2016. There is a big panel labelled 'Council Elections' but the three buttons are labelled 'Volunteer', 'Register to vote', and 'vote by post'. They have some kind of policy base which seems to generally list their priorities, and under that there is a local government section so apparently you have to register to add this as an interest (which I have no real idea what that does).

At this point, I gave up. I guess the SNP have not yet launched their manifesto, but this seems to be leaving it a bit late as postal voters have their ballot papers. They could at least have something bold on their home page stating that a manifesto is coming, so people do not waste time looking for it.

Liberal Democrats

Search within their homepage for the word 'manifesto' and you will be disappointed. Though the first link on the page is to something called 'Make councils work for you'. This then has a button labelled 'Read our priorities for local government'. This then links to a kind of web readable http://www.scotlibdems.org.uk/council2017 which also has a link to a PDF download, but at no point does it actually state this is the manifesto.

Labour Party

Their homepage has very prominently some negative campaigning directed against the SNP. There is a 'campaigns' section but that does not have anything relevant to the 2017 local elections. One of the news stories is about their campaign manager urging voters that local services are at stake, and states that there is a choice between Labour and the SNP (apparently no other parties exist for the Labour campaign manager). This story even states 'Postal ballots started landing on doorsteps across Scotland this morning'. But still does not actually tell us where their manifesto might be found.

Search for manifesto and you will find a bunch of references to past SNP manifestos which Labour claim contain broken promises. However, Labour need to make some promises themselves, and for that, we need their manifesto, where is it? If Labour want to understand why they are doing poorly in Scotland they might realise that 'Please vote for us and not the SNP' is not really a manifesto.

Conservatives

Their homepage has most prominently a call to sign a petition to avoid a second independence referendum, which I guess is fair enough since 'Unionist' is in their party name. Down in the footer they have a link to manifestos but that currently does not contain the 2017 local election manifesto.

Weirdly a google search took me to this page regarding the edinburgh conservatives, who at first glance are not clearly associated at all with the main Scottish Conservative and Unionist Party, who do not seem to link to them. Anyway if this is indeed their manifesto, for Edinburgh at least, then it is available here which is unfortunately not a web-readable version but simply hosts a link to the pdf version.

Groan, it's 16 pages long.

Summary

Surprisingly difficult or impossible to find some of the parties' manifestos. I may be being an idiot, but wouldn't you wish to make finding your party's manifesto for an upcoming election as idiot proof as possible?

Green party clear winners in making it easy to find their manifesto for the local elections, providing both a web-readable and PDF version as well.

Poker puzzle

Bill the lizard has an excellent post describing an interesting puzzle as to the most desirable full house in a game of poker. The rest of this post depends upon you having read that puzzle so you may wish to do that now, or skip this post.

I think this represents an instance where the mathematically correct answer is not correct for other concerns. To be clear, I think the given answer is the mathematically correct answer, and the blog post linked above never suggested to answer anything other than that. In addition, there isn't really any action you can take regardless of which full house you think is better to have, the difference in the odds is small enough not to have any real-world applicability.

To summarise the result above, the idea is that the a full house of three aces over two kings is not as good a hand to have as a full house of three aces over two sixes. This is restricted to a game of five card draw with no wild cards etc. This is because although the AK full house is ranked higher than a full house of A6, they cannot both occur in the same hand. So to decide which you would rather have you have to look at how many possible hands can beat you. Both hands can be beaten by four-of-a-kind or a straight-flush. Because the two sixes break up more possible straight-flushes than the two kings in theory having the A6 full-house will win more hands than the AK full-house and is therefore a better hand to have.

However, this assumes that all straight-flushes are as likely, but this is five-card draw. Two things make a high straight-flush more likely after the draw than a low straight-flush. Firstly, someone with low cards that may make up part of a low straight-flush after the draw, may be bet out of the hand before the draw takes place. Secondly, players are more likely to hold on to higher valued cards going into the draw.

For example, suppose three players against you are dealt:

  • Kh, Qh, 8d, 5s, 4s
  • 10d, 9s, 5h, 4h, 2d
  • Qs, Qc, 5c, 4c, 3c

The first player here has two possible ways of drawing to make a straight-flush, but, if they make it to the draw, they will almost certainly retain the king and queen of hearts and not the 5 and 4 of spades. The second player has just as much chance of making a straight-flush that might beat your full-house, but chances are they won't make it to the draw. The third player actually has a better chance of making a straight-flush but there is a good chance they hold on to the pair of queens. An analogous player with a pair of 3s and J-Q-K of diamonds is a bit more likely to throw away their pair (than a player with a pair of queens) and thus stick around with a chance of beating your full-house with a straight-flush.

So which full-house is the better one to have is dependent upon what you think the odds are of players remaining in the hand with a possibility of a high straight-flush against the probability of a player remaining in the hand with a possibility of a low straight-flush. That isn't something that can be numerically calculated (at least not easily, you might be able to assume perfectly rational players that both do whatever gives them the highest probability of winning and assume others do the same).

unittest.mock small gotcha - a humbling tale of failure

Developing a small web application I recently had reason to upgrade from Python 3.4 to Python 3.6. The reason for the upgrade was regarding ordering of keyword arguments and not related to the bug in my test code that I then found. I should have been more careful writing my test code in the first place, so I am writing this down as some penance for not testing my tests robustly enough.

A Simple example program

So here I have a version of the problem reduced down to the minimum required to demonstrate the issue:

import unittest.mock as mock

class MyClass(object):
    def __init__(self):
        pass
    def my_method(self):
        pass

if __name__ == '__main__':
    with mock.patch('__main__.MyClass') as MockMyClass:
        MyClass().my_method()
        MockMyClass.my_method.assert_called_once()

Of course in reality the line MyClass().my_method() was some test code that indirectly caused the target method to be called.

Output in Python 3.4

$ python3.4 mock_example.py
$

No output, leading me to believe my assertions passed, so I was happy that my code and my tests were working. As it turned out, my code was fine but my test was faulty. Here's the output in two later versions of Python of the exact same program given above.

Output in Python 3.5

$ python3.5 mock_example.py
Traceback (most recent call last):
  File "mock_example.py", line 12, in <module>
    MockMyClass.my_method.assert_called_once()
  File "/usr/lib/python3.5/unittest/mock.py", line 583, in __getattr__
    raise AttributeError(name)
AttributeError: assert_called_once

Assertion error, test failing.

Output in Python 3.6

$ python3.6 mock_example.py
Traceback (most recent call last):
  File "mock_example.py", line 12, in <module>
    MockMyClass.my_method.assert_called_once()
  File "/usr/lib/python3.6/unittest/mock.py", line 795, in assert_called_once
    raise AssertionError(msg)
AssertionError: Expected 'my_method' to have been called once. Called 0 times.

Test also failing with a different error message. Anyone who is (pretty) familiar with the unittest.mock standard library module will know assert_called_once was introduced in version 3.6, which is my version 3.5 is failing with an attribute error.

My test was wrong

The problem was, my original test was not testing anything at all. The 3.4 version of the unittest.mock standard library module did not have a assert_called_once. The mock, just allows you to call any method on it, to see this you can try changing the line:

        MockMyClass.my_method.assert_called_once()

to

        MockMyClass.my_method.blahblah()

With python3.4, python3.5, and python3.6 this yields no error. So in the original program you can avoid the calling MyClass.my_method at all:

if __name__ == '__main__':
    with mock.patch('__main__.MyClass') as MockMyClass:
        # Missing call to `MyClass().my_method()`
        MockMyClass.my_method.assert_called_once() # In 3.4 this still passes.

This does not change the (original) results, python3.4 still raises no error, whereas python3.5 and python3.6 are raising the original errors.

So although my code turned out to be correct (at least in as much as the desired method was called), had it been faulty (or changed to be faulty) my test would not have complained.

The Actual Problem

My mock was wrong. I should instead have been patching the actual method within the class, like so:

if __name__ == '__main__':
    with mock.patch('__main__.MyClass.my_method') as mock_my_method:
        MyClass().my_method()
        mock_my_method.assert_called_once()

Now if we try this in all version 3.4, 3.5, and 3.6 of python we get:

$ python3.4 mock_example.py 
$ python3.5 mock_example.py 
Traceback (most recent call last):
  File "mock_example.py", line 12, in <module>
    mock_my_method.assert_called_once()
  File "/usr/lib/python3.5/unittest/mock.py", line 583, in __getattr__
    raise AttributeError(name)
AttributeError: assert_called_once
$ python3.6 mock_example.py 
$ 

So Python 3.4 and 3.6 pass as we expect. But Python3.5 gives an error stating that there is no assert_called_once method on the mock object, which is true since that method was not added until version 3.6. This is arguably what Python3.4 should have done.

It remains to check that the updated test fails in Python3.6, so we comment out the call to MyClass().my_method:

$ python3.6 mock_example.py 
Traceback (most recent call last):
  File "mock_example.py", line 12, in <module>
    mock_my_method.assert_called_once()
  File "/usr/lib/python3.6/unittest/mock.py", line 795, in assert_called_once
    raise AssertionError(msg)
AssertionError: Expected 'my_method' to have been called once. Called 0 times.

This is the test I should have performed with my original test. Had I done this I would have seen that the test passed in Python3.4 regardless of whether the method in question was actually called or not.

So now my test works in python3.6, fails in python3.5 because I'm using the method assert_called_once which was introduced in python3.6. Unfortunately it incorrectly passes in python3.4. So if I want my code to work properly for python versions earlier than 3.6, then I can essentially implement assert_called_once() with assert len(mock_my_method.mock_calls) == 1. If we do this then my test passes in all three version of python and fails in all three if we comment out the call MyClass().my_method().

Conclusions

I made an error in writing my original test, but my real sin was that I was a bit lazy in that I did not make sure that my tests would fail, when the code was incorrect. In this instance there was no problem with the code only the test, but that was luck. So for me, this served as a reminder to check that your tests can fail. It may be that mutation testing would have caught this error.

Flask and Pytest coverage

I have written before about Flask and obtaining test coverage results here and with an update here. This is pretty trivial if you're writing unit tests that directly call the application, but if you actually want to write tests which animate a browser, for example with selenium, then it's a little more complicated, because the browser/test code has to run concurrently with the server code.

Previously I would have the Flask server run in a separate process and run 'coverage' over that process. This was slightly unsatisfying, partly because you sometimes want coverage analysis of your actual tests. Test suites, just like application code, can grow in size with many utility functions and imports etc. which may eventually end up not actually being used. So it is good to know that you're not needlessly maintaining some test code which is not actually invoked.

We could probably get around this restriction by running coverage in both the server process and the test-runner's process and combine the results (or simply view them separately). However, this was unsatisfying simply because it felt like something that should not be necessary. Today I spent a bit of time setting up the scheme to test a Flask application without the need for a separate process.

I solved this now, by not using Flask's included Werkzeug server and instead using the WSGI server included in the standard-library wsgiref.simple_server module. Here is, a minimal example:

import flask

class Configuration(object):
    TEST_SERVER_PORT = 5001

application = flask.Flask(__name__)
application.config.from_object(Configuration)


@application.route("/")
def frontpage():
    if False:
        pass # Should not be covered
    else:
        return 'I am the lizard queen!' # Should be in coverage.



# Now for some testing.
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import pytest
# Currently just used for the temporary hack to quit the phantomjs process
# see below in quit_driver.
import signal

import threading

import wsgiref.simple_server

class ServerThread(threading.Thread):
    def setup(self):
        application.config['TESTING'] = True
        self.port = application.config['TEST_SERVER_PORT']

    def run(self):
        self.httpd = wsgiref.simple_server.make_server('localhost', self.port, application)
        self.httpd.serve_forever()

    def stop(self):
        self.httpd.shutdown()

class BrowserClient(object):
    """Interacts with a running instance of the application via animating a
    browser."""
    def __init__(self, browser="phantom"):
        driver_class = {
            'phantom': webdriver.PhantomJS,
            'chrome': webdriver.Chrome,
            'firefox': webdriver.Firefox
            }.get(browser)
        self.driver = driver_class()
        self.driver.set_window_size(1200, 760)


    def finalise(self):
        self.driver.close()
        # A bit of hack this but currently there is some bug I believe in
        # the phantomjs code rather than selenium, but in any case it means that
        # the phantomjs process is not being killed so we do so explicitly here
        # for the time being. Obviously we can remove this when that bug is
        # fixed. See: https://github.com/SeleniumHQ/selenium/issues/767
        self.driver.service.process.send_signal(signal.SIGTERM)
        self.driver.quit()


    def log_current_page(self, message=None, output_basename=None):
        content = self.driver.page_source
        # This is frequently what we really care about so I also output it
        # here as well to make it convenient to inspect (with highlighting).
        basename = output_basename or 'log-current-page'
        file_name = basename + '.html'
        with open(file_name, 'w') as outfile:
            if message:
                outfile.write("<!-- {} --> ".format(message))
            outfile.write(content)
        filename = basename + '.png'
        self.driver.save_screenshot(filename)

def make_url(endpoint, **kwargs):
    with application.app_context():
        return flask.url_for(endpoint, **kwargs)

# TODO: Ultimately we'll need a fixture so that we can have multiple
# test functions that all use the same server thread and possibly the same
# browser client.
def test_server():
    server_thread = ServerThread()
    server_thread.setup()
    server_thread.start()

    client = BrowserClient()
    driver = client.driver

    try:
        port = application.config['TEST_SERVER_PORT']
        application.config['SERVER_NAME'] = 'localhost:{}'.format(port)

        driver.get(make_url('frontpage'))
        assert 'I am the lizard queen!' in driver.page_source

    finally:
        client.finalise()
        server_thread.stop()
        server_thread.join()

To run this you will of course need flask as well as pytest, pytest-cov, and selenium:

$ pip install flask pytest pytest-cov selenium

In addition you will need the phantomjs to run:

$ npm install phantomjs
$ export PATH=$PATH:./node_modules/.bin/

Then to run it, the command is:

$ py.test --cov=./ app.py
$ coverage html

The coverage html is of course optional and only if you wish to view the results in friendly HTML format.

Notes

I've not used this extensively myself yet, so there may be some problems when using a more interesting flask application.

Don't put your virtual environment directory in the same directory as app.py because in that case it will perform coverage analysis over the standard library and dependencies.

In a real application you will probably want to make a pytest fixture out of the server thread and browser client. So that you can use each for multiple separate test functions. Essentially your test function should just be the part inside the try clause.

I have not used the log_current_page method but I frequently find it quite useful so included it here nonetheless.

Python and Lambdas

I come from a functional programming background, so I a lot of love for functions and so-called anonymous functions or lambdas. However, I have realised that I don't make use of Python's lambda syntax much, and I wanted to articulate why. This may mostly sound pretty negative towards lambdas, but bear in mind I'm certainly not against lambdas in general.

A lambda expression could always be avoided by simply defining the function in question. So the question becomes when is a lambda more readable than a definition? That is almost always because the function you're attempting to create is so simple that the definition syntax gets in the way. It is often used when the function in question is going to be used as an argument to some method. For example the sort method on a list takes as argument they key to be used when comparing the elements. So if you have a list of items and you want to sort them by their prices you can do something such as:

list_of_items.sort(key=lambda x: x.price)

We could of course have gotten the use of lambda with a definition:

def price(item):
    return item.price
list_of_items.sort(key=price)

This is not quite equivalent because it introduces a new name into the current scope that would not otherwise have been there, but I don't ever recall reaching for a lambda expression in order to avoid polluting the name space.

We could even define a generic higher-order function to make a function out of attribute access.

def f_attr(attr_name):
    def attr_getter(item):
        return getattr(item, attr_name)
    return attr_getter

list_of_items.sort(key=f_attr('price'))

This is surely worse than the first two attempts, but if you have multiple sorts to do, then using definitions can get tedious:

def price(item):
    return item.price
list_of_items.sort(key=price)
do_something(list_of_items)

def quantity(item):
    return item.quantity
list_of_items.sort(key=quantity)
do_something(list_of_items)

def margin(item):
    return item.margin
list_of_items.sort(key=margin)
do_something(list_of_items)

as compared with the lambda version:

list_of_items.sort(key=lambda x: x.price)
do_something(list_of_items)
list_of_items.sort(key=lambda x: x.quantity)
do_something(list_of_items)
list_of_items.sort(key=lambda x: x.margin)
do_something(list_of_items)

The f_attr version is similar:

list_of_items.sort(key=f_attr('price'))
do_something(list_of_items)
list_of_items.sort(key=f_attr('quantity'))
do_something(list_of_items)
list_of_items.sort(key=f_attr('margin'))
do_something(list_of_items)

In fact that could be done with a loop:

for attribute in ['price', 'quantity', 'margin']:
    list_of_items.sort(key=f_attr(attribute))
    do_something(list_of_items)

I think the lambda version is at least sometimes more readable. However, I often miss the self-documenting nature of a definition, in that giving the function a name acts as some pretty decent documentation especially for the kind of simple function you might otherwise use a lambda expression for.

So, for a lambda expression to be the right choice, the function in question has to be so simple as to not benefit from the extra documentation afforded by its name (such as simple attribute access).

I just don't seem to find myself in that situation very often. Perhaps I'm just not making my own functions general enough by accepting functions as arguments.

Minor Refactorings

Looking through the code of an open-source Python project I came across code that amounts to the following:

    host_name = self.connection.getsockname()[0]
    if self.server.host_name is not None:
        host_name = self.server.host_name

This looks odd because if the condition is True then we forget about the host_name derived from the call to self.connection.getsockname. However it could be that that call has some side-effect. Assuming that it does not we can re-write the above code as:

    if self.server.host_name is not None:
        host_name = self.server.host_name
    else:
        host_name = self.connection.getsockname()[0]

Or even, if you prefer:

    host_name = self.server.host_name if self.server.host_name is not None else self.connection.getsockname()[0]

Or:

    host_name = self.server.host_name
    if host_name is None:
        host_name = self.connection.getsockname()[0]

Or finally:

    host_name = self.server.host_name or self.connection.getsockname()[0]

Note that this last version has slightly different semantics in that it will use the call to getsockname if self.server.host_name is at all falsey, in particular if it is ''.

I love finding minor refactoring opportunities like this. I enjoy going through code and finding such cases. I believe if a project's source code has many such opportunites it hints that the authors do not engage in routine refactoring. However, this post is more about how such a refactoring interacts with Python's `@property' decorator.

The @property decorator

In this case we call getsockname, so when considering our refactoring we have to be careful that there are no significant side-effects. However even if the code had been:

    host_name = self.connection.host_name
    if self.server.host_name is not None:
        host_name = self.server.host_name

We would still have to go and check the code to make sure that either connection or host_name is not implemented as property and if so, that they do not have significant side-effects.

One question arises. What to do if it does have significant side-effects? Leaving it as is means subsequent readers of the code are likely to go through the same thought process. The code as is, looks suspicious.

First off, if I could, I'd refactor self.connection.host_name so that it does not have side-effects, perhaps the side-effects are put into a separate method that is called in any case:

    self.connection.setup_host_name()
    if self.server.host_name is not None:
        host_name = self.server.host_name
    else:
        host_name = self.connection.host_name

Assuming I don't have the ability to modify self.connection.host_name, I could still refactor my own code to have the side-effects separately. Something like:

class Whatever(...):
    def setup_host_name(self):
        self._host_name = self.connection.host_name

    ....
    self.setup_host_name()
    if self.server.host_name is not None:
        host_name = self.server.host_name
    else:
        host_name = self._host_name

I think that is better than the easy opt-out of a comment. Obviously in this case we have some control over the class in question since we're referencing self. But imagine that self was some other variable that we don't necessarily have the ability to modify the class of, then a more lightweight solution would be something like:

    if other.server.host_name is not None:
        other.connection.host_name
        host_name = `other`.server.host_name
    else:
        host_name = `other`.connection.host_name

This of course assumes that the side-effects in question are not related to other.server.host_name. An alternative in either case:

    connection_host_name = other.connection.host_name
    if other.server.host_name is not None:
        host_name = other.server.host_name
    else:
        host_name = connection_host_name

I'm not sure that either really conveys that other.connection.host_name is evaluated for its side-effects. Hence, in this case, I would opt for a comment, but you may disagree.

Conclusion

The @property decorator in Python is very powerful and I love it and use it regularly. However, we should note that it comes with the drawback that it can make code more difficult to evaluate locally.

A comprehensive test-suite would alay most fears of making such a minor refactor.

Finally, finding such code in a code-base is well worth refactoring. The odd bit of code like this is always likely to sneak in unnoticed, and can often be the result of several other simplifications. However, if you find many of these in a project's source code, I believe it is symptomatic of at least one of two things; either the authors don't engage in much refactoring or general code-maintenance, or there is a lack of a decent test-suite which makes such refactorings more of a chore.

Final irrelevant aside

In a previous post I made a suggestion about if-as syntax. I'm still against this idea, however this is the sort of situation it would (potentially) benefit. Our second syntax possibility could have been:

    host_name = self.server.host_name as hn if hn is not None else self.connection.getsockname()[0]

Whilst this is shorter and reduces some repetition, it's unclear to me if this is more readable.

List arguments and isinstance

More than a decade and a half ago, Kragen Javier Sitaker wrote a blog post isinstance() considered harmful, a lot of which I believe holds up pretty well today. It's well worth reading it in its entirety but the gist is that rather than testing a value against a specific type, you generally wish to check that a value confirms to a particular interface. Or if you prefer, using isinstance is imparting nominative typing in a language where duck typing is the convention.

I came across what I initially thought was a reasonable use of isintance and this post is about my attempts to remove the reliance on isinstance.

The case

Writing a web application often involves checking the contents of the current page for specific content. Usually in the form of CSS selectors, or simply text. I had written a simple method to check whether specific texts were contained in any elements matching a given CSS selector. For example, you may have some way of flashing messages and you want to check that after a given request the resulting web page contains the list of messages that you expect.

However, the general idea can be condensed to the more simple task of checking whether a list of texts are present with a main text. So the general idea is something like the following:

def contains_texts(content, texts):
    for t in texts:
        if t not in content:
            return False
    return True

test_content = "abcdefg qwerty"
assert contains_texts(test_content, ['abcdefg', 'qwerty']) # Passes

This works fine, but this is in testing code, so we want to make sure that tests do not inadvertently pass. The problem here is that the texts argument is intended to be a list, or more generally, a container of some strings. The potential problem is that it is easy enough to make the mistake of calling this method with a single string. Strings are iterable, so unfortunately the following assertion passes:

test_content = "abcdefg qwerty"
assert contains_texts(test_content, 'gfedcba') # Passes

This is because each of the characters in the single string argument are themselves in the main content, although the string itself is not.

A potential fix for this is to use isinstance on the argument:

def isinstance_contains_texts(content, texts):
    if isinstance(texts, str):
        return texts in content
    for t in texts:
        if t not in content:
            return False
    return True

test_content = "abcdefg qwerty"
assert isinstance_contains_texts(test_content, 'abcdefg') # Passes
assert not isinstance_contains_texts(test_content, 'gfedcba') # Passes

In this case we do the right thing if the method is called with a single string as the argument, but you could of course instead raise an error. So this basically solves our problem, but at the cost of using isinstance.

What happens if the content becomes a bytestring, and the input strings themselves are bytestrings:

assert isinstance_contains_texts(b'hello world', [b'hello', b'world']) # Passes
assert isinstance_contains_texts(b'hello world', b'olleh') # Passes

We have the same problem as before. We could of course solve this by using if isintance(texts, str) or isinstance(texts, bytes), but that feels like we're not really solving the essence of the problem.

Without isinstance

So instead of using isinstance we could check for what interface is supported by the argument. If we want to check that it is a singleton argument then we would be looking for something supported by strings and bytestrings, but not by, lists, sets, or other general containers:

>>> str_and_bytestring = [x for x in dir('') if x in dir(b'')]
>>> stringy_but_not_listy = [x for x in str_and_bytestring if x not in dir([])]
>>> stringy_but_not_listy
['__getnewargs__', 'capitalize', 'center', 'endswith', 'expandtabs', 'find', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

Okay so our test could be whether or not the texts argument has an attribute capitalize:

def interface_contains_texts(content, texts):
    if hasattr(texts, 'capitalize'):
        return texts in content
    for t in texts:
        if t not in content:
            return False
    return True

test_content = "abcdefg qwerty"
assert interface_contains_texts(test_content, 'abcdefg') # Passes
assert not interface_contains_texts(test_content, 'gfedcba') # Passes
assert interface_contains_texts(b'hello world', [b'hello', b'world'])
assert not interface_contains_texts(b'hello world', b'olleh')

Sort of success, but this feels as if we're not really writing down what we mean. We don't really care if texts has a capitalize attribute at all. What we want to avoid is texts being a single argument that happens to be iterable. We can check which attributes containers such as lists and sets have that strings or bytestrings do not:

>>> list_and_set = [x for x in dir([]) if x in dir(set())]
>>> str_or_bytestring = dir('') + dir(b'')
>>> listy_but_not_string = [x for x in list_and_set if x not in str_or_bytestring]
>>> listy_but_not_string
['clear', 'copy', 'pop', 'remove']

So we can check if the first argument has a 'clear' attribute:

def interface_contains_texts(content, texts):
    if not hasattr(texts, 'clear'):
        return texts in content
    for t in texts:
        if t not in content:
            return False
    return True

test_content = "abcdefg qwerty"
assert interface_contains_texts(test_content, 'abcdefg') # Passes
assert not interface_contains_texts(test_content, 'gfedcba') # Passes
assert interface_contains_texts(b'hello world', [b'hello', b'world']) # Passes
assert not interface_contains_texts(b'hello world', b'olleh') # Passes

Unfortunately, you now cannot pass a generator, the following fails with a type error because interface_contains_texts attempts to check if the generator is in the given string.

assert interface_contains_texts(test_content, (x for x in test_content.split())) 

Unfortunately there is no attribute that is on a list, set, and generator that is not on a string or bytestring. But generators do have a close attribute so we could do:

def interface_contains_texts(content, texts):
    if not hasattr(texts, 'clear') and not hasattr(texts, 'close'):
        ...

This does feel close to what we want. We are saying "If the argument is not a list, set style container, or a generator, then we treat it as one singleton".

All of this could be avoided

Another way to do this would be to allow a variable number of arguments into the contains_texts function:

def variable_args_contains_texts(content, *texts):
    for t in texts:
        if t not in content:
            return False
    return True

test_content = "abcdefg qwerty"
assert variable_args_contains_texts(test_content, 'abcdefg') # Passes
assert not variable_args_contains_texts(test_content, 'gfedcba') # Passes
assert variable_args_contains_texts(test_content, *['abcdefg', 'qwerty']) # Passes
assert variable_args_contains_texts(b'hello world', b'hello', b'world') # Passes
assert not variable_args_contains_texts(b'hello world', b'olleh') # Passes
assert variable_args_contains_texts(test_content, *(x for x in test_content.split())) # Passes

This works for all our cases. You do have to star the argument if you already have a list or generator:

assert variable_args_contains_texts(test_content, *['abcdefg', 'qwerty']) # Passes
assert variable_args_contains_texts(test_content, *(x for x in test_content.split())) # Passes

However, recall our original problem was that if you accidentally passed in a single string as the argument your test function may quietly, and erroneously, pass. However, if you forget the star to this function:

>>> variable_args_contains_texts(test_content, ['abcdefg', 'qwerty'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in variable_args_contains_texts
TypeError: 'in <string>' requires string as left operand, not list

You get an error. That's quite alright with me, this will flag up immediately that I've written my test incorrectly and I can fix it immediately. Even if I had the original contains_texts and update it to this variable args version, when I re-run my test-suite it will let me know of any places I've forgotten to update the call.

Conclusion

Its debateable whether or not the interface_contains_texts version was better than the original isintance_contains_texts version. I prefer it because I feel it is a bit more general. I definitely prefer the eventual variable_args_contains_texts version. I guess that's the main conclusion for now, that checking your uses of isinstance may lead to a better implementation.

Although I was able to use variable arguments to the solve the problem in this case, there will obviously be cases where that fix does not apply. However, the main takeaway is that you don't necessarily have to swap your isinstance use for a hasattr use, thereby directly translating the use of isinstance. It could be that by having a slightly broader look at the general problem you're trying to solve, another solution is available.

Login not required pattern

Introduction

Typically routes in a web application that require login are explicitly marked as such. Whilst routes that are open to general (anonymous) users, are left unmarked and hence implicitly do not require login. Because the default is to allow all users to visit a particular route, it is easy to forget to mark a route as requiring a login.

I'm going to show a small pattern for making sure that all routes in a web application are explicitly marked as either requiring login or not-requiring login. As an example this will be done in a Python, Flask-based web application using Flask-Login but the general idea probably works in at least some other Python web application frameworks.

The Basics

Let's start with a very simple Flask web application that just has two routes:

import flask

app = flask.Flask(__name__, static_folder=None)

@app.route('/unprotected')
def unprotected():
    return 'Hello everyone, anonymous users included!'

@app.route('/protected')
def protected():
    return 'Hello users, only those of you logged-in!'

if __name__ == '__main__':
    app.run()

Now, obviously we want the protected route to only be available to those users who have logged in, whilst the unprotected route is available to all. We'll leave the details of how users actually sign-up, log-in and log-out, see the Flask-Login documentation for examples.

To mark our protected route we can use the decorator provided by Flask-Login:

import flask_login
...

@app.route('/protected')
@flask_login.login_required
def protected():
    return 'Hello users, only those of you logged-in!'
...

I only added two lines, the import of flask_login and a second decorator to the protected method. This is how Flask-Login works. You decorate those routes that you wish to be protected. As I said, this scheme is fine, but it is easy enough to forget to mark a route that should be protected.

Explicitly mark all routes

The scheme I used, was to make a decorator that accepted a parameter. If the parameter is True then all routes associated with the decorated view function are marked as requiring login. In addition, this decorator sets an attribute on the view function itself indicating whether or not login is required. We can then check all view functions and assert that they have the chosen attribute. So first the new view functions, with a decorator we'll add.

@app.route('/unprotected')
@login_required(False)
def unprotected():
    return 'Hello everyone, anonymous users included!'

@app.route('/protected')
@login_required(True)
def protected():
    return 'Hello users, only those of you logged-in!'

So fairly simple stuff. Now to create the decorator:

def login_required(required):
    def decorator(f):
        if required:
            f = flask_login.login_required(f)
        f.login_required = required
        return f
    return decorator

This just calls the original @flask_login.login_required decorator in the case that the argument is True, but in addition adds an attribute to the view function that we can check later.

Ensure all are marked

So to ensure that all routes have been marked in some way you just need to check all view functions have the login_required attribute set:

for name, view_f in app.view_functions.items():
    message = "{} needs to set whether login is required or not".format(name)
    assert hasattr(view_f, 'login_required'), message

if __name__ == '__main__':
    app.run()

Now if you run this you will get such an error. That is because you don't set up all of your own routes, in particular Flask provides a static route. That's easy enough to ignore though:

ignored_views = ['static']

for name, view_f in app.view_functions.items():
    message = "{} needs to set whether login is required or not".format(name)
    assert name in ignored_views or hasattr(view_f, 'login_required'), message

if __name__ == '__main__':
    app.run()

Tidy up

There are just a couple of small caveats. Firstly, Flask-Login, provides a decorator to indicate that not only is login required, but it must be a fresh login. We can just allow passing "fresh" as an argument to our login_required decorator:

def login_required(required):
    def decorator(f):
        if required == 'fresh':
            f = flask_login.fresh_login_required(f)
        elif required:
            f = flask_login.login_required(f)
        f.login_required = required
        return f
    return decorator

To avoid polluting the namespace with our ignored_views name and to indicate what the code is doing without needing a comment we can wrap our check in a function:

def check_all_views_declare_login_required():
    ignored_views = ['static']

    for name, view_f in app.view_functions.items():
        message = "{} needs to set whether login is required or not".format(name)
        assert name in ignored_views or hasattr(view_f, 'login_required'), message
check_all_views_declare_login_required()

if __name__ == '__main__':
    app.run()

Flask extensions

Many Flask extensions provide their own routes which are mapped to view functions that they have defined. You could add those to the ignored_views list, but this seems like additional maintenance. Worse, something like Flask-Admin adds a large number of view functions. In addition, if you setup Flask-Admin to automatically generate model views for each of the models in your database, then if you add to your database model it will generate more view functions and you will have to update your ignored_views list again. The alternative is to perform your check_all_views_declare_login_required before you call init_app on your admin instance. Suppose you have:

import flask
import flask_login

import flask_debugtoolbar
from flask_sqlalchemy import SQLAlchemy

database = SQLAlchemy()
admin = flask_admin.Admin(name='My app', template_mode='bootstrap3')

.... Code to setup the database and add flask-admin model views ...

.... Actual web application code ...

check_all_views_declare_login_required()
# Initialise the Flask-Admin and Flask-DebugToolbar extensions after we have
# checked our own views for declaring login required.
admin.init_app(app)
flask_debugtoolbar.DebugToolbarExtension(app)

if __name__ == '__main__':
    app.run()

Final code

Just to provide the code in a simplest-as-possible form:

import flask
import flask_login

app = flask.Flask(__name__, static_folder=None)

def login_required(required):
    def decorator(f):
        if required == 'fresh':
            f = flask_login.fresh_login_required(f)
        elif required:
            f = flask_login.login_required(f)
        f.login_required = required
        return f
    return decorator


@app.route('/unprotected')
@login_required(False)
def unprotected():
    return 'Hello everyone, anonymous users included!'

@app.route('/protected')
@login_required(True)
def protected():
    return 'Hello users, only those of you logged-in!'

def check_all_views_declare_login_required():
    ignored_views = ['static']

    for name, view_f in app.view_functions.items():
        message = "{} needs to set whether login is required or not".format(name)
        assert name in ignored_views or hasattr(view_f, 'login_required'), message
check_all_views_declare_login_required()

if __name__ == '__main__':
    app.run()

Conclusion

I quite like that I now explicitly state for each route whether I require login or not, rather than relying on a default. You could of course extend this to your particular privileges scheme, for example you may have levels of authentication, such as, administrator, super-user, paid-user, normal-user, anonymous, or whatever.

Because the check is done whenever the module is imported, this check will also be performed when running your test-suite.

This won't work if your view functions are actually view methods because you won't be able to set the attribute on the view method. It is for exactly this reason that we had to use a list of ignored_views and could not just do:

    static_view_function = app.view_functions['static']
    static_view_function.login_required = False

An alternative to this scheme would be to create your own route decorator that takes as parameter whether login is required as well as the normal route information. Essentially combine route and login decorators.

if as syntax possibility

I have a small niggle that comes up in my Python programming. I'm going to describe it and propose a possible addition to the Python programming language that would mostly solve the problem. However, I have not thought through this language modification at all thoroughly so at the moment it's just a germ of an idea. It wouldn't be a major change or fix any important problem in any case, it would just solve my own personal peeve.

Properties

First of all, a brief reminder of the very useful 'property' decorator in Python. You may have an attribute on a class which is initially a simple value. However, at some point later you realise that you need to calculate that attribute everytime it is accessed. You may have something like:

class Person(object):
    def __init__(self, gender):
        self.gender = gender
        self.age = 0

You then realise that this means you would have to continually update the person's age. So instead you implement age as a "property":

class Person(object):
    def __init__(self, gender):
        self.gender = gender
        self.dob = datetime.today()

    @property
    def age(self):
        return (datetime.today() - self.dob).years

For the most part I like this ability and use it frequently. You may even have two distinct classes which both implement a particular protocol, part of which is that the object in question must have a particular attribute. It may be that for one kind of object the attribute is indeed a simple attribute, but that for another it must be calculated.

So properties are, I think, generally a good addition to the language. One downside is that sometimes a simple attribute access, my_object.my_attribute can be a more expensive operation than it looks because it is actually implemented as a property.

Peeve

In Python I find myself doing something like the following quite often:

if object.attribute:
    do_something(object.attribute)
else:
    do_something_else(object)

For example, I might be showing a user of a web application why they cannot perform some action they are attempting to:

if post.unpublishable_reasons:
    for reason in post.unpublishable_reasons:
        flask.flash(reason)
    return render_template('error.html', ...)
else:
    return render_template('published.html', ...)

Where post.unpublishable_reasons might be a property on a class which may take some time to calculate:

class Post(database.Model):
    ....
    @property
    def unpublishable_reasons(self):
        ... something that takes time and ultimately calculates reasons...
        return reasons

Which then leads me to have code like:

reasons = post.unpublishable_reasons
if reasons:
    for reason in reasons:
        flask.flash(reason)
    return render_template('error.html', ...)
else:
    return render_template('published.html', ...)

This just irks me as a little inelegant. So I confess that I would quite like Python to allow something like:

if post.unpublishable_reasons as reasons:
    for reason in reasons:
        flask.flash(reason)
    return render_template('error.html', ...)
else:
    return render_template('published.html', ...)

Comprehensions

This might also partially solve a problem with comprehensions in that you can filter-then-map but not map-then-filter. A filter-then-map looks like:

positive_doubles = [i * 2 for i in my_numbers if i > 0]

We first filter on each number (whether it is greater than zero) then each of those numbers which makes it through the filter is multiplied by two. It's a bit awkward to do the opposite, which is mapping the value first and then filtering on the result of the map. So if we wanted all the squares of a list of numbers that are less than 100 we require to do the operation twice:

small_squares = [i * i for i in my_numbers if (i * i) < 100]

Notice we had to have i * i twice. It might be that the operation in question is quite expensive, so instead we can have two comprehensions:

small_squares = [x for x in (i * i for i in my_numbers) if x < 100]

Of course if the list is very long this might be a bit slow because we iterate through it twice. Edit. Not true as we use a generator on the inner loop as pointed out in the comments below.

Now if we allow some kind of if-as syntax we might be able to do something like:

new_list = [y for x in old_list if f(x) as y]

This doesn't allow general filtering but would allow filtering out of falsey values. I'm much less keen on this as I feel if Python were to attack the map-then-filter problem for comprehensions then it should solve it completely.

In particular this would not work for the small_squares example, for that we would need to allow something like:

small_squares = [x for i in my_numbers if (i * i as x) > 100]

Note that this is even more of an extension than that proposed. That is assigning x to a particular sub-expression of the condition.