News about my FOSS projects

Monday, December 21, 2009

Assessment of unittest 2.7 API : new features and opinions

Yesterday I found the time to review the new features introduced in unittest module for Python 2.7a. In this post I will mention what's new and express my opinion about the current semantics of the decorators included in there. I invite you to discover the reasons that make me think that they may be confusing. Perhaps you will agree with me that some enhancements would be very helpful or, otherwise, that the current documentation has to be more explicit and explain the tricky parts. I am concerned about your health : let's prevent your headaches. If you are not very familiar with unittest module for Python 2.7a I recommend that you read an article about standard Python testing frameworks, and an overview of the xUnit paradigm.

What's new in unittest (Python 2.7a) ?

In this version the API has been extended. To get this done the former implementation was patched, which means that the new features are included in the same classes you used before. If you read until the end you will understand why I don't like this approach. On the other hand, the unittest module becomes a package in Python 2.7a. It contains the following modules :

  • __init__: Provides access to the core classes implemented in sub-modules
  • __main__: Main entry point. Wraps main.
  • case: Everything you need in order to write functional test cases.
  • loader: Test loaders can be found here.
  • main: unittest module for Python 2.7a main program. Used to run tests from the command line.
  • result: Contains core test results.
  • runner: Includes test runners and their (specific) test results
  • suite: Includes suites and classes used to group test cases
  • util: Various utility functions

Here I just wanted to mention that I think it would be nice to have a module named unittest.exceptions in order to define all unittest-specific exceptions in there.

The first modification is that most of the fail* methods have been deprecated. The reason for this is to use a common and less redundant vocabulary. Positive assertions starting with the assert prefix still remain. Some of them have been enhanced. For instance, it is possible to register type-specific equality functions that will be invoked by assertEqual and will generate a more useful default error message. On the other hand, in this version you will also find new assertion methods. The ones I find most useful are the following :

  • assertMultiLineEqual(self, first, second, msg=None): Assert that two multiline strings are equal
  • assertRegexpMatches(text, regexp[, msg=None]): Very useful. Check that a pattern (i.e. regular expression) matches a text.
  • assertSameElements(expected, actual, msg=None): Check that two sequences contain the same elements.
  • assertSetEqual(set1, set2, msg=None): Compare two sets and fail if not equal. However it fails if either of set1 or set2 does not have a set.difference() method.
  • assertDictEqual(expected, actual, msg=None): Very useful. Fail if two dictionaries are not equal, considering their key/value pairs.
  • assertDictContainsSubset(expected, actual, msg=None): Fails if the key/value pairs in dictionary actual are not a superset of those in expected.
  • assertListEqual(list1, list2, msg=None) and assertTupleEqual(tuple1, tuple2, msg=None): Similar to assertSameElements but considers the repetitions and the ordering of items in a tuple or list.
  • assertSequenceEqual(seq1, seq2, msg=None, seq_type=None): Similar to assertListEqual and assertTupleEqual but can be used with any sequence.

Here you have an example. If you set the test case instance's longMessage attribute to True then the detailed built-in messages are shown even if you specify your own. Otherwise the custom message overrides the built-in message.

from unittest import TestCase, main

class AlwaysFailTestCase(TestCase):
    def test_assertMultiLineEqual(self):
        l1 = r"""This is my long
    very long
    looooooooong line"""
        l2 = r"""
    This one is not so long ;o)"""
        self.assertMultiLineEqual(l1, l2)
    def test_assertRegexpMatches(self):
        # Verbatim copy of trac.notification.EMAIL_LOOKALIKE_PATTERN
        EMAIL_LOOKALIKE_PATTERN = (
            # the local part
            r"[a-zA-Z0-9.'=+_-]+" '@'
            # the domain name part (RFC:1035)
            '(?:[a-zA-Z0-9_-]+\.)+' # labels (but also allow '_')
            '[a-zA-Z](?:[-a-zA-Z\d]*[a-zA-Z\d])?' # TLD
            )
        self.assertRegexpMatches("", EMAIL_LOOKALIKE_PATTERN,
                                 "This one is ok")
        self.assertRegexpMatches("This is not an email address",
                                 EMAIL_LOOKALIKE_PATTERN)

    def test_assertSameElements(self):
        self.assertSameElements([1,2,2,2,1], [1,2]) # ok
        self.assertSameElements([1,2,3,2,1], [1,2])
    def test_assertSetEqual(self):
        self.assertSetEqual(set([1,2,3]), set([1,4]))
    def test_assertDictEqual(self):
        self.assertDictEqual({1: 1, 2: 2, 3: 3}, {2: 2, 3: 3, 1: 1})  # ok
        self.assertDictEqual({1: None}, {1: False})
    def test_assertDictContainsSubset(self):
        self.assertDictContainsSubset({1: 1}, {1: 1, 2: 2, 3: 3}) # ok
        self.assertDictContainsSubset({1: 1, 4: 4}, {1: 1, 2: 2, 3: 3},
                                      "(4,4) is missing")
    def test_assertListEqual(self):
        self.assertListEqual([1,2,3,4], [1,3,2,4])
    def test_assertTupleEqual(self):
        self.assertListEqual((1,2,3,4), (1,3,2,4))


Let's pay attention to the results

test_assertDictContainsSubset (__main__.AlwaysFailTestCase) ... ERROR
test_assertDictEqual (__main__.AlwaysFailTestCase) ... FAIL
test_assertListEqual (__main__.AlwaysFailTestCase) ... FAIL
test_assertMultiLineEqual (__main__.AlwaysFailTestCase) ... FAIL
test_assertRegexpMatches (__main__.AlwaysFailTestCase) ... FAIL
test_assertSameElements (__main__.AlwaysFailTestCase) ... FAIL
test_assertSetEqual (__main__.AlwaysFailTestCase) ... FAIL
test_assertTupleEqual (__main__.AlwaysFailTestCase) ... FAIL

ERROR: test_assertDictContainsSubset (__main__.AlwaysFailTestCase)
Traceback (most recent call last):
File "<stdin>", line 36, in test_assertDictContainsSubset
File "/usr/bin/python2.7/lib/unittest/", line 723, in assertDictContainsSubset
standardMsg = 'Missing: %r' % ','.join(missing)
TypeError: sequence item 0: expected string, int found

FAIL: test_assertDictEqual (__main__.AlwaysFailTestCase)
Traceback (most recent call last):
File "<stdin>", line 32, in test_assertDictEqual
- {1: None}
+ {1: False}

FAIL: test_assertListEqual (__main__.AlwaysFailTestCase)
Traceback (most recent call last):
File "<stdin>", line 38, in test_assertListEqual
AssertionError: Lists differ: [1, 2, 3, 4] != [1, 3, 2, 4]

First differing element 1:

- [1, 2, 3, 4]
?        ---

+ [1, 3, 2, 4]
?     +++

FAIL: test_assertMultiLineEqual (__main__.AlwaysFailTestCase)
Traceback (most recent call last):
File "<stdin>", line 10, in test_assertMultiLineEqual
+     This one is not so long ;o)
- This is my long
-     very long
-     looooooooong line

FAIL: test_assertRegexpMatches (__main__.AlwaysFailTestCase)
Traceback (most recent call last):
File "<stdin>", line 23, in test_assertRegexpMatches
AssertionError: Regexp didn't match: "[a-zA-Z0-9.'=+_-]+@(?:[a-zA-Z0-9_-]+\\.)+[a-zA-Z](?:[-a-zA-Z\\d]*[a-zA-Z\\d])?" not found in 'This is not an email address'

FAIL: test_assertSameElements (__main__.AlwaysFailTestCase)
Traceback (most recent call last):
File "<stdin>", line 27, in test_assertSameElements
AssertionError: Expected, but missing:

FAIL: test_assertSetEqual (__main__.AlwaysFailTestCase)
Traceback (most recent call last):
File "<stdin>", line 29, in test_assertSetEqual
AssertionError: Items in the first set but not the second:
Items in the second set but not the first:

FAIL: test_assertTupleEqual (__main__.AlwaysFailTestCase)
Traceback (most recent call last):
File "<stdin>", line 40, in test_assertTupleEqual
AssertionError: First sequence is not a list: (1, 2, 3, 4)

Ran 8 tests in 0.015s

FAILED (failures=7, errors=1)

Have you noticed anything weird ? Well, yes, the assertDictContainsSubset method has a bug: as the traceback shows, it raises a TypeError while building its own error message because the missing keys are not strings. The former implementation was very stable and testers could close their eyes and write their tests. This one still needs to be improved (this is one of the things I don't like).

The TestResult class has been enhanced too. The methods startTestRun() and stopTestRun() have been added. They are very useful. The former lets you prepare and activate the resources needed for reporting purposes before the whole test run starts, whereas the latter may be used to tear them down.
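To make the role of these two hooks more concrete, here is a small sketch. RecordingResult and SampleTestCase are hypothetical names, and the hooks are called by hand because this sketch drives the suite directly; a test runner such as TextTestRunner would normally call them for you.

```python
import unittest

class RecordingResult(unittest.TestResult):
    """Hypothetical result class that logs run-level events."""
    def __init__(self):
        unittest.TestResult.__init__(self)
        self.events = []

    def startTestRun(self):
        # e.g. open a report file or a database connection
        self.events.append("open report")

    def stopTestRun(self):
        # e.g. flush and close whatever startTestRun() opened
        self.events.append("close report")

    def startTest(self, test):
        unittest.TestResult.startTest(self, test)
        self.events.append("run " + test.id().split(".")[-1])

class SampleTestCase(unittest.TestCase):
    def test_a(self):
        pass

    def test_b(self):
        pass

suite = unittest.TestLoader().loadTestsFromTestCase(SampleTestCase)
result = RecordingResult()
result.startTestRun()      # once, before the first test
try:
    suite.run(result)
finally:
    result.stopTestRun()   # once, after the last test
```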

Now we have test discovery in the standard library. You can use the TestLoader.discover(start_dir, pattern='test*.py', top_level_dir=None) method so as to find and return all test modules from the specified start directory, recursing into subdirectories to find them. Only test files that match pattern will be loaded. It relies on a hook known as the load_tests protocol ... but I am not really very fond of this implementation. I prefer to use dutest.PackageTestLoader, but it seems that guys don't like it ...
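A quick way to see discovery at work is to point it at a throwaway directory. The sketch below assumes nothing beyond the standard library; the file names test_discovery_sketch.py and helper_sketch.py are invented for the illustration.

```python
import os
import shutil
import tempfile
import textwrap
import unittest

TEST_SOURCE = textwrap.dedent("""\
    import unittest

    class DiscoveredTestCase(unittest.TestCase):
        def test_one(self):
            self.assertTrue(True)

        def test_two(self):
            self.assertEqual(2, 1 + 1)
    """)

start_dir = tempfile.mkdtemp()
try:
    # This file name matches the default pattern 'test*.py' ...
    with open(os.path.join(start_dir, "test_discovery_sketch.py"), "w") as f:
        f.write(TEST_SOURCE)
    # ... whereas this one does not, so discovery ignores it
    with open(os.path.join(start_dir, "helper_sketch.py"), "w") as f:
        f.write(TEST_SOURCE)

    suite = unittest.TestLoader().discover(start_dir, pattern="test*.py")
    discovered = suite.countTestCases()

    result = unittest.TestResult()
    suite.run(result)
finally:
    shutil.rmtree(start_dir)
```

Only the two tests in the matching file end up in the suite, even though the helper file contains the same test case class.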

Finally, the command line interface now allows you to run all the tests defined in a module or a class. Formerly it was only possible to run individual test methods.

Advanced features in TestCase class


Another feature introduced in this version is resource deallocation. Let's consider that you are writing your test cases and you have some branches and loops in your code. In one such branch you allocate some resources that are not used in the other branches. In this case your test case might not allocate the resource at all, but if it does then you need to release it once everything's done. By using addCleanup(function[, *args[, **kwargs]]) you can add a function to be called after tearDown() to clean up resources used during the test. This can help you since you won't need to write complex rules checking whether the resource is allocated or not. These functions will be called in reverse order to the order they are added (LIFO). They are called with any arguments and keyword arguments passed into addCleanup() when they are added. All this process takes place inside the doCleanups() method.

Custom test outcomes

A feature that coders demand more and more is support for custom test outcomes. This is very important because sometimes testers need to bind test data with test cases, and reload it after test execution in order to perform post-mortem analysis. The new API supports expected failures, unexpected successes and skipped test cases. What's the syntax ?

Test methods can be skipped by using a skip decorator. According to the documentation basic skipping looks like this :

import sys
import unittest
import mylib  # placeholder for the library under test

class MyTestCase(unittest.TestCase):

    @unittest.skip("demonstrating skipping")
    def test_nothing(self):
        self.fail("shouldn't happen")

    @unittest.skipIf(mylib.__version__ < (1, 3),
                     "not supported in this library version")
    def test_format(self):
        # Tests that work for only a certain version of the library.
        pass

    @unittest.skipUnless(sys.platform.startswith("win"), "requires Windows")
    def test_windows_support(self):
        # windows specific testing code
        pass

As you can see every condition specified in the example shown above is static. In other words, if you evaluate it over and over the result should be the same. What happens if you need to skip a test when it is run after a given time ? You might think that the following example will fix your problem, but it won't:

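from unittest import TestCase, main, skipIf
from datetime import datetime, time

class TimedTestCase(TestCase):
    # The condition is evaluated once, when the class statement is executed
    @skipIf(datetime.now().time() > time(0, 7), "Timeout")
    def test_nop(self):
        ts = datetime.now().time()
        self.assertEqual(1, 0, "Current time %s" % (ts,))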


Take a look at the following three results

test_nop (__main__.TimedTestCase) ... FAIL

FAIL: test_nop (__main__.TimedTestCase)
Traceback (most recent call last):
File "<stdin>", line 5, in test_nop
AssertionError: Current time 00:05:57.390000

Ran 1 test in 0.000s

FAILED (failures=1)

test_nop (__main__.TimedTestCase) ... FAIL

FAIL: test_nop (__main__.TimedTestCase)
Traceback (most recent call last):
File "<stdin>", line 5, in test_nop
AssertionError: Current time 00:07:04.109000

Ran 1 test in 0.000s

FAILED (failures=1)

test_nop (__main__.TimedTestCase) ... skipped 'Timeout'

Ran 1 test in 0.000s

OK (skipped=1)

The first one and the last one behave just as expected. However the second one is run after the deadline and the test case is not skipped. What's wrong ? In order to answer that question we need to dive into the xUnit paradigm and pay attention to the steps involved in the whole process. If you have not read the article about standard Python testing frameworks, or if you forgot, well, here we go. Firstly, test case classes are created when their declarations (i.e. class statements) are executed. Next, the test cases running specific test methods are instantiated by TestLoaders. Finally, the test cases are executed, thereby performing assertions about the behavior of the system under test. If you think that the condition supplied to skipIf is evaluated at this last step then you are wrong: it is evaluated in the first step, when the class statement is executed. The second example fails because the test class was created before 12:07 AM and the condition was evaluated at that moment, so the decorator decided to execute the test case. When it was run at 12:07:04.109000 AM the decision had already been made. In the other two examples, on the other hand, both events happen before and after the deadline respectively. If you want to know, my personal suggestion is BE CAREFUL.
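The difference between both moments can be reproduced without waiting for midnight. In this sketch the deadline_passed flag is a made-up stand-in for the clock comparison, and calling skipTest() inside the test method is one possible way (not the only one) to postpone the decision until the test actually runs.

```python
import unittest

deadline_passed = False   # stand-in for "now > deadline"

class DecoratedTestCase(unittest.TestCase):
    # The condition is evaluated right here, while the class body executes
    @unittest.skipIf(deadline_passed, "Timeout")
    def test_nop(self):
        pass

class RuntimeCheckTestCase(unittest.TestCase):
    def test_nop(self):
        # The condition is evaluated when the test actually runs
        if deadline_passed:
            self.skipTest("Timeout")

deadline_passed = True    # the deadline passes after the classes were created

loader = unittest.TestLoader()
suite = unittest.TestSuite([
    loader.loadTestsFromTestCase(DecoratedTestCase),
    loader.loadTestsFromTestCase(RuntimeCheckTestCase),
])
result = unittest.TestResult()
suite.run(result)
skipped_names = [type(test).__name__ for test, reason in result.skipped]
```

Only the test that checks the flag at run time ends up in result.skipped; the decorated one is executed because its condition was already evaluated as false.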

Well, finally, at the TestResult level there is a separate list for each novel test outcome (i.e. skipped, expectedFailures, unexpectedSuccesses). The first two contain pairs of TestCase instances and informative messages (i.e. strings holding the reason for skipping or the formatted traceback), whereas unexpectedSuccesses contains just the TestCase instances. Entries are appended to those lists by calling one of TestResult.addSkip(test, reason), addExpectedFailure(test, err) or addUnexpectedSuccess(test).


My opinion is that the basic unittest classes we all know have been there for so long that they are very stable. Therefore you can rely on them in order to write your test cases. Besides, the former API covered more than 80% of the test cases you will write. I agree with all those saying that it is nice to include further tools in the standard library to support the 20% that's left, or to enhance the interoperability between third-party frameworks providing solutions for sophisticated requirements. Nonetheless the current implementation introduces the following issues:

  • Additional and more complex semantics, especially beyond the scope of basic testing concepts.
  • Monolithic design. Suppose that other (useful) features will be added in upcoming versions so as to cover the 10% not covered yet. Will we modify the core classes once again in order to get it done ?
  • All this means that it's not extensible.
  • Potential incompatibilities with previous versions (and|or) new bugs introduced, and users won't be able to fall back to the former stable classes.
  • The semantics of decorators are tricky, and they do not include annotations in the target test cases.
  • It does not offer a common (stable and extensible) infrastructure to report custom test outcomes.
  • It's not possible to determine whether test cases are skipped due to an error condition (i.e. a failure) or a recoverable situation (i.e. a warning).

I do think that the new functionalities are very important. However I still consider that it is better to leave the classes just like they were before and incorporate further details in new extensions built on top of them. In my opinion, that's the spirit of object-oriented APIs like xUnit frameworks. Let's find the way to make it better !

What do you think ? Will you join us ?
