
Why Automated Tests Suck

07-Nov-2014

...or how to build a test plan that doesn't kill fairies.  This post discusses a few core automated test plan guidelines.  Bad tests can actually defeat the purpose of automated tests entirely, and really bad tests make manual testing the better option (wat?).

Noise vs. Signal

Let's take a moment to compare a test plan to something that would make Carl Sagan feel like a kid in a candy store: exoplanet research!  Scientists often search for exoplanets by seeking out an oscillating signal amid a great deal of noise.  Essentially you're looking for evidence of a wobble in the star's movement, because a planet's (or a Borg transwarp nexus') gravity tugs on the star as the planet revolves around it.  But at such great distances that signal is encumbered by a myriad of problems; the signal-to-noise ratio for this analysis is less than 1%.  Finding genuine planets is incredibly difficult and false positives are common (Zarmina).  What's my point?  The more noise, the harder it is to isolate a genuine signal.  Similarly, if you have tests that fail (and "it's ok, those failures don't really count") then it becomes much harder to isolate genuine test failures from test failures that "don't matter."  A release test plan in particular should have zero failures and zero skipped tests.  The more you skip or ignore tests, the more you erode the plan's utility and reliability as a device that produces a definitive and objective result.

I'm a New Hire Idiot

Another reason to ensure that your release test plan has absolutely no failed or skipped tests is that your test plan should be self-evident and independent of idiomatic understanding.  For example, you should never have to say "test x is failing because we haven't updated the x87g filter data model for the cogsworth subsystem metacontroller yet; just ignore it because I don't think anybody uses that component anyway."  This is an example of idiomatic understanding, and it causes the naive person to lose trust in your test plan (and pull their hair out).  When new hires join your organization (or transfer in from another team), ramping up on idiomatic understanding is difficult precisely because it is fragmented, subjective, inconsistent, and exemplary of the human vices that lead to conflicts.  Make tests a priori in their function and purpose.

Tests as Strong as Bismuth!

Tests should not be so brittle that you spend more time updating your test data than you spend writing code that brings value to your users.  In fact, when this is true the tests are often abandoned and left failing, then ignored, then forgotten; and yet they still linger, and co-ops ask about them and get an earful about the myth of the one test and the war of the first build, but it was all for naught because Isildur was struck down by a brittle test; volatile data pierced his heart.  And yet no co-op has been found worthy to take the one test deep into JMeter and destroy its design once and for all.

One might think that tests like these ought to be removed, but it isn't clear to the average person whether that would do more harm than good.  If a test fails too often then one of two things has to happen: either you update the test data, or you redesign the test.  Removal of a test is quite simply a step backward.  Someone saw fit to provide that coverage initially, so on what basis does one reason that such coverage is no longer necessary?  One side-effect of test removal is that it can signal to the naive person that the associated features have also been removed.  Then you get a call from an angry customer threatening to sue your company for breach of contract when the feature broke and was never fixed; more often, though, they will simply move on and silently invest their dollars elsewhere.  (The sad thing is that this happens with brittle, constantly failing tests that go ignored, too.)

Tests Express Simple Statements

Each individual test should make a specific statement that can be easily tested for falsity.  Here are some examples:
  1. When I call the delete function passing existing user x to it, it should yield the HTTP 200 success response.
  2. If I call the delete function passing a non-existent user x to it, it should yield the expected HTTP 404 response.
  3. When I call the get model foo function, the response body should contain a JSON object containing at least all of the following field names.
Avoid writing tests that involve too many test conditions and assertions.  This may manifest as multiple assertion statements or, for example, the wholesale comparison of a result against a template.  Both are examples of testing too many conditions in one test, and neither resolves to a clear statement.  For example, compare the following two statements:
  1. The result should contain the following fields in this precise order and contains nothing else besides those fields and have values match these other values precisely.
  2. The result should contain at least the following fields.
Number two is a good definition for a test, whereas number one can be broken down into four separate statements (and consequently four separate tests).  Tests that test multiple conditions often duplicate those conditions in other tests, and when they fail it simply adds more noise that developers and QA have to sift through like a child drowning in a playpen full of plastic balls.  Test plan noise erodes usability; refined, targeted tests that reflect clear and concise statements increase clarity and usability.
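To make this concrete, here is a sketch of what statements 1 through 3 might look like as individual single-assertion tests.  The service object below is a hypothetical in-memory stand-in for a real HTTP client, and the tiny test helper is illustrative, not any particular framework:

```javascript
// Hypothetical in-memory stand-in for a real service under test.
var service = {
    users: { x: { name: 'x', email: 'x@example.com' } },
    deleteUser: function (name) {
        if (!(name in this.users)) return { status: 404 };
        delete this.users[name];
        return { status: 200 };
    },
    getModelFoo: function () {
        return { status: 200, body: { id: 1, name: 'foo', created: '2014-11-07' } };
    }
};

// Minimal test harness: each test makes exactly one falsifiable statement.
var results = [];
function test(name, fn) {
    try { fn(); results.push({ name: name, pass: true }); }
    catch (e) { results.push({ name: name, pass: false, error: e.message }); }
}
function assertEqual(actual, expected, msg) {
    if (actual !== expected) throw new Error(msg + ': got ' + actual);
}

// 1. Deleting an existing user yields HTTP 200.
test('delete existing user returns 200', function () {
    assertEqual(service.deleteUser('x').status, 200, 'status');
});

// 2. Deleting a non-existent user yields HTTP 404.
test('delete missing user returns 404', function () {
    assertEqual(service.deleteUser('nobody').status, 404, 'status');
});

// 3. The response body contains at least these field names.
test('get model foo contains required fields', function () {
    var body = service.getModelFoo().body;
    ['id', 'name'].forEach(function (field) {
        if (!(field in body)) throw new Error('missing field: ' + field);
    });
});
```

Each test maps back to exactly one of the statements above, so a failure tells you immediately which statement has been falsified.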

It's Not My Fault! The Server Was Down

Tests should be consistent in terms of the environment and resources upon which they depend.  The other dimension to consider besides space is time.  A good way to rattle a developer's nerves is to introduce a test that downloads JSON from some bizarre and esoteric web service in China on Wednesdays, unless it's a full moon, with the exception of when Mars is at opposition precisely 39.2 hours before daylight savings.  This is hyperbole; I don't think anyone would actually write a test like this (knock on wood), but it illustrates a few scoping problems concerning time and space (e.g. China doesn't even use DST).  A reliable, self-contained, stateless test should not be contingent upon transient state such as specific dates, times, and system locale.  With respect to the environment, all of the above can break, and it's never clear when that occurs.  A machine's internal clock can get thrown out of whack, or the system locale may be subtly different (e.g. en_CA vs en_US).

The environment and resources that a test plan requires should be reproducible upon one machine and one machine only.  If it is absolutely necessary to introduce external dependencies, then categorize your tests (discussed later).  Furthermore, tests should never assume locale or timings; you are not the master of the universe, He-Man beat you to the punch decades ago; or a century ago if this blog is still around eighty some-odd years from when it was written; or I suppose any x number of years ago from now (your now, not my now), since it's impossible for me to predict when this post will be read (...or how a crazy person made a point by stumbling into his own dog food; tune in at 6pm tonight for the full story).  Here are some examples of unhelpful failures of tests using such kinds of resources:

  1. An authentication token used in requests expires on a specific date
  2. A DNS failure occurs for a host
  3. A static IP was changed for a host
  4. A host was down
  5. Matching localized error messages against a template
  6. Testing values from a database maintained by another team / department
  7. Using sleep to wait for some resource to become available
And suggested resolutions for these conditions:
  1. Generate a token as part of the test setup phase, destroy it as part of teardown.
  2. Use multiple DNS servers; otherwise see #4.
  3. Never ever rely on static IPs, they're not even descriptive and you don't have a monopoly on what IT does with the infrastructure.
  4. Rather than fail the test, flag it as skipped, categorize this test elsewhere.
  5. Compare error codes, not messages.
  6. Have your own copy of the database, update the database in tandem with the tests.  Consider the database as part of the coverage.
  7. Either poll or block, but don't sleep because it's not reliable to predict how much time a resource will require.  Apply a timeout, and if it elapses then flag the test as skipped, categorize this test elsewhere.
In general, if you must write tests contingent upon external resources that become unreachable from time to time, those tests should not fail when the external resources become unavailable; rather, they should be flagged as skipped.  I.e. employ trinary state: pass, fail, skipped.
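Resolutions #4 and #7 can be sketched together as follows.  The checkResource callback and the timings are placeholders, and a real implementation would sleep or yield between polls rather than spin:

```javascript
// Poll for a resource until a deadline; true if it became available in time.
function pollUntil(checkResource, timeoutMs) {
    var deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        if (checkResource()) return true;
        // A real implementation would sleep between polls rather than spin.
    }
    return false;
}

// Run a test that depends on an external resource.  If the resource never
// becomes available, report "skipped" rather than "failed" -- the trinary
// state keeps an unreachable host from polluting the release signal.
function runExternalTest(name, checkResource, body) {
    if (!pollUntil(checkResource, 50)) {
        return { name: name, result: 'skipped' };
    }
    try { body(); return { name: name, result: 'pass' }; }
    catch (e) { return { name: name, result: 'fail', error: e.message }; }
}

// Illustration: a reachable resource passes, an unreachable one is skipped.
var up = runExternalTest('host reachable', function () { return true; }, function () {});
var down = runExternalTest('host unreachable', function () { return false; }, function () {});
```

The key design point is that "the host was down" is a statement about the environment, not about the feature, so it must never be reported with the same state as a genuine feature failure.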

One may argue that it is sufficient to run the tests again when they fail, because they often succeed eventually.  This is essentially saying that it's okay that these features don't always work: don't worry if it blows up, just keep hitting refresh until something shows up.  This is another example of idiomatic understanding, and it incurs an additional cost: productivity.  Running a suite of tests is very time-consuming.  Depending on your organization, a test suite can take anywhere from half an hour to more than a day (maybe even more?).  If you expect people to simply re-run the test suite to get tests to pass, you're asking them, in the worst case, to invest another day, particularly if your tests aren't entirely stateless.

Categorize Your Tests

This is more of a suggestion than a rule of thumb.  TDD is not an exact science; in fact, software engineering as a whole is an inexact science.  If you must have tests contingent upon transient, temporal or volatile state, then divide your test plan into categories characterized by the type of test:
  • Self-contained tests
  • Tests that interact with external resources
  • Tests involving temporal state
  • Tests that cannot be run concurrently with others
In addition to categorizing tests, thoroughly document any unreliable characteristics of each test so that the naive person will better understand what's happening when it is mysteriously skipped.  All test plan categories should be stored within the same test plan.  In other words, when someone opens the test plan, all tests should be visible or at least evident, and no tests should be hidden away in some obscure folder of the file system.  Each category should also have a description.
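As a sketch of what a single categorized test plan might look like (the test names and category labels here are illustrative, not prescriptive):

```javascript
// One test plan, explicit categories -- everything visible in one place.
var plan = [
    { name: 'delete-user-returns-200',   category: 'self-contained',    fn: function () {} },
    { name: 'exchange-rate-feed-parses', category: 'external-resource', fn: function () {} },
    { name: 'token-expiry-rolls-over',   category: 'temporal',          fn: function () {} },
    { name: 'global-lockout-policy',     category: 'serial-only',       fn: function () {} }
];

// Select a category for a particular run.
function testsIn(category) {
    return plan.filter(function (t) { return t.category === category; });
}

// A release gate might run only the self-contained category unconditionally,
// while the brittler categories run with skip-on-unavailable semantics.
var releaseTests = testsIn('self-contained');
```

Because every category lives in the same plan, a reader can see at a glance what exists, what is brittle, and why, rather than discovering hidden tests in an obscure folder.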

Build a House then Tear it Down

Tests should be stateless, idempotent, and without side-effects.  Running one test should not subsequently affect the result of another test.  It should be possible to run any number of tests, cherry-picked or otherwise, in any order with consistent results each time.  Each and every test should have its own setup and teardown: each test should set up the state it requires and tear down that state afterward, leaving nothing behind that wasn't there initially and restoring anything that was.  This may dramatically increase the time to execute a test plan; however, raw speed is not the fundamental purpose of a test plan.  Troubleshooting tests whose results appear to vary with each other is notoriously difficult, particularly if such variance is not reflected in the purpose of the test.
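One practical detail worth making explicit: teardown must run even when the test body throws, otherwise a single failure leaks state into every test that follows it.  A minimal harness sketch (the names here are illustrative):

```javascript
// Run one test with guaranteed setup/teardown pairing.
function runTest(name, setup, body, teardown) {
    setup();
    try {
        body();
        return { name: name, result: 'pass' };
    } catch (e) {
        return { name: name, result: 'fail', error: e.message };
    } finally {
        teardown();   // runs on pass AND fail, so no state leaks to the next test
    }
}

// Illustration: even a throwing test body still tears down its state.
var calls = [];
var outcome = runTest('failing test still tears down',
    function () { calls.push('setup'); },
    function () { throw new Error('boom'); },
    function () { calls.push('teardown'); });
```

The try/finally pairing is what makes cherry-picking and reordering tests safe: no matter which test fails, the next one starts from a clean slate.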

Everyone Loves Minions

...or many threads to do your bidding.  Design your tests in such a manner that they can be forked and run concurrently.  The reason is to improve the performance of your test plan and make it scale, so that you can simply farm it out to as many machines as you need to reduce that day-long test suite execution to just under an hour.  Another notable benefit of such a design depends largely on your product: particularly for service-oriented software, concurrency is a critical factor of realistic use cases.
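At its simplest, farming a plan out to many workers reduces to deterministic partitioning.  This sketch shows only the partitioning step (real distribution would use separate processes or machines; the test names are placeholders):

```javascript
// Worker k takes every workerCount-th test starting at index k, so every
// test lands in exactly one shard and no coordination is needed at runtime.
function shard(tests, workerCount, workerIndex) {
    return tests.filter(function (t, i) { return i % workerCount === workerIndex; });
}

var tests = ['t0', 't1', 't2', 't3', 't4', 't5', 't6'];
var shards = [0, 1, 2].map(function (k) { return shard(tests, 3, k); });
// shards[0] gets t0, t3, t6; shards[1] gets t1, t4; shards[2] gets t2, t5.
```

Note that this only works if the tests are stateless and own their keyed data, as argued above; otherwise two shards can stomp on each other's state.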

Here are some good design principles for a test plan that scales:

Remember Test-Driven Development?

Before you sit down to write a feature, consider that it should be testable.  Think about testable code with respect to tests that can scale.  Can your code reliably service multiple requests concurrently without race conditions?  If you find it too difficult to write tests that can be executed concurrently without randomly failing, then it's likely not the test but the feature design that's implicated.  This is why it's important to think about the test before implementing the feature.  The feature and its tests (the tests and their feature?) have an inextricable relationship.

Unique IDs

When you establish state for your test (more simply put, when you create records during setup), create your entities with unique names keyed by the test name and the current thread executing in the test plan.  This should be little more complex than using a name and a numeric sequence.  A numeric sequence representing the thread is more helpful than some ubiquitous unique ID because, for example, in the case of a race condition it can hint at the point of failure during thread interleaving.  For example, if you're testing deleting users, and the user name is the primary key, come up with a username like "bobsmith-deletetest-thread-3."  When you tear down your state, rather than track the entities created during setup, remove all entities that match the template used to generate them (e.g. remove all "bobsmith-deletetest-thread-<any number>" entities).  This way, if there are side-effects from a previous execution of the test, the teardown accounts for them.  It is also good practice to perform this cleanup before setup as well, for it is reasonable to assume that the test owns any state keyed by the specific pattern the test uses during setup, especially if the name of the test is part of that key pattern.  Here is some example code:

// Remove every user whose name matches the pattern this test owns.
// Running this in both setup and teardown accounts for state left
// behind by a previous, interrupted execution of the test.
function cleanup()
{
    var objects = service.getUsers();
    var re = /^bobsmith-deletetest-thread-\d+$/;

    for (var c = 0; c < objects.length; ++c)
    {
        if (re.test(objects[c].name))
            service.deleteUser(objects[c].name);
    }
}

function setup()
{
    cleanup();
    // Key the record by test name and thread number so concurrent
    // executions of this test never collide with one another.
    service.createUser({
        name: 'bobsmith-deletetest-thread-' + threading.getCurrentThreadNum()
    });
}

function teardown()
{
    cleanup();
}

Some tests by their nature imply a happens-before relationship; for example, the "test global lockout policy" test that locks out all users for security reasons would probably cause all other tests to fail too.  It may be prudent to separate these tests into yet another isolated category so that the distinction in test characteristics is clear.

Last Word

In this post I discussed how test noise can erode the usability and reliability of a test plan.  Eventually people learn not to trust what the tests are telling them, and bugs are permitted to fly beneath the radar.  I also discussed how this relates to idiomatic understanding within the scope of the company, team, or even individual responsible for a particular test, and how this leads to conflicts, misunderstandings, broken builds, and weeping release managers.  Tests should be focused and targeted, and should not try to account for many different conditions at once.  Tests that involve external dependencies or temporal, transient and volatile resources are more brittle than tests with a self-contained environment.  Categorize tests based on their nature so that, if some tests must be more brittle than others, at least it's clear which ones they are, and document the why and the how (to resolve).  Tests should be tasteless, I mean stateless (it's lunchtime and I skipped breakfast).  Where possible, build tests that can be executed concurrently to improve performance and to ensure that your features handle concurrency well.

Mars Tycoon, The "Sandbox World" Strategy Game

09-Jun-2011

extollIT Enterprises is developing a new kind of innovative video game. We call this genre the Sandbox World game. There are a few other games I know of that are similar to our concept, but only one can be credited with its inspiration:
  • Dwarf Fortress
  • Infiniminer
  • Minecraft
  • Roblox
If you haven't heard of these games, I suggest trying Minecraft or Roblox. Dwarf Fortress has inspired Mars Tycoon, but it is very difficult to get into if you are a novice gamer.

Our small team of three has been developing this game since the fall of 2008. You are a space-pioneer in the year 2492 who desires to escape the chaotic political climate of Earth to investigate a strange new mineral on Mars (and beyond), first discovered on a small asteroid that impacted Earth. Scientists have dubbed it Loganite, and although its chemical signature has long been familiar to scientists, it has never been found in nature, let alone in vast quantities.

Loganite has significant commercial value as a nuclear fuel as well as having other peculiar benefits. Scientists are only just beginning to discover Loganite's value to mankind.

Your job is to establish a mining base on Mars, extract the Loganite, and compete against other nations scrambling for the mineral. You will have to defend your mining base from saboteurs, natural disasters, and skirmish attacks.

Launch of Mars Tycoon is scheduled for 2015. If you possess any of the following skills / experience / knowledge and would be interested in joining our team, please contact us using the About -> Contact Us menu at the top of the page:
  • 3D Modeling / Animation
  • 3D Programming Experience
  • Programming for CG, CUDA, PhysX
  • AI Programming Experience
  • Biochemical Engineering
  • Mechanical Engineering
  • Civil Engineering
  • Chemical Engineering
  • Geophysics
  • Horticulture
  • Astrobiology
  • Genetic Engineering
"Wait a sec! These are pretty steep qualifications!"
Actually, not really. Although some of these "qualifications" demand post-doctoral work, you don't even need a post-secondary degree here. Self-directed study is very acceptable. For example, we are interested in people with Chemical Engineering knowledge to help us design a realistic Asimovian sci-fi experience.

We have a rich and comprehensive vision for Mars Tycoon in terms of both storyline and gameplay mechanics, so Mars Tycoon will be an introductory game into this universe. If it is successful, we plan to release sequels and prequels each sporting both an expanded storyline and new gameplay mechanics.

Computing Arc-Length of a Spline - Advanced

15-Jan-2010

Here is a snippet of my own software, used in another project, that computes the arc-length of a spline using Gauss-Legendre quadrature (a numerical method of integration), implemented with recursive template meta-programming in C++:

First off, you will need a constant 5x2 array of doubles to initialize the abscissae: essentially a pre-computed table of the roots of the Legendre polynomial of order n, each paired with its quadrature weight. In this example I use an order of 5, which was accurate enough for my purposes:
 const double ABSCISSAE_5[5][2] = {
    -0.90617984593866399280, 0.23692688505618908751,
    -0.53846931010568309104, 0.47862867049936646804,
     0.0,                    0.56888888888888888889,
     0.53846931010568309104, 0.47862867049936646804,
     0.90617984593866399280, 0.23692688505618908751
};

Stub out an exception to generate runtime errors when incorrectly implementing the class:
 class CalcEx : public std::exception {};

Below, the recursive class template is defined. It uses template inheritance in the curiously recurring template pattern (CRTP): since templates are interface-agnostic, your derived class must define this method: inline real f (const real t) const. It essentially provides the implementation for an abstract method declared in the base class.
 template <typename T, typename Derived, int NN>
class GaussLegendreInt
{
private:
    // Fallback for orders without a pre-computed table: fail fast at runtime.
    template <int N> T abscissa (const unsigned int i, const unsigned int j) const
    {
        throw CalcEx ();
    }

    // Note: explicit specialization of a member template at class scope is
    // historically a compiler extension (e.g. MSVC); strictly portable code
    // defines such specializations at namespace scope.
    template <> T abscissa <5> (const unsigned int i, const unsigned int j) const
    {
        return static_cast <T> (ABSCISSAE_5[i][j]);
    }

    // Recursively accumulate weight[N-1] * f(mapped abscissa) for N terms.
    template <int N> T summate (const T val, const T a, const T b) const
    {
        return val + summate <N - 1> (
            abscissa <NN> (N - 1, 1) *
            static_cast <const Derived *> (this) -> f (
                (b + a) / 2 + ((b - a) / 2) * abscissa <NN> (N - 1, 0)
            ),
            a, b
        );
    }

    template <> T summate <0> (const T val, const T a, const T b) const { return val; }

public:
    inline T compute (const T a, const T b) const
    {
        return ((b - a) / 2) * summate <NN> (0, a, b);
    }
};


Now it's time to put our algorithm to use and build a class that takes a spline object (usually four points or two points and two tangents) and computes the arc-length using the Gauss-Legendre algorithm we've prepared:
template <typename Spline, typename real>
class ArcLength : private GaussLegendreInt <real, ArcLength <Spline, real>, 5>
{
private:
    const Spline & _spline;

public:
    inline ArcLength (const Spline & spline) : _spline(spline) {}

    // Integrand: the magnitude of the spline's derivative at parameter t.
    inline real f (const real t) const
    {
        return MAGNITUDE(_spline.computeAt (t));
    }

    // Arc-length over the full parameter range [0, 1].
    static inline real calculate (const Spline & spline)
    {
        return ArcLength(spline).compute(0, 1);
    }
};

The spline object is assumed to have a function called computeAt that returns a vector (2D, 3D or whatever); since arc-length is the integral of the magnitude of the curve's derivative, computeAt should return the spline's derivative (tangent) vector at t, and the function MAGNITUDE computes that vector's magnitude. You will define these yourself, including your implementation of splines. There are plenty of examples on the Internet of how to implement splines, including Cubic B-Splines, Catmull-Rom Splines, and NURBS.
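As an aside, the 5-point table can be sanity-checked numerically outside of C++. Here is an equivalent sketch in JavaScript that mirrors the formula in compute above; for a straight-line "spline" whose derivative is the constant vector (1, 1), the integrand is the constant sqrt(2), so the arc-length over [0, 1] should come out to sqrt(2):

```javascript
// Same 5-point Gauss-Legendre table as the C++ above: [root, weight] pairs.
var ABSCISSAE_5 = [
    [-0.90617984593866399280, 0.23692688505618908751],
    [-0.53846931010568309104, 0.47862867049936646804],
    [ 0.0,                    0.56888888888888888889],
    [ 0.53846931010568309104, 0.47862867049936646804],
    [ 0.90617984593866399280, 0.23692688505618908751]
];

// Integral of f over [a, b]: ((b-a)/2) * sum of w_i * f(midpoint + half-width * x_i)
function gaussLegendre5(f, a, b) {
    var sum = 0;
    for (var i = 0; i < ABSCISSAE_5.length; ++i) {
        var x = ABSCISSAE_5[i][0], w = ABSCISSAE_5[i][1];
        sum += w * f((b + a) / 2 + ((b - a) / 2) * x);
    }
    return ((b - a) / 2) * sum;
}

// Arc-length of the line t -> (t, t): the derivative is (1, 1), whose
// magnitude is sqrt(2) everywhere, so the arc-length over [0, 1] is sqrt(2).
var len = gaussLegendre5(function (t) { return Math.sqrt(2); }, 0, 1);
```

Since 5-point Gauss-Legendre quadrature is exact for polynomials up to degree 9, it also recovers simple integrals like the integral of t^2 over [0, 1] (= 1/3) to machine precision, which makes a handy second check on the table.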


Emerging Cultural Acceptance of Video Games

21-Jul-2009

For decades now, video games have been evolving. And they are still evolving; the fat lady hasn't sung. One of the key things happening here is that video games are still searching for their primary role in society. Video games have proven themselves both financially and artistically, but they still need to gain cultural and social respect. Right now video games are experiencing the honeymoon of money-making and are about to experience the hangover as game development costs soar into the millions. We can't yet respect video games as a medium of communication making positive contributions to society; games like Grand Theft Auto don't do much to instill confidence in moms and dads. Video games have earned a very tainted track record. It's no surprise that society rejects this medium.

Games like Grand Theft Auto or World of Warcraft are just the sort of games that sell and pay the bills. Beneficial and "socially acceptable" games employ realistic, and therefore unpopular, consequences: it's not easy to win, and you're not always the over-powered hero that traditional video games make you out to be. There are some positive side-effects to playing a game like World of Warcraft, but in general very little of the experience translates to the real world, and so in terms of time the cost outweighs the benefit. When money is on the line, there is a strong voice that rejects any titles that do not elicit violence, hyper-sexualization, or ape-man thinking, a voice that the industry caters to almost exclusively. Let's get our minds off this one-track approach to games. And let's also do away with partisan anti-gaming rants using Doom or Wolfenstein 3D as examples. For one thing, those games are ancient; get with the times! Secondly, could they say the same about Sim City or The Sims? Yes, making my sim call up a blind date will surely cause me to bring grenades to school on Monday. I see the connection, no really.

Setting aside trigger-happy gaming for a moment, the Wii has managed to appeal to baby boomers and has advanced cultural and social acceptance of interactive entertainment more than anything else has in decades. People of all ages enjoy playing Wii Fit; it's fun, and it stretches those creaky joints. The Sims, Civilization, Sim City, and other relatively tame simulation and strategy games are in fact among the best-selling PC games (2007 ESA survey). Guitar Hero fans have picked up real instruments with either aspirations of stardom or casual enjoyment. Simulation-type games have been used to help unemployed people learn about various industries, and the results have been very effective in helping them find jobs they enjoy. We seldom hear about these, and even these barely scratch the surface of interactive media's potential.

There is woefully little serious video game research being conducted. We simply haven't discovered what games are capable of. We just don't know. There are some obvious facts: games are serious, having the capacity to enslave the mind; but so do a myriad of other things, and we don't throw those out as altogether innately evil. They must be treated with care but not necessarily rejected, just as a credit card with no limit must be treated with care but not necessarily rejected. Video game addiction is not psychotropic drug addiction; there are no chemicals involved other than the entirely natural dopamine produced by the brain. It is not enough to simply blame video games and be done with it, just as so many people need a devil to blame their problems on. The individual has complete control here. The gamer needs a little self-discipline with this powerful medium, and must learn to use it instead of allowing himself to be used by it. And consider this: what's to prevent a young gamer from escaping into video games to cope with confused and detached parents fearful about him playing video games?

There is some research and evidence in support of video games. They stimulate neural growth, especially at young ages. They foster complex problem-solving skills and quicken reflexes and hand-eye coordination. Better hand-eye coordination can lead to increased enjoyment of playing sports, contrary to the popular belief that video games lead to a sedentary lifestyle. A 2005 survey of roughly 200 college students indicates that video games are forging very capable and competent leaders, able to think outside the box and tackle problems aggressively and proactively. One caveat: gaming alters one's way of thinking in ways that are unpopular with educators and not very compatible with traditional methods of education; consequently gamers typically have lower grades than non-gamers. Overall there is strong evidence that playing video games can have a very positive impact on the individual.

If you have a problem with gaming, then here's an invitation for you to get acquainted. Pick up a game and discipline yourself to play it every day for one week. During this exercise, focus on wrapping your head around how a typical gamer thinks. If you understand how gamers think, you will better know how to help them see things from your point of view. Otherwise, without a clear and balanced perspective, one will only contribute to the frustration and confusion. Eventually gamers will outnumber non-gamers, and we will see gamers who are socially responsible and lead balanced lifestyles move into positions of power. Eventually we will figure out what this interactive medium is capable of and learn all about the human relationship with interactive media.

Innovation 101 for the Perfectionist

30-Jun-2009

To innovate is to introduce something new to the public that is usable and meets a perceived need. This article will dispel some common misconceptions that the perfectionist may have with his approach to design. Since women are perfect anyway and have this area covered, I will only use masculine pronouns. (Actually, it's for the sake of brevity.) As someone who IS a perfectionist, I have written this article for other perfectionists just getting their feet wet with this stuff. I suppose you could say that a perfect design goes hand-in-hand with proven innovation. But as a perfectionist I cannot say I've always shared this perspective, and it's difficult to get over. Hopefully this article can dispel some of the myths holding the typical perfectionist back from greater success.

MYTH: Perfection in design is necessary.

The fundamental reason perfectionism fails as an approach to software design is that it does not factor human error into code design. Human error is inevitable; "nobody's perfect". To expect consistent and perpetual perfection from yourself, your team, or even your customer's requirements is unrealistic. Therefore, the perfectionist must accept human error as a valid and expected factor in the design process. It is haphazard to turn a blind eye to this variable. Expect loose ends to come from anywhere. Probably the most dangerous assumption to make is that the customer has a good understanding of his or her own business challenge. In the end, trial and error is inevitable to evolve and refine a product.

MYTH: Perfectly designed software leads to a quality product

How do you measure the quality of a product? Simply by how accurately it accomplishes its original purpose, which is defined by the customer's requirements. But how often are those requirements written in stone? How often can we say that everyone understands those requirements perfectly? The perfectionist operates with these assumptions. Furthermore, it takes too long to develop a product that is perfect. Since software requirements change quite frequently, the result can be a lot of time wasted on a polished but irrelevant design. If you can't get it out the door, if it's incomplete, then it is far from perfect. One key characteristic shared by all of the most successful products out there is that they each do exactly one thing very, very well, and that has been attained through multiple iterations of trial and error.

MYTH: A perfectly designed product is highly maintainable.

Since details of implementation are often up to one individual programmer, his perception of perfection may be way off the mark to someone else. Furthermore, code that conforms religiously to standard OO design patterns is not necessarily maintainable, because it proves to be an overwhelming task to trace program flow through a deluge of (however highly cohesive and decoupled) unfamiliar classes. An experienced developer knows that any algorithm that fits conveniently into various design patterns is commonplace and more than likely already in a reusable framework somewhere anyhow. For the rest of your intellectual property, you yourself know that it doesn't fit conveniently into standard design patterns. Moreover, you know that a product that is commonplace isn't innovative. And so it becomes necessary to bend the rules while, ironically, keeping in mind other developers who might have to look at it later.
