a blog for software quality assurance engineers

Lessons reinforced through knitting


CAST this year had an interdisciplinary focus, including talks on how labor triage, music, and visualization relate to testing.  While there, I began considering how one of my own pastimes could inform my testing: knitting.

Knitting passes through many of the same steps as a software project.  There is an initial design, a pattern.  There is a development process, and usually some sort of error detection.  Here are a few testing lessons that knitting reinforces (quite viscerally, for any active knitter).  Most of them map so closely onto testing that they need little explanation.

1. Test Early, Test Often (the source of this oft-repeated quote appears to be John Dowdell, at Macromedia).  Review the design before you start, to make sure you understand where it’s going and that there are no points where the numbers don’t add up.  Typically when I knit a project I try to keep a running stitch count in my head and make sure the increases and decreases add up.  If I reach a point where a four-stitch pattern is repeated across 30 stitches, for example (30 isn’t a multiple of 4), I begin to suspect a problem with either my interpretation of the pattern or the pattern itself.  Either way, I shouldn’t start knitting until I’ve figured out how to handle the mismatch.  This is also standard procedure where I work; for most projects, development doesn’t begin until a tester has read the design doc to look for common sources of future issues.

Likewise, once knitting has started, testing continues periodically.  A dropped stitch will ruin patterns for the rest of the project, so it’s a good idea to count the stitches on the needles occasionally.  Looking over the pattern (if there is one) is also a quick way to find errors, since the human eye picks out inconsistencies intuitively.  Errors detected early are (a) easier to find, since there’s less fabric to cover, (b) cheaper to fix, since you don’t have to backtrack as far, and (c) less likely to cause additional compensatory errors.

2. Some problems aren’t worth fixing.  More than once I’ve noticed an extra stitch late in a sock.  Reviewing the last couple of inches, I can’t find the source.  Clearly a mistake was made, but I almost never pull out the needles and rip back to the last point I know was correct.  It’s easy to understand why if you’re a knitter: an inch might mean an hour of work to fix a problem that no one but another knitter will ever notice.  When I test, it’s easy for me to forget this lesson: not every problem is a showstopper.

3. Quality isn’t discrete, or linear.  In my drawer right now I have a couple of pairs of socks that are quite well knit (if I do say so myself).  The stitches are tight, the fabric is beautiful, the pattern was followed flawlessly.  But I’m wearing store-bought socks today.  In one key dimension of quality, they fail to meet a critical requirement: the length of my foot.  They shrank.  Nothing else matters if the sock doesn’t fit.  There may be other people who can wear those socks, but not me (the key stakeholder, as it were).  So there is no quality “unit”, or at least none that has independent (linear) value; no amount of quality in the other dimensions can add up to a good sock.  One good bug can make an unacceptable product.

4. New tools and techniques mean new bugs.  When I started knitting, I did it the way most Americans start to knit: hands held out in front of me with the yarn fed from my right hand, throwing loops over the needle.  After a couple of years, I heard about a new, faster method: the continental style.  The yarn is held in the left hand and picked through with the right-hand needle.  No more throwing loops!  Less stress on the hands!  (Incidentally, like most software and knitting “innovations”, this is actually an older technique that’s been used forever… somewhere else.)  I started trying to use it on my first sweater.  I’m pretty sure I restarted it five times.  The end result still doesn’t look right; I don’t wear it.  It turned out the new technique put a different twist on the stitches, which had to be compensated for.

A year later, I uncovered another new (read: ancient this time, dating back more than two hundred years) technique using a knitting sheath tucked into the belt, which steadies the right-hand needle for tighter stitches and, again, less stress on the hands.  I tried it on a pair of socks I was knitting for my father.  Six times I restarted (missing the deadline, too), trying to get the size right.  The socks are still a little big.

New techniques introduced new kinds of problems.  In the long run, they improved my craft, but the first project had a much higher error count. 

That’s all for now (if you are a knitter, I’ll see you on ravelry.com as yurodivuie).  The conference was fantastic again; I was especially glad to meet many of my colleagues from around the world that I had already spent time with in BBST classes.  Next year is in Colorado Springs; I know I’ll be there.

The AST

I’ve been quite busy lately, partly with completing the Black Box Software Testing Bug Advocacy course (BBST 200a).  It was four weeks of grueling work.  I still wish I could have spent more time on it, though, since there’s a lot of value in the community of testers that gathers for each BBST course.  It’s one of the advantages of being a member of the AST (www.associationforsoftwaretesting.org).

At this point I’d like to insert the obligatory plug for the convention in July (14-16) in Toronto; information can be found at the website.  I’m most interested in the keynotes by Jerry Weinberg and Cem Kaner, but there are many excellent presenters this year (and the participants tend to be some of the most “awake” testers I’ve met).  Hope to see you there.

More later on the infinite bug pool problem.

SDET, SQAE, Tester

The “testing” profession, as such, is fragmented.  Besides the multiplicity of titles and the endless debates over the skill set a “tester” should bring, there are fundamental but unacknowledged divides between different schools of testing.  The certification debate is perhaps the most hilarious example of this; it takes on Seussian proportions in the about.com article on Quality Assurance and Software Testing Certification.

The article lists fully seven different vendor-neutral certification organizations, followed by an interesting piece of advice: “Those listed here… are a must-have for anyone considering an entry into the world of testing and Quality Assurance.”  Do I really need seven different certifications to test?  Wikipedia adds four more programs to this list, but the Wikipedia article mercifully contains a few timely caveats: “No certification currently offered actually requires the applicant to demonstrate the ability to test software. No certification is based on a widely accepted body of knowledge. This has led some to declare that the testing field is not ready for certification.”

So how does one prepare to enter this field, where there is no common body of knowledge or agreed-upon methodology?  The fact is that there is no universal answer.  I imagine it’s like trying to get a job as a “cook”.  If you walk into McDonald’s, there’s a different set of expectations than at TGI Fridays, or at a 24-hour diner, or at a five-star restaurant.  There really is no universally required professional skill; in some companies everything is prepackaged before it reaches you, so all you need to do is follow instructions (which may not even require literacy).  In another position, you’ll be expected to exercise independence, creativity, and problem-solving.  There simply are no absolutely necessary skills or facts applicable in every situation.  You must tailor your resume to the market segment you’re pursuing.  Certainly, there are generally useful skills, and there’s no excuse for avoiding professional development and training, but it requires personal discretion from every tester.

Too many people in our profession are offering advice like that found on about.com; not specifically regarding certification, but ideologically constructed authoritative statements nonetheless.  As far as I’m concerned, there are only two authorities one should always consult: integrity and expedience.

Hidden Boundaries

Cataloging heuristic failures gives me a sort of perverse pleasure; to learn from one’s mistakes or oversights is a wonderful opportunity.  These are the interesting defects, the ones that you would never imagine existing prior to testing.  In this case, it wasn’t because of any brilliant decision on my part that the defect was uncovered.  And why not?

When I came in to test today, the program crashed as soon as I began processing test data.  I assumed it was because the component the developer had fixed last night was not, in fact, fixed.  I sent a cursory email with a log segment and moved on to other projects while I waited for a stable build.  The developer, however, assured me that it wasn’t anything he had changed last night (a simple “if” statement).  My response was an explanation of my heuristic: “it worked yesterday; you made a change; it doesn’t work today: ergo, you broke it… somehow.”

But he didn’t; at least not last night.  The developer continued to investigate and discovered that there was a feature I was not aware of: files were processed differently on even days than on odd days.  And even days were broken (referencing an invalid column in the database).  The “even day” bug had been introduced well before the testing window, but hadn’t been noticed until I understood the system well enough to isolate defects in the program from defects in my testing environment.   As soon as I heard the explanation, I realized two things: I never would have thought of that kind of bug, and I would have to post it.

Regrettably, there’s no easy solution for finding this kind of bug.  I saw the code when the developer pointed out the offending segment; inspection probably wouldn’t have caught such a subtle flaw.  I certainly don’t want to add “test on even and odd days” to every test plan (though I will for this system, in the future.)  What systematic method would detect this kind of problem?  If it hadn’t crashed the program, I might not have even noticed it.

But I expect there’s no good (simple) answer to the question.  One of the first exercises that I saw in James Bach’s tutorial on adult self-education for testers was a discussion of boundary testing that demonstrated the possibility of endless hidden boundaries in a program; traditional tests that try to separate fields into different classes are only as useful as the definitions of those classes.  We never came up with a good number of tests to perform on that field; exhaustive testing may be feasible for the MASPAR example, but not for most situations I face.  What I have learned is that even the simplest heuristics can fail; hopefully I’ll be paying enough attention to notice when they do.
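For a system like this one, “test on even and odd days” doesn’t have to mean waiting for the calendar.  Below is a minimal sketch in Python, assuming the date-dependent code can be handed a date instead of reading the system clock.  Everything in it (`columns_for`, `process`, `parity_check`, and the column names) is hypothetical and only mimics the shape of the defect described above: one branch of the parity logic referencing a column that doesn’t exist.

```python
from collections.abc import Callable
from datetime import date

# Hypothetical schema: the columns the table really has.
VALID_COLUMNS = {"id", "amount"}


def columns_for(day: date) -> list[str]:
    """Hypothetical date-dependent logic: even days build a different query
    than odd days, and the even-day branch names a column that doesn't exist."""
    if day.day % 2 == 0:
        return ["id", "amount_total"]  # the hidden defect: no such column
    return ["id", "amount"]


def process(day: date) -> list[str]:
    """Stand-in for the program under test; fails the way a bad column reference would."""
    cols = columns_for(day)
    missing = [c for c in cols if c not in VALID_COLUMNS]
    if missing:
        raise ValueError(f"invalid column(s): {missing}")
    return cols


def parity_check(run: Callable[[date], object], base: date) -> dict[str, str]:
    """Run the same check on an odd day (the 1st) and an even day (the 2nd)
    of base's month, injecting the date rather than reading the clock."""
    results = {}
    for day in (base.replace(day=1), base.replace(day=2)):
        label = "even" if day.day % 2 == 0 else "odd"
        try:
            run(day)
            results[label] = "pass"
        except Exception as exc:  # record the failure instead of stopping
            results[label] = f"fail: {exc}"
    return results


if __name__ == "__main__":
    print(parity_check(process, date.today()))
```

Pinning the two runs to the 1st and 2nd of the month is deliberate: a naive “today and tomorrow” pair run on the 31st would test two odd days (the 31st and the 1st), which is itself another hidden boundary.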

Tradeoffs in Tool Use

I don’t have fully automated tests, but I do use tools; some of these I have written myself, some were made by my company, and some were made by third parties.  Tool use multiplies the speed with which I can address some problems.  Unfortunately, it also introduces bugs that are not material to the product.

For instance, when moving to a new server I had to transplant many of my scripts, and I used the opportunity to translate many of them from shell to Perl.  This transition was about half complete when I started testing my current project.  Midway through testing, the program stopped working immediately after I updated my build.  I started getting complaints from the database regarding missing packages.  My first thought was that the developer broke it and committed a bad build (using the generally powerful heuristic that if it worked before I updated, then it’s the developer’s fault that it doesn’t work afterwards, which I’ll call from now on the “borrowed car” heuristic).

I was wrong.  It turned out that the developer had actually fixed a component, which was now correctly noticing that the database was incomplete.  My scripts for building the database (which I had hardcoded to match the last project’s specs, as a quick fix during the transition to the new server) were at fault.
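One partial defense is to have setup scripts read each project’s requirements from configuration instead of hardcoding them, and to check the environment before any product test runs, so an incomplete database shows up as a setup failure rather than a suspected product bug.  Here is a minimal sketch in Python (my own scripts are shell and Perl); the JSON spec format, the package names, and `required_packages`, `preflight`, and `SetupError` are all hypothetical.

```python
import json


class SetupError(Exception):
    """Raised when the test environment, not the product, is at fault."""


def required_packages(spec_json: str) -> set[str]:
    """Read the package list from a hypothetical per-project spec
    instead of hardcoding the last project's list in the build script."""
    return set(json.loads(spec_json)["packages"])


def preflight(spec_json: str, installed: set[str]) -> None:
    """Fail fast, before any product test runs, if the database is incomplete."""
    missing = sorted(required_packages(spec_json) - installed)
    if missing:
        raise SetupError(f"test database is missing packages: {', '.join(missing)}")


if __name__ == "__main__":
    spec = '{"packages": ["billing", "audit", "reports"]}'
    try:
        preflight(spec, installed={"billing", "reports"})
    except SetupError as err:
        print(err)  # test database is missing packages: audit
```

The check can’t catch every tool bug, but it turns one common class of environment problem into an explicit, clearly labeled failure.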

Now if I had hand-built the database, one command at a time, I wouldn’t have had this problem.  But every project startup would take an additional X minutes as well.  It’s actually not possible to test software without some tools, and those tools will always have bugs, which may or may not interfere with testing.  For example, I have to have a computer, and it needs to run an operating system, but that system will always have bugs.

So testing is more akin to statistics than algebra in this regard.  Eventually one of these bugs in my tools will affect the accuracy of my tests.  So how can I ever be sure what I’m seeing is a real bug?  What if I’m nothing more than a Boltzmann brain, after all?  Time to study more epistemology.

Succession of Models

I’m currently on my third attempt at reading Marcel Proust’s Swann’s Way, the first volume of Remembrance of Things Past.  The first time: thirty pages.  The second time: two hundred pages.  I am now most of the way through the book.

Some styles of writing are more difficult to follow than others.  Proust is a genius, and he writes with a beautiful, digressive, obsessive style that meanders through a loosely organized narrative at a gentle amble.  It has taken me three attempts to acclimatize myself to the style, to enter his multipage paragraphs with appreciation for the detailed description and analysis of psychological states.  What other novelists pass over as a footnote, a suggestion, perhaps a brief interlude between events, he makes the subject matter.  The same passages I might read over quickly in other works, anxious to follow the plot, are actually the whole of the plot in Swann’s Way.  Essentially, it defeated my reading algorithm.  Twice.

At times, in testing, my algorithm is defeated.  A testing strategy fails to deliver useful information.  Early in my career, I assumed that these failures were the result of bad testing, inexperience, and foolish assumptions.  This is only half true.  Any new problem presented to us will most likely appear, on the surface, to be similar to an older problem.  However, when a testing technique turns out to be inappropriate, its failure can provide useful, deep information about the structure of the object under test.  This is part of the succession of models that we develop for the product.  Successful testing strategies reveal quality-related information, but unsuccessful strategies can reveal structural information that feeds back into successful test design.

For instance, when testing a field in a recent product, I expected that invalid values would be marked as bad data before being sent to the next component.  I noticed that there was slightly different terminology in the specification for how this was being handled, but I ignored this as irrelevant since it did not fit my model of how bad data was handled (I assumed that the writer was poorly describing the expected behavior, which has a non-zero probability).  My testing showed that nothing was being marked as bad data before being sent; clearly this was not functioning as expected, but this component should already have been tested in a previous pass by one of my colleagues; the recent changes to the field must have disabled the logic.  However, instead of writing an issue, I questioned the evidence, and looked at the results in the next component.  Sure enough, they were being cancelled as soon as they arrived: the validation was being handled at a different stage.  Reviewing the specification revealed that this was, in fact, the expected behavior, although it did not map to any of my previous experience with similar applications.  I revised my model of the behavior and was able to design better tests (that did, in fact, reveal defects of a different nature).

In short, failures in our model (once discovered!) are opportunities to rejoice in discovering important structural information, much as only by failing to read Proust (twice) have I learned to appreciate his writing for what it really is, as opposed to what my expectations modeled it to be.

Art and Responsibility in Testing

The role of a tester is a difficult thing to view with objectivity.   Ideally, all that I do is present technical information about a product to stakeholders.  This role is complicated.

I have several classes of stakeholders: project managers, testing managers, developers, customer support, clients, end-users, and “the company”.  I can only provide information, in my setting, to the first three classes, although I am generally doing this with the interests of the final four categories (with whom I hardly ever interact) firmly in mind.  The needs of these various classes can and do conflict, and there is no hard and fast rule as to the relative importance of each stakeholder.  The project manager has the final decision, but I provide the information that (hopefully) influences that decision.  So what information do I provide, and how?  This becomes more difficult when one is in the delicate position of disagreeing with the project manager over quality-related decisions.

This is one facet of why I believe that testing is an art that cannot be reduced to a simple procedure.  The decisions made are, by their nature, subjective; there is no formula for balancing stakeholders’ interests.  I can present information in such a light that it seems unimportant, and probably not worth resolving.  I can provide a curt summary, which may not reveal the impact of a defect.  I can be exhaustive and manipulative, implying that this is a material defect in the product that would violate contractual obligations.  I can research the defect and present the exact location and resolution for a problem, or I can simply state that “X component doesn’t work”.  I can do any of these and technically fulfill my role as a tester; what’s more, it’s likely that most of my stakeholders will not notice the difference (since they probably haven’t given a great deal of thought to the latitude I have in my role).  These differences, which will inevitably result from my interpretation of the stakeholders’ needs and relative importance, will affect the quality of the product, but will not affect it in a consistent or objective fashion.

Calculus of Testing

Regrettably I’ve recently discovered a disturbing fact: once I have automated a task, I can no longer remember how to perform the task manually.

This realization hit me when I was forced to begin testing in a different environment, where there was no time to import and adapt my many helper scripts.  I spent half the project trying to remember how to execute common tasks without my elaborate support network.  I felt like Darth Vader with his mask pulled off, trying to breathe without the machines (sorry, that’s a bit of a spoiler for Return of the Jedi).

There is always a tradeoff between different testing tasks: between preparation, execution, and documentation (and perhaps other categories).  The tricky part is that these different testing activities do not live in isolation; each of them affects the speed at which other tasks can be completed (and their effectiveness).  For instance, importing all of my scripts would have made the testing faster.  In fact, spending an hour in this preparation would have saved around three to four hours of testing time.  Similarly, I was aided in my testing by thorough documentation of a nearly identical project; the time spent in documentation paid off in faster testing later.

And if time were the only concern, these decisions would seem simple.  However, by not using scripts I was reminded of several important platform features that I had forgotten.  By reusing documentation, I may have carried over into this project the same problems I missed in the original one.

I’ve been thinking about these subjects more lately as my business makes a push for efficiency (and our workload exceeds our bandwidth).  Of course, there is a less comely side to these calculations: the more work I accomplish, the less likely it is that I’ll receive additional resources.

Continuing Education

I happen to have a history of changing jobs at fairly regular intervals, from banker to teacher to tester, most recently.  Part of what has kept me in testing for so long is the opportunity for professional growth.  By improving how I test, I never actually reach the point where I feel like I have “arrived” and start getting bored.  Here are a few of the things I’m doing to keep growing as a tester (in my particular context).

1. I’m studying Spanish so that I can provide better feedback on some of our bilingual applications.

2. I’m learning Perl, and applying it to developing better and more complete automation of some of my duller, less efficient tasks.

3. I’m following about twenty blogs by fellow testers in the field.

I definitely think I could be doing more, however, particularly in studying more of the history and techniques of testing.  So how are you working to become a better tester?  Do you have a book you’d recommend?

The Microsoft Tester Center: Open(?) for Business

Matt Heusser over at Creative Chaos (one of about a dozen QA blogs I follow) recently posted regarding the new Microsoft Tester Center announced at STAR West.  His opinion was tentatively positive, but I’m somewhat skeptical.

Note the launch message on the front page:

At the Tester Center, our goal is to provide a community where software testers can share knowledge and learn from each other about testing, our day-to-day job functions, processes, the tools we use, and the various roles we play. As you look around the site, you’ll see videos, articles, blogs, and other information. With your participation this site could be the start of many a conversation in our Software Testing Discussion Forum, where you can join other test professionals to exchange experiences and knowledge. Additionally, questions you ask at the Software Testing Discussion Forum will help guide the type of content we look to create over time. I hope you participate in this community and share your unique insights into the profession of software testing.

This seems open and participatory, but the material presented looks more like a token effort at community that actually provides little content from outside the campus.  Scanning the website reveals a single paper written by Scott Barber, the vice president of the AST.  All of the videos are of MS employees (and all in the same room).  Even their blog page lists five MS employees and only a single third-party blog in the “other” category (also Scott Barber).  The ratio looks even worse on their tester biographies page.

On the whole, I’m deeply suspicious (as always… I am a tester).  I’m tempted to test the participatory nature of the site, and to that end I’ve already submitted feedback suggesting a few more blogs for their “other” category.  I suggest you do the same; I’ll post a follow-up if I see any results.