Modelling for Testers

In this post, we’ll go on a short journey through the concept of formal modelling and how you can apply modelling techniques to help you test more effectively.

Firstly, what is a model?

Traditionally used in mathematics and science, a model helps communicate and describe something. A very basic model that we’ve all come across is the one used to calculate the approximate volume of any cardboard box: width × height × length. If we stick some random numbers into the formula, the output is the volume of a randomly generated, virtual cardboard box. If we take a cardboard box and measure each dimension, we can calculate the approximate volume of a real box – that’s pretty useful, especially if you’re a company like Amazon!
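To make that concrete, the box model can be written down as a tiny function. This is purely an illustrative sketch:

```python
def box_volume(width: float, height: float, length: float) -> float:
    """Approximate volume of a cardboard box: width x height x length."""
    return width * height * length

# Measure a real box at 30cm x 20cm x 40cm:
print(box_volume(30, 20, 40))  # 24000 cubic centimetres
```

Feed it random numbers and you get virtual boxes; feed it real measurements and you get the approximate volume of a real one.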

[Image: the volume formula, width × height × length]

Models don’t have to be formulae; they can be diagrams, information flows or physical objects. It took me a little while to make the connection, but that Airfix kit you might’ve built as a child (or grown-up) is called a model because it is a model of the real thing.

Using Provided Models to Aid Testing

Not only are there different ways to model something, there are also different uses for them. Requirements are written models that describe the expected behaviour of a system. This type of model is used to communicate how something should work, so that it can be made. These are really useful models for us as testers as they give us an expected outcome. Remember that Airfix kit? Well, here’s a formal model that you use to help build your physical model.

[Image: Airfix model instructions]

We can test the model before a line of code is written, by asking questions of the model and testing its limitations. By doing this early, we can extend the model to be more specific, or we can help correct the model where it is deficient. This helps increase the likelihood that what we build is actually what was required.

When it comes to testing the output of the requirements, we can reuse those models to determine if what we built meets the expectations of the model. Traditionally we might call these Test Cases: for a given requirement (formula) and set of data (dimensions), we expect the output of the software to be deterministic (volume). This is what we testers call checking and is a prime candidate for automation (but that’s a different topic!). It may even be better still to use this as a way to drive the writing of the system’s code – I’m sure you’ve come across TDD and its many variants.
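To illustrate “requirement as formula, data as dimensions”, here is a minimal sketch of such checks, reusing the box-volume formula from the start of the post (the structure is mine, not from any particular framework):

```python
def box_volume(width, height, length):
    """The requirement, expressed as a formula."""
    return width * height * length

# Each check pairs input data (dimensions) with a deterministic expected output (volume).
checks = [
    ((1, 1, 1), 1),
    ((2, 3, 4), 24),
    ((10, 0, 5), 0),  # a degenerate box has no volume
]

for dimensions, expected in checks:
    actual = box_volume(*dimensions)
    assert actual == expected, f"{dimensions}: expected {expected}, got {actual}"
print("all checks passed")
```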

Creating Our Own Models to Aid Testing

As a tester, even if you hadn’t made the connection yourself before, you should now realise that you’re using models all of the time in your daily routine. Models provided to us are great for testers, but modelling techniques are even more useful.

You may ask, if the models have been provided for me, why would I need to model anything myself? Great question! As testers, our primary responsibility is to provide information about the product, focussing on the risks to its value. If the scope of the testing activities that we partake in is guided only by the models that we’ve been provided, we run the risk of only reporting on what we thought we knew and not the actual behaviours of the system.

To counteract this risk, we testers can look to use modelling techniques ourselves, to explore and describe the system as it is, not just what it was supposed to do. Here’s a really simple modelling exercise that you can do right now.

Now It’s Your Go!

Firstly, pick a web site to model against. If you work on one, pick that, because everything you do in this example will add real value to your testing efforts. If not, pick something relatively simple, maybe your favourite online store.

Next, pick a part of the site to model. Keep it a single page for now, so if it’s an online store, use the search page or product details page and use the mobile version as it’ll reduce the work we need to complete. I’m going to pick Alan Richardson’s Blog to demonstrate the exercise.

Head over to your chosen page in Chrome and once it’s finished loading, open the developer tools (instructions are here) and click the Network tab. If they’re not there already, add the Domain and Method columns to the network table. Sort the list by Domain, either descending or ascending.

[Animation: adding the Domain and Method columns to the Network tab]

Clear the log in case there’s anything in there already, then refresh the page. This will list out all of the network calls that the page makes client side and it’s these we’re going to model.

In your favourite diagramming tool or on a large piece of paper, stick the page we’re modelling in the centre. I’m going to use a mind map for now.

Head back to Chrome and take note of each of the domains your page calls out to. Create a node in the diagram for each domain and that’s the first part of your model finished.

[Image: mind map with a node for each domain]

You now have a useful, visual record of all of the domains that your site calls out to and you can sit with your friendly neighbourhood developer or architect to determine if there’s anything that looks odd.
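If you’d like a head start on gathering those domains, you can export the traffic (right-click the network table and save it as a HAR file) and extract the unique domains with a short script. This is a sketch; it assumes the standard HAR layout of log → entries → request → url:

```python
import json
from urllib.parse import urlparse

def domains_from_har(path):
    """Return the sorted, unique domains a page called out to, from a HAR export."""
    with open(path) as f:
        har = json.load(f)
    return sorted({urlparse(entry["request"]["url"]).netloc
                   for entry in har["log"]["entries"]})

# Each domain then becomes a node in your diagram:
# for domain in domains_from_har("page.har"):
#     print(domain)
```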

A More Useful Model

So that exercise didn’t take too long, did it? The great thing is, it won’t take much longer to make it even more useful. Follow these steps to add to your diagram and make that conversation even more interesting.

For each domain you’ve added to your model, refer back to the Network tab in Chrome and make a note of the Type of each request made and the address up to any query string. You may find it’s useful to group some types: images and GIFs are a good example. You can also see from my example that I’ve called out that there’s a redirect, because I’m using the mobile view in Chrome.

[Image: mind map extended with request types and addresses]

Don’t stop there, keep adding more information to your model. There’s lots of information that you can add to spark off a conversation. Here are some ideas on what you can add to your model as you explore the software even deeper:

  • highlight a domain or address that you don’t recognise, for questioning later;
  • call out a request to the same address for the same payload more than once;
  • annotate the size of a response if it falls outside some boundaries:
    • a really small response (single bytes);
    • a really large response (varies, but anything over a few hundred KB is worth questioning);
  • flag a response that takes a long time, starting with anything over 300 ms (set the Waterfall column to show Total Duration);
  • flag a response that returns a 4xx or 5xx error code;
  • note any further redirects;
  • note whether the response body mentions an error.
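One way to apply those heuristics at scale is to export the traffic from the Network tab (right-click and save as HAR) and run a rough triage function over each entry. A sketch, using the thresholds above purely as starting points:

```python
def flag_entry(entry, max_time_ms=300, max_size_bytes=300_000):
    """Return a list of reasons a HAR network entry deserves a question on the model."""
    flags = []
    status = entry["response"]["status"]
    size = entry["response"].get("bodySize", 0)
    if entry["time"] > max_time_ms:
        flags.append(f"slow response ({entry['time']:.0f} ms)")
    if 300 <= status < 400:
        flags.append(f"redirect ({status})")
    if 400 <= status < 600:
        flags.append(f"error status ({status})")
    if 0 < size < 10:
        flags.append(f"really small response ({size} bytes)")
    if size > max_size_bytes:
        flags.append(f"really large response ({size} bytes)")
    return flags
```

Anything the function flags becomes an annotation on the relevant node of your diagram.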

Now what?

That’s entirely up to you! The important thing to remember is that whilst you’re building this model you’re actually doing exploratory testing. You’re learning about the product and you’re questioning it. In this simple example we’re only questioning the various calls that the product is making client side, but they are good questions.

Amongst many other examples, this specific activity can help find performance issues through:

  • responses that take too long or are blocking
  • responses that are repeated
  • requests that are pointless and can be canned

It can also help find security issues:

  • if you’re making calls that should be https but they’re not
  • any redirects that are happening that you don’t expect
  • any other calls that don’t make sense or may be sending information that they shouldn’t

You can use the model to decide which calls you should manipulate to help you understand how doing so might impact the user experience:

  • try to block a call
  • slow a call down
  • intercept a call and manipulate the response
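In practice you’d do the manipulation with an intercepting proxy (mitmproxy, Fiddler and Charles all support this), but the decision logic is simple enough to sketch as a rules table. Everything below is illustrative:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    url_fragment: str          # match requests whose URL contains this text
    action: str                # "block", "delay" or "rewrite"
    delay_ms: int = 0
    replacement_body: str = ""

RULES = [
    Rule("analytics", action="block"),                        # try to block a call
    Rule("/search", action="delay", delay_ms=2000),           # slow a call down
    Rule("/price", action="rewrite", replacement_body="{}"),  # manipulate the response
]

def rule_for(url: str, rules=RULES) -> Optional[Rule]:
    """Return the first matching rule for a URL, or None to let it pass through."""
    for rule in rules:
        if rule.url_fragment in url:
            return rule
    return None
```

A proxy hook would then apply the returned action, and you’d observe how the user experience holds up.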

And we’ve only covered client side requests. By using the same technique, you can take a capability and slowly break it down, level by level, building your knowledge of the product. Share your knowledge with your team to help identify problems and then use it to give you ideas on where to focus your testing based on dependencies and areas of brittleness.

I haven’t even mentioned how you can use these models to confidently share your test coverage; that’s a post for another time.

We have no evidence to suggest we shouldn’t release


“We have no evidence to suggest we shouldn’t release.”

I sometimes use this statement, or a variation of it, when asked if something we’re testing is ready to ship. For me it succinctly describes the current state of our knowledge of the software under test.

A longer form could read

“We have completed all of the testing we could think of. From that testing, no new information has appeared which suggests that we should not ship, based on the goals and risks associated with it.”

If the person who asked the initial question about our release readiness is interested in our testing efforts and their value, I’ve found that the statement becomes a good conversation starter to go deeper.

The statement is designed to suggest that we’ve done all we can within the allotted time. It also leaves room for unknowns which may impact the software should we release.

These unknowns could be categorised as:

“We haven’t been able to describe the unknowns sufficiently to allow them to be considered tangible risks and therefore ‘known unknowns’.”

or

“There are known unknowns i.e. gaps in our application knowledge that have been identified, but for some reason or another we haven’t been able to fill through testing.”

The language that we use is often a product of our state of mind. Acknowledging that it is impossible to know everything and therefore we’re unable to offer an absolute, I chose to offer the next best thing that my language skills allowed.

As long as we can back up the response with the testing efforts completed so far, and/or those not completed if the time for testing has expired, I believe that this answer is an OK response in this context. It leaves everyone on the same page, sharing the uncertainty that comes with complex systems and allowing the decision to release to become a group decision.

But why do I still feel a little dirty using it? James Bach terms it ‘safety language’ and that’s what it is to me. It’s a nod to the uncertainty of software and that we can’t know everything. Having said that, part of me feels like there may be a better alternative.

Firstly, the English is appalling! A double negative would be an instant fail in many an English exam and because of this, it leans towards being cryptic, as if it was meant to catch people out.

It was not designed to be cryptic, but when spoken in a particular way, it can definitely be made to make most people raise an eyebrow and ask for it to be repeated slowly.

If we were to take out the double negative, alternatives could be

“The release can go ahead because we do not have sufficient evidence to stop it.”

Or

“The evidence we have gathered suggests we can release.”

Both seem ok on face value. On closer inspection, the first option has broken my rule on deferring the decision to the appropriate owner. If I were to use option 1, I would feel like I’d just made the decision to release and that’s not something I think Testers and test teams should do.

Option 2 fares better on that score. We’ve qualified ‘can’ with a level of uncertainty in the use of the word ‘suggests’. But to me it still feels a little too much like the tester is making the decision.

Depending on your relationship with your decision makers and the culture of the team, option 2 might come across as a better alternative, one that is phrased without a possibility of being misunderstood as deceptive.

“What other alternatives are there?”, you ask. How about this:

“We have/have not completed all of the critical testing we can. With more time we may/are unlikely to find a reason not to release.”

This isn’t as concise as our opening gambit, but is it better in other aspects?

We’ve gone from talking about a lack of evidence to making a statement on what has or has not been tested. We’ve also covered the level of value of the testing, by focussing on what is critically important.

We’ve also described a possible future situation whereby if we were to continue testing, we believe new risks would or would not be uncovered.

Finally we’ve also finished off with a refined version of our passive yet positive outcome, that the release could go ahead, but still without being explicit and appearing to make the decision ourselves.

I’ll give this a go for a while and share my experience.

It would be great to find out what works for you!

UKStar 2018 In Retrospect

It has been a while, old friend

It was a stroke of luck that led me to attend this year’s UKStar conference. Our Principal QA had won a ticket through a Guy Fawkes competition and then unexpectedly was unable to attend. When he offered the ticket out to the Leads group, I jumped at the chance to take it.

It had been a long time since I’d attended what I would call a traditional testing conference. I was working in video games at the time and a lot of what was talked about didn’t seem particularly relevant. The speakers and their topics felt a little insular, less about testing and maybe a little more about us and them. I don’t remember the exact question or the response, but I do remember asking a question from a games perspective and being told I was wrong. Maybe it’s that feeling which colours my view of the rest of my time there.

With that said, I’d love to go back in time and attend again to see whether or not I’d feel the same about it now.

After that brief bit of history, I’m incredibly pleased to say that my opinion of UKStar couldn’t be more different. Both days were invigorating experiences that have once again given me a buzz to work within the testing community. It was amazing to meet a whole bunch of people who are passionate about the area I’ve been lucky enough to have had a long and exciting career in. As well as thanking the celebs of the testing world, who were happy for me to come and say hi, I need to send a massive shout out to James Lyndsay for coming over at the beginning of day 1 and saying hello.

I think I am a typical introvert. I get incredibly anxious introducing myself to people I don’t know, especially in big crowds. Situations where I’ve met you before but forgotten your name can be equally awkward for me.

I’m pretty sure that had James not come over and said ‘hello’ right then, my enjoyment of the rest of the conference might have been slightly more muted.

Pick of the talkers

I thoroughly enjoyed all of the talks and workshops that I attended and with that I offer another massive round of applause to all of you. You were brave enough to put yourself in front of us and deliver with high quality – bravo!

Two talks stood out for me, not just because I find their subjects fascinating, but more importantly, because both speakers were individually inspirational. Dorothy Graham and Isabel Evans were absolutely the highlights of my two days at UKStar. Two incredibly intelligent women, owning the stage and engrossing every single one of their audience members. I sat in awe of Dorothy ending her talk with a song! That’s one incredibly high bar that’s been set with those 60 seconds alone.

Then Isabel’s day-two keynote kicked off with an explanation of individual behaviour and group hierarchies, which was delivered with the panache you’d expect of Stephen Fry hosting the BAFTAs! And just as I thought that couldn’t be topped, minutes later it was! Isabel referenced Morgan cars and double dip clutches in her tools and automation focussed workshop!

And then it’s over

All too quickly the two days ended and we headed back to our normal working lives. Some, more sensible than others, headed off to TestBash Brighton to keep the momentum rolling and I’m looking forward to hearing and reading about their experiences there.

For me, I’m still digesting my own experience at UKStar and trying to find opportunities in my role to slip in my learnings. Firstly, over the coming weeks, I’ll be condensing the two days into 30 minutes, to present back to my team a little more detail on my experience, hoping to inspire my friends and colleagues to head off and join in with the testing community in the near future.

We’re also going through a little change in our team at present. The thoughts, insight and encouragement from the presenters at UKStar will remain fresh in mind to help me help the team make our change as smooth as possible, hopefully leading to an even more successful outcome than was originally possible.

You never know, at UKStar 2019 I might be lucky enough to share the stage with some of you and talk about it. We’re definitely implementing some interesting ideas that I’m quite excited about and may be of interest to some of you.

Thanks again UKStar for a great two days, I can’t wait for UKStar 2019!

My UKStar Itinerary

Day 1

Keynote – Growing a Testing Community of Practice & Navigating ‘Traditional’ Mindsets, Christina Ohanian

Workshop – Mapping Cognition to Software Modelling & Testing, Alan Richardson & Dr. Gustav Kuhn

Testification – Learn Testing with Gamification, Nicholas Hjelmberg

Blockchain Applications and How to Test Them, Rhian Lewis

The Testing Swing, Laurent Py

Experiences in Testing Infrastructure Projects, Jesper Ottosen

Testing Skills I learned Playing Dungeons & Dragons, Magnus Petterson

The Testers’ Three C’s: Criticism, Communication and Confidence, Dorothy Graham

Keynote – The User Illusion: Why Magic Works, Dr. Gustav Kuhn

Day 2

Keynote – Leadership, Fellowship & Followership, Isabel Evans

No More Shelfware – Let’s Drive!, Isabel Evans

Testing Through the Log File, Johan Sandell

Talking About Talking About Testing, Richard Paterson

Conversation – If the Universities Won’t Help Us, How Do We #MakeATester, Simon Prior

Conversation – Daring to Provide More Value Than Simply Testing, Joel Montvelisky

Basic Pathologies of Simple Systems, James Lyndsay

Keynote – Connecting the Beats with the Bytes, Frank Wammes

Why I like Testing in Production

Preamble

This post is not an internal environment vs production environment face-off, nor is it an attempt to convince you to change your beliefs on testing in production.

It is an attempt to show that testing in production can be another tool in the tester’s arsenal, to be used when the context fits and if the team you’re working with has the capability to do so now, or the desire to build that capability for the future.

If it does encourage you to investigate what testing in production might look like for you, share your story, I’d love to hear it.

But you should never test in production…

There is a school of thought which prescribes that testing of an application should only be completed in internal integrated environments.

I’ve recently seen a post discuss control – that new processes should never be tested in production. I’ll often hear about increased risk to the business and customers through shipping unfinished features. Occasionally someone will accuse me of treating our customers as guinea pigs.

Not forgetting data, I’m told that there’s a risk that my testing will be skewing production analytics for customer engagement, account tracking and stock levels if I’m testing in production.

These are all valid risks in their own context and each introduces varying degrees of impact should it materialise. There is nothing wrong with any of these arguments.

Where would you prefer to test?

Ask yourself, given zero risk, would you ever test in production for any reason?

My answer for this is, given zero risk, I would test everything I could in production. I would test new features, integrations of new features with old features, integrations of my application with other applications and dependencies. I would also conduct all of my non-functional testing in production: performance, load, security, etc. Why would I use an approximation of the live environment if I could use the real thing?

But of course zero risk doesn’t exist, so I’m going to take my utopia and start to break it down until I find a level of risk that is suitable for the context in which I would like to test. As part of that exercise, I would need to be clear on what I mean by testing in production.

I define testing in production to be an encapsulation of two distinct states.

  1. Testing in production of an as yet un-launched, hidden version of the application that customers cannot see or use
  2. Testing in production following the launch of a new version of the application to customers

Both activities offer their own value streams but solve very different problems.

Everyone can benefit from and should think about spending some time with number 2. Your application is live, your customers are using it. Assuming you could learn something new about what you’ve already shipped or even test out some of your earlier made assumptions, why wouldn’t you want to do that in production? Run a bug bash in production, keep it black box (only things customers can do) if you’re particularly worried about it and observe. You may find something that’s slipped through your net and if you do, you’ve proven its worth.

Testing hidden features

It’s option 1 that I find most interesting. I’ve recently read an article introducing testing in production from the Ministry of Testing Dojo – Testing in Production the Mad Science Way. The article discusses two distinct models that you can implement to provide you with the means to test in production.

We’ve implemented a variation on the circuit breaker method referenced in the article. In doing so, we have the ability to use feature flags to determine which code path the application should go through and therefore, what behaviours the customer has access to.

In its default state a feature flag is set to off. This means that the customer sees no change despite the new code having been deployed to production. Once the code is there, our circuit breakers allow us to turn features on client side. This means that testers can go to production, set the feature flag to on for the feature they want to test, and happily test against production code for the duration of their session. Once testing is complete and the feature is ready to go live, we can change the configuration of the feature flag for all customers, safe in the knowledge that we can turn it off again if something were to go wrong. The deployment of configuration is quick and we have two mechanisms to propagate a rollback to our customers: either slowly as sessions expire, or we can force them through on their next page load. When rolling forward we only do so as sessions expire.
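Stripped right down, the shape of that flag-and-override mechanism looks something like the sketch below. This is an illustration of the idea, not our actual implementation:

```python
class FeatureFlags:
    """Flags default to off; testers can override per session; the team can flip globally."""

    def __init__(self):
        self._global = {}             # flag name -> bool; absent means off
        self._session_overrides = {}  # (session_id, flag name) -> bool

    def is_enabled(self, flag, session_id=None):
        override = self._session_overrides.get((session_id, flag))
        if override is not None:
            return override
        return self._global.get(flag, False)  # default state: off

    def override_for_session(self, session_id, flag, value=True):
        """A tester turns the feature on for their own session only."""
        self._session_overrides[(session_id, flag)] = value

    def set_globally(self, flag, value):
        """Roll the feature out to (or back from) all customers."""
        self._global[flag] = value


def checkout_path(flags, session_id):
    """Which code path does the application take for this session?"""
    if flags.is_enabled("new-checkout", session_id):
        return "new checkout"
    return "old checkout"
```

A tester’s session sees the new code path while every customer stays on the old one, until the flag is flipped globally.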

In shipping this changed code we make the assumption that we’re able to determine whether the introduction of this new code has had a detrimental impact on the current feature set and customer experience. We derive this confidence through testing during the development phase of the new feature and through our automated checking suite which runs in our pipeline. We also have a third line of defence: a set of automated checks for our core journeys, created by a central team who own our path to live.

This mechanism takes time to mature and we’ve definitely made a few mistakes along the way. With perseverance we’ve been able to ship fairly large changes to our production site, with no impact to customers, test those changes and then turn them on when we’re confident we’re ready to.

Whilst we can mitigate concerns such as impacting customer-facing stock levels by being careful to only use items that aren’t low on stock, there are still some structural areas which we do not test in production, such as peak load tests and updates to our feature switch and rollback/roll-forward mechanisms. Anything else will be considered on a case-by-case basis, discussed during 3 Amigos within the team(s) and agreed on before actioning.

My thoughts

For some contexts, I prefer testing in production over testing in internal integrated environments because it provides me with these key benefits:

  1. The likelihood of my testing being blocked by an issue with a dependency is greatly reduced
  2. The data is at peak scope, complexity and usefulness
  3. Any bug that I find in the application under test is an actual and real issue 
  4. Any issues that I find with the environment will be having real impact to our customers and/or the business

In my experience, these benefits derive from flaws with the practices put in place to build and support internally integrated environments. 

Internally integrated environments do provide their own benefits. There are scenarios and processes which I would be reluctant to test in production and I’ve outlined some of those above. This article also does not discuss inside out testing techniques such as those on code – unit tests and component tests.

How a Spanish joke reminded me of Software Testing.

Before the hustle and bustle of the work day set in, a colleague of mine and I were discussing a picture someone had drawn on a post-it and left on his monitor.

[Image: the post-it note]

The message was clear, someone wanted to thank him for his efforts recently, which is a lovely gesture. What wasn’t so clear was the little picture in the bottom corner. Within the context of the message it was easy, the picture represented someone in the water. However, take the message away and the picture becomes less clear. I described it as one of those balsa airplanes with a rubber band propeller (indicated below), flying above the clouds.

[Image: what I took to be a balsa airplane]

This led my colleague to introduce me to a wonderful joke that he was told when he was growing up in Spain. The joke goes like this:

Teacher: “Listen carefully: Four crows are on the fence. The farmer shoots one. How many are left?”
Little Johnny: “None.”
Teacher: “Can you explain that answer?”
Little Johnny: “One is shot, the others fly away. There are none left.”
Teacher: “Well, that isn’t the correct answer, but I like the way you think.”

Little Johnny: “Teacher, can I ask a question?”
Teacher: “Sure.”
Little Johnny: “There are three women in the ice cream parlor. One is licking, one is biting and one is sucking her ice cream cone. Which one is married?”
Teacher: “The one sucking the cone.”
Little Johnny: “No. The one with the wedding ring on, but I like the way you think.”

In the first part, the teacher’s context was that of an arithmetic problem. The teacher was likely expecting Little Johnny to say three, but Little Johnny may have been considering a less abstract interpretation of the information provided. Little Johnny hears the word ‘shot’ and his internal model converts that to a gun, a loud noise and the jittery nature of birds. In that context it’s perfectly understandable that Little Johnny gave the answer he did.

What was the trigger for this misunderstanding? I think it’s that the teacher, unknowingly at the time, left their question ambiguous in the face of their audience. This is something Little Johnny takes advantage of in the second part, setting a trap for the teacher that leads to the inevitable punch-line. Little Johnny provides very specific information to describe the scene, building a picture or model in the head of the teacher and, in doing so, introducing biases of perspective that the teacher will use to answer the question. Little Johnny then comes in from left field with a question, which the teacher attempts to answer with the specifically limited information Little Johnny has provided, and laughter ensues.

Why is this an important lesson for Software Testers?

Each and every day, the life of a tester is filled with information that is either knowingly incomplete or, more dangerously, obliviously incomplete.

It’s very much our role as Software Testers to remember Little Johnny and the teacher when we’re communicating across the disciplines of the teams we work with. As we learn what it is that the product is expected to do, we must force ourselves to remember that it is very likely that our interpretation of the information is incorrect or incomplete, based on our own biases and perspective.

We can counter this by asking questions, even if we feel like it might come across as dumb to do so – I’d almost argue that this is exactly when we must ask questions.

The 5 Whys technique is a useful tool for understanding the primary objectives and drivers of an activity. With it we can challenge the assumptions made by ourselves and others and take them from the world of the implicit to the explicit.

Specification by Example is another technique practised throughout the software industry to provide a consistent language for describing behaviours and expectations; however, I find that it’s rarely used to its full potential. Yes, GWT scenarios can provide a suite of regression checks, but the real power is in the conversation that can be had between a group of people to, again, make the implicit explicit – this will be the subject of another post, so stay tuned!

Even if we think we have a complete picture, the reality will be that we don’t. Rarely have I met anyone who can keep a whole network of related systems, dependencies, contracts and expectations in their head, or even down on paper, in a sufficiently useful way to remove any risk of misunderstanding or gaps in understanding.

That’s why for us Software Testers, our most useful tool can be our ability to explore the landscape in front of us, with a specifically chosen context, to build up a more complete understanding of the actual with respect to what we think we know.