This post is about my project’s experience implementing Acceptance Test Driven Development (ATDD) for our web applications: the process around automating our Given, When, Then (GWT) specifications, the path we took, the problems that surfaced, and how we dealt with them. The motivation for automated testing was a corporate mandate. Most of the company’s previous applications were service APIs, and I suspect the main goal was to gain more confidence in those APIs.
Below you will find a section that provides basic familiarity with the concepts and tools, along with more resources.
At the onset, most team members (two development teams with 4-6 developers and 2 QA each) had no experience with Gherkin, Cucumber, or Protractor. We started with a proof of concept of how automating the GWT would technically work. Then we set up exercises to familiarize all our teammates with it. Once we got rolling we hit several hurdles, both technical and procedural, but we kept pushing forward. Now we’re about 9 months into ATDD, and I believe most team members have come to realize the value of automation. Some of the most notable benefits are less time spent on regression testing, more time spent on exploratory testing, and developers who are much more involved with the quality of the application.
Adam Miller and I prepared exercises to cover the process. There were three basic parts to the exercises: 1) writing Gherkin, 2) writing Step Definitions using Protractor[1], and 3) implementing the actual feature. We picked a couple of features and wrote the Gherkin, then wrote the associated Step Definitions, using Protractor to interact with our deployed site. The Gherkin and Step Definitions we prepared were handed out at the beginning of each part of the training, so that if any team was not able to fully complete a part, all the teams would still be on the same page when we moved on.
There was a 3-day training to make all team members aware of the terminology and the flow of ATDD. Of the three days, 1.5 were for the workshop portion (some Agile/SAFe[2] material was thrown in at the beginning). At the training the Product Owner went through the feature stories as is normally done, describing how each feature was supposed to work and look. We broke into small teams of developers and QA, each with one member who was already familiar with the examples and could help guide the team. All the teams then worked through a specification workshop to establish the GWT for the same story. The output of this workshop was the Gherkin used to guide our development of the feature. We walked through every team’s Gherkin to discuss what was good and not so good, and to ask and answer questions. In the next step we automated the Gherkin via Step Definitions using CucumberJS (which basically uses regular expression matching against the Gherkin to run a specific piece of code). Armed with our Gherkin and Step Definitions, we then wrote the code to implement the features. The features were fairly simple but real, and they were implemented in the actual code base. Once again we looked at what every team had done, shared our thoughts, and asked and answered questions.
In the end everyone had participated in the process and there was a good feeling. At this point we really had only two guidelines:
- Don’t test by default that everything is on a page (e.g., don’t test that every input field, label, etc. is present). The reasoning was that other scenarios in the Gherkin already use those inputs, so they would fail if an input didn’t exist. However, if some action removes or disables a button, do test that scenario.
- Don’t test things like “This button is on the left and this button is on the right.” Verifying element positions in the browser would cost more to automate than the value it would provide.
If you’re using Continuous Integration (CI), don’t expect this suite to run with each build. Automated tests against the browser take quite a bit of time. You could tag a few scenarios or features to run with each build, but I’m not sure I’d bother. Your developers should be in the habit of running their tests as they finish each scenario, and of running the entire Protractor suite before calling a feature ready for code review. We currently run the scenarios daily.
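If you do want a small per-build subset, CucumberJS can select scenarios by tag on the command line (its `--tags` option). The sketch below is a simplified in-memory model of that selection, not CucumberJS’s own implementation; the `Scenario` type and the `@smoke` and `@manual` tag names are hypothetical.

```typescript
// Hypothetical, simplified model of a parsed scenario: just a name and its tags.
interface Scenario {
  name: string;
  tags: string[];
}

// Keep scenarios that carry at least one `include` tag and none of the `exclude` tags,
// roughly what a tag expression such as "@smoke and not @manual" expresses.
function selectScenarios(
  scenarios: Scenario[],
  include: string[],
  exclude: string[] = [],
): Scenario[] {
  return scenarios.filter(
    (s) =>
      include.some((tag) => s.tags.includes(tag)) &&
      !exclude.some((tag) => s.tags.includes(tag)),
  );
}

const suite: Scenario[] = [
  { name: "Log in with valid credentials", tags: ["@smoke"] },
  { name: "Filter items by category", tags: [] },
  { name: "Reset password via email link", tags: ["@smoke", "@manual"] },
];

// Per-build run: only the fast smoke scenarios, never the ones that need a human.
const perBuild = selectScenarios(suite, ["@smoke"], ["@manual"]);
```

The daily run simply skips the filter. A tag such as `@manual` also gives scenarios that must always be checked by hand a place to live in the feature files without failing the automated run.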
We immediately started holding Specification Workshops (during sprint planning) to establish the Gherkin up front. It quickly became clear that not everyone was on the same page about our goals for testing. There were many discussions about testing the two things we had said during the training we weren’t going to test. We had long, recurring conversations about how to handle tests that will always need to be done manually (e.g., resetting your password requires going to your email to retrieve the new password, likely too much work for an automated test). Then we started a “re-design” iteration in which we did not hold specification workshops. Instead we were writing Gherkin based on how a particular feature already worked, and feature files without any bindings were popping up for the existing features the re-design touched. This is sort of ATDD without the automation.
Initially we planned to spend a couple of story points each sprint implementing the Step Definitions to avoid derailing feature development. This did not happen: the Step Definitions were left to the end and never got completed. There were also a few two-week sprints where we focused on re-designing existing components. Once the re-design sprints were over, we entered a couple of “hardening” sprints where we focused on bugs, implementing Step Definitions (aka bindings), and clearing up other technical debt. As mentioned earlier, many feature files had piled up without bindings, and they were in disarray. We were not following the same grammar, not all of them were human readable (there were selectors in the Gherkin), and some were attempting to test things that would not be valuable. The state of the feature files slowed our bindings implementation efforts, and much of the Gherkin was completely rewritten.
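One way to catch this kind of drift early is a small lint pass over the feature files. The sketch below is a hypothetical helper, not part of the tooling described in this post: it flags phrases outside an agreed glossary and tokens that look like CSS selectors. The glossary entries and the selector heuristic are assumptions chosen for illustration.

```typescript
// Hypothetical team glossary: discouraged phrase -> preferred phrase.
const glossary: Record<string, string> = {
  "input box": "text box",
  "click the": "choose to",
};

// Rough heuristic for CSS selectors leaking into Gherkin, e.g. "#submit-btn" or ".nav-item".
// It requires whitespace (or line start) before the # or ., so sentence periods don't match.
const selectorPattern = /(^|\s)[#.][A-Za-z][\w-]*/;

interface LintIssue {
  line: number; // 1-based line number in the feature file
  message: string;
}

function lintFeature(text: string): LintIssue[] {
  const issues: LintIssue[] = [];
  text.split("\n").forEach((raw, index) => {
    const lower = raw.toLowerCase();
    // Flag every discouraged phrase found on this line.
    for (const avoid of Object.keys(glossary)) {
      if (lower.includes(avoid)) {
        issues.push({ line: index + 1, message: `use "${glossary[avoid]}" instead of "${avoid}"` });
      }
    }
    // Flag anything that looks like a selector rather than plain language.
    if (selectorPattern.test(raw)) {
      issues.push({ line: index + 1, message: "looks like a CSS selector; describe the element in plain language" });
    }
  });
  return issues;
}
```

Run against a feature file before review, a check like this turns “we agreed to say text box” from a convention people have to remember into something the build can point out.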
Where we are now
Once we got rolling and more people became comfortable writing Gherkin and step definitions, we were able to move much more quickly. We still learn new things and update our process to fit the new information. As the number of already implemented bindings grows, you can implement entire Gherkin scenarios with little effort (you may need only 1 or 2 new steps for an entire set of scenarios). The review process has become streamlined because QA and the developer work together to write the Gherkin before development begins (though still not during planning). We also clarify up front, much more than before the ATDD process began, what it means for a feature to be done. Daily communication between developers and QA has increased (which is a good thing). At this point we don’t allow a story to be called Done until its bindings have been completed (and are passing). One of the main benefits of all this is that the time it takes to regression test our application has decreased by at least 60%.
There are a number of supporting pieces you will need to set up along the way. Using Grunt tasks, we set up different environments to run against (localhost vs. the deployed version of the site). We also set up reporting so that we can generate an HTML report of the specification output (we’re not currently exposing this; it just runs locally).
Our tests currently run only against Chrome. There are ways to configure Protractor to run in other browsers, as seen here. A list of third-party drivers can be found here.
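Both the environment switch and the browser list end up in the Protractor configuration. Here is a sketch of a config builder, assuming a Grunt task passes in the environment name. `baseUrl`, `specs`, and `multiCapabilities` are real Protractor options, but the URLs, the spec glob, and the `buildConfig` helper are hypothetical, and the options are typed locally so the sketch runs without Protractor installed.

```typescript
// Local type covering the subset of Protractor config options used here.
interface ProtractorConfigSubset {
  baseUrl: string;
  specs: string[];
  multiCapabilities: { browserName: string }[];
}

// Hypothetical environments and URLs; a Grunt task would choose one per run.
const baseUrls = {
  local: "http://localhost:9000",
  deployed: "https://qa.example.com",
} as const;

type Environment = keyof typeof baseUrls;

function buildConfig(env: Environment, browsers: string[] = ["chrome"]): ProtractorConfigSubset {
  return {
    baseUrl: baseUrls[env],
    specs: ["features/**/*.feature"], // hypothetical feature-file location
    // One capability per browser; Protractor runs the specs against each of them.
    multiCapabilities: browsers.map((browserName) => ({ browserName })),
  };
}
```

In a real Protractor config file this object would be exported as `config`, and every browser beyond Chrome still needs its driver available on the machine running the tests.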
We tackled a major risk up front by using Mockjax and Protractor’s executeAsyncScript to short-circuit some common steps such as logging in and out, clearing local storage, and getting the appropriate response from the server when necessary. For example, we fully test the login form, but in all the scenarios where we just need to be logged in, we didn’t want the app to go through the UI to log us in, so executeAsyncScript was used to short-circuit that process. Mockjax, which intercepts jQuery Ajax requests in the browser, was used to simulate the server returning no items. We try to mock server responses sparingly, as doing so doesn’t reflect the true state of the environment.
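Protractor exposes WebDriver’s `executeAsyncScript` on its `browser` object: the function you pass runs inside the page, receives a completion callback as its last argument, and the returned promise resolves once that callback is called. The sketch below shows a login short-circuit under stated assumptions: `AsyncScriptExecutor` is a hypothetical stand-in for the slice of Protractor’s `browser` used here, and an `authToken` entry in local storage is a hypothetical stand-in for however your app actually recognizes a logged-in user.

```typescript
// Hypothetical minimal interface for the part of Protractor's `browser` used here.
// The real executeAsyncScript also accepts a string script; only functions are modeled.
interface AsyncScriptExecutor {
  executeAsyncScript(script: (...args: any[]) => void, ...args: unknown[]): PromiseLike<unknown>;
}

// Runs inside the page, not in Node: WebDriver serializes the function source,
// so it may only use its own arguments and browser globals.
// "authToken" is a hypothetical key; use whatever your app checks for a session.
function seedSession(token: string, done: (result: string) => void): void {
  (globalThis as any).localStorage.setItem("authToken", token);
  done("logged in");
}

// Also runs inside the page: wiping local storage logs the user out.
function clearSession(done: (result: string) => void): void {
  (globalThis as any).localStorage.clear();
  done("logged out");
}

// Node-side helpers a step such as "Given I am logged in" could call.
function loginViaScript(browser: AsyncScriptExecutor, token: string): PromiseLike<unknown> {
  return browser.executeAsyncScript(seedSession, token);
}

function logoutViaScript(browser: AsyncScriptExecutor): PromiseLike<unknown> {
  return browser.executeAsyncScript(clearSession);
}
```

Local storage is per-origin, so the browser must already be on a page from the app’s origin (for example after a `browser.get` to the site) before the script runs. Mockjax setup has to run inside the page as well, so it can be injected the same way.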
We haven’t quite addressed cleaning up after tests that change data, which can cause false positives. We do plan to do so via exposed services that completely remove a particular user.
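For that planned cleanup, one approach is to record every user a scenario creates and delete them afterwards, for example from a CucumberJS `After` hook. This is a sketch only: `TestUserRegistry` is hypothetical, and the `DeleteUser` function type stands in for whatever service call ends up removing a user.

```typescript
// Hypothetical signature of the planned "remove this user" service call.
type DeleteUser = (userId: string) => Promise<void>;

// Tracks users created during a scenario so they can be removed afterwards.
class TestUserRegistry {
  private created: string[] = [];

  track(userId: string): void {
    if (!this.created.includes(userId)) {
      this.created.push(userId);
    }
  }

  // Remove every tracked user, continuing past failures so one bad record
  // doesn't leave the rest behind. Clears the list and returns the ids that failed.
  async cleanup(deleteUser: DeleteUser): Promise<string[]> {
    const failed: string[] = [];
    for (const id of this.created) {
      try {
        await deleteUser(id);
      } catch {
        failed.push(id);
      }
    }
    this.created = [];
    return failed;
  }
}
```

Returning the failed ids, rather than throwing on the first error, lets the hook report leftover data without hiding the scenario’s own result.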
What I would do differently
- I would start by spending way more time on the team’s understanding of Gherkin, including establishing a clear grammar for how things are referred to in the Gherkin (e.g., text box vs. input box, “Choose to Submit” vs. “Click the Submit button”).
- In the beginning, during backlog refinement (planning), I would workshop (create the Gherkin for) only one story per sprint until the team became proficient. People don’t like meetings; some people don’t even like useful meetings. So start small to increase participation. Then review the Gherkin and Step Definitions as a team (developers, QA, and Product Owner).
- Any story that is worked on and has Gherkin also gets its Step Definitions implemented.
- Developers should not move on to another story until the current story is Done! Done => Developed (that means unit tests, Gherkin, and step definitions), QA’d and shown to the Product Owner for approval.
Is Automated Testing worth it?
Hell yes! Not only are the QA folks now spending less time on regression testing, but developers and QA also have more interaction early on and throughout the development of a feature. The specifications are also now in a place that is accessible to developers as well as QA. Bugs have a tough time recurring because we write a feature test (and unit tests, of course) as we fix them. Large and controversial refactorings are now quite noticeable, since changing a bunch of markup ids and class names is likely to break lots of scenarios.
ATDD has many purposes. It can serve as a guide to what your feature must accomplish to be complete, which helps avoid gold-plating. The Gherkin can also serve as documentation on how to use your site. ATDD reduces bug recurrences and, at the same time, reduces the time QA spends clicking through everything during a regression test period. Here is a great video from Uncle Bob Martin on Automated Acceptance Testing.
Cucumber (in our case CucumberJS) is a tool that takes your Gherkin and runs code for you, using regular expression matching to pair each step with a Step Definition. Within Cucumber, to reach a browser you typically use Selenium WebDriver or a library built on it (in our case Protractor, which wraps WebDriverJS) to navigate a site and interact with it.
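To make the regular expression matching concrete, here is a toy model of what Cucumber does with each step: strip the Gherkin keyword, find the single Step Definition whose pattern matches the remaining text, and call it with the capture groups. It is a sketch of the idea, not CucumberJS’s API; the step phrasings and the strings the steps return are hypothetical placeholders for real browser actions.

```typescript
// A step definition pairs a pattern with code to run; capture groups become arguments.
interface StepDefinition {
  pattern: RegExp;
  run: (...args: string[]) => string;
}

// Hypothetical step definitions; real ones would drive the browser via Protractor.
const steps: StepDefinition[] = [
  { pattern: /^I am on the "([^"]*)" page$/, run: (page) => `navigate to ${page}` },
  { pattern: /^I choose to "([^"]*)"$/, run: (action) => `click ${action}` },
  { pattern: /^I should see (\d+) items?$/, run: (count) => `expect ${count} items` },
];

const KEYWORDS = /^(Given|When|Then|And|But)\s+/;

// Strip the keyword, then require exactly one matching definition,
// reporting undefined and ambiguous steps the way Cucumber does.
function runStep(line: string): string {
  const text = line.trim().replace(KEYWORDS, "");
  const matches = steps
    .map((step) => ({ step, match: step.pattern.exec(text) }))
    .filter((m) => m.match !== null);
  if (matches.length === 0) throw new Error(`Undefined step: ${text}`);
  if (matches.length > 1) throw new Error(`Ambiguous step: ${text}`);
  const { step, match } = matches[0];
  return step.run(...match!.slice(1));
}
```

Because each definition is parameterized, a new scenario that only recombines existing phrasings needs no new code, which is why bindings get cheaper to write as the library of steps grows.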
[Figure: Implementation of the Selenium WebDriver]
[Example: Protractor code used within Cucumber]
[1] You can find an example of a Step Definition in the Resources section, although this is not what we used in our exercises.
[2] Not necessarily promoting SAFe, just saying it was discussed before we talked about Gherkin, Cucumber, and Protractor.