Jenkins Overhaul

In May 2018, Shotgun Software’s (now Shotgrid Software) Jenkins pipeline was much simpler than it is today: it had far fewer stages, ran serially, and the actual Jenkinsfile consisted of only 48 lines! Tests fell into three categories: Ruby unit tests, Karma integration tests, and Selenium e2e tests.

Test Coverage at a Glance (May 2018)

Note: There are 16 Selenium tests pictured in the graph above

(Some of) the Problems with Selenium

Low coverage / slow execution time - These were rudimentary smoke test checks that verified a very low baseline of functionality - things like logging in, accessing admin pages, and creating/editing a few different types of entities. The tests took about 13 minutes to complete.
Unreliable - the tests would often fail a build because of timeouts on DOM elements that sometimes rendered slowly in the CI environment. Many waits were added to mitigate this, but invariably some new factor affecting page load speed would be discovered, and the tests would have to be rejiggered (again).
Low developer engagement - the Selenium tests were maintained by one person who operated outside of any of the scrum teams doing feature work and bug fixes.
Test harness and tests were difficult to understand - because the test harness was homegrown, the files and directories were laid out according to the conventions and preferences of a single person. It was very challenging to simply differentiate test files from support files. The structure was undocumented, and there were thousands of files spread out in hundreds of directories. Not a single developer succeeded in getting the tests to run. I tried myself, and could not do it!
Lack of actionable feedback when test failures occurred - when failures occurred, there was no way of getting a screenshot of what occurred. You had to inquire with the 1 person who understood how the entire harness worked.
Low trust - tests were not modified at a rate commensurate with UI development, which led to brittleness. Because the stability of the tests was already suspect, developers would reflexively blame the flakiness of the tests instead of operating under the assumption that a real bug was caught.

An Important note about Selenium

Although we only had 16 Selenium tests running in CI, we actually had dozens more that covered more complex, high-value workflows. Unfortunately, these tests were too unstable to integrate into our Jenkins pipeline, and were instead run manually on an ad hoc basis. Some people felt that doing this added value. Others felt that manually running Selenium tests, while simultaneously second-guessing their results, cancelled out any benefits.

How Things Changed

At the end of May 2018, I heard about the Cypress test framework from a dev manager in my company who worked on a different product. His team was enthusiastically embracing Cypress, and fully committed to rewriting all of their Selenium tests when I met them. Following his recommendation, I looked into it.

Installation was simple

npm install cypress --save-dev

Followed by this 10 minutes later...

describe('Hello World Test', function() {
    it('makes a simple assertion', function() {
        assert.isTrue(1 > 0);
    })
});

Followed by a working UI test spec 30 minutes later...


describe('Web App Login', function() {
    it('visits the login page and logs in', function() {
        // Go to the login form
        cy.visit('/user/login');
        // Enter your login
        cy.get('#user_login').type(Cypress.config('admin_login'));
        // Enter your password
        cy.get('#user_password').type(Cypress.config('admin_pwd'));
        // Submit
        cy.get('button[name="commit"]').click();
        // Assert that your url has changed
        cy.url().should('not.contain', '/user/login');
        // Assert on SG.globals.current_user.id
        cy.window().its('SG.globals.current_user.id').should('eq', Cypress.config('admin_id'));
        // Assert that the UI conforms to expectation
        cy.get('#sg_global_nav').should('be.visible');
    });
});

Cypress’s strengths seemed to resolve the most glaring problems of Selenium

Setup and configuration were simple - projects are scaffolded in a way that emphasizes non-arbitrary and logical directory structures for test specs, support files, fixtures, etc.
Test code could be written in modern JavaScript, and therefore looked a lot like front-end application code, i.e., it was easy for a developer to understand
Visual feedback in the test runner, along with the standard Chrome debugging tools, made it extremely easy to iterate on and improve unstable tests (often within minutes)
The promise-like nature of Cypress’s element locators removed the unpredictability of DOM rendering - Cypress locators automatically retried until they succeeded or timed out
The framework had built-in support for jQuery, Lodash, Moment.js, Chai assertions, screenshots, and video recording with configurable compression settings - natively available, with no need for additional import statements
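To make the retry behavior mentioned above concrete, here is a minimal conceptual sketch in plain Node.js. It is not Cypress's implementation: `retryUntil`, its option names, and `makeSlowElement` are hypothetical names made up for this illustration. It only shows the general idea of polling a query until it yields something or a deadline passes, instead of sprinkling fixed waits through a test.

```javascript
// Conceptual sketch only: this is NOT how Cypress is implemented internally.
// `retryUntil` and its option names are hypothetical.
async function retryUntil(query, { timeout = 4000, interval = 50 } = {}) {
  const deadline = Date.now() + timeout;
  let lastError;
  do {
    try {
      const result = query();
      // Treat any non-null result as "the element was found".
      if (result !== null && result !== undefined) return result;
      lastError = new Error('query returned nothing');
    } catch (err) {
      lastError = err;
    }
    // Wait a little before asking again.
    await new Promise((resolve) => setTimeout(resolve, interval));
  } while (Date.now() < deadline);
  throw new Error(`Timed out after ${timeout}ms: ${lastError.message}`);
}

// Hypothetical stand-in for a DOM element that "renders" after a delay.
function makeSlowElement(delayMs) {
  const readyAt = Date.now() + delayMs;
  return () => (Date.now() >= readyAt ? { id: 'sg_global_nav' } : null);
}

// Usage: resolves once the simulated element appears, rejects on timeout.
retryUntil(makeSlowElement(120), { timeout: 1000, interval: 20 })
  .then((el) => console.log('found', el.id))
  .catch((err) => console.error(err.message));
```

With this pattern a test succeeds as soon as the element appears and fails only when the timeout is exhausted, which is why slow rendering in CI stops being a source of random failures.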

I started using Cypress for my team’s REST API - slowly at first

Around May 2018, my team was accelerating its development work for a new REST API, and iterating at a very fast clip. At the time, I had a suite of 49 good automated tests, written in Python using Robot Framework. I continued running those tests several times a day, while concurrently developing new JavaScript tests using Cypress. The results surprised me. Not only was I able to reproduce the 49 Robot tests in less than a week using Cypress, I was able to quickly add and modify tests at the same rate that developers were adding and modifying API endpoints. Below is a screenshot from our Jenkins build server from early June 2018.

Screenshot of a Jenkins build from June 2018

**431 tests in Jenkins (on a branch) completing in 01:20**

Rate of Cypress test development over time

The graph below shows the total number of newly created tests I was adding to Jenkins on a test branch in the few months after getting started. It became clear to me that the framework lent itself to extremely fast test development and iteration, and that my positive initial experiences were most likely not at all unique.

REST API Test Coverage in First 3 Months

As a side note, thorough test coverage for our REST API proved, in hindsight, to be absolutely essential. In the months following our MVP release on May 15, 2018, adoption began climbing exponentially (see graph below).

REST API adoption in its first 7 months

Fully Integrating Cypress into our Jenkins Pipeline

On 7/23/2018, I merged 465 Cypress tests into master - after a thumbs-up technical review of the work I had done on my branch. The total time this added to our build was about 2.5 minutes. The only hiccups we experienced during the first month were occasional failures due to network connection problems arising when Cypress video recordings were transferred to the Cypress dashboard. As soon as we removed the file transfer step, the occasional failures stopped.

To recap, I had downloaded and installed Cypress on May 31. I spent the month of June investigating the ins and outs of the framework, writing tests, and developing a suitable Docker image to run those tests in Jenkins. I waited three weeks for another engineering team to conduct a technical review of my work, after which I merged my work into master, where 465 new tests began running on every push to every branch.

Advocating for Cypress as a general purpose e2e test framework

In December 2018, I put together a short demo of working UI tests to showcase Cypress’s suitability as a replacement for Selenium for end-to-end testing. I made sure to cover a broad range of features: drag-and-drop graph configuration, XSS checks, web-based database querying, and schema modification. I was convinced, from my experience on the REST API feature work, that our engineering team should aggressively pursue replacing the Selenium tests with Cypress.

First Cypress UI test demo

Friday Hackathon 12/19/2018

Why we finally said goodbye to Selenium

1. Cypress was gaining traction & visibility

Between July 2018 and January 2019, I had brought our Cypress test total up to 505 - and the rapid rate of test coverage growth was getting attention
I had also trained one of my teammates to take over the Cypress work - a very capable Python programmer who was somewhat new to JavaScript
I gave Cypress demos to other scrum masters and developers, building a case for its adoption and spreading the word that it was a worthwhile framework to invest in
A few of our senior developers began talking to other developers from different software products who had also successfully leveraged Cypress

2. Developer frustration with Selenium failures

Between October 2018 and March 2019, there were several instances of developers merging PRs into master for the sole purpose of commenting out the Selenium tests, because false positives were blocking a build with a critical fix or feature.

October 11, 2018
October 11, 2018 (a second PR the same day!)
Nov 22, 2018
Feb 11, 2019
March 13, 2019

3. New leadership, a mandate, and a bigger team

A new director of engineering wanted to empower the scrum teams to take ownership of test automation, rather than have this function outsourced to a specialized automation team / individual. One of his first priorities was to get everyone on the same page with automation.
We received word in August 2018 that Cypress was officially blessed by our larger organization, fully aligned with the wider company goal of getting all our cloud software products to 100% CI/CD
We hired a few more front-end developers who either had experience using Cypress, or could easily ramp up because of their experience with Node.js

Results - Before and After Comparison

As of November 14, 2019, our Jenkins Pipeline was running 903 integration and E2E tests on every build of every branch, using the Cypress framework I put into place on July 23, 2018.

Test Coverage

The Major Differences Today

Between 2015 and 2018, there was a single person entirely responsible for test automation. He was not in an agile team that developed application code. And because he wasn’t a subject matter expert, he wasn’t properly equipped to determine what was and was not important to test. In 3+ years’ time, he produced 16 tests of dubious value, using a test harness that was incomprehensible to every other developer.

Getting started with automation now is a lot easier

With Selenium, difficulty recruiting enthusiastic developers was the primary factor that prevented it from gaining support. And the main reason for that was the difficulty in understanding how it all worked. That is not the case today. Now, any developer can do any of the following with ease:

Install the test harness and understand the internals of what makes it all work
Write a basic test quickly with super simple “test templates”
Run tests locally or in the docker container, and see the result
Trust the result
Check in a test to their branch, then be notified as soon as a test failure occurs (via Slack), then click through to the build server, read useful debugging output, and even view a video recording of the failure
Reproduce a failing test locally, make modifications, then confidently commit and push to start another build

Better tools & higher developer engagement

What we have running on our build servers today has greatly surpassed what we had a year earlier, precisely because it is now the product of more than 16 developers, spread across 9 different agile teams. Some of the improvements that have been made are:

The Cypress Docker image now pulls from an internal Docker registry to guarantee security-hardened base images
Parallelized test execution ensures that no single long-running test holds up the build, which means that 903 E2E tests can complete in < 11 minutes
Jenkins now flags any build that has a failing Cypress UI test as unstable without failing the build. It then triggers a Slack notification to the commit author with 1-click access to the build, where they can view a video recording of the failure
Built-in configurable retries provide a flexible mechanism by which test writers can avoid test failures that arise simply because an upstream service was slow to respond (e.g., cloud transcoding)
Cypress tasks can now drive the Ruby console over HTTP routes, using a new networked Docker service, greatly speeding up complex setups and teardowns
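The parallelization point above can be illustrated with a small sketch. This post does not describe how our pipeline actually splits specs across workers, so everything here is an assumption made for illustration: the `balanceSpecs` helper, the spec file names, and the durations are all hypothetical. The sketch uses a simple greedy "longest spec first" strategy, assigning each spec to whichever worker currently has the least work.

```javascript
// Hypothetical helper: greedy "longest job first" balancing of spec files.
// This is an illustration, not the splitting logic used in our pipeline.
function balanceSpecs(specs, workerCount) {
  const workers = Array.from({ length: workerCount }, () => ({ total: 0, specs: [] }));
  // Place the slowest specs first so they do not pile up on one worker.
  const sorted = [...specs].sort((a, b) => b.seconds - a.seconds);
  for (const spec of sorted) {
    // Pick the worker with the least accumulated time (ties go to the first).
    const lightest = workers.reduce((min, w) => (w.total < min.total ? w : min));
    lightest.specs.push(spec.file);
    lightest.total += spec.seconds;
  }
  return workers;
}

// Made-up spec names and durations, for illustration only.
const exampleSpecs = [
  { file: 'transcoding.spec.js', seconds: 300 },
  { file: 'login.spec.js', seconds: 40 },
  { file: 'schema.spec.js', seconds: 120 },
  { file: 'graphs.spec.js', seconds: 150 },
  { file: 'api.spec.js', seconds: 90 },
];

const plan = balanceSpecs(exampleSpecs, 3);
for (const [i, w] of plan.entries()) {
  console.log(`worker ${i}: ${w.total}s`, w.specs);
}
```

In this example the slowest spec ends up alone on one worker while the remaining specs share the other two, so the wall-clock time of the stage is bounded by the single longest spec rather than by the sum of all of them.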

Decomplexification has won over developers

Nowadays, we have developers checking in tests within the same day of looking at sample test code. Often, as features are reworked, people will make small tweaks to tests they have not written, or are examining for the very first time, because the code is easy to understand. QA Engineers are empowered to check out test branches, modify React locators so that new tests will work, then add their own commits to the developer’s branch. Test improvement is incremental and ongoing, and no longer seems impossible, because it is happening all the time in every agile team. And test automation isn’t the eye-roll-inducing catch phrase it once was, because developers know what to do, and how to do it. That is pretty much the secret sauce, IMHO.