How we run end-to-end tests in Buildkite CI
End-to-end tests are an important part of our testing infrastructure at Sourcegraph. At the top of the testing pyramid, end-to-end testing ensures key user flows work properly, from the user interacting with the browser all the way to the backend services working behind the scenes. However, end-to-end tests pose a couple of unique challenges. If not done right, they lead to flaky tests, and developers learn not to trust test failures when they happen - resulting in real bugs going unfixed. How do we prevent this?
A good end-to-end testing system
When a commit causes an end-to-end test to fail, we need to make sure the developer who caused the failure is notified, has confidence that the failed test identified a real problem, and has enough insight into the issue to solve it. For example, we want to show which commit caused the failure (including its message), which test is failing, and why. Before the redesign of our end-to-end testing system, we had a custom service that ran the test suite periodically and posted to a Slack channel when a run failed. This wasn’t a good experience for a number of reasons:
- The failure could not be mapped back easily to a commit and author.
- There was no UI to retry a run.
- If the tests kept failing, the bot would keep spamming the Slack channel.
- There wasn’t a good way to see whether the end-to-end tests had already been failing before.
- The Slack UI wasn’t a good display for test results. For example, it couldn’t show CLI colors, and it wasn’t easy to add screenshots to a webhook payload.
- There was no clear way to run end-to-end tests on branches/PRs while still delivering notifications only to the relevant person.
- Logs were not streamed.
Some of these issues could have been solved by investing time to make our end-to-end bot smarter, but it would have added more complexity.
We were already using Buildkite for other continuous integration tasks and knew that it already provided these features and much more:
- Every failure is linked to a commit and author.
- Builds and steps can be rerun.
- UI for seeing recent runs.
- CLI colors.
- Streamed logs.
- Customizable email notifications.
- Integration into notification tools like CCMenu.
- Prevents deploys while tests are running.
- My favorite feature: Screenshots in CLI output.
Defining a Buildkite pipeline for end-to-end tests
To accomplish testing Sourcegraph truly “end-to-end”, the tests are run against a real deployment. At Sourcegraph, we use Docker and Kubernetes to deploy our application. Our pipeline builds the images for the current commit, then the deploy step in the pipeline rolls out the fresh images to a dedicated staging cluster with `kubectl set image` and waits for the rollout to finish with `kubectl rollout status`.
The YAML pipeline definition looks similar to this:
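(The step labels, script names, deployment names, and image paths below are placeholders rather than our exact configuration.)

```yaml
steps:
  # Build and push the Docker images for the current commit (script name is illustrative).
  - label: ':docker: build images'
    command: ./ci/build-images.sh

  - wait

  # Deploy the fresh images to the dedicated staging cluster.
  - label: ':kubernetes: deploy to staging'
    command:
      - kubectl set image deployment/sourcegraph-frontend frontend=us.gcr.io/our-project/frontend:$BUILDKITE_COMMIT
      - kubectl rollout status deployment/sourcegraph-frontend
    concurrency: 1
    concurrency_group: sourcegraph/e2e-staging

  - wait

  # Run the end-to-end tests against the staging deployment. Sharing the
  # concurrency group with the deploy step keeps the "lock" on the staging
  # cluster held until this build's test run has finished.
  - label: ':chrome: end-to-end tests'
    command: yarn run test:e2e
    concurrency: 1
    concurrency_group: sourcegraph/e2e-staging
    artifact_paths: '*.png'
```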
The `concurrency_group` setting and the `concurrency` limit here prevent other deploys from running at the same time and ensure they are run in order of creation. This acts like a “lock” on the staging cluster - no other build (not even from a different pipeline) can touch the staging cluster until the end-to-end test run of this build has completed.
Writing an end-to-end test suite
For the actual tests, we use Puppeteer - a lightweight library by Google to control a headless Google Chrome instance. It will navigate to pages, click elements, and assert that certain elements appear or have the right content. Together with a test runner like Mocha that supports `async`/`await` in tests, it enables tests that are both easy to read and write.
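A minimal sketch of what such a test can look like - the URL, selectors, and assertion are illustrative rather than taken from our actual suite:

```typescript
import assert from 'assert'
import puppeteer, { Browser, Page } from 'puppeteer'

// describe/it/before/after are provided as globals by the Mocha test runner.
describe('Search', function () {
    // Generous timeout: the tests run against a real deployment over the network.
    this.timeout(30 * 1000)

    let browser: Browser
    let page: Page

    before(async () => {
        browser = await puppeteer.launch()
        page = await browser.newPage()
    })

    after(async () => {
        await browser.close()
    })

    it('shows results for a simple query', async () => {
        // Hypothetical staging URL and e2e selectors, for illustration only.
        await page.goto('https://sourcegraph-staging.example.com/search?q=test')
        // waitForSelector polls until the element appears or the timeout is hit.
        await page.waitForSelector('.e2e-search-results')
        const text = await page.$eval('.e2e-search-results', el => el.textContent)
        assert.ok(text && text.trim().length > 0, 'expected non-empty search results')
    })
})
```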
Making assertions
Being run against an actual deployment, end-to-end tests are subject to latency, so most actions and assertions need to account for variable loading times. That means it is not possible to, for example, programmatically click an element and immediately assert that the desired effect occurred. Adding an artificial timeout between the action and the assertion doesn’t work well either, because the needed delay varies: if it is too short, the test fails when it should pass, and if it is too long, it slows down the entire test suite. A better approach is to retry every assertion a fixed number of times, with a small delay between retries. The `p-retry` module from npm makes this very easy.
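Continuing the sketch above, an action followed by a retried assertion could look like this (the button and list selectors are again hypothetical):

```typescript
import assert from 'assert'
import pRetry from 'p-retry'
import { Page } from 'puppeteer'

// Click a button, then retry the assertion with small, bounded delays until the
// UI has caught up, instead of sleeping for a fixed, arbitrary amount of time.
async function addRepositoryAndAssertItAppears(page: Page, repoName: string): Promise<void> {
    await page.click('.e2e-add-repository-button')
    await pRetry(
        async () => {
            const names = await page.$$eval('.e2e-repository-name', elements =>
                elements.map(element => element.textContent)
            )
            assert.ok(names.includes(repoName), `repository ${repoName} is not visible yet`)
        },
        // Up to 5 retries with a short delay between attempts.
        { retries: 5, minTimeout: 500 }
    )
}
```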
Mocha’s `--retries` option is also helpful to prevent flakiness, but be aware that this might hide actual failures that only happen a fraction of the time.
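Besides the CLI flag, the same behavior can be scoped to a single suite or test from inside the code, which keeps the blast radius of retries small - a rough sketch:

```typescript
describe('Occasionally flaky area', function () {
    // Rerun failing tests in this suite up to 2 additional times before
    // reporting them as failed - use sparingly, for the reason mentioned above.
    this.retries(2)

    it('eventually passes', async () => {
        // ... test body ...
    })
})
```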
To make sure end-to-end tests are not accidentally broken, we use special CSS classes prefixed with `e2e` for elements that are asserted on in tests.
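For example, a component can carry a dedicated test class next to its styling classes, and the test selects only by that class so that styling refactors don’t silently break the selector (the class name here is made up):

```typescript
// In the component's markup - "e2e-sign-in-button" exists only for tests:
//   <button className="btn btn-primary e2e-sign-in-button">Sign in</button>

// In the test (inside an async test body) - select by the e2e class, never by styling classes:
await page.waitForSelector('.e2e-sign-in-button')
await page.click('.e2e-sign-in-button')
```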
Giving insight into failures with screenshots
If a test fails, we tell Puppeteer to save a screenshot to disk, named after the test:
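A sketch of how this can look in a Mocha `afterEach` hook, assuming the Puppeteer page from the earlier sketch is reachable through a small accessor:

```typescript
import { Page } from 'puppeteer'

// Register a Mocha afterEach hook that captures a screenshot for failed tests.
export function saveScreenshotOnFailure(getPage: () => Page): void {
    afterEach(async function () {
        // Mocha exposes the test that just ran as `this.currentTest`.
        if (this.currentTest && this.currentTest.state === 'failed') {
            // e.g. "Search shows results for a simple query.png"
            const fileName = this.currentTest.fullTitle().replace(/\//g, '-') + '.png'
            await getPage().screenshot({ path: fileName })
        }
    })
}
```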
In the pipeline, we configured all `.png` files to be uploaded as artifacts to Buildkite (the `artifact_paths` setting in the sketch above). We then use Buildkite’s special ANSI escape sequence to make it display the screenshot right in the log output of the test failure:
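This is just one more line printed from the same hook - the exact escape sequence format is described in Buildkite’s documentation on embedding images in build output, and `fileName` is the screenshot file from the sketch above:

```typescript
// Ask Buildkite to render the uploaded artifact inline in the log output.
// The "1338" escape sequence is Buildkite's image-embedding extension.
console.log(`\u001B]1338;url='artifact://${fileName}';alt='Screenshot of failed test'\u0007`)
```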

This is incredibly valuable for revealing why a test failed - for example, did only a button not appear, or is its whole parent component not rendered? Another benefit is that real failures are less likely to be dismissed as flakiness, because the screenshot serves as proof that something is truly wrong.
Conclusion
So far, this new end-to-end testing system has worked well for us - confidence in the tests has increased, and engineers feel more responsibility to fix failures quickly.
Do you have interesting end-to-end testing stories? Tweet us @sourcegraph!