Lowe’s is a nearly $90B home improvement retailer that operates about 2,200 stores and employs more than 300,000 associates. By building an automated testing and monitoring system that prevents performance regressions from deploying to production, Lowe’s Site Speed Team was able to improve its website performance, ranking among the top retail sites.
The Site Speed Team’s goal is to make the Lowe’s site one of the fastest e-commerce sites in terms of page load performance. Before they built their automated testing and monitoring system, Lowe’s website developers were unable to measure performance automatically in pre-production environments. Existing tools only conducted tests in the production environment. As a result, inferior builds slipped into production, creating a poor user experience. These inferior builds would remain in production until they were detected by the Site Speed Team and reverted by the author.
The Site Speed Team used open source tools to build an automated performance testing and monitoring system for pre-production environments. The system measures the performance of every pull request (PR) and gates the PR from shipping to production if it does not meet the Site Speed Team’s performance budget and metric criteria. The system also measures SEO and ADA compliance.
In a sample of one team that deployed 102 builds over 16 weeks, the automated performance testing and monitoring system kept 32 builds with subpar performance (roughly 31%) out of production.
Previously, the Site Speed Team took three to five days to tell developers that they had shipped performance regressions into production. The system now notifies developers of performance problems automatically, five minutes after they submit a pull request in a pre-production environment.
Code quality is improving over time: fewer pull requests are being flagged for performance regressions. The Site Speed Team is also gradually tightening governance budgets to continuously improve site quality.
More broadly, clear ownership of problematic code has shifted the engineering culture. Before, it was never clear who had introduced a problem, so corrections were reactive and made begrudgingly. Now that each regression can be objectively attributed to a specific change, the team can make proactive optimizations.
The heart of the Site Speed Governance (SSG) app is Lighthouse CI. The SSG app uses Lighthouse to validate and audit the page performance of every pull request.
The SSG app causes a build to fail if the Site Speed Team’s defined performance budget and metric targets are not reached. It enforces not only load performance but also SEO, PWA, and accessibility. It can report status immediately to authors, reviewers, and SRE teams. It can also be configured to bypass the checks when exceptions are needed.
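The gating rule described above can be sketched as follows. This is a minimal illustration, not the SSG app's actual code: the metric names, the budget values, the `GateResult` type, and the `bypass` flag are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool
    violations: list[str]


def evaluate_gate(metrics: dict[str, float],
                  budgets: dict[str, float],
                  bypass: bool = False) -> GateResult:
    """Fail the build when any measured metric exceeds its budget.

    Metric names and limits are hypothetical; lower values are better.
    """
    violations = [
        f"{name}: {metrics[name]} > {limit}"
        for name, limit in budgets.items()
        if name in metrics and metrics[name] > limit
    ]
    # An approved exception lets the build through but keeps the report.
    return GateResult(passed=bypass or not violations, violations=violations)


# Hypothetical budget: LCP in milliseconds, total JavaScript in kilobytes.
budgets = {"largest-contentful-paint": 2500, "script-kb": 300}
result = evaluate_gate({"largest-contentful-paint": 3100, "script-kb": 280}, budgets)
```

Keeping the violations in the result even when `bypass` is set means an exception still produces a report for authors and reviewers.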
Automated Speed Governance (ASG) process flow
Start point. A developer merges their code into a pre-production environment.
- Deploy the pre-production environment with CDN assets.
- Check for the successful deployment.
- If the deployment succeeded, run a Docker container to start building the ASG application. If it failed, send a notification.
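The deployment check and the branch that follows it might look like the sketch below. The `probe` callable, the retry count, and the step names are hypothetical; the source does not say how the pipeline detects a successful deployment.

```python
import time
from typing import Callable


def wait_for_deployment(probe: Callable[[], bool],
                        attempts: int = 5,
                        delay_seconds: float = 0.0) -> bool:
    """Poll `probe` until it reports success or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


def next_step(deployed: bool) -> str:
    """Choose the pipeline branch: build on success, notify on failure."""
    return "start-asg-build" if deployed else "send-failure-notification"


# Simulated probe that succeeds on the third check.
responses = iter([False, False, True])
step = next_step(wait_for_deployment(lambda: next(responses)))
```

In a real pipeline the probe would be something like a health-check request against the pre-production environment; it is injected here so the logic can be exercised without a network.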
Jenkins and Lighthouse
- Build the ASG application with Jenkins.
- Run a custom Docker container that has Chrome and Lighthouse installed. Pull `lighthouserc.json` from the SSG app and run `lhci autorun --collect-url=https://example.com`.
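As a rough sketch, a Jenkins step could invoke Lighthouse CI from a small wrapper like the one below. The function names are hypothetical, and passing the pulled config file through `--config` is an assumption about how the pipeline wires things together; the source only shows the `--collect-url` flag.

```python
import subprocess


def build_lhci_command(url: str, config_path: str = "lighthouserc.json") -> list[str]:
    """Assemble the argument list for `lhci autorun` against one URL."""
    return ["lhci", "autorun", f"--config={config_path}", f"--collect-url={url}"]


def run_lhci(url: str, config_path: str = "lighthouserc.json") -> int:
    """Run Lighthouse CI and return its exit code.

    A non-zero code is treated here as a failed check; the caller decides
    whether that should fail the build.
    """
    completed = subprocess.run(build_lhci_command(url, config_path), check=False)
    return completed.returncode


command = build_lhci_command("https://example.com")
```

Building the argument list separately from running it keeps the command easy to log and to test without Chrome installed.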
Jenkins and SSG app
- Take `assertion-results.json` from lhci and compare it to the predefined budgets in `budgets.json`. Save the output as a text file and upload it to Nexus for future comparisons.
- Compare the current `assertion-results.json` to the one from the last successful build (downloaded from Nexus) and save the comparison as a text file.
- Build an HTML email with the success or failure information.
- Send the email to the relevant distribution lists with Jenkins.
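The build-to-build comparison and the email body from the steps above can be sketched as follows. This is an illustration under stated assumptions: each assertion result is taken to be a dictionary with `auditId`, `actual`, and `passed` keys, and the helper names, sample values, and build ID are hypothetical. Sending the email is left to Jenkins, as in the flow above.

```python
import html


def find_regressions(current: list[dict], previous: list[dict]) -> list[str]:
    """Return audit IDs that passed in the previous build but fail now."""
    previously_passing = {r["auditId"] for r in previous if r["passed"]}
    return sorted(r["auditId"] for r in current
                  if not r["passed"] and r["auditId"] in previously_passing)


def build_email_html(build_id: str, current: list[dict], regressions: list[str]) -> str:
    """Render a small HTML summary of the build's assertion results."""
    failed = [r for r in current if not r["passed"]]
    status = "FAILED" if failed else "PASSED"
    rows = "".join(
        f"<tr><td>{html.escape(r['auditId'])}</td><td>{r['actual']}</td>"
        f"<td>{'pass' if r['passed'] else 'fail'}</td></tr>"
        for r in current
    )
    regressed = html.escape(", ".join(regressions)) or "none"
    return (f"<h1>Build {html.escape(build_id)}: {status}</h1>"
            f"<p>New regressions: {regressed}</p>"
            f"<table>{rows}</table>")


# Hypothetical results; real ones would be loaded with json.load().
previous = [
    {"auditId": "first-contentful-paint", "actual": 1800, "passed": True},
    {"auditId": "total-byte-weight", "actual": 900000, "passed": True},
]
current = [
    {"auditId": "first-contentful-paint", "actual": 2600, "passed": False},
    {"auditId": "total-byte-weight", "actual": 880000, "passed": True},
]
regressions = find_regressions(current, previous)
email = build_email_html("example-build", current, regressions)
```

Comparing against the last successful build, rather than only against the budget, makes it possible to tell a newly introduced failure from one that was already present.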