In This Article
Android QA looks different from the spreadsheet-driven test plan most teams used three years ago. The fragmentation tax is real but manageable: Android 14, Android 15, and Android 16 cover 85 percent of active devices, Compose has stabilized as the UI default, and the testing tooling around it has caught up. Maestro, Compose UI Test, Firebase Test Lab, and Roborazzi together cover what used to take three or four separate frameworks. Google's official Android developer documentation hosts the canonical testing pyramid this article maps to.
Below are the seven steps modern Android teams follow to ship reliable releases, with the current 2026 tooling and the practical thresholds we see in well-run teams.
TL;DR
The pick: Run unit tests on every commit (target 80 percent coverage on shared logic, 50 percent on UI). Run integration and visual regression on every merge. Run a full Firebase Test Lab matrix nightly.
Runner-up: Use Maestro for end-to-end and Compose UI Test for screen-level UI testing. Both have settled as the 2026 defaults.
Skip if: the plan is device-farm cloud testing on every PR; the cost-benefit only pencils out at nightly or pre-release cadence for most teams.
Step 1: Unit tests with mocking and coroutines support
JUnit 5 plus MockK plus Turbine for Flow assertions is the 2026 default unit test stack. Target 80 percent coverage on shared business logic in the data and domain layers and 50 percent on UI logic. The right ratio depends on the codebase, but those are the thresholds we see at well-run teams.
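To make that concrete, here is the shape of a typical test in this stack: a ViewModel exposing a Flow, a MockK-stubbed repository, and Turbine asserting each emission in order. CartViewModel, CartRepository, and the state types are hypothetical stand-ins, defined inline so the sketch compiles:

```kotlin
import app.cash.turbine.test
import io.mockk.coEvery
import io.mockk.mockk
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.test.runTest
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test

// Minimal stand-ins so the sketch compiles; real ones live in production sources.
data class Cart(val items: Int)
sealed interface CartState {
    data object Loading : CartState
    data class Loaded(val itemCount: Int) : CartState
}
interface CartRepository { suspend fun fetchCart(): Cart }
class CartViewModel(private val repository: CartRepository) {
    val state: Flow<CartState> = flow {
        emit(CartState.Loading)
        emit(CartState.Loaded(repository.fetchCart().items))
    }
}

class CartViewModelTest {
    private val repository: CartRepository = mockk()

    @Test
    fun `emits Loading then Loaded when fetch succeeds`() = runTest {
        // MockK stubs the suspend call; Turbine walks the Flow emission by emission.
        coEvery { repository.fetchCart() } returns Cart(items = 2)

        CartViewModel(repository).state.test {
            assertEquals(CartState.Loading, awaitItem())
            assertEquals(CartState.Loaded(itemCount = 2), awaitItem())
            awaitComplete()
        }
    }
}
```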
Run the unit test suite on every push, blocking on coverage regressions. With Gradle’s configuration cache and parallel execution, a 3000-test suite runs in under 90 seconds on a Compose-default project.
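What blocking on coverage regressions can look like in Gradle, assuming JaCoCo as the coverage tool (Kover works much the same way); Android modules typically need a custom report task feeding this check, so treat it as a sketch of the gate rather than drop-in config:

```kotlin
// build.gradle.kts: fail the build, and therefore the push, on a coverage drop.
plugins {
    jacoco
}

tasks.withType<JacocoCoverageVerification>().configureEach {
    violationRules {
        rule {
            limit {
                // The 80 percent line-coverage target for shared business logic.
                minimum = "0.80".toBigDecimal()
            }
        }
    }
}
```

Hook the verification task into the same CI job that runs the unit suite, so a coverage drop fails the commit rather than surfacing at release time.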
Step 2: Compose UI Test for screen-level testing
Compose UI Test (androidx.compose.ui.test) has replaced Espresso for new projects. Screen-level tests run on Robolectric for speed (no emulator needed) and produce identical results to instrumented tests for the vast majority of cases.
Roborazzi adds screenshot testing on top of Compose UI Test. Capture a screenshot per screen state, commit the baseline images to the repo, and fail any pull request that changes the rendering without intent. This single addition catches an enormous share of regressions before they reach manual QA.
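A minimal screenshot test, assuming the Roborazzi and Robolectric dependencies are wired up; the composable content here is a stand-in for a real screen:

```kotlin
import androidx.compose.material3.Text
import androidx.compose.ui.test.junit4.createComposeRule
import androidx.compose.ui.test.onRoot
import androidx.test.ext.junit.runners.AndroidJUnit4
import com.github.takahirom.roborazzi.captureRoboImage
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith
import org.robolectric.annotation.GraphicsMode

// Robolectric's native graphics mode renders real pixels on the JVM, no emulator.
@RunWith(AndroidJUnit4::class)
@GraphicsMode(GraphicsMode.Mode.NATIVE)
class CartScreenScreenshotTest {

    @get:Rule
    val composeRule = createComposeRule()

    @Test
    fun cartScreen_emptyState() {
        composeRule.setContent {
            // Stand-in content; point this at your real screen composable.
            Text("Your cart is empty")
        }
        // Records or verifies against the committed baseline image,
        // depending on which Roborazzi Gradle task is running.
        composeRule.onRoot().captureRoboImage()
    }
}
```

The Roborazzi Gradle plugin splits recording and verifying into separate tasks, so CI verifies against the baseline while developers re-record intentional changes.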
Step 3: Integration tests on real flows
Integration tests live between unit and end-to-end: real ViewModel plus real repository plus a fake or in-memory data source. The goal is to catch wiring bugs (DI mistakes, navigation issues, state propagation) without the brittleness of full UI automation.
Hilt's @TestInstallIn plus Compose UI Test gives you a working integration stack with minimal ceremony. Aim for one integration test per significant user journey.
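A sketch of that wiring using Hilt's standard testing artifacts; DataModule, CartRepository, FakeCartRepository, and MainActivity are hypothetical names standing in for your own:

```kotlin
import androidx.compose.ui.test.assertIsDisplayed
import androidx.compose.ui.test.junit4.createAndroidComposeRule
import androidx.compose.ui.test.onNodeWithContentDescription
import androidx.compose.ui.test.onNodeWithText
import androidx.compose.ui.test.performClick
import dagger.Module
import dagger.Provides
import dagger.hilt.android.testing.HiltAndroidRule
import dagger.hilt.android.testing.HiltAndroidTest
import dagger.hilt.components.SingletonComponent
import dagger.hilt.testing.TestInstallIn
import org.junit.Rule
import org.junit.Test

// Swap the production DataModule for an in-memory fake across the whole test APK.
@Module
@TestInstallIn(
    components = [SingletonComponent::class],
    replaces = [DataModule::class]
)
object FakeDataModule {
    @Provides
    fun provideCartRepository(): CartRepository = FakeCartRepository()
}

@HiltAndroidTest
class CheckoutJourneyTest {

    // The Hilt rule must run before the Compose rule launches the activity.
    @get:Rule(order = 0)
    val hiltRule = HiltAndroidRule(this)

    @get:Rule(order = 1)
    val composeRule = createAndroidComposeRule<MainActivity>()

    @Test
    fun addToCart_updatesCartBadge() {
        // Real ViewModel and repository interface, fake data source underneath.
        composeRule.onNodeWithText("Add to cart").performClick()
        composeRule.onNodeWithContentDescription("Cart, 1 item").assertIsDisplayed()
    }
}
```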
Step 4: End-to-end tests with Maestro
Maestro has consolidated the end-to-end testing market. Its YAML-defined flows exercise the real app on a real device or emulator, and the flake rate is roughly 10x lower than Appium's for equivalent flows.
Run end-to-end against three to five critical journeys per release: signup or sign-in, the primary feature flow, payment if applicable, and settings rotation. Resist the urge to E2E-test everything; the maintenance cost grows superlinearly.
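A flow file is an app header plus a list of steps; Maestro's format is YAML rather than Kotlin, and the appId, labels, and credentials below are placeholders:

```yaml
# signin.yaml: run locally or in CI with `maestro test signin.yaml`
appId: com.example.app
---
- launchApp
- tapOn: "Sign in"
- tapOn: "Email"
- inputText: "qa@example.com"
- tapOn: "Password"
- inputText: "not-a-real-password"
- tapOn: "Continue"
- assertVisible: "Welcome back"
```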
Step 5: Device matrix testing on Firebase Test Lab
Firebase Test Lab covers a representative cross-section of Android devices, including Samsung One UI variations, Xiaomi MIUI, and the Pixel reference behavior. Run your full integration and end-to-end suite on a 12-device matrix nightly: latest Pixel, latest Samsung flagship, mid-range Samsung A series, a Xiaomi or OnePlus, an older Pixel 6 for baseline, and a tablet form factor.
Per-PR device matrix testing is rarely cost-effective. Nightly is the sweet spot; pre-release is mandatory.
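A nightly CI job can drive the matrix through the gcloud CLI. The APK paths and device model IDs below are illustrative; list the currently available models with `gcloud firebase test android models list`:

```bash
# Two of the twelve matrix entries shown; add one --device flag per slot.
gcloud firebase test android run \
  --type instrumentation \
  --app app/build/outputs/apk/debug/app-debug.apk \
  --test app/build/outputs/apk/androidTest/debug/app-debug-androidTest.apk \
  --device model=oriole,version=34 \
  --device model=b0q,version=34 \
  --timeout 30m
```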
Step 6: Performance and memory profiling
Macrobenchmark and Microbenchmark together give you reliable performance regression detection. Baseline Profiles compile critical paths ahead of time and meaningfully reduce cold start time on lower-end devices.
Set startup time budgets in CI (cold start under 1.5 seconds on Pixel 6a, 2.5 seconds on a mid-range Samsung) and block regressions. Memory leak detection with LeakCanary in debug builds catches the vast majority of leaks before they ship.
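A cold-start Macrobenchmark is compact; the sketch below assumes a separate benchmark module, and the package name is a placeholder. StartupTimingMetric reports time to initial display per iteration, which CI can compare against the budget:

```kotlin
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class ColdStartupBenchmark {

    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun coldStartup() = benchmarkRule.measureRepeated(
        packageName = "com.example.app", // placeholder: the app under test
        metrics = listOf(StartupTimingMetric()),
        iterations = 5,
        startupMode = StartupMode.COLD
    ) {
        // COLD kills the process between iterations, then times launch to first frame.
        pressHome()
        startActivityAndWait()
    }
}
```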
Step 7: Pre-release manual QA and exploratory testing
Automated testing covers regression. Manual QA covers exploratory testing, accessibility validation, and the user-experience nuance that automation misses. Plan two to three days of manual QA per release on a 12-day cadence; less on shorter cadences.
Run TalkBack screen reader testing on every release. Accessibility regressions are easy to introduce and easy to miss without explicit testing.
The setup, step by step
1. Unit tests on every commit: JUnit 5 plus MockK plus Turbine; block on coverage regressions.
2. Compose UI tests with screenshot baselines: Compose UI Test plus Roborazzi; commit screenshot baselines to the repo.
3. Integration tests on critical wiring: Hilt @TestInstallIn plus Compose UI Test; one test per significant journey.
4. End-to-end tests with Maestro: three to five critical user journeys, run on every merge.
5. Device matrix on Firebase Test Lab nightly: 12-device matrix covering Pixel, Samsung, mid-range, older baseline, and tablet.
6. Performance benchmarks in CI: Macrobenchmark plus Baseline Profiles; block on cold-start regressions.
7. Pre-release manual QA plus TalkBack: two to three days of exploratory and accessibility validation per release.
FAQ
How much QA time is reasonable as a percentage of dev time?
For a well-instrumented team, automated QA is roughly 15 to 25 percent of development effort and manual QA is another 5 to 10 percent. Less than that and bug escape rates climb noticeably.
What about Espresso for older projects?
Espresso still works, but Compose UI Test plus Robolectric runs faster and covers Compose components more completely. New tests should use Compose UI Test; existing Espresso suites are worth migrating opportunistically rather than wholesale.
Is on-device testing on real hardware still worth it?
Yes, for OEM-specific behavior. Samsung One UI, Xiaomi MIUI, and OnePlus OxygenOS all have quirks that emulators miss. Firebase Test Lab’s real device pool covers the meaningful cases.
Should we run accessibility tests in CI?
Yes. Compose UI Test plus the Accessibility Test Framework catches a useful slice of issues. Manual TalkBack passes catch the rest. Both are worth the effort.
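One way to wire the Accessibility Test Framework into an instrumented suite is Espresso's AccessibilityChecks hook, sketched below; recent Compose UI Test versions also ship their own integration. SettingsScreenTest is a hypothetical test class:

```kotlin
import androidx.test.espresso.accessibility.AccessibilityChecks
import org.junit.BeforeClass

class SettingsScreenTest {
    companion object {
        @BeforeClass
        @JvmStatic
        fun enableAccessibilityChecks() {
            // Every subsequent view interaction runs ATF checks; findings such as
            // undersized touch targets or missing labels fail the test.
            AccessibilityChecks.enable().setRunChecksFromRootView(true)
        }
    }

    // Ordinary UI tests follow; no per-test changes are needed.
}
```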
The verdict
Modern Android QA is more automated, more reliable, and considerably more affordable than it was in 2022. The teams that ship cleanly run unit, integration, and end-to-end on every change, do device matrix nightly, profile performance on every release, and reserve human QA for the things automation cannot cover. Get the seven steps wired up once and they pay back every sprint after.