Writing Tests in C++
Unit Tests vs Integration Tests (Browser Tests, End-to-End Tests)
It's important to be aware of the purpose of each kind of test.
Unit tests are meant to test individual components ("units", typically classes). When a unit test fails, it should direct you to the exact component that needs fixing. Unit tests should be very fast to run, since they require only a minimal test environment.
Integration tests are meant to test interactions between components. They tend to be larger, more complicated tests, and are slower to run. Chrome has several kinds of integration tests, most importantly browser tests and end-to-end (E2E) tests.
Browser tests, unlike unit tests, run inside a browser process instance and are attached to a window for rendering. These are most often used for testing UIs, but have other uses as well.
E2E tests are automated tests performed on actual Chromebooks. A complete ChromeOS build is flashed to the hardware and a test framework called Tast is used to run tests against the device. Multiple devices, including both Chromebooks and Android phones, can be tested in tandem, allowing for full tests of Bluetooth interactions, for example. These tests are by far the slowest and flakiest kind of test, but they can detect a wide array of problems that would not show up in other kinds of tests.
This just scratches the surface. Check out these resources for even more kinds of tests:
Tips for writing good unit tests
Test individual components in isolation
When testing a complicated component with many dependencies, it can be tempting to use real components to satisfy the dependencies. This tends to produce lower quality quasi-integration tests. You're no longer testing just the component under test, but also its dependencies. When a breaking change occurs, many tests may fail at once, and the tests are both more cumbersome to maintain and less informative when they break.
The way to fix this problem is to use fakes. As a bonus, code written in this way tends to be more modular and flexible.
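For example, here is a minimal sketch (all of the names, such as `NetworkClient` and `UploadScheduler`, are hypothetical): a fake implementation of a dependency is injected into the component under test, so the test exercises only that component.
class FakeNetworkClient : public NetworkClient {
 public:
  // Records payloads instead of touching the real network.
  void Send(const std::string& payload) override {
    sent_payloads_.push_back(payload);
  }
  const std::vector<std::string>& sent_payloads() const {
    return sent_payloads_;
  }

 private:
  std::vector<std::string> sent_payloads_;
};

TEST(UploadSchedulerTest, FlushSendsQueuedPayloads) {
  FakeNetworkClient fake_client;
  // The dependency is injected, so only UploadScheduler's own logic is tested.
  UploadScheduler scheduler(&fake_client);
  scheduler.Queue("hello");
  scheduler.Flush();
  EXPECT_EQ(1u, fake_client.sent_payloads().size());
}
Because the fake simply records what it was asked to do, a change in the real network layer cannot break this test.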
Test the API, not the implementation
Unit tests should focus on the public API of the component rather than testing the internal implementation. This has several benefits:
- The internal implementation can change without breaking the tests. As long as the component maintains the same semantics in its public API, you are free to use any algorithm or data structure internally to achieve the result, and the tests just work.
- The tests provide a good example of how to use the component. Unlike documentation, tests exercising the API are never out-of-date.
- You don't need to find ways to circumvent access modifiers like `private` or `protected` in the tests.
- The tests function more similarly to how the code is used in production.
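For example, here is a minimal sketch with a hypothetical WordCounter class; the test only exercises its public interface and does not care how the counting is implemented internally:
TEST(WordCounterTest, CountsWordsSeparatedBySpaces) {
  WordCounter counter;
  // Only public methods are used; the internal algorithm is free to change.
  EXPECT_EQ(3, counter.Count("one two three"));
  EXPECT_EQ(0, counter.Count(""));
}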
Test coverage
"Test coverage" or "code coverage" refers to the percentage of lines in a file that are executed by unit tests. If a file has low test coverage, then that is a strong indication that the code in question needs to be tested more thoroughly.
Note: The inverse is not always true. A file with high test coverage may not be well-tested, since test coverage does not care what you're testing for, only that the code is run by tests.
Test coverage is a somewhat crude metric for testing quality, but it has the virtue of being easy to understand and calculate. For this reason, it is the primary method for gauging whether the team is performing adequate testing, and efforts to improve "operational excellence" will often target this metric.
How much test coverage is enough?
As a rough guideline, aim for at least 80%, preferably 90% test coverage.
Trying for more than 90% test coverage is often counterproductive. It is often the case that code will contain some lines that are never meant to be run under normal circumstances. These should be guarded with `NOTREACHED()`, `LOG(FATAL)`, etc. Tests that target this kind of code will usually require a lot of abuse of the code under test to even make it possible for these lines to be run, and they don't add much value over a simple `CHECK()` statement.
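For example, in this hypothetical enum-to-string helper, the line after the switch is only reachable if a caller passes an invalid enum value, and contorting a test to cover it would add little value:
std::string ColorToString(Color color) {
  switch (color) {
    case Color::kRed:
      return "red";
    case Color::kBlue:
      return "blue";
  }
  // Not expected to run in production; NOTREACHED() is assumed to be
  // [[noreturn]] here, so no trailing return statement is needed.
  NOTREACHED();
}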
In some cases, e.g. short boilerplate files, getting to 80% test coverage can be difficult or unreasonable simply because there isn't anything worth testing. This isn't necessarily a problem, so long as you are able to justify it to your CL reviewer.
That said, experienced CL reviewers will be suspicious of code with test coverage less than 80%, and will usually request that you write more tests. It's better to be proactive and write thorough tests for your code before sending it out for review.
Checking test coverage in Gerrit
Gerrit automatically calculates and displays test coverage for each file in a CL. All you need to do is click the "DRY RUN" button and wait for the tests to finish running. It's best to do this before sending a CL for review, since your reviewer will want to know that the tests are passing and see the test coverage.
For each file, Gerrit displays two types of coverage, absolute coverage (|Cov|) and incremental coverage (ΔCov). Absolute coverage is based on all lines in the file, regardless of whether they have been changed in the CL or not. Incremental coverage only considers the lines that have been changed. For the purpose of reviewing a CL, incremental coverage matters most. It's unreasonable to expect the author to write tests for code unrelated to the change.
There is also a "Code Coverage Check" warning that will be displayed if test coverage is below 70%.
Finally, Gerrit color-codes individual lines so that you can see whether they have test coverage or not. This makes it easy to decide what tests to write in order to increase coverage.
Looking at test coverage by directory
The code coverage dashboard (go/chrome-coverage) is a good way to find up-to-date test coverage information by file or by directory. Be sure to select "ChromeOS" for the platform.
Looking at test coverage using CL hashtags
When developing a new feature, we often want to get a quick idea of overall test coverage for the feature without having to tease apart which directories were affected by the new feature, and which were not. This handy PLX dashboard (go/chrome-feature-code-coverage) allows you to specify a Gerrit hashtag and see the incremental test coverage for all CLs with that tag.
For this to work, you need to apply Gerrit hashtags to each of your CLs. This is most easily done by agreeing with your team on what hashtag to use before beginning development.
There are two ways to set hashtags:
- When uploading a new CL, put the hashtag in the title of the CL enclosed with square brackets. On first upload, Depot Tools will parse the title for tags and apply them automatically.
- Inside Gerrit, the left pane has a "Hashtags" field where you can change the hashtags later on. You may need to click the "SHOW ALL" button for this field to be displayed.
WARNING: Changing a CL's title after the first upload will not add/remove hashtags from the CL. It only works on the very first `git cl upload`.
Common testing patterns
Friending the tests
Using the `friend` keyword is a way to give another class access to the private members of a class. It should usually be avoided since it's a sign of poor design, but it can be handy for writing tests against helper functions or for setting private members of a class in a test. Just be aware that it's better style to first try to get the test working using only public members.
The pattern looks like this:
class ClassUnderTest {
private:
friend class ClassUnderTestFixture;
int PrivateHelperFunction(bool arg);
};
// Unit tests
class ClassUnderTestFixture : public testing::Test {
 public:
  int CallPrivateHelperFunction(bool arg) {
    // We're allowed to call this private function because this test fixture
    // was declared to be a friend of ClassUnderTest.
    return class_under_test_.PrivateHelperFunction(arg);
  }

 private:
  ClassUnderTest class_under_test_;
};
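A test can then reach the private helper through the fixture's public wrapper, for example (the expected value is purely illustrative):
TEST_F(ClassUnderTestFixture, PrivateHelperHandlesTrue) {
  // Goes through the wrapper above, which is a friend of ClassUnderTest.
  EXPECT_EQ(42, CallPrivateHelperFunction(true));
}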
Testing asynchronous code
Asynchronous work and task runners
Task runners are used to post tasks to be executed asynchronously (see Chromium documentation to learn more). We also encounter asynchronous code in the form of methods that accept completion callbacks, or when working with Mojo.
Asynchronous code and continuation passing style
Asynchronous code in Chromium is most often written in continuation passing style, e.g.
void DoTheThing(int arg1, bool arg2, base::OnceClosure callback);
The idea here is that the function can return quickly after beginning the (potentially long-running) operation, but the operation hasn't actually been completed until `callback` has been called.
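For illustration, a hypothetical implementation might hand the work off to the thread pool (`DoTheExpensiveWork()` is assumed to exist), which is exactly why nothing has completed by the time `DoTheThing()` returns:
void DoTheThing(int arg1, bool arg2, base::OnceClosure callback) {
  // The work runs on the thread pool; |callback| is invoked on the calling
  // sequence only after the work is done.
  base::ThreadPool::PostTaskAndReply(
      FROM_HERE,
      base::BindOnce(&DoTheExpensiveWork, arg1, arg2),
      std::move(callback));
}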
We often need to write unit tests for functions like this. Naively, you might consider just creating a callback and checking that it was called:
TEST(TestExamples, DoTheThing) {
bool callback_called = false;
base::OnceClosure callback = base::BindLambdaForTesting(
[&callback_called]() {
callback_called = true;
}
);
DoTheThing(1, true, std::move(callback));
EXPECT_TRUE(callback_called); // Might not succeed !!
}
There are a couple of problems with this example:
1. Lambdas introduce some boilerplate and can be difficult to read.
2. Depending on whether `DoTheThing()` posts tasks to the thread pool, the callback may not be called until after `EXPECT_TRUE(callback_called)`.
We can fix (2) by adding a call to `RunLoop::RunUntilIdle()`, but this makes the tests flaky (see below) and should be avoided, and it doesn't help with (1).
TEST(TestExamples, DoTheThing) {
bool callback_called = false;
base::OnceClosure callback = base::BindLambdaForTesting(
[&callback_called]() {
callback_called = true;
}
);
DoTheThing(1, true, std::move(callback));
  base::RunLoop().RunUntilIdle();  // AVOID DOING THIS !!
EXPECT_TRUE(callback_called);
}
A better solution is to use `TestFuture`.
TestFuture
To test asynchronous code, consider using a TestFuture. This allows you to wait for a return value from an asynchronous method using a very concise syntax, for example:
base::test::TestFuture<ResultType> future;
object_under_test.DoSomethingAsync(future.GetCallback());
const ResultType& actual_result = future.Get();
`TestFuture::Get()` will synchronously block the thread until a result is available, similar to if you had created a `RunLoop` (see below).
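Putting it together, a complete test for the `DoTheThing()` example above might look like this (a sketch: `TestFuture<void>` is the specialization for callbacks that take no arguments, and the `TaskEnvironment` is what lets the future pump the message loop while it waits):
TEST(TestExamples, DoTheThingWithTestFuture) {
  base::test::TaskEnvironment task_environment;

  base::test::TestFuture<void> future;
  DoTheThing(1, true, future.GetCallback());

  // Wait() blocks until the completion callback has been run, so there is no
  // race with any expectations that follow.
  EXPECT_TRUE(future.Wait());
}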
See the documentation in test_future.h for more details on usage.
RunLoops - Prefer QuitClosure()+Run() to RunUntilIdle()
Another recommended option (per the Chromium style guide) is to use `base::RunLoop`. A `RunLoop` runs the message loop so that asynchronous work can complete and its effects can be verified; alternatively, a task runner can be injected so tests can control where tasks are run. Chromium best practice for these types of tests is as follows: when writing a unit test for asynchronous logic, prefer `base::RunLoop`'s `QuitClosure()` and `Run()` methods to target precisely which ongoing tasks you want to wait for to finish, instead of relying on `RunUntilIdle()` to let all tasks finish. As per the documentation, `RunUntilIdle()` can cause flaky tests for the following reasons:
- May run long (flakily time out) and even never return
- May return too early. For example, if used to run until an incoming event has occurred but that event depends on a task in a different queue -- e.g. another TaskRunner or a system event.
`QuitClosure()` and `Run()` also provide the benefit of being able to block on specific conditions.
To use `QuitClosure()` + `Run()`:
- Pass the `QuitClosure()` into the async call that is being tested.
- `Run()` the closure when the task is expected to be run.
- Explicitly call the blocking `Run()` method on the `base::RunLoop` to guarantee that the test will not progress until the quit closure is invoked.
- Verify the results are expected.
Example 1: Binding the `QuitClosure()` to a callback
void OnGetGroupPrivateKeyStatus(base::OnceClosure callback,
GroupPrivateKeyStatus status) {
get_group_private_key_status_response_ = status;
std::move(callback).Run();
}
TEST_F(DeviceSyncClientImplTest, TestGetGroupPrivateKeyStatus) {
SetupClient();
base::RunLoop run_loop;
client_->GetGroupPrivateKeyStatus(
base::BindOnce(&DeviceSyncClientImplTest::OnGetGroupPrivateKeyStatus,
base::Unretained(this), run_loop.QuitClosure()));
SendPendingMojoMessages();
fake_device_sync_->InvokePendingGetGroupPrivateKeyStatusCallback(
expected_status);
run_loop.Run();
EXPECT_EQ(expected_status, get_group_private_key_status_response_);
}
Example 2: Using base::BindLambdaForTesting
TEST_F(NearbyPresenceTest, UpdateRemoteSharedCredentials_Success) {
std::vector<mojom::SharedCredentialPtr> creds = CreateSharedCredentials();
base::RunLoop run_loop;
nearby_presence_->UpdateRemoteSharedCredentials(
std::move(creds), kAccountName,
base::BindLambdaForTesting([&](mojom::Status status) {
EXPECT_EQ(mojom::Status::kOk, status);
auto creds = fake_presence_service_->GetRemoteSharedCredentials();
EXPECT_FALSE(creds.empty());
EXPECT_EQ(3u, creds.size());
EXPECT_EQ(std::string(kSecretId1.begin(), kSecretId1.end()),
creds[0].secret_id());
EXPECT_EQ(std::string(kSecretId2.begin(), kSecretId2.end()),
creds[1].secret_id());
EXPECT_EQ(std::string(kSecretId3.begin(), kSecretId3.end()),
creds[2].secret_id());
run_loop.Quit();
}));
run_loop.Run();
}
Testing with Multiple Feature Flags
The most robust way to test a class affected by feature flags is to run all unit tests under every combination of the relevant feature flags being enabled/disabled. This is easily achieved using Value-Parameterized Tests.
The idea is to pass feature bit masks as parameters to the test suite, in which each bit represents a feature flag being enabled (`1`) or disabled (`0`). Every combination of flags can be represented with `num_flags` bits by counting up from `0` to `2^num_flags - 1`: when passed in as parameters, these bit masks are translated into enabled and disabled flags upon test suite creation.
Example:
Consider 2 flags, `Flag0` and `Flag1`. To represent each combination of their enabled/disabled state, 2 bits are used:
- `00` -> both disabled
- `01` -> `Flag0` enabled, `Flag1` disabled
- `10` -> `Flag1` enabled, `Flag0` disabled
- `11` -> both enabled
Instantiate Test Suite
A parameterized test suite is instantiated using the `INSTANTIATE_TEST_SUITE_P` macro, called below the unit tests in its suite.
The first argument to `INSTANTIATE_TEST_SUITE_P` is a unique name for this instantiation of the test suite. The second is the name of the test pattern, i.e. the parameterized test fixture used with `TEST_P` (here `UniqueTestSuiteName`). The third argument is the parameter generator, which in this case generates the range of whole numbers `[0, 2^num_flags)`.
INSTANTIATE_TEST_SUITE_P(UniqueInstantiationName,
                         UniqueTestSuiteName,
                         testing::Range<size_t>(0, 1 << kTestFeatures.size()));
`kTestFeatures` should be defined at the top of the file in an unnamed namespace per the Chromium C++ style guide.
const std::vector<base::test::FeatureRef> kTestFeatures = {
features::Flag0, features::Flag1, features::Flag2};
Note: the C++ standard only guarantees that `size_t` can represent `2^16` values, so for feature flag lists with more than 16 flags, use a wider fixed-width type such as `uint32_t` for the feature mask instead. If that many flags are in use, the class should probably be tested differently anyway.
Create Feature List
Inside the test class, the feature mask needs to be translated into enabled and disabled features. This is accomplished by mapping each bit to a feature flag in `kTestFeatures` by index, enabling the flag if the bit is `1` and disabling it if it is `0`.
void CreateFeatureList(size_t feature_mask) {
std::vector<base::test::FeatureRef> enabled_features;
std::vector<base::test::FeatureRef> disabled_features;
for (size_t i = 0; i < kTestFeatures.size(); i++) {
if (feature_mask & 1 << i) {
enabled_features.push_back(kTestFeatures[i]);
} else {
disabled_features.push_back(kTestFeatures[i]);
}
}
scoped_feature_list_.InitWithFeatures(enabled_features, disabled_features);
}
base::test::ScopedFeatureList scoped_feature_list_;
Call `CreateFeatureList` somewhere inside of the test class' constructor. The test suite parameter is obtained with `GetParam()`.
class UniqueTestSuiteName : public testing::TestWithParam<size_t> {
 public:
  UniqueTestSuiteName() {
    CreateFeatureList(GetParam());
  }
  ...
};
Write Parameterized Unit Tests
Parameterized unit tests require the `TEST_P` macro. The first argument to `TEST_P` is the test suite name, which must match the second argument to `INSTANTIATE_TEST_SUITE_P` (here `UniqueTestSuiteName`). The second is the name of the test.
If the unit test's contents should only be run for a subset of flag conditions, remember to sequester them using an `if` branch: this both prevents the test from failing and the test's contents from executing unnecessarily, which saves computation.
TEST_P(UniqueTestSuiteName, Flag0Enabled_Test) {
  if (IsFlag0Enabled()) {
    ...
  }
}
Mojo
See Stubbing Mojo Pipes for pointers on how to unit test Mojo calls.