# Protection against data override by old Sync clients

## Overview

### Problem
This document outlines the necessary steps to prevent data loss scenarios with Sync's Model API for multi-client Sync users. A typical problematic scenario is as follows:
- A new proto field `F` is introduced in data specifics (e.g. `PasswordSpecifics`).
- Client `N` (a newer client) submits a proto containing the introduced field `F`.
- Client `O` (an older client) receives the proto, but doesn't know the field `F` and discards it before storing the proto in the local model.
- Client `O` submits a change to the same proto, which results in discarding field `F`'s data from client `N`.
### Solution
To prevent the described data loss scenario, it is necessary for the old client
(client `O` above) to keep a copy of the server-provided proto, including
unknown fields (i.e. fields not even defined in the .proto file at the time
the binary was built) and partially-supported fields (e.g. functionality
guarded behind a feature toggle). The logic for caching these protos is
implemented in `DataTypeLocalChangeProcessor`.
To get this protection for a specific datatype, its `DataTypeSyncBridge` needs
to be updated to include the cached data during commits to the server (more
details in the Implementation section below).
## Checklist

To implement this solution, a Sync datatype owner should follow these steps:

- Override the `TrimAllSupportedFieldsFromRemoteSpecifics` function (see the
  Trimming section).
- [Optional] Add a DCHECK to the local updates flow (see the Safety check
  section).
- Include unsupported fields in local changes (see the Local update flow
  section).
- Redownload the data on browser upgrade (see the Browser upgrade flow
  section).
- [Optional] Add a Sync integration test (see the Integration test section).
The result of these steps is that:
- Local updates will carry over unsupported fields received previously from the Server.
- Initial sync will be triggered if upgrading the client to a more modern version causes an unsupported field to be newly supported.
## Implementation

### Trimming
Storing a full copy of a proto may have a performance impact (memory, disk). The Sync infrastructure therefore allows, and encourages, trimming proto fields that do not need an additional copy (i.e. fields that are already well supported by the client).
Trimming is a mechanism that allows each data type to specify which proto
fields are supported in the current browser version. Any field that is not
supported will be cached by the `DataTypeLocalChangeProcessor` and can be used
during commits to the server to prevent data loss.
Fields that should not be marked as supported:

- Unknown fields in the current browser version
- Known fields that are defined but not yet actively used, e.g.:
  - Partially-implemented functionality
  - Functionality guarded behind a feature toggle
`TrimAllSupportedFieldsFromRemoteSpecifics` is a function of
`DataTypeSyncBridge` that:

- Takes a `sync_pb::EntitySpecifics` object as an argument.
- Returns a `sync_pb::EntitySpecifics` object that will be cached by the
  `DataTypeLocalChangeProcessor`.
- By default, trims all proto fields.
To add datatype-specific caching of unsupported fields, override the trimming
function in the datatype-specific `DataTypeSyncBridge` to clear all its
supported fields (i.e. fields that are actively used by the implementation and
fully launched):
```cpp
sync_pb::EntitySpecifics
DataSpecificBridge::TrimAllSupportedFieldsFromRemoteSpecifics(
    const sync_pb::EntitySpecifics& entity_specifics) const {
  sync_pb::EntitySpecifics trimmed_entity_specifics = entity_specifics;
  // ...
  trimmed_entity_specifics.clear_username();
  trimmed_entity_specifics.clear_password();
  // ...
  return trimmed_entity_specifics;
}
```
### Safety check
Forgetting to trim fields that are supported might result in:

- I/O and memory overhead (caching unnecessary data)
- Unnecessary sync data redownloads on browser startup (more details below)
To prevent this scenario, add a check that:

- Takes the local representation of the proto (containing supported fields only).
- Makes sure that trimming it returns an empty proto.

This check should run before every commit to the Sync server:
```cpp
DCHECK_EQ(TrimAllSupportedFieldsFromRemoteSpecifics(datatype_specifics)
              .ByteSizeLong(),
          0u);
```
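For placement, here is a minimal sketch of a commit path with the check in place; `CommitLocalChange()`, `CreateSpecificsFromLocalData()` and `LocalData` are hypothetical names, not part of the Sync API:

```cpp
// Hypothetical commit path; CommitLocalChange(), CreateSpecificsFromLocalData()
// and LocalData are placeholder names for the datatype's own conversion from
// the local model.
void DataSpecificBridge::CommitLocalChange(const LocalData& local_data) {
  sync_pb::EntitySpecifics datatype_specifics =
      CreateSpecificsFromLocalData(local_data);
  // The local representation must contain supported fields only, so trimming
  // it has to leave an empty proto behind.
  DCHECK_EQ(TrimAllSupportedFieldsFromRemoteSpecifics(datatype_specifics)
                .ByteSizeLong(),
            0u);
  // ... merge in the cached unsupported fields and commit (next section).
}
```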
### Local update flow

To use the cached unsupported fields during commits to the server, add code
that performs the following steps (a sketch follows the list):

- Query the cached `sync_pb::EntitySpecifics` from the
  `DataTypeLocalChangeProcessor` (Passwords example).
- Use the cached proto as the base for the commit and fill it with the
  supported fields from the local proto representation (Passwords example).
- Commit the merged proto to the server (Passwords example).
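A minimal sketch of these steps, staying with the hypothetical `DataSpecificBridge` from above; the processor accessor name (`GetPossiblyTrimmedRemoteSpecifics()`) follows the Passwords example, and the field setters are placeholders matching the trimming example:

```cpp
// Sketch of the merge: the cached (possibly trimmed) remote proto is the base,
// and the supported fields are overwritten from the local model.
sync_pb::EntitySpecifics DataSpecificBridge::CreateSpecificsForCommit(
    const std::string& storage_key,
    const LocalData& local_data) {
  // 1. Start from the cached proto so unsupported fields survive the commit.
  sync_pb::EntitySpecifics specifics =
      change_processor()->GetPossiblyTrimmedRemoteSpecifics(storage_key);
  // 2. Fill in the supported fields from the local representation.
  specifics.set_username(local_data.username);
  specifics.set_password(local_data.password);
  // 3. The merged proto is then handed to the processor for the commit.
  return specifics;
}
```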
### Browser upgrade flow

To handle the scenario where unsupported fields become supported due to a
browser upgrade, add the following code to your datatype-specific
`DataTypeSyncBridge` (a sketch follows the list):

- On startup, check whether the unsupported fields cache contains any field
  that is supported in the current browser version. This can be done by
  running the trimming function on the cached protos and checking whether it
  trims any fields (Passwords example).
- If the cache contains any fields that are already supported, simply force
  the initial sync flow to deal with any inconsistencies between local and
  server states (Passwords example).
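A sketch of such a startup check; it assumes the cached copy is reachable from the entity metadata via `possibly_trimmed_base_specifics()` (as in the Passwords example), and the exact accessor names may differ per datatype:

```cpp
// Returns true if any cached specifics still contain a field that the current
// browser version supports, i.e. trimming them again would remove something.
// In that case the initial sync flow should be forced.
bool DataSpecificBridge::CacheContainsSupportedFields(
    const syncer::EntityMetadataMap& metadata_map) const {
  for (const auto& [storage_key, metadata] : metadata_map) {
    const sync_pb::EntitySpecifics& cached =
        metadata->possibly_trimmed_base_specifics();
    // Trimming only clears fields, so a size change means a cached field is
    // now supported.
    if (TrimAllSupportedFieldsFromRemoteSpecifics(cached).ByteSizeLong() !=
        cached.ByteSizeLong()) {
      return true;
    }
  }
  return false;
}
```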
It’s important to implement the trimming function correctly; otherwise the client can run into unnecessary sync data redownloads because a supported field got cached.
If the trimming function relies on a datatype-specific field being present in
the `sync_pb::EntitySpecifics` proto (example), make sure the startup check
skips entries without these fields (e.g. the cache can be empty for entities
that were created before this solution landed). This can be tested with a Sync
integration test (see the next section).
### Integration test

Add a Sync integration test for the caching / trimming flow.
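The rough shape of such a test, assuming the fake-server-based `SyncTest` framework; the fixture name is hypothetical and the injection / verification helpers hinted at in the comments are datatype-specific:

```cpp
// Skeleton only: the fixture name and the injection / verification helpers
// are datatype-specific and not spelled out here.
IN_PROC_BROWSER_TEST_F(SingleClientDataTypeSyncTest,
                       PreservesUnsupportedFieldsOnCommit) {
  ASSERT_TRUE(SetupSync());
  // 1. Inject a server-side entity whose specifics contain a field that this
  //    client version does not support.
  // 2. Wait until the client downloads and stores the entity.
  // 3. Trigger a local change to the same entity and wait for the commit.
  // 4. Read the entity back from the fake server and verify that the
  //    unsupported field is still present.
}
```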
## Limitations

### Sync support horizon
The proposed solution is intended to be a long-term one, but it will take some
time until it can be relied on. This is because:

- Browser clients need to actually run a version that contains the
  implementation described above.
- The Sync supported version horizon is pretty long (multiple years).
### Deprecating a field
Deprecated fields should still be treated as supported to prevent their unnecessary caching.
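In terms of the trimming example above, this means a deprecated field keeps being cleared; `deprecated_label` below is a hypothetical field name:

```cpp
// In the trimming function, a deprecated field stays on the "supported" side,
// i.e. it keeps being cleared and is therefore never cached.
trimmed_entity_specifics.clear_username();          // supported
trimmed_entity_specifics.clear_deprecated_label();  // deprecated, still trimmed
```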
### Migrating a field

Migrating a field requires client-side handling, as newer clients will have
both fields present while legacy clients will only have access to the
deprecated field. Newer clients should (see the sketch after this list):

- Keep filling the deprecated field for legacy clients to use.
- Add logic that picks the correct value from the deprecated and the new
  field, to account for updates coming from legacy clients.
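A sketch of that dual-write / read-side logic, with hypothetical fields `legacy_name` (deprecated) and `display_name` (its replacement); the exact conflict policy is datatype-specific:

```cpp
// On commit: fill both fields so legacy clients keep seeing the old one.
specifics.set_legacy_name(local_data.name);
specifics.set_display_name(local_data.name);

// On applying a remote update: a legacy client only writes the deprecated
// field, so fall back to it when the new field is missing. If both are set
// but disagree, resolve according to the datatype's own policy.
std::string name = specifics.has_display_name() ? specifics.display_name()
                                                : specifics.legacy_name();
```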
### Repeated fields
No client-side logic is required - the solution will work by default.
### Nested fields

Protecting nested fields is possible, but requires adding client-side logic to
trim individual child fields, and to clear the top-level field if none of its
child fields remain populated (Passwords notes example). A sketch follows.