diff options
Diffstat (limited to 'toolkit/components/telemetry/docs/fhr/identifiers.rst')
-rw-r--r-- | toolkit/components/telemetry/docs/fhr/identifiers.rst | 83 |
1 files changed, 83 insertions, 0 deletions
diff --git a/toolkit/components/telemetry/docs/fhr/identifiers.rst b/toolkit/components/telemetry/docs/fhr/identifiers.rst new file mode 100644 index 000000000..82ad0e49e --- /dev/null +++ b/toolkit/components/telemetry/docs/fhr/identifiers.rst @@ -0,0 +1,83 @@ +.. _healthreport_identifiers: + +=========== +Identifiers +=========== + +Firefox Health Report records some identifiers to keep track of clients +and uploaded documents. + +Identifier Types +================ + +Document/Upload IDs +------------------- + +A random UUID called the *Document ID* or *Upload ID* is generated when the FHR +client creates or uploads a new document. + +When clients generate a new *Document ID*, they persist this ID to disk +**before** the upload attempt. + +As part of the upload, the client sends all old *Document IDs* to the server +and asks the server to delete them. In well-behaving clients, the server +has a single record for each client with a randomly-changing *Document ID*. + +Client IDs +---------- + +A *Client ID* is an identifier that **attempts** to uniquely identify an +individual FHR client. Please note the emphasis on *attempts* in that last +sentence: *Client IDs* do not guarantee uniqueness. + +The *Client ID* is generated when the client first runs or as needed. + +The *Client ID* is transferred to the server as part of every upload. The +server is thus able to affiliate multiple document uploads with a single +*Client ID*. + +Client ID Versions +^^^^^^^^^^^^^^^^^^ + +The semantics for how a *Client ID* is generated are versioned. + +Version 1 + The *Client ID* is a randomly-generated UUID. + +History of Identifiers +====================== + +In the beginning, there were just *Document IDs*. The thinking was clients +would clean up after themselves and leave at most 1 active document on the +server. + +Unfortunately, this did not work out. Using brute force analysis to +deduplicate records on the server, a number of interesting patterns emerged. + +Orphaning + Clients would upload a new payload while not deleting the old payload. + +Divergent records + Records would share data up to a certain date and then the data would + almost completely diverge. This appears to be indicative of profile + copying. + +Rollback + Records would share data up to a certain date. Each record in this set + would contain data for a day or two but no extra data. This could be + explained by filesystem rollback on the client. + +A significant percentage of the records on the server belonged to +misbehaving clients. Identifying these records was extremely resource +intensive and error-prone. These records were undermining the ability +to use Firefox Health Report data. + +Thus, the *Client ID* was born. The intent of the *Client ID* was to +uniquely identify clients so the extreme effort required and the +questionable reliability of deduplicating server data would become +problems of the past. + +The *Client ID* was originally a randomly-generated UUID (version 1). This +allowed detection of orphaning and rollback. However, these version 1 +*Client IDs* were still susceptible to use on multiple profiles and +machines if the profile was copied. |