I've been thinking a bit about roles and responsibilities for the
primary and secondary nodes of a log. Here I'm sketching a model that is
mostly compatible with the current replication protocol, and which makes
the nodes a bit more independent (e.g., could be run by different
organizations).
A log instance consists of a primary node and (ideally) several
secondary nodes. A log is identified by its key, i.e., the key that is
used to sign the log's advertised tree heads (the same tree heads that
are cosigned by witnesses, and for which the log operator states the
intended reliability etc.). Each node is identified by a separate node
key.
* Local trees
Each node (including the primary) keeps its own local tree. That tree is
possibly larger (but not smaller, except when a new node is starting up)
than the log's advertised tree. Each node is identified by its node key.
The node key is used to sign the tree heads of its local tree. These
signatures must not be confused with the log's signed tree heads; if
it's not enough that separate keys are used, they could use a separate
signature namespace.
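As a rough illustration of the namespace idea, here is a minimal Python sketch that builds the byte string to be signed for a tree head. The namespace labels, the message layout, and the function name are all hypothetical; the current protocol may do this differently.

```python
import struct

# Hypothetical namespace labels, made up for this sketch.
LOG_NAMESPACE = b"example/log-tree-head"
NODE_NAMESPACE = b"example/node-tree-head"

def tree_head_message(namespace: bytes, size: int, root_hash: bytes) -> bytes:
    """Build the byte string to be signed for a tree head.

    The namespace and a zero separator come first, so the message signed
    for a node's local tree head always differs from the message signed
    for a log tree head of the same size and root hash.
    """
    if len(root_hash) != 32:
        raise ValueError("root hash must be 32 bytes")
    if b"\x00" in namespace:
        raise ValueError("namespace must not contain a zero byte")
    return namespace + b"\x00" + struct.pack(">Q", size) + root_hash
```

With this layout, a signature by a node key over a local tree head cannot be replayed as a signature over the log's tree head, even if the same key were used by mistake.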
The semantics of the signatures on local trees is that the node promises
that its local tree is append-only, and that all data covered by the
signed tree head is committed to local storage. I.e., the tree should
survive events like a local power outage. However, reliability is best
effort. If the node suffers a disk failure, or is decommissioned for any
other reason, the contents of the tree may be lost (except for parts of
it replicated elsewhere, as described below).
* Primary node
The primary node's responsibility is to accept new leaves from users,
commit them to its local tree, and sign the resulting local tree head using
its node key. Periodically, it queries the signed tree heads of the secondary
nodes' trees, checks consistency, and publishes new versions of the
*log*'s signed tree head once data is replicated to all secondaries. (If
we have a larger number of secondaries, we could consider allowing the
primary to proceed even in the case that a single secondary is behind or
unreachable).
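The publishing rule can be sketched as follows, in Python. This is only an illustration: the function name, the max_lagging parameter, and the convention that an unreachable secondary is reported as size 0 are assumptions of the sketch. Signature and consistency checks on the secondaries' tree heads are assumed to have been done already.

```python
def publishable_size(primary_size: int, secondary_sizes: list[int],
                     max_lagging: int = 0) -> int:
    """Return the largest tree size the primary may advertise.

    secondary_sizes holds the sizes from the secondaries' verified signed
    tree heads; an unreachable secondary is reported as size 0.
    max_lagging is the number of secondaries allowed to be behind.
    """
    if not secondary_sizes:
        raise ValueError("at least one secondary is required")
    if not 0 <= max_lagging < len(secondary_sizes):
        raise ValueError("max_lagging out of range")
    # Ignore the max_lagging smallest sizes; the smallest remaining size
    # is replicated on all the other secondaries.
    ranked = sorted(secondary_sizes)
    return min(primary_size, ranked[max_lagging])
```

For example, with secondaries at sizes 5, 8 and 10, the strict rule gives 5, while allowing one lagging secondary gives 8. The caller would publish a new log tree head only if the result is larger than the currently advertised size.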
* Secondary nodes
Secondaries only accept new leaves from the primary. A secondary that is
new, or for some reason is behind, will first get the log's signed tree
head, and retrieve all leaves it is missing. It must check inclusion and
consistency before committing the leaves to its local tree and
underlying storage. Next, it will periodically get the primary node's
local tree head (verifying the signature using the node key of the node
that is the current primary), and similarly incorporate any new leaves once
inclusion and consistency checks pass. Periodically, or when asked by the primary,
it will sign the head of its local tree using its own node key.
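A simplified Python sketch of the check a secondary makes before committing leaves is given below. It assumes RFC 6962-style Merkle tree hashing over raw leaf bytes, and it replaces inclusion and consistency proofs with a full recomputation of the root hash, which is simpler to show but much less efficient. The function names are hypothetical, and verification of the signature on the tree head is assumed to happen before this step.

```python
import hashlib

def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle tree hash in the style of RFC 6962, section 2.1."""
    n = len(leaves)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return hashlib.sha256(b"\x00" + leaves[0]).digest()
    k = 1
    while k * 2 < n:  # largest power of two strictly less than n
        k *= 2
    return hashlib.sha256(
        b"\x01" + merkle_root(leaves[:k]) + merkle_root(leaves[k:])
    ).digest()

def extend_local_tree(local: list[bytes], new_leaves: list[bytes],
                      head_size: int, head_root: bytes) -> list[bytes]:
    """Return the extended leaf list, or raise if the tree head does not match.

    The existing local leaves are kept as a prefix, so a matching root
    shows both that the new leaves are included in the signed tree and
    that the signed tree is consistent with the local one.
    """
    candidate = local + new_leaves
    if len(candidate) != head_size:
        raise ValueError("leaf count does not match tree head size")
    if merkle_root(candidate) != head_root:
        raise ValueError("root hash mismatch; refusing to commit")
    return candidate
```

Only after extend_local_tree returns would the secondary commit the leaves to storage and sign its new local tree head.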
So at all times we have this relation between tree sizes:
log's tree <= each secondary node tree <= primary node tree
Extensions: It may be useful to enable secondaries to also act as mirrors,
republishing the latest tree head they have received from the primary node,
together with available cosignatures. It may be possible to distribute
new leaves in more of a peer-to-peer fashion, instead of each secondary
retrieving them directly from the primary.
* Migration on primary failure
What needs to happen when a primary fails or is to be replaced? We need
the following steps:
0. If possible, the primary node's access to the log signing key should
be removed.
1. Each secondary must be configured to treat the primary as down. This
most likely has to be a manual procedure, with a human determining that the
primary should no longer be used. On each secondary, this means that
the node key of the old primary is removed from the configuration.
2. Once all the secondaries agree that there is no longer any primary
node, one of the secondaries can become new primary. If the local
trees of the secondaries are of different sizes, the one with the
largest tree should be selected as the new (interim) primary, but
not yet with access to the log's signing key. (If, for some reason, a
different node is chosen, the nodes that are ahead of the chosen node
must be reset: discard the extra leaves, destroy the previous node key,
and create a new one.)
3. The secondaries that were not chosen as primary are now reconfigured
to use the chosen node (identified by node key, as usual) as primary,
and retrieve all leaves and commit them to their local trees.
4. After some time, nodes should all be in sync. If desired, the chosen
node can now be demoted back to secondary (after which all the other
secondaries will again be reconfigured to have no primary), and
a new primary node can be selected.
5. Finally, the new primary should be given access to the log's
signing key and start normal operation (accept leaves from users,
advertise new tree heads, request cosignatures, etc).
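The selection in step 2 can be sketched in Python as below. The function names and the tie-breaking rule are hypothetical; sizes maps each remaining secondary (identified by some name or node key) to the size of its local tree.

```python
def largest_tree_node(sizes: dict[str, int]) -> str:
    """Pick the secondary with the largest local tree (step 2).

    Ties are broken by node name, an arbitrary choice made here only
    to keep the result deterministic.
    """
    if not sizes:
        raise ValueError("no secondaries available")
    return max(sorted(sizes), key=lambda node: sizes[node])

def nodes_to_reset(sizes: dict[str, int], chosen: str) -> list[str]:
    """Nodes ahead of the chosen node, which must discard leaves and rekey."""
    return sorted(n for n, size in sizes.items() if size > sizes[chosen])
```

When the node returned by largest_tree_node is chosen, nodes_to_reset is empty by construction, which is the reason for preferring the largest tree.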
If we are willing to have secondaries coordinate with each other, part
of this process could potentially be automated. If all nodes are
connected to each other (with the exception of the failing primary,
which is explicitly and manually removed from the set of nodes in step
(1) above), we could maybe have a protocol that lets nodes first agree
that there is no primary, and then elect a new primary based on tree
size, and which nodes are configured as candidates for getting access to
the log's signing key.
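To make the idea a bit more concrete, here is a small Python sketch of the election rule only; it says nothing about how nodes would exchange status or reach agreement over the network. The NodeStatus type, its fields, and the tie-breaking rule are assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NodeStatus:
    name: str
    tree_size: int
    sees_primary: bool   # True if this node still has a primary configured
    key_candidate: bool  # True if it may be given the log's signing key

def elect_primary(nodes: list[NodeStatus]) -> Optional[str]:
    """Return the elected node's name, or None if no election can happen yet."""
    if not nodes or any(n.sees_primary for n in nodes):
        return None  # not all nodes agree that there is no primary
    candidates = [n for n in nodes if n.key_candidate]
    if not candidates:
        return None
    # Largest tree wins; ties are broken by name (arbitrary).
    best = max(candidates, key=lambda n: (n.tree_size, n.name))
    return best.name
```

If a node that is not a candidate has a larger tree than the elected one, the interim-primary phase or the reset from step 2 would still be needed before the elected node starts normal operation.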
Regards,
/Niels