Thanks Rasmus! I've cc'd the list and added Bob, who's interested in this
topic too.
> What submit latency are you willing to accept? I'm asking because
> depending on if you need ~1s or ~10s will influence the options.
>
I'd like to keep this latency as low as possible. It would be a breaking
change across the ecosystem if we upped latency to ~10s, as I'm assuming
clients have not configured their timeouts to expect latency this high.
That's not to say we couldn't make this change, as we could provide a
different API; I'd just like to explore a low-latency option first.
> I.e., the log can keep track of a witness' latest state X, then provide
> to the witness a new checkpoint Y and a consistency proof that is valid
> from X -> Y. If all goes well, the witness returns its cosignature. If
> they are out of sync, the log needs to try again with the right state.
Assuming that all witnesses are responsive and maintain the same state,
this could work. Keeping track of N different witnesses is doable, but I
think it's likely they would get out of sync, e.g., a request to cosign a
checkpoint times out but the witness still verifies and persists the
checkpoint.
This isn't a blocker, though; it's just an extra call when needed.
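For illustration, a minimal Python sketch of the log side of that exchange, under loud assumptions: `FakeWitness`, `ConflictError`, `cosign`, and `consistency_proof` are hypothetical names (not the Sigsum or C2SP API), and the witness is assumed to report its current tree size when the log's cached state is stale. The log caches each witness's last known size, sends the new checkpoint with a proof from that size, and on a conflict resyncs and retries once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    size: int
    root_hash: bytes


class ConflictError(Exception):
    """Hypothetical: the witness rejects a stale old size and reports its own."""

    def __init__(self, witness_size):
        super().__init__(f"witness is at tree size {witness_size}")
        self.witness_size = witness_size


@dataclass
class FakeWitness:
    """In-memory stand-in for a witness (not a real Sigsum/C2SP API)."""

    size: int = 0
    calls: int = 0

    def add_tree_head(self, old_size, new, proof):
        self.calls += 1
        if old_size != self.size:
            raise ConflictError(self.size)
        if new.size < self.size:
            raise ValueError("a witness never rolls back")
        # A real witness would verify the log's signature and the
        # consistency proof old_size -> new.size before persisting.
        self.size = new.size
        return b"cosig-" + str(new.size).encode()


def consistency_proof(old_size, new_size):
    # Placeholder for a real RFC 6962-style consistency proof.
    return ("consistency", old_size, new_size)


def cosign(witness, known_sizes, name, new, max_attempts=2):
    """Ask one witness to cosign `new`, resyncing cached state on conflict."""
    for _ in range(max_attempts):
        old = known_sizes.get(name, 0)
        if old > new.size:
            return None  # witness already saw a newer checkpoint than this one
        try:
            sig = witness.add_tree_head(old, new, consistency_proof(old, new.size))
        except ConflictError as err:
            known_sizes[name] = err.witness_size  # resync; the retry is the extra call
            continue
        known_sizes[name] = new.size
        return sig
    return None
```

With a witness that persisted size 10 after a timed-out request while the log still believes it is at size 5, `cosign(w, {"w1": 5}, "w1", Checkpoint(15, bytes(32)))` succeeds on the second attempt.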
> The current plan for Sigsum is to accept up to T seconds of logging
> latency, where T is in the order of 5-10s. Every T seconds the log
> selects the current checkpoint, then it collects as many cosignatures as
> possible before making the result available and starting all over again.
This seems like the most sensible approach, assuming the ecosystem can
accept that latency. Batching entries is something we've discussed
before; it has other performance benefits besides witnessing.
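A rough asyncio sketch of that batching loop, assuming each witness is modeled as a hypothetical async callable that returns a cosignature for a checkpoint (here just a tree size). Every `period` seconds (T above) the log freezes the current checkpoint, asks all witnesses concurrently, keeps whatever arrives before the deadline, and publishes the result. `collect_cosignatures` and `witnessing_loop` are illustrative names only.

```python
import asyncio


async def collect_cosignatures(witnesses, checkpoint, timeout):
    """Ask all witnesses concurrently; keep whatever arrives before the deadline.

    `witnesses` maps a name to a hypothetical async callable returning a cosignature.
    """
    tasks = {asyncio.create_task(ask(checkpoint)): name for name, ask in witnesses.items()}
    if not tasks:
        return {}
    done, pending = await asyncio.wait(list(tasks), timeout=timeout)
    for task in pending:
        task.cancel()  # slow or offline witnesses are skipped this round
    await asyncio.gather(*pending, return_exceptions=True)
    # Failed witnesses (tasks that raised) are dropped from the result.
    return {tasks[t]: t.result() for t in done if t.exception() is None}


async def witnessing_loop(get_checkpoint, witnesses, period, rounds):
    """Every `period` seconds: freeze a checkpoint, collect cosignatures, publish."""
    loop = asyncio.get_running_loop()
    published = []
    for _ in range(rounds):
        start = loop.time()
        checkpoint = get_checkpoint()  # entries arriving now wait for the next round
        cosigs = await collect_cosignatures(witnesses, checkpoint, timeout=period)
        published.append((checkpoint, cosigs))  # make the witnessed result available
        await asyncio.sleep(max(0.0, period - (loop.time() - start)))
    return published
```

A witness that misses the deadline is simply absent from that round's cosignatures and gets another chance on the next checkpoint.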
> An alternative implementation of the same witness protocol would be as
> follows: always be in the process of creating the next witnessed
> checkpoint. I.e., as soon as one finalized a witnessed checkpoint,
> start all over again because the log's tree already moved forward. To
> keep the latency down, only collect the minimum number of cosignatures
> needed to satisfy all trust policies that the log's users depend on.
This makes sense, though I think adding some latency as suggested above
makes this more straightforward. One detail, which may not be relevant
depending on your order of operations, is that we just need to confirm that
the inclusion proof returned will be based on the cosigned checkpoint.
Currently, our workflow first requests an inclusion proof for the latest
tree head and then signs that tree head.
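A sketch of the quorum variant combined with that ordering check, again with hypothetical async witness callables: the log returns as soon as `k` witnesses have cosigned, and the inclusion proof request is pinned to the cosigned tree size rather than to whatever the latest tree head is by then. `witness_until_quorum` and `inclusion_proof_request` are illustrative names; a real log would also verify each cosignature and compute an actual Merkle inclusion proof.

```python
import asyncio


async def witness_until_quorum(witnesses, checkpoint, k, timeout):
    """Return k cosignatures as soon as they arrive, or None if the quorum is missed.

    `witnesses` is a list of hypothetical async callables returning a cosignature.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    pending = {asyncio.create_task(ask(checkpoint)) for ask in witnesses}
    cosigs = []
    try:
        while pending and len(cosigs) < k:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break  # deadline passed before k witnesses answered
            done, pending = await asyncio.wait(
                pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:  # unreachable witnesses don't count
                    cosigs.append(task.result())
        return cosigs[:k] if len(cosigs) >= k else None
    finally:
        # Quorum reached or missed: stop waiting on the remaining witnesses.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)


def inclusion_proof_request(leaf_index, cosigned_size):
    """Pin the (hypothetical) inclusion proof to the cosigned tree size."""
    # Proving against a newer, uncosigned tree head would leave the client
    # with a proof that the witness cosignatures don't cover.
    if leaf_index >= cosigned_size:
        raise ValueError("entry is not covered by the cosigned checkpoint")
    return {"leaf_index": leaf_index, "tree_size": cosigned_size}
```

With a 3-of-10 policy, for example, the log would call `witness_until_quorum(witnesses, size, k=3, timeout=...)` and only then build the inclusion proof against that same `size`.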
On Fri, Feb 2, 2024 at 3:37 AM Rasmus Dahlberg <rgdd(a)glasklarteknik.se> wrote:
> Hi Hayden,
>
> Exciting that you're exploring this area! Answers inline.
>
> On Thu, Feb 01, 2024 at 01:05:48PM -0800, Hayden Blauzvern wrote:
> > Hey y'all! I was reading up on Sigsum docs and witnessing and had a
> > question about if or how you're handling logs with significant traffic.
> >
> > Context is I've been looking at improving our witnessing story with
> > Sigstore and exploring the viability of the bastion-based witnessing
> > approach. Currently, the Sigstore log does no batching of entry uploads,
> > and so the tree head/checkpoint is frequently updated. Consequently this
> > means that two witnesses are very unlikely to witness the same checkpoint.
> > To solve this, we added a 'stable' checkpoint, one that is published every
> > X minutes (5 currently). Witnesses are expected to compute consistency
> > proofs off that checkpoint so that multiple witnesses verify the same
> > checkpoint.
>
> Sounds similar to the initial witness protocol we used: the log makes
> available a checkpoint for some time, and witnesses poll to cosign it.
>
> We moved away from this communication pattern to solve two problems:
>
> 1. High submit latency, which is the issue you're experiencing.
> 2. Ensure logs without publicly reachable endpoints are not excluded.
>
> While reworking this, we also tried to keep as many of the properties we
> liked with the old protocol. For example, the bastion host stems from
> the nice property that witnesses can be pretty locked down behind a NAT.
>
> >
> > I've been exploring the bastion-based approach where, for each entry or tree
> > head update, the log requests cosignatures from a set of witnesses. What
> > I'm pondering now is how to deal with a log that frequently updates its
> > tree head due to frequent new entries.
> > One solution is to batch entries for a long enough period, let's say 1
> > minute, so that the log can fetch cosignatures from a quorum of witnesses
> > while accounting for some latency. But this is not our preferred user
> > experience, to have signers wait that long.
> > Lowering the batch to 1 second would solve the UX issue.
>
> What submit latency are you willing to accept? I'm asking because
> depending on if you need ~1s or ~10s will influence the options.
>
> > However now
> > there's an issue for updating a witness's checkpoint. Using the API Filippo
> > has documented for the witness, the log makes two requests to the witness:
> > One for the latest witness checkpoint, one to provide the log's new
> > checkpoint.
>
> The current witness protocol allows the log to collect a cosignature
> from a witness in a single API call, see the add-tree-head endpoint:
>
>
> https://git.glasklar.is/sigsum/project/documentation/-/blob/d8de0eeebbb5bb0…
>
> (Warning: the above API document is being reworked and moved to C2SP.
> The new revision will revolve around checkpoint names and encodings.
> You'll find links to all the decided proposals on www.sigsum.org/docs.)
>
> I.e., the log can keep track of a witness' latest state X, then provide
> to the witness a new checkpoint Y and a consistency proof that is valid
> from X -> Y. If all goes well, the witness returns its cosignature. If
> they are out of sync, the log needs to try again with the right state.
>
> > This seemingly would not work with a high-volume log since the
> > witness's latest checkpoint would update too frequently.
> >
> > Did you have any thoughts on how to handle this?
>
> The current plan for Sigsum is to accept up to T seconds of logging
> latency, where T is in the order of 5-10s. Every T seconds the log
> selects the current checkpoint, then it collects as many cosignatures as
> possible before making the result available and starting all over again.
>
> The rationale is: a witness that is online will be able to respond in
> 5-10s, so waiting longer than that will not really do much. I.e., the
> witness is either online and responding or it isn't. So: under normal
> circumstances one would expect cosignatures from all reliable witnesses.
>
> An alternative implementation of the same witness protocol would be as
> follows: always be in the process of creating the next witnessed
> checkpoint. I.e., as soon as one finalized a witnessed checkpoint,
> start all over again because the log's tree already moved forward. To
> keep the latency down, only collect the minimum number of cosignatures
> needed to satisfy all trust policies that the log's users depend on.
>
> For example, if you're opinionated and say users should rely on 10
> selected witnesses with a 3-of-10 policy, the log server can publish the
> next checkpoint as soon as it has received cosignatures from 3 witnesses.
>
> Both approaches work, but the properties and complexity differ slightly
> depending on which one you choose. I'm not hashing out that analysis here
> to keep this initial answer brief, but if you need ~1s latency, the
> second option should get you close.
>
> By the way, would it be OK to @CC the sigsum-general list? Pretty sure
> this is a conversation other folks would be interested in as well!
>
> -Rasmus
>