Hi Hayden! Thank you for reaching out with this.
The requirements you describe seem analogous to what I (with my Go project hat on) need for the Go Checksum Database, and they were indeed a motivation for moving to a synchronous witness API.
My plan is what Rasmus described, except with somewhat more optimistic expectations of witness latency:
- Incorporate a batch of new leaves, possibly holding the submission requests.
- Sign a new checkpoint.
- Send out parallel requests to all witnesses to cosign the checkpoint, over keep-alive connections.
- As soon as enough cosignatures are returned, publish the checkpoint and release the client requests.
- If a witness doesn't return by the time the next checkpoint is signed, ignore it for the next round(s).
- If a witness times out, assume it kept the previous state. If that's incorrect, it will send a 409 Conflict response with its correct state (this is a recent API change based on Trust Fabric feedback) and can be contacted successfully in the next round.
I believe all of that can happen in 1s: 500ms to batch leaves, and 500ms for m-of-n witnesses to respond to a single HTTP request over an established connection. US west coast to EU RTT is <150ms, which leaves 350ms for the witness to do signature verify+sign and database read+write.