Let me start with a description of my current understanding of the checksum field, present in the tree_leaf struct in the spec.
The submitter submits a message M to the log (M typically a hash of some data not disclosed to the log), together with a public key and signature.
The log first verifies the signature, and then adds a leaf to the merkle tree. The signatures are done using ssh format, configured to use sha256. This implies that the sign and verify operations include computing SHA256(M); we call this value the "checksum" and include it in the tree_leaf struct together with the signature.
My first question: Does it matter in any way that the checksum happens to be a value used internally in the ssh signature formatting?
If we instead publish a signature of M created using the SHA512 hash internally, as is the ssh-keygen -Y sign default, and publish this signature together with checksum = SHA256(M), wouldn't that work just as well?
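To make the role of SHA256(M) concrete, here is a rough Python sketch of the data that an ssh-style signature actually hands to the Ed25519 primitive, following the layout in OpenSSH's PROTOCOL.sshsig (magic preamble, namespace, reserved, hash algorithm, H(message)). The namespace value below is a placeholder, not the one Sigsum uses, and the helper names are made up for illustration.

```python
import hashlib
import struct

def ssh_string(data: bytes) -> bytes:
    """Encode bytes as an SSH 'string': 4-byte big-endian length, then the data."""
    return struct.pack(">I", len(data)) + data

def sshsig_signed_data(message: bytes, namespace: bytes, hash_name: str) -> bytes:
    """Build the blob that is signed, per the PROTOCOL.sshsig layout."""
    digest = hashlib.new(hash_name, message).digest()
    return (
        b"SSHSIG"                         # magic preamble
        + ssh_string(namespace)
        + ssh_string(b"")                 # reserved, currently empty
        + ssh_string(hash_name.encode())  # "sha256" or "sha512"
        + ssh_string(digest)              # H(message)
    )

NAMESPACE = b"example-namespace"              # placeholder, an assumption
M = hashlib.sha256(b"some data").digest()     # message submitted to the log
blob = sshsig_signed_data(M, NAMESPACE, "sha256")
checksum = hashlib.sha256(M).digest()         # value stored in the leaf
```

With sha256 configured, the last 32 bytes of the signed blob are exactly the checksum, which is why the checksum "happens to be" a value internal to the signature format.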
Next question: Do we really need to publish the checksum at all? It serves as a unique and random-looking identifier for the message M, but who's using this id? We have the following roles:
1. Submitter. Will collect signatures on the submitted leaf. Obviously knows everything needed to query for the leaf hash, and will create the "sigsum proof package" to distribute to sigsum verifiers.
2. Sigsum verifier (the party that gets the message M and wants to verify that it is properly logged). The verifier needs to get (by other means than querying the log by itself) all of
- M itself
- signature of M
- inclusion proof for the leaf including this signature
- witness signatures
- all related public keys
As far as I see, the verifier will clearly recompute the checksum, in the internals of the signature verification. It could also explicitly compare the checksum to the value stored in the leaf, but what benefit does that give, if the signature is already verified? On the other hand, it seems essential to verify that the signature and the public key hash in the leaf are as expected.
I think this is the core of the question: Is there any reason for the verifier to validate the checksum stored in the leaf, in addition to verifying the signature? If not, what use is that field?
3. Witness. The witness doesn't have access to M, so to a witness the checksum is just a random string; there's no way to validate it. It could possibly use it to verify the signature (not by using ssh-keygen though, but by digging into internals of ssh signatures), except that the witness is not expected to have access to the submitter's public key.
4. Monitors. The purpose of a monitor is to query the log and alert whenever an unexpected signature appears. My understanding of monitoring is somewhat fuzzy, but I think the monitor is expected to query for recent tree heads (relying on witness cosignatures to know that what it gets is recent), download all (new) leaves in the tree, and filter on one or more public key hashes of interest. For the leaves found, it will then alert the key owner on "unexpected" checksums. However, couldn't one do without the checksums and just as well look at unexpected signatures?
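The monitoring loop described above could be sketched roughly as follows. This is only an illustration of the filtering step: the Leaf type and its field names (checksum, signature, key_hash) are assumptions modelled on the tree_leaf struct, and fetching tree heads and leaves from the log is left out entirely.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Leaf:
    # Field names are assumed for illustration, not taken from the spec.
    checksum: bytes
    signature: bytes
    key_hash: bytes

def unexpected_leaves(leaves, watched_key_hash, known_checksums):
    """Return leaves for the watched key whose checksum the owner does not recognise."""
    return [
        leaf for leaf in leaves
        if leaf.key_hash == watched_key_hash
        and leaf.checksum not in known_checksums
    ]

# Toy data: two leaves for the watched key, one for an unrelated key.
watched = b"K" * 32
leaves = [
    Leaf(checksum=b"a" * 32, signature=b"s" * 64, key_hash=watched),
    Leaf(checksum=b"b" * 32, signature=b"t" * 64, key_hash=watched),
    Leaf(checksum=b"c" * 32, signature=b"u" * 64, key_hash=b"X" * 32),
]
alerts = unexpected_leaves(leaves, watched, known_checksums={b"a" * 32})
```

Here only the leaf with checksum b"b" * 32 would trigger an alert. Filtering on signatures instead of checksums would have the same shape, which is the alternative the question above raises.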
The checksum uniquely (except for hash collisions) identifies a single message M. But the signature itself also uniquely identifies a single message M (it seems highly unlikely to have collisions, even if we allow the public key to vary, and in case we insist on having the same public key, any collision represents a break of the security of the signature algorithm).
The difference is that the checksum can be (re)computed from M only, while computing the signature also requires the private key. That sounds like a big difference, but the only roles above that are expected to know M are the submitter and the verifier. The submitter by definition knows the private key. And the verifier should be provided with the signature by other means, and just verify it.
Regards, /Niels
Hi Niels,
Top-posting because I think all questions are quite related. Ultimately, what we need is for each signature that appears in a leaf to be verifiable without any further information. Otherwise, a monitor cannot distinguish between a leaf that was fabricated by the log and an actual signature operation by someone.
1. Removing checksum: this leaves a monitor with a 64-byte Ed25519 signature. In other words, the bytes that were signed would be missing completely.
2. Permitting SHA-512 while storing SHA-256(M) as checksum: this leaves a monitor with a 64-byte Ed25519 signature and a 32-byte checksum that is unrelated to signature verification.
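A small Python sketch of the second point, from the monitor's side. It assumes the signed-data layout from OpenSSH's PROTOCOL.sshsig and uses a placeholder namespace; the function names are hypothetical. The monitor never sees M, so all it can do is rebuild the signed bytes from the checksum in the leaf. The actual Ed25519 verification over those bytes is not shown.

```python
import hashlib
import struct

NAMESPACE = b"example-namespace"  # placeholder, an assumption

def ssh_string(data: bytes) -> bytes:
    """4-byte big-endian length prefix followed by the data."""
    return struct.pack(">I", len(data)) + data

def blob_from_digest(digest: bytes, hash_name: str) -> bytes:
    """Signed-data layout: magic, namespace, reserved, hash name, digest."""
    return (b"SSHSIG" + ssh_string(NAMESPACE) + ssh_string(b"")
            + ssh_string(hash_name.encode()) + ssh_string(digest))

def blob_signed_by_submitter(message: bytes, hash_name: str) -> bytes:
    """What the submitter's ssh-style signature covers (submitter knows M)."""
    return blob_from_digest(hashlib.new(hash_name, message).digest(), hash_name)

def blob_rebuilt_by_monitor(leaf_checksum: bytes) -> bytes:
    """What a monitor can reconstruct from the leaf alone, without M."""
    return blob_from_digest(leaf_checksum, "sha256")

M = hashlib.sha256(b"undisclosed data").digest()
leaf_checksum = hashlib.sha256(M).digest()

# True: with sha256 the monitor recovers exactly the signed bytes.
sha256_case_ok = blob_rebuilt_by_monitor(leaf_checksum) == blob_signed_by_submitter(M, "sha256")
# False: with sha512 the stored checksum tells the monitor nothing usable.
sha512_case_ok = blob_rebuilt_by_monitor(leaf_checksum) == blob_signed_by_submitter(M, "sha512")
```

In the sha512 case the monitor would need SHA512(M), which it cannot derive from SHA256(M), so the signature in the leaf could not be checked.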
Let me know if this context did not answer all of your questions, in which case I will take a stab at them more explicitly in a follow-up email.
-Rasmus
rgdd--- via Sigsum-general sigsum-general@lists.sigsum.org writes:
Top-posting because I think all questions are quite related. Ultimately, what we need is for each signature that appears in a leaf to be verifiable without any further information. Otherwise, a monitor cannot distinguish between a leaf that was fabricated by the log and an actual signature operation by someone.
Thanks, I was missing that use case for verifying a leaf signature based only on the leaf and the public key. So a monitor is expected to be configured with the actual public key to look for, rather than just its key hash.
- Removing checksum: this leaves a monitor with a 64-byte Ed25519 signature. In other words, the bytes that were signed would be missing completely.
- Permitting SHA-512 while storing SHA-256(M) as checksum: this leaves a monitor with a 64-byte Ed25519 signature and a 32-byte checksum that is unrelated to signature verification.
I see one problem with this, though. The monitor can't simply use the ssh-keygen command to verify the signature, since that command expects to get the *message* as input, not the hash thereof. Which kind-of defeats the idea of piggybacking on ssh tools.
To make it possible to verify the signature based on only public key and leaf, and sticking to black-box usage of ssh-style signatures, I think we need one more level of hashing.
message ; submitted to the log
checksum = H(message) ; to be published by log
signature = ssh-style signature on checksum (i.e., M = checksum in the signature format spec).
The ssh-style signature will internally compute H(H(message)) when formatting the data passed to the ed25519 signature primitives.
And in the typical case that message = H(data) of some data not revealed to the log, we will end up with H^3(data). Which certainly looks like overdoing it, but the nice thing is that each level of hashing is owned by its own layer, and they're not interacting. It will, e.g., work perfectly fine with
message = SHA3(data) ; application layer
checksum = SHA256(message) ; sigsum layer
hash = SHA512(checksum) ; ssh signature layer
And in this model, the only purpose of the sigsum layer hash, as I understand it, is to avoid log poisoning. And the hashing in the ssh layer serves no purpose for us, but it's the way that signature operation is defined (likely because it makes it easier to sign large files, something that we don't need).
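The three layers in this proposal can be written out in a few lines of Python. This is only a sketch of the hashing chain: SHA3-256 is picked here as one reading of "SHA3" in the example above, the sample data is made up, and the signing step itself is omitted.

```python
import hashlib

data = b"contents of some release artifact"      # never shown to the log

# Application layer: hides the data from the log.
message = hashlib.sha3_256(data).digest()

# Sigsum layer: the value the log would publish in the leaf.
checksum = hashlib.sha256(message).digest()

# ssh signature layer: computed internally when signing the checksum,
# here with the ssh-keygen -Y sign default of SHA512.
ssh_digest = hashlib.sha512(checksum).digest()
```

Each layer only consumes the output of the one before it, so swapping the hash function in one layer does not affect the others.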
Regards, /Niels
On Thu, Sep 29, 2022 at 02:46:43PM +0200, Niels Möller wrote:
rgdd--- via Sigsum-general sigsum-general@lists.sigsum.org writes:
Top-posting because I think all questions are quite related. Ultimately, what we need is for each signature that appears in a leaf to be verifiable without any further information. Otherwise, a monitor cannot distinguish between a leaf that was fabricated by the log and an actual signature operation by someone.
Thanks, I was missing that use case for verifying a leaf signature based only on the leaf and the public key. So a monitor is expected to be configured with the actual public key to look for, rather than just its key hash.
Yep!
- Removing checksum: this leaves a monitor with a 64-byte Ed25519 signature. In other words, the bytes that were signed would be missing completely.
- Permitting SHA-512 while storing SHA-256(M) as checksum: this leaves a monitor with a 64-byte Ed25519 signature and a 32-byte checksum that is unrelated to signature verification.
I see one problem with this, though. The monitor can't simply use the ssh-keygen command to verify the signature, since that command expects to get the *message* as input, not the hash thereof. Which kind-of defeats the idea of piggybacking on ssh tools.
I disagree. The value of piggy-backing on SSH tooling is for the signer who can access their private key with good solutions that already exist.
Note that the verifier will never get sufficient amounts of verification by only using ssh-keygen. For example, ssh-keygen does not "speak" transparency log proofs, transparency log policies, etc. Sigsum needs to provide such tools and libraries, and verifiers need to rely on them.
To make it possible to verify the signature based on only public key and leaf, and sticking to black-box usage of ssh-style signatures, I think we need one more level of hashing.
message ; submitted to the log
checksum = H(message) ; to be published by log
signature = ssh-style signature on checksum (i.e., M = checksum in the signature format spec).
The ssh-style signature will internally compute H(H(message)) when formatting the data passed to the ed25519 signature primitives.
And in the typical case that message = H(data) of some data not revealed to the log, we will end up with H^3(data). Which certainly looks like overdoing it, but the nice thing is that each level of hashing is owned by its own layer, and they're not interacting. It will, e.g., work perfectly fine with
message = SHA3(data) ; application layer
Note that message must be exactly 32 bytes in Sigsum. So, you wouldn't be able to use SHA3 here (and you probably shouldn't; that means you rely on two hash functions to be collision resistant instead of one).
checksum = SHA256(message) ; sigsum layer
hash = SHA512(checksum) ; ssh signature layer
And in this model, the only purpose of the sigsum layer hash, as I understand it, is to avoid log poisoning.
Yes, the sigsum layer hash is to avoid poisoning; and the primary purpose of an application layer hash is to limit what a log learns about application messages. (Small messages are of course also nice though.)
Note that we wouldn't need a separate "sigsum-layer hash" if we were OK accepting arbitrary-sized application messages (which we are not).
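As a rough illustration of how the fixed message size and the sigsum-layer hash fit together, here is a hypothetical log-side helper in Python. The function name and the error handling are invented for this sketch; it only shows the idea that the log rejects anything that is not exactly 32 bytes and then publishes a hash of the message rather than the submitted bytes themselves.

```python
import hashlib

MESSAGE_SIZE = 32  # bytes, as stated above for Sigsum messages

def checksum_for_leaf(message: bytes) -> bytes:
    """Hypothetical helper: validate the message size, return the value to publish."""
    if len(message) != MESSAGE_SIZE:
        raise ValueError(f"message must be exactly {MESSAGE_SIZE} bytes")
    # Publishing SHA256(message) means submitter-chosen bytes never
    # appear verbatim in the log.
    return hashlib.sha256(message).digest()

published = checksum_for_leaf(hashlib.sha256(b"application data").digest())

try:
    checksum_for_leaf(b"too short")
    rejected = False
except ValueError:
    rejected = True
```

If the log instead accepted arbitrary-sized messages and hashed them itself, the submitter would not need an application-layer hash just to fit the size limit, which is the trade-off mentioned above.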
And the hashing in the ssh layer serves no purpose for us, but it's the way that signature operation is defined (likely because it makes it easier to sign large files, something that we don't need).
I see your point if it is a desired property to verify leaf signatures in isolation with ssh-keygen. Would you say that the complexity is decreased, about the same, or increased if this change was proposed?
-Rasmus