Last updated on: October 21, 2021
Last year, almost exactly to the day, I declared HPKP effectively dead. I believed then—and I still do—that HPKP is too complex and too dangerous to be worth the effort. The biggest problem lies in the fact that there is no sufficient margin of safety; pinning failures are always catastrophic. That’s always bothered me and I wondered if it was possible to somehow fix HPKP without starting from scratch. That’s what this blog post is about.
If you haven’t already read last year’s blog post, I suggest that you do so now, as it will make the discussion easier to follow. I’ll wait patiently until you come back.
Today I am exploring the possibility of fixing HPKP by introducing pin revocation, which would be used in case of emergency. Please note that, even though I’ll be trying to save HPKP from a technical perspective, I am not necessarily declaring that HPKP is worth saving. The landscape of PKI has changed and today we have Certificate Transparency (CT), which addresses one set of problems that HPKP was supposed to solve, and also Certification Authority Authorization (CAA), which addresses another set of problems. One could argue that, between CT and CAA, there is perhaps not enough left for HPKP to do, given its complexities. I’ll leave that discussion for some other time. For now, let’s attempt the challenge of making HPKP more palatable.
Assumptions and Constraints
The working assumption is that any technology that is not able to recover from operator mistakes is bound to fail. In the case of HPKP, a pinning failure could conceivably mean complete business failure, and so why would anyone want to use a fragile technology like that? Perhaps it’s still possible to save HPKP, if we are able to somehow extend it with a safety mechanism that eliminates or significantly reduces the business risk.
However, I don’t think that’s the only thing we need to do. Pin revocation doesn’t come for free. Depending on the design, it could further increase the complexity of HPKP. For example, one design approach could be to introduce a special revocation key, but that would only mean that you had one more key to lose.
Then there is also the problem of convincing browser vendors that HPKP is worth fixing and that they shouldn’t just throw it out. The vendors are not going to like any solutions that add a lot of new code without clear benefits.
Finally, any proposal that requires server-side modifications or changes to the TLS protocol is an instant non-starter. It’s very unlikely that a critical mass of people interested in fixing HPKP would invest enough time to make such an approach work.
Can we find an approach that will satisfy these criteria?
Using OCSP for Pin Revocation
Perhaps we can, and we don’t have to look very far. In PKI we already have OCSP, a revocation checking mechanism that’s designed to check certificate status in real time. Even though OCSP is effectively not used any more for its intended purpose, perhaps we can piggyback on it for the purpose of HPKP pin revocation? My proposal is straightforward: rather than treat pins as separate identities, we instead link the lifetime of each pin to the lifetime of the leaf certificate that was used to access the web site. Let’s call this public key pinning variant HPKPbis.
Under normal circumstances, HPKPbis looks and feels the same as HPKP. If your operational procedures don’t fail, everything will be exactly the same. Pin revocation kicks in only when it’s needed to fix a huge mess, such as your web site becoming unavailable for an extended period of time. To recover, you just need to revoke your certificates. That, in turn, revokes your pins and your users can proceed, in most cases unaware that there had been a problem.
HPKPbis Implementation Details
In this section I’ll try to specify in some more detail how HPKPbis would work exactly. As with everything else in PKI, the devil is in the details. To understand this section you’ll need to be familiar with HPKP as specified in RFC 7469.
The differences from the RFC are as follows:
- Pins are activated when they are observed for the first time on an error-free connection. User agents must store the observed pins as well as enough information to validate the entire certificate chain in the future. An additional requirement for pin activation is that the leaf certificate includes OCSP revocation information.
- Once pinning is enabled for a hostname, on subsequent visits at least one fully- or partially-activated pin must match the presented certificate chain.
- There are two types of pin activation. Full activation occurs if a matching certificate (public key) is observed on an error-free connection. Partial activation occurs when a pin is observed (i.e., it’s sent in an HTTP response header), but it’s not yet seen in a certificate. Both types of pins are valid, but they have different security properties.
- A partially-activated pin is upgraded to full activation if it’s matched to a certificate received on an error-free connection. When the upgrade takes place, the link with an earlier certificate chain is removed.
- Pin lifetime is inseparably linked to the lifetime of the certificate chain that was presented at the time of activation. Because we pin public keys and not certificates, it’s possible that a pin is linked to more than one certificate chain. A pin is considered expired if all its linked chains have expired. Similarly, a pin is considered revoked if all its linked chains have been revoked. As with standard HPKP, the maximum lifetime of a pin is limited by the advertised policy duration.
- During normal operation, pins are not checked for revocation. However, if an attempt is made to access a pinned site and the returned certificate chain doesn’t match, the user agent attempts to break the known pins. It loops through the stored pinning information; whenever a certificate chain is found to be expired or revoked, all the corresponding pins (both fully- and partially-activated) are considered to be invalid. At the end of this process, if no valid pins remain, the user agent proceeds to the web site.
- Leaf certificates are checked for revocation via OCSP. During this process user agents must not tolerate OCSP responder failures: if they are unable to obtain revocation information for a given certificate, they must assume that the certificate is still valid, which means that its pins remain in force.
- Intermediate certificates can be checked via OCSP if that information is available. Alternatively, we can assume that the intermediates can be revoked using existing vendor-proprietary mechanisms.
- Partially-activated pins are used to allow for deployment of backup certificate chains, for example after a loss of control of the primary key material. Advanced operators may choose to upgrade partially-activated pins to full status, for example by rotating their certificate chains on a regular basis.
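To make the rules above more concrete, here is a minimal sketch of how a user agent might store pins and try to break them on a mismatch. It is my reading of the list, not a reference implementation: the names Chain, Pin, observe, pin_is_valid and allow_connection are all hypothetical, the OCSP lookup is abstracted into an is_revoked callback, and I assume that a pin is invalid once every chain linked to it is either expired or revoked.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Chain:
    """A stored certificate chain (hypothetical model, not a real API)."""
    chain_id: str
    spki_hashes: FrozenSet[str]  # public keys that appear in the chain
    not_after: datetime          # earliest expiry among its certificates


@dataclass
class Pin:
    spki_hash: str
    full: bool                   # True = fully activated, False = partial
    policy_expires: datetime     # cap from the advertised policy duration
    chains: List[Chain] = field(default_factory=list)


def observe(pins: Dict[str, Pin], advertised: List[str], chain: Chain,
            max_age: timedelta, now: datetime) -> None:
    """Record pins advertised on an error-free connection that used `chain`."""
    for spki in advertised:
        in_chain = spki in chain.spki_hashes
        pin = pins.get(spki)
        if pin is None:
            # Full activation if the key is in the chain, partial otherwise.
            pins[spki] = Pin(spki, in_chain, now + max_age, [chain])
            continue
        pin.policy_expires = now + max_age
        if in_chain and not pin.full:
            # Upgrade: the link with the earlier chain is removed.
            pin.full, pin.chains = True, [chain]
        elif (in_chain or not pin.full) and chain not in pin.chains:
            # A full pin links only to chains containing its key; a partial
            # pin links to whichever chain it was observed with.
            pin.chains.append(chain)


def pin_is_valid(pin: Pin, now: datetime,
                 is_revoked: Callable[[Chain], bool]) -> bool:
    """A pin holds while its policy is live and some linked chain is good."""
    if now >= pin.policy_expires:
        return False
    return any(now < c.not_after and not is_revoked(c) for c in pin.chains)


def allow_connection(pins: Dict[str, Pin], presented: FrozenSet[str],
                     now: datetime,
                     is_revoked: Callable[[Chain], bool]) -> bool:
    """Decide whether to proceed to a site that presented these keys."""
    if not pins or any(spki in presented for spki in pins):
        return True  # normal operation: no revocation checks at all
    # Mismatch: try to break the pins; proceed only if none survive.
    return not any(pin_is_valid(p, now, is_revoked) for p in pins.values())
```

The important property is that is_revoked is called only on the mismatch path, so a healthy deployment never triggers a revocation lookup.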
Let’s consider how HPKPbis would be used in practice:
- As with HPKP, sites should pin with two or more public keys in certificate chains from different CAs. This approach avoids a central point of failure.
- As with HPKP, an operator who loses control of one pin (certificate) can fall back to using one of the backup certificates. The remaining fully- or partially-activated pins can be used to keep the site online.
- An operator who loses control of all pins can recover by revoking all certificates associated with them. In the worst case, if no record of the certificates exists, they can be recovered from the Certificate Transparency logs.
- If pinning is activated unintentionally or maliciously, the site operator can recover by tracking down all the pinned certificates in the CT logs and revoking them.
- A MITM with a valid but fraudulent certificate is not able to impersonate a pinned site because their certificate chain won’t match what has been pinned. In this situation user agents, when presented with a non-matching chain, use OCSP to verify the pin status. They fail closed if they receive “still valid” responses or if they fail to reach the OCSP responders.
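The fail-closed behaviour in the last scenario can be sketched in a few lines. This is only an illustration under stated assumptions: OcspStatus mirrors the three certificate statuses that OCSP defines (good, revoked, unknown), None stands for a responder that could not be reached or a response that could not be verified, and may_bypass_pins is a hypothetical helper name. Expiry is left out to keep the focus on revocation.

```python
from enum import Enum
from typing import Iterable, Optional


class OcspStatus(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"


def may_bypass_pins(leaf_statuses: Iterable[Optional[OcspStatus]]) -> bool:
    """Return True only if every pinned leaf certificate is provably revoked.

    A status of None (responder unreachable, response unverifiable) is
    treated exactly like "still valid", so the pins remain in force.
    """
    statuses = list(leaf_statuses)
    # With nothing to check there is nothing proven revoked: fail closed.
    return bool(statuses) and all(s is OcspStatus.REVOKED for s in statuses)
```

An attacker who blocks access to the OCSP responders therefore gains nothing, because a missing answer keeps the pins alive; only signed “revoked” responses for every pinned certificate let the user agent through.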
Comparison to HPKP/RFC 7469
How does the proposal for HPKP with revocation compare to the current version of HPKP as defined in RFC 7469?
- It’s possible to recover from pinning failures stemming from configuration mistakes and malicious pinning alike.
- The HPKP workflow is unchanged in the absence of pinning failure.
- There is no performance impact in the absence of pinning failure.
- Reliability and performance of pin revocation depend on the quality of the chosen CAs’ OCSP infrastructures.
- Your selected CAs have the ability to revoke pins, either by mistake or because of a successful attack by an adversary.
The advantages of HPKPbis are all nice and clear, but let’s look at the disadvantages in more detail. Two questions arise: the first about the suitability of the OCSP infrastructure for the task, and the second about the risk of unauthorised pin revocation.
OCSP Infrastructure Reliability
Traditionally, OCSP has a reputation for being flaky. We’ve seen great improvements recently with CAs transitioning to CDNs for OCSP response delivery, but lots of people are still nervous about it. The problem is this: when you do need pin revocation to work flawlessly, will your CA’s OCSP responders be available and will they take the load?
The good news is that site operators actively choose which CAs they work with, which means that the risk can largely be mitigated by choosing CAs with a proven record of maintaining high-availability revocation infrastructure.
Alternatively—as Richard Barnes pointed out to me—it might be possible to repurpose OCSP stapling for the transport of pin revocation information. There is presently a (little-used) TLS extension that supports transport of multiple stapled OCSP responses. In one deployment scenario, one OCSP response could be used to prove freshness of the current certificate, with additional stapled OCSP responses used to revoke all the earlier pins.
Technically, this is doable. The advantage of using OCSP stapling to revoke HPKP pins is that pin revocation information can be reliably delivered to end users. On the other hand, unlike with on-demand OCSP (which is invoked by user agents only when they have pins to break), stapled OCSP responses for pin revocation would need to be delivered to all users all the time, until the earlier pinning configuration expires (usually one or two weeks, but up to two months). This would increase the size and thus slow down every full TLS handshake.
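How long would the extra stapled responses have to be served? A rough sketch, assuming (as in RFC 7469) that each client counts the policy duration from the moment it last received the header; the function name and parameters are hypothetical:

```python
from datetime import datetime, timedelta


def staple_revocations_until(old_policy_last_served: datetime,
                             max_age_seconds: int) -> datetime:
    """Latest moment at which some client may still hold the old pins.

    A client that received the old policy at the very last moment it was
    served keeps those pins for max_age_seconds from that point.
    """
    return old_policy_last_served + timedelta(seconds=max_age_seconds)
```

For example, with a policy duration of 5184000 seconds (60 days) last served on 1 January 2017, the revocation responses would need to be stapled until 2 March 2017.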
Malicious Revocation Attack Vector
Of more interest is the potential new attack vector in the form of unauthorised certificate revocation. There is no doubt that HPKPbis is technically weaker in this respect. There are two scenarios:
- A malicious CA could revoke your certificate to break the HPKP pins. An outright revocation of an active certificate is likely to break things, making it highly noticeable. As an improvement, a malicious and resourceful CA could selectively respond with “revoked” certificate status only to those who are under attack. (Revocation transparency may help here in the long run, if it’s implemented.)
- A CA could also be convinced to revoke your certificate using a social engineering attack. This attack is less interesting, because there is no stealth; it’s likely to be very easy to detect.
Outright unauthorised revocation is a clumsy attack that can technically break HPKP but at great cost. Even in the absence of significant breakage (unlikely), the attack can be easily detected with continuous monitoring of certificate status.
Stealthy revocations are of more interest, because they don’t break anything and can’t be detected. They are potentially very expensive, however, because they can be carried out only in cooperation with the CA that issued the certificates.
Both types of attacks can be largely mitigated by using multiple CAs (possibly in different legal jurisdictions) and fully-activated pins. Additionally, most organisations don’t need to defend against this type of attack when it’s carried out by government agencies in their own countries. Thus, a simpler and effective mitigation approach could be a strong relationship with a competent CA. Organisations are already used to dealing with this type of risk, for example to protect their domain names.
HPKPbis extends HPKP with the addition of a pin revocation mechanism. The resulting public key pinning approach has weaker security properties, but compensates for that with substantially improved usability and substantially reduced damage potential.
- HPKPbis represents a significant improvement for those who are planning to deploy public key pinning or have it already deployed. They would appreciate the addition of the pin revocation mechanism to help them recover in an emergency.
- On the other side of the spectrum we have those who don’t want to use pinning, but are perhaps worried that it can be used against them maliciously, for example after DNS hijacking. This group will be happier knowing there is a recovery mechanism.
- Browser vendors, if they don’t ditch HPKP altogether, may like the idea of a less dangerous pinning mechanism. We haven’t yet had cases of malicious pinning, but vendors are likely to take at least some of the blame for any downtime incurred in this way. They might also get a share of the blame in high-profile self-inflicted pinning failures.
- Those who are currently not happy with HPKP are unlikely to embrace HPKPbis. For them it will probably be a case of too little, too late.
In the end, the fate of HPKP is in the hands of the browser vendors. Removing HPKP support is far easier than investing more effort into it, especially if their minds are already made up.
Thanks to Hanno Böck, Richard Barnes, and Ryan Sleevi for reading the early versions of this blog post, and for their useful feedback.