Your overall device and network security depends heavily on your PKI design and certificate policy. Just as you’d build your house with an earthquake-resistant foundation, or ensure your roof can withstand a hurricane, you should design and deploy your PKI to resist common threats.
Many of these decisions need to be made upfront, when you’re developing and designing your product or software. Implementing the right security measures in your PKI takes effort, but making the proper preparations will help you mitigate future security risks.
Consider this: If a certificate on your network was compromised, what risk would it pose to your security? Could that certificate be used to authenticate into a server? Could it be used in a man-in-the-middle attack against your users?
These are questions to review when developing an app or device that will use certificates for authentication or secure connections. You’ll need to make technical decisions around how your product will handle certificates, and how you’ll design and manage your PKI.
This blog post is for developers and manufacturers working with private-trust client or device certificates, such as those used in a software application or IoT device.
We often talk with developers who aren’t aware of the options they have for designing their PKI and certificate policies. With private-trust PKI, you have a lot of flexibility with your client and device certificates, allowing you to strengthen your software or device security.
Here, we’ll dive into three important considerations for strengthening your PKI: (1) choosing certificate validity periods and planning for replacement, (2) protecting private keys, and (3) using certificate revocation. We’ll also look at how to use these measures properly to mitigate risk.
Certificates provide encryption and authentication, but it isn’t as simple as deploying them and calling it a day. Both of these properties can be compromised, but can also be strengthened with the proper mitigations.
It may seem easiest to create a substandard PKI and never worry about managing your certificates, but this involves security compromises you may not have considered.
Let’s use the SHA-1 deprecation as an example. Researchers had long known that the SHA-1 hashing algorithm, which produces the digest a CA signs when it issues a certificate, was weak. In 2017, Google and CWI Amsterdam demonstrated a practical collision: two different files with the same hash.
This collision was the final nail in SHA-1’s coffin, and many certificates were replaced with new ones signed using the SHA-2 family of algorithms to maintain security. This included long-lived certificates, which become more vulnerable with each passing year as increases in computing power make attacks cheaper. A practical attack on your own certificates may seem unlikely today, but what about 5 years from now? 20 years? If your products will be in use far into the future, these are important considerations.
This simple example quickly reveals the multi-faceted nature of PKI security. To mitigate the security risks of a broken hashing algorithm, you would need a way to reissue and replace certificates on your devices, a revocation mechanism to deal with certificates you know were affected, and the confidence that your network and users are no longer vulnerable.
Compromised technologies are an unavoidable problem in the SSL/TLS space. The core cryptographic technologies that are used by the protocol are designed with a deprecation date in mind—we know that the future’s more powerful computers will eventually compromise the cryptography of the present.
In the last decade, we’ve seen major transitions, including the move away from the MD5 and SHA-1 hashing algorithms and from 1024-bit RSA keys. There will come a day when we’ll need to replace 2048-bit keys too. Planning for these types of changes will help you avoid a lot of headaches.
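One way to plan for these transitions is to keep an inventory of what each issued certificate relies on, so you can quickly list the ones affected when an algorithm or key size is retired. The sketch below is illustrative only: the `CertRecord` fields, the function names, and the policy thresholds are our own assumptions rather than any standard API, and in practice you would populate the records by parsing your real certificates.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical policy values for illustration; set these from your own certificate policy.
DEPRECATED_HASHES = {"md5", "sha1"}
MIN_RSA_BITS = 2048


@dataclass(frozen=True)
class CertRecord:
    """Minimal inventory entry for one issued certificate (assumed shape)."""
    serial: int
    signature_hash: str  # e.g. "sha1", "sha256"
    key_type: str        # e.g. "rsa", "ec"
    key_bits: int


def policy_violations(cert: CertRecord) -> List[str]:
    """Return the reasons this certificate falls short of the policy above."""
    reasons = []
    if cert.signature_hash.lower() in DEPRECATED_HASHES:
        reasons.append(f"deprecated signature hash: {cert.signature_hash}")
    if cert.key_type.lower() == "rsa" and cert.key_bits < MIN_RSA_BITS:
        reasons.append(f"RSA key too short: {cert.key_bits} bits")
    return reasons


def certs_to_replace(inventory: List[CertRecord]) -> Dict[int, List[str]]:
    """Map serial number to reasons, for every certificate that needs replacing."""
    flagged = {}
    for cert in inventory:
        reasons = policy_violations(cert)
        if reasons:
            flagged[cert.serial] = reasons
    return flagged
```

When a policy value changes (for example, raising `MIN_RSA_BITS`), rerunning `certs_to_replace` over the inventory gives you the list of serial numbers to reissue.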
When choosing a validity period for your certificates, you’re balancing the convenience of a long-lived certificate against the strain of protecting long-lived keys. Over time, protecting these keys becomes harder as cryptographic standards weaken and are eventually superseded, and as your corpus of certificates grows. Eventually, as in our SHA-1 example, a broken algorithm may create an urgent need to replace certificates to stay secure.
It’s critical to consider not only the validity period of your certificates, but also the method for replacing them. In most cases, we’ve found that there are too many security trade-offs in attempting to use a single certificate for the lifetime of the device.
By choosing longer validity periods, you create a broader range of certificates (and their private keys) that need to be kept safe. This increases the number of targets for attackers, and incentivizes them to compromise a certificate, since it gives them access for longer periods. This, in turn, increases the importance of having a revocation system, and makes it necessary to retain revocation information for longer periods—resulting in larger revocation files and more network activity.
But designing a safe PKI doesn’t mean you must replace certificates yearly. You can plan for these changes and still use long-lived certs. If you build in a way to renew and replace device certificates, you can choose the validity period that works best for you without worrying that an unexpected need to replace them will arise.
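As a rough illustration of planned replacement, a device or management service can decide when to request a new certificate based on how much of the validity period has elapsed. This is a minimal sketch under our own assumptions: the `renewal_due` name and the 0.8 threshold are hypothetical choices, not a recommendation, and the timestamps would come from the certificate's notBefore and notAfter fields.

```python
from datetime import datetime, timedelta, timezone


def renewal_due(not_before: datetime, not_after: datetime,
                now: datetime, renew_fraction: float = 0.8) -> bool:
    """Return True once `renew_fraction` of the validity period has elapsed.

    The 0.8 default is an arbitrary illustrative value.
    """
    if not 0.0 < renew_fraction <= 1.0:
        raise ValueError("renew_fraction must be in (0, 1]")
    lifetime = not_after - not_before
    if lifetime <= timedelta(0):
        raise ValueError("not_after must be later than not_before")
    renew_at = not_before + lifetime * renew_fraction
    return now >= renew_at


# Example: a certificate valid for 1000 days, renewed after 80% of its lifetime.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=1000)
early = renewal_due(issued, expires, issued + timedelta(days=700))  # False
late = renewal_due(issued, expires, issued + timedelta(days=900))   # True
```

Renewing well before expiry leaves time to retry if the device is offline when the renewal window opens.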
Many of the same security considerations for certificate validity exist for key compromise. If attackers can steal a private key, they can impersonate the device, decrypt and read data, and authenticate to a network.
If you want to provide meaningful authentication and encryption, you must protect keys from compromise, and revoke and replace them if they are compromised. This means you’ll want to avoid storing keys on the device in plain text, where they could easily be extracted. Instead, consider a software solution such as an encrypted key store, or hardware protection in the form of a secure chip such as a Trusted Platform Module (TPM), which makes key extraction far harder for attackers.
Even if you believe you’ve sufficiently protected your keys, having a working revocation system is critical. If attackers realize you have no feasible way to stop them when they steal a key, then figuring out how to break your security measures becomes quite attractive. Revocation is an added layer of defense that neutralizes and deters attackers.
These defenses are closely related. If your keys are easy to compromise, then providing a reliable revocation system—one that can handle a high volume of revocations—becomes more important and more expensive.
Some manufacturers and developers believe they can’t support certificate revocation because it’s a ‘high-cost’ service that requires an active internet connection and high availability. This is not the case. Using industry-standard technologies, you can check revocation information without connecting to a DigiCert server, or even without an internet connection at all.
Two industry-standard methods exist for checking revocation information: Certificate Revocation Lists (CRLs) and the Online Certificate Status Protocol (OCSP). For those unfamiliar with these systems, a CRL is like a blacklist of certificate serial numbers. With OCSP, the client queries a central service over the network to retrieve the revocation status of a single certificate, much like querying an API. CRLs are defined alongside certificates in the X.509 standard (profiled for internet use in RFC 5280), and OCSP is a companion protocol specified in RFC 6960.
CRL is the simpler of the two and allows flexibility in scenarios where your device may not have a reliable or fast internet connection. Traditionally, the issuing CA signs a CRL file daily, which is retrieved by the client over the internet. But in situations where the device can’t easily or regularly connect to the internet, the CRL can be stored and cached.
Like certificates, CRLs are signed and have a validity period. Because a CRL is signed by the CA, it can be trusted no matter how it reaches the device, so you don’t need to have it delivered directly from the CA. Instead, it can be distributed through your own infrastructure, like a centralized cloud server or internal network. This is an advantage over a simple blacklist or whitelist: you can get a CRL file from anywhere without fear of tampering, as long as its signature verifies against the CA.
A CRL can be cached on a device and used until it expires, which can be set for weeks or longer. This makes it a good option for devices with unreliable or intermittent internet connections. In many scenarios, this allows you to retain the benefits of revocation checking without the technical costs of frequently fetching fresh information.
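To make the caching idea concrete, here is a minimal sketch of a revocation check against a cached CRL. It assumes the CRL has already been parsed and its CA signature verified by your crypto library; the `CachedCrl` and `Status` types and the `check_with_cached_crl` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import FrozenSet


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"  # no usable CRL; the caller decides whether to fail open or closed


@dataclass(frozen=True)
class CachedCrl:
    """Assumed in-memory form of a CRL whose CA signature was already verified."""
    this_update: datetime
    next_update: datetime
    revoked_serials: FrozenSet[int]


def check_with_cached_crl(serial: int, crl: CachedCrl, now: datetime) -> Status:
    """Look up a certificate serial number in a cached CRL."""
    # A CRL outside its validity window says nothing reliable about current status.
    if now < crl.this_update or now > crl.next_update:
        return Status.UNKNOWN
    return Status.REVOKED if serial in crl.revoked_serials else Status.GOOD
```

Returning a distinct `UNKNOWN` result for an expired CRL keeps the policy decision (reject the connection, or accept it and fetch a fresh CRL later) in one place in the calling code.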
OCSP can also be used when the devices themselves have no internet access, so long as they have access to a gateway or server that does. “Stapling” is an optional TLS feature that lets the server deliver a signed OCSP response as part of the TLS handshake, so the client doesn’t have to make a separate request to the OCSP responder. You can deploy both OCSP and CRLs, relying on the most recent CRL as a fallback.
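The fallback arrangement can be sketched as a small wrapper that tries OCSP first and consults the cached CRL only when OCSP gives no definitive answer. This is a simplified illustration: `revocation_status`, `query_ocsp`, and `query_cached_crl` are hypothetical names, and the two lookups are passed in as plain callables so the sketch stays independent of any particular network or crypto library.

```python
from typing import Callable, Optional

# Possible answers, mirroring the three certificate states OCSP can report.
GOOD, REVOKED, UNKNOWN = "good", "revoked", "unknown"


def revocation_status(serial: int,
                      query_ocsp: Callable[[int], str],
                      query_cached_crl: Callable[[int], Optional[str]]) -> str:
    """Try OCSP first; fall back to the cached CRL if OCSP is unavailable or inconclusive."""
    try:
        answer = query_ocsp(serial)
        if answer in (GOOD, REVOKED):
            return answer
    except OSError:
        # Network failures (ConnectionError and TimeoutError are OSError subclasses)
        # mean the responder could not be reached, so fall through to the CRL.
        pass
    fallback = query_cached_crl(serial)
    return fallback if fallback in (GOOD, REVOKED) else UNKNOWN
```

If both sources fail to give a definitive answer, the function returns `UNKNOWN` and leaves it to the caller to decide whether to reject the certificate.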
The advantage of using one of these standard methods is that a commercial CA will already support them. As developed standards, they also support a wealth of options that make them adaptable for your specific needs.
The purpose of all these measures is to limit and mitigate risk. Well-protected keys are not an attractive target for attackers, and neither are keys they know can be quickly and easily revoked and replaced.
Your PKI design and certificate policy decisions are interconnected. Imagine a scenario where you have an extremely responsive revocation system, but private keys are stored on the device in plain text. Compromising these keys would be trivial, and you would need to revoke your certificates the moment you reissue them. Conversely, if you have strong protections on your private keys but no efficient means of marking a compromised key as revoked, you will also end up with an insecure system.
Building a strong PKI that considers the technical needs of your product creates a strong security foundation for your devices and network. Simply choosing the most permissive policies now may only breed difficult engineering problems later.