Technical Details on the Recent Firefox Add-on Outage

Original author: Eric Rescorla
About the author: Eric Rescorla is CTO of Firefox at Mozilla.

Recently, Firefox had an incident in which most add-ons stopped working. This was due to an error on our part: we failed to notice that one of the certificates used to sign add-ons was about to expire, and its expiration caused the vast majority of add-ons to be disabled. Now that we have fixed the problem and most add-ons have been restored, I would like to explain in detail what happened, why, and how we fixed it.

Background: add-ons and add-on signing


Although many people use Firefox as it comes out of the box, the browser also supports a powerful extension mechanism called add-ons. Add-ons let users install third-party features that extend what Firefox offers by default. There are currently over 15,000 Firefox add-ons, ranging from ad blocking to managing hundreds of tabs.

Firefox requires all installed add-ons to be digitally signed. This requirement is intended to protect users from malicious add-ons by requiring a minimum level of review by Mozilla staff. Before we introduced this requirement in 2015, we had serious problems with malicious add-ons.

Add-on signing works by configuring Firefox with a preinstalled root certificate. This root is stored offline in a hardware security module (HSM). Every few years it is used to sign a new intermediate certificate, which is kept online and used in the signing process. When an add-on is submitted for signing, we generate a new temporary end-entity certificate and sign it with the intermediate certificate. The end-entity certificate is then used to sign the add-on itself. Visually, it looks like this:

[Figure: root certificate → intermediate certificate → end-entity certificate → add-on]

Note that each certificate has a “subject” (whom the certificate belongs to) and an “issuer” (who signed it). For the root certificate these are the same, but for every other certificate the issuer is the subject of the certificate that signed it.

The important point here is that each add-on is signed with its own end-entity certificate, but nearly all add-ons share the same intermediate certificate (a few very old add-ons were signed with a different intermediate). This is where the problem arose: each certificate has a fixed validity period. Before or after that window, the certificate is not accepted, and an add-on signed with it cannot be loaded into Firefox. Unfortunately, the intermediate certificate we were using expired just after 1:00 UTC on May 4, and immediately every add-on signed with it became unverifiable and could not be loaded into Firefox.
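To make the failure mode concrete, here is a minimal Python sketch of the chain described above. It assumes a simplified certificate model with only a subject, an issuer, and validity dates, and it omits signature checks entirely. All names and dates are hypothetical except the intermediate's expiry, approximated here as 01:00 UTC on May 4, 2019. It shows that one expired intermediate invalidates every add-on chained beneath it, even though each add-on's own end-entity certificate is still within its validity window.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime


def utc(y, mo, d, h=0, mi=0):
    return datetime(y, mo, d, h, mi, tzinfo=timezone.utc)


def chain_is_valid(chain, now):
    """chain[0] is the end-entity cert and chain[-1] is the root.

    Each cert's issuer must equal the next cert's subject, the root must be
    self-issued, and every cert must be inside its validity window at `now`.
    Signature verification is deliberately left out of this sketch.
    """
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False
    root = chain[-1]
    if root.subject != root.issuer:
        return False
    return all(c.not_before <= now <= c.not_after for c in chain)


# Hypothetical names and dates; only the intermediate's expiry is taken
# (approximately) from the incident description.
root = Cert("Hypothetical Root", "Hypothetical Root", utc(2015, 1, 1), utc(2035, 1, 1))
intermediate = Cert("Hypothetical Intermediate", "Hypothetical Root",
                    utc(2017, 5, 4), utc(2019, 5, 4, 1, 0))
addon_a = Cert("addon-a", "Hypothetical Intermediate", utc(2018, 6, 1), utc(2029, 6, 1))
addon_b = Cert("addon-b", "Hypothetical Intermediate", utc(2019, 1, 1), utc(2030, 1, 1))

for ee in (addon_a, addon_b):
    print(ee.subject,
          chain_is_valid([ee, intermediate, root], utc(2019, 5, 4, 0, 59)),
          chain_is_valid([ee, intermediate, root], utc(2019, 5, 4, 1, 1)))
# addon-a True False
# addon-b True False
```

Both add-ons flip from valid to invalid in the same two minutes, because the only certificate that changed state is the one they share.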

Although all add-ons became invalid at around 1:00 UTC, the effects were not felt immediately. The reason is that Firefox does not check add-ons continuously. They are re-checked roughly every 24 hours, and the time of the check differs from user to user. As a result, some people experienced problems right away and others not until much later. We at Mozilla first became aware of the problem around 6:00 p.m. Pacific time on Friday, May 3, and immediately assembled a team to fix it.
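The staggered impact follows from the re-check schedule. The sketch below assumes that each Firefox profile re-verifies its add-ons on a fixed 24-hour cycle whose phase depends on when that profile last checked; the real scheduling in Firefox is more involved (browsers are closed, machines sleep), and the function name `first_failed_check` is hypothetical. It computes when a given user would first see their add-ons disabled.

```python
from datetime import datetime, timedelta, timezone

CHECK_INTERVAL = timedelta(hours=24)  # assumption: a fixed 24-hour re-check period
EXPIRY = datetime(2019, 5, 4, 1, 0, tzinfo=timezone.utc)  # approximate expiry


def first_failed_check(last_check, expiry=EXPIRY, interval=CHECK_INTERVAL):
    """Return the first scheduled re-check strictly after `expiry`.

    Checks at or before the expiry still pass; the first one after it
    is when this user's add-ons get disabled.
    """
    if last_check > expiry:
        return last_check  # the most recent check already failed
    periods = (expiry - last_check) // interval + 1  # whole periods to step past expiry
    return last_check + periods * interval


for last in (datetime(2019, 5, 3, 2, 0, tzinfo=timezone.utc),
             datetime(2019, 5, 3, 0, 30, tzinfo=timezone.utc)):
    print(last.isoformat(), "->", first_failed_check(last).isoformat())
# 2019-05-03T02:00:00+00:00 -> 2019-05-04T02:00:00+00:00
# 2019-05-03T00:30:00+00:00 -> 2019-05-05T00:30:00+00:00
```

Two users whose last checks were only 90 minutes apart are hit almost a full day apart, because the second user's next check landed just before the expiry and still passed.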

Damage control


As soon as we understood what we were facing, we took several steps to keep the situation from getting worse.

First, we disabled signing of new add-ons. This was sensible at the time, because we were signing with a certificate that we knew had expired. In retrospect, it might have been fine to leave signing running, but it also turned out to interfere with the mitigation of hardwiring the verification date, which we discuss below (although in the end we did not use it), so it is good that we preserved that option. Signing of new add-ons has since been turned back on.

Second, we immediately pushed a hotfix that suppressed re-verification of add-on signatures. The idea was to protect users whose add-ons had not yet been re-verified. We did this before we had any other fix, and we have since removed it now that fixes are available.

Parallel work


In theory, the fix looks simple: create a new, valid certificate and re-sign every add-on with it. Unfortunately, we quickly determined that this would not work, for several reasons:

  1. There are a lot of add-ons (over 15,000), and the signing service is not optimized for bulk signing, so simply re-signing every add-on would have taken longer than we wanted.
  2. Once the add-ons were re-signed, users would need to download the new versions. Some add-ons are hosted on Mozilla's servers, and Firefox would update those within 24 hours, but users would have to manually update any add-ons installed from other sources, which is very inconvenient.

Instead, we focused on developing a fix that would repair the situation with little or no manual intervention from users.

After considering a number of approaches, we quickly settled on two main strategies, which we pursued in parallel:

  1. Patch Firefox to change the date used to validate the certificate. This would make existing add-ons magically work again, but would require shipping a new Firefox build.
  2. Generate a new, valid certificate and somehow get Firefox to accept it instead of the existing expired one.

We were not sure which of these would work, so we decided to pursue both in parallel and deploy whichever was first ready to work. In the end, we deployed the second fix, the new certificate, which I describe in more detail below.

Replacement certificate


As mentioned above, there were two main steps to follow:

  1. Create a new valid certificate.
  2. Remotely install it in Firefox.

To understand why this works, you need to know a little more about how Firefox verifies add-ons. The add-on itself comes as a bundle of files that includes the certificate chain used to sign it. As a result, the add-on can be verified on its own, as long as Firefox knows the root certificate, which is configured into Firefox at build time. However, as I said, the intermediate certificate had expired, so the add-on was not actually verifiable.

But it turns out that when Firefox tries to verify an add-on, it is not limited to the certificates in the add-on itself. Instead, it tries to build a valid certificate chain starting at the end-entity certificate and continuing up to the root. The algorithm is complicated, but at a high level, you start with the end-entity certificate and then look for a certificate whose subject equals the end-entity certificate's issuer (i.e., the intermediate certificate). In the simple case, that is just the intermediate that ships with the add-on, but it can be any certificate the browser knows about. If we can remotely add a new, valid certificate, Firefox will try building a chain with it as well. The figure below shows the situation before and after installing the new certificate.

[Figure: the certificate chain before and after installing the new intermediate certificate]

After the new certificate is installed, Firefox has two choices when building the chain: use the old, expired certificate (which will not work) or use the new, valid certificate (which will). The key point is that the new certificate has the same subject name and public key as the old one, so the signature on the end-entity certificate, made with the matching private key, verifies against the new certificate too. Fortunately, Firefox is smart enough to try both options until it finds one that works, so the add-on becomes valid again. This is the same logic we use to validate TLS certificates, so it is relatively well-understood code that we could rely on (readers familiar with the WebPKI will recognize that this is how cross-certification works).
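The path-building idea can be sketched as a depth-first search with backtracking. This is a simplified illustration under stated assumptions, not the real path builder in Firefox: it ignores signatures, key usage, and other constraints, and the certificate names and the function `build_chain` are hypothetical. It shows the expired intermediate acting as a dead end, and the search finding a valid chain once a reissued intermediate with the same subject is added to the pool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Cert:
    name: str        # a label for this sketch only
    subject: str
    issuer: str
    not_after: datetime


def build_chain(cert, pool, roots, now, path=()):
    """Depth-first search for a chain from `cert` up to a trusted root.

    Candidate issuers are any certs in `pool` whose subject equals
    cert.issuer. If a candidate leads to a dead end (e.g. it has expired),
    the search backtracks and tries the next candidate.
    """
    if now > cert.not_after:
        return None
    path = path + (cert,)
    if cert in roots:
        return path
    for candidate in pool:
        if candidate.subject == cert.issuer and candidate not in path:
            chain = build_chain(candidate, pool, roots, now, path)
            if chain is not None:
                return chain
    return None


def utc(y, mo, d, h=0):
    return datetime(y, mo, d, h, tzinfo=timezone.utc)


root = Cert("root", "Root CA", "Root CA", utc(2035, 1, 1))
old_int = Cert("old intermediate", "Signing CA", "Root CA", utc(2019, 5, 4, 1))
new_int = Cert("reissued intermediate", "Signing CA", "Root CA", utc(2025, 1, 1))
addon = Cert("add-on EE", "some-addon", "Signing CA", utc(2029, 1, 1))

now = utc(2019, 5, 4, 12)
before = build_chain(addon, [old_int, root], {root}, now)
after = build_chain(addon, [old_int, root, new_int], {root}, now)
print(before)                   # None
print([c.name for c in after])  # ['add-on EE', 'reissued intermediate', 'root']
```

The old intermediate is listed first on purpose: the search tries it, fails on its validity date, and backtracks to the reissued one, which is the "try both until one works" behavior described above.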

The great thing about this fix is that it does not require changing any existing add-ons. Once we install the new certificate in Firefox, even add-ons that carry the old certificate will verify. The tricky part is getting the new certificate into Firefox automatically and remotely, and then getting Firefox to re-check all the add-ons that may have been disabled.
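Why existing add-ons need no changes comes down to the fact that a signature is checked against a public key, not against a particular certificate. The sketch below uses toy textbook RSA with tiny primes, which is completely insecure and is not how Firefox verifies signatures; the certificate fields and byte strings are hypothetical. It shows that an end-entity signature made once, long before the incident, verifies against any issuer certificate that carries the same public key.

```python
import hashlib

# Toy textbook RSA with tiny primes: insecure, for illustration only.
P, Q = 61, 53
N = P * Q                          # modulus, 3233
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, 2753 (needs Python 3.8+)


def digest(data, n):
    """Hash the data and reduce it into the range [0, n)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data):
    # Uses the intermediate's private key (N, D).
    return pow(digest(data, N), D, N)


def verify(data, sig, public_key):
    # Uses only the public key from whichever issuer certificate was chosen.
    n, e = public_key
    return pow(sig, e, n) == digest(data, n)


# Same subject and same public key; only the validity differs (hypothetical fields).
old_intermediate = {"subject": "Signing CA", "public_key": (N, E), "expired": True}
new_intermediate = {"subject": "Signing CA", "public_key": (N, E), "expired": False}

# Hypothetical to-be-signed bytes of an add-on's end-entity certificate,
# signed once and never changed afterwards.
ee_tbs = b"end-entity certificate for some-addon"
ee_sig = sign(ee_tbs)

for issuer in (old_intermediate, new_intermediate):
    print("expired" if issuer["expired"] else "valid",
          verify(ee_tbs, ee_sig, issuer["public_key"]))
# expired True
# valid True
```

The signature check passes against both certificates; what rules out the old one is its expired validity window, which path building checks separately.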

Normandy and the Studies system


Ironically, the solution turned out to be a special type of add-on called a system add-on (SAO). To run user studies (Studies), we had previously built a system called Normandy that can deliver SAOs to Firefox users. These SAOs run automatically in the user's browser. Although they are usually used for experiments, they also have broad access to Firefox's internal APIs. What matters here is that they can add new certificates to the certificate database Firefox uses to verify add-ons. (Technical note: we are not granting the certificate any special privileges; it gets its authority from being signed by the root certificate. We are just adding it to the pool of certificates Firefox can use, so we are not adding a new privileged certificate to Firefox.)

So the fix is to build an SAO that does two things:

  1. Installs the new certificate we created.
  2. Forces the browser to re-verify every add-on, so that the disabled ones become active again.
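The two steps of the SAO can be sketched as follows. This is a hypothetical model, not the actual system add-on or any Firefox API: `Browser`, `install_certificate`, `reverify_all`, and the add-on IDs are invented for illustration, and the chain check is reduced to a single lookup. It also reflects the technical note above: installing a certificate only adds a candidate to the pool, so a certificate that does not chain to the built-in root re-enables nothing.

```python
from dataclasses import dataclass

ROOT_SUBJECT = "Root CA"  # hypothetical name of the root built into Firefox


@dataclass(frozen=True)
class Intermediate:
    subject: str
    issuer: str
    expired: bool


@dataclass
class Addon:
    addon_id: str
    signed_under: str  # subject of the intermediate that issued its EE cert
    enabled: bool = True


class Browser:
    """Hypothetical stand-in for Firefox's cert pool and add-on manager."""

    def __init__(self, addons, pool):
        self.addons = addons
        self.pool = set(pool)

    def signature_ok(self, addon):
        # Simplified chain check: some unexpired intermediate in the pool must
        # match the add-on's issuer and itself be issued by the root.
        return any(c.subject == addon.signed_under and c.issuer == ROOT_SUBJECT
                   and not c.expired for c in self.pool)

    def install_certificate(self, cert):
        # Adds a candidate for chain building; grants no trust by itself.
        self.pool.add(cert)

    def reverify_all(self):
        for addon in self.addons:
            addon.enabled = self.signature_ok(addon)


def system_addon(browser, new_cert):
    browser.install_certificate(new_cert)  # step 1: install the new certificate
    browser.reverify_all()                 # step 2: re-check every add-on


old = Intermediate("Signing CA", ROOT_SUBJECT, expired=True)
addons = [Addon("ad-blocker", "Signing CA"), Addon("tab-manager", "Signing CA")]
browser = Browser(addons, [old])
browser.reverify_all()
print([a.enabled for a in addons])  # [False, False]

system_addon(browser, Intermediate("Signing CA", "Some Other CA", expired=False))
print([a.enabled for a in addons])  # [False, False]: does not chain to the root

system_addon(browser, Intermediate("Signing CA", ROOT_SUBJECT, expired=False))
print([a.enabled for a in addons])  # [True, True]
```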

But wait, you say: add-ons don't work, so how do we get the SAO to run? Well, we sign it with the new certificate!

Putting it all together... and why did it take so long?


So now we had a plan: issue a new certificate to replace the old one, build a system add-on to install it in Firefox, and deploy it via Normandy. We started work at about 6:00 p.m. Pacific time on Friday, May 3, and shipped the fix to Normandy at about 2:44 a.m., less than 9 hours later; it then took another 6 to 12 hours before most users had it. That is actually a fast turnaround, but I have seen a number of questions on Twitter about why we could not do it faster. Several of the steps were time-consuming.
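The timeline arithmetic can be checked with Python's `datetime`. Assumptions: the year 2019 is inferred from "Friday, May 3"; Pacific time in May is daylight time (PDT, UTC-7), modeled here with a fixed offset; and the 2:44 a.m. ship time is taken to be Pacific time on Saturday, May 4. The sketch confirms that the 1:00 UTC expiry fell at 6:00 p.m. Pacific on Friday, right when the team was assembled, and that the fix shipped 8 hours 44 minutes later.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")  # Pacific daylight time, in effect in May

expiry = datetime(2019, 5, 4, 1, 0, tzinfo=timezone.utc)  # approx. "just after 1:00 UTC"
team_start = datetime(2019, 5, 3, 18, 0, tzinfo=PDT)      # "about 6:00 p.m."
fix_shipped = datetime(2019, 5, 4, 2, 44, tzinfo=PDT)     # "about 2:44 a.m." (assumed Pacific)

print(expiry.astimezone(PDT).strftime("%a %b %d %H:%M %Z"))  # Fri May 03 18:00 PDT
print(fix_shipped - team_start)                               # 8:44:00

# The "another 6 to 12 hours" rollout window, in Pacific time.
for hours in (6, 12):
    print((fix_shipped + timedelta(hours=hours)).strftime("%a %H:%M %Z"))
# Sat 08:44 PDT
# Sat 14:44 PDT
```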

First, it took some time to issue the new intermediate certificate. As I mentioned above, the root certificate is kept in a hardware security module that is stored offline. This is good security practice: you use the root very rarely, so you want to keep it secure. But it is obviously somewhat inconvenient when you need to issue a new certificate in an emergency. In any case, one of our engineers had to travel to the secure location where the HSM is stored. Then there were a few false starts in which we did not issue quite the right certificate, and each attempt cost an hour or two of testing before we knew exactly what to do.

Second, developing the system add-on took some time. Conceptually it is very simple, but even simple programs require care, and we really wanted to make sure we did not make things worse. Before shipping the SAO we also had to test it, which takes time, especially since it had to be signed, and the signing system was disabled, so we had to find workarounds.

Finally, once the SAO was ready to ship, it took a while to deploy. Firefox clients check Normandy for updates every 6 hours, and of course many clients are offline, so the update did not reach all Firefox users instantly. By now, however, most users have received the update and/or the new release that we shipped later.

Final steps


Although the system add-on deployed through the Studies system should fix things for most users, it did not reach everyone. In particular, several groups of users need a different approach:

  • Users who have disabled telemetry or studies.
  • Users of Firefox for Android (Fennec), where we do not run studies.
  • Users of downstream builds of Firefox ESR that do not opt in to telemetry reporting.
  • Users behind HTTPS man-in-the-middle (MitM) proxies, because our add-on installation systems enforce key pinning for these connections, which conflicts with the proxy.
  • Users of very old Firefox builds that the Studies system cannot reach.

We can’t do anything about the last group: they will need to upgrade to a newer version of Firefox anyway, because older versions generally have quite serious unpatched security vulnerabilities. We know that some people stayed on older versions of Firefox because they wanted to keep running old-style add-ons, but many of those now work with newer versions of Firefox. For the other groups, we developed a Firefox patch that installs the new certificate once users upgrade. It was also shipped as a “dot” release of Firefox, so people should get it (and have probably already gotten it) through the regular update channel. If you use a downstream build, you will need to wait for an update from its maintainer.

We recognize that none of this is perfect. In particular, in some cases users lost data associated with their add-ons (for example, with an extension like Firefox Multi-Account Containers).

We were unable to develop a fix that avoids this side effect, but we believe it is the best approach for most users in the short term. In the long term, we will be looking for better architectural approaches to handling problems like this.

Lessons


First, I want to say that the team did an amazing job here: they built and shipped a fix in less than 12 hours from the initial report. As someone who sat in the meeting where it all happened, I can say that people worked incredibly hard in a difficult situation and that very little time was wasted.

That said, this is obviously not an ideal situation, and it should never have happened in the first place. We clearly need to adjust our processes to make this and similar incidents less likely, and easier to fix when they do happen.

Next week we will hold a formal post-mortem and publish a list of the changes we intend to make, but in the meantime, here are my initial thoughts. First, we need a much better way to track the status of everything in Firefox that is a potential time bomb, and to make sure that none of them goes off unexpectedly. We are still working out the details, but at a minimum we need an inventory of all such systems.
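As an illustration of what such an inventory enables, here is a minimal sketch of an expiry check. The `TimeBomb` type, the 90-day warning window, and every inventory entry other than the add-on signing intermediate (whose expiry is approximated from the incident) are hypothetical; this is not a description of Mozilla's actual tooling. Given a list of items with expiry dates, it reports everything that expires within the warning window, soonest first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TimeBomb:
    name: str
    expires: datetime


def expiring_soon(inventory, now, warn_before=timedelta(days=90)):
    """Return items that expire within `warn_before` of `now` (or already have), soonest first."""
    due = [item for item in inventory if item.expires - now <= warn_before]
    return sorted(due, key=lambda item: item.expires)


utc = timezone.utc
inventory = [
    TimeBomb("add-on signing intermediate", datetime(2019, 5, 4, 1, 0, tzinfo=utc)),
    TimeBomb("hypothetical update-server certificate", datetime(2020, 3, 1, tzinfo=utc)),
    TimeBomb("hypothetical API credential", datetime(2019, 6, 1, tzinfo=utc)),
]

now = datetime(2019, 3, 1, tzinfo=utc)
for item in expiring_soon(inventory, now):
    print(item.name, item.expires.date())
# add-on signing intermediate 2019-05-04
```

Run on March 1, 2019, the check flags the signing intermediate about 64 days ahead of its expiry; the hard part in practice is keeping the inventory complete, which is why it comes first.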

Second, we need a mechanism to push updates to users quickly, even when (especially when) everything else is down. It was great that we were able to use the Studies system, but it was an imperfect tool that we pressed into service, and it had some undesirable side effects. In particular, we know that many users have automatic updates turned on but would prefer not to participate in studies, and that is a reasonable preference (truth be told, I have my own browser set up that way!), yet we still need to be able to push updates to them. Whatever the internal technical mechanisms, users should be able to opt in to updates (including hotfixes) while opting out of everything else. In addition, the update channel needs to be more responsive. Even on Monday, we still had users who had not picked up either the hotfix or the dot release, which is clearly not ideal.

Finally, we will take a broader look at our add-on security architecture to make sure it enforces the right security properties with the lowest possible risk of breakage.

Next week we will publish the results of a more thorough post-mortem.
