A little about bugs in the BIOS / UEFI of Lenovo / Fujitsu / Toshiba / HP / Dell laptops

    In this article, I describe bugs in the BIOS / UEFI of laptops that I had to work with and for which I had to adapt bootloaders. These are primarily bugs that are invisible to the user but can break a bootloader even when it does everything correctly. The bugs were found both in the interfaces of the respective runtime environments and in the Intel SMM code. The material is based on experience accumulated over a fairly long period, so by the time of writing the list of specific models had been lost. The list of manufacturers whose laptops had problems has survived, however. The bugs are described in order, from the simplest to the most complex, and a workaround is given for each.

    Before we get started


    To give a full picture of the circumstances in which I ran into the problems described, I will briefly explain what kind of work was involved. There is a product that encrypts the system drive. Consequently, when the PC starts, the drives must be decrypted so that the OS can boot, and a bootloader was developed to perform this role. After installing all of its hooks, this bootloader transfers control to the original bootloader. From here on, the terms “loader” and “bootloader” refer to our loader, and the term “OS loader” refers to the loader that we are replacing.

    Bootloader Launch Issues (Lenovo, UEFI)


    UEFI is known to implement global variables. In particular, there are global variables that each describe one boot option (a load option entry). There is also a global variable, BootOrder, that describes the order in which these options are tried. Accordingly, the bootloader was written to the EFI system partition, a new entry was created for it, and this entry was placed first in BootOrder. However, when the PC started, the Windows boot loader was still called. It turned out that the firmware completely ignored the value of BootOrder and always loaded Windows if it found a Windows entry.

    We managed to get around the problem by replacing the Windows boot loader file itself. This, of course, adds work, because the substituted file must now be protected from within the operating system.
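
    For reference, placing an entry first in BootOrder comes down to rewriting an array of 16-bit option numbers. The sketch below is plain C and assumes the variable contents have already been read into memory (in firmware this would go through the runtime variable services). The function name boot_order_put_first is hypothetical and is not taken from the product.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: moves (or inserts) the option number `entry` to the
 * front of a BootOrder-style array of 16-bit Boot#### numbers.
 * `order` must have room for count + 1 elements. Returns the new count. */
size_t boot_order_put_first(uint16_t *order, size_t count, uint16_t entry)
{
    size_t pos = count;              /* index of an existing copy, if any */

    for (size_t i = 0; i < count; i++) {
        if (order[i] == entry) {
            pos = i;
            break;
        }
    }

    /* Shift everything in front of `pos` one slot to the right. */
    for (size_t i = pos; i > 0; i--) {
        order[i] = order[i - 1];
    }
    order[0] = entry;

    return (pos == count) ? count + 1 : count;
}
```

    On the laptop described above, even a correctly rewritten BootOrder had no effect, which is why the file itself had to be replaced.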

    Problems sending USB commands (HP, UEFI)


    The bootloader works with USB devices, namely CCID readers. To work with them, it used the protocol provided for this purpose, EFI_USB_IO_PROTOCOL. The problem was that the bootloader did not detect any USB devices on this laptop, while on other PCs the same bootloader detected them. At first glance it might seem that the USB drivers did not work at all, but the laptop successfully booted from a flash drive, which ruled that out. It then turned out that the problem occurs when sending commands over the control pipe (control transfer) using the UsbControlTransfer function of EFI_USB_IO_PROTOCOL. The function prototype is shown below.

    typedef EFI_STATUS (EFIAPI *EFI_USB_IO_CONTROL_TRANSFER) (
    	IN EFI_USB_IO_PROTOCOL*		This,
    	IN EFI_USB_DEVICE_REQUEST*		Request,
    	IN EFI_USB_DATA_DIRECTION		Direction,
    	IN UINT32				Timeout,
    	IN OUT VOID*				Data OPTIONAL,
    	IN UINTN				DataLength OPTIONAL,
    	OUT UINT32*				Status
    ); 

    The function always failed with the error EFI_USB_ERR_TIMEOUT. It turned out that the firmware developers had not implemented the EFI_USB_DATA_DIRECTION type in accordance with the UEFI specification. The definition of the type from the specification is given below.

    typedef enum {
    	EfiUsbDataIn,
    	EfiUsbDataOut,
    	EfiUsbNoData
    } EFI_USB_DATA_DIRECTION; 

    The error was that on this laptop the meanings of EfiUsbDataIn and EfiUsbDataOut were swapped. When the bootloader called UsbControlTransfer with the third parameter equal to EfiUsbDataOut, the firmware did not write to the device but read from it, and vice versa. Since the first transfer in the bootloader code uses EfiUsbDataOut, the USB driver tried to read data from the device at a point where a request should have been sent, which cannot succeed. Accordingly, the function timed out.

    The solution to the problem is extremely ugly. At startup, the bootloader checked whether the FirmwareVendor field of the EFI_SYSTEM_TABLE structure contained the string “HPQ”, and if it did, whether the FirmwareRevision field contained the value 0x10000001. If both conditions were met, we deliberately swapped EfiUsbDataIn and EfiUsbDataOut when calling the corresponding functions.
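
    The workaround can be sketched roughly as follows. This is an illustration rather than the product code: the enum is a local copy of the definition above, the vendor string is taken as plain ASCII for simplicity (in UEFI it is a wide-character string), and the names firmware_swaps_usb_direction and fix_direction are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Local copy of the enum from the UEFI specification. */
typedef enum {
    EfiUsbDataIn,
    EfiUsbDataOut,
    EfiUsbNoData
} EFI_USB_DATA_DIRECTION;

/* Hypothetical check: returns nonzero if the firmware is the affected one.
 * `vendor` is ASCII here for simplicity; this is an assumption. */
int firmware_swaps_usb_direction(const char *vendor, uint32_t revision)
{
    return vendor != NULL
        && strstr(vendor, "HPQ") != NULL
        && revision == 0x10000001u;
}

/* Hypothetical helper: maps the direction we want to the value that the
 * firmware actually expects. EfiUsbNoData is never touched. */
EFI_USB_DATA_DIRECTION fix_direction(EFI_USB_DATA_DIRECTION dir, int swapped)
{
    if (!swapped) {
        return dir;
    }
    switch (dir) {
    case EfiUsbDataIn:  return EfiUsbDataOut;
    case EfiUsbDataOut: return EfiUsbDataIn;
    default:            return dir;
    }
}
```

    The check is evaluated once at startup, and every transfer call then passes its direction through fix_direction.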

    Problems receiving USB responses (Fujitsu LifeBook E743, UEFI)


    Outwardly, the problem was that not all CCID devices worked in the bootloader. Older device families worked flawlessly, newer ones did not. It turned out that the problem occurs when the UsbBulkTransfer function of EFI_USB_IO_PROTOCOL is called. The function always returned EFI_DEVICE_ERROR.

    A USB host controller exchanges data with devices in packets of a fixed maximum length, and the USB design allows a device to return a short packet. In that case, the host controller reports the transfer completion status not as “Success” but as “Short Packet”. The USB driver interpreted this status as an error. That is, UsbBulkTransfer always returned EFI_DEVICE_ERROR if the device responded with a short packet.

    It so happened that the old CCID families always answered with full-length packets, while the new ones answered with short packets. We managed to get around the problem by analyzing the output buffer. The structure below shows the format of the RDR_to_PC_DataBlock message of CCID devices. The device returns this message in response to commands such as PC_to_RDR_IccPowerOn, PC_to_RDR_Secure, and PC_to_RDR_XfrBlock.

    #pragma pack( push, 1 )
    struct RDR_to_PC_DataBlock {
    	UINT8		bMessageType;
    	UINT32		dwLength;
    	UINT8		bSlot;
    	UINT8		bSeq;
    	UINT8		bStatus;
    	UINT8		bError;
    	UINT8		bChainParameter;
    	UINT8		abData[0];	/* response data, dwLength bytes */
    };
    #pragma pack( pop ) 

    The bMessageType field identifies the message type, and for RDR_to_PC_DataBlock it is always 0x80. Therefore, before a response was received into the buffer, this field was set to zero. If UsbBulkTransfer returned an error, the field was checked, and if it was 0x80, the device was considered to have answered correctly. In that case, the dwLength field was used to calculate the size of the response, and this size was returned to the original caller.
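
    A minimal sketch of this recovery logic is given below. It is written against a hypothetical transfer callback (bulk_in_fn) instead of the real UsbBulkTransfer so that it stays self-contained. The function buggy_bulk_in is only a stand-in that imitates the faulty driver, and all names are assumptions made for the illustration. The 10-byte header size and the little-endian dwLength follow from the structure above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CCID_HEADER_SIZE    10u     /* bMessageType .. bChainParameter */
#define RDR_TO_PC_DATABLOCK 0x80u

/* Hypothetical transfer callback: returns 0 on success, nonzero on error. */
typedef int (*bulk_in_fn)(void *ctx, uint8_t *buf, size_t cap, size_t *got);

/* Reads one RDR_to_PC_DataBlock; tolerates a driver that reports a short
 * packet as an error. Returns 0 and sets *len on success, -1 otherwise. */
int ccid_read_datablock(bulk_in_fn bulk_in, void *ctx,
                        uint8_t *buf, size_t cap, size_t *len)
{
    size_t got = 0;
    uint32_t dw;

    if (cap < CCID_HEADER_SIZE) {
        return -1;
    }
    buf[0] = 0;                         /* clear bMessageType beforehand */

    if (bulk_in(ctx, buf, cap, &got) == 0) {
        *len = got;                     /* driver behaved correctly */
        return 0;
    }
    if (buf[0] != RDR_TO_PC_DATABLOCK) {
        return -1;                      /* nothing arrived: a real error */
    }

    /* dwLength is little-endian and starts at offset 1. */
    dw = (uint32_t)buf[1] | ((uint32_t)buf[2] << 8) |
         ((uint32_t)buf[3] << 16) | ((uint32_t)buf[4] << 24);
    if (dw > cap - CCID_HEADER_SIZE) {
        return -1;                      /* length field is not plausible */
    }
    *len = CCID_HEADER_SIZE + dw;
    return 0;
}

/* Stand-in for the faulty driver: delivers the data, reports an error. */
int buggy_bulk_in(void *ctx, uint8_t *buf, size_t cap, size_t *got)
{
    static const uint8_t reply[] = {
        0x80, 2, 0, 0, 0, 0, 1, 0, 0, 0,   /* header, dwLength = 2 */
        0x90, 0x00                         /* two data bytes */
    };
    (void)ctx;
    (void)got;
    if (cap < sizeof reply) {
        return -1;
    }
    memcpy(buf, reply, sizeof reply);
    return -1;                          /* short packet misreported */
}
```

    The plausibility check on dwLength is an addition of this sketch. It guards against a stale or corrupted buffer being mistaken for a valid response.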

    Problems with the memory map (Toshiba Satellite U200, BIOS)


    Outwardly, the problem was that the bootloader refused to work because it could not find a region of memory in which to place itself. Analysis revealed a problem in scanning the memory map: during the scan, some of the ranges were skipped and never analyzed.

    The memory map is obtained through function 0xe820 of interrupt int 15h. Because the loader leaves part of its code resident, it had to allocate memory and place that code there. This, in turn, required modifying the memory map so that the operating system would not use the allocated area once it started. Accordingly, during startup the entire map was read, modified as needed, and substituted by hooking int 15h.

    Below are the input and output parameters of the function that returns the memory map.

    • Input parameters:

      • EAX - function code, always 0xe820;
      • EBX - continuation value. On the first call it must be 0; on subsequent calls it must be the value returned by the previous call. This register tells the function from which record to continue returning the memory map;
      • ES:DI - pointer to the buffer that receives the entry describing one memory range;
      • ECX - buffer size, must be at least 20, because the first revisions of this function returned 20-byte records. On modern systems, the record size is 24 bytes;
      • EDX - signature, always equal to 'SMAP'. Used to verify the caller.

    • Output Parameters:

      • CF - error flag; 0 means no error;
      • EAX - signature, always equal to 'SMAP'. Used to verify the BIOS;
      • ES:DI - buffer pointer, the same as on input;
      • ECX - the size of the record returned by the function;
      • EBX - the value to pass in on the next call to get the next record. No assumptions should be made about its meaning, because it can be an offset, an index, or any other entity in the function's internal representation.

    Using this function, the loader reads the entire memory map in a loop. The bootloader was designed to be forward compatible with future BIOS versions, that is, the ECX register contained 64 on input. As follows from the description, the function returns in ECX the size of the record that was written to the buffer. Since the maximum record size is currently 24, the returned value could not exceed that. Also, the function should always return exactly one record per call.

    However, on this particular laptop the function interpreted the ECX value differently: not as the record size that the caller supports, but as the total amount of data the function may return in one call. As a result, a single call returned two records, but the bootloader read only one of them, so one record of each pair was ignored. Because of this, the bootloader could not find a region of memory in which to place its resident code.

    The problem was solved by passing the value 24 in ECX. The idea of forward compatibility had to be abandoned. There were thoughts about determining the record size dynamically, but given how unreliable different BIOS versions are, there is a risk that such an algorithm would not work reliably either.
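
    The effect of the request size can be reproduced with a small simulation. The real call is a real-mode int 15h, so the sketch replaces it with a hypothetical callback type (e820_call_fn). The function buggy_e820 is a stand-in that packs as many 24-byte records into the buffer as ECX allows, which is an assumption about how the faulty BIOS behaved, and the memory ranges in it are made up. The 'SMAP' signature check is omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t base;
    uint64_t length;
    uint32_t type;      /* 1 = usable RAM, 2 = reserved */
    uint32_t attrs;     /* present only in 24-byte records */
} e820_entry;

/* Hypothetical wrapper around int 15h, EAX = 0xe820.
 * In:  *ebx = continuation value, *ecx = buffer size.
 * Out: *ebx = next continuation value (0 after the last record),
 *      *ecx = number of bytes written. Returns 0 when CF is clear. */
typedef int (*e820_call_fn)(uint32_t *ebx, void *buf, uint32_t *ecx);

/* Reads the map, taking exactly one record per call. */
size_t e820_read_map(e820_call_fn call, uint32_t request_size,
                     e820_entry *out, size_t max)
{
    uint8_t scratch[64];
    uint32_t ebx = 0;
    size_t n = 0;

    if (request_size > sizeof scratch) {
        request_size = sizeof scratch;
    }
    while (n < max) {
        uint32_t ecx = request_size;

        memset(scratch, 0, sizeof scratch);
        if (call(&ebx, scratch, &ecx) != 0 || ecx < 20) {
            break;
        }
        memcpy(&out[n], scratch, sizeof out[n]);   /* first record only */
        n++;
        if (ebx == 0) {
            break;                                 /* last record reached */
        }
    }
    return n;
}

/* Stand-in for the faulty BIOS: fills the buffer with as many records
 * as fit into ECX bytes instead of returning one record per call. */
int buggy_e820(uint32_t *ebx, void *buf, uint32_t *ecx)
{
    static const e820_entry map[4] = {
        { 0x00000000u, 0x0009F000u, 1, 1 },
        { 0x0009F000u, 0x00001000u, 2, 1 },
        { 0x00100000u, 0x3FE00000u, 1, 1 },
        { 0x3FF00000u, 0x00100000u, 2, 1 },
    };
    uint32_t fit = *ecx / (uint32_t)sizeof map[0];
    uint32_t left;

    if (*ebx >= 4 || fit == 0) {
        return -1;                                 /* CF set */
    }
    left = 4 - *ebx;
    if (fit > left) {
        fit = left;
    }
    memcpy(buf, &map[*ebx], fit * sizeof map[0]);
    *ecx = fit * (uint32_t)sizeof map[0];
    *ebx = (*ebx + fit < 4) ? *ebx + fit : 0;
    return 0;
}
```

    With a request size of 24 the loop sees all four simulated records. With 64 it sees only two, because the second record of each pair is silently skipped.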

    Problems stopping USB 3.0 and reinitializing PIC controllers (HP, BIOS)


    Visually, the problem looked like this: after the user connected the smart card and entered the PIN, the screen went dark, a message stating that the OS was loading appeared, and everything stopped on that message. The PC hung completely.

    The BIOS version of the loader is built on an RTOS, and the user shell runs in the processor's protected mode, which, of course, required reprogramming the classic PIC controller. Accordingly, when control was transferred to the OS loader, the processor returned to real mode, and this in turn required returning the PIC controller to its original state.

    A preliminary analysis showed that the processor did return to real mode, but the PC hung afterwards. It then turned out that the problem occurred only if the bootloader had initialized the USB host controllers. Before returning to real mode and before restoring the PIC controller, the bootloader also stopped the USB host controllers.

    A USB 3.0 host controller may have a USBLEGSUP register. This register is used to transfer control of the controller from the BIOS to the OS and back. BIOS control may be needed, for example, to emulate the classic keyboard I/O ports for compatibility with old software. That is, when these ports are accessed, an SMI interrupt occurs, and the handler of this interrupt does the rest. This matters because modern machines increasingly use only USB keyboards. The register format is described below.

    • Capability ID (Bits 0-7) - capability identifier. For this register, the field is 1
    • Next Capability Pointer (Bits 8-15) - pointer to the next capability register
    • HC BIOS Owned Semaphore (Bit 16) - if set, the BIOS controls the host controller
    • Reserved (Bits 17-23)
    • HC OS Owned Semaphore (Bit 24) - the operating system must set this bit before using the host controller; in response, the BIOS clears bit 16, after which the host controller can be used
    • Reserved (Bits 25-31)

    When the RTOS stops the host controller, it also clears bit 24 of the USBLEGSUP register, thereby returning control of the controller to the BIOS. Next, the RTOS returns the PIC controller to its original state. The PIC controller no longer exists in hardware and is itself emulated in SMM, so accessing its registers during the reset also raised an SMI interrupt. The analysis showed that the RTOS did not wait for bit 16 to be set in USBLEGSUP and started restoring the PIC controller immediately after clearing bit 24. As a result, the SMM code handled the return of control over the host controller, while the PIC access, which had in fact also generated an SMI, was not processed at all. Since PIC initialization is performed in several steps, the controller was left partially uninitialized, and interrupt delivery broke. Immediately after the processor returned to real mode, the first interrupt took the processor to an invalid vector, and it began executing a meaningless stream of instructions.

    The problem was worked around by waiting for bit 16 of the USBLEGSUP register to be set before returning the PIC to its original state.
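
    The handover with the added wait might look roughly like the sketch below. It assumes the register is already mapped and reachable through a pointer. The function name usb_release_to_bios and the bounded poll count are hypothetical, and a real implementation would normally insert a delay between polls.

```c
#include <stdint.h>

#define USBLEGSUP_BIOS_OWNED (1u << 16)   /* HC BIOS Owned Semaphore */
#define USBLEGSUP_OS_OWNED   (1u << 24)   /* HC OS Owned Semaphore */

/* Hypothetical helper: gives the host controller back to the BIOS and
 * waits until the BIOS confirms ownership by setting bit 16.
 * Returns 0 on success, -1 if the bit did not appear within max_polls. */
int usb_release_to_bios(volatile uint32_t *usblegsup, uint32_t max_polls)
{
    *usblegsup &= ~USBLEGSUP_OS_OWNED;    /* clear bit 24 */

    while (max_polls-- > 0) {
        if (*usblegsup & USBLEGSUP_BIOS_OWNED) {
            return 0;                     /* handover finished */
        }
    }
    return -1;                            /* firmware never answered */
}
```

    Only after this function returns does the loader go on to restore the PIC controller, so the two SMI-driven operations no longer overlap.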

    PIC controller interrupt delivery problems (Dell Latitude E7240, BIOS)


    Outwardly, the problem looked like this: when the bootloader started and displayed a prompt to connect a smart card, it hung completely. The problem arose only when the PC was rebooted; after a cold power-on, everything worked fine.

    A preliminary analysis showed that the processor ran into a page fault. Further investigation showed that the RTOS uses a separate stack for each interrupt, and these stacks are very small (256 bytes). All of these stacks are adjacent, as shown in the figure below.


    It also turned out that the page fault occurred on the memory page immediately preceding the page with the interrupt stacks. The subsequent analysis was therefore carried out at this level.

    When the RTOS initializes a USB host controller, it also enables delivery of interrupts from the PIC line to which the controller is connected. The common interrupt handler enables interrupts on the processor and then calls, one after another, the handlers registered for this line. After all registered handlers have been called, the interrupt handler sends the end-of-interrupt (EOI) command to the PIC controller.

    The PIC controller has an ISR (in-service register). It records which interrupts the processor is currently servicing. While the processor is servicing a particular interrupt, that interrupt will not be delivered again, even if a request is present on the corresponding line, until the processor issues the EOI command to the PIC controller, after which the PIC resumes delivery of this interrupt.

    Subsequent analysis revealed that while the registered handlers were being called, the PIC controller delivered the interrupt again, even though the EOI command had not yet been sent. This is, of course, a bug in the PIC controller emulation. As a result, the stack of the corresponding interrupt overflowed first, then the stacks of other interrupts were corrupted, and finally an unmapped memory page was accessed. That caused a page fault, whose handler stops the bootloader.

    The problem was worked around by masking the corresponding interrupt on the PIC controller before calling the registered handlers and unmasking it after they had been called.
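
    A rough sketch of such a dispatcher is shown below. Port access goes through a hypothetical port_io structure so that the logic can be exercised without hardware, and the sim_* functions are stand-ins for real port I/O. The port numbers and the EOI command are those of the classic 8259 PIC. Sending EOI before unmasking is a choice made in this sketch, not something stated in the description above.

```c
#include <stdint.h>

#define PIC1_CMD  0x20u
#define PIC1_DATA 0x21u     /* interrupt mask register, IRQ 0-7 */
#define PIC2_CMD  0xA0u
#define PIC2_DATA 0xA1u     /* interrupt mask register, IRQ 8-15 */
#define PIC_EOI   0x20u

/* Hypothetical port access layer; real code would use in/out instructions. */
typedef struct {
    uint8_t (*inb)(uint16_t port);
    void    (*outb)(uint16_t port, uint8_t value);
} port_io;

typedef void (*irq_handler)(void);

/* Calls all handlers of one IRQ line while that line is masked. */
void dispatch_irq(const port_io *io, unsigned irq,
                  const irq_handler *handlers, unsigned count)
{
    uint16_t imr = (irq < 8) ? PIC1_DATA : PIC2_DATA;
    uint8_t  bit = (uint8_t)(1u << (irq & 7u));

    /* Mask the line so a faulty PIC emulation cannot re-deliver it. */
    io->outb(imr, (uint8_t)(io->inb(imr) | bit));

    for (unsigned i = 0; i < count; i++) {
        handlers[i]();
    }

    /* Acknowledge: the slave PIC first when the line belongs to it. */
    if (irq >= 8) {
        io->outb(PIC2_CMD, PIC_EOI);
    }
    io->outb(PIC1_CMD, PIC_EOI);

    /* Unmask the line again. */
    io->outb(imr, (uint8_t)(io->inb(imr) & (uint8_t)~bit));
}

/* Stand-ins used to try the logic without hardware. */
uint8_t sim_ports[0x100];
uint8_t sim_imr_seen;

uint8_t sim_inb(uint16_t port)             { return sim_ports[port & 0xFFu]; }
void    sim_outb(uint16_t port, uint8_t v) { sim_ports[port & 0xFFu] = v; }
void    sim_handler(void)                  { sim_imr_seen = sim_ports[PIC1_DATA]; }
```

    While the handlers run, the mask bit of the line is set, so even an emulation that ignores the in-service state cannot nest the same interrupt and overflow its 256-byte stack.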

    Conclusion


    The above list of bugs is far from complete; only the cases I could recall are described. Worst of all, no fundamental solution to the stability problem has been found yet. Stability was achieved only in specific cases. There will always be machines with bugs for which an experienced developer has to invent a workaround and, worse, spend about three days analyzing and fixing the problem. Some cases are far from easy. Three days per problem is, of course, not that much, but when there are a dozen problems, the work schedule suffers noticeably.

    This reality forced me to reverse engineer the Windows bootloader in order to understand which mechanisms it uses. For me, this means that I can safely use the same mechanisms. If you deviate from them, correct operation of the bootloader cannot be guaranteed.

    After a couple more problems with USB in UEFI, I ended up embedding my own host controller drivers in the bootloader. To do this, you have to stop the drivers that run in UEFI itself and load your own. I have never liked adding so-called “crutches”, and such code also becomes hard to develop over time as the workarounds pile up.

    As for our own drivers, they make a lot of sense, because there is a FastBoot mode that does not guarantee that the USB drivers are loaded. This is not a bug but a criticism of UEFI itself as a standard that does not provide a mechanism for loading drivers that were not loaded.

    To conclude the description of the problems, I would like to note the following: it seems that current BIOS / UEFI firmware is either developed without a full understanding of how these systems work, or is not properly tested. In my experience, both have occurred. For the manufacturer, it is enough that Windows and Linux run on the shipped PC; everything else is written off as production costs. And I do not need to say whom the client will blame.

    In my experience, BIOS and UEFI are the most unstable runtime environments. The EFI of the MacBook is a special case, and working with it is the hardest. But that's another story.
