Linux, deferred driver loading and broken interrupts

Today I will talk about the unexpected problems we ran into when connecting a matrix keyboard to the Linux ARM board in the Bercut-ETN device (ETN is a new hardware revision of the Bercut-ET). Specifically, I will explain why the adp5589 driver refused to receive interrupts and how we got it to do so.

If that sounds interesting, read on.



Table of Contents:

  • Description of the hardware
  • Where is our problem?
  • A few words about Device Tree
  • A little bit about registering devices and drivers
  • Lazy driver loading mechanism
  • How to make everything work

Description of the hardware around the keyboard:

The keyboard itself has no controller. It is connected over the I2C bus through a dedicated matrix keyboard controller, the adp5589 chip. The chip has an interrupt line routed to one of the GPIO pins of the ARM SoC. As a result, the connection diagram looks something like this:



portb is the GPIO port whose pin receives the interrupt line from the keyboard controller;
intc - main interrupt controller;
i2c0 - i2c bus controller.

For some reason the adp5589 driver stubbornly fails to get an interrupt number. What could cause this? Perhaps the keyboard driver is missing some resource at load time. Maybe the devices it depends on have not been initialized yet? Let's see which devices it can depend on:

First, on the I2C bus controller to which it is connected.
Second, on the port controller whose pin carries our interrupt line.

Now let's see in which order the drivers for these devices are loaded:

gic
designware-i2c
adp5589
dw-apb-gpio-port

There is the reason: when the keyboard driver loads, the driver of its interrupt-parent has not been loaded yet. As a result, the keyboard driver does not get an interrupt number. The standard solution to this problem is the deferred driver loading (deferred probe) mechanism.

Its essence is that a driver can ask to be probed again later if a resource it needs is not yet available. It does so by returning -EPROBE_DEFER from its probe function. The driver will then be probed again later. By then either the required resource will be available, or loading will be deferred once more.

Add a check to the probe function of the keyboard driver:

if (!client->irq) {
    dev_err(&client->dev, "no IRQ boss?\n");
    return -EPROBE_DEFER;
}

Full of hope, we look at the new load order:
gic
adp5589
designware-i2c
dw-apb-gpio-port
(deferred) adp5589
(deferred) adp5589
(deferred) adp5589

Something went wrong: the keyboard driver was probed again after the GPIO driver, but it still did not get an interrupt number. It seems we have to dig deeper into the source code than expected.

Three possible solutions suggest themselves:

  • Hardcode the interrupt number directly into the driver
  • Somehow set the load order of the drivers
  • Figure out why the deferred driver loading mechanism did not work

The first option:

It works, but it is undesirable. It will do as a temporary fix, but if something changes in the hardware (for example, the interrupt line is moved to another GPIO port), you will have to change not only the Device Tree but also the driver source code.

The second option:

The load order of drivers cannot be set explicitly, so this option does not work.

The third option:

This is the most correct one, and the one we will follow.

At this point it is worth saying a few words about the Device Tree, since it will be referenced later on.

The Device Tree is one way of describing the hardware of the device on which we want to run Linux. It is a tree of nodes that hold the necessary information. A DT exists as human-readable text files (.dts, .dtsi) and as a binary file (.dtb) compiled from them.

As an example, consider a fragment of a .dts file that describes how our keyboard controller is connected to other devices of the SoC.

i2c0: i2c@ffc04000 {
     compatible = "snps,designware-i2c";
     keybs@34 {
         compatible = "adi,adp5589";
         interrupts = <19 IRQ_TYPE_LEVEL_LOW>;
         interrupt-parent = <&portb>;
     };
};
intc: intc@fffed000 {
     compatible = "arm,cortex-a9-gic";
     #interrupt-cells = <3>;
     interrupt-controller;
};
portb: gpio-controller@0 {
     compatible = "snps,dw-apb-gpio-port";
     interrupt-controller;
     #interrupt-cells = <2>;
     interrupts = <0 165 4>;
     interrupt-parent = <&intc>;
};

(Nodes and properties that do not matter to us right now have been cut for clarity.)

i2c0, keybs, intc and portb are nodes; everything else is their properties. From the fragment it is immediately clear that the keyboard controller chip is connected to the I2C bus. The compatible property holds a string that describes the manufacturer and model of the device. It is by this property that the OS decides which driver should be bound to the device.

interrupt-controller is a property indicating that the device can act as an interrupt controller, and interrupt-parent indicates where the interrupt from the current device is connected.

#interrupt-cells is a property giving the number of parameters that describe an interrupt for a given interrupt controller, and interrupts is the property in which the parameters for a particular interrupt are set.

For example, portb says #interrupt-cells = <2>. This means that nodes for which portb is the interrupt-parent must give two parameters in their interrupts property. portb is the interrupt-parent of keybs, so we look at keybs. It says interrupts = <19 IRQ_TYPE_LEVEL_LOW>. What does that mean?

Two parameters are given here. The first is the number of the pin in portb to which the interrupt line from the keyboard controller is connected. The second is the interrupt trigger type (here, active-low level). How do you know how many parameters an interrupt controller needs and what each of them means? Usually this is written in the binding documentation. For portb it is this file: Documentation/devicetree/bindings/gpio/snps-dwapb-gpio.txt.
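To make the meaning of the two cells concrete, here is a small userspace sketch in plain C (not kernel code). The struct and function names are made up for illustration; only the trigger flag values are taken from the kernel's dt-bindings header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as include/dt-bindings/interrupt-controller/irq.h */
#define IRQ_TYPE_EDGE_RISING  1
#define IRQ_TYPE_EDGE_FALLING 2
#define IRQ_TYPE_LEVEL_HIGH   4
#define IRQ_TYPE_LEVEL_LOW    8

/* Hypothetical type: a decoded two-cell specifier for a child of portb. */
struct gpio_irq_spec {
    uint32_t pin;     /* cell 0: pin number inside the port */
    uint32_t trigger; /* cell 1: trigger type flag */
};

/* Returns 0 on success, -1 if the cell count does not match
 * #interrupt-cells = <2>. */
int decode_gpio_irq(const uint32_t *cells, int ncells,
                    struct gpio_irq_spec *out)
{
    assert(cells != NULL && out != NULL);
    if (ncells != 2)
        return -1;
    out->pin = cells[0];
    out->trigger = cells[1];
    return 0;
}

/* Human-readable name of a trigger flag. */
const char *trigger_name(uint32_t trigger)
{
    switch (trigger) {
    case IRQ_TYPE_EDGE_RISING:  return "rising edge";
    case IRQ_TYPE_EDGE_FALLING: return "falling edge";
    case IRQ_TYPE_LEVEL_HIGH:   return "active high level";
    case IRQ_TYPE_LEVEL_LOW:    return "active low level";
    default:                    return "unknown";
    }
}
```

Feeding it the cells from keybs, { 19, IRQ_TYPE_LEVEL_LOW }, gives pin 19 with an active-low level trigger. The three-cell specifier from portb (<0 165 4>) is rejected, because it is written for a different parent (intc, with #interrupt-cells = <3>).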

&portb is a reference to the portb node (in our case it resolves to /soc/gpio@ff709000/gpio-controller@0).
We do not need the remaining properties yet; you can read more about them and about the Device Tree in general here: devicetree.org/Device_Tree_Usage.

It is also worth mentioning how devices and drivers are registered (do not worry, we will return to the main topic in the next section). According to the Linux Device Model:

A device is a physical or virtual object that is connected to a bus (which may itself be virtual).
A driver is a software object that can be bound to a device and can perform control functions for it.
A bus is a device designed to be the “attachment point” for other devices. The basic functionality of every bus supported by the kernel is defined by the bus_type structure. This structure refers to a subsys_private structure, which contains two lists: klist_devices and klist_drivers.
klist_devices is the list of devices connected to the bus.
klist_drivers is the list of drivers that can control devices on this bus.
Devices and drivers are added to these lists by the device_register and driver_register functions. Both functions also try to bind devices to drivers: device_register walks the list of drivers looking for one that suits the new device, and driver_register walks the list of devices looking for devices the new driver can control. Whether a driver suits a device is checked by the match(dev, drv) function, a pointer to which is stored in the bus_type structure.
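The two-list scheme can be illustrated with a toy model in plain C. This is only a sketch of the idea: the toy_ names, the fixed-size arrays and the matching on the compatible string are simplifications made up for the example, not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ITEMS 8

struct toy_driver { const char *compatible; };
struct toy_device { const char *compatible; const struct toy_driver *driver; };

/* Toy bus: a device list, a driver list and a match callback. */
struct toy_bus {
    struct toy_device *devices[MAX_ITEMS];
    size_t ndev;
    const struct toy_driver *drivers[MAX_ITEMS];
    size_t ndrv;
    int (*match)(const struct toy_device *, const struct toy_driver *);
};

static int match_compatible(const struct toy_device *dev,
                            const struct toy_driver *drv)
{
    return strcmp(dev->compatible, drv->compatible) == 0;
}

void toy_bus_init(struct toy_bus *bus)
{
    assert(bus != NULL);
    memset(bus, 0, sizeof(*bus));
    bus->match = match_compatible;
}

/* Add a device, then walk the driver list looking for a match. */
int toy_device_register(struct toy_bus *bus, struct toy_device *dev)
{
    if (bus->ndev == MAX_ITEMS)
        return -1;
    bus->devices[bus->ndev++] = dev;
    for (size_t i = 0; i < bus->ndrv && !dev->driver; i++)
        if (bus->match(dev, bus->drivers[i]))
            dev->driver = bus->drivers[i];
    return 0;
}

/* Add a driver, then walk the device list looking for unbound matches. */
int toy_driver_register(struct toy_bus *bus, const struct toy_driver *drv)
{
    if (bus->ndrv == MAX_ITEMS)
        return -1;
    bus->drivers[bus->ndrv++] = drv;
    for (size_t i = 0; i < bus->ndev; i++)
        if (!bus->devices[i]->driver && bus->match(bus->devices[i], drv))
            bus->devices[i]->driver = drv;
    return 0;
}
```

In this model it does not matter which side is registered first: a device with compatible "adi,adp5589" is bound either when it is registered (if the driver is already in the list) or when the driver arrives later.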



Now we can move on to the main topic: how the deferred driver loading mechanism is implemented. Let's look at the drivers/base/dd.c file. Here is a brief description of what we will find there:

There are two lists that control re-probing: deferred_probe_pending_list and deferred_probe_active_list.

deferred_probe_pending_list is a list of devices whose driver is missing some resource.
deferred_probe_active_list is a list of devices whose driver can be tried again.

The really_probe function calls the probe function of the bus on which the device sits. In our case this is i2c_device_probe, and the call looks like dev->bus->probe(dev). The return value is checked for errors, and if it is -EPROBE_DEFER, the device is added to deferred_probe_pending_list.

The most interesting part is how and when the driver is called again. As long as drivers return -EPROBE_DEFER, their devices are added one after another to deferred_probe_pending_list. But as soon as the probe function of any driver succeeds, all devices from deferred_probe_pending_list are moved to deferred_probe_active_list. This is logical: the driver that has just loaded successfully may have been exactly what the pending drivers were missing. The second attempt to start the drivers from deferred_probe_active_list is made by the deferred_probe_work_func function. It calls bus_probe_device for each device in the list.

Calling bus_probe_device eventually leads us back to really_probe for the pair of our device and its driver (see above).
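The interplay of the two lists is easier to see in a toy model. The sketch below is plain userspace C under simplifying assumptions: all toy_ names are invented, lists are fixed-size arrays, and the order in which the active list is drained is not meant to match the kernel. Only the numeric value 517 is the kernel's EPROBE_DEFER.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EPROBE_DEFER 517 /* same numeric value as EPROBE_DEFER */
#define MAX_DEVS 8

struct toy_dev {
    const char *name;
    const int *needs; /* flag that must be set before probe succeeds, or NULL */
    int *provides;    /* flag set by a successful probe, or NULL */
    int bound;
    int attempts;
};

struct toy_list { struct toy_dev *item[MAX_DEVS]; size_t n; };

static struct toy_list pending, active;

static void toy_list_add(struct toy_list *list, struct toy_dev *dev)
{
    assert(list->n < MAX_DEVS); /* toy model: fixed capacity */
    list->item[list->n++] = dev;
}

/* Stand-in for a driver's probe function. */
static int toy_probe(struct toy_dev *dev)
{
    dev->attempts++;
    if (dev->needs && !*dev->needs)
        return -TOY_EPROBE_DEFER;
    if (dev->provides)
        *dev->provides = 1;
    dev->bound = 1;
    return 0;
}

/* Rough analogue of really_probe(): probe once and file the result. */
int toy_really_probe(struct toy_dev *dev)
{
    int ret = toy_probe(dev);

    if (ret == -TOY_EPROBE_DEFER) {
        toy_list_add(&pending, dev);
    } else if (ret == 0) {
        /* Any success moves every pending device to the active list. */
        for (size_t i = 0; i < pending.n; i++)
            toy_list_add(&active, pending.item[i]);
        pending.n = 0;
    }
    return ret;
}

/* Rough analogue of deferred_probe_work_func(): retry the active list. */
void toy_deferred_work(void)
{
    while (active.n > 0)
        toy_really_probe(active.item[--active.n]);
}
```

With a keyboard device that needs the GPIO flag and a GPIO device that provides it, probing the keyboard first puts it on the pending list; the successful GPIO probe moves it to the active list, and toy_deferred_work() binds it on the second attempt.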



But wait! So far we have been talking about calling the probe function of the bus on which the device sits, that is, i2c_device_probe. What about the probe function of the keyboard driver? We have not forgotten about it: it is simply called from i2c_device_probe. You can verify this by looking at its code in the drivers/i2c/i2c-core.c file:

i2c_device_probe code
static int i2c_device_probe(struct device *dev)
{
	struct i2c_client	*client = i2c_verify_client(dev);
	struct i2c_driver	*driver;
	int status;
	if (!client)
		return 0;
	driver = to_i2c_driver(dev->driver);
	if (!driver->probe || !driver->id_table)
		return -ENODEV;
	if (!device_can_wakeup(&client->dev))
		device_init_wakeup(&client->dev,
					client->flags & I2C_CLIENT_WAKE);
	dev_dbg(dev, "probe\n");
	status = of_clk_set_defaults(dev->of_node, false);
	if (status < 0)
		return status;
	status = dev_pm_domain_attach(&client->dev, true);
	if (status != -EPROBE_DEFER) {
	/* Here is the call to the keyboard driver's probe (in our case) */
		status = driver->probe(client, i2c_match_id(driver->id_table,
					client));
		if (status)
			dev_pm_domain_detach(&client->dev, true);
	}
	return status;
}


Okay, re-probing seems to work, so why does the keyboard driver still not get an interrupt number?
Let's try to trace how the interrupt number is supposed to reach our driver.

The adp5589_probe(struct i2c_client *client, const struct i2c_device_id *id) function receives the client structure, one of whose fields is irq, the number of the interrupt that our device (the keyboard controller) will generate. adp5589_probe is called from i2c_device_probe(struct device *dev). That function receives a device structure, and the pointer to the enclosing i2c_client structure is computed from the pointer to it (using the magic container_of macro).

A few words about container_of
This macro takes a pointer to a structure field, the type of the structure and the name of the field the pointer points to, and returns a pointer to the structure itself.



A good description of how it works can be found here.
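To make the macro a little less magical, here is a minimal userspace sketch. The macro below is a simplified version without the type checking of the kernel one, and the toy_ structures are made-up stand-ins for struct device and struct i2c_client.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from the member to the start
 * of the enclosing structure. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy stand-ins for struct device and struct i2c_client. */
struct toy_device { int id; };

struct toy_i2c_client {
    unsigned short addr;
    int irq;
    struct toy_device dev; /* embedded, like the dev field of i2c_client */
};

/* Recover the client from a pointer to its embedded device. */
struct toy_i2c_client *to_toy_client(struct toy_device *dev)
{
    assert(dev != NULL);
    return container_of(dev, struct toy_i2c_client, dev);
}
```

Given a pointer to the dev field of some toy_i2c_client, to_toy_client() returns a pointer to that same client, so its irq and addr fields become reachable from code that only received the device pointer.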

So we need to find where the i2c_client structure is filled in. This happens in the i2c_new_device(struct i2c_adapter *adap, struct i2c_board_info const *info) function. Specifically, the irq field is copied from the field of the same name in the i2c_board_info structure.

struct i2c_client	*client;
client->irq = info->irq;

The i2c_board_info structure is filled in by the of_i2c_register_devices(struct i2c_adapter *adap) function.

info.irq = irq_of_parse_and_map(node, 0); 

irq_of_parse_and_map is a wrapper around a chain of two functions: of_irq_parse_one and irq_create_of_mapping. The of_irq_parse_one function tries to find the node that is declared in the device tree as the interrupt-parent of the current device.
Remember these lines in the device tree?

keybs@34 {
	interrupt-parent = <&portb>;
};

It is portb that of_irq_parse_one looks for. Based on the result it fills in an of_phandle_args structure, which is passed to irq_create_of_mapping. irq_create_of_mapping then returns the interrupt number we need.
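The two-step chain can be sketched as a toy model in userspace C. Everything here is a simplification: the toy_ names are invented, a "domain" is just a table entry keyed by the controller's path, and the "base + pin" numbering is an arbitrary assumption (the real kernel allocates interrupt numbers differently). The point is only the behavior when the controller is not registered yet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DOMAINS 4

/* Toy stand-in for a device tree node with a two-cell specifier. */
struct toy_node {
    const char *interrupt_parent; /* path of the parent controller */
    unsigned int interrupts[2];   /* pin, trigger */
};

/* Toy stand-in for of_phandle_args. */
struct toy_irq_args {
    const char *controller;
    unsigned int pin;
    unsigned int trigger;
};

struct toy_domain { const char *controller; unsigned int base; };

static struct toy_domain domains[MAX_DOMAINS];
static size_t ndomains;

/* Called by an interrupt controller driver once it has probed. */
int toy_register_domain(const char *controller, unsigned int base)
{
    if (ndomains == MAX_DOMAINS)
        return -1;
    domains[ndomains].controller = controller;
    domains[ndomains].base = base;
    ndomains++;
    return 0;
}

/* Step 1, in the spirit of of_irq_parse_one(). */
int toy_parse_one(const struct toy_node *node, struct toy_irq_args *out)
{
    assert(node != NULL && out != NULL);
    if (!node->interrupt_parent)
        return -1;
    out->controller = node->interrupt_parent;
    out->pin = node->interrupts[0];
    out->trigger = node->interrupts[1];
    return 0;
}

/* Step 2, in the spirit of irq_create_of_mapping(): 0 means "no mapping". */
unsigned int toy_create_mapping(const struct toy_irq_args *args)
{
    for (size_t i = 0; i < ndomains; i++)
        if (strcmp(domains[i].controller, args->controller) == 0)
            return domains[i].base + args->pin;
    return 0; /* the controller has not registered a domain yet */
}

unsigned int toy_parse_and_map(const struct toy_node *node)
{
    struct toy_irq_args args;

    if (toy_parse_one(node, &args))
        return 0;
    return toy_create_mapping(&args);
}
```

Calling toy_parse_and_map() for the keyboard node before the GPIO controller has called toy_register_domain() returns 0. The same call made afterwards returns a usable number.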

On the first attempt the GPIO port cannot be found, and the kernel complains about it in the log:

irq: no irq domain found for /soc/gpio@ff709000/gpio-controller@0 !

And what happens when the driver is probed again? Nothing. Only i2c_device_probe and adp5589_probe are called.
That is the problem. The interrupt number is looked up only once, when the device is registered, and stays that way forever, no matter how many times the driver is re-probed.

We have found the problem, but how do we fix it?

You can try moving the interrupt lookup into i2c_device_probe. The interrupt number is not needed anywhere before that point, so this should not cause problems.

Better still, let's look at the sources of a more recent kernel version (ours is 3.18). Here is what we see there:
Setting the interrupt of the I2C client has been moved into the i2c_device_probe function.

if (!client->irq && dev->of_node) {
        int irq = of_irq_get(dev->of_node, 0);
        if (irq == -EPROBE_DEFER)
                return irq;
        if (irq < 0)
                irq = 0;
        client->irq = irq;
}

The irq field is still present in the i2c_board_info structure, but it is not used. So in newer kernel versions the problem is fixed.

All that remains is to port the change to our version. All changes are in the drivers/i2c/i2c-core.c file.
We add to our i2c_device_probe the I2C client interrupt setup that appeared in the newer version, and remove the interrupt setup from the of_i2c_register_devices function.

Changes as a git diff listing
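The difference between the old and the new flow can be shown with a small userspace model. All toy_ names are invented, the interrupt number 305 is borrowed from the /proc/interrupts output in this article, and the lookup is reduced to a single flag. It is a sketch of the idea, not of the kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EPROBE_DEFER 517 /* same numeric value as EPROBE_DEFER */

struct toy_client { int irq; };

/* Set once the GPIO controller driver has probed. */
static int gpio_domain_ready;

/* In the spirit of of_irq_get(): a positive number, or -EPROBE_DEFER
 * while the interrupt parent is not ready. */
int toy_irq_get(void)
{
    return gpio_domain_ready ? 305 : -TOY_EPROBE_DEFER;
}

/* Old flow: looked up once at device registration; failure is stored as 0. */
void toy_register_client_old(struct toy_client *c)
{
    int irq = toy_irq_get();

    assert(c != NULL);
    c->irq = irq > 0 ? irq : 0;
}

/* Old probe: can only look at the cached value, so it defers forever. */
int toy_probe_old(const struct toy_client *c)
{
    return c->irq ? 0 : -TOY_EPROBE_DEFER;
}

/* New flow: the bus probe repeats the lookup on every attempt. */
int toy_probe_new(struct toy_client *c)
{
    if (!c->irq) {
        int irq = toy_irq_get();

        if (irq == -TOY_EPROBE_DEFER)
            return irq;
        if (irq < 0)
            irq = 0;
        c->irq = irq;
    }
    return 0;
}
```

In this model both variants defer while gpio_domain_ready is 0. Once it is set, toy_probe_new() picks up the number on the next attempt, while toy_probe_old() keeps seeing the stale 0 that was cached at registration.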
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -626,6 +626,17 @@ static int i2c_device_probe(struct device *dev)
        if (!client)
                return 0;
+       if (!client->irq && dev->of_node) {
+               int irq = of_irq_get(dev->of_node, 0);
+
+               if (irq == -EPROBE_DEFER)
+                       return irq;
+               if (irq < 0)
+                       irq = 0;
+
+               client->irq = irq;
+       }
+
        driver = to_i2c_driver(dev->driver);
        if (!driver->probe || !driver->id_table)
                return -ENODEV;
@@ -1407,7 +1418,12 @@ static void of_i2c_register_devices(struct i2c_adapter *adap)
                        continue;
                }
-               info.irq = irq_of_parse_and_map(node, 0);
+               /*
+                * Now, we don't need to set interrupt here, because we set
+                * it in i2c_device_probe function 
+                * info.irq = irq_of_parse_and_map(node, 0);
+                */
+
                info.of_node = of_node_get(node);
                info.archdata = &dev_ad;


We check: the keyboard works. Let's look at /proc/interrupts:

$ grep 'adp5589_keys' /proc/interrupts
305:          2         -  20  adp5589_keys

Press a few buttons:

$ grep 'adp5589_keys' /proc/interrupts
305:          6         -  20  adp5589_keys

The problem is resolved.
