Developer's 10G Ethernet View
Hello!
Many experts know that top-end network equipment uses specialized chips to process traffic. I take part in developing such number crunchers, and I want to share my experience of building these high-performance devices (with 10/40/100G Ethernet interfaces).
To bring up a new link, network engineers usually take a fiber and a pair of SFP+ modules and plug them into the devices: the LEDs light up happily, packets start to arrive, and the chip starts forwarding them to their recipients. But how does the chip receive packets from the transmission medium? If you are interested, read on.
IEEE 802.3
Ethernet is a standard adopted by the IEEE. The 802.3 family of standards covers all variants of Ethernet (from 10M to 100G). We will focus on one specific physical layer implementation: 10GBASE-R (“plain” 10G, no frills).
This figure shows the layers of the OSI model and how they are mapped to sub-layers of the Ethernet protocol.
Sublayers:
- PHY is the physical layer.
- MAC is the medium access control sublayer.
PHY is divided into the following parts:
- PMD - transmits and receives individual bits on the physical interface.
- PMA - serializes / deserializes data and recovers the clock from the serial data (on the receive side).
- PCS - scrambles / descrambles and encodes / decodes (64b/66b) data blocks.
- XGXS - XGMII extender: used if PHY and MAC are at a distance from each other (optional).
- RECONCILIATION is a sublayer that translates XGMII into MAC signals.
Terms:
- Medium - transmission medium.
- MDI - medium dependent interface.
- XGMII - 10G media independent interface. The goal of XGMII is to provide a simple and cheap connection between PHY and MAC.
- XAUI - 10G attachment unit interface, used to connect to a transceiver.
Each type of physical layer can implement the individual PHY sublayers in its own way: different encodings and different transmission frequencies (wavelengths) are used, but the clear division into sublayers can be traced everywhere. Having a media-independent interface (XGMII) simplifies the development of the chip's application logic, because with any type of connection the developer ends up with XGMII somewhere. We'll talk about what XGMII is later.
PMD
The PMD sublayer is located closest to the medium. Its tasks are handled by special modules that are well known to network specialists:
Module type | Interface |
---|---|
XENPAK | XAUI |
X2 | XAUI |
XFP | XFI |
SFP+ | SFI |
This table already contains a familiar abbreviation: XAUI. Let's leave XENPAK / X2 for later in the article and turn to the most popular modules: XFP and SFP+.
XFI / SFI
XFI and SFI are essentially the same interface: a differential pair operating at rates from 9.95 to 11.10 Gbit/s. The range of rates exists because several standards can use this interface: from 10GBASE-W WAN to 10GBASE-R over G.709. We are interested in 10GBASE-R LAN with a rate of 10.3125 Gbit/s. One differential pair is used for reception, the other for transmission.
Connecting XFI / SFI directly to an ASIC / FPGA
The tasks of the PMA and PCS sublayers can be handled on the same chip that will further process the Ethernet packets (after we extract them from XGMII). Let me remind you that the PMA sublayer has to recover the clock on the receive side and deserialize the input signal. This work is done by dedicated hardware blocks that cannot be used for other tasks. These blocks are called transceivers. Describing them in detail would take a separate article; anyone who is interested can look at the block diagram of Altera FPGA transceivers.
After deserialization, the data goes to the PCS sublayer, where it is descrambled and decoded (64b/66b) and then sent to the MAC side as XGMII. On the transmit side, the reverse steps are performed.
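To give a feel for what the descrambling step does, here is a minimal Python sketch of the self-synchronous scrambler with the polynomial 1 + x^39 + x^58, which 10GBASE-R specifies for the 64-bit payload of each 66-bit block. The function names and the list-of-bits representation are my own illustration, not code from a real PCS: hardware processes many bits per clock, and the 2-bit sync header (not shown here) is left unscrambled.

```python
MASK58 = (1 << 58) - 1  # the scrambler keeps the last 58 scrambled bits


def scramble(bits, state=0):
    """Scramble a sequence of 0/1 values with 1 + x^39 + x^58."""
    out = []
    for b in bits:
        # taps: the scrambled bits sent 39 and 58 bit times ago
        s = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        state = ((state << 1) | s) & MASK58
        out.append(s)
    return out


def descramble(bits, state=0):
    """Inverse operation; the shift register is fed with the received bits."""
    out = []
    for s in bits:
        b = s ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        state = ((state << 1) | s) & MASK58
        out.append(b)
    return out


data = [1, 0, 1, 1, 0, 0, 1, 0] * 16          # 128 arbitrary payload bits
line = scramble(data, state=0x123456789ABCDE)  # transmitter in some unknown state
recovered = descramble(line, state=0)          # receiver starts from zero
print(recovered[58:] == data[58:])
```

Because the descrambler state is built only from received bits, the receiver synchronizes by itself after 58 bits regardless of its initial state, which is why no reset handshake is needed on the link.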
PCS can be implemented either with dedicated hardware blocks (Hard PCS) or in the logic available to the user (Soft PCS). Of course, this distinction applies only to FPGAs: in ASICs, everything is done in hardware. FPGA manufacturers build in hardware PCS blocks for standard protocols, saving the developer time and FPGA resources. Such blocks are very attractive, because many standard protocols work out of the box, and for most of them the code is provided free of charge by the FPGA manufacturer.
Connection via an external transceiver chip
Transceivers in FPGAs are expensive: an extra dozen transceivers can significantly raise the price of a chip. There are cheaper chips with transceivers that operate at lower rates (they can serialize / deserialize data at lower frequencies). Another high-speed interface, defined in Section 4 of the 802.3 standard, is XAUI: 4 differential pairs per direction, each running at 3.125 Gbit/s.
Using XAUI brings in the optional XGXS sublayer, which allows the PHY and MAC to be placed at a distance from each other, for example in different chips.
In such a connection, the tasks of the PMA and PCS can be performed by special 10G transceivers. (I admit that this may be confusing: a little earlier “transceivers” came up inside the FPGA, and now the term appears again. By the way, XFP / SFP+ modules are also called transceivers.)
Examples of 10G transceivers:
- www.vitesse.com/products/productLine/10GE-PHYs
- www.marvell.com/transceivers/alaska-x-gbe
- www.broadcom.com/products/Physical-Layer/10-Gigabit-Ethernet-PHYs
Such a transceiver is a separate chip placed between the XFP / SFP+ module and "our" chip, which will process Ethernet packets. In effect, the transceiver uses its PMA and PCS blocks to convert XFI / SFI to XGMII, and then converts XGMII to XAUI.
XAUI is fed to the ASIC / FPGA, which uses transceivers similar to those discussed earlier, but running at 3.125G. The transceiver operates differently than in 10G mode:
- Four transceivers (four hardware blocks) are needed, as this interface uses 4 differential pairs.
- XAUI PCS uses 8b/10b encoding, while 10G PCS uses 64b/66b.
XAUI PCS outputs an XGMII interface.
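The serial rates mentioned above (10.3125 Gbit/s for XFI / SFI and 3.125 Gbit/s per XAUI lane) follow directly from the coding overhead. A small sketch of the arithmetic, with a helper function name of my own choosing:

```python
from fractions import Fraction


def line_rate_gbps(payload_gbps, data_bits, coded_bits, lanes=1):
    """Per-lane serial rate needed to carry payload_gbps with a block code
    that turns every data_bits of payload into coded_bits on the wire."""
    return Fraction(payload_gbps) * Fraction(coded_bits, data_bits) / lanes


sfi = line_rate_gbps(10, 64, 66)           # 64b/66b on a single lane
xaui = line_rate_gbps(10, 8, 10, lanes=4)  # 8b/10b spread over four lanes
print(float(sfi), float(xaui))             # 10.3125 3.125
```

So 64b/66b costs about 3% of extra bandwidth, while 8b/10b costs 25%, which is part of why the single-lane 10G interface uses the former.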
Some PHY transceivers can bring the XGMII interface straight out to pins, in which case the transceivers in the ASIC / FPGA are not needed.
This connection method has significant disadvantages:
- High pin consumption: in the XGMII variant, one chip uses at least 78 pins, compared to 16 in the XAUI variant.
- Parallel interfaces may require length-matching of the traces on the board, which is sometimes non-trivial.
XENPAK / X2 Connection
As promised, we have reached these module types. It is easy to see that their connection boils down to the second option, only without an external transceiver chip: the module itself takes on the tasks of the PMD, PMA and PCS sublayers.
XGMII
XGMII is defined in Clause 46 of the 802.3 standard. The interface consists of independent receive and transmit paths. Each direction has a 32-bit data bus (RXD / TXD[31:0]), four control signals (RXC / TXC[3:0]) and a clock that the direction runs on (RX_CLK / TX_CLK). The standard specifies that the data and control buses are sampled on both edges of the clock (DDR). The packet itself travels over the data bus; the control signals help mark the beginning and end of the packet and also report errors.
RX_CLK / TX_CLK runs at 156.25 MHz. Multiplying 156.25 * 10^6 * 32 * 2 gives exactly 10 Gbit/s. Most often, designers avoid sampling on both clock edges and instead increase either the frequency or the data width:
- 36-bit bus (32 + 4) at 312.5 MHz.
- 72-bit bus (32 * 2 + 4 * 2) at 156.25 MHz.
The lower the frequency, the easier it is to process the data and the cheaper the chips that can be used. Only top-end (read: expensive) FPGAs can afford to run at ~300 MHz.
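A quick check that all three XGMII variants carry the same amount of data. This is only a sketch of the arithmetic: the function and variant names are mine, and only the data bits are counted (the control bits carry no payload).

```python
def throughput_gbps(data_bits, clock_mhz, edges_per_cycle):
    """Data throughput of a parallel bus in Gbit/s."""
    return data_bits * clock_mhz * edges_per_cycle / 1000


variants = {
    "32-bit data, both edges, 156.25 MHz": (32, 156.25, 2),
    "32-bit data, one edge, 312.5 MHz": (32, 312.5, 1),
    "64-bit data, one edge, 156.25 MHz": (64, 156.25, 1),
}

for name, params in variants.items():
    print(f"{name}: {throughput_gbps(*params)} Gbit/s")
```

Each variant comes out at 10.0 Gbit/s, so the choice between them is purely a trade-off between clock frequency and bus width.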
To “grab” a packet from XGMII, a special MAC core is used:
- Proprietary. After purchasing a license for such an IP core, you (most often) receive encrypted sources (that cannot be modified), and there is usually no particular restriction on the number of chips in which the core can be used.
- Open source. Such cores are very useful for beginners, as the code is open and you can figure out how it works. The license terms are determined separately.
- Self-written.
Of course, such a core also has a transmit part, which converts the packet to the XGMII interface.
Most often, such a core is implemented in the logic that is available for user tasks. However, there is an FPGA manufacturer that has implemented the MAC core in hardware, saving user resources.
Having extracted the packet from XGMII and placed it in the chip's internal memory, the MAC core hands control of the packet over to the chip's application logic: parsers, filters, switching systems, etc. For example, if the chip sits on a network card and it is decided that the packet must be sent to the host, it can be sent over PCIe to the RAM connected to the CPU.
Personal experience
L1 is mostly dealt with by the circuit engineers who lay out the boards for the devices. FPGA programmers work with it only at the start of hardware bring-up: once XGMII is working and all transceivers have passed their tests, we concentrate on traffic processing. In one of our devices, the connection was made according to the first option: SFI goes directly into the FPGA. In two others, the second option is used (a transceiver and XAUI). There is also a device that has both a direct SFI connection and a XAUI connection, but without a transceiver (the FPGA connects to another chip).
To use external transceivers (and, indeed, most specialized chips), you must sign an NDA. This usually causes no particular problems. Along with the NDA you receive various documents, for example chip register settings. From my experience with transceivers from two different manufacturers, I can say that when bringing up the first batch of hardware there are some problems with configuring the transceiver, which get resolved relatively quickly: the transceivers are multifunctional, and sometimes you need some trickery to set them up for the required operating mode. Sometimes the documentation for a chip is very poor and you have to try different options one by one, while technical support either does not respond or openly declares that it does not support these chips.
One of the advantages of using a transceiver chip is that, along with the documentation, the vendor may distribute a set of firmware images that must be loaded into the transceiver when a certain type of module is installed. As far as I understand, this firmware applies tricky equalizer settings, without which a certain type of module would work with bit errors. One such SFP+ module (with a limiting amplifier) was cured this way. If you connect without a transceiver, you have to prepare such settings for the ASIC / FPGA yourself, which can be a non-trivial task.
Having an interface that is independent of the transmission medium greatly simplifies life: code (application logic: parsers, generators, analyzers, filters, etc.) is very easy to port from old projects to new ones, because it does not matter what type of connection was used.
Connecting 40G / 100G to an ASIC / FPGA (and processing it) is similar to 10G, but has its own nuances. If there is interest, I can devote a separate article to this, although it will not be a long one.
Hello habr!
Let's take a regular UDP packet with the string “Hello, habr!” and send it to the device to see how it looks on XGMII.
I have a disassembled device on my desk that we most often use for testing new features; we'll use it for a clear example. To do this, I prepared a special firmware and connected a debugger to see the signals inside the chip. The 10G connection is made according to the second option: an external transceiver sends data to the FPGA over XAUI. This transceiver is dual-channel: it can work with two SFP+ modules.
What XGMII (and our packet) looks like inside the FPGA:
In this device, the FPGA uses a 72-bit XGMII bus clocked on the rising edge of a 156.25 MHz clock.
Legend:
- xgmii_rxc - set of control signals.
- xgmii_rxd - set of data signals (split into bytes for convenience).
- IDLE - signals of the absence of packet transmission.
- PREAMBLE - preamble, indicates the beginning of the transmission of the packet.
- L2_HDR - Layer 2 header: Ethernet.
- L3_HDR - Layer 3 header: IP.
- L4_HDR - Layer 4 header: UDP.
- MSG - our message ("Hello, habr!").
- PAD - padding. Present in the packet if the frame without FCS would otherwise be shorter than 60 bytes.
- FCS - the checksum of the packet. It can be used to determine whether the packet was corrupted in transit.
- TERM - signals the end of the packet transmission.
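The PAD and FCS fields from the legend can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the header contents are zero-filled placeholders of typical sizes (14-byte Ethernet header without VLAN, 20-byte IPv4 header without options, 8-byte UDP header), not the real bytes from the capture, and the function names are mine. The Ethernet FCS is the standard CRC-32, which `zlib.crc32` computes; it is appended least significant byte first.

```python
import zlib

MIN_FRAME_NO_FCS = 60  # 64-byte minimum Ethernet frame minus 4 bytes of FCS


def finish_frame(frame_no_fcs: bytes) -> bytes:
    """Pad a frame to the minimum length and append its FCS."""
    pad = b"\x00" * max(0, MIN_FRAME_NO_FCS - len(frame_no_fcs))
    body = frame_no_fcs + pad
    fcs = zlib.crc32(body).to_bytes(4, "little")
    return body + fcs


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC over everything but the last 4 bytes and compare."""
    return zlib.crc32(frame[:-4]) == int.from_bytes(frame[-4:], "little")


# Placeholder headers: right sizes, NOT real header values
l2_hdr, l3_hdr, l4_hdr = bytes(14), bytes(20), bytes(8)
msg = b"Hello, habr!"

frame = finish_frame(l2_hdr + l3_hdr + l4_hdr + msg)
print(len(frame), fcs_ok(frame))  # 64 True
```

With these sizes the headers and message take 54 bytes, so 6 bytes of padding are added before the 4-byte FCS, which matches the presence of PAD for such a short message.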
As you can see, little remains to be done to receive the Ethernet packet: find its beginning and end (by the control characters) and cut out the excess: IDLE, PREAMBLE and TERM.
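That last step can be modeled in software roughly as follows. This is a simplified sketch, not our MAC core: it treats XGMII as a flat stream of (control bit, data byte) pairs per byte lane and ignores lane alignment rules. The control character codes (Idle 0x07, Start 0xFB, Terminate 0xFD) and the preamble format are those defined for XGMII; the function name and data representation are assumptions made for illustration.

```python
IDLE, START, TERMINATE = 0x07, 0xFB, 0xFD
# Start replaces the first preamble byte; six 0x55 bytes and the SFD follow
PREAMBLE_REST = b"\x55" * 6 + b"\xd5"


def extract_frames(lanes):
    """lanes: iterable of (ctrl, data) pairs, one per byte lane, in wire order.
    Returns the frames (starting at the L2 header, FCS included)."""
    frames, current = [], None
    for ctrl, data in lanes:
        if ctrl and data == START:
            current = bytearray()           # a new packet begins
        elif ctrl and data == TERMINATE:
            if current is not None and current[:7] == PREAMBLE_REST:
                frames.append(bytes(current[7:]))
            current = None
        elif ctrl:
            current = None                  # IDLE, or an error code (0xFE) mid-packet: drop
        elif current is not None:
            current.append(data)            # ordinary data byte
    return frames


payload = b"frame bytes incl. FCS"
lanes = (
    [(1, IDLE)] * 4
    + [(1, START)]
    + [(0, b) for b in PREAMBLE_REST + payload]
    + [(1, TERMINATE)]
    + [(1, IDLE)] * 3
)
print(extract_frames(lanes) == [payload])  # True
```

In hardware the same logic works on 4 or 8 lanes per clock, so the core also has to handle a Terminate character landing in any lane of the word.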
Thank you for your time and attention! If you have any questions, don't hesitate to ask.
P.S. I thank my colleagues des333 and paulig for their constructive criticism and advice.