
Drives, Controllers, OS, and Advanced Format

What is Advanced Format?
Advanced Format is a newer sector layout used in some hard drives: instead of the traditional 512-byte sector, a 4096-byte sector is used. Some SCSI / SAS / FC drives use "fat" sectors of 520 or 528 bytes for additional data integrity control, but that is outside the scope of this article.
The eightfold increase in sector size is meant to place data on modern disks more efficiently. The overhead of the 512-byte layout had started to hamper further growth in HDD capacity. In addition to service fields, each 512-byte sector carries an error correction code (ECC) field 50 bytes long. In a 4096-byte sector the ECC field is 100 bytes long. Overall storage efficiency improves by about 10%.
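The ECC figures above can be turned into a quick back-of-the-envelope calculation. The sketch below uses only the two ECC sizes quoted in the text. It ignores the other per-sector service fields (their sizes are not given here), so it accounts for only part of the roughly 10% overall gain. The function names are illustrative.

```python
# ECC sizes quoted in the text; everything else is illustrative.
ECC_BYTES_512 = 50    # ECC field of a 512-byte sector
ECC_BYTES_4K = 100    # ECC field of a 4096-byte sector


def ecc_bytes_per_4k(sector_bytes: int, ecc_bytes: int) -> int:
    """ECC bytes spent to protect 4096 bytes of user data."""
    sectors_needed = 4096 // sector_bytes
    return sectors_needed * ecc_bytes


def ecc_only_efficiency(sector_bytes: int, ecc_bytes: int) -> float:
    """User data as a fraction of (user data + ECC).

    Gaps, sync and address marks are deliberately left out,
    so this is an upper bound on the real efficiency.
    """
    return sector_bytes / (sector_bytes + ecc_bytes)


if __name__ == "__main__":
    print(ecc_bytes_per_4k(512, ECC_BYTES_512))    # eight small sectors
    print(ecc_bytes_per_4k(4096, ECC_BYTES_4K))    # one large sector
    print(round(ecc_only_efficiency(512, ECC_BYTES_512), 3))
    print(round(ecc_only_efficiency(4096, ECC_BYTES_4K), 3))
```

Protecting 4 KiB of data takes 400 bytes of ECC with the old layout and 100 bytes with the new one.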

Naturally, non-standard sectors require support from disk controllers and operating systems. To address compatibility issues, an additional 512E format was introduced: it denotes disks with a physical sector size of 4096 bytes that emulate the regular 512-byte sector. Advanced Format disks without emulation are designated 4KN. Thus there are now three layout options:
Format | Logical sector size | Physical sector size |
512N | 512 bytes | 512 bytes |
512E | 512 bytes | 4096 bytes (4KiB) |
4KN | 4096 bytes (4KiB) | 4096 bytes (4KiB) |
Compatibility
Operating Systems
At first glance, emulating the 512-byte sector seems to remove all compatibility issues, but this is not so. First, a performance problem arises immediately. What happens when a 512-byte block is written to a disk with 4096-byte sectors (even one that emulates 512-byte sectors)? The classic read-modify-write cycle takes place: instead of one operation, the drive needs two. It reads the 4096-byte sector, changes 512 bytes in it (the block being written) and writes the 4096 bytes back. A similar problem appears when data is not aligned: the block being written may be large, even a multiple of 4096 bytes, but it is shifted relative to the boundaries of the real sectors.
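The read-modify-write condition can be expressed as simple arithmetic on byte offsets. The following is a minimal sketch, not a model of any particular drive's firmware: it counts how many physical sectors a write covers only partially, since each of those must be read, patched and written back. The function name and defaults are assumptions made for illustration.

```python
PHYSICAL_SECTOR = 4096  # bytes, as on a 512E drive


def partial_sectors(offset: int, length: int,
                    physical: int = PHYSICAL_SECTOR) -> int:
    """Count physical sectors that a write covers only partially.

    Each such sector forces a read-modify-write cycle.
    `offset` and `length` are in bytes.
    """
    if length <= 0:
        return 0
    end = offset + length
    first = offset // physical
    last = (end - 1) // physical
    head_partial = offset % physical != 0
    tail_partial = end % physical != 0
    if first == last:
        # The whole write fits inside one physical sector.
        return 1 if (head_partial or tail_partial) else 0
    return int(head_partial) + int(tail_partial)


if __name__ == "__main__":
    print(partial_sectors(0, 512))            # small write
    print(partial_sectors(0, 4096))           # exactly one sector
    print(partial_sectors(63 * 512, 8192))    # large but shifted write
```

A 512-byte write touches one sector partially, an aligned 4096-byte write touches none, and an 8 KiB write starting at sector 63 touches two (its head and its tail).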

Today, writes in blocks smaller than 4096 bytes are extremely rare, but the alignment problem remains. For example, older versions of Windows (prior to Windows Server 2008) create the boot partition at an offset of 63 sectors during installation. This dates back to the time when the BIOS used real disk geometry instead of LBA. An offset of 63 x 512 bytes is not divisible by 4096, which misaligns all subsequent partitions and reduces performance. The problem was first noticed with RAID controllers, where partitions need to be aligned to stripe boundaries, and it was solved in Windows Vista / Windows Server 2008 (and at about the same time in other operating systems) by aligning partitions to 1024 KiB (1 MiB) boundaries.
Why exactly 1 MiB, if a smaller offset would do (as long as it is divisible by 4096 bytes)? Simply because a margin is needed: besides a physical disk, the block device may be a volume on a RAID controller (with a default stripe size of, for example, 256 KiB on Adaptec), an SSD (with a large page size), or a disk image under virtualization. In addition, the recommended NTFS cluster size for SQL or Exchange is 64 KiB, and so on.
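The divisibility argument is easy to check numerically. This sketch tests a partition offset against the sizes mentioned in the text (4 KiB sector, 64 KiB NTFS cluster, 256 KiB stripe); the dictionary and function names are illustrative only.

```python
KIB = 1024

# Sizes taken from the text above.
GOOD_SIZES = {
    "4 KiB physical sector": 4 * KIB,
    "64 KiB NTFS cluster": 64 * KIB,
    "256 KiB RAID stripe": 256 * KIB,
}


def misaligned_for(offset_bytes: int) -> list:
    """Return the names of all sizes the offset is NOT a multiple of."""
    return [name for name, size in GOOD_SIZES.items()
            if offset_bytes % size != 0]


if __name__ == "__main__":
    print(misaligned_for(63 * 512))     # legacy 63-sector offset
    print(misaligned_for(64 * KIB))     # a smaller "aligned" offset
    print(misaligned_for(2048 * 512))   # 1 MiB offset
```

The 63-sector offset fails every check, a 64 KiB offset still misses the 256 KiB stripe, and the 1 MiB offset passes all three.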
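The second problem is potential data loss in synchronous write scenarios. When a block smaller than 4096 bytes, or an unaligned block, is written, the guarantees of a synchronous write do not actually hold: during read-modify-write the drive rewrites a whole physical sector, so a failure at that moment can damage neighboring data that was already considered safely written. What remains is to "teach" the OS not to write blocks smaller than 4096 bytes to 512E disks, but there are certain problems with this.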
Microsoft
For Microsoft operating systems, the following official data is available:
Format | Logical sector size | Physical sector size | Compatible OS |
512E | 512 bytes | 4096 bytes (4KiB) | |
4KN | 4096 bytes (4KiB) | 4096 bytes (4KiB) | |
In Windows, you can check the alignment of existing partitions and set the offset for new partitions using diskpart. Example (a partition on disk 0 with an offset of 1024 KiB, or 2048 512-byte sectors):
select disk 0
create partition primary align=1024
The easiest way to check is through WMI (example):
wmic partition get BlockSize, StartingOffset, Name
BlockSize  Name                   StartingOffset
512        Disk #0, Partition #0  1048576
512        Disk #0, Partition #1  368050176
512        Disk #2, Partition #0  135266304
512        Disk #1, Partition #0  1048576
The StartingOffset value should be 1024 KiB for the first partition and a multiple of 1024 KiB for the rest. That guarantees it is also a multiple of 4096 bytes and of all the other "good numbers" (stripe sizes and NTFS cluster sizes).
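The check described above can be automated. The sketch below parses text in the three-column layout of the sample output (BlockSize, Name, StartingOffset) and reports partitions whose offset is not a multiple of 1 MiB. The parsing rules are an assumption based on that sample; real wmic output may contain extra blank lines or different column order.

```python
MIB = 1024 * 1024

# Sample data copied from the article's wmic example.
SAMPLE = """\
BlockSize  Name                   StartingOffset
512        Disk #0, Partition #0  1048576
512        Disk #0, Partition #1  368050176
512        Disk #2, Partition #0  135266304
512        Disk #1, Partition #0  1048576
"""


def parse_partitions(text: str) -> list:
    """Return (name, starting_offset) pairs from wmic-style output."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and anything without a numeric last column.
        if len(parts) < 3 or not parts[-1].isdigit():
            continue
        name = " ".join(parts[1:-1])
        rows.append((name, int(parts[-1])))
    return rows


def misaligned(rows: list, boundary: int = MIB) -> list:
    """Names of partitions whose offset is not a multiple of boundary."""
    return [name for name, offset in rows if offset % boundary != 0]


if __name__ == "__main__":
    rows = parse_partitions(SAMPLE)
    for name, offset in rows:
        print(name, offset // MIB, "MiB")
    print("misaligned:", misaligned(rows))
```

All four sample partitions start on a 1 MiB boundary, so the list of misaligned partitions is empty.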
As a reminder, modern Windows already uses the 1024 KiB offset by default, so you only need to check or set it manually for operating systems from the "63-sector" era. When a GPT layout is created automatically (via Disk Management) on a 512N or 512E drive, you will see an offset of 17 KiB for the first partition. This is no cause for alarm: it is the MSR service partition. The first regular partition is created at an offset of 135266304 bytes (129 MiB), which divides evenly by any of our "good numbers".
Linux
Compatibility table for Linux (only common server distributions are listed):
Format | Logical sector size | Physical sector size | Compatible OS |
512E | 512 bytes | 4096 bytes (4KiB) | |
4KN | 4096 bytes (4KiB) | 4096 bytes (4KiB) | |
You can see the physical and logical block sizes in /sys/block/sdX/queue/physical_block_size and /sys/block/sdX/queue/logical_block_size respectively.
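Reading these two sysfs files is enough to tell the three formats apart. The sketch below is one possible way to do it; the function names and the `sys_block` parameter (which makes the code testable against a fake directory) are assumptions for illustration.

```python
from pathlib import Path


def classify(logical: int, physical: int) -> str:
    """Map a (logical, physical) sector size pair to a format name."""
    if logical == 512 and physical == 512:
        return "512N"
    if logical == 512 and physical == 4096:
        return "512E"
    if logical == 4096 and physical == 4096:
        return "4KN"
    return "unknown"


def sector_format(device: str, sys_block: str = "/sys/block") -> str:
    """Read the block sizes of e.g. 'sda' from sysfs and classify them."""
    queue = Path(sys_block) / device / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return classify(logical, physical)


if __name__ == "__main__":
    # Works only on Linux; lists every block device the kernel exposes.
    root = Path("/sys/block")
    if root.is_dir():
        for dev in sorted(p.name for p in root.iterdir()):
            try:
                print(dev, sector_format(dev))
            except (OSError, ValueError):
                print(dev, "unreadable")
```

Keep in mind that a RAID controller or storage system may report values that differ from those of the underlying drives.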
GNU Fdisk automatically uses a 1 MiB offset when started with the -c and -u switches (disable DOS compatibility mode and use sectors as the unit). Regular Fdisk cannot work with GPT, so it is useless for disks larger than 2 TiB; use Parted or GPT Fdisk instead. The latter uses the desired offset of 2048 sectors by default for 512N / 512E disks:
Disk /dev/sde: 7814037168 sectors, 3.6 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): BE7D7D71-F6ED-4371-ACFE-B04819A4DDC2
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 7814037134
Partitions will be aligned on 2048-sector boundaries
Total free space is 7814037101 sectors (3.6 TiB)
Example for GNU Parted (for 512N / 512E drives):
# create a new GPT label
(parted) mklabel gpt
# create a partition spanning all free space, starting at sector 2048
(parted) mkpart part1 2048s 100%
(parted) print
Model: ATA WDC WD40EFRX-68W (scsi)
Disk /dev/sde: 7814037168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Number  Start  End          Size         File system  Name   Flags
 1      2048s  7814035455s  7814033408s               part1
Everything is fine in LVM: the default offset is 1 MiB and the PE (physical extent) size is a multiple of 1 MiB.
# check the offset
# pvs /dev/sde -o +pe_start
  PV        VG      Fmt   Attr  PSize  PFree  1st PE
  /dev/sde  VolRed  lvm2  a--   3.64t  3.64t  1.00m
# check the PE size
# pvdisplay /dev/sde
  --- Physical volume ---
  PV Name       /dev/sde
  VG Name       VolRed
  PV Size       3.64 TiB / not usable 3.84 MiB
  Allocatable   yes
  PE Size       4.00 MiB
  Total PE      953861
  Free PE       953861
  Allocated PE  0
  PV UUID       9AfJr9-OOtC-PB34-dUnq-kCDK-L1fN-aTAxus
VMware
The VMware Knowledge Base article states that neither 512E nor 4KN drives are supported. Support for 4KN drives is announced for vSphere 6.0.
VMFS-5 brought a single block size of 1 MiB and a correct 1 MiB offset for the first partition. Previously a 64 KiB offset was used, which was not always suitable. None of this changes VMware's statement that 512E disks are not supported. Apparently this is because the VMDK format stores data with a granularity of 512 bytes.
Other OS
Mac OS X has supported Advanced Format since Tiger. There are also FreeBSD and the other *BSDs, Oracle Solaris and many other operating systems, but a detailed discussion of Advanced Format disks in them is beyond the scope of this article.
Microsoft Services
Hyper-V
Although 512E disks are supported in Windows Server 2008 and 2008 R2 (see the table for the required KB updates), a problem arises in Hyper-V: the VHD virtual disk file format uses 512-byte structures for dynamic ("thin") and differencing VHDs, which naturally leads to regular read-modify-write. The situation is aggravated by the fact that the guest OS sees the virtual disk as having 512-byte physical sectors. Use fixed VHDs, and if possible do not host VHD files on 512E drives at all.
Windows Server 2012 introduced the VHDX format, which does not have the problems described above (a VHDX can be created in any format: 512N, 512E or 4KN).
Exchange Server
There are features related to replication in DAG:
- All disks used in an Exchange Database Availability Group (DAG) for storing databases and logs must have the same physical sector size.
- 4KN disks are not supported.
- 512E drives are supported since Exchange 2010 Service Pack 2.
SQL Server
The situation is the same as for Exchange Server: in fault-tolerant configurations, the disks holding databases and logs must have the same physical sector size on all nodes.
With Storage Spaces an interesting situation arises: the physical sector size presented is 4 KiB regardless of which disks the pool is built from. (Storage Spaces can combine 512N and 512E disks; these of course cannot be mixed with 4KN disks, except when tiering with SSDs.) A VHDX virtual disk is created as 512E by default. You can verify this by running fsutil fsinfo ntfsinfo <drive name>:
Bytes Per Sector: 512
Bytes Per Physical Sector: 4096
When using VHDX on a Storage Spaces volume (or hardware RAID) built from 4KN disks, it is advisable to make the VHDX itself 4KN as well:
New-VHD -Path D:\image4kn.vhdx -Fixed -SizeBytes 500GB -LogicalSectorSizeBytes 4096 -PhysicalSectorSizeBytes 4096
Is this safe for SQL Server and other applications that rely on synchronous writes? Yes: the coarser storage granularity does not compromise data integrity, and performance is not affected either, since 4096 is a multiple of 512.
Services using ESENT
A problem in Windows Server 2008 that is not very pressing. Services that use the Extensible Storage Engine API (AD, WINS, DHCP) may crash when the physical sector size changes (for example, when migrating from a 512N drive to a 512E one). For a detailed description and a hotfix, see the Microsoft article on ESENT listed at the end of this article.
Other software
Obviously, software designed to manage partitions (cloning, moving, resizing) and to automate backups must take into account the features of working with Advanced Format disks. Here's how it goes:
- Acronis Products .
- Symantec Backup Exec supports Advanced Format disks (512E and 4KN) starting with version 2012 revision 1798 Service Pack 2. Earlier releases may work with 512E disks, but Symantec claims this combination is not officially supported.
- Symantec Norton Ghost does not support 4KN drives.
Controllers
Universal rules for all controllers:
- 4KN and 512N / 512E drives cannot be mixed in the same array.
- On Adaptec and LSI controllers, metadata is located at the end of the disk and user space starts at LBA 0. This means there are no alignment problems with 512E drives.
- An array of 4KN disks will also have a physical / logical sector size of 4 KiB, so GPT and UEFI are needed to boot from it.
- Do not forget to update management utilities and drivers along with the firmware.
- How is a LUN created on 512E disks presented: as 512N or as 512E? From what we were able to verify, the LSI 9260, Adaptec series 6 controllers and Infortrend ESDS storage systems report 512N (logical / physical blocks of 512 bytes), i.e. the problem with synchronous writes remains. Be sure to use a write-back cache (protected, of course) and a UPS. Moreover, after a firmware change the storage system or controller may suddenly start behaving "correctly", and LUNs will turn into 512E with all the ensuing compatibility consequences.
Adaptec by PMC
- SAS HBA series 5 and 6: support 512E, do not support 4KN
- SAS HBA series 6H and 7H: support 512E, 4KN - starting with firmware 10467.
- RAID controllers of series 7 and 8: support 512E, 4KN - starting with firmware 30862.
Compatibility Lists for Adaptec controllers .
LSI / Avago
Controllers based on LSI chips are used by Dell, IBM, Lenovo, Fujitsu, Intel, and Supermicro. LSI models can be matched to their OEM counterparts by the chip they use.
- Older LSI1078-based controllers: do not support Advanced Format drives at all.
- LSI 3ware 9750 series based on LSI2108 and earlier 3ware: do not support Advanced Format drives at all.
- LSISAS2108 (LSI 9260/61/80): support 512E since MR4.8 firmware, do not support 4KN. Compatibility list (4KN drives are listed, but apparently relate to the LSI 2208, see below).
- LSISAS2208 (LSI 9265/66/71/85/86): support 512E since MR5.5 firmware, support 4KN since MR5.8 firmware. Compatibility list.
- LSISAS3108 (LSI 9361/80): support 512E and 4KN. Compatibility list.
- SAS HBAs based on LSISAS2008 and LSISAS2308 (LSI 9211/9200/9207): support 512E and 4KN. Compatibility list.
- SAS HBAs based on LSISAS3008 (LSI 9311/9300): support 512E and 4KN. Compatibility list.
- RAID controllers based on LSISAS2008 (LSI 9240, iMR firmware): support 512E, do not support 4KN. Compatibility list.
- RAID controllers based on LSISAS3008 (LSI 9340, iMR firmware): support 512E, do not support 4KN. Compatibility list.
Update as of March 25, 2015: the latest LSI MegaRAID RAID controllers based on the 3108 have a poorly documented volume (VD) property called Emulation Type. In the controller BIOS the possible values are Default, Disabled, and Forced. You can also switch it through MSM or StorCLI:
storcli /cx/vx set emulationType=0|1|2
This property determines the block sizes presented to the host:
Default (0): if the volume contains 512E disks, it is presented as 512E. If all the disks are 512N, the volume is presented as 512N.
Disabled (1): the volume is always presented as 512N, even if 512E disks are present.
Forced (2): the volume is always presented as 512E, even if no 512E disks are present.
Emulation Type was ported to SAS2 controllers (LSI 2108/2208), but without the Forced (2) value.
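The three Emulation Type values amount to a small decision table. The sketch below restates the rules from the text in code; it is not derived from controller firmware, it only covers volumes built from 512N and 512E disks, and the function and parameter names are hypothetical.

```python
DEFAULT, DISABLED, FORCED = 0, 1, 2


def presented_format(emulation_type: int, member_formats,
                     supports_forced: bool = True) -> str:
    """Return the format ("512N" or "512E") a volume presents to the host.

    member_formats: iterable of "512N" / "512E" strings, one per disk.
    supports_forced: set to False to model SAS2 controllers
                     (LSI 2108/2208), which lack the Forced value.
    """
    if emulation_type == DISABLED:
        return "512N"
    if emulation_type == FORCED:
        if not supports_forced:
            raise ValueError("Forced (2) is not available on this controller")
        return "512E"
    if emulation_type == DEFAULT:
        return "512E" if "512E" in member_formats else "512N"
    raise ValueError("unknown emulation type: %r" % (emulation_type,))


if __name__ == "__main__":
    mixed = ["512N", "512E", "512N"]
    for mode in (DEFAULT, DISABLED, FORCED):
        print(mode, presented_format(mode, mixed))
```

With a mixed volume, Default and Forced present 512E while Disabled presents 512N.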
Software RAID on Intel Chipsets (RST / RSTe)
4KN is not supported at all; Intel RST with 512E drives requires up-to-date drivers.
Advanced Format in enterprise-class drives. What awaits us?
This section covers only enterprise-class drives of the latest series. Desktop HDDs and drives positioned for NAS or video surveillance are not included.
Vendor | Series | Form factor | Interfaces | Spindle rotation speed, rpm | Notes | |||
Seagate | Enterprise Performance 10K HDD (10k.8) | 2.5" | SAS | 10,000 | Y | Y | Y | 512N capacity is limited to 600/1200GB |
Seagate | Enterprise Performance 15K HDD (15k.5) | 2.5" | SAS | 15,000 | Y | Y | Y | 32GB internal SSD cache |
Seagate | Enterprise Capacity 2.5 HDD (V.3) | 2.5" | SAS, SATA | 7200 | Y | Y | ||
Seagate | Enterprise Capacity 3.5 HDD (V.4) | 3.5" | SAS, SATA | 7200 | Y | Y | ||
Seagate | Archive HDD | 3.5" | SATA | 7200 | Y | Positioned for archival use, lower MTBF and worse BER | ||
Seagate | Terascale HDD | 3.5" | SATA | 5900/7200 | Y | Positioned for cloud use, lower MTBF and worse BER | ||
HGST | Ultrastar C10K1800 | 2.5" | SAS | 10,000 | Y | Y | Y | 512N capacity is limited to 300/600/900/1200GB |
HGST | Ultrastar C15K600 | 2.5" | SAS | 15,000 | Y | Y | Y | |
HGST | Ultrastar C7K1000 | 2.5" | SAS | 7200 | Y | |||
HGST | Ultrastar He 8 | 3.5" | SAS, SATA | 7200 | Y | Y | ||
HGST | Ultrastar He 6 | 3.5" | SAS, SATA | 7200 | Y | |||
HGST | Ultrastar 7K6000 | 3.5" | SAS, SATA | 7200 | Y | Y | ||
HGST | MegaScale DC 4000.B | 3.5" | SATA | 5400 | Y | Positioned for cloud use, lower MTBF and worse BER | ||
WD | Xe | 2.5"/3.5" | SAS | 10,000 | Y | |||
WD | Re | 3.5" | SATA | 7200 | Y | |||
WD | Se | 3.5" | SATA | 7200 | Y | Positioned for cloud use, lower MTBF and worse BER | ||
WD | Ae | 3.5" | SATA | 5760 | Y | ? | Positioned for archival use, lower MTBF and worse BER | |
Toshiba | AL13SE | 2.5" | SAS | 10,000 | Y | |||
Toshiba | AL13SX | 2.5" | SAS | 15,000 | Y | |||
Toshiba | AL13SEL | 3.5" | SAS | 10,000 | Y | |||
Toshiba | MG03ACA / MG03SCA | 3.5" | SAS, SATA | 7200 | Y | |||
Toshiba | MG04ACA | 3.5" | SATA | 7200 | Y | Y | ||
Toshiba | MG04SCA | 3.5" | SAS | 7200 | Y | Y | ||
Toshiba | MC04ACA | 3.5" | SATA | 7200 | Y | Positioned for cloud use, lower MTBF and worse BER |
SSD
SSDs have their own peculiarities. Data is read and written in pages of 2, 4, 8 or 16 KiB, depending on the SSD architecture. Before writing, however, the cells must be erased, and erasure is performed not per page but in blocks of several hundred pages. For example, the Samsung 840 EVO has 2 MiB blocks, each consisting of 256 pages of 8 KiB. In this case, of course, any block size presented to the host, whether 512 or 4096 bytes, is an abstraction.
Some modern SAS / SATA SSDs present themselves as 512E drives, but most, for compatibility reasons, present themselves as 512N. No special measures are required because of this, since in an enterprise-class SSD the contents of the cache are always protected against power loss. It is enough to ensure alignment to the page size.
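The page and block arithmetic can be checked with a few lines of code. This is a simplified sketch using the Samsung 840 EVO figures quoted above; a real SSD remaps pages in its flash translation layer, so the result only illustrates why alignment to the page size matters. The function names are illustrative.

```python
KIB = 1024
MIB = 1024 * KIB

PAGE_BYTES = 8 * KIB       # page size quoted for the Samsung 840 EVO
PAGES_PER_BLOCK = 256      # pages per erase block, same source


def erase_block_bytes(page_bytes: int = PAGE_BYTES,
                      pages: int = PAGES_PER_BLOCK) -> int:
    """Size of one erase block in bytes."""
    return page_bytes * pages


def pages_touched(offset: int, length: int,
                  page_bytes: int = PAGE_BYTES) -> int:
    """Number of flash pages spanned by a host write (offsets in bytes)."""
    if length <= 0:
        return 0
    first = offset // page_bytes
    last = (offset + length - 1) // page_bytes
    return last - first + 1


if __name__ == "__main__":
    print(erase_block_bytes() // MIB, "MiB per erase block")
    print(pages_touched(MIB, 8 * KIB))        # aligned 8 KiB write
    print(pages_touched(63 * 512, 8 * KIB))   # same write, 63-sector offset
```

An aligned 8 KiB write fits in one page, while the same write at the legacy 63-sector offset spans two.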
Some PCI-E SSDs, for example those made by Fusion IO, let you change the logical sector size at format time using proprietary utilities, i.e. switch between 512E and 4KN modes. This is also possible for some SAS SSDs: the Seagate 1200, for example, supports changing the sector size with the regular sg_format. Switching to 4 KiB sectors can significantly improve performance in some scenarios.
Conclusions
- 512E drives are not suitable for servers running legacy operating systems that ignore the physical sector size. On desktops this is not a big deal, since synchronous writes are rarely used there.
- Study your infrastructure carefully: operating systems, services in use, controllers, storage systems, and the caching modes on controllers and storage systems. If there are potential problems with performance and/or data integrity, take the necessary measures.
- Problems with legacy operating systems can be worked around with virtualization, but you still need to pay attention to partition alignment.
Links
- IBM DeveloperWorks: Linux on 4 KB sector disks: Practical advice
- RHEL6 documentation: I/O limits and I/O hints in Linux
- Alignment in fdisk, LVM and MD
- Support for large-sector disks in Hyper-V
- Exchange Server 2010 database availability groups and sector size
- Using Hyper-V with large-sector disks on Windows Server 2008 and 2008 R2
- SQL Server and new drives with a 4K sector size
- The "Windows Setup could not configure Windows on this computer's hardware" error when installing Windows 7 or Windows Server 2008 R2
- Issues with Extensible Storage Engine API (ESENT) applications when resizing a physical sector
- Understanding 4KB Sector Support for Oracle Files
- Effect from innodb log block size 4096 bytes