
A free replacement for the VMware vSphere Storage Appliance, based on DRBD

In short, the idea is to build a fault-tolerant virtual infrastructure without any external storage. Two or three virtual machines are deployed (one per host); they replicate the free space of the ESXi servers' local disk subsystems and present it back to those same ESXi hosts as shared storage. The vSphere Storage Appliance is described in detail (in Russian) here.
An interesting idea, but the price stings: around $6K. Besides, what about performance? Won't the disk array take a noticeable hit in speed? Looking at the problem from another angle, there are plenty of other ways to organize external storage. For example, you can build it from almost any hardware with the required number of disks and software such as Openfiler, FreeNAS, Nexenta or Open-E; all of these products support replication between systems.
Many companies take this approach when they cannot afford an expensive storage system from a well-known vendor that would provide sufficient performance and reliability. Such systems are typically equipped with dual controllers, redundant power supplies, high-speed disks and so on...
But let's get back to the beginning and look at the scheme VMware proposes:

What do we see? Three ESXi hosts with virtual machines deployed on them, one appliance VM per host. The machines are clustered and present the hosts' internal drives as external shared storage.
The idea of putting together a similar solution from readily available tools had been in the air for a long time, but there was never a good enough reason to try it. Then VMware itself provided the impetus to try everything out in a test environment.
There are plenty of solutions for building fault-tolerant storage, for example based on Openfiler + DRBD + Heartbeat. But at the heart of all of them lies the idea of building an external storage system. Why not try to do something similar, but based on virtual machines?
As a foundation, let's take two virtual machines running Ubuntu, the Ubuntu documentation on building a failover iSCSI target, and try to build our own appliance.
Partitioning of the disks on both cluster nodes (the size of sdd1 is chosen as an example; in practice it takes all the remaining free space on the local storage of the ESXi host):
/dev/sda1 - 10 GB / (primary, ext3, Bootable flag: on)
/dev/sda5 - 1 GB swap (logical)
/dev/sdb1 - 1 GB (primary) DRBD metadata. Not mounted.
/dev/sdc1 - 1 GB (primary) DRBD disk used to store the iSCSI configuration files. Not mounted.
/dev/sdd1 - 50 GB (primary) DRBD disk for the iSCSI target.

iSCSI network:
iSCSI server1: node1.demo.local IP address: 10.11.55.55
iSCSI server2: node2.demo.local IP address: 10.11.55.56
iSCSI Virtual IP address 10.11.55.50

Private network:
iSCSI server1: node1-private IP address: 192.168.22.11
iSCSI server2: node2-private IP address: 192.168.22.12

/etc/network/interfaces for node1:
auto eth0
iface eth0 inet static
address 10.11.55.55
netmask 255.0.0.0
gateway 10.0.0.1
auto eth1
iface eth1 inet static
address 192.168.22.11
netmask 255.255.255.0

For node2:
auto eth0
iface eth0 inet static
address 10.11.55.56
netmask 255.0.0.0
gateway 10.0.0.1
auto eth1
iface eth1 inet static
address 192.168.22.12
netmask 255.255.255.0

The /etc/hosts file on both nodes:
127.0.0.1 localhost
10.11.55.55 node1.demo.local node1
10.11.55.56 node2.demo.local node2
192.168.22.11 node1-private
192.168.22.12 node2-private

Install the packages:
apt-get -y install ntp ssh drbd8-utils heartbeat jfsutils

Reboot the servers, then change file ownership and permissions:
chgrp haclient /sbin/drbdsetup
chmod o-x /sbin/drbdsetup
chmod u+s /sbin/drbdsetup
chgrp haclient /sbin/drbdmeta
chmod o-x /sbin/drbdmeta
chmod u+s /sbin/drbdmeta

The configuration is described in /etc/drbd.conf. We define two resources: 1) a DRBD device that will hold the iSCSI configuration files; 2) a DRBD device that will become our iSCSI target. On node1:
/etc/drbd.conf:
resource iscsi.config {
protocol C;
handlers {
pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
}
startup {
degr-wfc-timeout 120;
}
disk {
on-io-error detach;
}
net {
cram-hmac-alg sha1;
shared-secret "password";
after-sb-0pri disconnect;
after-sb-1pri disconnect;
after-sb-2pri disconnect;
rr-conflict disconnect;
}
syncer {
rate 100M;
verify-alg sha1;
al-extents 257;
}
on node1 {
device /dev/drbd0;
disk /dev/sdc1;
address 192.168.22.11:7788;
meta-disk /dev/sdb1[0];
}
on node2 {
device /dev/drbd0;
disk /dev/sdc1;
address 192.168.22.12:7788;
meta-disk /dev/sdb1[0];
}
}
resource iscsi.target.0 {
protocol C;
handlers {
pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
}
startup {
degr-wfc-timeout 120;
}
disk {
on-io-error detach;
}
net {
cram-hmac-alg sha1;
shared-secret "password";
after-sb-0pri disconnect;
after-sb-1pri disconnect;
after-sb-2pri disconnect;
rr-conflict disconnect;
}
syncer {
rate 100M;
verify-alg sha1;
al-extents 257;
}
on node1 {
device /dev/drbd1;
disk /dev/sdd1;
address 192.168.22.11:7789;
meta-disk /dev/sdb1[1];
}
on node2 {
device /dev/drbd1;
disk /dev/sdd1;
address 192.168.22.12:7789;
meta-disk /dev/sdb1[1];
}
}

Copy the configuration to the second node:
scp /etc/drbd.conf root@10.11.55.56:/etc/
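Before going further, it does no harm to let drbdadm parse the configuration on both nodes. This quick sanity check is not part of the original instructions, just a suggestion:
[node1]drbdadm dump iscsi.config
[node1]drbdadm dump iscsi.target.0
[node2]drbdadm dump iscsi.config
[node2]drbdadm dump iscsi.target.0
If drbdadm prints the parsed resource definitions without complaints, the syntax is fine.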
Initialize the disks and create the DRBD metadata on both servers:
[node1]dd if=/dev/zero of=/dev/sdc1
[node1]dd if=/dev/zero of=/dev/sdd1
[node1]drbdadm create-md iscsi.config
[node1]drbdadm create-md iscsi.target.0
[node2]dd if=/dev/zero of=/dev/sdc1
[node2]dd if=/dev/zero of=/dev/sdd1
[node2]drbdadm create-md iscsi.config
[node2]drbdadm create-md iscsi.target.0

Start DRBD:
[node1]/etc/init.d/drbd start
[node2]/etc/init.d/drbd start

Now decide which server will act as primary and which as secondary for the initial synchronization. Let's say node1 is the primary. Run on the first node:
[node1]drbdadm -- --overwrite-data-of-peer primary iscsi.config
cat /proc/drbd:
version: 8.3.9 (api:88/proto:86-95)
srcversion: CF228D42875CF3A43F2945A
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:1048542 nr:0 dw:0 dr:1048747 al:0 bm:64 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:52428768

Create a file system on /dev/drbd0 and mount it:
[node1]mkfs.ext3 /dev/drbd0
[node1]mkdir -p /srv/data
[node1]mount /dev/drbd0 /srv/data

Create a test file on the first node, then switch the second node to Primary mode. On node1:
[node1]dd if=/dev/zero of=/srv/data/test.zeros bs=1M count=100
[node1]umount /srv/data
[node1]drbdadm secondary iscsi.config

On node2:
[node2]mkdir -p /srv/data
[node2]drbdadm primary iscsi.config
[node2]mount /dev/drbd0 /srv/data
ls -l /srv/data

The 100 MB file is visible on the second node. Delete it and switch back to the first node. On node2:
[node2]rm /srv/data/test.zeros
[node2]umount /srv/data
[node2]drbdadm secondary iscsi.config

On node1:
[node1]drbdadm primary iscsi.config
[node1]mount /dev/drbd0 /srv/data

Run ls /srv/data: if the test file is gone, the deletion has been replicated and everything works.
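As an extra check (not part of the original walkthrough), the resource state can also be queried with the standard drbdadm subcommands instead of reading /proc/drbd:
[node1]drbdadm role iscsi.config     # expected: Primary/Secondary
[node1]drbdadm cstate iscsi.config   # expected: Connected
[node1]drbdadm dstate iscsi.config   # expected: UpToDate/UpToDate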
We now proceed to setting up the iSCSI target. Make the first node Primary for the data resource and start the synchronization:
[node1]drbdadm -- --overwrite-data-of-peer primary iscsi.target.0
cat /proc/drbd
version: 8.3.9 (api:88/proto:86-95)
srcversion: CF228D42875CF3A43F2945A
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:135933 nr:96 dw:136029 dr:834 al:39 bm:8 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:1012864 nr:0 dw:0 dr:1021261 al:0 bm:61 lo:1 pe:4 ua:64 ap:0 ep:1 wo:f oos:51416288
[>....................] sync'ed: 2.0% (50208/51196)M
finish: 0:08:27 speed: 101,248 (101,248) K/sec

Wait for the synchronization to finish...
cat /proc/drbd
version: 8.3.9 (api:88/proto:86-95)
srcversion: CF228D42875CF3A43F2945A
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:135933 nr:96 dw:136029 dr:834 al:39 bm:8 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:52428766 nr:0 dw:0 dr:52428971 al:0 bm:3200 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Install the iscsitarget package on both nodes:
[node1]apt-get -y install iscsitarget
[node2]apt-get -y install iscsitarget

Enable the option that allows iscsitarget to start as a service:
[node1]sed -i s/false/true/ /etc/default/iscsitarget
[node2]sed -i s/false/true/ /etc/default/iscsitarget

Remove the init script entries from the runlevels (heartbeat will start the service instead):
[node1]update-rc.d -f iscsitarget remove
[node2]update-rc.d -f iscsitarget remove

Move the configuration files onto the DRBD volume:
[node1]mkdir /srv/data/iscsi
[node1] mv /etc/iet/ietd.conf /srv/data/iscsi
[node1]ln -s /srv/data/iscsi/ietd.conf /etc/iet/ietd.conf
[node2]rm /etc/iet/ietd.conf
[node2]ln -s /srv/data/iscsi/ietd.conf /etc/iet/ietd.conf

Describe the iSCSI target in /srv/data/iscsi/ietd.conf:
Target iqn.2011-08.local.demo:storage.disk.0
# IncomingUser geekshlby secret (commented out so that no authentication is required to connect)
# OutgoingUser geekshlby password
Lun 0 Path=/dev/drbd1,Type=blockio
Alias disk0
MaxConnections 1
InitialR2T Yes
ImmediateData No
MaxRecvDataSegmentLength 8192
MaxXmitDataSegmentLength 8192
MaxBurstLength 262144
FirstBurstLength 65536
DefaultTime2Wait 2
DefaultTime2Retain 20
MaxOutstandingR2T 8
DataPDUInOrder Yes
DataSequenceInOrder Yes
ErrorRecoveryLevel 0
HeaderDigest CRC32C,None
DataDigest CRC32C,None
Wthreads 8

Now heartbeat needs to be configured so that the iSCSI target's virtual IP address moves over when a node fails. Describe the cluster in the /etc/heartbeat/ha.cf file:
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
bcast eth0
bcast eth1
node node1
node node2

The authentication mechanism:
/etc/heartbeat/authkeys:
auth 2
2 sha1 NoOneKnowsIt

Change the permissions on the /etc/heartbeat/authkeys file:
chmod 600 /etc/heartbeat/authkeys
The cluster resources (the main node, the virtual IP address, the file systems and the services to be started) are described in the file /etc/heartbeat/haresources:
/etc/heartbeat/haresources
node1 drbddisk::iscsi.config Filesystem::/dev/drbd0::/srv/data::ext3
node1 IPaddr::10.11.55.50/8/eth0 drbddisk::iscsi.target.0 iscsitarget

Copy the configuration to the second node:
[node1]scp /etc/heartbeat/ha.cf root@10.11.55.56:/etc/heartbeat/
[node1]scp /etc/heartbeat/authkeys root@10.11.55.56:/etc/heartbeat/
[node1]scp /etc/heartbeat/haresources root@10.11.55.56:/etc/heartbeat/

Unmount /srv/data and make the first node Secondary, then start heartbeat and reboot both servers. After heartbeat has started, switch the first node to Primary mode and the second to Secondary (otherwise the target will not start):
[node1]/etc/init.d/heartbeat start
[node1]/etc/init.d/drbd start
[node2]/etc/init.d/drbd start
[node1]drbdadm secondary iscsi.config - optional
[node1]drbdadm secondary iscsi.target.0 - optional
[node2]drbdadm primary iscsi.config
[node2]drbdadm primary iscsi.target.0
[node1]cat /proc/drbd
[node1]/etc/init.d/heartbeat start
[node2]drbdadm secondary iscsi.config
[node2]drbdadm secondary iscsi.target.0
[node1]drbdadm primary iscsi.config
[node1]drbdadm primary iscsi.target.0

Watch tail -f /var/log/syslog and wait... After a while:
Aug 26 08:32:14 node1 harc[11878]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
Aug 26 08:32:14 node1 ip-request-resp[11878]: received ip-request-resp IPaddr::10.11.55.50/8/eth0 OK yes
Aug 26 08:32:14 node1 ResourceManager[11899]: info: Acquiring resource group: node1 IPaddr::10.11.55.50/8/eth0 drbddisk::iscsi.target.0 iscsitarget
Aug 26 08:32:14 node1 IPaddr[11926]: INFO: Resource is stopped
Aug 26 08:32:14 node1 ResourceManager[11899]: info: Running /etc/ha.d/resource.d/IPaddr 10.11.55.50/8/eth0 start
Aug 26 08:32:14 node1 IPaddr[12006]: INFO: Using calculated netmask for 10.11.55.50: 255.0.0.0
Aug 26 08:32:14 node1 IPaddr[12006]: INFO: eval ifconfig eth0:0 10.11.55.50 netmask 255.0.0.0 broadcast 10.255.255.255
Aug 26 08:32:14 node1 avahi-daemon[477]: Registering new address record for 10.11.55.50 on eth0.IPv4.
Aug 26 08:32:14 node1 IPaddr[11982]: INFO: Success
Aug 26 08:32:15 node1 ResourceManager[11899]: info: Running /etc/init.d/iscsitarget start
Aug 26 08:32:15 node1 kernel: [ 5402.722552] iSCSI Enterprise Target Software - version 1.4.20.2
Aug 26 08:32:15 node1 kernel: [ 5402.723978] iscsi_trgt: Registered io type fileio
Aug 26 08:32:15 node1 kernel: [ 5402.724057] iscsi_trgt: Registered io type blockio
Aug 26 08:32:15 node1 kernel: [ 5402.724061] iscsi_trgt: Registered io type nullio
Aug 26 08:32:15 node1 heartbeat: [12129]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Aug 26 08:32:15 node1 harc[12129]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
Aug 26 08:32:15 node1 ip-request-resp[12129]: received ip-request-resp IPaddr::10.11.55.50/8/eth0 OK yes
Aug 26 08:32:15 node1 ResourceManager[12155]: info: Acquiring resource group: node1 IPaddr::10.11.55.50/8/eth0 drbddisk::iscsi.target.0 iscsitarget
Aug 26 08:32:15 node1 IPaddr[12186]: INFO: Running OK
Aug 26 08:33:08 node1 ntpd[1634]: Listen normally on 11 eth0:0 10.11.55.50 UDP 123
Aug 26 08:33:08 node1 ntpd[1634]: new interface(s) found: waking up resolver
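The log shows the resource group (virtual IP, drbddisk, iscsitarget) being acquired by node1. In addition to the syslog, the heartbeat package ships the cl_status utility, which can confirm the daemon and resource state; a rough sketch (the subcommand names are taken from heartbeat 2.x/3.x and may differ in other versions):
cl_status hbstatus     # is the local heartbeat daemon running?
cl_status listnodes    # list the cluster members
cl_status rscstatus    # which resources this node holds (local/foreign/all/none)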
ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:56:20:f9:6c
inet addr:10.11.55.55 Bcast:10.255.255.255 Mask:255.0.0.0
inet6 addr: fe80::20c:29ff:fe20:f96c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3622 errors:0 dropped:0 overruns:0 frame:0
TX packets:8081 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:302472 (302.4 KB) TX bytes:6943622 (6.9 MB)
Interrupt:19 Base address:0x2000
eth0:0 Link encap:Ethernet HWaddr 00:50:56:20:f9:6c
inet addr:10.11.55.50 Bcast:10.255.255.255 Mask:255.0.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:19 Base address:0x2000
eth1 Link encap:Ethernet HWaddr 00:50:56:20:f9:76
inet addr:192.168.22.11 Bcast:192.168.22.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe20:f976/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1765 errors:0 dropped:0 overruns:0 frame:0
TX packets:3064 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:171179 (171.1 KB) TX bytes:492567 (492.5 KB)
Interrupt:19 Base address:0x2080
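The virtual IP 10.11.55.50 is up on eth0:0. As one more optional check (not in the original walkthrough), iSCSI Enterprise Target exposes its state under /proc/net/iet, so the exported LUN can be verified before pointing the ESXi hosts at it:
cat /proc/net/iet/volume    # should list iqn.2011-08.local.demo:storage.disk.0 with lun:0 on /dev/drbd1
cat /proc/net/iet/session   # active initiator sessions (empty until a host logs in)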
We connect the resulting iSCSI target to both ESX(i) hosts. Once both hosts see the storage, we assemble the HA cluster. Although no space is left on the hosts themselves for creating virtual machines, that space now appears as shared virtual storage. If one of the nodes fails, the virtual machine on the second node switches to Primary mode and continues to serve as the iSCSI target.
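On the ESXi side this boils down to enabling the software iSCSI adapter and adding 10.11.55.50:3260 as a dynamic discovery (Send Targets) address. The target can also be sanity-checked beforehand from any Linux machine with open-iscsi; the commands below are illustrative and not part of the original setup:
iscsiadm -m discovery -t sendtargets -p 10.11.55.50:3260
iscsiadm -m node -T iqn.2011-08.local.demo:storage.disk.0 -p 10.11.55.50:3260 --login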
Using hdparm, I measured the disk speed inside a virtual machine deployed on this target:
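The exact figures are not reproduced here; the measurement itself is of the usual form (the device name inside the test VM is only an example):
hdparm -tT /dev/sda    # -T: cached reads, -t: buffered sequential reads from the device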

Naturally, such a storage system is not suitable for serious production use. But if there are no heavily loaded virtual machines, or you simply need to test building an HA cluster, this way of providing shared storage has a right to exist.
After reading this, many will probably say that it is “wrong”, that “performance will suffer”, that “both nodes could fail”, and so on. Yes, maybe so. But then again, VMware released its Storage Appliance for a reason, didn't it?
P.S. By the way, for those too lazy to do everything by hand, there is a Management Console for setting up a DRBD cluster: http://www.drbd.org/mc/screenshot-gallery/
madbug,
senior systems engineer, DEPO Computers