
A free replacement for the VMware vSphere Storage Appliance, based on DRBD

In short, the idea is to build a fault-tolerant virtual infrastructure without any external storage. Two or three virtual machines are deployed (one per host); they replicate the free space of the ESXi servers' local disk subsystems and present it back to those same ESXi hosts as shared storage. The vSphere Storage Appliance is described in detail (in Russian) here.
An interesting idea, but the price stings: around $6K. Besides, what about performance? Won't the disk array take a noticeable hit in speed? Looking at the problem from another angle, there are plenty of other ways to organize external storage. For example, you can build it from almost any hardware with the required number of disks and software such as Openfiler, FreeNAS, Nexenta or Open-E; all of these products support replication between systems.
Many companies take this approach when they cannot afford an expensive storage system from a well-known vendor that would provide sufficient performance and reliability. Such systems are typically equipped with dual controllers, redundant power supplies, high-speed disks and so on...
But let's get back to the beginning and look at the scheme VMware proposes:

What do we see? Three ESXi hosts with virtual machines deployed on them, one appliance VM per host. The machines are clustered and present the hosts' internal drives as external shared storage.
The idea of putting together a similar solution from readily available tools had been in the air for a long time, but there was never a good enough reason to try it. Then VMware itself provided the impetus to try everything out in a test environment.
There are plenty of solutions for building fault-tolerant storage, for example based on Openfiler + DRBD + Heartbeat. But at the heart of all of them lies the idea of building an external storage system. Why not try to do something similar, but based on virtual machines?
As a foundation, let's take two virtual machines running Ubuntu, the Ubuntu documentation on building a failover iSCSI target, and try to build our own appliance.
Partitioning of the disks on both cluster nodes (the size of sdd1 is chosen as an example; in practice it takes all the remaining free space on the local storage of the ESXi host):
/dev/sda1 - 10 GB / (primary, ext3, Bootable flag: on)
/dev/sda5 - 1 GB swap (logical)
/dev/sdb1 - 1 GB (primary) DRBD metadata. Not mounted.
/dev/sdc1 - 1 GB (primary) DRBD disk used to store the iSCSI configuration files. Not mounted.
/dev/sdd1 - 50 GB (primary) DRBD disk for the iSCSI target.

iSCSI network:
iSCSI server1: node1.demo.local IP address: 10.11.55.55
iSCSI server2: node2.demo.local IP address: 10.11.55.56
iSCSI Virtual IP address 10.11.55.50

Private network:
iSCSI server1: node1-private IP address: 192.168.22.11
iSCSI server2: node2-private IP address: 192.168.22.12

/etc/network/interfaces for node1:
auto eth0
iface eth0 inet static
address 10.11.55.55
netmask 255.0.0.0
gateway 10.0.0.1
auto eth1
iface eth1 inet static
address 192.168.22.11
netmask 255.255.255.0

For node2:
auto eth0
iface eth0 inet static
address 10.11.55.56
netmask 255.0.0.0
gateway 10.0.0.1
auto eth1
iface eth1 inet static
address 192.168.22.12
netmask 255.255.255.0

The /etc/hosts file on both nodes:
127.0.0.1 localhost
10.11.55.55 node1.demo.local node1
10.11.55.56 node2.demo.local node2
192.168.22.11 node1-private
192.168.22.12 node2-private

Install the packages:
apt-get -y install ntp ssh drbd8-utils heartbeat jfsutils

Reboot the servers, then change file ownership and permissions:
chgrp haclient /sbin/drbdsetup
chmod o-x /sbin/drbdsetup
chmod u+s /sbin/drbdsetup
chgrp haclient /sbin/drbdmeta
chmod o-x /sbin/drbdmeta
chmod u+s /sbin/drbdmeta

The configuration is described in /etc/drbd.conf. We define two resources: 1) a DRBD device that will hold the iSCSI configuration files; 2) a DRBD device that will become our iSCSI target. On node1:
/etc/drbd.conf:
resource iscsi.config {
protocol C;
handlers {
pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
}
startup {
degr-wfc-timeout 120;
}
disk {
on-io-error detach;
}
net {
cram-hmac-alg sha1;
shared-secret "password";
after-sb-0pri disconnect;
after-sb-1pri disconnect;
after-sb-2pri disconnect;
rr-conflict disconnect;
}
syncer {
rate 100M;
verify-alg sha1;
al-extents 257;
}
on node1 {
device /dev/drbd0;
disk /dev/sdc1;
address 192.168.22.11:7788;
meta-disk /dev/sdb1[0];
}
on node2 {
device /dev/drbd0;
disk /dev/sdc1;
address 192.168.22.12:7788;
meta-disk /dev/sdb1[0];
}
}
resource iscsi.target.0 {
protocol C;
handlers {
pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
}
startup {
degr-wfc-timeout 120;
}
disk {
on-io-error detach;
}
net {
cram-hmac-alg sha1;
shared-secret "password";
after-sb-0pri disconnect;
after-sb-1pri disconnect;
after-sb-2pri disconnect;
rr-conflict disconnect;
}
syncer {
rate 100M;
verify-alg sha1;
al-extents 257;
}
on node1 {
device /dev/drbd1;
disk /dev/sdd1;
address 192.168.22.11:7789;
meta-disk /dev/sdb1[1];
}
on node2 {
device /dev/drbd1;
disk /dev/sdd1;
address 192.168.22.12:7789;
meta-disk /dev/sdb1[1];
}
}

Copy the configuration to the second node:
scp /etc/drbd.conf root@10.11.55.56:/etc/
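Before going further, it does no harm to let drbdadm parse the configuration on both nodes. This quick sanity check is not part of the original instructions, just a suggestion:
[node1]drbdadm dump iscsi.config
[node1]drbdadm dump iscsi.target.0
[node2]drbdadm dump iscsi.config
[node2]drbdadm dump iscsi.target.0
If drbdadm prints the parsed resource definitions without complaints, the syntax is fine.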
Initialize the disks and create the DRBD metadata on both servers:
[node1]dd if=/dev/zero of=/dev/sdc1
[node1]dd if=/dev/zero of=/dev/sdd1
[node1]drbdadm create-md iscsi.config
[node1]drbdadm create-md iscsi.target.0
[node2]dd if=/dev/zero of=/dev/sdc1
[node2]dd if=/dev/zero of=/dev/sdd1
[node2]drbdadm create-md iscsi.config
[node2]drbdadm create-md iscsi.target.0

Start DRBD:
[node1]/etc/init.d/drbd start
[node2]/etc/init.d/drbd start

Now decide which server will act as primary and which as secondary for the initial synchronization. Let's say node1 is the primary. Run on the first node:
[node1]drbdadm -- --overwrite-data-of-peer primary iscsi.config
cat /proc/drbd:
version: 8.3.9 (api:88/proto:86-95)
srcversion: CF228D42875CF3A43F2945A
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:1048542 nr:0 dw:0 dr:1048747 al:0 bm:64 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:52428768

Create a file system on /dev/drbd0 and mount it:
[node1]mkfs.ext3 /dev/drbd0
[node1]mkdir -p /srv/data
[node1]mount /dev/drbd0 /srv/data

Create a test file on the first node, then switch the second node to Primary mode. On node1:
[node1]dd if=/dev/zero of=/srv/data/test.zeros bs=1M count=100
[node1]umount /srv/data
[node1]drbdadm secondary iscsi.config

On node2:
[node2]mkdir -p /srv/data
[node2]drbdadm primary iscsi.config
[node2]mount /dev/drbd0 /srv/data
ls -l /srv/data

The 100 MB file is visible on the second node. Delete it and switch back to the first node. On node2:
[node2]rm /srv/data/test.zeros
[node2]umount /srv/data
[node2]drbdadm secondary iscsi.config

On node1:
[node1]drbdadm primary iscsi.config
[node1]mount /dev/drbd0 /srv/data

Run ls /srv/data: if the test file is gone, the deletion has been replicated and everything works.
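As an extra check (not part of the original walkthrough), the resource state can also be queried with the standard drbdadm subcommands instead of reading /proc/drbd:
[node1]drbdadm role iscsi.config     # expected: Primary/Secondary
[node1]drbdadm cstate iscsi.config   # expected: Connected
[node1]drbdadm dstate iscsi.config   # expected: UpToDate/UpToDate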
We now proceed to setting up the iSCSI target. Make the first node Primary for the data resource and start the synchronization:
[node1]drbdadm -- --overwrite-data-of-peer primary iscsi.target.0
cat /proc/drbd
version: 8.3.9 (api:88/proto:86-95)
srcversion: CF228D42875CF3A43F2945A
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:135933 nr:96 dw:136029 dr:834 al:39 bm:8 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:1012864 nr:0 dw:0 dr:1021261 al:0 bm:61 lo:1 pe:4 ua:64 ap:0 ep:1 wo:f oos:51416288
[>....................] sync'ed: 2.0% (50208/51196)M
finish: 0:08:27 speed: 101,248 (101,248) K/sec

Wait for the synchronization to finish...
cat /proc/drbd
version: 8.3.9 (api:88/proto:86-95)
srcversion: CF228D42875CF3A43F2945A
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:135933 nr:96 dw:136029 dr:834 al:39 bm:8 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:52428766 nr:0 dw:0 dr:52428971 al:0 bm:3200 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Install the iscsitarget package on both nodes:
[node1]apt-get -y install iscsitarget
[node2]apt-get -y install iscsitarget

Enable the option that allows iscsitarget to start as a service:
[node1]sed -i s/false/true/ /etc/default/iscsitarget
[node2]sed -i s/false/true/ /etc/default/iscsitarget

Remove the init script entries from the runlevels (heartbeat will start the service instead):
[node1]update-rc.d -f iscsitarget remove
[node2]update-rc.d -f iscsitarget remove

Move the configuration files onto the DRBD volume:
[node1]mkdir /srv/data/iscsi
[node1] mv /etc/iet/ietd.conf /srv/data/iscsi
[node1]ln -s /srv/data/iscsi/ietd.conf /etc/iet/ietd.conf
[node2]rm /etc/iet/ietd.conf
[node2]ln -s /srv/data/iscsi/ietd.conf /etc/iet/ietd.conf

Describe the iSCSI target in /srv/data/iscsi/ietd.conf:
Target iqn.2011-08.local.demo:storage.disk.0
# IncomingUser geekshlby secret (commented out so that no authentication is required to connect)
# OutgoingUser geekshlby password
Lun 0 Path=/dev/drbd1,Type=blockio
Alias disk0
MaxConnections 1
InitialR2T Yes
ImmediateData No
MaxRecvDataSegmentLength 8192
MaxXmitDataSegmentLength 8192
MaxBurstLength 262144
FirstBurstLength 65536
DefaultTime2Wait 2
DefaultTime2Retain 20
MaxOutstandingR2T 8
DataPDUInOrder Yes
DataSequenceInOrder Yes
ErrorRecoveryLevel 0
HeaderDigest CRC32C,None
DataDigest CRC32C,None
Wthreads 8

Now heartbeat needs to be configured so that the iSCSI target's virtual IP address moves over when a node fails. Describe the cluster in the /etc/heartbeat/ha.cf file:
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
bcast eth0
bcast eth1
node node1
node node2

The authentication mechanism:
/etc/heartbeat/authkeys:
auth 2
2 sha1 NoOneKnowsIt

Change the permissions on the /etc/heartbeat/authkeys file:
chmod 600 /etc/heartbeat/authkeys
The cluster resources (the main node, the virtual IP address, the file systems and the services to be started) are described in the file /etc/heartbeat/haresources:
/etc/heartbeat/haresources
node1 drbddisk::iscsi.config Filesystem::/dev/drbd0::/srv/data::ext3
node1 IPaddr::10.11.55.50/8/eth0 drbddisk::iscsi.target.0 iscsitarget

Copy the configuration to the second node:
[node1]scp /etc/heartbeat/ha.cf root@10.11.55.56:/etc/heartbeat/
[node1]scp /etc/heartbeat/authkeys root@10.11.55.56:/etc/heartbeat/
[node1]scp /etc/heartbeat/haresources root@10.11.55.56:/etc/heartbeat/

Unmount /srv/data and make the first node Secondary, then start heartbeat and reboot both servers. After heartbeat has started, switch the first node to Primary mode and the second to Secondary (otherwise the target will not start):
[node1]/etc/init.d/heartbeat start
[node1]/etc/init.d/drbd start
[node2]/etc/init.d/drbd start
[node1]drbdadm secondary iscsi.config - optional
[node1]drbdadm secondary iscsi.target.0 - optional
[node2]drbdadm primary iscsi.config
[node2]drbdadm primary iscsi.target.0
[node1]cat /proc/drbd
[node1]/etc/init.d/heartbeat start
[node2]drbdadm secondary iscsi.config
[node2]drbdadm secondary iscsi.target.0
[node1]drbdadm primary iscsi.config
[node1]drbdadm primary iscsi.target.0

Watch tail -f /var/log/syslog and wait... After a while:
Aug 26 08:32:14 node1 harc[11878]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
Aug 26 08:32:14 node1 ip-request-resp[11878]: received ip-request-resp IPaddr::10.11.55.50/8/eth0 OK yes
Aug 26 08:32:14 node1 ResourceManager[11899]: info: Acquiring resource group: node1 IPaddr::10.11.55.50/8/eth0 drbddisk::iscsi.target.0 iscsitarget
Aug 26 08:32:14 node1 IPaddr[11926]: INFO: Resource is stopped
Aug 26 08:32:14 node1 ResourceManager[11899]: info: Running /etc/ha.d/resource.d/IPaddr 10.11.55.50/8/eth0 start
Aug 26 08:32:14 node1 IPaddr[12006]: INFO: Using calculated netmask for 10.11.55.50: 255.0.0.0
Aug 26 08:32:14 node1 IPaddr[12006]: INFO: eval ifconfig eth0:0 10.11.55.50 netmask 255.0.0.0 broadcast 10.255.255.255
Aug 26 08:32:14 node1 avahi-daemon[477]: Registering new address record for 10.11.55.50 on eth0.IPv4.
Aug 26 08:32:14 node1 IPaddr[11982]: INFO: Success
Aug 26 08:32:15 node1 ResourceManager[11899]: info: Running /etc/init.d/iscsitarget start
Aug 26 08:32:15 node1 kernel: [ 5402.722552] iSCSI Enterprise Target Software - version 1.4.20.2
Aug 26 08:32:15 node1 kernel: [ 5402.723978] iscsi_trgt: Registered io type fileio
Aug 26 08:32:15 node1 kernel: [ 5402.724057] iscsi_trgt: Registered io type blockio
Aug 26 08:32:15 node1 kernel: [ 5402.724061] iscsi_trgt: Registered io type nullio
Aug 26 08:32:15 node1 heartbeat: [12129]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Aug 26 08:32:15 node1 harc[12129]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
Aug 26 08:32:15 node1 ip-request-resp[12129]: received ip-request-resp IPaddr::10.11.55.50/8/eth0 OK yes
Aug 26 08:32:15 node1 ResourceManager[12155]: info: Acquiring resource group: node1 IPaddr::10.11.55.50/8/eth0 drbddisk::iscsi.target.0 iscsitarget
Aug 26 08:32:15 node1 IPaddr[12186]: INFO: Running OK
Aug 26 08:33:08 node1 ntpd[1634]: Listen normally on 11 eth0:0 10.11.55.50 UDP 123
Aug 26 08:33:08 node1 ntpd[1634]: new interface(s) found: waking up resolver
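The log shows the resource group (virtual IP, drbddisk, iscsitarget) being acquired by node1. In addition to the syslog, the heartbeat package ships the cl_status utility, which can confirm the daemon and resource state; a rough sketch (the subcommand names are taken from heartbeat 2.x/3.x and may differ in other versions):
cl_status hbstatus     # is the local heartbeat daemon running?
cl_status listnodes    # list the cluster members
cl_status rscstatus    # which resources this node holds (local/foreign/all/none)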
ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:56:20:f9:6c
inet addr:10.11.55.55 Bcast:10.255.255.255 Mask:255.0.0.0
inet6 addr: fe80::20c:29ff:fe20:f96c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3622 errors:0 dropped:0 overruns:0 frame:0
TX packets:8081 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:302472 (302.4 KB) TX bytes:6943622 (6.9 MB)
Interrupt:19 Base address:0x2000
eth0:0 Link encap:Ethernet HWaddr 00:50:56:20:f9:6c
inet addr:10.11.55.50 Bcast:10.255.255.255 Mask:255.0.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:19 Base address:0x2000
eth1 Link encap:Ethernet HWaddr 00:50:56:20:f9:76
inet addr:192.168.22.11 Bcast:192.168.22.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe20:f976/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1765 errors:0 dropped:0 overruns:0 frame:0
TX packets:3064 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:171179 (171.1 KB) TX bytes:492567 (492.5 KB)
Interrupt:19 Base address:0x2080
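The virtual IP 10.11.55.50 is up on eth0:0. As one more optional check (not in the original walkthrough), iSCSI Enterprise Target exposes its state under /proc/net/iet, so the exported LUN can be verified before pointing the ESXi hosts at it:
cat /proc/net/iet/volume    # should list iqn.2011-08.local.demo:storage.disk.0 with lun:0 on /dev/drbd1
cat /proc/net/iet/session   # active initiator sessions (empty until a host logs in)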
We connect the resulting iSCSI target to both ESX(i) hosts. Once both hosts see the storage, we assemble the HA cluster. Although no space is left on the hosts themselves for creating virtual machines, that space now appears as shared virtual storage. If one of the nodes fails, the virtual machine on the second node switches to Primary mode and continues to serve as the iSCSI target.
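On the ESXi side this boils down to enabling the software iSCSI adapter and adding 10.11.55.50:3260 as a dynamic discovery (Send Targets) address. The target can also be sanity-checked beforehand from any Linux machine with open-iscsi; the commands below are illustrative and not part of the original setup:
iscsiadm -m discovery -t sendtargets -p 10.11.55.50:3260
iscsiadm -m node -T iqn.2011-08.local.demo:storage.disk.0 -p 10.11.55.50:3260 --login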
Using hdparm, I measured the disk speed inside a virtual machine deployed on this target:
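The exact figures are not reproduced here; the measurement itself is of the usual form (the device name inside the test VM is only an example):
hdparm -tT /dev/sda    # -T: cached reads, -t: buffered sequential reads from the device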

Naturally, such a storage system is not suitable for serious production use. But if there are no heavily loaded virtual machines, or you simply need to test building an HA cluster, this way of providing shared storage has a right to exist.
After reading this, many will probably say that it is “wrong”, that “performance will suffer”, that “both nodes could fail”, and so on. Yes, maybe so. But then again, VMware released its Storage Appliance for a reason, didn't it?
P.S. By the way, for those too lazy to do everything by hand, there is a Management Console for setting up a DRBD cluster: http://www.drbd.org/mc/screenshot-gallery/
madbug,
senior systems engineer, DEPO Computers