Ceph in ProxMox on ZFS

Working as a system administrator, you constantly run into things and knowledge that are unique to your shop. One such thing in our office is ProxMox installed on the ZFS file system, which gives you a decent RAID array without hardware controllers. One day, wondering what else we could do to surprise and please our customers, we decided to put all of this on top of the Ceph distributed storage system. I am not sure how sensible that decision was, but I wanted to make the wish come true. And then it started... I dug through mountains of articles and forums, but never found a single decent manual describing in detail what to do and how. So, once I had sorted it all out, this article was born. Those who are interested, welcome under the cut.





So, basically everything is done in the console, and the ProxMox web UI is not really needed here. I did everything in test mode, so two virtual machines with four disks each were set up inside a not particularly powerful ProxMox host (a kind of nesting doll). There were four disks because originally I wanted to build on ZFS RAID10, just like on the future production hardware, but that did not work out for reasons unknown to me (honestly, I was too lazy to dig into it). It turned out that ProxMox could not set up ZFS RAID10 on the virtual disks, so a slightly different layout was chosen: ProxMox itself was installed on one disk, a ZFS mirror was built on two others, and the third was supposed to hold the Ceph journal, but in the end I forgot about it, so for now let's leave it alone. So let's get started.


A small introduction first:


We have two nodes with freshly installed ProxMox. The nodes are called ceph1 and ceph2. We do everything the same on both nodes, except in the places I point out. Our network is 192.168.111.0/24. The first node (ceph1) has the address 192.168.111.1, the second (ceph2) has 192.168.111.2. The disks on both nodes are laid out as follows: /dev/vda is the disk ProxMox is installed on, /dev/vdb and /dev/vdc are the disks intended for ZFS, and /dev/vdd is the disk for the Ceph journal.
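

Not strictly required for what follows, but it helps if both hostnames resolve on both nodes. A minimal /etc/hosts sketch, assuming the hostnames match the node names above:


192.168.111.1   ceph1
192.168.111.2   ceph2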


The first thing we need to do is change the paid ProxMox repository, which requires a subscription, to a free one:


nano /etc/apt/sources.list.d/pve-enterprise.list

There we comment out the single existing line and add a new one below it:


deb http://download.proxmox.com/debian jessie pve-no-subscription
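
For reference, this is roughly what the file should look like afterwards (the exact enterprise line may differ slightly depending on the ProxMox version, so treat it as a sketch):


# deb https://enterprise.proxmox.com/debian jessie pve-enterprise
deb http://download.proxmox.com/debian jessie pve-no-subscription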

Next, we update our ProxMox:


apt update && apt dist-upgrade

Install packages for working with Ceph:


pveceph install -version hammer

As the next step, we need to build a ProxMox cluster.


On the first node, execute:


pvecm create mycluster

where mycluster is the name of our cluster.


On the second node:


pvecm add 192.168.111.1

Agree to accept the SSH key and enter the root password of the first node.


Check the result with the pvecm status command.


Next, we initialize the Ceph configuration (done only on the first node, which will be the “main”):


pveceph init --network 192.168.111.0/24

This creates a symlink at /etc/ceph/ceph.conf, which we will work with from here on.
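

A quick way to double-check: the symlink should point at the cluster-wide config under /etc/pve (this is just a sanity check, not a required step):


ls -l /etc/ceph/ceph.conf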


Right after that we need to add an option to the [osd] section there:


[osd]
    journal dio = false

This is needed because ZFS does not support direct I/O.


The next thing we do is prepare our ZFS pool. To do this, the disks need a GPT partition table:


fdisk /dev/vdb

There, press g and then w (g creates the GPT table, w writes the changes). Repeat the same on /dev/vdc.
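

If you prefer to do this non-interactively, the same can be achieved with sgdisk from the gdisk package (an alternative sketch, assuming apt install gdisk):


sgdisk --clear /dev/vdb    # writes a fresh, empty GPT
sgdisk --clear /dev/vdc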


We create a mirrored ZFS pool; as is customary in ProxMox, it will be called rpool:


zpool create rpool mirror /dev/vdb /dev/vdc

Check with the zpool status -v command; you should get (at least something like):


  pool: rpool
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        vdb     ONLINE       0     0     0
        vdc     ONLINE       0     0     0

errors: No known data errors

We have created the ZFS pool, it's time to do the most important thing - ceph.


Let's create a file system (a strange name, but that is what the ZFS docs call it) for our Ceph monitor:


zfs create -o mountpoint=/var/lib/ceph/mon rpool/ceph-monfs
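
To make sure the dataset is mounted where we expect it (just a sanity check, assuming the dataset name above):


zfs get mountpoint,mounted rpool/ceph-monfs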

Create the monitor itself (first on the first node, then on the second):


pveceph createmon
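
Before going further it is worth making sure the monitors see each other; a quick sanity check from the first node (where the admin keyring lives):


ceph -s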

Then comes the part I had to tinker with the most: how to create a block device on ZFS for the Ceph OSD (block devices are what OSDs work with) in such a way that it actually works.


And it is all done quite simply, through a zvol:


zfs create -V 90G rpool/ceph-osdfs 

90G is how much space we hand over to Ceph to do with as it pleases. It is this small only because the server is virtual and I did not give it more than 100G.
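

The zvol should now show up as a block device; a quick way to find its /dev/zdN name (it is usually zd0 on a fresh system, but the number may differ):


ls -l /dev/zvol/rpool/ceph-osdfs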


Now let's create the Ceph OSD itself:


ceph-disk prepare --zap-disk --fs-type xfs --cluster ceph --cluster-uuid FSID /dev/zd0

We chose XFS for --fs-type because XFS is the default file system in Ceph. FSID is the ID of our Ceph cluster, which can be found in /etc/ceph/ceph.conf. And /dev/zd0 is our zvol.
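

To pull the FSID out of the config without opening it in an editor (just a convenience, assuming the standard config path):


grep fsid /etc/ceph/ceph.conf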


If after that df -h does not show something like this:


/dev/zd0p1         85G   35M   85G   1% /var/lib/ceph/osd/ceph-0

then something went wrong, and you either need to reboot or run the Ceph OSD creation once more.
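

Before resorting to a reboot, it may be worth trying to activate the prepared partition by hand (assuming it ended up as /dev/zd0p1, as in the df output above):


ceph-disk activate /dev/zd0p1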


At this point our Ceph is essentially done, and we could manage it further from the ProxMox web UI and create the RBD storage we need there, but you will not be able to use it yet (which, after all, is what this whole thing was started for). It is cured in a simple way (the storage still has to be created first): the Ceph key needs to be copied from the first node to the second.


Open the ProxMox storage config:


nano /etc/pve/storage.cfg 

And add the RBD storage we need there:


rbd: test
    monhost 192.168.111.1:6789;192.168.111.2:6789
    pool rbd
    krbd 1
    username admin
    content images

Here test is the name of our storage, and the IP addresses are where the Ceph monitors live, that is, our ProxMox nodes. The remaining options are defaults.
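

After saving the file, the new storage should appear in the storage list; a quick check from the first node (it may show as inactive until the keyring from the next step is in place):


pvesm status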


Next, create a directory for the key on the second node:


mkdir /etc/pve/priv/ceph

And copy the key from the first one:


scp ceph1:/etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/test.keyring

Here ceph1 is our first node, and test is the name of the storage.
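

Once the keyring is in place, the storage should become usable; for example, list its contents (the list will be empty for now):


pvesm list test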


That wraps it up: the storage is active and working, and we can enjoy all the goodies Ceph has to offer.


Thanks for your attention!


To get all of this up and running, I used these links:


" Https://pve.proxmox.com/wiki/Storage:_Ceph
" https://pve.proxmox.com/wiki/Ceph_Server
" http://xgu.ru/wiki/ZFS
" https: //forum.proxmox .com

