Ceph in ProxMox on ZFS
In this line of work (system administration), you constantly run into things and knowledge that are unique to your own shop. One such thing in our office is ProxMox installed on the ZFS file system, which gives you a solid RAID array without using hardware controllers. One day, wondering what else we could do to surprise and please our customers, we decided to put all of this on top of the Ceph distributed storage system. I don't know how sensible that decision was, but I resolved to make the wish come true. And then it began... I dug through mountains of articles and forums, but never found a single decent manual describing in detail what to do and how, so once I had gotten everything working, this article was born. If you're interested, welcome under the cut.
So, in principle, everything is done from the console, and the ProxMox web UI isn't really needed. I did all of this in test mode, so two virtual machines with four disks each were spun up inside a ProxMox host that isn't very powerful hardware-wise (a sort of nesting doll). There were four disks because originally I wanted to build it, as on the future (no longer test) hardware, on ZFS RAID10, but that wish didn't come true for reasons unknown to me (honestly, I was too lazy to dig into it). It turned out that ProxMox could not lay ZFS RAID10 out across the virtual disks, so a slightly different "geography" was chosen. ProxMox itself was installed on one of the disks, a ZFS mirror (RAID1) went onto two others, and the third was supposed to hold the Ceph journal, but in the end I forgot about it, so for now let's leave it alone. So let's get started.
A small introduction first:
We have a freshly installed ProxMox on two nodes. The nodes are called ceph1 and ceph2. We do everything identically on both nodes, except where I note otherwise. Our network is 192.168.111.0/24. The first node (ceph1) has the address 192.168.111.1, the second (ceph2) has 192.168.111.2. The disks on both nodes are laid out as follows: /dev/vda is the disk ProxMox is installed on, /dev/vdb and /dev/vdc are the disks intended for ZFS, /dev/vdd is the disk for the Ceph journal.
The first thing we need to do is replace the paid ProxMox repository, which requires a subscription, with the free one:
nano /etc/apt/sources.list.d/pve-enterprise.list
There we comment out the single existing line and add a new one below it:
deb http://download.proxmox.com/debian jessie pve-no-subscription
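If you'd rather script this than edit the file in nano, sed can comment the enterprise line out and the free repo can be appended. A minimal sketch, demonstrated on a scratch copy of the file so it can be run safely (on the node you would operate on /etc/apt/sources.list.d/pve-enterprise.list itself; the enterprise repo line below is the one I'd expect to find there, so treat it as an assumption):

```shell
# Scratch file standing in for /etc/apt/sources.list.d/pve-enterprise.list
LIST=$(mktemp)
echo 'deb https://enterprise.proxmox.com/debian jessie pve-enterprise' > "$LIST"

# Comment out the subscription repo and append the no-subscription one.
sed -i 's/^deb/# deb/' "$LIST"
echo 'deb http://download.proxmox.com/debian jessie pve-no-subscription' >> "$LIST"

cat "$LIST"
```

After this, the file contains the old line commented out and the free repository below it.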
Next, we update our ProxMox:
apt update && apt dist-upgrade
Install packages for working with Ceph:
pveceph install -version hammer
The next step is to build a ProxMox cluster.
On the first node, run:
pvecm create mycluster
where mycluster is the name of our cluster.
On the second node:
pvecm add 192.168.111.1
Agree to accept the ssh key and enter the root password of the first node.
Check the whole thing with the pvecm status command.
Next, we initialize the Ceph configuration (done only on the first node, which will be the “main”):
pveceph init --network 192.168.111.0/24
this will create the /etc/ceph/ceph.conf symlink for us, which is what we will be working with from here on.
Right after that we need to add an option to the [osd] section there:
[osd]
journal dio = false
This is needed because ZFS cannot do directIO.
The next thing we do is prepare our ZFS pool. To do this, the disks need to be given GPT labels:
fdisk /dev/vdb
There, press g and then w (g to create the GPT table and w to write the changes). Repeat the same on /dev/vdc.
We create a mirrored ZFS pool, named, as is customary in ProxMox, rpool:
zpool create rpool mirror /dev/vdb /dev/vdc
Check with the zpool status -v command; you should get (at the very least):
  pool: rpool
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	rpool       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    vdb     ONLINE       0     0     0
	    vdc     ONLINE       0     0     0

errors: No known data errors
We have created the ZFS pool; now it's time for the most important part: Ceph.
Let's create a file system (an odd name, but it's taken from the ZFS docs) for our Ceph monitor:
zfs create -o mountpoint=/var/lib/ceph/mon rpool/ceph-monfs
Create the monitor itself (first on the first node, then on the second):
pveceph createmon
Then came the part I really had to tinker with, namely: how to create a block device for the Ceph OSD (that's what it works with) on ZFS, and in such a way that it actually works.
And it's all done quite simply, via a zvol:
zfs create -V 90G rpool/ceph-osdfs
90G is how much we hand over to our Ceph to be torn to pieces. It's that little because the server is virtual and I didn't give it more than 100G.
Well, let's create the Ceph OSD itself:
ceph-disk prepare --zap-disk --fs-type xfs --cluster ceph --cluster-uuid FSID /dev/zd0
We chose --fs-type xfs because XFS is the default FS for Ceph. FSID is the ID of our Ceph cluster, which can be found in /etc/ceph/ceph.conf. And /dev/zd0 is our zvol.
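Rather than hunting for the FSID by hand, it can be pulled out of ceph.conf with awk. A minimal sketch, run here against a throwaway copy of the config with a made-up fsid value (on the node you would point CONF at the real /etc/ceph/ceph.conf):

```shell
# Hypothetical sample standing in for /etc/ceph/ceph.conf;
# the fsid value below is made up purely for illustration.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
[global]
fsid = 9f2d3e4a-1b2c-4d5e-8f90-123456789abc
EOF

# Grab the value from the "fsid = ..." line.
FSID=$(awk -F' = ' '/^[[:space:]]*fsid/ {print $2}' "$CONF")
echo "$FSID"
```

With the real file, the prepare command can then be written as `ceph-disk prepare --zap-disk --fs-type xfs --cluster ceph --cluster-uuid "$FSID" /dev/zd0`.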
If after that df -h does not show you something like this:
/dev/zd0p1 85G 35M 85G 1% /var/lib/ceph/osd/ceph-0
then something went wrong, and you either need to reboot or run the Ceph OSD creation again.
Basically, at this point our Ceph is done, and we could go on managing it from the ProxMox web UI and create the RBD storage we need on it, but you wouldn't be able to use it (which, after all, is what all of this was for). It is cured in a simple way (the storage does still have to be created first): the Ceph key needs to be copied from the first node to the second.
Open the ProxMox storage config:
nano /etc/pve/storage.cfg
And add the RBD storage we need there:
rbd: test
	monhost 192.168.111.1:6789;192.168.111.2:6789
	pool rbd
	krbd 1
	username admin
	content images
Here test is the name of our storage, and the IP addresses are where the Ceph monitors live, that is, our ProxMox nodes. The remaining options are the defaults.
Next, create a directory for the key on the second node:
mkdir /etc/pve/priv/ceph
And copy the key from the first one:
scp ceph1:/etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/test.keyring
Here ceph1 is our first node, and test is the name of the storage.
And that's a wrap: the storage is active and working, and we can enjoy all of Ceph's goodies.
Thanks for your attention!
To get all of this up and running, I used these links:
- https://pve.proxmox.com/wiki/Storage:_Ceph
- https://pve.proxmox.com/wiki/Ceph_Server
- http://xgu.ru/wiki/ZFS
- https://forum.proxmox.com