Shared Storage (Ceph) - Funky Penguin's Geek Cookbook

@funkypenguin I’ve finally got round to setting this up and it all appears to be working great, thank you. My only question is that the output from ceph status shows a total of 30GiB (3 x 10GB disks) with an availability of 27GiB but when doing a df -h the mounted ceph path shows only 8.5G available? Here’s the output from ceph status:

  cluster:
    id:     630cf0f4-a389-11ea-978c-fa163ec47a7e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum dn0,dn1,dn2 (age 11h)
    mgr: dn0.qomczu(active, since 11h), standbys: dn1.cpxaem
    mds: data:1 {0=data.dn2.moduyx=up:active} 1 up:standby
    osd: 3 osds: 3 up (since 11h), 3 in (since 12h)

  task status:
    scrub status:
        mds.data.dn2.moduyx: idle

  data:
    pools:   3 pools, 65 pgs
    objects: 27 objects, 74 KiB
    usage:   3.0 GiB used, 27 GiB / 30 GiB avail
    pgs:     65 active+clean

Thank you

Edit: Also, I notice only one of my nodes is in standby as a manager. How do I get the 3rd node in standby too? If I reboot dn1 (in standby) or dn2 (the one not in standby), for a kernel update for example, ceph continues to operate with a warning. However, if I reboot dn0 (the one with cephadm on), ceph hangs and doesn’t operate until the VPS is back online. This worries me: what would happen if ceph stopped once I have docker volumes mounted here with DBs running? Any ideas?

Edit 2: (Sorry) I’m also receiving a health warning, 1 hosts fail cephadm check. How do I go about finding out which host has the issue, and any ideas on what might cause it? This is only a mess around, so I’m happy to wipe and try again, but thought I’d best ask first :smiley:

P.S I’m a complete noob to all of this stuff so forgive my ignorance :slight_smile:
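
On the Edit 2 question, a minimal sketch of how one might find the failing host. It assumes ceph health detail expands the warning with the host name, and that the cephadm module's check-host subcommand is available in your release; the function names and the CEPH override variable are made up for illustration.

```shell
# CEPH can be overridden (e.g. CEPH=echo) to preview the commands without a cluster.
CEPH="${CEPH:-ceph}"

# health_detail: expand each health warning into its per-item detail.
health_detail() {
    $CEPH health detail
}

# check_host HOST: ask cephadm to re-run its host check against one host.
check_host() {
    $CEPH cephadm check-host "$1"
}
```

Something like health_detail followed by check_host dn1 (substituting whichever host is named in the detail output) should narrow it down.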

That means you have 30GiB of “raw” storage, but because of the number of replicas (3, by default) in the pool used to back cephfs, only about a third of it is usable, which is why you’re presented with 8.5G for cephfs.

If (for example) you created another pool (say, for rbd storage) with 2 replicas instead of 3, you’d have roughly 13G available in that pool (the 27GiB of free raw space divided by 2), and as you consumed each pool, the available storage would be reduced accordingly.
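
For anyone wanting to sanity-check these numbers, here is a rough sketch of the arithmetic. It assumes the usual replicated-pool model (usable space is free raw space divided by the replica count) and that Ceph holds back some headroom via a full ratio of 0.95. Both the helper name usable_gib and the 0.95 figure are assumptions for illustration, not values read from this cluster.

```shell
# usable_gib RAW_GIB REPLICAS [FULL_RATIO]
# Rough estimate of the usable space in a replicated pool.
# FULL_RATIO defaults to 0.95 (an assumed default; check your own cluster).
usable_gib() {
    LC_ALL=C awk -v raw="$1" -v rep="$2" -v ratio="${3:-0.95}" \
        'BEGIN { printf "%.2f\n", raw * ratio / rep }'
}

# Example calls (not run automatically):
#   usable_gib 27 3   # three replicas, as in the cephfs pool
#   usable_gib 27 2   # two replicas
```

With 27GiB free and 3 replicas this prints 8.55, which lines up with the 8.5G that df reports. By the same estimate, a 2-replica pool would land closer to 13G.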

I’ve not tried this yet, but based on https://ceph.readthedocs.io/en/latest/mgr/orchestrator/, I think you’d do something like ceph orch apply mgr 3 dn2. However, how do you know that ceph hangs when dn0 reboots? What does ceph -s on dn1/2 tell you when this happens?
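
A hedged sketch of what that might look like using the orchestrator's --placement argument, which takes a list of hosts. The function names and the CEPH override variable are hypothetical, and the host names are the ones from this thread; treat it as untested.

```shell
# CEPH can be overridden (e.g. CEPH=echo) to preview the commands without a cluster.
CEPH="${CEPH:-ceph}"

# place_mgrs HOST...: ask the orchestrator to run a mgr on each listed host.
place_mgrs() {
    $CEPH orch apply mgr --placement="$*"
}

# list_daemons: show which daemons the orchestrator is running on which host.
list_daemons() {
    $CEPH orch ps
}
```

For example, place_mgrs dn0 dn1 dn2 followed by list_daemons a minute or so later should show whether the extra standby came up.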

@funkypenguin Thank you David for explaining that, makes more sense now. My next question of course is how would I go about specifying 2 replicas for the pool I have or for a new one (Happy to start again if easier)? The idea in my mind is that I can then lose one VPS (for a reboot) while having enough space left over out of the 3 nodes I currently have set up.
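
For reference, the replica count of an existing pool is controlled by its size setting, and min_size is the number of replicas that must be up for I/O to continue. A minimal sketch, assuming you look up the real pool name first; mypool below is a placeholder, and the function names and CEPH override are made up for illustration.

```shell
# CEPH can be overridden (e.g. CEPH=echo) to preview the commands without a cluster.
CEPH="${CEPH:-ceph}"

# show_pools: list every pool together with its current size and min_size.
show_pools() {
    $CEPH osd pool ls detail
}

# set_pool_replicas POOL SIZE MIN_SIZE: change the replica count of a pool
# and the minimum number of replicas required to keep serving I/O.
set_pool_replicas() {
    $CEPH osd pool set "$1" size "$2"
    $CEPH osd pool set "$1" min_size "$3"
}

# Example (not run automatically): set_pool_replicas mypool 2 1
```

Bear in mind that running with 2 replicas and a min_size of 1 trades safety for space: while one node is down, there is only a single copy of any newly written data.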

@funkypenguin Thanks again, I’ll have a play with this and report back. Noticed I’ve only got 2/3 nodes for MDS too. Out of interest, did you find this was the case when you tried this or did you have all hosts showing for all the various services (like the old guide used to)?

When running ceph -s on dn1/2 (while dn0 is offline) it just sits and never returns. Also, if I put a test file in the ceph mount during this time, it is not copied over to the remaining host. If I reboot either of the other two but not dn0, ceph -s returns immediately with the warning that a host is down (as expected) and ceph continues to operate normally.

In fstab you need to put:

manager1,manager2,manager3:/ /var/data ceph name=admin,secret=[cephkey],noatime,_netdev 0 0

[cephkey] is generated by sudo ceph-authtool -p /etc/ceph/ceph.client.admin.keyring.

In Debian Buster you must install ceph-common.
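
As a small sketch of the above, this helper assembles the fstab line from its parts. The function name cephfs_fstab_line and the example values are hypothetical; the mount options are the ones given in this post.

```shell
# cephfs_fstab_line NODES MOUNTPOINT KEY
# NODES is a comma-separated list of monitor hosts; KEY is the admin secret.
cephfs_fstab_line() {
    printf '%s:/ %s ceph name=admin,secret=%s,noatime,_netdev 0 0\n' "$1" "$2" "$3"
}

# Example (run as root on a node holding the admin keyring; not run automatically):
#   KEY="$(ceph-authtool -p /etc/ceph/ceph.client.admin.keyring)"
#   cephfs_fstab_line manager1,manager2,manager3 /var/data "$KEY" >> /etc/fstab
```

Since /etc/fstab is normally world-readable, it may be worth looking at the secretfile= option of mount.ceph instead of embedding the key directly.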

Sorry for my bad english… I hope I have helped.

@funkypenguin, you forgot to add the secret option to the fstab line in the recipe.

Welcome @cristain_dkb! It turns out I didn’t need a secret option - provided I installed ceph-common using the Octopus repo installed by cephadm, it just worked :slight_smile:

Maybe it’s a distro thing. On Debian Buster, without the secret option I got error 22, like @zeiglecm. With the secret option it works perfectly.

Another thing: to run ceph orch host add [node-name], I first needed to run sudo ./cephadm shell --fsid [fsid] -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring.

@funkypenguin, this is great work. Thank you very much. I’m going to deploy a Swarm at a university in Argentina with on-premises servers, and these guides are very useful.

@cristain_dkb I’ve added a section on Debian Buster, but I can’t test it currently: https://geek-cookbook.funkypenguin.co.nz/ha-docker-swarm/shared-storage-ceph/#mount-cephfs-volume

Would you mind validating that my example works for you?

Thanks!
D

Hi I am totally new to this topic, go gentle on me.
I am trying to create a one node cluster just to see if i got basics right.
(and do i really need 3 cluster hosts, if so i might have to give up now as i only have two)

I get this error (and yes i had to do the secret export):

root@docker01:/etc# mount -a
mount error: no mds server is up or the cluster is laggy

how long does it take to create the mds?

root@docker01:/etc# ceph -s
  cluster:
id:     9fa14f50-af83-11ea-a832-0211322d0d4d
health: HEALTH_WARN
        1 MDSs report slow metadata IOs
        Reduced data availability: 65 pgs inactive
        Degraded data redundancy: 65 pgs undersized
        OSD count 1 < osd_pool_default_size 3
 
  services:
mon: 1 daemons, quorum docker01 (age 58m)
mgr: docker01.xnkndd(active, since 56m)
mds: data:1 {0=data.docker01.tvnvea=up:creating}
osd: 1 osds: 1 up (since 50m), 1 in (since 50m)
 
  task status:
scrub status:
    mds.data.docker01.tvnvea: idle
 
  data:
pools:   3 pools, 65 pgs
objects: 0 objects, 0 B
usage:   1.0 GiB used, 499 GiB / 500 GiB avail
pgs:     100.000% pgs not active
         65 undersized+peered

Hey @scyto!

Hmm, I’ve not tried with a single node - but it does look unhappy about only having 1 OSD. I’d still have expected the MDS to start though - what are the specs of the node you’re using? If it’s too old/slow, you may be bumping into some default IO limits…

You could also try checking the logs of the mds container, in case that gives a clue?
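
A sketch of one way to get at those logs, assuming a cephadm-managed cluster where each daemon runs in its own container. The daemon name in the example is the one from the ceph -s output above; the function names and the CEPHADM override are hypothetical.

```shell
# CEPHADM can be overridden (e.g. CEPHADM=echo) to preview the commands.
CEPHADM="${CEPHADM:-cephadm}"

# list_local_daemons: list the daemons cephadm knows about on this host.
list_local_daemons() {
    $CEPHADM ls
}

# daemon_logs NAME: print the journal for one daemon, e.g. an mds.
daemon_logs() {
    $CEPHADM logs --name "$1"
}

# Example (not run automatically): daemon_logs mds.data.docker01.tvnvea
```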

D

Thanks for the pointers, will try again soon. I tore down the docker host completely to start again in case I made a mistake the first time through.

The docker host is actually a Debian VM running on a Synology VM host. The data volume is an iSCSI volume on the VM host mapped into the Debian VM.

I could see how that could cause issues if it is expecting extremely high IO, but mapping the iSCSI volume into Windows to test speed showed it was reasonably fast.

tl;dr i lost the logs in the process of tearing it all down :slight_smile:

If you’re doing a 1-node cluster, you don’t really even need Ceph. Just create /var/data on the host and have the Docker containers access it there. Ceph is for keeping data highly available between nodes. So, unless you’re setting this up now and meaning to add more nodes to the Swarm later, I think you could just ignore Ceph and move on.

The _netdev mount option (which waits for network before mounting) doesn’t seem to be available in newer distros. It didn’t work for me on Ubuntu 20.04, and it doesn’t work on the latest 18.04 as far as I can tell. You can replace _netdev with x-systemd.automount in fstab, which waits for remote-fs.target. I’m not sure whether waiting for remote-fs.target or network.target is the better option (if you wrote a .mount unit by hand instead of letting systemd generate it from fstab, you could wait on the network specifically), but x-systemd.automount worked for me. Without it, and depending on your system, the mount may fail at boot because the network isn’t up when systemd tries to mount cephfs.

Also, perhaps consider moving the section on ceph-common to the start, since things like ceph orch require it to be installed. You’ll also need either docker or podman installed before bootstrapping with cephadm, so maybe that should be mentioned as well, especially considering the docker setup comes after the ceph setup…
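
To illustrate the swap described here, a small filter that rewrites only the ceph entries of an fstab-style file. The function name swap_netdev is made up, and it is a sketch to be reviewed before anything is written back to /etc/fstab.

```shell
# swap_netdev: read fstab-style lines on stdin and replace the _netdev option
# with x-systemd.automount, but only on lines mentioning the ceph filesystem type.
swap_netdev() {
    sed '/[[:space:]]ceph[[:space:]]/ s/_netdev/x-systemd.automount/'
}

# Example (not run automatically; inspect the result before replacing fstab):
#   swap_netdev < /etc/fstab > /tmp/fstab.new
```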

Wanted to share something I learned. I’ve been trying to figure out why my ceph installation kept failing on fresh, vanilla instances of Ubuntu Server 20.04. Ceph kept insisting /var/lib/ceph was mounted as a read-only fs, despite the fact that it also sometimes wrote to it. Turns out if you choose to install docker during the OS installation using Ubuntu’s fancy new installer, it installs the snap version of docker - which doesn’t play nice with ceph, and also doesn’t produce very helpful error messages. Removing it and installing docker normally (Install Docker Engine on Ubuntu | Docker Documentation) fixes it right up, though.
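
A quick way to check for this before bootstrapping, sketched under the assumption that the snap build of docker is exposed under /snap/ (typically /snap/bin/docker on Ubuntu). The function names are hypothetical.

```shell
# is_snap_path PATH: succeed if PATH points somewhere under /snap/.
is_snap_path() {
    case "$1" in
        /snap/*) return 0 ;;
        *) return 1 ;;
    esac
}

# docker_from_snap: succeed if the docker binary found on PATH comes from a snap.
docker_from_snap() {
    docker_path="$(command -v docker)" || return 1
    is_snap_path "$docker_path"
}

# Example (not run automatically):
#   docker_from_snap && echo "snap docker detected; consider: snap remove docker"
```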

Oooh, nice catch, thanks @rdotts !

I’m trying to follow this recipe and I’m stuck at mounting the filesystem.
Everything works fine until:

mkdir /var/data

MYNODES="<node1>,<node2>,<node3>" # Add your own nodes here, comma-delimited
MYHOST=`ip route get 1.1.1.1 | grep -oP 'src \K\S+'`
echo -e "
# Mount cephfs volume \n
raphael,donatello,leonardo:/ /var/data ceph name=admin,noatime,_netdev 0 0" >> /etc/fstab
mount -a

MYNODES and MYHOST are not used, so are they probably obsolete?
The ninja turtles are probably your nodes. But anyway, I can’t add the ceph mount line on any node because there is no ceph installed on the nodes. Or did I miss something?
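
A small preflight sketch for this situation: mounting a ceph entry from fstab relies on the mount.ceph helper, which (per the Debian Buster note earlier in the thread) comes from the ceph-common package. The function names are made up for illustration, and the check should be run as root so that the sbin directories are on PATH.

```shell
# have_cmd NAME: succeed if NAME can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# check_cephfs_mount_helper: report whether the mount.ceph helper is present.
check_cephfs_mount_helper() {
    if have_cmd mount.ceph; then
        echo "mount.ceph found"
    else
        echo "mount.ceph missing: install ceph-common on this node"
        return 1
    fi
}
```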

I’m also having issues with this recipe on Ubuntu 20.04, which may be related to ceph making changes, because none of the ceph commands work outside of cephadm shell anymore. This is the result for any of them: [errno 13] RADOS permission denied (error connecting to the cluster)

Not sure how to fix it yet.
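
One thing that might be worth ruling out (a guess, not a confirmed cause for this setup) is that the shell outside cephadm can’t read the client config or the admin keyring, for example when the commands are run as a non-root user. A minimal check, with a made-up function name and the file names used elsewhere in this thread:

```shell
# check_ceph_client_files [DIR]: report whether ceph.conf and the admin keyring
# exist and are readable by the current user. DIR defaults to /etc/ceph.
check_ceph_client_files() {
    dir="${1:-/etc/ceph}"
    status=0
    for f in ceph.conf ceph.client.admin.keyring; do
        if [ -r "$dir/$f" ]; then
            echo "ok: $dir/$f"
        else
            echo "not readable: $dir/$f"
            status=1
        fi
    done
    return "$status"
}
```

If either file shows as not readable, retrying the ceph command with sudo, or running it through cephadm shell, would help confirm whether that is the problem.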