Shared Storage (Ceph) - Funky Penguin's Geek Cookbook


#19

Long shot, but have you tried the “zapping” command?
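
For reference, a minimal sketch of what I mean by zapping, assuming sgdisk is installed and that /dev/vdb is the disk in question (this is destructive, so substitute your own device carefully):

sudo sgdisk --zap-all /dev/vdb    # wipe GPT/MBR structures so Ceph sees a clean disk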


#20

Yes, many times :slight_smile:

  Greg

#21

I just tried to replicate this on one of my Atomic VM hosts… I added a 10G disk (/dev/vdb), and went through adding a mon and an OSD. The OSD died at this point:

2017-09-29 06:42:46 /entrypoint.sh: Regarding parted, device /dev/vdb is inconsistent/broken/weird.
2017-09-29 06:42:46 /entrypoint.sh: It would be too dangerous to destroy it without any notification.
2017-09-29 06:42:46 /entrypoint.sh: Please set OSD_FORCE_ZAP to '1' if you really want to zap this disk.
[root@ds1 ~]#

Which, I guess, we expect. Zapping required. Now /dev/vdb is the block device as my OS sees it.

[root@ds1 ~]# lsblk | grep vdb
vdb 252:16 0 10G 0 disk
[root@ds1 ~]#

So, this all looks good.
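
If zapping is what's needed, the entrypoint's own hint points at OSD_FORCE_ZAP. A rough sketch of re-running the OSD container with it set - the surrounding flags are assumptions based on the recipe, not copied from it:

docker stop ceph-osd && docker rm ceph-osd
docker run -d --name ceph-osd --net=host --privileged=true \
  -v /etc/ceph:/etc/ceph -v /var/lib/ceph:/var/lib/ceph -v /dev/:/dev/ \
  -e OSD_DEVICE=/dev/vdb -e OSD_FORCE_ZAP=1 \
  ceph/daemon osd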

Can you post the output of docker logs ceph-osd ?


#22

2 questions:

#1 - Is /dev/nvme1n1 the entire block device, or just a partition? (It needs to be the entire device)

#2 - Have you tried docker exec -it ceph-osd bash, and confirming that /dev/nvme1n1 does exist within the container?
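
A quick sketch of those two checks, assuming the container is named ceph-osd as per the recipe:

docker exec -it ceph-osd bash
# then, inside the container:
ls -l /dev/nvme1n1    # does the device node exist inside the container?
lsblk /dev/nvme1n1    # and is it the whole disk rather than a partition?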


#23

#1. It’s the entire device.

#2. It keeps restarting, so I can't run bash. However, looking at the logs, it clearly can see the device. The problem I see is that the script in the Kraken Docker container doesn't actually prepare the disk.


#24

Okay, I rebuilt everything on top of Ubuntu Server 16.04, and it seems to be working. One thing that would be very helpful would be an example "ceph status" at the various steps; I had some trouble along the way, and being able to see what the status should look like would have helped. Here's my current status. I'm curious about the OSDs (I have 5 nodes):

  cluster:
    id:     67f89555-83a3-48e6-8f47-54467435e107
    health: HEALTH_WARN
            no active mgr
 
  services:
    mon: 5 daemons, quorum orange,lime,lemon,plum,fig
    mgr: no daemons active
    mds: cephfs-1/1/1 up  {0=orange=up:creating}, 4 up:standby
    osd: 1 osds: 1 up, 1 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:
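
For reference, with everything running in containers, a status like the one above can be pulled from the mon container; a minimal sketch, assuming the recipe names that container ceph-mon:

docker exec ceph-mon ceph -s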

#25

Ah, in fact the other OSDs are not starting. Here are the logs. It appears that each node is looking for a different fsid?

mount_activate: Failed to activate
unmount: Unmounting /var/lib/ceph/tmp/mnt.YhQiIO
command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.YhQiIO
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 9, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5704, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5655, in main
    args.func(args)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3759, in main_activate
    reactivate=args.reactivate,
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3522, in mount_activate
    (osd_id, cluster) = activate(path, activate_key_template, init)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3669, in activate
    ' with fsid %s' % ceph_fsid)
ceph_disk.main.Error: Error: No cluster conf found in /etc/ceph with fsid 5224144f-00e5-4a11-9791-0062dd4c5c34

#26

We’re SOOO close! The contents of /etc/ceph/ should be identical on each node. Are they?
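
A quick (hedged) way to compare them, assuming SSH access and the node names from your status output:

for h in orange lime lemon plum fig; do
  ssh $h 'hostname; grep fsid /etc/ceph/ceph.conf; md5sum /etc/ceph/ceph.conf'
done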


#27

It's possible that in one of the iterations, I forgot to zap the OSD drive beforehand. So I went and did it again. Seems better :slight_smile: Now the question is about the mds status:

  cluster:
    id:     67f89555-83a3-48e6-8f47-54467435e107
    health: HEALTH_WARN
            no active mgr
 
  services:
    mon: 5 daemons, quorum orange,lime,lemon,plum,fig
    mgr: no daemons active
    mds: cephfs-1/1/1 up  {0=orange=up:active}, 4 up:standby
    osd: 5 osds: 5 up, 5 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:

#28

That’s better! :slight_smile:

Have you actually installed the MDS on each node per the recipe? (One should take the primary role, and the rest should go into standby/backup.)

I see that you're also hitting the new issue introduced in Luminous, whereby you need to deploy at least one "mgr" as well as your mons before your cluster will report itself as "healthy" :slight_smile: - Could you please send me the output when you do this, so that I can incorporate it into the recipe and make it Luminous-compatible?

D


#29

Yes, I had MDS running. I didn't know about the primary/standby status for MDS. I added the manager:

docker run -d --net=host -v /etc/ceph:/etc/ceph -v /var/lib/ceph:/var/lib/ceph -e CEPH_PUBLIC_NETWORK=192.168.2.0/24 ceph/daemon mgr

Status is now:

  cluster:
    id:     67f89555-83a3-48e6-8f47-54467435e107
    health: HEALTH_WARN
            too many PGs per OSD (307 > max 300)
 
  services:
    mon: 5 daemons, quorum orange,lime,lemon,plum,fig
    mgr: orange(active)
    mds: cephfs-1/1/1 up  {0=orange=up:active}, 4 up:standby
    osd: 5 osds: 5 up, 5 in
 
  data:
    pools:   2 pools, 512 pgs
    objects: 21 objects, 2246 bytes
    usage:   10245 MB used, 4756 GB / 4766 GB avail
    pgs:     512 active+clean
 
  io:
    client:   854 B/s wr, 0 op/s rd, 4 op/s wr
    recovery: 672 B/s, 3 keys/s, 4 objects/s

#30

Aah, right, I misread the MDS status in your output. Yes, a single active MDS is normal. So this looks healthy, except for the warning about the PG count being slightly too high. If you don't have any data yet, you could delete and recreate your pool with a smaller pg/pgp size, or (possibly - I've never tried) you could reduce the PG count of the existing pools.
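
If you do go the delete-and-recreate route, it looks roughly like this - the pool name and PG counts are illustrative, and if the pool is attached to CephFS you'd need to remove the filesystem from it first:

ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
ceph osd pool create cephfs_data 64 64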

A note from painful experience - I (now) like to set the replica count to three, but allow the cluster to continue to operate at two. This lets me lose an OSD with enough redundancy to let me sleep at night until it's replaced!
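
In pool terms that's roughly the following, sketched with an illustrative pool name:

ceph osd pool set cephfs_data size 3
ceph osd pool set cephfs_data min_size 2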


#31

Is there a simple way to delete the pool? Or am I zapping the drives again and starting over? The Docker daemon approach is convenient in one way, but painful when it comes to running custom commands.

I haven't been able to get CephFS to mount yet on Ubuntu, so there is no data.


#32

Deleting pools is dangerously simple, so Luminous introduced a feature - you need to set a flag in ceph.conf on every mon before it’ll let you delete a pool.

See https://blog.widodh.nl/2015/04/protecting-your-ceph-pools-against-removal-or-property-changes/ for details
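
The flag in question is mon_allow_pool_delete. A sketch of the two ways to flip it (worth turning it back off once the pool is gone):

# at runtime, from a node with an admin keyring:
ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
# or persistently, in /etc/ceph/ceph.conf on every mon (then restart the mons):
#   [mon]
#   mon allow pool delete = true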


#33

Well, I ended up wiping everything and starting over. Here's the current status. Two things: you have a typo in your mgr command (mgs instead of mgr), and I ended up using PG=64. Technically the calculation said I should be using 128, but you can increase the PG count and never decrease it (rough sketch after the status below), and it depends on the number of pools that you have. I can't mount the fs; I get the error "mount: mount orange:6789:/ on /var/data failed: No such process".

  cluster:
    id:     e48a2eb3-9c49-402e-ac86-a3ec091f8852
    health: HEALTH_OK
 
  services:
    mon: 5 daemons, quorum orange,lime,lemon,plum,fig
    mgr: orange(active)
    mds: cephfs-1/1/1 up  {0=orange=up:active}, 4 up:standby
    osd: 5 osds: 5 up, 5 in
 
  data:
    pools:   2 pools, 128 pgs
    objects: 21 objects, 2246 bytes
    usage:   10244 MB used, 4756 GB / 4766 GB avail
    pgs:     128 active+clean
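
Related to the PG note above: increasing later is possible (never decreasing), roughly like this, with an illustrative pool name and target:

ceph osd pool set cephfs_data pg_num 128
ceph osd pool set cephfs_data pgp_num 128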

#34

More debugging. The "no such process" error was because the Ceph client installed by default on Ubuntu was v10, which didn't have the fs support installed. I updated the client Ceph to Luminous as well.

$ wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
$ sudo apt-add-repository 'deb https://download.ceph.com/debian-luminous/ xenial main'
$ sudo apt-get update
$ sudo apt-get install ceph-common

Now I get a timeout when trying to mount instead. The ceph-mon process is listening on port 6789, so I don't understand why it's failing now.

Here’s the mount command I’m using in case it’s something obvious:

sudo mount -t ceph orange:6789:/ /var/data -o name=dockerswarm,secret=AQB8DtFZnghSJhAA6kOacgTPP8nAff1lz5UBKQ==,_netdev,noatime
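
As an aside for anyone hitting the same wall, a few generic checks that narrow a mount timeout down (hostname taken from the mount command above; assumes the client has a usable /etc/ceph):

nc -zv orange 6789    # can the client actually reach the mon port?
sudo ceph -s          # does the client's ceph.conf and keyring work at all?
dmesg | tail          # the kernel CephFS client logs its complaints here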


#35

Mount successful! I found this in my syslog:

Oct  1 14:24:23 orange kernel: [ 5886.955545] libceph: mon2 192.168.2.12:6789 feature set mismatch, my 107b84a842aca < server's 40107b84a842aca, missing 400000000000000
Oct  1 14:24:23 orange kernel: [ 5886.962032] libceph: mon2 192.168.2.12:6789 missing required protocol features

This makes no sense, since I'm running the same version on the client and in Docker. However, running:

sudo ceph osd crush tunables hammer

Fixed the problem, and now I can mount CephFS.


#36

Hurrumph. You’re right, it doesn’t make much sense.

Here’s my theory - you can either mount CephFS using the kernel driver or using FUSE in user-space. Presumably the kernel driver is more efficient, but older. I bet if you used FUSE, you wouldn’t have needed the hammer tunables.
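
A minimal sketch of the FUSE route, assuming the client name from your mount command and a keyring in the default location under /etc/ceph:

sudo apt-get install ceph-fuse
sudo ceph-fuse --id dockerswarm -m orange:6789 /var/data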

So congrats, you have a workable Ceph cluster :slight_smile:

I'd suggest you run through some simulated failures to be sure it's tweaked the way you want, before you put data on it though!
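
Something like this is usually enough for a first drill - container names are assumptions based on the recipe:

docker stop ceph-osd              # on one node, simulate an OSD failure
docker exec ceph-mon ceph -s      # expect HEALTH_WARN and degraded PGs
docker start ceph-osd             # bring it back and watch recovery complete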

D


#37

I am new to Ceph, but a colleague was a developer for Ceph and praised it a lot.
I found a good use case for testing with a 3-node Rancher HA environment running Cattle at the moment.
This step-by-step guide worked great. Running RancherOS with the Ubuntu console, it's Ubuntu 17.04.

Thanks again for this work.


#38

Hi All,

I am new to Ceph and have been trying to build a cluster using your awesome guide. Now I've managed to get it to this point…

  cluster:
    id:     5fcf6c04-c435-4848-a4ca-32e1b15c8d40
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set

  services:
    mon: 9 daemons, quorum ceph-mon01,ceph-mon04,ceph-mon02,ceph-mon05,ceph-mon03,ceph-mon06,ceph-mon07,ceph-mon08,ceph-mon09
    mgr: ceph-mgr01(active), standbys: ceph-mgr02, ceph-mgr03, ceph-mgr04, ceph-mgr05, ceph-mgr06, ceph-mgr07, ceph-mgr08, ceph-mgr09
    mds: cephfs-1/1/1 up  {0=ceph-mds01=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in
         flags noscrub,nodeep-scrub

  data:
    pools:   2 pools, 262 pgs
    objects: 21 objects, 2246 bytes
    usage:   5214 MB used, 4494 GB / 4499 GB avail
    pgs:     262 active+clean

I had to set the PG count to 70 or the MDS wouldn't start up; I had to keep reducing it until it worked. I did check the calculator, which says it should be 256, but I can't get it to accept that.

Anyway, after they are up: for some reason 1) it's unhealthy when it shouldn't be, IMO, as all nodes are up. I sadly only have 3 OSDs, which is the same as what you have. I also did the tweak for two replicas; this hasn't helped. On top of that, I finally managed to pull the secret key, but it's refusing to mount and it says…

mount: 172.30.1.200:6789:/ is write-protected, mounting read-only
mount: cannot mount 172.30.1.200:6789:/ read-only

Which is so weird.

So I'm kinda at a loss - what have I misunderstood, and/or can someone point me to how I can debug why it's unhealthy, for one?

I can add more MDS daemons - there should be 9 - but I don't think it will help.

Thanks in advance.

Kind Regards