Shared Storage (Ceph) - Funky Penguin's Geek Cookbook


#61

I’m stuck at spinning up the OSD - the container is restart-looping. I’m attempting to use a partition rather than a disk, and I wonder if that’s the problem. The docs indicate that -should- work though. The logs look like this:

https://pastebin.com/VLmBwgGJ


#62

So I gave up on running in Docker. Using ceph-deploy to spin up the cluster on bare metal instead got me much further. I'm using the newer version of Ceph (Mimic) and I'm almost fully up and running. I'm just missing the part where the Docker volume plugin gets installed - your docs just link to upstream bugs for that step. I'm not using Atomic, so hopefully I don't suffer from the linked issues.
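For anyone following along, the rough ceph-deploy sequence I mean is something like the below (ceph-deploy 2.x syntax, as used with Mimic) - the hostnames and the OSD device path are just placeholders for my own nodes, so adjust to taste:

ceph-deploy new node1 node2 node3                      # write the initial ceph.conf and monmap
ceph-deploy install --release mimic node1 node2 node3  # install the Mimic packages on each node
ceph-deploy mon create-initial                         # deploy the monitors and gather keys
ceph-deploy admin node1 node2 node3                    # push the admin keyring to each node
ceph-deploy mgr create node1                           # at least one manager daemon
ceph-deploy osd create --data /dev/sdb node1           # one OSD per data disk, repeated per node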

I’m assuming you are recommending the rexray/rbd plugin, which I’m having a go at now…
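(For reference, if I'm reading the REX-Ray docs right, the plugin install itself is a one-liner - the pool name below is just my guess at a sensible default, not something from this recipe:)

docker plugin install rexray/rbd RBD_DEFAULTPOOL=docker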


#63

Yes, keen to hear how that docker volume works out! :slight_smile:


#64

It’s not working, but I’m pretty much uninitiated on the setup for this - before installing the plugin, a Ceph RBD volume(?) needs to be set up. I have gone through the Ceph docs to do this, but I’m pretty sure I’m missing something (as it isn’t working).
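As far as I can tell from the Ceph docs, what the plugin actually needs is a pool to carve RBD images out of, so I did something along these lines (the pool name and PG count are just what I picked for a small cluster):

ceph osd pool create docker 64   # 64 placement groups for a small 3-node cluster
rbd pool init docker             # mark the pool as an RBD pool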

This is blocking me from deploying some services that use swarm replication, where each container wants its own storage rather than a bind-mounted, Ceph-replicated filesystem. I can just use plain old local Docker volumes in the short term, but that rather defeats the purpose of this exercise :slight_smile:


#65

OK, I got way further. My Ceph config is all good now in terms of RBD.

The RexRay plugin is giving me problems now. It seems that when I create a volume in a compose file, it fails after the first node, because the other nodes complain that the volume already exists. Perhaps I need to employ some yaml-fu to increment the volume name with a numeric suffix.
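For context, the volume definition in my compose file is roughly the below (the volume name is a placeholder, and the size option is in GB as far as I understand the REX-Ray docs):

volumes:
  mydata:
    driver: rexray/rbd
    driver_opts:
      size: "10"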

I did try a different RBD driver, but that one had its own issues.

Sleepytime!


#66

Nearly there. I switched plugins to wetopi/rbd as it seemed to have a more robust reputation than RexRay in the DevOps groups I’m a member of.

I gave up on the volume naming issue though - the service I want to deploy (Consul) is stateful, requires persistent data, and can scale to x nodes, but volume creation collides after the first node. I have to either statically define the volumes and containers (which means it can’t scale dynamically) or do something funky with copying data around prior to service init. Not sure what else to do :confused: I ended up statically defining everything and throwing out the ability to scale.
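For the record, the statically-defined version looks roughly like this, trimmed right down (the image, mount path, and volume names here are just illustrative):

version: '3.5'

services:
  consul1:
    image: consul
    volumes:
      - consul1_data:/consul/data
  consul2:
    image: consul
    volumes:
      - consul2_data:/consul/data
  consul3:
    image: consul
    volumes:
      - consul3_data:/consul/data

volumes:
  consul1_data:
    driver: wetopi/rbd
  consul2_data:
    driver: wetopi/rbd
  consul3_data:
    driver: wetopi/rbd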

Traefik is running happily, and RBD-backed, plugin-created Docker volumes are lovely. Now that I’m out of the Ceph woods and back in familiar Dockery territory, I’m disappointed I couldn’t get Ceph running in containers. I can’t seem to find a containerized solution for running ceph-volume. Maybe I should just roll my own.


#67

Seffyroff, I created a set of containers for Ceph (the latest version of Mimic) from scratch by reading their manuals. Maybe you can try them to see if you can run Ceph in containers.



A simple docker-compose.yml example (with all daemons, but no serious storage configured) is:


version: '3.5'

services:

  etcd0:
    image: quay.io/coreos/etcd
    environment:
      - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
      - ETCD_ADVERTISE_CLIENT_URLS=http://etcd0:2379

  mon0:
    image: flaviostutz/ceph-monitor
    environment:
      - ETCD_URL=http://etcd0:2379
      - PEER_MONITOR_HOST=mon1
      - CREATE_CLUSTER_IF_PEER_DOWN=true

  mon1:
    image: flaviostutz/ceph-monitor
    environment:
      - ETCD_URL=http://etcd0:2379
      - PEER_MONITOR_HOST=mon0

  mgr1:
    image: flaviostutz/ceph-manager
    ports:
      - 18443:8443 #dashboard https
      - 18003:8003 #restful https
      - 19283:9283 #prometheus
    environment:
      - LOG_LEVEL=0
      - PEER_MONITOR_HOST=mon0
      - ETCD_URL=http://etcd0:2379

  mgr2:
    image: flaviostutz/ceph-manager
    ports:
      - 28443:8443 #dashboard https
      - 28003:8003 #restful https
      - 29283:9283 #prometheus
    environment:
      - LOG_LEVEL=0
      - PEER_MONITOR_HOST=mon0
      - ETCD_URL=http://etcd0:2379

  osd1:
    image: flaviostutz/ceph-osd
    environment:
      - PEER_MONITOR_HOST=mon0
      - OSD_EXT4_SUPPORT=true
      - OSD_JOURNAL_SIZE=512
      - ETCD_URL=http://etcd0:2379

  osd2:
    image: flaviostutz/ceph-osd
    environment:
      - PEER_MONITOR_HOST=mon0
      - OSD_EXT4_SUPPORT=true
      - OSD_JOURNAL_SIZE=512
      - ETCD_URL=http://etcd0:2379

  osd3:
    image: flaviostutz/ceph-osd
    environment:
      - PEER_MONITOR_HOST=mon0
      - OSD_EXT4_SUPPORT=true
      - OSD_JOURNAL_SIZE=512
      - ETCD_URL=http://etcd0:2379

Just run “docker-compose up” and (hopefully) the magic will happen…

On https://hub.docker.com/r/flaviostutz/ceph-osd/ you can see how to configure storage (Bluestore).


#68

@flaviostutz thanks for your reply. I actually already built several iterations of docker-compose files here, of similar structure.

The problem I had was with initializing and mounting the block storage. I had to use ceph-deploy to get that working, and by that point I had spent so many cycles watching failed Docker containers that, after successfully creating the storage, I carried on with ceph-deploy to perform the rest of the deployment directly on the hosts.

I don’t doubt I could probably spin up containers for the storage now that it exists, but I have other problems with my Ceph deployment that I want to fully understand before iterating on my rollout strategy.


#69

So I have a Ceph cluster running on 3 nodes, each contributing 200-500GB of block storage, which is used by a CephFS pool and an RBD pool. The CephFS pool mostly gets bind-mounted by swarm containers, and the RBD pool is used by other containers that need their own volume per task.

My problem here is memory usage - the Ceph processes reduce the cluster to a crawl and quickly (<48h) consume all available RAM and fill up the swap.

Does anyone have recommendations for reducing Ceph memory usage? I already tried reducing the bluestore cache size to 256 MB. I found some docs related to reducing the CephFS metadata cache and will try that next, but I don’t have much hope. The nodes I’m working with are of modest spec, but they have a minimum of 2GB RAM, and I avoid scheduling anything hefty on the most lightweight boxes.
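In case it helps anyone suggest corrections, these are the knobs I’m experimenting with in ceph.conf on each node. The values are my guesses at something sane for 2GB boxes, and I’m not certain the osd memory target option exists in every Mimic point release:

[osd]
osd memory target = 1073741824      # aim for ~1 GiB per OSD daemon
bluestore cache size = 268435456    # 256 MiB bluestore cache

[mds]
mds cache memory limit = 268435456  # 256 MiB of CephFS metadata cache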


#70

Can I follow this tutorial on Ubuntu 18.04?


#71

Yes :slight_smile: But I’d suggest you pop into http://chat.funkypenguin.co.nz first - I think some of the geeks in there have tried/done it, and may have some pointers…


#72

Heyho,
thank you for the recipe! Very good description.
I just finished my cluster with CentOS Linux release 7.5.1804 (Core). It is a little different from the setup on CentOS Atomic, but after several retries it was possible. The main difference is the manual setup of Ceph 13.2.2 beforehand. I now have a test cluster with 3 OSDs of 20GB each.

[root@server1 ]# ceph -s
  cluster:
    id:     2eb9aff0-c86a-4a67-ab18-919bc4a50b9f
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum server1,server2,server3
    mgr: server1(active), standbys: server3, server2
    mds: cephfs-1/1/1 up {0=server3=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   2 pools, 200 pgs
    objects: 24 objects, 9.0 KiB
    usage:   6.0 GiB used, 52 GiB / 58 GiB avail
    pgs:     200 active+clean

  io:
    client: 341 B/s wr, 0 op/s rd, 0 op/s wr

One question: how much space is now available on my “drive” /var/data? The “58 GiB avail” from ceph -s, or the 25G from my df -h?

[root@server1 ]# df -h |grep /var/data
xx.xx.xx.43:6789:/ 25G 0 25G 0% /var/data


#73

The 25GB is what’s available. Ceph reports “raw” storage space, but your pool size is set to 2 (two copies of every object), so in real-world terms you can only use about half the space it reports: 52 GiB avail ÷ 2 ≈ 26 GiB, which roughly matches the 25G that df shows.

D


#74

Seems impossible to do now. ceph-osd will not cooperate at all after many tries with multiple different versions.

2018-12-01 08:47:44 /opt/ceph-container/bin/entrypoint.sh: static: does not generate config
/opt/ceph-container/bin/osd_disk_activate.sh: line 4: disk_list.sh: No such file or directory

I got this worked out in the end, and I am using sepich/ceph-swarm. In order to get it going, just fix the scripting path errors:
Remap the compat.sh script to /opt/ceph-container/bin/entrypoint.sh at line 27.

Edit osd_directory.sh line 69 to read: source /opt/ceph-container/bin/osd_common.sh

Add a new configs section to the yaml:

configs:
  compat.sh:
    file: ./compat.sh
  ceph.conf:
    external: true
  osd_directory.sh:
    file: ./osd_directory.sh
  osd_common.sh:
    file: ./osd_common.sh

Append to the osd service:

configs:
  - source: compat.sh
    target: /tmp/compat.sh
    mode: 0755
  - source: osd_directory.sh
    target: /opt/ceph-container/bin/osd_directory.sh
    mode: 0755
  - source: osd_common.sh
    target: /opt/ceph-container/bin/osd_common.sh
    mode: 0755
  - source: ceph.conf
    target: /etc/ceph/ceph.conf

If you prefer to use the non-swarm version, the approach is the same. However, it is a bit more hands-on with the non-swarm version: you use sed to inject the changes into each container while it is attempting to start. It will take a few tries, but you can get it done like so:

docker exec -u 0 -it $(docker ps | grep ceph-osd | awk '{print $1}') sed -i "s|source disk_list.sh|source /opt/ceph-container/bin/disk_list.sh|g" /opt/ceph-container/bin/osd_disk_activate.sh