We have a scenario when we switch from non-container to containers. This
means we don't know anything about the ceph partitions associated to an
OSD. Normally in a containerized context we have files containing the
preparation sequence. From these files we can get the capabilities of
each OSD. As a last resort we use a ceph-disk call inside a dummy bash
container to discover the ceph journal on the current osd.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1525612
Signed-off-by: Sébastien Han <seb@redhat.com>
The name docker_version is very generic and is also used by other
roles. As a result, there may be name conflicts. To avoid this a
ceph_ prefix should be used for this fact. Since it is an internal
fact renaming is not a problem.
Add the variables ceph_osd_docker_cpuset_cpus and
ceph_osd_docker_cpuset_mems, so that a user may specify
the CPUs and memory nodes of NUMA systems on which OSD
containers are run.
Provides a example in osds.yaml.sample to guide user
based on sample `lscpu` output since cpuset-mems refers
to the memory by NUMA node only while cpuset-cpus can
refer to individual vCPUs within a NUMA node.
there is no need to have a condition on this task, this test should be
always run since the result will be interpreted later.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
During the initial implementation of this 'old' thing we were falling
into this issue without noticing
https://github.com/moby/moby/issues/30341 and where blindly using --rm,
now this is fixed the prepare container disappears and thus activation
fail.
I'm fixing this for old jewel images.
Also this fixes the machine reboot case where the docker logs are
purgend. In the old scenario, we now store the log locally in the same
directory as the ceph-osd-run.sh script.
Signed-off-by: Sébastien Han <seb@redhat.com>
Use "ceph_tcmalloc_max_total_thread_cache" to set the
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES value inside /etc/default/ceph for
Debian installs, or /etc/sysconfig/ceph for Red Hat/CentOS installs.
By default this is set to 0, so the default package value will be used,
if specified this value will be changed to match the variable, and ceph
osd services will be restarted.
There was a huge resync from luminous to jewel in ceph-docker:
https://github.com/ceph/ceph-docker/pull/797
This change brought a new handy function to discover partitions tight to
an OSD. This function doesn't exist in the old image so the
ceph-osd-run.sh script breaks when trying to deploy Jewel OSD with that
old Jewel image version.
Signed-off-by: Sébastien Han <seb@redhat.com>
This will solve the following issue when starting docker containers on ubuntu:
invalid argument "1\u00a0" for --cpus=1 : failed to parse 1 as a rational number
Closes-bug: #2056
This is causing unknown issues when trying to start a dmcrypt container.
Basically the container is stuck at mount opening the LUKS device. This
is still unknown why this is causing trouble but we need to move
forward. Also, this doesn't seem to help in any ways to fix the race
condition we've seen.
Here is the log for dmcrypt:
cryptsetup 1.7.4 processing "cryptsetup --debug --verbose --key-file
key luksClose fbf8887d-8694-46ca-b9ff-be79a668e2a9"
Running command close.
Locking memory.
Installing SIGINT/SIGTERM handler.
Unblocking interruption on signal.
Allocating crypt device context by device
fbf8887d-8694-46ca-b9ff-be79a668e2a9.
Initialising device-mapper backend library.
dm version [ opencount flush ] [16384] (*1)
dm versions [ opencount flush ] [16384] (*1)
Detected dm-crypt version 1.14.1, dm-ioctl version 4.35.0.
Device-mapper backend running with UDEV support enabled.
dm status fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush ]
[16384] (*1)
Releasing device-mapper backend.
Trying to open and read device /dev/sdc1 with direct-io.
Allocating crypt device /dev/sdc1 context.
Trying to open and read device /dev/sdc1 with direct-io.
Initialising device-mapper backend library.
dm table fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush
securedata ] [16384] (*1)
Trying to open and read device /dev/sdc1 with direct-io.
Crypto backend (gcrypt 1.5.3) initialized in cryptsetup library
version 1.7.4.
Detected kernel Linux 3.10.0-693.el7.x86_64 x86_64.
Reading LUKS header of size 1024 from device /dev/sdc1
Key length 32, device size 1943016847 sectors, header size 2050
sectors.
Deactivating volume fbf8887d-8694-46ca-b9ff-be79a668e2a9.
dm status fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush ]
[16384] (*1)
Udev cookie 0xd4d14e4 (semid 32769) created
Udev cookie 0xd4d14e4 (semid 32769) incremented to 1
Udev cookie 0xd4d14e4 (semid 32769) incremented to 2
Udev cookie 0xd4d14e4 (semid 32769) assigned to REMOVE task(2) with
flags (0x0)
dm remove fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush
retryremove ] [16384] (*1)
fbf8887d-8694-46ca-b9ff-be79a668e2a9: Stacking NODE_DEL [verify_udev]
Udev cookie 0xd4d14e4 (semid 32769) decremented to 1
Udev cookie 0xd4d14e4 (semid 32769) waiting for zero
Signed-off-by: Sébastien Han <seb@redhat.com>
It's sad but we can not rely on the prepare container anymore since the
log are flushed after reboot. So inpecting the container does not return
anything.
Now, instead we use a ephemeral container to look up for the
journal/block.db/block.wal (depending if filestore or bluestore) and
build the activate command accordingly.
Signed-off-by: Sébastien Han <seb@redhat.com>
ceph services can fail to start under certain circumstances (for
example, when running in a container) because the default systemd
service configuration causes namespace issues.
To work around this we can override the system service settings by
placing an overrides file in the ceph-<service>@.service.d directory.
This can be generic so as to allow any potential changes required to
the ceph-<service> service files.
The overrides file is only setup when the
"ceph_<service>_systemd_overrides" config_template override variable is
specified.
The available service systemd override files are as follows:
ceph_mds_systemd_overrides
ceph_mgr_systemd_overrides
ceph_mon_systemd_overrides
ceph_osd_systemd_overrides
ceph_rbd_mirror_systemd_overrides
ceph_rgw_systemd_overrides
There is only two main scenarios now:
* collocated: everything remains on the same device:
- data, db, wal for bluestore
- data and journal for filestore
* non-collocated: dedicated device for some of the component
Signed-off-by: Sébastien Han <seb@redhat.com>
`ceph-docker-common`:
At the moment there is a lot of duplicated tasks in each
`./roles/ceph-<role>/tasks/docker/main.yml` that could be refactored in
`./roles/ceph-docker-common/tasks/main.yml`.
`*_containerized_deployment` variables:
All `*_containerized_deployment` have been refactored to a single
variable `containerized_deployment`
duplicate `cephx` variables in `group_vars/* have been removed.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since distro will not allow /usr/share to be writable (e.g: atomic) so
we let the operator decide where to put that script.
Signed-off-by: Sébastien Han <seb@redhat.com>
Oh yeah! This patch adds more fine grained control on how we run the
activation osd container. We now use --device to give a read, write and
mknodaccess to a specific device to be consumed by Ceph. We also use
SYS_ADMIN cap to allow mount operations, ceph-disk needs to temporary
mount the osd data directory during the activation sequence.
This patch also enables the support of dedicated journal devices when
deploying ceph-docker with ceph-ansible.
Depends on https://github.com/ceph/ceph-docker/pull/478
Signed-off-by: Sébastien Han <seb@redhat.com>
We changed the way we declare image.
Prior to this patch we must have a "user/image:tag"
format, which is incompatible with non docker-hub registry where you
usually don't have a "user". On the docker hub a "user" is also
identified as a namespace, so for Ceph the user was "ceph".
Variables have been simplified with only:
* ceph_docker_image
* ceph_docker_image_tag
1. For docker hub images: ceph_docker_name: "ceph/daemon" will give
you the 'daemon' image of the 'ceph' user.
2. For non docker hub images: ceph_docker_name: "daemon" will simply
give you the "daemon" image.
Infrastructure playbooks have been modified as well.
The file group_vars/all.docker.yml.sample has been removed as well.
It is hard to maintain since we have to generate it manually. If
you want to configure specific variables for a specific daemon simply
edit group_vars/$DAEMON.yml
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1420207
Signed-off-by: Sébastien Han <seb@redhat.com>
Prior to this patch we had several ways to runs containers, we could use
ansible's docker module on some distro and on containers distros we were
using systemd. We strongly believe threating containers as services with
systemd is the right approach so this patch generalizes to all the
distros. These days most of the distros are running systemd so it's fair
assumption.
Signed-off-by: Sébastien Han <seb@redhat.com>
This fixes#845 for containerized deployments. We now also mount the
/etc/localtime volume in the containers in order to synchronize the host
timezone with the container timezone.
Signed-off-by: Ivan Font <ivan.font@redhat.com>
Still WIP, @mwheckmann free to test
As requested by #162
Current known issue, since ceph.conf gets modified during every single
run (at the end during the merge) so this will restart ceph daemons.
Signed-off-by: Sébastien Han <sebastien.han@enovance.com>