ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Ning Yao	691ddf5349	cleanup osd.conf.j2 in ceph-osd osd crush location is set by ceph_crush in the library, osd.conf.j2 is not used any more. Signed-off-by: Ning Yao <yaoning@unitedstack.com>	2018-03-26 15:57:37 +08:00
Sébastien Han	0f8a4251ba	move system tuning to osd role The changes from these tasks only apply to osd nodes so there is no reason to have them in ceph-common. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-03-15 14:06:26 +01:00
Guillaume Abrioux	e537779bb3	osd: fix osd restart when dmcrypt This commit fixes a bug that occurs especially for dmcrypt scenarios. There is an issue where the 'disk_list' container can't reach the ceph cluster because it's not launched with `--net=host`. If this container can't reach the cluster, it will hang on this step (when trying to retrieve the dm-crypt key) : ``` +common_functions.sh:448: open_encrypted_part(): ceph --cluster abc12 --name \ client.osd-lockbox.9138767f-7445-49e0-baad-35e19adca8bb --keyring \ /var/lib/ceph/osd-lockbox/9138767f-7445-49e0-baad-35e19adca8bb/keyring \ config-key get dm-crypt/osd/9138767f-7445-49e0-baad-35e19adca8bb/luks +common_functions.sh:452: open_encrypted_part(): base64 -d +common_functions.sh:452: open_encrypted_part(): cryptsetup --key-file \ -luksOpen /dev/sdb1 9138767f-7445-49e0-baad-35e19adca8bb ``` It means the `ceph-run-osd.sh` script won't be able to start the `osd_disk_activate` process in ceph-container because it won't have filled the `$DOCKER_ENV` environment variable properly. Adding `--net=host` to the 'disk_list' container fixes this issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1543284 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-08 15:45:13 +01:00
Sébastien Han	bbc79765f3	osd: best effort if no device is found during activation We have a scenario when we switch from non-container to containers. This means we don't know anything about the ceph partitions associated to an OSD. Normally in a containerized context we have files containing the preparation sequence. From these files we can get the capabilities of each OSD. As a last resort we use a ceph-disk call inside a dummy bash container to discover the ceph journal on the current osd. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1525612 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-12-19 14:40:48 +01:00
Christian Berendt	50a848dc40	Rename fact docker_version to ceph_docker_version The name docker_version is very generic and is also used by other roles. As a result, there may be name conflicts. To avoid this a ceph_ prefix should be used for this fact. Since it is an internal fact renaming is not a problem.	2017-12-15 20:12:21 +01:00
John Fulton	8cba44262c	Add flags for OSD 'docker run --cpuset-{cpus,mems}' Add the variables ceph_osd_docker_cpuset_cpus and ceph_osd_docker_cpuset_mems, so that a user may specify the CPUs and memory nodes of NUMA systems on which OSD containers are run. Provides an example in osds.yaml.sample to guide the user, based on sample `lscpu` output, since cpuset-mems refers to the memory by NUMA node only while cpuset-cpus can refer to individual vCPUs within a NUMA node.	2017-12-14 16:39:35 +01:00
Guillaume Abrioux	591d77220e	osd: always run disk_list test there is no need to have a condition on this task, this test should be always run since the result will be interpreted later. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-09 11:51:16 +01:00
Sébastien Han	d4ed9a2064	osd: enhance backward compatibility During the initial implementation of this 'old' thing we were falling into this issue without noticing https://github.com/moby/moby/issues/30341 and were blindly using --rm; now that this is fixed, the prepare container disappears and thus activation fails. I'm fixing this for old jewel images. Also this fixes the machine reboot case where the docker logs are purged. In the old scenario, we now store the log locally in the same directory as the ceph-osd-run.sh script. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-11-03 11:15:23 +01:00
Sébastien Han	5f9e50dabe	Merge pull request #2103 from andymcc/tcmalloc_settings Option to set TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES	2017-10-25 17:36:04 +02:00
Andy McCrae	7f6c39102d	Option to set TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES Use "ceph_tcmalloc_max_total_thread_cache" to set the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES value inside /etc/default/ceph for Debian installs, or /etc/sysconfig/ceph for Red Hat/CentOS installs. By default this is set to 0, so the default package value will be used, if specified this value will be changed to match the variable, and ceph osd services will be restarted.	2017-10-25 14:38:36 +01:00
Sébastien Han	968ef04324	osd: bring backward compatibility with old Jewel images There was a huge resync from luminous to jewel in ceph-docker: https://github.com/ceph/ceph-docker/pull/797 This change brought a new handy function to discover partitions tied to an OSD. This function doesn't exist in the old image so the ceph-osd-run.sh script breaks when trying to deploy Jewel OSD with that old Jewel image version. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-20 16:26:41 +02:00
Christian Berendt	cf901f0171	In docker start scripts replace \u00a0 with \u0020 This will solve the following issue when starting docker containers on ubuntu: invalid argument "1\u00a0" for --cpus=1 : failed to parse 1 as a rational number Closes-bug: #2056	2017-10-16 15:16:48 +02:00
Sébastien Han	d0a9e57bfc	osd: rollback bindmount of /run/udev This is causing unknown issues when trying to start a dmcrypt container. Basically the container is stuck at mount opening the LUKS device. It is still unknown why this is causing trouble but we need to move forward. Also, this doesn't seem to help in any way to fix the race condition we've seen. Here is the log for dmcrypt: cryptsetup 1.7.4 processing "cryptsetup --debug --verbose --key-file key luksClose fbf8887d-8694-46ca-b9ff-be79a668e2a9" Running command close. Locking memory. Installing SIGINT/SIGTERM handler. Unblocking interruption on signal. Allocating crypt device context by device fbf8887d-8694-46ca-b9ff-be79a668e2a9. Initialising device-mapper backend library. dm version [ opencount flush ] [16384] (1) dm versions [ opencount flush ] [16384] (1) Detected dm-crypt version 1.14.1, dm-ioctl version 4.35.0. Device-mapper backend running with UDEV support enabled. dm status fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush ] [16384] (1) Releasing device-mapper backend. Trying to open and read device /dev/sdc1 with direct-io. Allocating crypt device /dev/sdc1 context. Trying to open and read device /dev/sdc1 with direct-io. Initialising device-mapper backend library. dm table fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush securedata ] [16384] (1) Trying to open and read device /dev/sdc1 with direct-io. Crypto backend (gcrypt 1.5.3) initialized in cryptsetup library version 1.7.4. Detected kernel Linux 3.10.0-693.el7.x86_64 x86_64. Reading LUKS header of size 1024 from device /dev/sdc1 Key length 32, device size 1943016847 sectors, header size 2050 sectors. Deactivating volume fbf8887d-8694-46ca-b9ff-be79a668e2a9. dm status fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush ] [16384] (1) Udev cookie 0xd4d14e4 (semid 32769) created Udev cookie 0xd4d14e4 (semid 32769) incremented to 1 Udev cookie 0xd4d14e4 (semid 32769) incremented to 2 Udev cookie 0xd4d14e4 (semid 32769) assigned to REMOVE task(2) with flags (0x0) dm remove fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush retryremove ] [16384] (1) fbf8887d-8694-46ca-b9ff-be79a668e2a9: Stacking NODE_DEL [verify_udev] Udev cookie 0xd4d14e4 (semid 32769) decremented to 1 Udev cookie 0xd4d14e4 (semid 32769) waiting for zero Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-11 13:21:37 +02:00
Sébastien Han	bf99751ce1	osd: bindmount /run/udev Ensures that "udevadm" is able to check the status of udev's event queue. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-09 17:25:45 +02:00
Sébastien Han	3bd341f6c0	osd: container use id instead of dev name Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1494127 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-03 14:44:00 +02:00
Sébastien Han	46a01df434	osd: add cluster name support I forgot to add cluster name support so some partitions were never mounted correctly. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-29 20:30:54 +02:00
Sébastien Han	45797ab968	osd: fix container reboot It's sad but we can not rely on the prepare container anymore since the logs are flushed after reboot. So inspecting the container does not return anything. Now, instead we use an ephemeral container to look up the journal/block.db/block.wal (depending if filestore or bluestore) and build the activate command accordingly. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-25 13:34:47 +02:00
Sébastien Han	2fa151b9e8	container: introduce resource limitation for containers This can be controlled via 2 options: * ceph_$DAEMON_docker_memory_limit * ceph_$DAEMON_docker_cpu_limit All daemons default to 1GB of memory and 1 CPU. Recommendations from: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/red_hat_ceph_storage_hardware_guide/minimum_recommendations Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-06 14:52:21 +02:00
Sébastien Han	e0a264c7e9	osd: allow multi dedicated journals for containers Fix: https://bugzilla.redhat.com/show_bug.cgi?id=1475820 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-30 12:34:06 +02:00
Andy McCrae	4671b9e74e	Allow ceph service systemd overrides to be specified ceph services can fail to start under certain circumstances (for example, when running in a container) because the default systemd service configuration causes namespace issues. To work around this we can override the system service settings by placing an overrides file in the ceph-<service>@.service.d directory. This can be generic so as to allow any potential changes required to the ceph-<service> service files. The overrides file is only setup when the "ceph_<service>_systemd_overrides" config_template override variable is specified. The available service systemd override files are as follows: ceph_mds_systemd_overrides ceph_mgr_systemd_overrides ceph_mon_systemd_overrides ceph_osd_systemd_overrides ceph_rbd_mirror_systemd_overrides ceph_rgw_systemd_overrides	2017-08-16 17:57:06 +01:00
Sébastien Han	30991b1c0a	osd: simplify scenarios There are only two main scenarios now: * collocated: everything remains on the same device: - data, db, wal for bluestore - data and journal for filestore * non-collocated: dedicated device for some of the components Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-03 10:20:39 +02:00
Guillaume Abrioux	30a0fa31e3	Docker: Fix bug "waiting for /dev/XXX to show up" Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-07-12 15:02:39 +02:00
Guillaume Abrioux	0a38bfaadc	Osd: Fix bug 'uniq' command not found Due to a breaking space introduced by `d2320e412e` the command here is broken. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-07-12 15:02:39 +02:00
Sébastien Han	d2320e412e	osd: docker, refactor ceph-osd-run.sh.j2 Easier to read and enhance. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-07-06 15:49:14 +02:00
Guillaume Abrioux	ddfe019342	Refact code `ceph-docker-common`: At the moment there are a lot of duplicated tasks in each `./roles/ceph-<role>/tasks/docker/main.yml` that could be refactored in `./roles/ceph-docker-common/tasks/main.yml`. `_containerized_deployment` variables: All `_containerized_deployment` have been refactored to a single variable `containerized_deployment`. Duplicate `cephx` variables in `group_vars/*` have been removed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-05-24 15:55:41 +02:00
Andrew Schoen	b38b69b603	ceph-osd: fix typo in containerized OSD systemd unit Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-05-08 15:30:45 -05:00
John McEleney	f1388dc2c2	Apparmor on Ubuntu Xenial will not permit containers to mount devices, even with CAP SYS_ADMIN.	2017-04-19 19:22:02 +01:00
Tobias Florek	931027e6f7	harmonize docker names Created containers now are named more or less in the form of <ansible role>-<ansible_hostname>	2017-02-23 09:15:05 +01:00
Sébastien Han	b91d227b99	docker: make ceph docker osd script path Since some distros will not allow /usr/share to be writable (e.g. atomic), we let the operator decide where to put that script. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-21 15:56:09 -05:00
Sébastien Han	73cf0378c2	docker: osd, do not use privileged container anymore Oh yeah! This patch adds more fine grained control on how we run the activation osd container. We now use --device to give read, write and mknod access to a specific device to be consumed by Ceph. We also use SYS_ADMIN cap to allow mount operations, ceph-disk needs to temporarily mount the osd data directory during the activation sequence. This patch also enables the support of dedicated journal devices when deploying ceph-docker with ceph-ansible. Depends on https://github.com/ceph/ceph-docker/pull/478 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-21 15:54:36 -05:00
Sébastien Han	c2f1dca823	docker: use a better method to pull images We changed the way we declare image. Prior to this patch we must have a "user/image:tag" format, which is incompatible with non docker-hub registry where you usually don't have a "user". On the docker hub a "user" is also identified as a namespace, so for Ceph the user was "ceph". Variables have been simplified with only: * ceph_docker_image * ceph_docker_image_tag 1. For docker hub images: ceph_docker_name: "ceph/daemon" will give you the 'daemon' image of the 'ceph' user. 2. For non docker hub images: ceph_docker_name: "daemon" will simply give you the "daemon" image. Infrastructure playbooks have been modified as well. The file group_vars/all.docker.yml.sample has been removed as well. It is hard to maintain since we have to generate it manually. If you want to configure specific variables for a specific daemon simply edit group_vars/$DAEMON.yml Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1420207 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-09 17:57:18 +01:00
Andrew Schoen	655b8449ae	use ceph_docker_registry when starting containers Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-01-16 11:17:41 -06:00
Sébastien Han	2d8ac4a586	docker: only use systemd to manage containers Prior to this patch we had several ways to run containers, we could use ansible's docker module on some distro and on containers distros we were using systemd. We strongly believe treating containers as services with systemd is the right approach so this patch generalizes to all the distros. These days most of the distros are running systemd so it's a fair assumption. Signed-off-by: Sébastien Han <seb@redhat.com>	2016-12-16 19:37:05 +01:00
Ivan Font	8c67689d08	Add option to enable ntp This fixes #845 for containerized deployments. We now also mount the /etc/localtime volume in the containers in order to synchronize the host timezone with the container timezone. Signed-off-by: Ivan Font <ivan.font@redhat.com>	2016-08-08 10:16:48 -07:00
pprokop	3950751317	Adding an option to choose an etcd port and tag of docker images	2016-07-13 10:19:50 +02:00
Ivan Font	6f5f6610a8	Support for docker image tags Signed-off-by: Ivan Font <ivan.font@redhat.com>	2016-07-12 15:49:07 -07:00
pprokop	9e252c6c44	Adding missing space	2016-03-30 12:22:32 +02:00
pprokop	ec9a96e570	Adding ceph-osd containerized deployment with kv store	2016-03-29 10:23:31 +02:00
Sébastien Han	f68cd46664	WIP: Implement OSD sections Still WIP, @mwheckmann free to test As requested by #162 Current known issue, since ceph.conf gets modified during every single run (at the end during the merge) so this will restart ceph daemons. Signed-off-by: Sébastien Han <sebastien.han@enovance.com>	2015-01-09 11:14:20 -05:00
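
Commit cf901f0171 in the table above describes replacing the no-break space (\u00a0) with a regular space (\u0020) in the docker start scripts, because Docker rejected `--cpus=1\u00a0`. The actual fix was made in the scripts themselves; the Python sketch below only illustrates the character substitution, and the function name `normalize_spaces` and the sample command line are hypothetical.

```python
NBSP = "\u00a0"  # no-break space, invisible in most editors


def normalize_spaces(script_text: str) -> str:
    """Replace every no-break space with a plain ASCII space (U+0020)."""
    return script_text.replace(NBSP, " ")


# Hypothetical command line containing a stray no-break space after "--cpus=1".
broken = "docker run --cpus=1\u00a0--memory=1g ceph/daemon"
fixed = normalize_spaces(broken)
```

Running a check like this over rendered start scripts is one way to catch the problem before Docker tries to parse the arguments.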

39 Commits (5b73be254d249a23ac2eb2f86c4412ef296352a9)
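
Commits 2fa151b9e8 and 8cba44262c describe per-daemon memory and CPU limits (defaulting to 1GB and 1 CPU) and optional cpuset pinning for OSD containers. As a rough sketch of how such settings could map onto `docker run` flags, assuming a hypothetical helper `resource_flags` that is not part of ceph-ansible (the real logic lives in the role's templates):

```python
from typing import List, Optional


def resource_flags(memory_limit: str = "1g",
                   cpu_limit: str = "1",
                   cpuset_cpus: Optional[str] = None,
                   cpuset_mems: Optional[str] = None) -> List[str]:
    """Build docker run resource flags; cpuset flags are added only when set."""
    flags = [f"--memory={memory_limit}", f"--cpus={cpu_limit}"]
    if cpuset_cpus:
        # Pins the container to specific CPUs, e.g. "0-3" or "0,2".
        flags.append(f"--cpuset-cpus={cpuset_cpus}")
    if cpuset_mems:
        # Restricts memory allocation to the given NUMA node(s), e.g. "0".
        flags.append(f"--cpuset-mems={cpuset_mems}")
    return flags


default_flags = resource_flags()
pinned_flags = resource_flags(cpuset_cpus="0-3", cpuset_mems="0")
```

The default values mirror the defaults stated in the commit message; the cpuset values shown are illustrative and would normally be chosen from `lscpu` output on the target host.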