Commit Graph

2142 Commits (8a154ae14a3eb24322b498a2afce19ea4d3672c0)

Author SHA1 Message Date
Guillaume Abrioux 8a154ae14a osd: change lvm bindmount
This commit makes the bindmount a bit more generic; otherwise it
currently makes the OSDs fail to start in an OSP FFU upgrade
(with a RHEL7 > RHEL8 OS upgrade).
The docker2podman playbook is run from the ceph-ansible stable-3.2 branch
against RHEL7 nodes where `/var/run/lvmetad.socket` exists, but once the
system is upgraded to RHEL8 this socket doesn't exist anymore and
prevents OSDs from starting after the reboot.

As a workaround we can make this bindmount a bit more generic like what
is done in `stable-4.0` branch by mounting `/run/lvm` instead.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1866252

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-08-05 09:23:39 -04:00
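For illustration only, a rough sketch of the idea using the `docker_container` module rather than the project's actual systemd/template mechanism; the container name, image variable and extra volumes below are placeholders:

```yaml
- name: start an OSD container with a generic LVM bind mount (illustrative sketch)
  docker_container:
    name: ceph-osd-0                                   # placeholder container name
    image: "{{ ceph_docker_image | default('ceph/daemon') }}"
    privileged: true
    volumes:
      # mounting /run/lvm instead of /var/run/lvmetad.socket keeps the OSD
      # starting after the RHEL7 -> RHEL8 upgrade removes lvmetad
      - /run/lvm:/run/lvm
      - /var/lib/ceph:/var/lib/ceph
      - /etc/ceph:/etc/ceph
```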
Dimitri Savineau 4e3301d361 ceph-osd: exit gracefully when no data partition
When using the collocated or non-collocated osd_scenarios (ceph-disk) and
trying to determine the OSD_DEVICE from the OSD_ID passed to the systemd
unit, we can be in a situation where the OSD hasn't been activated
but the OSD ID exists.
This means the data partition isn't in an activated state and the ceph-disk
list command won't show the OSD ID on the data partition.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1850377

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-07-07 18:18:14 +02:00
Guillaume Abrioux 90f3f61548 infra: introduce docker to podman playbook
This isn't backported from master because there are too many changes
between stable-3.2 and other newer branches.

NOTE:
This playbook *doesn't* add podman support in stable-3.2 at all.
This is a TripleO-dedicated playbook which is intended to be run
early during the FFU workflow in order to prepare the OS upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1853457

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-07-07 12:11:09 -04:00
Guillaume Abrioux 8b8fa74db7 switch_to_containers: don't set noup flag
We shouldn't set this flag when running switch_to_containers playbook.
Otherwise the playbook fails waiting for pgs to be clean.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b91d60d384)
2020-06-29 15:25:01 +02:00
Guillaume Abrioux 693e534ee9 Revert "switch_to_containers: don't set noup flag"
This reverts commit b7ec4a995b.

We need to provide a tag for RHCS 3.3z6 without this commit.
2020-06-25 17:07:25 +02:00
Dimitri Savineau a2556f084d docker: Add Requires on docker service
When using the docker container engine, the systemd unit scripts only
declare a dependency on the docker daemon via the After parameter.
But if docker is restarted on a live system then the ceph systemd units
should wait for the docker daemon to be fully restarted.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd22f1d1ec)
2020-06-22 21:08:13 -04:00
Guillaume Abrioux b7ec4a995b switch_to_containers: don't set noup flag
We shouldn't set this flag when running switch_to_containers playbook.
Otherwise the playbook fails waiting for pgs to be clean.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b91d60d384)
2020-06-18 09:56:28 +02:00
Guillaume Abrioux 6f3d696742 clients: move dummy container creation
This commit moves the dummy container creation task right before the
cephx keys creation task so it can't be run at the wrong time.

Also, this commit makes the dummy container run forever.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1828105

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-27 13:31:52 -04:00
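A minimal sketch of a long-lived dummy container, assuming a `docker run` based task; the container name and image variable are placeholders:

```yaml
- name: create a dummy container that runs forever (illustrative sketch)
  command: >
    docker run -d --name ceph-dummy-client
    --entrypoint sleep
    {{ ceph_docker_image | default('ceph/daemon') }} infinity
  changed_when: false
```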
ianwatsonrh 2666c54b3a typo: updating type check on rc
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1827271
Signed-off-by: ianwatsonrh <ianwatson@redhat.com>
(cherry picked from commit ccf6a7f153)
2020-04-23 11:36:59 -04:00
Dimitri Savineau 65b0e9bb5d ceph-validate: update RHEL requirement for RHCS
We were not testing the right ansible_distribution fact value for the RHEL
distribution.
This commit also updates the minimal RHEL version supported by RHCS.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5de74fe512)
2020-04-14 11:27:21 -04:00
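A hedged sketch of such a validation task; the version number below is only a placeholder, not the actual minimum enforced by the role:

```yaml
- name: fail on an unsupported RHEL version for RHCS (placeholder version number)
  fail:
    msg: "RHEL {{ ansible_distribution_version }} is not supported by RHCS"
  when:
    - ansible_distribution == 'RedHat'                    # fact value for RHEL
    - ansible_distribution_version is version('7.7', '<') # placeholder minimum
```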
Guillaume Abrioux 8ccf91c1f0 add-osd: unset noup flag after last osd is deployed
This commit fixes a bug when using the `add-osd.yml` playbook.
The `noup` flag is set early but it never got unset before the "wait for pgs
clean" check, so the playbook always fails because OSDs are never
seen UP.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816023

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-07 11:19:53 -04:00
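A minimal sketch of the kind of task involved, assuming the monitor group is named `mons` and a `docker_exec_cmd` helper variable:

```yaml
- name: unset noup flag once the last OSD has been deployed (sketch)
  command: "{{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster | default('ceph') }} osd unset noup"
  delegate_to: "{{ groups['mons'][0] }}"
  run_once: true
  changed_when: false
```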
Guillaume Abrioux d4ffe21225 osd: support changing default rule even when osd_crush_location isn't defined
Creating crush rules even with no crush hierarchy configuration is a
valid scenario, so we shouldn't be bound to the result of the first task
(which configures the crush hierarchy) to be able to add new crush rules.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b0476385c)
2020-03-31 23:04:03 +02:00
John Fulton 658d9cadfd The _filtered_clients list should intersect with ansible_play_batch
Client configuration with --limit fails without this patch
because certain tasks are only done on the first host in the
_filtered_clients list, and it's likely that this first host will
not be included in what's specified with --limit. To fix this,
the _filtered_clients list should be built from all clients
in the inventory that are also in the running play.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798781

Signed-off-by: John Fulton <fulton@redhat.com>
(cherry picked from commit e4bf4857f5)
2020-03-30 11:10:29 -04:00
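A sketch of the intersection described above, assuming the client group is named `clients`:

```yaml
- name: build _filtered_clients from clients that are part of the running play
  set_fact:
    _filtered_clients: "{{ groups['clients'] | intersect(ansible_play_batch) }}"
```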
Guillaume Abrioux 6006985466 defaults: remove legacy comment
This is no longer true, let's remove this comment given that this option
is not ignored in containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e551b5ba1a)
2020-03-26 12:08:31 -04:00
Guillaume Abrioux c60967f045 docker-common: remove legacy tasks for ntp configuration
Those tasks aren't needed in docker-common since the introduction of
`ceph-infra` role. They are duplicated tasks.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810376

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cd0195c562)
2020-03-25 13:53:25 -04:00
Benoît Knecht 87034b1fb6 ceph-rgw: Fix customize pool size "when" condition
In 3c31b19ab3, I fixed the `customize pool
size` task by replacing `item.size` with `item.value.size`. However, I
missed the same issue in the `when` condition.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 3842aa1a30)
2020-02-17 11:53:58 -05:00
Benoît Knecht 874c94c59e ceph-rgw: Fix custom pool size setting
RadosGW pools can be created by setting

```yaml
rgw_create_pools:
  .rgw.root:
    pg_num: 512
    size: 2
```

for instance. However, doing so would create pools of size
`osd_pool_default_size` regardless of the `size` value. This was due to
the fact that the Ansible task used

```
{{ item.size | default(osd_pool_default_size) }}
```

as the pool size value, but `item.size` is always undefined; the
correct variable is `item.value.size`.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 3c31b19ab3)
2020-02-17 11:53:58 -05:00
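Combining this fix with the `when` condition fix above, a hedged sketch of the corrected task; the command wrapper and default variables are assumptions:

```yaml
- name: customize pool size (sketch reflecting the item.value.size fix)
  command: >
    {{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster | default('ceph') }}
    osd pool set {{ item.key }} size {{ item.value.size | default(osd_pool_default_size) }}
  with_dict: "{{ rgw_create_pools }}"
  when: item.value.size | default(osd_pool_default_size) != osd_pool_default_size
  changed_when: false
```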
Dimitri Savineau db8902d444 ceph-{mon,osd}: move default crush variables
Since ed36a11 we moved the crush rules creation code from the ceph-mon to
the ceph-osd role.
To keep backward compatibility we kept the possibility to set the
crush variables on the mons side, but we didn't move the default values.
As a result, when crush_rule_config is set to true and the default values
for crush_rules are used, the crush rule creation task fails.

"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute
'crush_rules'"

This patch moves the default crush variables from the ceph-mon to the ceph-osd
role, but also uses those default values when nothing is defined on the
mons side.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1fc6b33714)
2020-02-17 16:23:33 +01:00
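For illustration, hypothetical defaults of the kind being moved to ceph-osd; the names and values below are placeholders, not the role's actual defaults:

```yaml
# hypothetical ceph-osd defaults (placeholders)
crush_rule_config: false
crush_rule_hdd:
  name: HDD
  root: default
  type: host
  default: true
crush_rules:
  - "{{ crush_rule_hdd }}"
```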
Dimitri Savineau 306ce82358 ceph-validate: fail if no mgr host is present
We already stop the upgrade playbook (rolling_update.yml) if there's
no mgr node present, so we should also do the same for the initial
deployment.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1788644

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-11 13:27:10 -05:00
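A minimal sketch of such a check, assuming the manager group is named `mgrs`:

```yaml
- name: fail when no mgr host is present in the inventory (sketch)
  fail:
    msg: "Please add a mgr host to your inventory."
  when: groups.get('mgrs', []) | length == 0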
Dimitri Savineau 553fb1ed1e ceph-mon: use interactive session with aliases
When using ceph aliases with commands that require manual intervention
to stop (like using Ctrl+C), the command will keep running inside the
container.
To handle this, we should use the interactive session option (-it)
with the docker commands.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1797874

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-05 15:29:51 +01:00
Mike Christie c2a9397474 iscsi: Fix crashes during rolling update
During a rolling update we will run the ceph iscsigw tasks that start
the daemons, then run the configure_iscsi.yml tasks which can create
iscsi objects like targets, disks, clients, etc. The problem is that
once the daemons are started they will accept configuration requests,
or may want to update the system themselves. Those operations can then
conflict with the configure_iscsi.yml tasks that set up objects, and we
can end up with crashes due to the kernel being in an unsupported state.

This could also happen during creation, but it is less likely because no
objects are set up yet, so there are no watchers or users accessing the
gateways. The fix in this patch works for both update and initial setup.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1795806

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 77f3b5d51b)
2020-02-03 15:15:53 +01:00
Guillaume Abrioux d437593e85 config: fix external client scenario
When no monitor group is present in the inventory, this task fails.
This affects only non-containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e7bc079405)
2020-02-03 10:20:19 +01:00
Guillaume Abrioux b6744fd82a validate: allow running ceph-ansible 3.2 against ansible 2.7
This commit allows ceph-ansible 3.2 to be run against ansible 2.7.

However, note that running stable-3.2 against ansible 2.7 doesn't get
any testing upstream, so this might break the playbook; only ansible 2.6 is
officially supported.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1781635

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-31 10:07:48 -05:00
Dimitri Savineau 13e0f7d341 ceph-defaults: remove rgw from ceph_conf_overrides
A [rgw] section, whether in the ceph.conf file or via the ceph_conf_overrides
variable, doesn't exist and has no effect.
To apply overrides to all radosgw instances we should use either the
[global] or [client] sections.
Overrides per radosgw instance should still use the
[client.rgw.{instance-name}] section.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794552

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2f07b85131)
2020-01-29 14:19:17 +01:00
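For example, a hedged `ceph_conf_overrides` sketch; the instance name and the option names below are placeholders:

```yaml
ceph_conf_overrides:
  global:
    rgw_dynamic_resharding: false        # applies to every radosgw instance
  client.rgw.rgw0:                       # hypothetical instance name
    rgw_enable_usage_log: true           # applies to this instance only
```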
Guillaume Abrioux 726b3f220b defaults: change monitor|radosgw_address default values
To avoid confusion, let's change the default value from `0.0.0.0` to
`x.x.x.x`.
Users might think setting `0.0.0.0` will make the daemon bind on all
interfaces.

Fixes: #4827

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fc02fc98eb)
2020-01-14 17:22:35 +01:00
Dimitri Savineau 7ce33f4865 ceph-defaults: exclude rbd devices from discovery
The RBD devices aren't excluded from the devices list in the LVM auto
discovery scenario.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783908

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6f0556f015)
2020-01-13 12:06:35 -05:00
Dimitri Savineau aea4257807 ceph-osd: wait for all osds once
cf8c6a3 moves the 'wait for all osds' task from openstack_config to the
main tasks list.
But the openstack_config code was executed only on the last OSD node.
We don't need to do this check on all OSD nodes, so we set
run_once to true on that task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5bd1cf40eb)
2020-01-13 16:54:01 +01:00
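A hedged sketch of such a check run only once; the `ceph osd stat -f json` field names (`num_osds`, `num_up_osds`) and the command wrapper are assumptions:

```yaml
- name: wait for all osds to be up, checked from a single node (sketch)
  command: "{{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster | default('ceph') }} osd stat -f json"
  register: osd_stat
  delegate_to: "{{ groups['mons'][0] }}"
  run_once: true                              # no need to repeat on every OSD node
  retries: 60
  delay: 10
  until: >
    (osd_stat.stdout | from_json).num_osds | int > 0 and
    (osd_stat.stdout | from_json).num_osds == (osd_stat.stdout | from_json).num_up_osds
  changed_when: false
```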
Dimitri Savineau 9a42fe580f ceph-osd: wait for all osd before crush rules
When creating crush rules with the device class parameter we need to be sure
that all OSDs are up and running because the device class list is
populated with this information.
This is now enabled for all scenarios, not only openstack_config.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf8c6a3849)
2020-01-13 16:54:01 +01:00
Dimitri Savineau 8b2659bf6d rolling_update: create crush rule after osd play
When upgrading from jewel to luminous we can execute the crush rule tasks
only once the 'osd require-osd-release luminous' command has been run.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-13 16:54:01 +01:00
Dimitri Savineau af57597df6 ceph-osd: add device class to crush rules
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ef2cb99f73)
2020-01-13 16:54:01 +01:00
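A hedged sketch of the two paths described above; the rule dict keys (`name`, `root`, `type`, `class`) mirror the commit message, while the command wrapper is an assumption:

```yaml
- name: create crush rules with a device class (create-replicated)
  command: >
    {{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster | default('ceph') }}
    osd crush rule create-replicated {{ item.name }} {{ item.root }} {{ item.type }} {{ item.class }}
  with_items: "{{ crush_rules | default([]) }}"
  when: item.class is defined

- name: create crush rules without a device class (create-simple, backward compatible)
  command: >
    {{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster | default('ceph') }}
    osd crush rule create-simple {{ item.name }} {{ item.root }} {{ item.type }}
  with_items: "{{ crush_rules | default([]) }}"
  when: item.class is not defined
```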
Dimitri Savineau 0ac43d83f4 move crush rule creation from mon to osd role
If we want to create crush rules with the create-replicated sub command
and device class then we need to have the OSD created before the crush
rules otherwise the device classes won't exist.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ed36a11eab)
2020-01-13 16:54:01 +01:00
Dimitri Savineau 255be99bc5 ceph-validate: add rbdmirror validation
When ceph_rbd_mirror_configure is set to true we need to ensure that
the required variables aren't empty.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1760553

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a065cebd7)
2020-01-13 16:53:32 +01:00
Dimitri Savineau 58ffae3117 ceph-mds: allow directory fragmentation
We need to explicitly enable the allow_dirfrags flag on the cephfs pool
after upgrading to Luminous.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776233

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-13 16:52:11 +01:00
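A minimal sketch, assuming the flag is enabled with `ceph fs set` against the default `cephfs` filesystem name; the exact invocation used by the role is not shown here:

```yaml
- name: allow directory fragmentation on the cephfs filesystem (sketch)
  command: "{{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster | default('ceph') }} fs set {{ cephfs | default('cephfs') }} allow_dirfrags true"
  delegate_to: "{{ groups['mons'][0] }}"
  run_once: true
  changed_when: false
```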
Guillaume Abrioux 881056fa9d facts: avoid duplicated element in devices list
When using `osd_auto_discovery`, `devices` is built multiple times due
to multiple runs of the `ceph-facts` role. It ends up with duplicate
instances of the same device in the list.

Using `unique` filter when building the list fixes this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 23b1f43897)
2020-01-13 15:47:02 +01:00
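A hedged sketch of deduplicating the auto-discovered list with the `unique` filter; the facts iterated over and the filtering condition are placeholders:

```yaml
- name: build the devices list without duplicates (sketch)
  set_fact:
    devices: "{{ (devices | default([]) + ['/dev/' + item.key]) | unique }}"
  with_dict: "{{ ansible_devices }}"
  when:
    - osd_auto_discovery | default(false) | bool
    - item.value.partitions | length == 0        # illustrative condition only
```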
Dimitri Savineau 193ce4f572 ceph-iscsi: add ceph-iscsi stable repositories
This commit adds support for the ceph-iscsi stable repository when
using ceph_repository community instead of always using the devel
repositories.
We're still using the devel repositories for rtslib and tcmu-runner in
both cases (dev and community).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 17:47:52 +01:00
Dimitri Savineau 56a7537f48 ceph-osd: update systemd unit script
The systemd unit script wasn't updated with the new container name
format (without the hostname).
We now have the same start/stop docker commands for all scenarios.
During the device-to-id OSD migration we need to be sure that the
old containers with the hostname are stopped.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1780688

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-10 23:59:13 +01:00
Dimitri Savineau 9f9b952473 ceph-defaults: exclude md devices from discovery
The md devices (RAID software) aren't excluded from the devices list in
the auto discovery scenario.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764601

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 014f51c2a4)
2019-12-09 09:32:55 +01:00
Guillaume Abrioux 4f6925890c facts: fix auto_discovery exclude
The previous approach was wrong.
Checking if `item.key` is in `osd_auto_discovery_exclude` (`['dm-',
'loop']`) is incorrect because it will obviously never match. Therefore,
the condition returned `True` whatever device we were checking.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8f42007272)
2019-12-09 09:32:55 +01:00
Guillaume Abrioux f6fea33b40 osd: add possibility to exclude device in osd_auto_discovery
Add a new `osd_auto_discovery_exclude` variable to give the possibility of
excluding some devices in the auto_discovery scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 83d7ef777e)
2019-12-09 09:32:55 +01:00
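A usage sketch of the variable introduced here (and whose matching logic was corrected in 4f6925890c above); the values are the defaults mentioned in that commit message:

```yaml
# skip device-mapper and loop devices during auto discovery
osd_auto_discovery: true
osd_auto_discovery_exclude:
  - dm-
  - loop
```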
Andrew Schoen 690860affc ceph-facts: generate devices when osd_auto_discovery is true
This task used to live in ceph-osd, but we need it defined here so that
ceph-config can use it when trying to determine the number of osds.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 88eda479a9)
2019-12-09 09:32:55 +01:00
VasishtaShastry c67de5a342 Evades validation of ceph_repository_type in containerized scenario
This will prevent site-docker.yml from failing with the configurations
described in the documentation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769760

Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9a1f1626c3)
2019-11-18 16:41:34 +01:00
Noah Watkins 146d144045 Remove outdated documentation
Fixes BZ
https://bugzilla.redhat.com/show_bug.cgi?id=1640525

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2019-11-13 16:04:55 +01:00
Dimitri Savineau b47f7763fc ceph-osd: fix fs.aio-max-nr sysctl condition
[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression isn't
evaluated.
This string output "(osd_objectstore == 'bluestore')" is always true
because the item.enable condition only matches a non-empty string. So the
sysctl value was applied for both the filestore and bluestore backends.

[2] added the bool filter to the condition, but the filter always returns
false on a string, so the sysctl value wasn't applied at all.

This commit fixes the enable key value by evaluating the expression instead
of using the string.

[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ece46d33be)
2019-11-07 20:38:33 +01:00
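A hedged sketch of the corrected entry, with the expression actually evaluated; the parameter list name and the 1048576 value are assumptions, and the consuming condition is assumed to apply `| bool`:

```yaml
os_tuning_params:
  # enable is evaluated to a boolean, not kept as the literal
  # string "(osd_objectstore == 'bluestore')"
  - { name: fs.aio-max-nr, value: 1048576, enable: "{{ osd_objectstore == 'bluestore' }}" }
```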
Harald Jensås e8ed6655f3 Support comma-delimited subnets in firewall
ceph.conf supports a comma separated list of
subnet CIDRs for the public_network and the
cluster network. ceph-ansible should support
setting up the firewall for this configuration.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1767392
Closes: #4425
Related: #4333
https://docs.ceph.com/docs/nautilus/rados/configuration/network-config-ref/#network-config-settings

Signed-off-by: Harald Jensås <hjensas@redhat.com>
(cherry picked from commit d94229204d)
2019-11-01 11:00:18 -04:00
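A hedged sketch of opening the monitor service for each subnet of a comma separated `public_network`; the zone and firewalld service names are assumptions:

```yaml
- name: open ceph-mon access for every public_network subnet (sketch)
  firewalld:
    zone: public
    source: "{{ item }}"
    service: ceph-mon
    permanent: true
    immediate: true
    state: enabled
  with_items: "{{ public_network.split(',') }}"
```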
Dimitri Savineau dd4a4cbb66 ceph-infra: Remove restart firewalld handler
There's no need to restart firewalld service when a new rule is
added due to the usage of the immediate flag.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b7338d438a)
2019-11-01 11:00:18 -04:00
Dimitri Savineau 4cd53bfbe5 ceph-osd: Remove ulimit nofile on container start
Even if this improves ceph-disk/ceph-volume performance, it also
impacts the ceph-osd process.
The ceph-osd process shouldn't use the 1024:4096 value for the max open
files.
This removes the ulimit option from the container engine; this kind
of change is done on the container side instead [1].

[1] https://github.com/ceph/ceph-container/pull/1497

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9a996aef7f)
2019-10-31 14:42:41 -04:00
Dimitri Savineau f3fc97caa0 openstack_config: fix docker exec command
container_exec_cmd should be replaced by docker_exec_cmd.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765110

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-24 14:13:52 -04:00
Guillaume Abrioux 1884506189 update: follow new recommendation to upgrade mds cluster
Refactor the mds cluster upgrade code in order to follow the documented
recommendation.
See: https://github.com/ceph/ceph/blob/luminous/doc/cephfs/upgrading.rst

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1569689

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 71cebf80a6)
2019-10-21 15:44:38 -04:00
Guillaume Abrioux 8dc40711bb common: do not override ceph_release when using custom repo
Otherwise it fails like the following:

```
TASK [ceph-mds : allow multimds] **************************************************************************************************************************************************
Monday 22 July 2019  16:37:38 +0800 (0:00:03.269)       0:13:25.651 ***********
fatal: [rhel7u6clone1]: FAILED! => {"msg": "The conditional check 'ceph_release_num[ceph_release] == ceph_release_num.luminous' failed. The error was: error while evaluating conditional (ceph_release_num[ceph_release] == ceph_release_num.luminous): 'dict object' has no attribute u'dummy'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 43, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: allow multimds\n  ^ here\n"}
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4e9504c939)
2019-10-17 20:10:57 -04:00
Dimitri Savineau c8d0c4722c rbd-mirror: fail if the peer is not added
Due to the 'failed_when: false' statement present in the peer task, the
playbook continues to run even if the peer task fails (like an
incorrect remote peer format):

"stderr": "rbd: invalid spec 'admin@cluster1'"

This patch adds a task to list the peers present and adds the peer only if
it's not already added. With this we don't need the failed_when statement
anymore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0b1e9c0737)
2019-10-16 14:01:18 -04:00
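A hedged sketch of the list-then-add approach; the `ceph_rbd_mirror_*` variable names follow the role's convention but the JSON field parsing below is an assumption:

```yaml
- name: list the peers already registered on the pool (sketch)
  command: "rbd mirror pool info {{ ceph_rbd_mirror_pool }} --format json"
  register: mirror_pool_info
  changed_when: false

- name: add the remote peer only when it is not registered yet (sketch)
  command: >
    rbd mirror pool peer add {{ ceph_rbd_mirror_pool }}
    {{ ceph_rbd_mirror_remote_user }}@{{ ceph_rbd_mirror_remote_cluster }}
  when: ceph_rbd_mirror_remote_cluster not in
        ((mirror_pool_info.stdout | from_json).peers | map(attribute='cluster_name') | list)
```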