ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	65b0e9bb5d	ceph-validate: update RHEL requirement for RHCS We were not testing the right ansible_distribution fact value for the RHEL distribution. This commit also updates the minimal RHEL version supported by RHCS. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5de74fe512`)	2020-04-14 11:27:21 -04:00
Guillaume Abrioux	a51331beb9	add-osd: refact the playbook There's no need to have two plays anymore since we now set/unset osd flags in `ceph-osd` role. Also, this commit makes the role `ceph-facts` to be called after `ceph-defaults` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-04-07 11:19:53 -04:00
Guillaume Abrioux	724620ed3d	add-osd: fix fact gathering in add-osd This commit makes this playbook gather facts from all other nodes but clients. When collocating OSDs on other nodes it can fail as follows: ``` fatal: [vm252-11]: FAILED! => { "msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'" } ``` In that case, a fact from a RGW node is called when rendering the `ceph.conf.j2` but it fails because facts are gathered only from mon and osd nodes. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1806765 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-04-07 11:19:53 -04:00
Guillaume Abrioux	8ccf91c1f0	add-osd: unset noup flag after last osd is deployed This commit fixes a bug when using the `add-osd.yml` playbook. The `noup` flag is set early but it never got unset before the "wait for pgs clean" check, so the playbook always fails because OSDs are never seen UP. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816023 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-04-07 11:19:53 -04:00
Guillaume Abrioux	a8f5e43624	ceph_key: fetch key when needed Fetch the key when it is present in the cluster but not on the node. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ccfa249919`)	2020-04-03 16:19:03 -04:00
Guillaume Abrioux	323d4f8f0b	ceph_key: fix idempotency when no secret is passed `553584cbd0` introduced a regression when no secret is passed, it overwrites the secret each time the task is run. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `003defec03`)	2020-04-03 16:19:03 -04:00
Guillaume Abrioux	b107dcf80b	ceph_key: remove 'update' state With this change, the state `present` is enough to update a keyring. If the keyring already exists, it will be updated if caps or secret passed to the module are different. If the keyring doesn't exist, it will be created. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808367 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `553584cbd0`)	2020-04-03 16:19:03 -04:00
Dimitri Savineau	edfeb98593	tests: add mgr nodes to shrink_mon inventory Since `306ce82` we explicitly fail when there's no mgr node present in the inventory. fatal: [mon0]: FAILED! => { "changed": false } MSG: Please add a mgr host to your inventory. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-04-02 22:02:35 +02:00
Guillaume Abrioux	d4ffe21225	osd: support changing default rule even when osd_crush_location isn't defined Creating crush rules even with no crush hierarchy configuration is a valid scenario so we shouldn't be bound to the first task result (which configure crush hierarchy) to be able to add new crush rules. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5b0476385c`)	2020-03-31 23:04:03 +02:00
Dimitri Savineau	586c6e8afe	Add site-container.yml symlink This adds a symlink to the site-docker.yml.sample playbook. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-31 23:00:49 +02:00
Guillaume Abrioux	3b1794a0fd	switch_to_containers: exclude clients nodes from facts gathering just like site.yml and rolling_update, let's exclude clients node from the fact gathering. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `332c39376b`) (cherry picked from commit `5c3ba0787c`)	2020-03-30 11:10:29 -04:00
Guillaume Abrioux	cfe77bc51f	main: exclude client nodes from facts gathering when delegate_facts_host This commit excludes client nodes from facts gathering, they are not needed and can speed up this task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `865d2eac9b`)	2020-03-30 11:10:29 -04:00
John Fulton	658d9cadfd	The _filtered_clients list should intersect with ansible_play_batch Client configuration with --limit fails without this patch because certain tasks are only done to the first host in the _filtered_clients list and it's likely that first host will not be included in what's specified with --limit. To fix this the _filtered_clients list should be built from all clients in the inventory that are also in the running play. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798781 Signed-off-by: John Fulton <fulton@redhat.com> (cherry picked from commit `e4bf4857f5`)	2020-03-30 11:10:29 -04:00
Guillaume Abrioux	6006985466	defaults: remove legacy comment This is no longer true, let's remove this comment given that this option is not ignored in containerized deployments. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e551b5ba1a`)	2020-03-26 12:08:31 -04:00
Guillaume Abrioux	c60967f045	docker-common: remove legacy tasks for ntp configuration Those tasks aren't needed in docker-common since the introduction of `ceph-infra` role. They are duplicated tasks. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810376 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `cd0195c562`)	2020-03-25 13:53:25 -04:00
Guillaume Abrioux	a0f01db800	tests: add inventory host for 4.0 upgrade job This inventory is intended to be used in the upgrade scenario in stable-4.0 branch. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-04 23:18:43 +01:00
Guillaume Abrioux	d2d241f21d	tests: modify add-osd job This commit modifies the way we test add-osd scenario given that the playbook add-osd.yml is broken at the moment. As a workaround we can use main playbook with `--limit` to achieve this operation. Note: This commit is intended to be reverted once we get a fix. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-03 11:08:22 +01:00
Dimitri Savineau	2d2cec99fc	tests: pg num should be a power of two number This patch changes the pg_num value of the rgw pools foo and bar to be a power of two number. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 14:52:09 -05:00
Benoît Knecht	87034b1fb6	ceph-rgw: Fix customize pool size "when" condition In `3c31b19ab3`, I fixed the `customize pool size` task by replacing `item.size` with `item.value.size`. However, I missed the same issue in the `when` condition. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `3842aa1a30`)	2020-02-17 11:53:58 -05:00
Benoît Knecht	874c94c59e	ceph-rgw: Fix custom pool size setting RadosGW pools can be created by setting ```yaml rgw_create_pools: .rgw.root: pg_num: 512 size: 2 ``` for instance. However, doing so would create pools of size `osd_pool_default_size` regardless of the `size` value. This was due to the fact that the Ansible task used ``` {{ item.size | default(osd_pool_default_size) }} ``` as the pool size value, but `item.size` is always undefined; the correct variable is `item.value.size`. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `3c31b19ab3`)	2020-02-17 11:53:58 -05:00
Dimitri Savineau	db8902d444	ceph-{mon,osd}: move default crush variables In `ed36a11` we moved the crush rule creation code from the ceph-mon to the ceph-osd role. To keep backward compatibility we kept the possibility to set the crush variables on the mons side but we didn't move the default values. As a result, when crush_rule_config is set to true and the default values for crush_rules are used, the crush rule creation task fails. "msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'crush_rules'" This patch moves the default crush variables from the ceph-mon to the ceph-osd role and also uses those default values when nothing is defined on the mons side. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `1fc6b33714`)	2020-02-17 16:23:33 +01:00
Dimitri Savineau	306ce82358	ceph-validate: fail if no mgr host is present We already stop the upgrade playbook (rolling_update.yml) if there's no mgr node present so we should also do the same for initial deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1788644 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-11 13:27:10 -05:00
Dimitri Savineau	553fb1ed1e	ceph-mon: use interactive session with aliases When using ceph aliases with commands that require manual intervention to stop then the command will keep running inside the container (like using Ctrl+c). For handling this, we should use the interactive session option (-it) with the docker commands. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1797874 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-05 15:29:51 +01:00
Mike Christie	c2a9397474	iscsi: Fix crashes during rolling update During a rolling update we will run the ceph iscsigw tasks that start the daemons then run the configure_iscsi.yml tasks which can create iscsi objects like targets, disks, clients, etc. The problem is that once the daemons are started they will accept configuration requests, or may want to update the system themselves. Those operations can then conflict with the configure_iscsi.yml tasks that set up objects and we can end up in crashes due to the kernel being in an unsupported state. This could also happen during creation, but is less likely due to no objects being set up yet, so there are no watchers or users accessing the gws yet. The fix in this patch works for both update and initial setup. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1795806 Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `77f3b5d51b`)	2020-02-03 15:15:53 +01:00
Guillaume Abrioux	b7a21d94d3	tests: retry to fire up VMs on vagrant failure Add a script to retry several times to fire up VMs to avoid vagrant failures. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `1ecb3a9352`)	2020-02-03 10:20:19 +01:00
Guillaume Abrioux	d437593e85	config: fix external client scenario When no monitor group is present in the inventory, this task fails. This affects only non-containerized deployments. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e7bc079405`)	2020-02-03 10:20:19 +01:00
Guillaume Abrioux	523a93b0e1	tests: add external_clients scenario This commit adds a new 'external ceph clients' scenario. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `641729357e`)	2020-02-03 10:20:19 +01:00
Guillaume Abrioux	b6744fd82a	validate: allow running ceph-ansible 3.2 against ansible 2.7 This commit allows ceph-ansible 3.2 to be run against ansible 2.7 However, note that running stable-3.2 against ansible 2.7 doesn't get any testing upstream this might break the playbook, only ansible 2.6 is officially supported. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1781635 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-31 10:07:48 -05:00
Guillaume Abrioux	ce7503a3a6	tests: add 'all_in_one' scenario Add new scenario 'all_in_one' in order to catch more collocated related issues. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3e7dbb4b16`)	2020-01-31 11:26:40 +01:00
Guillaume Abrioux	cf748e729f	update: remove legacy tasks These tasks should have been removed with backport #4756 Note: This should have been backported from master but it's not possible because of too many changes between master and stable-3.2 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1740463 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-29 09:25:15 -05:00
Dimitri Savineau	13e0f7d341	ceph-defaults: remove rgw from ceph_conf_overrides The [rgw] section in the ceph.conf file or via the ceph_conf_overrides variable doesn't exist and has no effect. To apply overrides to all radosgw instances we should use either the [global] or [client] sections. Overrides per radosgw instance should still use the [client.rgw.{instance-name}] section. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794552 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `2f07b85131`)	2020-01-29 14:19:17 +01:00
Guillaume Abrioux	726b3f220b	defaults: change monitor|radosgw_address default values To avoid confusion, let's change the default value from `0.0.0.0` to `x.x.x.x`. Users might think setting `0.0.0.0` will make the daemon binding on all interfaces. Fixes: #4827 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `fc02fc98eb`)	2020-01-14 17:22:35 +01:00
Dimitri Savineau	071b950325	tox: allow copy admin key for purge scenario This is enabled in the group_vars/clients file but it's overridden in extra vars by tox. Let's do it like that for now. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-13 14:50:29 -05:00
Guillaume Abrioux	01095f1f4c	tests: add coverage on purge playbook This commit adds a playbook to be played before we run purge playbook, it first creates an rbd image then map an rbd device on client0 so the purge playbook will try to unmap it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `db77fbda15`)	2020-01-13 14:50:29 -05:00
Guillaume Abrioux	5db0b239f6	purge: use sysfs to unmap rbd devices In a containerized context, using the binary provided in atomic os won't work because it's an old version provided by ceph-common based on 10.2.5. Using a container could be an idea but for large clusters with hundreds of client nodes, that would require pulling the image on each of them just to unmap the rbd devices. Let's use the sysfs method in order to avoid any issue related to the ceph version that is shipped on the host. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766064 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3cfcc7a105`)	2020-01-13 14:50:29 -05:00
Guillaume Abrioux	bcd7fee18d	update: only run post osd upgrade play on 1 mon There is no need to run these tasks n times from each monitor. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c878e99589`)	2020-01-13 13:42:01 -05:00
Guillaume Abrioux	09f295e89c	update: use flags noout and nodeep-scrub only 1. set noout and nodeep-scrub flags, 2. upgrade each OSD node, one by one, wait for active+clean pgs 3. after all osd nodes are upgraded, unset flags Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Rachana Patel <racpatel@redhat.com> (cherry picked from commit `548db78b95`)	2020-01-13 13:42:01 -05:00
Dimitri Savineau	7ce33f4865	ceph-defaults: exclude rbd devices from discovery The RBD devices aren't excluded from the devices list in the LVM auto discovery scenario. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783908 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `6f0556f015`)	2020-01-13 12:06:35 -05:00
Dimitri Savineau	aea4257807	ceph-osd: wait for all osds once `cf8c6a3` moves the 'wait for all osds' task from openstack_config to the main tasks list. But the openstack_config code was executed only on the last OSD node. We don't need to do this check on all OSD nodes so we need to set run_once to true on that task. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5bd1cf40eb`)	2020-01-13 16:54:01 +01:00
Dimitri Savineau	9a42fe580f	ceph-osd: wait for all osd before crush rules When creating crush rules with the device class parameter we need to be sure that all OSDs are up and running because the device class list is populated with this information. This is now enabled for all scenarios, not only openstack_config. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cf8c6a3849`)	2020-01-13 16:54:01 +01:00
Dimitri Savineau	8b2659bf6d	rolling_update: create crush rule after osd play When upgrading from jewel to luminous we can execute the crush rule tasks only after the 'osd require-osd-release luminous' command has been run. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-13 16:54:01 +01:00
Dimitri Savineau	af57597df6	ceph-osd: add device class to crush rules This adds device class support to crush rules when using the class key in the rule dict via the create-replicated sub command. If the class key isn't specified then we use the create-simple sub command for backward compatibility. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ef2cb99f73`)	2020-01-13 16:54:01 +01:00
Dimitri Savineau	0ac43d83f4	move crush rule creation from mon to osd role If we want to create crush rules with the create-replicated sub command and device class then we need to have the OSD created before the crush rules otherwise the device classes won't exist. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ed36a11eab`)	2020-01-13 16:54:01 +01:00
Dimitri Savineau	255be99bc5	ceph-validate: add rbdmirror validation When ceph_rbd_mirror_configure is set to true we need to ensure that the required variables aren't empty. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1760553 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `4a065cebd7`)	2020-01-13 16:53:32 +01:00
Dimitri Savineau	2436044369	switch_to_containers: set GUID on lockbox part The ceph lockbox partition (part number 5) used with non lvm scenarios and in non containerized deployments doesn't have a valid PARTUUID. The value is set to 00000000-0000-0000-0000-000000000000 for each OSD device. $ blkid -t PARTLABEL="ceph lockbox" -o value -s PARTUUID 00000000-0000-0000-0000-000000000000 00000000-0000-0000-0000-000000000000 00000000-0000-0000-0000-000000000000 00000000-0000-0000-0000-000000000000 00000000-0000-0000-0000-000000000000 When switching to containerized deployment we manually mount the lockbox partition by using the PARTUUID. Unfortunately, because most of the time we have multiple OSDs on the same node, we can't have the right symlink in /dev/disk/by-partuuid because it will point to only one partition. /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000 -> ../../sdb5 After the switch_to_containers playbook, only one OSD will restart correctly and the others will try to access the wrong device, causing errors like 'xxxx is still in use'. When deploying with containers and dmcrypt OSDs we force a PARTUUID value during the ceph-disk prepare task. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-13 16:52:55 +01:00
Dimitri Savineau	58ffae3117	ceph-mds: allow directory fragmentation We need to explicitly enable the allow_dirfrags flag on cephfs pool after upgrading to Luminous. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776233 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-13 16:52:11 +01:00
Guillaume Abrioux	881056fa9d	facts: avoid duplicated element in devices list When using `osd_auto_discovery`, `devices` is built multiple times due to multiple runs of the `ceph-facts` role. It ends up with duplicate instances of the same device in the list. Using the `unique` filter when building the list fixes this issue. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `23b1f43897`)	2020-01-13 15:47:02 +01:00
Guillaume Abrioux	195c49eaa9	tests: add shrink-osd-legacy testing This commit reintroduces testing against ceph-disk deployed osds. In stable-3.2, which is the most common version used at customers (downstream pov), a bunch of OSDs are still deployed using ceph-disk. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-01-09 09:24:22 +01:00
Guillaume Abrioux	ca728dcd70	shrink-osd: support fqdn in inventory When using fqdn in inventory, that playbook fails because some tasks use the result of ceph osd tree (which returns the shortname) to get some data in hostvars[]. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779021 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6d9ca6b05b`)	2020-01-09 09:24:22 +01:00
Dimitri Savineau	193ce4f572	ceph-iscsi: add ceph-iscsi stable repositories This commit adds support for the ceph-iscsi stable repository when ceph_repository is set to community, instead of always using the devel repositories. We're still using the devel repositories for rtslib and tcmu-runner in both cases (dev and community). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-01-08 17:47:52 +01:00

4382 Commits (65b0e9bb5db52d2a0279069e6f2fded31180e3fc)