Commit Graph

4353 Commits (cf748e729fbe4b8d22a14624822fc18354b28cd2)
 

Author SHA1 Message Date
Guillaume Abrioux cf748e729f update: remove legacy tasks
These tasks should have been removed with backport #4756

Note:
This should have been backported from master but it's not possible
because of too many changes between master and stable-3.2

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1740463

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-29 09:25:15 -05:00
Dimitri Savineau 13e0f7d341 ceph-defaults: remove rgw from ceph_conf_overrides
The [rgw] section, whether in the ceph.conf file or via the
ceph_conf_overrides variable, isn't a valid section and has no effect.
To apply overrides to all radosgw instances we should use either the
[global] or [client] section.
Overrides per radosgw instance should still use the
[client.rgw.{instance-name}] section.
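
For illustration, a hedged sketch of what such overrides could look
like (the instance name and option values are hypothetical, not taken
from this commit):

```
ceph_conf_overrides:
  # applies to all radosgw instances
  client:
    rgw_dynamic_resharding: false
  # applies to a single instance
  client.rgw.rgw0:
    rgw_frontends: "civetweb port=8080"
```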

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794552

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2f07b85131)
2020-01-29 14:19:17 +01:00
Guillaume Abrioux 726b3f220b defaults: change monitor|radosgw_address default values
To avoid confusion, let's change the default value from `0.0.0.0` to
`x.x.x.x`.
Users might think setting `0.0.0.0` will make the daemon bind on all
interfaces.

Fixes: #4827

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fc02fc98eb)
2020-01-14 17:22:35 +01:00
Dimitri Savineau 071b950325 tox: allow copy admin key for purge scenario
This is enabled in the group_vars/clients file but it's overridden in
extra vars by tox.
Let's do it like that for now.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-13 14:50:29 -05:00
Guillaume Abrioux 01095f1f4c tests: add coverage on purge playbook
This commit adds a playbook to be played before we run the purge
playbook; it first creates an rbd image and maps an rbd device on
client0 so the purge playbook will have to unmap it.
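
A minimal sketch of such a pre-purge playbook, assuming a `clients`
group and illustrative image/pool names:

```
- hosts: clients[0]
  become: true
  tasks:
    - name: create an rbd image
      command: rbd create purge-test --size 128 --pool rbd

    - name: map the rbd image so purge has something to unmap
      command: rbd map purge-test --pool rbd
```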

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit db77fbda15)
2020-01-13 14:50:29 -05:00
Guillaume Abrioux 5db0b239f6 purge: use sysfs to unmap rbd devices
In a containerized context, using the binary provided by the Atomic OS
won't work because it's an old version provided by ceph-common based
on 10.2.5.
Using a container could be an idea, but for large clusters with
hundreds of client nodes that would require pulling the image on each
of them just to unmap the rbd devices.

Let's use the sysfs method in order to avoid any issue related to the
ceph version shipped on the host.
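
A sketch of the sysfs approach (task names are illustrative): the
kernel rbd driver exposes mapped devices under /sys/bus/rbd/devices,
and writing a device id to /sys/bus/rbd/remove unmaps it without
touching any rbd binary on the host.

```
- name: list mapped rbd device ids
  find:
    paths: /sys/bus/rbd/devices
    file_type: any
  register: rbd_device_paths

- name: unmap rbd devices via sysfs
  shell: echo "{{ item.path | basename }}" > /sys/bus/rbd/remove
  loop: "{{ rbd_device_paths.files }}"
```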

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766064

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3cfcc7a105)
2020-01-13 14:50:29 -05:00
Guillaume Abrioux bcd7fee18d update: only run post osd upgrade play on 1 mon
There is no need to run these tasks n times from each monitor.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c878e99589)
2020-01-13 13:42:01 -05:00
Guillaume Abrioux 09f295e89c update: use flags noout and nodeep-scrub only
1. set noout and nodeep-scrub flags,
2. upgrade each OSD node, one by one, wait for active+clean pgs
3. after all osd nodes are upgraded, unset flags
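
A minimal sketch of the flag handling (delegation and group name
follow ceph-ansible conventions; the unset step mirrors it with
`osd unset`):

```
- name: set osd flags before the upgrade
  command: "ceph --cluster {{ cluster }} osd set {{ item }}"
  loop:
    - noout
    - nodeep-scrub
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
```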

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rachana Patel <racpatel@redhat.com>
(cherry picked from commit 548db78b95)
2020-01-13 13:42:01 -05:00
Dimitri Savineau 7ce33f4865 ceph-defaults: exclude rbd devices from discovery
The RBD devices aren't excluded from the devices list in the LVM auto
discovery scenario.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783908

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6f0556f015)
2020-01-13 12:06:35 -05:00
Dimitri Savineau aea4257807 ceph-osd: wait for all osds once
cf8c6a3 moves the 'wait for all osds' task from openstack_config to the
main tasks list.
But the openstack_config code was executed only on the last OSD node.
We don't need to do this check on every OSD node, so we set run_once
to true on that task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5bd1cf40eb)
2020-01-13 16:54:01 +01:00
Dimitri Savineau 9a42fe580f ceph-osd: wait for all osd before crush rules
When creating crush rules with the device class parameter we need to
be sure that all OSDs are up and running because the device class list
is populated with this information.
This is now enabled for all scenarios, not only openstack_config.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf8c6a3849)
2020-01-13 16:54:01 +01:00
Dimitri Savineau 8b2659bf6d rolling_update: create crush rule after osd play
When upgrading from jewel to luminous we can execute the crush rule
tasks only once the 'osd require-osd-release luminous' command has
been run.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-13 16:54:01 +01:00
Dimitri Savineau af57597df6 ceph-osd: add device class to crush rules
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.
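
For illustration, a rule dict carrying the class key might look like
this (names and values are hypothetical); with `class` present the
role uses `create-replicated`, without it `create-simple`:

```
crush_rules:
  - name: replicated_hdd_rule
    root: default
    type: host
    class: hdd
```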

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ef2cb99f73)
2020-01-13 16:54:01 +01:00
Dimitri Savineau 0ac43d83f4 move crush rule creation from mon to osd role
If we want to create crush rules with the create-replicated sub command
and device class then we need to have the OSDs created before the
crush rules, otherwise the device classes won't exist.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ed36a11eab)
2020-01-13 16:54:01 +01:00
Dimitri Savineau 255be99bc5 ceph-validate: add rbdmirror validation
When ceph_rbd_mirror_configure is set to true we need to ensure that
the required variables aren't empty.
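
A hedged sketch of the kind of check this adds (the variable names are
assumptions, not taken from the commit):

```
- name: fail on empty rbd mirror variables
  fail:
    msg: "{{ item }} must be set when ceph_rbd_mirror_configure is true"
  loop:
    - ceph_rbd_mirror_pool
    - ceph_rbd_mirror_remote_cluster
  when:
    - ceph_rbd_mirror_configure | bool
    - not (vars[item] | default(''))
```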

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1760553

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a065cebd7)
2020-01-13 16:53:32 +01:00
Dimitri Savineau 2436044369 switch_to_containers: set GUID on lockbox part
The ceph lockbox partition (partition number 5) used with non-lvm
scenarios in non-containerized deployments doesn't have a valid
PARTUUID.
The value is set to 00000000-0000-0000-0000-000000000000 for each OSD
device.

$ blkid -t PARTLABEL="ceph lockbox" -o value -s PARTUUID
00000000-0000-0000-0000-000000000000
00000000-0000-0000-0000-000000000000
00000000-0000-0000-0000-000000000000
00000000-0000-0000-0000-000000000000
00000000-0000-0000-0000-000000000000

When switching to a containerized deployment we manually mount the
lockbox partition by using the PARTUUID.
Unfortunately, because we usually have multiple OSDs on the same node,
we can't have the right symlink in /dev/disk/by-partuuid because it
will point to only one partition.

/dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000 -> ../../sdb5

After the switch_to_containers playbook only one OSD will restart
correctly and the others will try to access the wrong device, causing
errors like 'xxxx is still in use'.

When deploying with containers and dmcrypt OSDs we force a PARTUUID
value during the ceph-disk prepare task.
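
A sketch of forcing a unique PARTUUID on the lockbox partition
(sgdisk's `R` placeholder generates a random GUID; the device loop is
illustrative):

```
- name: set a random PARTUUID on the ceph lockbox partition
  command: "sgdisk --partition-guid=5:R {{ item }}"
  loop: "{{ devices }}"
```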

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-13 16:52:55 +01:00
Dimitri Savineau 58ffae3117 ceph-mds: allow directory fragmentation
We need to explicitly enable the allow_dirfrags flag on the cephfs
filesystem after upgrading to Luminous.
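
A sketch of the corresponding task, assuming ceph-ansible's usual
`docker_exec_cmd`, `cluster` and `cephfs` variables:

```
- name: allow directory fragmentation on the cephfs filesystem
  command: "{{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster }} fs set {{ cephfs }} allow_dirfrags true"
  run_once: true
```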

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776233

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-13 16:52:11 +01:00
Guillaume Abrioux 881056fa9d facts: avoid duplicated element in devices list
When using `osd_auto_discovery`, `devices` is built multiple times due
to multiple runs of the `ceph-facts` role. It ends up with duplicate
instances of the same device in the list.

Using the `unique` filter when building the list fixes this issue.
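
A sketch of the fix (the surrounding set_fact shape is illustrative):
piping the accumulated list through `unique` makes repeated role runs
idempotent.

```
- name: resolve devices
  set_fact:
    devices: "{{ (devices | default([]) + ['/dev/' + item.key]) | unique }}"
  with_dict: "{{ ansible_devices }}"
```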

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 23b1f43897)
2020-01-13 15:47:02 +01:00
Guillaume Abrioux 195c49eaa9 tests: add shrink-osd-legacy testing
This commit reintroduces testing against ceph-disk deployed osds.

In stable-3.2, which is the most common version used by customers
(downstream pov), a bunch of OSDs are still deployed using ceph-disk.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-09 09:24:22 +01:00
Guillaume Abrioux ca728dcd70 shrink-osd: support fqdn in inventory
When using fqdns in the inventory, that playbook fails because some
tasks use the result of ceph osd tree (which returns shortnames) to
look up data in hostvars[].

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779021

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6d9ca6b05b)
2020-01-09 09:24:22 +01:00
Dimitri Savineau 193ce4f572 ceph-iscsi: add ceph-iscsi stable repositories
This commit adds support for the ceph-iscsi stable repository when
using ceph_repository: community, instead of always using the devel
repositories.
We're still using the devel repositories for rtslib and tcmu-runner in
both cases (dev and community).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 17:47:52 +01:00
Guillaume Abrioux d606ad0bac ansible.cfg: do not enforce PreferredAuthentications
There's no need to enforce PreferredAuthentications by default.
Users can still choose to override the ansible.cfg with any additional
parameter like this one to fit their infrastructure.

Fixes: #4826

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d682412e2a)
2019-12-11 08:56:05 -05:00
Dimitri Savineau 56a7537f48 ceph-osd: update systemd unit script
The systemd unit script wasn't updated with the new container name
format (without the hostname).
We now have the same start/stop docker commands for all scenarios.
During the device to id OSD migration we need to be sure that the
old containers with the hostname are stopped.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1780688

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-10 23:59:13 +01:00
Dimitri Savineau c409d6e960 tests: add lvm-auto-discovery scenario
This adds the lvm-auto-discovery scenario.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-09 09:32:55 +01:00
Dimitri Savineau 9f9b952473 ceph-defaults: exclude md devices from discovery
The md devices (RAID software) aren't excluded from the devices list in
the auto discovery scenario.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764601

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 014f51c2a4)
2019-12-09 09:32:55 +01:00
Guillaume Abrioux 4f6925890c facts: fix auto_discovery exclude
The previous approach was wrong.
Checking if `item.key` is in `osd_auto_discovery_exclude` (`['dm-',
'loop']`) is incorrect because it will obviously never match. Therefore,
the condition will return `True` whatever device we are checking.
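
A sketch of a working test (the set_fact shape is illustrative): match
the device name against each excluded prefix instead of testing list
membership.

```
- name: add discovered device
  set_fact:
    devices: "{{ devices | default([]) + ['/dev/' + item.key] }}"
  with_dict: "{{ ansible_devices }}"
  when: not (item.key | regex_search('^(' ~ osd_auto_discovery_exclude | join('|') ~ ')'))
```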

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8f42007272)
2019-12-09 09:32:55 +01:00
Guillaume Abrioux f6fea33b40 osd: add possibility to exclude device in osd_auto_discovery
Add a new `osd_auto_discovery_exclude` variable to give the
possibility of excluding some devices in the auto_discovery scenario.
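
Example use, with the exclusions mentioned elsewhere in this branch
(device-mapper and loop devices):

```
osd_auto_discovery: true
osd_auto_discovery_exclude:
  - dm-
  - loop
```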

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 83d7ef777e)
2019-12-09 09:32:55 +01:00
Andrew Schoen 690860affc ceph-facts: generate devices when osd_auto_discovery is true
This task used to live in ceph-osd, but we need it defined here so
that ceph-config can use it when trying to determine the number of
osds.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 88eda479a9)
2019-12-09 09:32:55 +01:00
Dimitri Savineau 825429658b tests: reduce max_mds from 3 to 2
Having the max_mds value equal to the number of mds nodes generates a
warning in the ceph cluster status:

cluster:
  id:     6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a
  health: HEALTH_WARN
          insufficient standby MDS daemons available
(...)
services:
  mds:     cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}

Let's use 2 active and 1 standby mds.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a6d19dae2)
2019-12-04 17:57:33 -05:00
Dimitri Savineau b08ac9cd44 switch_to_containers: fix umount ceph partitions
When a container is already running on a non containerized node then the
umount ceph partition task is skipped.
This is due to the container ps command which always returns 0 even if
the filter matches nothing.

We should run the umount task when:
1/ the container command is failing (not installed) : rc != 0
2/ the container command reports running ceph-osd containers : rc == 0

Also we should not fail on the ceph directory listing.
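
A sketch of the resulting condition (the registered variable and the
mountpoint list are assumptions):

```
- name: umount ceph osd partitions
  mount:
    path: "{{ item }}"
    state: unmounted
  loop: "{{ ceph_osd_mountpoints }}"
  when: container_ps.rc != 0 or
        (container_ps.rc == 0 and container_ps.stdout | length > 0)
```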

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 39cfe0aa65)
2019-12-03 15:58:57 +01:00
Guillaume Abrioux cbfa01f697 tests: fix update scenario (container)
The path to the inventory isn't correct because we are missing the variable
`CONTAINER_DIR` here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-12-02 10:48:56 -05:00
Guillaume Abrioux 4d004bd5f6 tests: revert vagrant_variable file name detection
This commit reverts the following change:

fcf181342a (diff-23b6f443c01ea2efcb4f36eedfea9089R7-R14)

This change is causing CI failures, so this commit is intended to
unlock the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5353ab8a23)
2019-11-25 15:33:10 +01:00
Dimitri Savineau cb0926262d rolling_update: don't enable ceph-mon unit
On non-containerized deployments the ceph-mon hostname/fqdn systemd
services are stopped at the beginning of the mon upgrade.
But the enabled parameter is set to true for both tasks, so even if
we're not using the fqdn it will enable the systemd unit based on it.
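
Per the description, a sketch of the corrected task: stop the
fqdn-based unit without enabling it as a side effect.

```
- name: stop ceph mon systemd unit (fqdn)
  systemd:
    name: "ceph-mon@{{ ansible_fqdn }}"
    state: stopped
  failed_when: false
```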

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649617

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-11-21 09:19:43 +01:00
Dimitri Savineau 25ac0efddd container: add always tag on gather fact tasks
If we execute the site-container.yml playbook with specific tags (like
ceph_update_config) then we need to be sure to gather the facts,
otherwise we will see errors like:

The task includes an option with an undefined variable. The error was:
'ansible_hostname' is undefined

This commit also adds missing 'gather_facts: false' to mons plays.
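
A sketch of such a gather-facts task with the always tag, so it runs
regardless of the tags passed on the command line:

```
- name: gather facts
  setup:
  tags: always
```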

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1754432

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d7fd769b6d)
2019-11-18 16:42:47 +01:00
VasishtaShastry c67de5a342 Evades validation of ceph_repository_type in containerized scenario
This will prevent site-docker.yml from failing with the configs from
the docs.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769760

Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9a1f1626c3)
2019-11-18 16:41:34 +01:00
Guillaume Abrioux c6bc4c4976 ceph_key: restore file mode after a key is fetched
When `import_key` is enabled and the key already exists, it is only
fetched using the ceph cli. If the mode specified in the `ceph_key`
task differs from what the ceph cli applies, the mode isn't restored
because we don't call `module.set_fs_attributes_if_different()` before
`module.exit_json(**result)`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1734513

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b717b5f736)
2019-11-15 06:11:11 +01:00
Noah Watkins 146d144045 Remove outdated documentation
Fixes BZ
https://bugzilla.redhat.com/show_bug.cgi?id=1640525

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2019-11-13 16:04:55 +01:00
Guillaume Abrioux 4b1a810906 mergify: remove mergify config on stable-3.2
This commit removes the mergify config on stable-3.2

At the moment there is no need to have a mergify config on this branch
given that we don't use it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-11-07 15:22:29 -05:00
Dimitri Savineau b47f7763fc ceph-osd: fix fs.aio-max-nr sysctl condition
[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression
isn't evaluated.
The string "(osd_objectstore == 'bluestore')" is always truthy because
the item.enable condition only checks for a non-empty string, so the
sysctl value was applied for both the filestore and bluestore backends.

[2] added the bool filter to the condition, but the filter always
returns false on a string, so the sysctl wasn't applied at all.

This commit fixes the enable key by evaluating the expression instead
of using the literal string.

[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2
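
A sketch of the fixed entry (the value shown is illustrative):
templating the expression makes enable an evaluated boolean rather
than a literal string.

```
os_tuning_params:
  - name: fs.aio-max-nr
    value: 1048576
    enable: "{{ osd_objectstore == 'bluestore' }}"
```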

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ece46d33be)
2019-11-07 20:38:33 +01:00
Harald Jensås e8ed6655f3 Support comma-delimited subnets in firewall
ceph.conf supports a comma-separated list of
subnet CIDRs for the public_network and the
cluster network. ceph-ansible should support
setting up the firewall for this configuration.
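
A sketch of the firewall side (zone name assumed): split the
comma-delimited public_network and apply the source rule once per
subnet.

```
- name: add public_network subnets to the firewall
  firewalld:
    zone: public
    source: "{{ item }}"
    permanent: true
    immediate: true
    state: enabled
  loop: "{{ public_network.split(',') | map('trim') | list }}"
```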

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1767392
Closes: #4425
Related: #4333
https://docs.ceph.com/docs/nautilus/rados/configuration/network-config-ref/#network-config-settings

Signed-off-by: Harald Jensås <hjensas@redhat.com>
(cherry picked from commit d94229204d)
2019-11-01 11:00:18 -04:00
Dimitri Savineau dd4a4cbb66 ceph-infra: Remove restart firewalld handler
There's no need to restart the firewalld service when a new rule is
added, thanks to the usage of the immediate flag.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b7338d438a)
2019-11-01 11:00:18 -04:00
Dimitri Savineau 4cd53bfbe5 ceph-osd: Remove ulimit nofile on container start
Even if this improves ceph-disk/ceph-volume performance, it also
impacts the ceph-osd process.
The ceph-osd process shouldn't use the 1024:4096 value for the max
open files.
Remove the ulimit option from the container engine and do this kind of
change on the container side [1].

[1] https://github.com/ceph/ceph-container/pull/1497

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9a996aef7f)
2019-10-31 14:42:41 -04:00
Guillaume Abrioux a5a231b0b6 update: add default values when setting fact
This commit adds a default value in the with_dict because, when using
python 2.7, if a task using a with_dict has a condition, the with_dict
is evaluated anyway, whereas in python 3 it isn't.
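
A hedged sketch of the pattern (the variable names are assumptions):

```
- name: set mon host fact
  set_fact:
    mon_hosts: "{{ mon_hosts | default([]) + [item.value] }}"
  with_dict: "{{ monitor_addresses | default({}) }}"
  when: inventory_hostname in groups.get('mons', [])
```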

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766499

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-29 16:00:39 -04:00
Dimitri Savineau 8acb42dc61 rolling_update: remove default filter on mds group
There's no need to use the default filter on active/standby groups
because if the group doesn't exist then the play is just skipped.

Currently this generates warnings like:

[WARNING]: Could not match supplied host pattern, ignoring: |
[WARNING]: Could not match supplied host pattern, ignoring: default([])

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2ca79fcc99)
2019-10-28 13:08:43 -04:00
Dimitri Savineau bd79b4480a rolling_update: fix active mds host value
The active mds host should be based on the inventory hostname and not
on the ansible hostname.
The value returned under the mdsmap structure is based on the OS
hostname, so we need to find the right node in the inventory with this
value when doing operations on inventory nodes.

Otherwise we could see errors like:

The task includes an option with an undefined variable. The error was:
"hostvars[foobar]" is undefined

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1f2352c79)
2019-10-28 13:08:43 -04:00
Guillaume Abrioux 4b667b2f37 update: skip mds deactivation when no mds in inventory
Let's skip this part of the code if there's no mds node in the
inventory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ec906c3af)
2019-10-25 08:57:47 -04:00
Dimitri Savineau f3fc97caa0 openstack_config: fix docker exec command
container_exec_cmd should be replaced by docker_exec_cmd.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765110

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-24 14:13:52 -04:00
Guillaume Abrioux 1884506189 update: follow new recommendation to upgrade mds cluster
Refactor the mds cluster upgrade code in order to follow the
documented recommendation.
See: https://github.com/ceph/ceph/blob/luminous/doc/cephfs/upgrading.rst

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1569689

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 71cebf80a6)
2019-10-21 15:44:38 -04:00
Dimitri Savineau 52bba29a7f tests: fix the size on the second data LV
The commit replaces the pv/vg/lv commands used with the ansible command
module by the lvg and lvol modules.
This also fixes the size of the second data LV because we were only using
50% of the remaining space instead of 100%.

With a 50G device, the result was:
  - data-lv1 was 25G
  - data-lv2 was 12.5G
Instead of:
  - data-lv1 was 25G
  - data-lv2 was 25G
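
A sketch of the lvol-based sizing (vg/lv names are illustrative):
`100%FREE` takes all remaining space instead of half of it.

```
- name: create the second data LV
  lvol:
    vg: test_group
    lv: data-lv2
    size: 100%FREE
```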

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2c03c6fcd3)
2019-10-18 14:49:57 -04:00
Guillaume Abrioux 8dc40711bb common: do not override ceph_release when using custom repo
Otherwise it fails like the following:

```
TASK [ceph-mds : allow multimds] **************************************************************************************************************************************************
Monday 22 July 2019  16:37:38 +0800 (0:00:03.269)       0:13:25.651 ***********
fatal: [rhel7u6clone1]: FAILED! => {"msg": "The conditional check 'ceph_release_num[ceph_release] == ceph_release_num.luminous' failed. The error was: error while evaluating conditional (ceph_release_num[ceph_release] == ceph_release_num.luminous): 'dict object' has no attribute u'dummy'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 43, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: allow multimds\n  ^ here\n"}
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4e9504c939)
2019-10-17 20:10:57 -04:00