ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	c409d6e960	tests: add lvm-auto-discovery scenario This adds the lvm-auto-discovery scenario. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-12-09 09:32:55 +01:00
Dimitri Savineau	9f9b952473	ceph-defaults: exclude md devices from discovery The md devices (RAID software) aren't excluded from the devices list in the auto discovery scenario. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764601 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `014f51c2a4`)	2019-12-09 09:32:55 +01:00
Guillaume Abrioux	4f6925890c	facts: fix auto_discovery exclude the previous approach was wrong. checking if `item.key` is in `osd_auto_discovery_exclude` (`['dm-', 'loop']`) is incorrect because it will obviously not match. Therefore, the condition will return `True` whatever the device we are checking. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8f42007272`)	2019-12-09 09:32:55 +01:00
Guillaume Abrioux	f6fea33b40	osd: add possibility to exclude device in osd_auto_discovery Add a new `osd_auto_discovery_exclude` to give the possibility of excluding some devices in auto_discovery scenario. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `83d7ef777e`)	2019-12-09 09:32:55 +01:00
Andrew Schoen	690860affc	ceph-facts: generate devices when osd_auto_discovery is true This task used to live in ceph-osd, but we need it defined here so that ceph-config can use it when trying to determine the number of osds. Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `88eda479a9`)	2019-12-09 09:32:55 +01:00
Dimitri Savineau	825429658b	tests: reduce max_mds from 3 to 2 Having max_mds value equals to the number of mds nodes generates a warning in the ceph cluster status: cluster: id: 6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a' health: HEALTH_WARN' insufficient standby MDS daemons available' (...) services: mds: cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}' Let's use 2 active and 1 standby mds. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `4a6d19dae2`)	2019-12-04 17:57:33 -05:00
Dimitri Savineau	b08ac9cd44	switch_to_containers: fix umount ceph partitions When a container is already running on a non containerized node then the umount ceph partition task is skipped. This is due to the container ps command which always returns 0 even if the filter matches nothing. We should run the umount task when: 1/ the container command is failing (not installed) : rc != 0 2/ the container command reports running ceph-osd containers : rc == 0 Also we should not fail on the ceph directory listing. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `39cfe0aa65`)	2019-12-03 15:58:57 +01:00
Guillaume Abrioux	cbfa01f697	tests: fix update scenario (container) The path to the inventory isn't correct because we are missing the variable `CONTAINER_DIR` here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-12-02 10:48:56 -05:00
Guillaume Abrioux	4d004bd5f6	tests: revert vagrant_variable file name detection This commit reverts the following change: `fcf181342a (diff-23b6f443c01ea2efcb4f36eedfea9089R7-R14)` this is causing CI failures so this commit is intended to unlock the CI. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5353ab8a23`)	2019-11-25 15:33:10 +01:00
Dimitri Savineau	cb0926262d	rolling_update: don't enable ceph-mon unit On non-containerized deployments the ceph-mon hostname/fqdn systemd services are stopped at the beginning of the mon upgrade. But the enabled parameter is set to true for both tasks, so even if we're not using the fqdn it will enable the systemd unit based on it. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649617 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-11-21 09:19:43 +01:00
Dimitri Savineau	25ac0efddd	container: add always tag on gather fact tasks If we execute the site-container.yml playbook with specific tags (like ceph_update_config) then we need to be sure to gather the facts otherwise we will see error like: The task includes an option with an undefined variable. The error was: 'ansible_hostname' is undefined This commit also adds missing 'gather_facts: false' to mons plays. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1754432 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d7fd769b6d`)	2019-11-18 16:42:47 +01:00
VasishtaShastry	c67de5a342	Evades validation of ceph_repository_type in containerized scenario This will prevent failure of site-docker.yml with configs in doc. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769760 Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com> Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `9a1f1626c3`)	2019-11-18 16:41:34 +01:00
Guillaume Abrioux	c6bc4c4976	ceph_key: restore file mode after a key is fetched when `import_key` is enabled, if the key already exists, it will only be fetched using ceph cli, if the mode specified in the `ceph_key` task is different from what is applied by the ceph cli, the mode isn't restored because we don't call `module.set_fs_attributes_if_different()` before `module.exit_json(**result)` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1734513 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b717b5f736`)	2019-11-15 06:11:11 +01:00
Noah Watkins	146d144045	Remove outdated documentation Fixes BZ https://bugzilla.redhat.com/show_bug.cgi?id=1640525 Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2019-11-13 16:04:55 +01:00
Guillaume Abrioux	4b1a810906	mergify: remove mergify config on stable-3.2 This commit removes the mergify config on stable-3.2 At the moment there is no need to have a mergify config on this branch given that we don't use it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-11-07 15:22:29 -05:00
Dimitri Savineau	b47f7763fc	ceph-osd: fix fs.aio-max-nr sysctl condition [1] introduced a regression on the fs.aio-max-nr sysctl value condition. The enable key isn't a boolean but a string because the expression isn't evaluated. This string output "(osd_objectstore == 'bluestore')" is always true because the item.enable condition matches any non-empty string. So the sysctl value was applied for both filestore and bluestore backends. [2] added the bool filter to the condition, but the filter always returns false on that string, so the sysctl wasn't applied at all. This commit fixes the enable key value by evaluating the value instead of using the string. [1] https://github.com/ceph/ceph-ansible/commit/08a2b58 [2] https://github.com/ceph/ceph-ansible/commit/ab54fe2 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ece46d33be`)	2019-11-07 20:38:33 +01:00
Harald Jensås	e8ed6655f3	Support comma-delimited subnets in firewall ceph.conf supports a comma separated list of subnet CIDR's for the public_network and the cluster network. ceph-ansible should support setting up the firewall for this configuration. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1767392 Closes: #4425 Related: #4333 https://docs.ceph.com/docs/nautilus/rados/configuration/network-config-ref/#network-config-settings Signed-off-by: Harald Jensås <hjensas@redhat.com> (cherry picked from commit `d94229204d`)	2019-11-01 11:00:18 -04:00
Dimitri Savineau	dd4a4cbb66	ceph-infra: Remove restart firewalld handler There's no need to restart firewalld service when a new rule is added due to the usage of the immediate flag. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `b7338d438a`)	2019-11-01 11:00:18 -04:00
Dimitri Savineau	4cd53bfbe5	ceph-osd: Remove ulimit nofile on container start Even if this improves ceph-disk/ceph-volume performances then it also impact the ceph-osd process. The ceph-osd process shouldn't use 1024:4096 value for the max open files. Removing the ulimit option from the container engine and doing this kind of change on the container side [1]. [1] https://github.com/ceph/ceph-container/pull/1497 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9a996aef7f`)	2019-10-31 14:42:41 -04:00
Guillaume Abrioux	a5a231b0b6	update: add default values when setting fact This commit adds a default value in the with_dict because when using python 2.7, if a task using a with_dict has a condition, it is evaluated anyway whereas in python 3 it isn't. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766499 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-29 16:00:39 -04:00
Dimitri Savineau	8acb42dc61	rolling_update: remove default filter on mds group There's no need to use the default filter on active/standby groups because if the group doesn't exist then the play is just skipped. Currently this generates warnings like: [WARNING]: Could not match supplied host pattern, ignoring: | [WARNING]: Could not match supplied host pattern, ignoring: default([]) Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `2ca79fcc99`)	2019-10-28 13:08:43 -04:00
Dimitri Savineau	bd79b4480a	rolling_update: fix active mds host value The active mds host should be based on the inventory hostname and not on the ansible hostname. The value returned under the mdsmap structure is based on the OS hostname, so we need to find the right node in the inventory with this value when doing operations on inventory nodes. Otherwise we could see an error like: The task includes an option with an undefined variable. The error was: "hostvars[foobar]" is undefined Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f1f2352c79`)	2019-10-28 13:08:43 -04:00
Guillaume Abrioux	4b667b2f37	update: skip mds deactivation when no mds in inventory Let's skip this part of the code if there's no mds node in the inventory. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5ec906c3af`)	2019-10-25 08:57:47 -04:00
Dimitri Savineau	f3fc97caa0	openstack_config: fix docker exec command container_exec_cmd should be replaced by docker_exec_cmd. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765110 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-24 14:13:52 -04:00
Guillaume Abrioux	1884506189	update: follow new recommendation to upgrade mds cluster Refactor the mds cluster upgrade code in order to follow the documented recommendation. See: https://github.com/ceph/ceph/blob/luminous/doc/cephfs/upgrading.rst Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1569689 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `71cebf80a6`)	2019-10-21 15:44:38 -04:00
Dimitri Savineau	52bba29a7f	tests: fix the size on the second data LV The commit replaces the pv/vg/lv commands used with the ansible command module by the lvg and lvol modules. This also fixes the size of the second data LV because we were only using 50% of the remaining space instead of 100%. With a 50G device, the result was: - data-lv1 was 25G - data-lv2 was 12.5G Instead of: - data-lv1 was 25G - data-lv2 was 25G Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `2c03c6fcd3`)	2019-10-18 14:49:57 -04:00
Guillaume Abrioux	8dc40711bb	common: do not override ceph_release when using custom repo Otherwise it fails like following: ``` TASK [ceph-mds : allow multimds] ************************************************************************************************************************************************ Monday 22 July 2019 16:37:38 +0800 (0:00:03.269) 0:13:25.651 ********* fatal: [rhel7u6clone1]: FAILED! => {"msg": "The conditional check 'ceph_release_num[ceph_release] == ceph_release_num.luminous' failed. The error was: error while evaluating conditional (ceph_release_num[ceph_release] == ceph_release_num.luminous): 'dict object' has no attribute u'dummy'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 43, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: allow multimds\n ^ here\n"} ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4e9504c939`)	2019-10-17 20:10:57 -04:00
Guillaume Abrioux	9dad8fc201	tests: add multimds coverage This commit makes the all_daemons scenario deploying 3 mds in order to cover the multimds case. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-18 00:34:48 +02:00
Dimitri Savineau	c8d0c4722c	rbd-mirror: fail if the peer is not added Due to the 'failed_when: false' statement present in the peer task, the playbook continues to run even if the peer task fails (for example, on an incorrect remote peer format): "stderr": "rbd: invalid spec 'admin@cluster1'" This patch adds a task to list the peers present and adds the peer only if it's not already added. With this we don't need the failed_when statement anymore. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `0b1e9c0737`)	2019-10-16 14:01:18 -04:00
Dimitri Savineau	1eea339f87	Remove validate action and notario dependency The current ceph-validate role is using both validate action and fail module tasks to validate the ceph configuration. The validate action is based on the notario python library. When one of the notario validations fails, a python stack trace is reported to the ansible task. This output isn't understandable by users. This patch removes the validate action and the notario dependency. The validation is now done with only the fail ansible module. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654790 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-15 18:05:16 +02:00
Dimitri Savineau	475d2b1557	tests: fix rgw multisite vagrant variables The secondary vagrant variables didn't have the grafana vm variable set, which creates a vagrant error. There was an error loading a Vagrantfile. The file being loaded and the error message are shown below. This is usually caused by an invalid or undefined variable. This patch also changes the ssh-extra-args parameter to ssh-common-args to get the same values for ssh/sftp/scp. Otherwise we can see warnings from ansible and some tasks are failing. [WARNING]: sftp transfer mechanism failed on [mon0]. Use ANSIBLE_DEBUG=1 to see detailed information It also updates the ssh-common-args value for the rgw-multisite scenario to reflect the ANSIBLE_SSH_ARGS environment variable value. Finally, it changes the IP addresses due to the Vagrant refactor done in the commit `778c51a` Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `010158ff84`)	2019-10-14 09:46:38 +02:00
Guillaume Abrioux	07489c9f8e	switch_to_containers: optimize ownership change As per https://github.com/ceph/ceph-ansible/pull/4323#issuecomment-538420164 using `find` command should be faster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1757400 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-Authored-by: Giulio Fidente <gfidente@redhat.com> (cherry picked from commit `c5d0c90bb7`)	2019-10-11 12:19:21 -04:00
Guillaume Abrioux	70ac841153	validate: prevent from installing OSD on same disk as the OS This commit adds a validation task to prevent from installing an OSD on the same disk as the OS. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623580 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `80e2d00b16`)	2019-10-11 09:44:20 -04:00
Guillaume Abrioux	6e976c197c	tests: update tox due to pipeline removal This commit reflects the recent changes in ceph/ceph-build#1406 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `bcaf8cedee`)	2019-10-08 14:03:52 -04:00
Dimitri Savineau	2d40e3923f	switch_to_containers: umount osd lockbox partition When switching from a baremetal deployment to a containerized deployment we only umount the OSD data partition. If the OSD is encrypted (dmcrypt: true) then there's an additional partition (part number 5) used for the lockbox and mounted in the /var/lib/ceph/osd-lockbox/ directory. Because this partition isn't unmounted, the containerized OSDs aren't able to start. The partition is still mounted by the system and can't be remounted from the container. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `19edf707a5`)	2019-10-08 09:43:40 +02:00
Dimitri Savineau	2e44b6af74	ceph-config: remove container_binary variable `9e7972a` introduced a regression via the container_binary variable which is undefined. The CEPH_CONTAINER_BINARY environment variable isn't used at all. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-08 00:44:13 +02:00
Dimitri Savineau	077b61a008	ceph-mgr: fix ceph_key module with container `556052b` changed the way the mgr keyrings are created, but the ceph_key module needs the containerized parameter when the deployment is using containers. This module doesn't support CEPH_CONTAINER_[BINARY|IMAGE] environment variables. Closes: #4547 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-10-07 16:05:43 -04:00
Guillaume Abrioux	b1fa3c881c	nfs: stop nfs server service in all context This commit moves this task in order to stop the nfs server service regardless the deployment type desired (containerized or non containerized). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6c6a512a72`)	2019-10-07 18:18:21 +02:00
Guillaume Abrioux	003017d568	nfs: stop nfs server service The syntax here wasn't working; this refactor fixes this task. Also, removing the `ignore_errors: true` which was hiding the failure. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `47034effe0`)	2019-10-07 18:18:21 +02:00
Guillaume Abrioux	fb7ca818d1	playbook: add missing tags Add missing tag on ceph-handler role call. Otherwise, we can't use `--tags='ceph_update_config'` for updating the ceph configuration file. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1754432 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f59dad620d`)	2019-10-07 09:05:39 +02:00
Rishabh Dave	556052b235	ceph-mgr: create keys for MGRs Add code in ceph-mgr for creating a keyring for manager in so that managers can be deployed on a separate node too. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1552210 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `56bfec7c58`)	2019-10-04 13:15:26 +02:00
Dimitri Savineau	070db68ffd	ceph-handler: don't restart all OSDs with limit When using the ansible --limit option on one or few OSD nodes and if the handler is triggered then we will restart the OSD service on all OSDs nodes instead of the hosts limited by the limit value. Even if the play is limited by the --limit value we are using all OSD nodes from the OSD group. with_items: '{{ groups[osd_group_name] }}' Instead we should iterate only on the nodes present in both OSD group and limit list. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `0346871fb5`)	2019-10-04 07:43:17 +02:00
Guillaume Abrioux	f7b4ca5237	Vagrantfile: support more than 9 nodes per daemon type Because of the current IP address assignment, it's not possible to deploy more than 9 nodes per daemon type. This commit refactors a bit and allows us to get around this limitation. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `778c51a0ff`)	2019-10-04 07:40:51 +02:00
Guillaume Abrioux	d9f6b37ae6	tests: set gateway_ip_list dynamically so we don't have to hardcode this in the tests Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-04 07:39:41 +02:00
Guillaume Abrioux	8a1bda6d91	osd: refact 'wait for all osd to be up' task let's use `until` instead of doing test in bash using python oneliner also, use `command` instead of `shell`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c76cd5ad84`)	2019-10-04 04:25:20 +02:00
Guillaume Abrioux	86c224e71d	validate: fix gpt header check Check for gpt header when osd scenario is lvm or lvm batch. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731310 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-10-01 09:59:31 -04:00
Kevin Jones	b3abe23493	Set proper ownership command performance improvement By changing the set ownership command from using the file module in combination with a with_items loop to a raw chown command, we can achieve a 98% performance increase here. On a ceph cluster with a significant amount of directories and files in /var/lib/ceph, the file module has to run checks on ownership of all those directories and files to determine whether a change is needed. In this case, we just want to explicitly set the ownership of all these directories and files to the ceph_uid Added context note to all set proper ownership tasks Signed-off-by: Kevin Jones <kevinjones@redhat.com> (cherry picked from commit `47bf47c9d8`)	2019-10-01 09:10:28 -04:00
Andrew Schoen	1821efb3a2	ceph-config: do not always assume containers when calculating num_osds CEPH_CONTAINER_IMAGE should be None if containerized_deployment is False. Resolves: #4498 Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `70a4368bc5`)	2019-09-30 13:38:51 -04:00
Guillaume Abrioux	749d404e87	mon: use ceph_key module for containerized mgr keyring creation This commit replaces a `command` task with `ceph_key` in order to create mgr keyrings. This allows us to use `mode` parameter to set the right mode on generated keys. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1734513 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-09-25 11:30:41 -04:00
Dimitri Savineau	211dd2fcf6	ceph-osd: handle loop devices with containers Since we change the way to run the OSD containers with the ID instead of the device name, we lost the ability to use loop devices. Loop devices are like nvme or cciss devices because the partitions are referenced with an extra 'p' before the partition number. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1749097 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-09-25 16:11:29 +02:00
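
Commit `4f6925890c` above notes that checking whether a device name is in `osd_auto_discovery_exclude` (`['dm-', 'loop']`) never matches, so no device is ever excluded. The sketch below illustrates that pitfall in plain Python; it is not the role's actual Jinja2 condition. The function names and sample device names are hypothetical, and treating the exclude entries as name prefixes is an assumption made for this illustration.

```python
def excluded_by_membership(device, exclude):
    # Whole-name comparison: "dm-0" is not an element of ["dm-", "loop"],
    # so this returns False for every real device name.
    return device in exclude


def excluded_by_prefix(device, exclude):
    # Prefix comparison (assumed semantics for this sketch):
    # "dm-0" starts with "dm-", so it is excluded.
    return any(device.startswith(prefix) for prefix in exclude)


exclude = ["dm-", "loop"]                   # values quoted in the commit message
devices = ["sda", "dm-0", "loop1", "sdb"]   # hypothetical device names

kept_membership = [d for d in devices if not excluded_by_membership(d, exclude)]
kept_prefix = [d for d in devices if not excluded_by_prefix(d, exclude)]

print(kept_membership)
print(kept_prefix)
```

With the membership test every device is kept, which matches the behaviour the commit message describes. The prefix test drops the `dm-` and `loop` devices.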


4330 Commits (c409d6e96008cd431f1679d2582325f174c47879)
