ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	f463d1838e	mgr: wait for all mgr to be available before managing mgr modules, we must ensure all mgr are available otherwise we can hit a failure like the following: ``` stdout:Error ENOENT: all mgr daemons do not support module 'restful', pass --force to force enablement ``` It happens because not all mgr are available yet when trying to manage mgr modules. This should have been cherry-picked from `41f7518c1b` but there are too many changes. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-11 10:02:25 +02:00
Guillaume Abrioux	652374636e	nfs: add coverage on `ganesha_conf_overrides` This commit adds `ganesha_conf_overrides` variable in CI testing. This fixes the test `test_nfs_config_override`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-10 14:24:52 +02:00
Guillaume Abrioux	24810e0da2	tests: fix purge scenarios names This commit fixes the purge_* scenario names in stable-3.1 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-07-10 11:57:22 +02:00
Guillaume Abrioux	13602e426d	tests: add missing variables in collocation scenario add `ceph_origin: repository` and `ceph_repository: community` in all.yml for the collocation scenario (non-container) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-23 11:14:00 +02:00
Guillaume Abrioux	018297957e	tests: fix path to inventory host file in tox-update.ini the path had `/{env:CONTAINER_DIR:}` which is already added in `changedir=` section. That led to a wrong path so the initial deployment couldn't complete. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-22 13:54:05 +02:00
Guillaume Abrioux	bf17099964	tests: split update in a dedicated tox.ini file This commit splits the update scenario into a dedicated tox.ini file. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-21 09:25:15 +02:00
Guillaume Abrioux	4cc08f7e1d	tests: use INVENTORY env variable in tox let's use `INVENTORY` variable to run against the right inventory host regarding which OS we are running on. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-20 13:36:18 +02:00
Guillaume Abrioux	d63b1c993d	tests: add back testinfra testing `136bfe0` removed testinfra testing on all scenarios except all_daemons Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8d106c2c58`)	2019-04-04 14:26:58 +00:00
Guillaume Abrioux	9a8c1d4081	tests: pin pytest-xdist to 1.27.0 looks like newer version of pytest-xdist requires pytest>=4.4.0 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ba0a95211c`)	2019-04-04 14:26:58 +00:00
Dimitri Savineau	8cad54e0ef	tox: Fix container purge jobs On containerized CI jobs the playbook executed is purge-cluster.yml but it should be set to purge-docker-cluster.yml Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `bd0869cd01`)	2019-04-04 09:14:05 +02:00
Guillaume Abrioux	dd77affe7f	tests: fix shrink_mon scenario since the node names have changed recently (the 'ceph-' prefix has been removed), we must change the name in the shrink_mon playbook command here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-03 10:03:10 +02:00
Guillaume Abrioux	a80ea0a929	tests: fix shrink_osd scenario the wrong image version was used to run shrink_osd playbook. in stable-3.1 we should use a luminous image, not nautilus which doesn't have ceph-disk binary anymore. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-03 09:48:04 +02:00
Guillaume Abrioux	7926eebebf	tests: disable nfs scenario The packages are broken, so let's remove it until this is solved. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-03 07:27:42 +00:00
Guillaume Abrioux	f4f41d62ce	tests: test idempotency only on all_daemons job there's no need to test this on all scenarios. testing idempotency on all_daemons should be enough and allow us to save precious resources for the CI. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `136bfe096c`)	2019-04-03 07:27:42 +00:00
Guillaume Abrioux	64bee9cb86	osd: backward compatibility with old disk_list.sh location Since all files in container image have moved to `/opt/ceph-container` this check must look for new AND the old path so it's backward compatible. Otherwise it could end up by templating an inconsistent `ceph-osd-run.sh`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `987bdac963`)	2019-04-02 11:09:46 +02:00
Guillaume Abrioux	69cda84a21	iscsi-gws: remove a leftover remove leftover introduced by `9d590f4` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d4b3c1d409`)	2019-03-28 15:36:26 +00:00
Guillaume Abrioux	ff243781c5	iscsi: fix permission denied error Typical error: ``` fatal: [iscsi-gw0]: FAILED! => msg: 'an error occurred while trying to read the file ''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'': [Errno 13] Permission denied: b''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key''' ``` `become: True` is not needed on the following task: `copy crt file(s) to gateway nodes`. Since it's already set in the main playbook (site.yml/site-container.yml) The thing is that the files get generated in the 'fetch_directory' with root user because there is a 'delegate_to' + we run the playbook with `become: True` (from main playbook). The idea here is to create files under ansible user so we can open them later to copy them on the remote machine. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `9d590f4339`)	2019-03-28 15:36:26 +00:00
Guillaume Abrioux	d9895338d0	tests: rename all nodes name remove the 'ceph-' prefix in order to have the same names in all branches. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-28 13:39:54 +00:00
Guillaume Abrioux	9df795abdc	tests: use memory backend for cache fact force ansible to generate facts for each run. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4a1bafdc21`)	2019-03-05 10:06:08 +01:00
Guillaume Abrioux	7c51657c58	tests: remove lvm_batch scenario this scenario doesn't exist in stable-3.1 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-04 16:26:56 +01:00
Guillaume Abrioux	a16ab0cad5	tests: refact all stable-3.1 testing refact the testing on stable-3.1 the same way it has been done for stable-3.2 and master. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-04 14:44:57 +01:00
Patrick Donnelly	cb92299756	use shortname in keyring path socket.gethostname may return a FQDN. Problem found in Linode. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com> (cherry picked from commit `8cd0308f5f`)	2019-01-30 15:01:04 +01:00
Rishabh Dave	b39345751f	ceph-common: disable unrequired NTP services When one of the currently supported NTP services has been set up, disable rest of the NTP services on Ceph nodes. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1651875 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `6fa757d343`)	2019-01-14 16:37:35 +01:00
Rishabh Dave	ada7a400c2	ceph-common: merge ntp_debian.yml and ntp_rpm.yml Merge ntp_debian.yml and ntp_rpm.yml into one (the new file is called setup_ntp.yml) since they are almost identical. Since this is as a "as it is" backport for the original commit, it also adds the feature of supporting multiple NTP daemons (namely, chronyd & timesyncd). This is to maintain consistency across all branches since the backport for stable-3.2 was auto-merged by mergify despite of conflicts. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `b03ab60742`)	2019-01-14 16:37:35 +01:00
Benjamin Cherian	bb41a7da20	Add support for different NTP daemons Allow user to choose between timesyncd, chronyd and ntpd Installation will default to timesyncd since it is distributed as part of the systemd installation for most distros. Added note indicating NTP daemon type is not used for containerized deployments. Fixes issue #3086 on Github Signed-off-by: Benjamin Cherian <benjamin_cherian@amat.com> (cherry picked from commit `85071e6e53`)	2019-01-14 16:37:35 +01:00
Sébastien Han	c34027c3ba	rolling_update: do not fail on missing keys We don't want to fail on key that are not present since they will get created after the mons are updated. They will be created by the task "create potentially missing keys (rbd and rbd-mirror)". Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-29 15:50:07 +01:00
Guillaume Abrioux	741ef74629	update: fix a typo `hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo. That should be `hostvars[mon_host]['ansible_hostname']` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `7c99b6df6d`)	2018-11-26 19:36:30 +00:00
Guillaume Abrioux	9022f83450	rolling_update: refact set_fact `mon_host` each monitor node should select another monitor which isn't itself. Otherwise, one node in the monitor group won't set this fact and causes failure. Typical error: ``` TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] * task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200 Thursday 22 November 2018 14:02:30 +0000 (0:00:07.493) 0:02:50.005 *** fatal: [mon1]: FAILED! => {} MSG: The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2' ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `af78173584`)	2018-11-26 19:36:30 +00:00
Sébastien Han	5c9aa5ed66	rolling_update: create rbd and rbd-mirror keyrings During an upgrade ceph won't create keys that did not exist on the previous version. So after the upgrade of, let's say, Jewel to Luminous, once all the monitors have the new version they should get or create the keys. It's ok to have the task fail, especially for the rbd-mirror key, which only appears in Nautilus. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572 Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `4e267bee4f`)	2018-11-26 19:36:30 +00:00
Sébastien Han	46a2701b5e	ceph_key: add a get_key function When checking if a key exists we also have to ensure that the key exists on the filesystem, the key can change on Ceph but still have an outdated version on the filesystem. This solves this issue. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `691f373543`)	2018-11-26 19:36:30 +00:00
Jairo Llopis	a5aca6ebbc	Fix problem with ceph_key in python3 Pretty basic problem of iteritems removal. Signed-off-by: Jairo Llopis <yajo.sk8@gmail.com> (cherry picked from commit `fc20973c2b`)	2018-10-26 16:23:34 +02:00
Guillaume Abrioux	10403b76e3	tox: fix a typo the line setting `ANSIBLE_CONFIG` obviously contains a typo introduced by `1e283bf69b` `ANSIBLE_CONFIG` has to point to a path only (path to an ansible.cfg) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a0cceb3e44`)	2018-10-26 16:22:46 +02:00
Sébastien Han	d814644c4a	rolling_update: fix upgrade when using fqdn Clusters that were deployed using 'mon_use_fqdn' have a different unit name, so during the upgrade this must be used otherwise the upgrade will fail, looking for a unit that does not exist. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1597516 Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `44d0da0dd4`)	2018-10-24 12:42:14 +00:00
Guillaume Abrioux	7c9699ad51	tests: do not install lvm2 on atomic host we need to detect whether we are running on atomic host to not try to install lvm2 package. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d2ca24eca8`)	2018-10-16 14:35:08 +02:00
Alfredo Deza	f4a5551bfd	tests: install lvm2 before setting up ceph-volume/LVM tests Signed-off-by: Alfredo Deza <adeza@redhat.com> (cherry picked from commit `3e488e8298`)	2018-10-16 14:35:08 +02:00
Noah Watkins	e089f46607	Stringify ceph_docker_image_tag This could be a numeric input, but is treated like a string leading to runtime errors. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1635823 Signed-off-by: Noah Watkins <nwatkins@redhat.com> (cherry picked from commit `8dcc8d1434`)	2018-10-16 14:35:08 +02:00
Noah Watkins	75c9130865	Avoid using tests as filter Fixes the deprecation warning: [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result|search` use `result is search`. Signed-off-by: Noah Watkins <nwatkins@redhat.com> (cherry picked from commit `306e308f13`)	2018-10-16 14:35:08 +02:00
Andy McCrae	ee1b6dd83c	Sync config_template with upstream for Ansible 2.6 The original_basename option in the copy module changed to be _original_basename in Ansible 2.6+, this PR resyncs the config_template module to allow this to work with both Ansible 2.6+ and before. Additionally, this PR removes the _v1_config_template.py file, since ceph-ansible no longer supports versions of Ansible before version 2, and so we shouldn't continue to carry that code. Closes: #2843 Signed-off-by: Andy McCrae <andy.mccrae@gmail.com> (cherry picked from commit `a1b3d5b7c3`)	2018-10-15 22:00:35 +00:00
Sébastien Han	d0b03f6faa	switch: copy initial mon keyring We need to copy this key into /etc/ceph so when ceph-docker-common runs it can fetch it to the ansible server. Previously the task wasn't failing because `fail_on_missing` was False before 2.5, so now it's True hence the failure. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `bae0f41705`)	2018-10-15 13:59:21 +02:00
Guillaume Abrioux	da05c1fd31	switch: support migration when cluster is scrubbing Similar to `c13a3c3` we must allow scrubbing when running this playbook. In a cluster with a large number of PGs, some of them can be expected to be scrubbing; it's a normal operation. Preventing scrubbing forces setting the noscrub flag. This commit allows switching from a non-containerized to a containerized environment even while PGs are scrubbing. Closes: #3182 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `54b02fe187`)	2018-10-15 13:59:21 +02:00
Guillaume Abrioux	75c2b83e43	defaults: fix osd containers handler `ceph_osd_container_stat` might not be set on other osd node. We must ensure we are on the last node before trying to evaluate `ceph_osd_container_stat`. This should have been backported but it's part of a too important refact in master that can't be backported. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-15 10:33:56 +02:00
Sébastien Han	513608cebe	switch: allow switch big clusters (more than 99 osds) The current regex had a limitation of 99 OSDs, now this limit has been removed and regardless the number of OSDs they will all be collected. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1630430 Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `9fccffa1ca`) (cherry picked from commit `d5e57af23d`)	2018-10-15 10:33:56 +02:00
Guillaume Abrioux	4e4184e579	defaults: fix osd handlers that are never triggered `run_once: true` + `inventory_hostname == groups.get(osd_group_name) | last` is a bad combination since if the only node being run isn't the last, the task will definitely be skipped. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-03 14:09:39 +00:00
Guillaume Abrioux	ba6c3a8e6b	config: look up for monitor_address_block in hostvars `monitor_address_block` should be read from hostvars[host] instead of the node currently being played. E.g., let's assume we have: ``` [mons] ceph-mon0 monitor_address=192.168.1.10 ceph-mon1 monitor_interface=eth1 ceph-mon2 monitor_address_block=192.168.1.0/24 ``` the ceph.conf generation task will end up with: ``` fatal: [ceph-mon0]: FAILED! => {} MSG: 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_interface' ``` the reason is that it will assume `monitor_address_block` isn't defined even on ceph-mon2 because it looks for `monitor_address_block` instead of `hostvars[host]['monitor_address_block']`, therefore it falls into the default branch of the condition: ``` {%- else -%} {% set interface = 'ansible_' + (monitor_interface | replace('-', '_')) %} {% if ip_version == 'ipv4' -%} {{ hostvars[host][interface][ip_version]['address'] }} {%- elif ip_version == 'ipv6' -%} [{{ hostvars[host][interface][ip_version][0]['address'] }}] {%- endif %} {%- endif %} ``` `monitor_interface` is set with default value `'interface'` so the `interface` variable is built with 'ansible_' + 'interface'. This makes Ansible throw a confusing message about `'ansible_interface'`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1635303 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6130bc841d`)	2018-10-02 21:54:09 +00:00
Guillaume Abrioux	79a5725cf6	purge: actually remove of /var/lib/ceph/* `38dc20e74b` introduced a bug in the purge playbooks because using `` in `command` module doesn't work. `/var/lib/ceph/` files are not purged it means there is a leftover. When trying to redeploy a cluster, it failed because monitor daemon was detecting existing keyring, therefore, it assumed a cluster already existed. Typical error (from container output): ``` Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16 /entrypoint.sh: Existing mon, trying to rejoin cluster... Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16.9323937f15b0d74700 -1 auth: unable to find a keyring on /etc/ceph/test.client.admin.keyring,/etc/ceph/test.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:(2) No such file or directory Sep 26 13:18:23 mon0 docker[31316]: 2018-09-26 13:18:23 /entrypoint.sh: SUCCESS ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1633563 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `144c92b21f`)	2018-09-27 21:42:43 +02:00
Matthew Vernon	0bb13cff08	restart_osd_daemon.sh.j2 - use `+` rather than `{1,}` in regex `+` is more idiomatic for "one or more" in a regex than `{1,}`; the latter was introduced in a previous fix for an incorrect `{1,2}` restriction. Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk> (cherry picked from commit `806461ac6e`)	2018-09-26 21:38:36 +00:00
Matthew Vernon	d701c192e0	restart_osd_daemon.sh.j2 - consider active+clean+* pgs as OK After restarting each OSD, restart_osd_daemon.sh checks that the cluster is in a good state before moving on to the next one. One of the checks it does is that the number of pgs in the state "active+clean" is equal to the total number of pgs in the cluster. On large clusters (e.g. we have 173,696 pgs), it is likely that at least one pg will be scrubbing and/or deep-scrubbing at any one time. These pgs are in state "active+clean+scrubbing" or "active+clean+scrubbing+deep", so the script was erroneously not including them in the "good" count. Similar concerns apply to "active+clean+snaptrim" and "active+clean+snaptrim_wait". Fix this by considering as good any pg whose state contains active+clean. Do this as an integer comparison to num_pgs in pgmap. (could this be backported to at least stable-3.0 please?) Closes: #2008 Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk> (cherry picked from commit `04f4991648`)	2018-09-26 21:38:36 +00:00
Guillaume Abrioux	fdc2d7681d	rolling_update: ensure pgs_by_state has at least 1 entry Previous commit `c13a3c3` has removed a condition. This commit brings back this condition which is essential to ensure we won't hit a false positive result in the `when` condition for the check PGs task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `179c4d00d7`)	2018-09-26 10:58:51 +00:00
Guillaume Abrioux	f008f40628	upgrade: consider all 'active+clean' states as valid pgs In a cluster with a large number of PGs, some of them can be expected to be scrubbing; it's a normal operation. Preventing scrubbing forces setting the noscrub flag before a rolling update, which is a problem because it pauses an important data integrity operation until the end of the rolling upgrade. This commit allows an upgrade even while PGs are scrubbing. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616066 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c13a3c3492`)	2018-09-25 14:13:16 +00:00
Giulio Fidente	7d2a13f8c7	Fix version check in ceph.conf template We need to look for ceph_release when comparing with release names, not ceph_version. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1631789 Signed-off-by: Giulio Fidente <gfidente@redhat.com> (cherry picked from commit `6126210e0e`)	2018-09-24 12:32:32 +00:00
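
Several of the commits above (`d701c192e0`, `f008f40628`, `fdc2d7681d`) revolve around the same check: count as healthy any PG whose state contains `active+clean`, compare that count to the total number of PGs, and refuse to pass when `pgs_by_state` is empty. A minimal Python sketch of that logic follows. It is not the code from the repository (the actual changes live in a shell template and in playbook conditions). The function name `all_pgs_active_clean` is hypothetical, and the `pgmap` layout (`num_pgs`, plus `pgs_by_state` entries carrying `state_name` and `count`) is an assumption about the cluster's JSON status output.

```python
def all_pgs_active_clean(pgmap):
    """Return True when every PG is in a state containing 'active+clean'.

    `pgmap` is assumed to look like:
        {"num_pgs": 10,
         "pgs_by_state": [{"state_name": "active+clean", "count": 10}]}
    """
    states = pgmap.get("pgs_by_state", [])

    # Guard against a false positive: with no state entries at all,
    # there is nothing to prove the cluster is healthy.
    if not states:
        return False

    # States such as 'active+clean+scrubbing+deep' or
    # 'active+clean+snaptrim' are counted as good too.
    good = sum(
        entry["count"]
        for entry in states
        if "active+clean" in entry["state_name"]
    )

    # Integer comparison against the total number of PGs.
    return good == pgmap["num_pgs"]
```

With this shape, a cluster that has a few PGs scrubbing still passes, while any PG in a state without `active+clean` (for example a degraded one) makes the check fail.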


3845 Commits (f463d1838eddc851ad81905dfc8412dcc6953ced)