ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Sébastien Han	2cea33f7fc	rolling_update: default ceph json output to empty dict So we can avoid the following failure: The conditional check 'hostvars[mon_host]['ansible_hostname'] in (ceph_health_raw.stdout | from_json)["quorum_names"] or hostvars[mon_host]['ansible_fqdn'] in (ceph_health_raw.stdout | from_json)["quorum_names"] ' failed. The error was: No JSON object could be decoded We just need to set a default, the next iteration will have a more complete json since the command won't fail. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-29 01:49:05 +00:00
Guillaume Abrioux	292d967d2f	update: fix a typo `hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo. That should be `hostvars[mon_host]['ansible_hostname']` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `7c99b6df6d`)	2018-11-29 01:49:05 +00:00
Guillaume Abrioux	1f4cf61058	rolling_update: refact set_fact `mon_host` each monitor node should select another monitor which isn't itself. Otherwise, one node in the monitor group won't set this fact and causes failure. Typical error: ``` TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] * task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200 Thursday 22 November 2018 14:02:30 +0000 (0:00:07.493) 0:02:50.005 *** fatal: [mon1]: FAILED! => {} MSG: The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2' ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `af78173584`)	2018-11-29 01:49:05 +00:00
Sébastien Han	d4f1f12bd0	rolling_update: create rbd and rbd-mirror keyrings During an upgrade ceph won't create keys that did not exist on the previous version. So after the upgrade of, let's say, Jewel to Luminous, once all the monitors have the new version they should get or create the keys. It's ok for the task to fail, especially for the rbd-mirror key, which only appears in Nautilus. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572 Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `4e267bee4f`)	2018-11-29 01:49:05 +00:00
Sébastien Han	26ea96424c	switch: do not look for devices anymore It's easier to look up a directory instead of the block devices, especially because ceph-volume and ceph-disk have a different way to handle devices. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `c14f9b78ff`)	2018-11-29 00:31:47 +01:00
Sébastien Han	57ac7b94c0	switch: disable all ceph units Prior to this commit we were only disabling ceph-osd units, but forgot the ceph.target which is controlling everything and will restart the ceph-osd units at each reboot. Now that everything gets disabled there won't be any conflicts between the old non-container and the new container units. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `cd56dad9fa`)	2018-11-29 00:31:47 +01:00
Sébastien Han	8d0379b4d9	switch: do not mask systemd unit If we mask it we won't be able to start the OSD container, since the osd container now uses the osd ID as a name such as: ceph-osd@0 Fixes the error: Failed to execute operation: Cannot send after transport endpoint shutdown Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `fe1d09925a`)	2018-11-29 00:31:47 +01:00
Guillaume Abrioux	b72d806f4c	mgr: fix mgr keyring error on rolling_update When upgrading from RHCS 2.5 to 3.2, it fails because the task `create ceph mgr keyring(s) when mon is containerized` has a when condition `inventory_hostname == groups[mon_group_name]|last`. First, this is incorrect because `inventory_hostname` is referring to a mgr node, which means this condition would never have been satisfied. Then, this condition + `serial: 1` makes the mgr keyring creation skipped on the first node. Further, the `ceph-mgr` role tries to copy the mgr keyring (it's not aware we are running `serial: 1`); this leads to a failure like the following: ``` TASK [ceph-mgr : copy ceph keyring(s) if needed] *** task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10 Tuesday 27 November 2018 12:03:34 +0000 (0:00:00.296) 0:11:01.290 **** An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring' failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"} ``` The ceph_key module is idempotent, so there is no need to have such a condition. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `73287f91bc`)	2018-11-28 23:11:46 +01:00
Rishabh Dave	a74f4204cd	remove configuration files for ceph packages on ubuntu clusters For apt-get, purge command needs to be used, instead of remove command, to remove related configuration files. Otherwise, packages might be shown as installed while running dpkg command even after removing them. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1640061 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `640cad3fd8`)	2018-11-09 16:50:25 +01:00
Mike Christie	77de54025b	igw: stop tcmu-runner on iscsi purge When the iscsi purge playbook is run we stop the gw and api daemons but not tcmu-runner which I forgot on the previous PR. Fixes Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1621255 Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `b523a44a1a`)	2018-11-09 16:50:04 +01:00
Ali Maredia	219fa8f919	infrastructure playbooks: ensure nvme_device is defined in lv-create.yml Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-10-29 08:41:42 +00:00
Mike Christie	0904860032	igw: stop daemons on purge all calls When purging the entire igw config (lio and rbd), stop and disable the api and gw daemons. Fixes Red Hat BZ https://bugzilla.redhat.com/show_bug.cgi?id=1621255 Signed-off-by: Mike Christie <mchristi@redhat.com>	2018-10-25 12:59:18 +02:00
Sébastien Han	44d0da0dd4	rolling_update: fix upgrade when using fqdn Clusters that were deployed using 'mon_use_fqdn' have a different unit name, so during the upgrade this must be used; otherwise the upgrade will fail, looking for a unit that does not exist. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1597516 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-19 13:06:56 +00:00
Guillaume Abrioux	b8418ebd17	add-osds: followup on `3632b26` Three fixes: - fix a typo in vagrant_variables that causes a networking issue for the containerized scenario. - add containerized_deployment: true - remove a useless block of code: the fact docker_exec_cmd is set in ceph-defaults which is played right after. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-17 17:07:25 +02:00
Sébastien Han	d6e79044ef	infra: add a gather-ceph-logs.yml playbook Add a gather-ceph-logs.yml which will log onto all the machines from your inventory and will gather ceph logs. This is not intended to work on containerized environments since the logs are stored in journald. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582280 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-17 13:52:19 +00:00
Sébastien Han	fbd878c8d5	infra: rename osd-configure to add-osd and improve it The playbook has various improvements: * run ceph-validate role before doing anything * run ceph-fetch-keys only on the first monitor of the inventory list * set noup flag so PGs get distributed once all the new OSDs have been added to the cluster and unset it when they are up and running Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1624962 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-17 11:26:11 +00:00
Guillaume Abrioux	40b7747af7	remove jewel support As of now, we should no longer support Jewel in ceph-ansible. The latest ceph-ansible release supporting Jewel is `stable-3.1`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-12 23:38:17 +00:00
Sébastien Han	9fccffa1ca	switch: allow switch big clusters (more than 99 osds) The current regex had a limitation of 99 OSDs; this limit has now been removed and, regardless of the number of OSDs, they will all be collected. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1630430 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:35:30 -04:00
Noah Watkins	8dcc8d1434	Stringify ceph_docker_image_tag This could be a numeric input, but is treated like a string leading to runtime errors. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1635823 Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Noah Watkins	306e308f13	Avoid using tests as filter Fixes the deprecation warning: [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result|search` use `result is search`. Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Guillaume Abrioux	79bd06ad28	rolling_update: add ceph-handler role since the introduction of ceph-handler, it has to be added in rolling_update playbook as well Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-05 13:48:04 +00:00
Rishabh Dave	b5d2ea269f	don't use "static" field while including tasks Instead, use "import_tasks" and "include_tasks" to tell whether tasks must be included statically or dynamically. Fixes: https://github.com/ceph/ceph-ansible/issues/2998 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-10-04 07:44:28 +00:00
Sébastien Han	bae0f41705	switch: copy initial mon keyring We need to copy this key into /etc/ceph so when ceph-docker-common runs it can fetch it to the ansible server. Previously the task wasn't failing because `fail_on_missing` was False before 2.5; now it's True, hence the failure. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-03 13:58:53 +00:00
Guillaume Abrioux	03e76af7b4	switch: add missing call to ceph-handler role Add the missing call to the ceph-handler role; otherwise other roles can't reference variables registered by ceph-handler. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-03 13:58:53 +00:00
Guillaume Abrioux	54b02fe187	switch: support migration when cluster is scrubbing Similar to `c13a3c3` we must allow scrubbing when running this playbook. In a cluster with a large number of PGs, some of them can be expected to be scrubbing; it's a normal operation. Preventing scrubbing would force setting the noscrub flag. This commit allows switching from a non-containerized to a containerized environment even while PGs are scrubbing. Closes: #3182 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-03 13:58:53 +00:00
Andrew Schoen	9747f3dbd5	purge-cluster: zap devices used with the lvm scenario Fixes: https://github.com/ceph/ceph-ansible/issues/3156 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-09-28 14:49:56 +02:00
wumingqiao	5da71e1ca1	purge-cluster: recursively remove ceph-related files, symlinks and directories under /etc/systemd/system. fix: https://github.com/ceph/ceph-ansible/issues/3166 Signed-off-by: wumingqiao <wumingqiao@beyondcent.com>	2018-09-28 14:49:22 +02:00
Rishabh Dave	380168dadc	don't use "include" to include tasks Use "import_tasks" or "include_tasks" instead. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-09-27 17:53:40 +02:00
Guillaume Abrioux	144c92b21f	purge: actually remove of /var/lib/ceph/* `38dc20e74b` introduced a bug in the purge playbooks because using `` in `command` module doesn't work. `/var/lib/ceph/` files are not purged, which means there are leftovers. When trying to redeploy a cluster, it failed because the monitor daemon detected an existing keyring and therefore assumed a cluster already existed. Typical error (from container output): ``` Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16 /entrypoint.sh: Existing mon, trying to rejoin cluster... Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16.932393 7f15b0d74700 -1 auth: unable to find a keyring on /etc/ceph/test.client.admin.keyring,/etc/ceph/test.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:(2) No such file or directory Sep 26 13:18:23 mon0 docker[31316]: 2018-09-26 13:18:23 /entrypoint.sh: SUCCESS ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1633563 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-27 17:45:21 +02:00
Guillaume Abrioux	179c4d00d7	rolling_update: ensure pgs_by_state has at least 1 entry Previous commit `c13a3c3` has removed a condition. This commit brings back this condition which is essential to ensure we won't hit a false positive result in the `when` condition for the check PGs task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-25 14:58:54 +00:00
Guillaume Abrioux	c13a3c3492	upgrade: consider all 'active+clean' states as valid pgs In a cluster with a large number of PGs, some of them can be expected to be scrubbing; it's a normal operation. Preventing scrubbing would force setting the noscrub flag before a rolling update, which is a problem because it pauses an important data integrity operation until the end of the rolling upgrade. This commit allows an upgrade even while PGs are scrubbing. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616066 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-25 12:12:06 +00:00
Guillaume Abrioux	57f0b6a476	shrink-osd: follow up on `36fb3cde` - Adds a loop in bash to satisfy the 1:n relation between `osd_hosts` and the different device lists. - Fixes some container names which were using the host hostname instead of the actual container one. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-18 07:27:41 +00:00
Sébastien Han	735e1917db	shrink-osd: purge dedicated devices Once the OSD is destroyed we also have to purge the associated devices; this means purging journal, db, and wal partitions too. This now works for container and non-container. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572933 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-18 07:27:41 +00:00
Guillaume Abrioux	4159326a18	shrink-osd: fix purge osd on containerized deployment `ce1dd8d` introduced the purge osd on containers but it was incorrect. `resolve parent device` and `zap ceph osd disks` tasks must be delegated to their respective OSD nodes. They were run on the ansible node, which means it was trying to resolve parent devices from this node, whereas this should be done on the OSD nodes. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1612095 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-13 18:14:01 +02:00
Sébastien Han	38dc20e74b	purge: only purge /var/lib/ceph content Sometimes /var/lib/ceph is mounted on a device so we won't be able to remove it (device busy), so let's remove its content only. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1615872 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-03 10:51:24 +02:00
Ali Maredia	561ec9203d	infrastructure-playbooks: add comments for lv_vars.yml Add comments telling user that devices used in playbooks must not have GPT/FS/RAID signatures Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-29 21:10:20 +00:00
Ali Maredia	77eb459a88	infrastructure playbooks: remove lv-create error msg remove error message when PV creation fails Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-29 21:10:20 +00:00
Ali Maredia	e1ff438800	infrastructure-playbooks: failure msg for pvcreate Add a message for when PV creation fails. This message alerts users that FS/GPT/RAID signatures could still be on the device and could be the reason for the failure. `wipefs -a $device` needs to be run to fix this issue. Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-28 20:21:42 +00:00
Sébastien Han	2e6e885bb7	rolling_upgrade: set sortbitwise properly Running 'osd set sortbitwise' when we detect a version 12 of Ceph is wrong. When OSD are getting updated, even though the package is updated they won't send their updated version (12) and will stick with 10 if the command is not applied. So we have to check if OSD are sending a version 10 and then run the command to unlock the OSDs. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-21 12:22:32 +00:00
Sébastien Han	77a3a682f3	iscsi group name preserve backward compatibility Recently we renamed the group_name for iscsi to iscsigws where previously it was named iscsi-gws. Existing deployments with a host file section with iscsi-gws must continue to work. This commit adds the old group name for backward compatibility; no error from Ansible should be expected, and if the hostgroup is not found nothing is played. Close: https://bugzilla.redhat.com/show_bug.cgi?id=1619167 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-20 23:52:19 +02:00
Sébastien Han	b738706810	take-over-existing-cluster: do not call var_files We were using var_files long ago when default variables were not in ceph-defaults; now that the role exists this is not needed. Moreover, having these two var files added: - roles/ceph-defaults/defaults/main.yml - group_vars/all.yml will create collisions and override necessary variables. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1555305 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-20 14:47:04 +02:00
Andrew Schoen	04df3f0802	lv-create: use copy instead of the template module The copy module does in fact do variable interpolation so we do not need to use the template module or keep a template in the source. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	131796f275	lv-create: add an example logfile_path config option in lv_vars.yml Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	b0bfc17351	lv-teardown: fail silently if lv_vars.yml is not found This allows user to opt out of using lv_vars.yml and load configuration from other sources. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	8424858b40	lv-teardown: set become: true at the playbook level Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	e43eec57bb	lv-create: fail silently if lv_vars.yml is not found If a user decides not to use the lv_vars.yml file then it should fail silently so that configuration can be picked up from other places. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	fde47be13c	lv-create: set become: true at the playbook level Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	35301b35af	lv-create: use the template module to write log file The copy module will not expand the template and render the variables included, so we must use template. Creating a temp file and using it locally means that you must run the playbook with sudo privileges, which I don't think we want to require. This introduces a logfile_path variable that the user can use to control where the logfile is written to, defaulting to the cwd. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	909b38da82	infrastructure-playbooks/vars/lv_vars.yaml: minor fixes Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	f65f3ea89f	infrastructure-playbooks/lv-create.yml: use tempfile to create logfile Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00


338 Commits (fa8bd10cac7253c83fc1cfb4c00ed6eb1bb47e98)
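
Several of the rolling_update commits above (`2cea33f7fc`, `c13a3c3492`, `179c4d00d7`) revolve around one check: parse the cluster status JSON, tolerate empty output, accept every PG state that starts with `active+clean` (scrubbing included), and refuse to pass when `pgs_by_state` is empty. The playbook itself does this with Jinja2 conditions; the Python below is only a rough sketch of the same logic. The function name is hypothetical, and the `pgmap` / `pgs_by_state` / `state_name` / `count` / `num_pgs` layout is an assumption about the status JSON, not something taken from the commits.

```python
import json


def pgs_all_active_clean(status_stdout: str) -> bool:
    """Hypothetical helper: True when every PG is in an 'active+clean*' state.

    Assumes the status JSON looks like:
    {"pgmap": {"num_pgs": N, "pgs_by_state": [{"state_name": ..., "count": ...}]}}
    """
    # Default empty command output to an empty dict instead of failing to decode.
    status = json.loads(status_stdout or "{}")
    pgmap = status.get("pgmap", {})
    pgs_by_state = pgmap.get("pgs_by_state", [])

    # Require at least one entry, otherwise 0 == 0 would be a false positive.
    if not pgs_by_state:
        return False

    # Count PGs in 'active+clean', 'active+clean+scrubbing', and similar states.
    clean = sum(
        entry["count"]
        for entry in pgs_by_state
        if entry["state_name"].startswith("active+clean")
    )
    return clean == pgmap.get("num_pgs")


if __name__ == "__main__":
    sample = json.dumps({
        "pgmap": {
            "num_pgs": 10,
            "pgs_by_state": [
                {"state_name": "active+clean", "count": 8},
                {"state_name": "active+clean+scrubbing", "count": 2},
            ],
        }
    })
    print(pgs_all_active_clean(sample))  # True
```

Returning `False` on empty output means a caller that retries simply moves on to its next attempt, which matches the intent described in `2cea33f7fc`.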