ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	81de8a8106	remove ceph-agent role and references The ceph-agent role was used only for RHCS 2 (jewel) so it's not useful anymore. The current code will fail on CentOS distribution because the rhscon package is only available on Red Hat with the RHCS 2 repository and this ceph release is supported on stable-3.0 branch. Resolves: #4020 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7503098ca0`)	2019-06-17 14:42:08 -04:00
Mike Christie	0a24078bbb	igw: Fix rolling update service ordering We must stop tcmu-runner after the other rbd-target-* services because they may need to interact with tcmu-runner during shutdown. There is also a bug in some kernels where IO can get stuck in the kernel and by stopping rbd-target-* first we can make sure all IO is flushed. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1659611 Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `d7ef12910e`)	2019-05-10 11:12:50 +02:00
Guillaume Abrioux	f1b4874176	Revert "Revert "shrink_osd: use cv zap by fsid to remove parts/lvs"" This reverts commit `043ee8c158`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-05-10 09:13:10 +02:00
Guillaume Abrioux	043ee8c158	Revert "shrink_osd: use cv zap by fsid to remove parts/lvs" This reverts commit `be59e0b451`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-25 21:27:37 +02:00
Dimitri Savineau	9ff19cc604	rolling_update: restart all ceph-iscsi services Currently only rbd-target-gw service is restarted during an update. We also need to restart tcmu-runner and rbd-target-api services during the ceph iscsi upgrade. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f1048627ea`)	2019-04-24 23:17:41 +00:00
Guillaume Abrioux	c5c354a61a	remove all NBSPs char in stable-3.2 branch this can cause issues, let's replace all of these chars with real spaces. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-04-10 13:27:48 +02:00
Guillaume Abrioux	7136f1734e	purge: fix lvm-batch purge osd `lvm_volumes` and/or `devices` variable(s) can be undefined depending on the scenario chosen. These tasks should be run only if these variables are defined, otherwise it ends up with undefined variable errors. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0180738313`)	2019-04-03 08:48:39 +02:00
Dimitri Savineau	fa6d9c940a	rolling_update: Update systemd unit regex for nvme The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `c8442f3705`)	2019-04-01 15:22:24 +00:00
Dimitri Savineau	8e2cfd9d24	purge-docker-cluster: Remove ceph-osd service The systemd ceph-osd@.service file used for starting the ceph osd containers is used in all osd_scenarios. Currently purging a containerized deployment using the lvm scenario doesn't remove the ceph-osd systemd service. If the next deployment is a non-containerized deployment, the OSDs won't be online because the file is still present and overrides the one from the package. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7cc626b72d`)	2019-04-01 09:10:29 +00:00
Dimitri Savineau	ef9525482b	add-osd.yml: Add become flag for ceph-validate The check_devices task fails if the ceph-validate role isn't executed as a privileged user (Permission denied). failed: [osd0] (item=/dev/sdb) => {"changed": false, "err": "Error: Error opening /dev/sdb: Permission denied\n", "item": "/dev/sdb", "msg": "Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdb -- unit 'MiB' print'", "out": "", "rc": 1} Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `b23c05ae52`)	2019-03-12 14:48:03 +01:00
Guillaume Abrioux	4dd46ec396	add-osd: gather facts in second part of playbook otherwise, it will end up with error like following: ``` FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'"} ``` because facts won't have been gathered. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670663 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a440878533`)	2019-03-04 15:48:44 +00:00
Guillaume Abrioux	06ad7e0b57	purge: fix rbd-mirror group name the default is rbdmirrors in ceph-defaults Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `47ebef374f`)	2019-03-01 22:16:19 +00:00
Guillaume Abrioux	a8467d8f33	purge: fix rbd mirror purge as of `b70d54ac80` the service launched isn't ceph-rbd-mirror@admin.service. it's now `ceph-rbd-mirror@rbd-mirror.{{ ansible_hostname }}` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a915308477`)	2019-03-01 22:16:19 +00:00
Guillaume Abrioux	5470e6fa42	purge: do not remove /var/lib/apt/lists/* removing the content of this directory seems a bit aggressive and causes a redeployment to fail after a purge on debian based distributions. Typical error: ``` fatal: [mon0]: FAILED! => changed=false attempts: 3 msg: No package matching 'ceph' is available ``` The following task will consider the cache is still valid, so apt doesn't refresh it: ``` - name: update apt cache if cache_valid_time has expired apt: update_cache: yes cache_valid_time: 3600 register: result until: result is succeeded ``` since the task installing ceph packages has a `update_cache: no` it fails: ``` - name: install ceph for debian apt: name: "{{ debian_ceph_pkgs \| unique }}" update_cache: no state: "{{ (upgrade_ceph_packages\|bool) \| ternary('latest','present') }}" default_release: "{{ ceph_stable_release_uca \| default('') }}{{ ansible_distribution_release ~ '-backports' if ceph_origin == 'distro' and ceph_use_distro_backports else '' }}" register: result until: result is succeeded ``` /tmp/* isn't specific to ceph as well, so we shouldn't remove everything in this directory. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3849f30f58`)	2019-03-01 22:16:19 +00:00
Guillaume Abrioux	255eab59ac	purge: fix purge of lvm devices using `shell` module seems to be the only way to make this task working on rhel based distribution AND debian based distributions. on ubuntu, using `command` ansible module fails like following (not due to `sudo` usage or not): ``` ok: [osd1] => changed=false cmd: command -v ceph-volume failed_when_result: false msg: '[Errno 2] No such file or directory: ''command'': ''command''' rc: 2 ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `89f77589fa`)	2019-03-01 22:16:19 +00:00
Noah Watkins	be59e0b451	shrink_osd: use cv zap by fsid to remove parts/lvs Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1569413 https://bugzilla.redhat.com/show_bug.cgi?id=1572933 Note: rebased Signed-off-by: Noah Watkins <noahwatkins@gmail.com> (cherry picked from commit `9a43674d2e`)	2019-02-06 00:37:11 +00:00
Noah Watkins	b8c39d7613	Add a ceph-volume aware shrink-osd playbook Signed-off-by: Noah Watkins <nwatkins@redhat.com> (cherry picked from commit `f5dacbf7de`)	2019-01-30 14:58:59 +01:00
Noah Watkins	8f57a95048	Rename ceph-disk version of shrink-osd playbook This will be replaced by a ceph-volume aware version. Signed-off-by: Noah Watkins <nwatkins@redhat.com> (cherry picked from commit `0782cfc546`)	2019-01-30 14:58:59 +01:00
Giulio Fidente	75855b2d58	Preserve rolling_update backward compatibility with ansible < 2.5 Let's enforce the default value for `client_update_batch` to 20 since `ansible_forks` isn't always available. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650184 Signed-off-by: Giulio Fidente <gfidente@redhat.com> (cherry picked from commit `ff8dbe114c`)	2019-01-21 14:28:07 +00:00
Sébastien Han	04d8002614	switch: do not fail on missing key Some people use the switch playbook to perform upgrade so they end up in the same situation as https://bugzilla.redhat.com/show_bug.cgi?id=1650572 This is applying the same fix as `729744c6a8`. We don't want to fail on keys that are not present since they will get created after the mons are updated. They will be created by the task "create potentially missing keys (rbd and rbd-mirror)". Signed-off-by: Sébastien Han <seb@redhat.com>	2019-01-14 18:54:46 +00:00
Guillaume Abrioux	416b503476	introduce new role ceph-facts sometimes we play the whole role `ceph-defaults` just to access the default value of some variables. It means we play the `facts.yml` part in this role while it's not desired. Splitting this role will speedup the playbook. Closes: #3282 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0eb56e36f8`)	2019-01-07 09:14:10 +01:00
Guillaume Abrioux	c3bb76b8e9	purge-container: move facts gathering after ceph-defaults role import This task has to be called after the role `ceph-defaults` has been played, otherwise, `mon_group_name` will never be known. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a12de3e048`)	2019-01-07 09:14:10 +01:00
Guillaume Abrioux	b9bf7c6703	purge-container: fix wrong syntax we want a default value for `mon_group_name`, not for `groups[mon_group_name]`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d0b3cb7f85`)	2019-01-07 09:14:10 +01:00
Guillaume Abrioux	0ff1260fc1	purge-docker: do not call ceph-osd role calling ceph-osd role in purge playbook is not needed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ae7f3d66a6`)	2019-01-07 09:14:10 +01:00
Guillaume Abrioux	c405fd1140	purge: gather monitors facts in OSD purge the OSD part of the purge delegates commands on monitor node, we need to gather monitors facts to know the `ansible_hostname` fact that is used in the `docker_exec_cmd` fact. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `1a4a6ec855`)	2019-01-07 09:14:10 +01:00
Sébastien Han	37ba313d76	purge-container: gather fact before calling ceph-defaults ceph-defaults relies on facts so we must gather facts before running it. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `62111ff53c`)	2019-01-07 09:14:10 +01:00
Sébastien Han	8e83ecfce1	purge-cluster: add support for mon/mgr collocation Recently we introduced the default collocation of mon/mgr without the need of a dedicated mgrs section. This means we have to stop the mgr process on that machine too. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `fc6ebd8ebb`)	2019-01-07 09:14:10 +01:00
Sébastien Han	12d6466582	purge-cluster: remove support for other init system We only support systemd and use the service module anyway. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `3a154fa0ad`)	2019-01-07 09:14:10 +01:00
Sébastien Han	782959f094	purge-docker-cluster: add support for mgr/mon collocation Recently we introduced the collocation of mon and mgr by default, so we don't need to have an explicit mgrs section for this. This means we have to remove the mgr container on the mon machines too. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `325a159415`) # Conflicts: # infrastructure-playbooks/purge-docker-cluster.yml	2019-01-07 09:14:10 +01:00
Sébastien Han	8ce8d580a4	purge-docker-cluste: add a task to check hosts It's useful when running on CI to see what might remain on the machines. So we list all the containers and images. We expect the list to be empty. We fail if we see containers running. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `2bcc00896f`)	2019-01-07 09:14:10 +01:00
Sébastien Han	f37c21a9d0	purge-docker-cluster: add ceph-volume support This commit adds the support for purging clusters that were deployed with ceph-volume. It also separates nicely with a block instruction the work to do when lvm is used or not. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `1751885bc9`)	2019-01-07 09:14:10 +01:00
Sébastien Han	668c7a4db7	fix json data type Json is a type structure which is always typed as a string, where before this we were declaring a dict, which is not a json valid structure. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1663026 Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `896676ee80`)	2019-01-04 12:02:34 +01:00
Guillaume Abrioux	dc02156736	update: do not enforce `serial: 1` on client nodes There is no need to enforce `serial: 1` on client nodes. Let's make it parameterizable by introducing a new extra variable `client_update_batch`, if not filled this will default to `{{ ansible_forks }}`. NOTE: this is only usable as an extra variable passed with `-e client_update_batch=<num>` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650184 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `268f2cef82`)	2019-01-04 11:59:02 +01:00
Andrew Schoen	e55ec6c0f5	purge-cluster: skip tasks that use ceph-volume if it's not installed This will allow the playbook to be idempotent. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1656935 Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `ffd56177e7`)	2018-12-20 14:03:30 +01:00
Guillaume Abrioux	e37a90b5ec	purge: add iscsi support add iscsi support for both non containerized and containerized deployment in purge playbooks. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1651054 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `78116fa6db`)	2018-12-04 18:04:13 +01:00
Ramana Raja	0ec2ac34e3	rolling_update: fail if less than 3 MONs ... for non-containerized deployments as well. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1655470 Signed-off-by: Ramana Raja <rraja@redhat.com> (cherry picked from commit `cb784c601d`)	2018-12-04 16:34:57 +01:00
Sébastien Han	2cea33f7fc	rolling_update: default ceph json output to empty dict So we can avoid the following failure: The conditional check 'hostvars[mon_host]['ansible_hostname'] in (ceph_health_raw.stdout \| from_json)["quorum_names"] or hostvars[mon_host]['ansible_fqdn'] in (ceph_health_raw.stdout \| from_json)["quorum_names"] ' failed. The error was: No JSON object could be decoded We just need to set a default, the next iteration will have a more complete json since the command won't fail. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-29 01:49:05 +00:00
Guillaume Abrioux	292d967d2f	update: fix a typo `hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo. That should be `hostvars[mon_host]['ansible_hostname']` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `7c99b6df6d`)	2018-11-29 01:49:05 +00:00
Guillaume Abrioux	1f4cf61058	rolling_update: refact set_fact `mon_host` each monitor node should select another monitor which isn't itself. Otherwise, one node in the monitor group won't set this fact and causes failure. Typical error: ``` TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] * task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200 Thursday 22 November 2018 14:02:30 +0000 (0:00:07.493) 0:02:50.005 *** fatal: [mon1]: FAILED! => {} MSG: The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2' ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `af78173584`)	2018-11-29 01:49:05 +00:00
Sébastien Han	d4f1f12bd0	rolling_update: create rbd and rbd-mirror keyrings During an upgrade ceph won't create keys that were not existing on the previous version. So after the upgrade of, let's say, Jewel to Luminous, once all the monitors have the new version they should get or create the keys. It's ok to have the task fail, especially for the rbd-mirror key, which only appears in Nautilus. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572 Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `4e267bee4f`)	2018-11-29 01:49:05 +00:00
Sébastien Han	26ea96424c	switch: do not look for devices anymore It's easier to look up a directory instead of the block devices, especially because ceph-volume and ceph-disk have a different way to handle devices. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `c14f9b78ff`)	2018-11-29 00:31:47 +01:00
Sébastien Han	57ac7b94c0	switch: disable all ceph units Prior to this commit we were only disabling ceph-osd units, but forgot the ceph.target which is controlling everything and will restart the ceph-osd units at each reboot. Now that everything gets disabled there won't be any conflicts between the old non-container and the new container units. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `cd56dad9fa`)	2018-11-29 00:31:47 +01:00
Sébastien Han	8d0379b4d9	switch: do not mask systemd unit If we mask it we won't be able to start the OSD container since now the osd container use the osd ID as a name such as: ceph-osd@0 Fixes the error: Failed to execute operation: Cannot send after transport endpoint shutdown Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit `fe1d09925a`)	2018-11-29 00:31:47 +01:00
Guillaume Abrioux	b72d806f4c	mgr: fix mgr keyring error on rolling_update when upgrading from RHCS 2.5 to 3.2, it fails because the task `create ceph mgr keyring(s) when mon is containerized` has a when condition `inventory_hostname == groups[mon_group_name]\|last`. First, this is incorrect because `inventory_hostname` is referring to a mgr node, it means this condition would have never been satisfied. Then, this condition + `serial: 1` makes the mgr keyring creation skipped on the first node. Further, the `ceph-mgr` role tries to copy the mgr keyring (it's not aware we are running `serial: 1`) this leads to a failure like the following: ``` TASK [ceph-mgr : copy ceph keyring(s) if needed] *** task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10 Tuesday 27 November 2018 12:03:34 +0000 (0:00:00.296) 0:11:01.290 **** An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring' failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"} ``` The ceph_key module is idempotent, so there is no need to have such a condition. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `73287f91bc`)	2018-11-28 23:11:46 +01:00
Rishabh Dave	a74f4204cd	remove configuration files for ceph packages on ubuntu clusters For apt-get, purge command needs to be used, instead of remove command, to remove related configuration files. Otherwise, packages might be shown as installed while running dpkg command even after removing them. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1640061 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `640cad3fd8`)	2018-11-09 16:50:25 +01:00
Mike Christie	77de54025b	igw: stop tcmu-runner on iscsi purge When the iscsi purge playbook is run we stop the gw and api daemons but not tcmu-runner which I forgot on the previous PR. Fixes Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1621255 Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `b523a44a1a`)	2018-11-09 16:50:04 +01:00
Ali Maredia	219fa8f919	infrastructure playbooks: ensure nvme_device is defined in lv-create.yml Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-10-29 08:41:42 +00:00
Mike Christie	0904860032	igw: stop daemons on purge all calls When purging the entire igw config (lio and rbd) stop and disable the api and gw daemons. Fixes Red Hat BZ https://bugzilla.redhat.com/show_bug.cgi?id=1621255 Signed-off-by: Mike Christie <mchristi@redhat.com>	2018-10-25 12:59:18 +02:00
Sébastien Han	44d0da0dd4	rolling_update: fix upgrade when using fqdn Clusters that were deployed using 'mon_use_fqdn' have a different unit name, so during the upgrade this must be used otherwise the upgrade will fail, looking for a unit that does not exist. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1597516 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-19 13:06:56 +00:00
Guillaume Abrioux	b8418ebd17	add-osds: followup on `3632b26` Three fixes: - fix a typo in vagrant_variables that causes a networking issue for containerized scenario. - add containerized_deployment: true - remove a useless block of code: the fact docker_exec_cmd is set in ceph-defaults which is played right after. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-17 17:07:25 +02:00


374 Commits (27aad7347154f1ef1c39832c09a736de6c3241aa)