ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	a12de3e048	purge-container: move facts gathering after ceph-defaults role import This task has to be called after the role `ceph-defaults` has been played, otherwise, `mon_group_name` will never be known. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-12 16:50:24 +00:00
Guillaume Abrioux	d0b3cb7f85	purge-container: fix wrong syntax we want a default value for `mon_group_name`, not for `groups[mon_group_name]`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-12 11:33:57 +01:00
Guillaume Abrioux	0eb56e36f8	introduce new role ceph-facts sometimes we play the whole role `ceph-defaults` just to access the default value of some variables. It means we play the `facts.yml` part in this role while it's not desired. Splitting this role will speedup the playbook. Closes: #3282 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-12 11:18:01 +01:00
Guillaume Abrioux	ae7f3d66a6	purge-docker: do not call ceph-osd role calling ceph-osd role in purge playbook is not needed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-11 09:59:25 +01:00
Guillaume Abrioux	1a4a6ec855	purge: gather monitors facts in OSD purge the OSD part of the purge delegates commands on monitor node, we need to gather monitors facts to know the `ansible_hostname` fact that is used in the `docker_exec_cmd` fact. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	62111ff53c	purge-container: gather fact before calling ceph-defaults ceph-defaults relies on facts so we must gather facts before running it. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	fc6ebd8ebb	purge-cluster: add support for mon/mgr collocation Recently we introduced the default collocation of mon/mgr without the need of a dedicated mgrs section. This means we have to stop the mgr process on that machine too. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	3a154fa0ad	purge-cluster: remove support for other init system We only support systemd and use the service module anyway. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	325a159415	purge-docker-cluster: add support for mgr/mon collocation Recently we introduced the collocation of mon and mgr by default, so we don't need to have an explicit mgrs section for this. This means we have to remove the mgr container on the mon machines too. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	2bcc00896f	purge-docker-cluster: add a task to check hosts It's useful when running on CI to see what might remain on the machines. So we list all the containers and images. We expect the list to be empty. We fail if we see containers running. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	1751885bc9	purge-docker-cluster: add ceph-volume support This commit adds support for purging clusters that were deployed with ceph-volume. It also cleanly separates, with a block instruction, the work to do depending on whether lvm is used. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Rishabh Dave	2fb12ae554	use pre_tasks and post_tasks when necessary Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-12-05 08:17:10 +00:00
Rishabh Dave	e4f0af2b78	don't use private option for import_role Since sharing variables amongst roles has been made default since Ansible 2.6, private option has been deprecated; so stop using it. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-12-04 23:45:59 +00:00
Ramana Raja	cb784c601d	rolling_update: fail if less than 3 MONs ... for non-containerized deployments as well. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1655470 Signed-off-by: Ramana Raja <rraja@redhat.com>	2018-12-04 14:28:49 +00:00
Sébastien Han	896676ee80	fix json data type Json is a type structure which is always typed as a string, where before this we were declaring a dict, which is not a json valid structure. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-04 12:34:54 +01:00
Guillaume Abrioux	78116fa6db	purge: add iscsi support add iscsi support for both non containerized and containerized deployment in purge playbooks. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1651054 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-03 17:35:21 +01:00
Sébastien Han	1c760904b0	site: collocated mon and mgr by default This will speed up the deployment and also deploy mon and mgr collocated just as recommended. This won't prevent you from adding more dedicated machines for mgr if needed. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-03 14:39:43 +01:00
Sébastien Han	bb7bfca113	rolling-update: remove old condition This failure condition was only valid at the time where clusters didn't have ceph-mgr activated. Now since we collocate the ceph-mgr with the mon by default, if the daemon wasn't present it will be created during the upgrade. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-03 14:39:43 +01:00
Guillaume Abrioux	a952122c38	rolling_update: create missing keyring only on running mon try to create the potentially missing keys only on monitors that are actually running. The current node being played is stopped before this task. By the way, delegating the command on all nodes but the current node being played ensures that the generated keys will be present on all monitors. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-29 16:40:46 +00:00
Sébastien Han	61fb6972ec	rolling_update: default ceph json output to empty dict So we can avoid the following failure: The conditional check 'hostvars[mon_host]['ansible_hostname'] in (ceph_health_raw.stdout | from_json)["quorum_names"] or hostvars[mon_host]['ansible_fqdn'] in (ceph_health_raw.stdout | from_json)["quorum_names"] ' failed. The error was: No JSON object could be decoded We just need to set a default, the next iteration will have a more complete json since the command won't fail. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-29 10:46:15 +00:00
Guillaume Abrioux	73287f91bc	mgr: fix mgr keyring error on rolling_update when upgrading from RHCS 2.5 to 3.2, it fails because the task `create ceph mgr keyring(s) when mon is containerized` has a when condition `inventory_hostname == groups[mon_group_name]\|last`. First, this is incorrect because `inventory_hostname` is referring to a mgr node, it means this condition would have never been satisfied. Then, this condition + `serial: 1` makes the mgr keyring creating skipped on the first node. Further, the `ceph-mgr` role tries to copy the mgr keyring (it's not aware we are running `serial: 1`) this leads to a failure like the following: ``` TASK [ceph-mgr : copy ceph keyring(s) if needed] ************************************************************************************************************************************************************************************************************************************************************************* task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10 Tuesday 27 November 2018 12:03:34 +0000 (0:00:00.296) 0:11:01.290 **** An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring' failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"} ``` The ceph_key module is idempotent, so there is no need to have such a condition. 
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-27 18:19:56 +01:00
Sébastien Han	e5d5dffeb5	shrink-osd: add missing CEPH_BINARY We need to add the right binary to do the docker exec. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	4f57e44f9c	defaults: declare container_binary Always declare container_binary and assign it a correct value. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	49e0e19056	rolling_update: update ceph_key task for container Use the new way to create keys on containerized env as introduced by: 1098b71bda90db3dad19ac179f0ba900ccb0f953 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	2814d36c93	infra playbooks: use the right container binary Use podman or docker, depending on which is available. podman will be prioritized over docker if present. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Guillaume Abrioux	7c99b6df6d	update: fix a typo `hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo. That should be `hostvars[mon_host]['ansible_hostname']` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-26 18:22:20 +01:00
Guillaume Abrioux	af78173584	rolling_update: refact set_fact `mon_host` each monitor node should select another monitor which isn't itself. Otherwise, one node in the monitor group won't set this fact and causes failure. Typical error: ``` TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] * task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200 Thursday 22 November 2018 14:02:30 +0000 (0:00:07.493) 0:02:50.005 *** fatal: [mon1]: FAILED! => {} MSG: The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2' ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-26 18:22:20 +01:00
Sébastien Han	4e267bee4f	rolling_update: create rbd and rbd-mirror keyrings During an upgrade ceph won't create keys that did not exist in the previous version. So after the upgrade of, let's say, Jewel to Luminous, once all the monitors have the new version they should get or create the keys. It's ok to have the task fail, especially for the rbd-mirror key, which only appears in Nautilus. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-26 18:22:20 +01:00
Sébastien Han	c14f9b78ff	switch: do not look for devices anymore It's easier to look up a directory instead of the block devices, especially because ceph-volume and ceph-disk have different ways of handling devices. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-23 07:56:23 +00:00
Sébastien Han	cd56dad9fa	switch: disable all ceph units Prior to this commit we were only disabling ceph-osd units, but forgot the ceph.target which is controlling everything and will restart the ceph-osd units at each reboot. Now that everything gets disabled there won't be any conflicts between the old non-container and the new container units. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-23 07:56:23 +00:00
Sébastien Han	fe1d09925a	switch: do not mask systemd unit If we mask it we won't be able to start the OSD container since now the osd container use the osd ID as a name such as: ceph-osd@0 Fixes the error: Failed to execute operation: Cannot send after transport endpoint shutdown Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-23 07:56:23 +00:00
Guillaume Abrioux	c783bc70da	docker-common: rename role rename `ceph-docker-common` role to `ceph-container-common` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-12 10:51:48 +01:00
Rishabh Dave	90f222f6a5	add quotes around package names added in `da6f384` Add quotes around package names added in the commit `da6f384223` so that the difference between the Ansible variables and package names is clear. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-11-09 12:59:08 +00:00
Rishabh Dave	d72340abbe	pass the list of packages to package management modules Instead of looping over a list of packages or repeating the task separately for different packages, pass the list of packages to the task performing package management. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-11-09 12:59:08 +00:00
Sébastien Han	53910de43b	ceph_key: add fetch_initial_keys capability This is needed for Nautilus since the ceph-create-keys script goes away. (https://github.com/ceph/ceph/pull/21305) Now the module, if called with 'state: fetch_initial_keys', will look up keys generated by the monitor and write them to the filesystem in the right location (/etc/ceph and /var/lib/ceph/bootstrap*). This is not applicable to containers since keys are generated by the container only. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-09 12:45:52 +01:00
Mike Christie	b523a44a1a	igw: stop tcmu-runner on iscsi purge When the iscsi purge playbook is run we stop the gw and api daemons but not tcmu-runner which I forgot on the previous PR. Fixes Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1621255 Signed-off-by: Mike Christie <mchristi@redhat.com>	2018-11-09 10:02:16 +01:00
Noah Watkins	b848d2be4c	don't use "role" or "roles" to include roles see `3f62fc585f` Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-11-08 17:45:37 +01:00
Noah Watkins	9c47950961	Fix comments in shrink-osd-ceph-disk playbook Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-11-08 17:45:37 +01:00
Noah Watkins	f5dacbf7de	Add a ceph-volume aware shrink-osd playbook Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-11-08 17:45:37 +01:00
Noah Watkins	0782cfc546	Rename ceph-disk version of shrink-osd playbook This will be replaced by a ceph-volume aware version. Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-11-08 17:45:37 +01:00
Sébastien Han	b82995df58	Revert "ceph_key: add fetch_initial_keys capability" This reverts commit `17883e09ba`.	2018-11-08 13:34:47 +00:00
Sébastien Han	17883e09ba	ceph_key: add fetch_initial_keys capability This is needed for Nautilus since the ceph-create-keys script goes away. (https://github.com/ceph/ceph/pull/21305) Now the module, if called with 'state: fetch_initial_keys', will look up keys generated by the monitor and write them to the filesystem in the right location (/etc/ceph and /var/lib/ceph/bootstrap*). This is not applicable to containers since keys are generated by the container only. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-08 13:32:18 +00:00
Rishabh Dave	da6f384223	don't loop over a task using package management modules For tasks using (Ansible) modules for package management utilities, pass the list of packages to be installed instead of repeating the task for each package. Using the latter manner of installing a list of packages leads to a deprecation warning by ansible-playbook command. Fixes: https://github.com/ceph/ceph-ansible/issues/3293 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-11-08 08:38:10 +00:00
Rishabh Dave	640cad3fd8	remove configuration files for ceph packages on ubuntu clusters For apt-get, purge command needs to be used, instead of remove command, to remove related configuration files. Otherwise, packages might be shown as installed while running dpkg command even after removing them. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1640061 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-11-07 15:52:53 +01:00
Guillaume Abrioux	f7d4651186	playbook: remove jinja syntax in when statement this syntax is deprecated Closes: #3281 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-31 13:45:41 +01:00
Rishabh Dave	3f62fc585f	don't use "role" or "roles" to include roles Since import_role and include_role are more readable, explicit (about the nature of inclusion) and flexible (allowing inclusion to be placed anywhere) amongst the tasks, use them instead of using roles or role keyword. Besides, these keywords also allow more arguments. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-10-31 09:38:59 +01:00
Rishabh Dave	8edbda96df	use blocks directives to group tasks Using block directives simplifies the playbooks and makes them more readable. Fixes: https://github.com/ceph/ceph-ansible/issues/2835 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-10-31 09:37:43 +01:00
Guillaume Abrioux	d8d3e55006	remove restapi role As of `mimic`, restapi is no longer available because of manager daemon. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:19:13 +01:00
Ali Maredia	219fa8f919	infrastructure playbooks: ensure nvme_device is defined in lv-create.yml Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-10-29 08:41:42 +00:00
Mike Christie	0904860032	igw: stop daemons on purge all calls When purging the entire igw config (lio and rbd), stop and disable the api and gw daemons. Fixes Red Hat BZ https://bugzilla.redhat.com/show_bug.cgi?id=1621255 Signed-off-by: Mike Christie <mchristi@redhat.com>	2018-10-25 12:59:18 +02:00
Sébastien Han	44d0da0dd4	rolling_update: fix upgrade when using fqdn Clusters that were deployed using 'mon_use_fqdn' have a different unit name, so during the upgrade this must be used otherwise the upgrade will fail, looking for a unit that does not exist. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1597516 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-19 13:06:56 +00:00
Guillaume Abrioux	b8418ebd17	add-osds: followup on `3632b26` Three fixes: - fix a typo in vagrant_variables that cause a networking issue for containerized scenario. - add containerized_deployment: true - remove a useless block of code: the fact docker_exec_cmd is set in ceph-defaults which is played right after. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-17 17:07:25 +02:00
Sébastien Han	d6e79044ef	infra: add a gather-ceph-logs.yml playbook Add a gather-ceph-logs.yml which will log onto all the machines from your inventory and will gather ceph logs. This is not intended to work on containerized environments since the logs are stored in journald. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582280 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-17 13:52:19 +00:00
Sébastien Han	fbd878c8d5	infra: rename osd-configure to add-osd and improve it The playbook has various improvements: * run ceph-validate role before doing anything * run ceph-fetch-keys only on the first monitor of the inventory list * set noup flag so PGs get distributed once all the new OSDs have been added to the cluster and unset it when they are up and running Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1624962 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-17 11:26:11 +00:00
Guillaume Abrioux	40b7747af7	remove jewel support As of now, we should no longer support Jewel in ceph-ansible. The latest ceph-ansible release supporting Jewel is `stable-3.1`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-12 23:38:17 +00:00
Sébastien Han	9fccffa1ca	switch: allow switch big clusters (more than 99 osds) The current regex had a limitation of 99 OSDs, now this limit has been removed and regardless of the number of OSDs they will all be collected. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1630430 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:35:30 -04:00
Noah Watkins	8dcc8d1434	Stringify ceph_docker_image_tag This could be a numeric input, but is treated like a string leading to runtime errors. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1635823 Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Noah Watkins	306e308f13	Avoid using tests as filter Fixes the deprecation warning: [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result|search` use `result is search`. Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Guillaume Abrioux	79bd06ad28	rolling_update: add ceph-handler role since the introduction of ceph-handler, it has to be added in rolling_update playbook as well Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-05 13:48:04 +00:00
Rishabh Dave	b5d2ea269f	don't use "static" field while including tasks Instead used "import_tasks" and "include_tasks" to tell whether tasks must be included statically or dynamically. Fixes: https://github.com/ceph/ceph-ansible/issues/2998 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-10-04 07:44:28 +00:00
Sébastien Han	bae0f41705	switch: copy initial mon keyring We need to copy this key into /etc/ceph so when ceph-docker-common runs it can fetch it to the ansible server. Previously the task wasn't failing because `fail_on_missing` was False before 2.5; now it's True, hence the failure. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-03 13:58:53 +00:00
Guillaume Abrioux	03e76af7b4	switch: add missing call to ceph-handler role Add the missing call to the ceph-handler role, otherwise we can't reference variables registered by ceph-handler from other roles. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-03 13:58:53 +00:00
Guillaume Abrioux	54b02fe187	switch: support migration when cluster is scrubbing Similar to `c13a3c3` we must allow scrubbing when running this playbook. In clusters with a large number of PGs, some of them can be expected to be scrubbing; it's a normal operation. Preventing scrubbing forces setting the noscrub flag. This commit allows switching from a non-containerized to a containerized environment even while PGs are scrubbing. Closes: #3182 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-03 13:58:53 +00:00
Andrew Schoen	9747f3dbd5	purge-cluster: zap devices used with the lvm scenario Fixes: https://github.com/ceph/ceph-ansible/issues/3156 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-09-28 14:49:56 +02:00
wumingqiao	5da71e1ca1	purge-cluster: recursively remove ceph-related files, symlinks and directories under /etc/systemd/system. fix: https://github.com/ceph/ceph-ansible/issues/3166 Signed-off-by: wumingqiao <wumingqiao@beyondcent.com>	2018-09-28 14:49:22 +02:00
Rishabh Dave	380168dadc	don't use "include" to include tasks Use "import_tasks" or "include_tasks" instead. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-09-27 17:53:40 +02:00
Guillaume Abrioux	144c92b21f	purge: actually remove of /var/lib/ceph/* `38dc20e74b` introduced a bug in the purge playbooks because using `` in `command` module doesn't work. `/var/lib/ceph/` files are not purged it means there is a leftover. When trying to redeploy a cluster, it failed because monitor daemon was detecting existing keyring, therefore, it assumed a cluster already existed. Typical error (from container output): ``` Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16 /entrypoint.sh: Existing mon, trying to rejoin cluster... Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16.9323937f15b0d74700 -1 auth: unable to find a keyring on /etc/ceph/test.client.admin.keyring,/etc/ceph/test.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:(2) No such file or directory Sep 26 13:18:23 mon0 docker[31316]: 2018-09-26 13:18:23 /entrypoint.sh: SUCCESS ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1633563 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-27 17:45:21 +02:00
Guillaume Abrioux	179c4d00d7	rolling_update: ensure pgs_by_state has at least 1 entry Previous commit `c13a3c3` has removed a condition. This commit brings back this condition which is essential to ensure we won't hit a false positive result in the `when` condition for the check PGs task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-25 14:58:54 +00:00
Guillaume Abrioux	c13a3c3492	upgrade: consider all 'active+clean' states as valid pgs In clusters with a large number of PGs, some of them can be expected to be scrubbing; it's a normal operation. Preventing scrubbing forces setting the noscrub flag before a rolling update, which is a problem because it pauses an important data integrity operation until the end of the rolling upgrade. This commit allows an upgrade even while PGs are scrubbing. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616066 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-25 12:12:06 +00:00
Guillaume Abrioux	57f0b6a476	shrink-osd: follow up on `36fb3cde` - Adds loop in bash to satisfy the 1:n relation between `osd_hosts` and the different device lists. - Fixes some container name which were using the host hostname instead of the actual container one. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-18 07:27:41 +00:00
Sébastien Han	735e1917db	shrink-osd: purge dedicated devices Once the OSD is destroyed we also have to purge the associated devices, this means purging journal, db , wal partitions too. This now works for container and non-container. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572933 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-18 07:27:41 +00:00
Guillaume Abrioux	4159326a18	shrink-osd: fix purge osd on containerized deployment `ce1dd8d` introduced the purge osd on containers but it was incorrect. `resolve parent device` and `zap ceph osd disks` tasks must be delegated to their respective OSD nodes. Indeed, they were run on the ansible node, it means it was trying to resolve parent devices from this node where it should be done on OSD nodes. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1612095 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-13 18:14:01 +02:00
Sébastien Han	38dc20e74b	purge: only purge /var/lib/ceph content Sometime /var/lib/ceph is mounted on a device so we won't be able to remove it (device busy) so let's remove its content only. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1615872 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-03 10:51:24 +02:00
Ali Maredia	561ec9203d	infrastructure-playbooks: add comments for lv_vars.yml Add comments telling user that devices used in playbooks must not have GPT/FS/RAID signatures Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-29 21:10:20 +00:00
Ali Maredia	77eb459a88	infrastructure playbooks: remove lv-create error msg remove error message when PV creation fails Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-29 21:10:20 +00:00
Ali Maredia	e1ff438800	infrastructure-playbooks: failure msg for pvcreate Add a message for when PV creation fails. This message alerts users that FS/GPT/RAID signatures could still be on the device and be the reason for the failure. `wipefs -a $device` needs to be run to fix this issue. Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-28 20:21:42 +00:00
Sébastien Han	2e6e885bb7	rolling_upgrade: set sortbitwise properly Running 'osd set sortbitwise' when we detect a version 12 of Ceph is wrong. When OSD are getting updated, even though the package is updated they won't send their updated version (12) and will stick with 10 if the command is not applied. So we have to check if OSD are sending a version 10 and then run the command to unlock the OSDs. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-21 12:22:32 +00:00
Sébastien Han	77a3a682f3	iscsi group name preserve backward compatibility Recently we renamed the iscsi group_name to iscsigws, where previously it was named iscsi-gws. Existing deployments with a host file section named iscsi-gws must continue to work. This commit adds the old group name for backward compatibility; no error from Ansible should be expected: if the host group is not found, nothing is played. Close: https://bugzilla.redhat.com/show_bug.cgi?id=1619167 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-20 23:52:19 +02:00
Sébastien Han	b738706810	take-over-existing-cluster: do not call var_files We were using var_files long ago when default variables were not in ceph-defaults; now that the role exists this is not needed. Moreover, having these two var files added: - roles/ceph-defaults/defaults/main.yml - group_vars/all.yml will create collisions and override necessary variables. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1555305 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-20 14:47:04 +02:00
Andrew Schoen	04df3f0802	lv-create: use copy instead of the template module The copy module does in fact do variable interpolation so we do not need to use the template module or keep a template in the source. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	131796f275	lv-create: add an example logfile_path config option in lv_vars.yml Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	b0bfc17351	lv-teardown: fail silently if lv_vars.yml is not found This allows user to opt out of using lv_vars.yml and load configuration from other sources. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	8424858b40	lv-teardown: set become: true at the playbook level Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	e43eec57bb	lv-create: fail silently if lv_vars.yml is not found If a user decides not to use the lv_vars.yml file then it should fail silently so that configuration can be picked up from other places. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	fde47be13c	lv-create: set become: true at the playbook level Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	35301b35af	lv-create: use the template module to write log file The copy module will not expand the template and render the variables included, so we must use template. Creating a temp file and using it locally means that you must run the playbook with sudo privileges, which I don't think we want to require. This introduces a logfile_path variable that the user can use to control where the logfile is written to, defaulting to the cwd. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	909b38da82	infrastructure-playbooks/vars/lv_vars.yaml: minor fixes Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	f65f3ea89f	infrastructure-playbooks/lv-create.yml: use tempfile to create logfile Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	65fdad0723	infrastructure-playbooks/lv-create.yml: add lvm_volumes to suggested paste Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	50a6d8141c	infrastructure-playbooks/lv-create.yml: copy without using a template file Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	186c4e11c7	infrastructure-playbooks/lv-create.yml: don't use action to copy Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	9d43806df9	infrastructure-playbooks: standardize variable usage with a space after brackets Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	e0293de3e7	vars/lv_vars.yaml: remove journal_device Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Ali Maredia	1f018d8612	infrastructure-playbooks: playbooks for creating LVs for bucket indexes and journals These playbooks create and tear down logical volumes for OSD data on HDDs and for a bucket index and journals on 1 NVMe device. Users should follow the guidelines set in var/lv_vars.yaml After the lv-create.yml playbook is run, output is sent to /tmp/logfile.txt for copy and paste into osds.yml Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-16 16:38:23 +02:00
Sébastien Han	dad10e8f3f	rolling_update: register container osd units Before running the upgrade, let's call systemd to collect unit names instead of relying on the device list. This is more accurate and fixes the osd_auto_discovery scenario too. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-16 11:13:12 +02:00
Jeffrey Zhang	85cc61a6d9	Use /var/lib/ceph/osd folder to filter osd mount point In some cases, the user may mount a partition on /var/lib/ceph; unmounting it will fail, and there is no need to do so anyway. Signed-off-by: Jeffrey Zhang <zhang.lei.fly@gmail.com>	2018-08-14 13:00:24 +00:00
Sébastien Han	b3266c5be2	rolling_update: set osd sortbitwise upgrade RHCS 2 -> RHCS 3 will fail if cluster has still set sortnibblewise, it stay stuck on "TASK [waiting for clean pgs...]" as RHCS 3 osds will not start if nibblewise is set. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-24 17:19:02 +02:00
Sébastien Han	ce1dd8d2b3	shrink-osd: purge osd on containerized deployment Prior to this commit we were only stopping the container, but now we also purge the devices. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572933 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-18 14:26:22 +00:00
Guillaume Abrioux	d0746e0858	common: switch from docker module to docker_container As of ansible 2.4, `docker` module has been removed (was deprecated since ansible 2.1). We must switch to `docker_container` instead. See: https://docs.ansible.com/ansible/latest/modules/docker_module.html#docker-module Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-10 20:08:07 +00:00
Vishal Kanaujia	44d514850a	Rolling upgrades: Migrate to ceph-key module This change moves ceph-mgr upgrades to using ceph-key library. Fixes: #2758 Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>	2018-07-03 18:22:14 +02:00
Sébastien Han	20c8065e48	ceph-iscsi: rename group iscsi_gws Let's try to avoid using dashes as testinfra needs to be able to read the groups. Typically, with iscsi-gws we can't add a marker for these iscsi nodes, using an underscore fixes the issue. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-08 10:21:54 +02:00
Guillaume Abrioux	232a16d77f	rolling_update: fix facts gathering delegation this is kind of follow up on what has been made in #2560. See #2560 and #2553 for details. Closes: #2708 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-06 16:36:30 +08:00
Vishal Kanaujia	08d9432454	Rolling upgrades should use norebalance flag for OSDs The rolling upgrades playbook should have norebalance flag set for OSDs upgrades to wait only for recovery. Fixes: #2657 Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>	2018-06-04 10:59:01 +02:00
Sébastien Han	e91648a7af	rolling_update: add role ceph-iscsi-gw Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575829 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-26 02:38:47 -07:00
Paul Cuzner	2890b57cfc	Add privilege escalation to iscsi purge tasks Without the escalation, invocation from non-root users will fail when accessing the rados config object, or when attempting to log to /var/log Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1549004 Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-05-25 03:50:24 -07:00
Sébastien Han	da5b104098	rolling_update: fix get fsid for containers When running ansible2.4-update_docker_cluster there is an issue on the "get current fsid" task. The current task only works for non-containerized deployment but will run all the time (even for containerized). This currently results in the following error: TASK [get current fsid] ****************************************************** task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-luminous-ansible2.4-update_docker_cluster/rolling_update.yml:214 Tuesday 22 May 2018 22:48:32 +0000 (0:00:02.615) 0:11:01.035 ********* fatal: [mgr0 -> mon0]: FAILED! => { "changed": true, "cmd": [ "ceph", "--cluster", "test", "fsid" ], "delta": "0:05:00.260674", "end": "2018-05-22 22:53:34.555743", "rc": 1, "start": "2018-05-22 22:48:34.295069" } STDERR: 2018-05-22 22:48:34.495651 7f89482c6700 0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000 2018-05-22 22:48:34.495684 7f89482c6700 0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).fault This is not really representative on the real error since the 'ceph' cli is available on that machine. On other environments we will have something like "command not found: ceph". Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-23 04:44:12 +02:00
Guillaume Abrioux	9801bde4d4	purge_cluster: fix dmcrypt purge dmcrypt devices aren't closed properly, therefore, it may fail when trying to redeploy after a purge. Typical errors: ``` ceph-disk: Cannot discover filesystem type: device /dev/sdb1: Command '/sbin/blkid' returned non-zero exit status 2 ``` ``` ceph-disk: Error: unable to read dm-crypt key: /var/lib/ceph/osd-lockbox/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf: /etc/ceph/dmcrypt-keys/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf.luks.key ``` Closing properly dmcrypt devices allows to redeploy without error. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-21 08:23:10 +02:00
Guillaume Abrioux	415dc0a29b	take-over: fix bug when trying to override variable A customer has been facing an issue when trying to override `monitor_interface` in inventory host file. In his use case, all nodes had the same interface for `monitor_interface` name except one. Therefore, they tried to override this variable for that node in the inventory host file but the take-over-existing-cluster playbook was failing when trying to generate the new ceph.conf file because of undefined variable. Typical error: ``` fatal: [srvcto103cnodep01]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute u'ansible_bond0.15'"} ``` Including variables like this `include_vars: group_vars/all.yml` prevent us from overriding anything in inventory host file because it overwrites everything you would have defined in inventory. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575915 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-18 10:10:08 +02:00
Sébastien Han	49a4712485	switch: disable ceph-disk units During the transition from jewel non-container to container old ceph units are disabled. ceph-disk can still remain in some cases and will appear as 'loaded failed', this is not a problem although operators might not like to see these units failing. That's why we remove them if we find them. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1577846 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-17 08:48:28 +02:00
Guillaume Abrioux	a9247c4de7	purge_cluster: wipe all partitions In order to ensure there is no leftover after having purged a cluster, we must wipe all partitions properly. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-17 08:37:17 +02:00
Guillaume Abrioux	9cad113e2f	purge_cluster: fix bug when building device list there is some leftover on devices when purging osds because of a invalid device list construction. typical error: ``` changed: [osd3] => (item=/dev/sda sda1) => { "changed": true, "cmd": "# if the disk passed is a raw device AND the boot system disk\n if parted -s \"/dev/sda sda1\" print \| grep -sq boot; then\n echo \"Looks like /dev/sda sda1 has a boot partition,\"\n echo \"if you want to delete specific partitions point to the partition instead of the raw device\"\n echo \"Do not use your system disk!\"\n exit 1\n fi\n echo sgdisk -Z \"/dev/sda sda1\"\n echo dd if=/dev/zero of=\"/dev/sda sda1\" bs=1M count=200\n echo udevadm settle --timeout=600", "delta": "0:00:00.015188", "end": "2018-05-16 12:41:40.408597", "item": "/dev/sda sda1", "rc": 0, "start": "2018-05-16 12:41:40.393409" } STDOUT: sgdisk -Z /dev/sda sda1 dd if=/dev/zero of=/dev/sda sda1 bs=1M count=200 udevadm settle --timeout=600 STDERR: Error: Could not stat device /dev/sda sda1 - No such file or directory. ``` the devices list in the task `resolve parent device` isn't built properly because the command used to resolve the parent device doesn't return the expected output eg: ``` changed: [osd3] => (item=/dev/sda1) => { "changed": true, "cmd": "echo /dev/$(lsblk -no pkname \"/dev/sda1\")", "delta": "0:00:00.013634", "end": "2018-05-16 12:41:09.068166", "item": "/dev/sda1", "rc": 0, "start": "2018-05-16 12:41:09.054532" } STDOUT: /dev/sda sda1 ``` For instance, it will result with a devices list like: `['/dev/sda sda1', '/dev/sdb', '/dev/sdc sdc1']` where we expect to have: `['/dev/sda', '/dev/sdb', '/dev/sdc']` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-17 08:37:17 +02:00
Sébastien Han	d80a871a07	rolling_update: move osd flag section During a minor update from a jewel to a higher jewel version (10.2.9 to 10.2.10 for example) osd flags don't get applied because they were done in the mgr section which is skipped in jewel since this daemon does not exist. Moving the set flag section after all the mons have been updated solves that problem. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1548071 Co-authored-by: Tomas Petr <tpetr@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-17 08:17:16 +02:00
Guillaume Abrioux	1b4c3f292d	rolling_update: fix dest path for mgr keys fetching the role `ceph-mgr` that is played later in the playbook fails because the destination path for the fetched keys is wrong. This patch fixes the destination path used in the task `fetch ceph mgr key(s)` so there is no mismatch. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-15 19:30:34 +02:00
Guillaume Abrioux	3b89f1bfb1	rolling_update: get fsid in mgr pre_task {{ fsid }} points to {{ cluster_uuid.stdout }} which is not defined in this part of the rolling_update playbook. Since we need to call {{ fsid }} we must get the fsid and register it to `cluster_uuid`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-15 09:01:42 +02:00
Sébastien Han	52fc8a0385	rolling_update: move mgr key creation Until all the mons have been updated to Luminous, there is no way to create a key. So we should do the key creation in the mon role only if we are not part of an update. If we are then the key creation is done after the mons upgrade to Luminous. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-15 09:01:42 +02:00
Guillaume Abrioux	adeecc51f8	switch: fix ceph_uid fact for osd In addition to b324c17 this commit fixes the ceph uid for osd role in the switch from non containerized to containerized playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-30 08:15:18 +02:00
Sébastien Han	5fa92804f9	switch: resolve device path so we can umount the osd data dir If we don't do this, umounting devices declared like this /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001 will fail like: umount: /dev/disk/by-id/ata-QEMU_HARDDISK_QM000011: mountpoint not found Since we append '1' (partition 1), this won't work. So we need to resolve the link to get something like /dev/sdb and then append 1 to get /dev/sdb1 Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-30 08:15:18 +02:00
Sébastien Han	767abb5de0	switch: fix ceph_uid fact Latest is now centos not ubuntu anymore so the condition was wrong. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-30 08:15:18 +02:00
Sébastien Han	85732d11b9	mon/client: remove acl code Applying ACL on the keyrings is not used anymore so let's remove this code. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-23 18:34:58 +02:00
Sébastien Han	66c1ea8cd5	shrink-osd: ability to shrink NVMe drives Now if the service name contains nvme we know we need to remove the last 2 character instead of 1. If nvme then osd_to_kill_disks is nvme0n1, we need nvme0 If ssd or hdd then osd_to_kill_disks is sda1, we need sda Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1561456 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:08:29 +02:00
Sébastien Han	641f141c0f	selinux: remove chcon calls We know bindmount with the :z option at the end of the -v command so this will basically run the exact same command as we used to run. So to speak: chcon -Rt svirt_sandbox_file_t /var/lib/ceph Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-19 14:59:37 +02:00
Sébastien Han	473939d215	infra: add playbook example for ceph_key module Helper playbook to manage CephX keys. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-11 12:18:34 +02:00
Andrew Schoen	08f4875533	ceph_volume: refactor to not run ceph osd destroy This changes state to action and gives the options 'create' or 'zap'. The zap parameter is also removed. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-04-10 14:19:21 +02:00
Andrew Schoen	c6e8f8fb11	purge-cluster: no need to use objectstore for ceph_volume module When zapping objectstore is not required. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-04-10 14:19:21 +02:00
Andrew Schoen	c29a75ac7f	purge-cluster: use ceph_volume module to zap and destroy OSDs Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-04-10 14:19:21 +02:00
Randy J. Martinez	d1f2d64b15	purge-docker: added conditionals needed to successfully re-run purge Added 'ignore_errors: true' to multiple lines which run docker commands; even in cases where docker is no longer installed. Because of this, certain tasks in the purge-docker-cluster.yml will cause the playbook to fail if re-run and stop the purge. This leaves behind a dirty environment, and a playbook which can no longer be run. Fix Regex line 275: Sometimes 'list-units' will output 4 spaces between loaded+active. The update will account for both scenarios. purge fetch_directory: in other roles fetch_directory is hard linked ex.: "{{ fetch_directory }}"/"{{ somedir }}". That being said, fetch_directory will never have a trailing slash in the all.yml so this task was never being run(causing failures when trying to re-deploy). Signed-off-by: Randy J. Martinez <ramartin@redhat.com>	2018-04-10 13:39:14 +02:00
Guillaume Abrioux	e32a177af8	purge-docker: remove redundant task The `remove_packages` prompt is redundant to the `ireallymeanit` prompt since it does exactly the same thing. I guess the only goal of this task was to make a break to warn user about `--skip-tags=with_pkg` feature. This warning should be part of the first prompt. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-03 11:54:42 +02:00
Andy McCrae	60d4b75f51	Cleanup plugins directories and references Having callback_plugins, and action plugins in random locations causes a lot of disparity. We should centralize this into one place in the plugins directory and fix up the ansible.cfg to reflect this. Additionally, since the ansible.cfg already reflects action_plugins, we don't need a link to action_plugins in the base of the repository.	2018-03-14 11:15:39 +01:00
jtudelag	691f7c5146	Adds handy ceph aliases for containerized installations. Same approach as openshift-ansible etcdctl: * https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/tasks/auxiliary/drop_etcdctl.yml * https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/etcdctl.sh	2018-03-08 13:56:39 +01:00
Guillaume Abrioux	c04e67347c	update: look for short and fqdn in ceph_health_raw According to hostname configuration, the task waiting for mons to be in quorum might fail. The idea here is to look for both shortname and fqdn in `ceph_health_raw` instead of just `ansible_hostname` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1546127 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-19 10:27:47 +01:00
Andrew Schoen	699c777e68	rolling update: fix undefined jewel_minor_update failure Variables set at the play level with ``vars`` do not carry over into the next play in the playbook. The var jewel_minor_update was set in a previous play but used in this one and was failing because it was not defined. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1544029 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-02-13 17:03:05 +01:00
Andrew Schoen	7c7017ebe6	infra: do not include host_vars/* in take-over-existing-cluster.yml These are better collected by ansible automatically. This would also fail if the host_var file didn't exist. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-02-12 11:48:47 +01:00
Guillaume Abrioux	3b2f6c34e4	purge-docker: fix ceph-osd-zap name container the `zap ceph osd disks` task should iterate on `resolved_parent_device` instead of `combined_devices_list` which contains only the base device name (vs. full path name in `combined_devices_list`). this fixes the issue where docker complains about container name because of illegal characters such as `/` : ``` "/usr/bin/docker-current: Error response from daemon: Invalid container name (ceph-osd-zap-magna074-/dev/sdb1), only [a-zA-Z0-9][a-zA-Z0-9_.-] are allowed.","See '/usr/bin/docker-current run --help'." "" ``` having the basename of the device path is enough for the container name. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-02 22:09:11 +01:00
Guillaume Abrioux	dd0c98c5a2	common: do not use `shell` module when it is not needed There is no need here to use `shell` instead of `command` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Guillaume Abrioux	deaf273b25	syntax: change local_action syntax Use a nicer syntax for `local_action` tasks. We used to have oneliner like this: ``` local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }} ``` The usual syntax: ``` local_action: module: wait_for port: 22 host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}" state: started delay: 10 timeout: 500 ``` is nicer and kind of way to keep consistency regarding the whole playbook. This also fix a potential issue about missing quotation : ``` Traceback (most recent call last): File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module> main() File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin) File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command File "/usr/lib64/python2.7/shlex.py", line 279, in split return list(lex) File "/usr/lib64/python2.7/shlex.py", line 269, in next token = self.get_token() File "/usr/lib64/python2.7/shlex.py", line 96, in get_token raw = self.read_token() File "/usr/lib64/python2.7/shlex.py", line 172, in read_token raise ValueError, "No closing quotation" ValueError: No closing quotation ``` writing `local_action: shell echo {{ fsid }} \| tee {{ fetch_directory }}/ceph_cluster_uuid.conf` can cause trouble because it's complaining with missing quotes, this fix solves this issue. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
Guillaume Abrioux	f372a4232e	purge: fix resolve parent device task This is a typo caused by leftover. It was previously written like this : `shell: echo /dev/$(lsblk -no pkname "{{ item }}") }}")` and has been rewritten to : `shell: $(lsblk --nodeps -no pkname "{{ item }}") }}")` because we are appending later the '/dev/' in the next task. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-30 17:40:10 +01:00
Guillaume Abrioux	c7ec12d49c	upgrade: skip luminous tasks for jewel minor update These tasks are needed only when upgrading to luminous. They are not needed in Jewel minor upgrade and by the way, they fail because `ceph versions` command doesn't exist. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-25 18:30:34 +01:00
Sébastien Han	8af7459476	rolling update: add mgr exception for jewel minor updates When updating from a minor Jewel version to another, the playbook will fail on the task "fail if no mgr host is present in the inventory". This now can be worked around by running Ansible with -e jewel_minor_update=true Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1535382 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-01-18 14:06:05 +01:00
Guillaume Abrioux	55298fa80c	purge-container: use lsblk to resolve parent device Using `lsblk` to resolve the parent device is better than just removing the last char when passing it to the zap container. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-17 15:54:20 +01:00
Guillaume Abrioux	58eb045d2f	purge-container: remove awk usage in favor of blkid Avoid using `awk` to get the different devices from the partlabel. Using `blkid` is more readable. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-17 15:54:20 +01:00
Andrew Schoen	b613321c21	switch-to-containers: do not fail when stopping the nfs-ganesha service If we're working with a jewel cluster then this service will not exist. This is mainly a problem with CI testing because our tests are set up to work with both jewel and luminous, meaning that even though we want to test jewel we still have a nfs-ganesha host in the test causing these tasks to run. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-06 14:07:55 +01:00
Andrew Schoen	0b4b60e3c9	switch-to-containers: do not fail when stopping the ceph-mgr daemon If we are working with a jewel cluster ceph mgr does not exist and this makes the playbook fail. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-06 14:07:55 +01:00
Andrew Schoen	997edea271	rolling_update: do not fail the playbook if nfs-ganesha is not present The rolling update playbook was attempting to stop the nfs-ganesha service on nodes where jewel is still installed. The nfs-ganesha service did not exist in jewel so the task fails. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-01-06 14:07:55 +01:00
Guillaume Abrioux	c5b7b37105	purge-cluster: clean some code Avoid using regexp to match device Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-20 17:42:45 +01:00
Guillaume Abrioux	eeedefdf02	purge-cluster: wipe disk using dd `bluestore_purge_osd_non_container` scenario is failing because it keeps old osd_uuid information on devices and cause the `ceph-disk activate` to fail when trying to redeploy a new cluster after a purge. typical error seen : ``` 2017-12-13 14:29:48.021288 7f6620651d00 -1 bluestore(/var/lib/ceph/tmp/mnt.2_3gh6/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.2_3gh6/block fsid 770080e2-20db-450f-bc17-81b55f167982 does not match our fsid f33efff0-2f07-4203-ad8d-8a0844d6bda0 ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-12-20 17:42:45 +01:00
Sébastien Han	200785832f	rolling_update: do not require root to answer question There is no need to ask for root on the local action. This will prompt for a password if the current user is not part of sudoers. That's unnecessary anyways. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1516947 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-12-19 14:04:55 +01:00
Guillaume Abrioux	aaaf980140	purge: fix bug on 'wait_for' task this task hangs because `{{ inventory_hostname }}` doesn't resolve to an actual ip address. Using `hostvars[inventory_hostname]['ansible_default_ipv4']['address']` should fix this because it will reach the node with its actual IP address. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-29 11:10:56 +01:00
Guillaume Abrioux	947766e294	purge-cluster: remove usage of `with_fileglob` `with_fileglob` loops over files on the machine where ansible-playbook is being run. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-21 08:24:11 +01:00
Guillaume Abrioux	d9c1b61092	purge-docker: remove osd disk prepare logs `with_fileglob` loops over files on the machine that runs the playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-16 14:27:36 +01:00
Sébastien Han	68566444e9	Merge pull request #2142 from squidboylan/master infra: fix take-over-existing-cluster.yml playbook	2017-11-13 22:06:16 +11:00
Guillaume Abrioux	fa675f2ead	purge-docker-cluster: ensure old logs are removed purge-docker-cluster must remove all osd_disk_prepare logs in `{{ ceph_osd_docker_run_script_path }}`, otherwise if you purge your cluster and try to redeploy it, osds will fail to start because they will try to find a partition uuid which doesn't exist. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510470 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-11-09 17:49:20 +01:00
Caleb Boylan	41d10a2f64	infra: fix take-over-existing-cluster.yml playbook The ansible inventory could have more than just ceph-ansible hosts, so we shouldn't use "hosts: all", also only grab one file when getting the ceph cluster name instead of failing when there is more than one file in /etc/ceph. Also fix location of the ceph.conf template	2017-11-06 15:00:30 -08:00
Sébastien Han	473673ab41	shrink-mon: fix typo in the code doc Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-27 11:59:22 +02:00
Sébastien Han	2837d0a22e	purge: do not reboot by default Rebooting servers is really intrusive and perhaps this is not what the operator wants. So we disable the reboot by default now. Note that the reboot might not happen all the time. It can be enabled by default by running the purge playbook with -e reboot_osd_node=True Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1505011 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-26 14:18:38 +02:00
Guillaume Abrioux	f90f2f3a04	purge: containers are not stopped During purge osd, the containers are not stopped because of a typo, as a result, all the devices can't be unmounted later. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-10-25 07:58:00 +02:00
Sébastien Han	4413511b66	all: backward compatibility between stable-2.2 and 3.0 stable-3.0 brought numerous changes in ceph-ansible variables, this PR aims to maintain backward compatibility for someone running stable-2.2 upgrading to stable-3.0 but keeps its groups_vars untouched. We will then determine the right options to make sure the upgrade works but we are expecting that new variables should be used. We will drop this in a near future, maybe 3.1 or 3.2. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-20 11:54:10 +02:00
Guillaume Abrioux	982326373b	upgrade: fix upgrade jewel to luminous for nfs nodes nfs nodes can't be upgraded from jewel to luminous because ceph-nfs role is skipped because of the condition `when: "ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed, package is upgraded in `ceph-nfs` role, therefore, `ceph_release` is still set to the old version. It means the when can't be satisfied. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-10-19 20:54:23 +02:00
Guillaume Abrioux	70034451e9	upgrade: fix upgrade jewel to luminous for mgr nodes mgr nodes can't be upgraded from jewel to luminous because ceph-mgr role is skipped because of the condition `when: "ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed, ceph-mgr package is upgraded in `ceph-mgr` role, therefore, `ceph_release` is still set to the old version. It means the when can't be satisfied. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit 302e563601cd6820b1ae44fabdfb1506688c7c9b)	2017-10-19 20:54:23 +02:00
Sébastien Han	d920d4839d	upgrade: support for rbd mirror and nfs - Add upgrade support for rbd mirror and nfs daemons. - Only works with systemd (remove sysvinit and upstart occurrences) - A bit of cleanup Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-17 10:54:47 +02:00
Sébastien Han	39bf102b64	switch: nicer way to check mon quorum re-use the same syntax as rolling_update.yml Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-17 10:54:36 +02:00
Sébastien Han	b685aceede	Merge pull request #2044 from major/avoid-jinja-in-when Remove jinja2 delimiters from `when` keys	2017-10-12 22:23:06 +02:00
Major Hayden	c01851325e	Remove jinja2 delimiters from `when` keys This patch changes the `when:` keys so that they have no jinja2 delimiters. This avoids Ansible warnings which could turn into errors in a future Ansible release.	2017-10-12 11:27:42 -05:00
Major Hayden	33b200d43a	Suppress yum/dnf/rpm command warnings Ansible throws warnings when using yum/dnf/rpm with the command module: [WARNING]: Consider using yum module rather than running yum This patch adds the `warn: no` argument to suppress the warnings in the Ansible output.	2017-10-12 08:38:05 -05:00
Sébastien Han	13bce287ad	infra: replace osd playbook This playbook can replace failed OSD in containerized and non-containerized env. The current limitation is that it won't allow you to choose between filestore/bluestore and will do collocation as well. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-12 11:53:30 +02:00
Sébastien Han	85e13a864c	purge-iscsi: fix group name Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1500281 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-11 12:52:12 +02:00
Sébastien Han	24b82c2679	purge: fix journal purge Using a condition when osd_scenario == 'non-collocated' was wrong since these partitions can be collocated on a single device also. Removing the check makes the purge of these partitions. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1499871 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-10 09:57:39 +02:00
Guillaume Abrioux	f147b119ed	Merge pull request #2014 from ceph/fixes-2 infra: use the pg check in the right place	2017-10-09 20:14:06 +02:00
Sébastien Han	450108fab9	infra: add independent purge-iscsi-gateways.yml The current inclusion of purge-iscsi-gateways.yml in purge-cluster.yml is not working well and blocking the CI too. So remove it from purge-cluster.yml and re-add the original purge-iscsi-gateways.yml. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-09 17:25:44 +02:00
Sébastien Han	774697ebd8	infra: use the pg check in the right place Use the pg check before doing the pg check, not on the quorum check. Also never quote int when doing comparison. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-09 17:25:41 +02:00
Sébastien Han	a3e7bcb13f	Merge pull request #2013 from ceph/wip-purge-cluster A couple of purge cluster fixes	2017-10-09 17:18:30 +02:00
Sébastien Han	33a3aa0dda	switch: check pgs only when num_pgs > 0 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:42:09 +02:00
Sébastien Han	05f26031ea	rolling_update: perform pg check when pgs_num > 0 If num_pgs = 0 the check will never return 0. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:39:09 +02:00
Sébastien Han	c3c63ae539	switch: rework and fix clean pg wait Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:39:09 +02:00
Sébastien Han	c693e95cbf	purge-docker: rework device detection we don't need "devices" and other device variable anymore, the playbook detects that for us. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 03:39:04 +02:00
Sébastien Han	2fb4981ca9	shrink-osd: admin key not needed for container shrink Also do some clean Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-07 00:20:43 +02:00
Boris Ranto	64e272d818	purge-cluster: Do not use shell for rm The shell wildcard expansion of non-existing paths fails on zsh making the whole script fail. We can use file module with with_fileglob to alleviate the problem instead. Signed-off-by: Boris Ranto <branto@redhat.com>	2017-10-06 22:54:37 +02:00
Boris Ranto	f696cb7637	purge-cluster: Do not fail on systemd commands The systemd can't stop services if the unit files were removed before the cluster was purged. We should just ignore these. Signed-off-by: Boris Ranto <branto@redhat.com>	2017-10-06 22:52:56 +02:00
Sébastien Han	b6b24a5ca9	iscsi: fix wrong group name for iscsi Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498490 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-05 17:25:32 +02:00
Sébastien Han	f37e014a65	Merge pull request #1974 from ceph/mgr-upgrade-luminous upgrade: a support for mgrs	2017-10-03 19:57:31 +02:00
Sébastien Han	99466e79a1	upgrade: a support for mgrs Also we now play ceph-config to have everything being generated for new daemons bootstrap during upgrade. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1497959 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-03 16:57:31 +02:00
Sébastien Han	3bd341f6c0	osd: container use id instead of dev name Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1494127 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-10-03 14:44:00 +02:00
Sébastien Han	3c2c31a591	Merge pull request #1964 from vatelzh/master purge-cluster: delete block partitions if using bluestore	2017-10-02 12:10:26 +02:00
Sébastien Han	b9050d6229	update: fix var register Even if the task is skipped, ansible registers the var as 'skipped' so this task the task using this variable for its next usage. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-29 14:27:55 +02:00
zhangwentao	86a6db0d58	purge-cluster: delete block partitions if using bluestore	2017-09-29 14:04:17 +08:00
Sébastien Han	a0a5b174ba	rolling_update: clarify mon quorum command Cleaner. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-29 01:19:46 +02:00
Sébastien Han	bd5471b940	update: complete luminous upgrade Once we complete the upgrade to Luminous, we must issue a specific command. For more info read: http://ceph.com/community/new-luminous-upgrade-complete/ Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-28 21:05:00 +02:00
Sébastien Han	68f1f99ee9	update: nicer way to wait for clean pgs More comprehensive and friendly to read. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-28 14:46:26 +02:00
Andrew Schoen	fccc604f4a	purge-cluster: default lvm_volumes if not defined Most osd scenarios do not use lvm_volumes, so default it in purge-cluster.yml if it's not defined. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-09-26 15:14:29 -05:00
Guillaume Abrioux	fcb6454e04	rbd-mirror: fix systemd unit in purge-docker rbd-mirror containers are not stopped in purge-docker-cluster playbook because of the wrong name used. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-24 21:18:50 +02:00
Guillaume Abrioux	c80ba7a307	purge: implement mgr purge until now, mgr nodes are not managed by purge-cluster.yml, therefore it breaks scenario like purge_cluster. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-24 21:18:50 +02:00
Guillaume Abrioux	7195b08718	update: update rgw systemd unit name The old name is used in `rolling_update.yml` and `purge-docker-cluster.yml`, it breaks the `test_rgw_service_is_running()` test. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-24 14:58:55 +02:00
Sébastien Han	6bac613611	shrink: support for container We can now shrink mon and osds on containerized deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492115 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-20 16:25:07 +02:00
Sébastien Han	7fedc8ebf4	Merge pull request #1891 from ceph/clarify-update rolling_update: clarify update doc	2017-09-15 07:08:49 -06:00
Sébastien Han	fe1d84d395	Merge pull request #1892 from ceph/purge-dmcrypt-col purge: only purge specific directories for mon	2017-09-13 17:57:06 -06:00
Sébastien Han	ba3e3b6cc7	purge: only purge specific directories for mon Handles the case when a mon is collocated with an OSD. Closes: https://github.com/ceph/ceph-ansible/issues/1877 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-13 17:07:04 -06:00
Sébastien Han	82c4848ec4	Merge pull request #1885 from ceph/shrink-osd shrink-osd: fix when multiple osds	2017-09-13 16:12:49 -06:00
Sébastien Han	92f9be963b	rolling_update: clarify update doc Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1490188 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-13 15:46:29 -06:00
Sébastien Han	3031e51778	shrink-osd: fix when multiple osds The loop was being built properly so we were always getting the last item as osd host. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1490355 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-13 15:20:11 -06:00
Sébastien Han	aa364264cd	resync ceph-iscsi-gw with old upstream Taken from https://github.com/pcuzner/ceph-iscsi-ansible/tree/tcmu-fixes Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1454945 and https://bugzilla.redhat.com/show_bug.cgi?id=1484083 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-12 18:06:10 -06:00
Sébastien Han	477f86e305	switch to container: fix ceph nfs The service is nfs-ganesha where ceph-nfs@{{ ansible_hostname }} will be the name of the container. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-08 22:43:50 +02:00
Sébastien Han	fdacac9fa0	switch: make osd collection idempotent This commits allows us to run switch-from-non-containerized-to-containerized-ceph-daemons.yml multiple times. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1489353 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-08 11:31:47 +02:00
Sébastien Han	e46440e19c	switch-from-non-containerized-to-containerized: fix devices If devices is passed through an extra var this register won't work so let's only register the var if devices is not defined. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1489099 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-07 23:18:14 +02:00
Sébastien Han	b9ced956d7	purge: get lockbox mountpoint and unmount it Prior command was avoiding the lockbox mountpoint and the playbook was failing with: rmtree failed: [Errno 30] Read-only file system: '/var/lib/ceph/osd-lockbox/4e9d8052-87c2-4fde-a56c-b8c108a3eefc/key-management-mode' Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-07 16:31:31 +02:00
Guillaume Abrioux	d987d26719	tests: force docker variable for switch-to-containers scenario we need to force the value of `docker` variable which is initially set to `false` since it's a migration from non-containerized to containerized cluster. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-09-06 18:03:52 +02:00
Sébastien Han	b7db600caa	switch-from-non-containerized-to-containerized: mask unit files We must mask the image so we are sure that even if the system reboots then the OSDs won't start. Also remove Ceph udev rules if found on the system prior to deploy containers. If we don't do this we are exposed to conflicts between udev rules and systemd unit files. Also, the CI will now test the migration from a non-containerized cluster to a containerized cluster. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-05 15:20:31 +02:00
Sébastien Han	579b95fd8a	shrink-mon: wait a little bit for the mon to be out Monitor removal from the monmap is not immediate, so let's wait a little bit and then fail if the monitor is still in the monmap. We try twice in total with 10 sec intervals. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-04 23:08:57 +02:00
Sébastien Han	54d7a81241	infra playbook: move untested scenario to a new dir Move untested/low-confidence playbooks into an untested-by-ci directory. Also removing this directory from the package build. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1461551 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-01 19:58:24 +02:00
Sébastien Han	298a63c437	shrink mon and osd Rework shrinking a monitor and an OSD playbook. Also adding test scenario. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1366807 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-09-01 19:12:00 +02:00
Sébastien Han	e0a264c7e9	osd: allow multi dedicated journals for containers Fix: https://bugzilla.redhat.com/show_bug.cgi?id=1475820 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-30 12:34:06 +02:00
Ben England	617d9ee75d	dont use devices var anymore, works for osd_auto_discover	2017-08-28 17:27:01 -04:00
Sébastien Han	0205f6d645	rolling_update: nicer way to set osd flags Prior to this patch, we were applying the osd flags like this: " General pre tasks Set flags Upgrade OSDs on a host Unset flags <-- this triggers pending scrub to start Set flags Upgrade OSDs on a hosts Unset flags <-- this triggers pending scrub to start . . . General post tasks " Now instead, we apply the flag once before starting the OSD update and unset them once the last OSD is finished. " General pre tasks Set flags and wait for any scrubs to finish Upgrade OSDs on a host Upgrade OSDs on a host . . . Unset flags General post tasks " Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1450754 Signed-off-by: Sébastien Han <seb@redhat.com> Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-25 18:21:28 +02:00
Sébastien Han	4a4a20f07d	rolling update: skip pg check if num_pgs = 0 In our test case we don't have any pgs, thus the check fails. The check always returns an empty array, which makes the comparison fail. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-24 08:50:49 +02:00
Alfredo Deza	e651469a2a	Merge pull request #1797 from ceph/purge-lvm adds purge support for the lvm_osds osd scenario	2017-08-23 14:28:29 -04:00
Sébastien Han	f2499ff5ac	Merge pull request #1788 from ceph/improve-switch switch-from-non-containerized-to-containerized: simplify	2017-08-23 19:47:26 +02:00
Sébastien Han	4f0ecb7f30	switch-from-non-containerized-to-containerized: simplify This commit eases the use of the infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml playbook. We basically run it with a couple of pre-tasks and then we let the playbook run the docker roles. It obviously expect to have proper variables configured in order to work. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-23 18:39:45 +02:00
Andrew Schoen	bed57572cc	purge-cluster: adds support for purging lvm osds This also adds a new testing scenario for purging lvm osds Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-08-23 10:33:35 -05:00
Sébastien Han	1ac0969c28	Merge pull request #1778 from ceph/fix-1770 purge: add ability to purge bluestore osd	2017-08-22 23:56:36 +02:00
Giulio Fidente	2c01de4350	Default cluster to ceph in switch to containers	2017-08-22 13:13:36 +02:00
Giulio Fidente	f0423b1804	Parse ceph_docker_registry in switch to containers Defaults it to docker.io as it was for backward compatibility.	2017-08-22 13:11:27 +02:00
Giulio Fidente	a59b84d5c9	Assume mon_docker_privileged false in switch to containers	2017-08-22 13:01:25 +02:00
Giulio Fidente	0106fa6835	Consume public_network vs ceph_mon_docker_subnet In the switch to containers migration there were broken references to ceph_mon_docker_subnet variable, replaced with public_network. Also fixes references to ceph_mon_docker_extra_env setting for it a default as it could be undefined.	2017-08-21 18:34:24 +02:00
Giulio Fidente	386303d42e	Extend set_uid fact to support RH Ceph images	2017-08-21 18:32:08 +02:00
Sébastien Han	9c824b9818	purge: add ability to purge bluestore osd We now purge block db and/or wal partitions if we find any. Closes: https://github.com/ceph/ceph-ansible/issues/1770 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-21 18:08:18 +02:00
Andrew Schoen	d2f4d3666f	Merge pull request #1725 from ceph/simplify-osd-scenario osd: simply osd scenario declaration	2017-08-03 09:31:57 -05:00
Sébastien Han	671f2cd4bc	Merge pull request #1738 from yanyixing/nvmepart fix for nvme part path	2017-08-03 13:37:10 +02:00
yanyx	d506fad056	fix for nvme part path	2017-08-03 17:37:52 +08:00
Sébastien Han	30991b1c0a	osd: simplify scenarios There is only two main scenarios now: * collocated: everything remains on the same device: - data, db, wal for bluestore - data and journal for filestore * non-collocated: dedicated device for some of the component Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-03 10:20:39 +02:00
Sébastien Han	fdc6aebd62	infrastructure-playbooks: update with ceph-defaults roles Signed-off-by: Sébastien Han <seb@redhat.com>	2017-08-02 17:12:20 +02:00
Guillaume Abrioux	7a333d05ce	Add handlers for containerized deployment Until now, there is no handlers for containerized deployments. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-02 17:12:20 +02:00
Guillaume Abrioux	5adbf0fdaa	Move role dependencies in site.yml/site-docker.yml This will give us more flexibility and avoid a lot of useless when skipping all tasks from a non-desired role. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-02 17:12:14 +02:00
Guillaume Abrioux	206c7a16d0	rolling_update: refact code Refact rolling_update playbook. Add ceph-client upgrade. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-08-02 11:10:51 +02:00
yanyx	d0a17b11b2	change the partition's ownership	2017-07-27 11:55:30 +08:00
Sébastien Han	fad9d0caec	Merge pull request #1690 from yanyixing/master fix: when osd device is a disk partition	2017-07-26 15:55:29 +02:00
yanyx	2e6233271e	fix: when osd device is a disk partition	2017-07-25 21:39:43 +08:00
Sébastien Han	0c18cf199e	purge: remove leftover unit files Closes https://github.com/ceph/ceph-ansible/issues/1672 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-07-25 13:26:28 +02:00
Guillaume Abrioux	828f88403e	Update: Avoid screen scraping in rolling update since luminous has revamped the `ceph -s` output, we need to avoid screen scraping. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-07-12 15:02:39 +02:00
Guillaume Abrioux	896d62d78b	Refact: remove ceph_mon_docker_interface variable remove `ceph_mon_docker_interface` and use `monitor_interface` instead for both containerized and non-containerized deployment. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-07-04 18:08:59 +02:00
Guillaume Abrioux	73141118d0	Make the new check PGs working with /bin/sh The new test in the checks PGs are no longer working on distributions where /bin/sh isn't linked to /bin/bash. Fix: #1619 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-06-22 17:59:38 +02:00
David Galloway	127b5ad9b4	infra: Create a backup of ceph.conf when taking over existing cluster Signed-off-by: David Galloway <dgallowa@redhat.com>	2017-06-21 09:53:09 -04:00
David Galloway	40ed2d7be6	infra: Fix ceph.conf creation when taking over existing cluster Fixes bug introduced in https://github.com/ceph/ceph-ansible/pull/1330 The "stat ceph.conf" task was basically using the stat module on a string instead of the ceph.conf filename. This caused the "generate ceph configuration file" task to fail. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1463382 Signed-off-by: David Galloway <dgallowa@redhat.com>	2017-06-21 09:52:01 -04:00
Andrew Schoen	e2104acb62	rolling_update: set health_mon_check_delay to 15 The old value of 10 did not give enough time for a containerized mon to pass the health check. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-06-13 08:56:44 -05:00
Guillaume Abrioux	5af9bb432c	rewrite check pgs clean tasks Avoid screen scraping by rewriting `waiting for clean pgs` tasks like it is done in `304de48`. Use the json output returned by `ceph -s` instead Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-06-13 09:48:56 +02:00
Andrew Schoen	59992c54cc	purge-docker-cluster: include ceph_docker_registry We need to include ceph_docker_registry when removing containers/images because if we don't it will assume docker.io which is not always where the image originated from, causing the playbook to fail. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-06-02 09:49:17 -05:00
Sébastien Han	fdc7866072	Merge pull request #1469 from ceph/refact_code Docker: Refact code	2017-06-02 12:40:25 +02:00
Andrew Schoen	f7677e4393	purge-docker-cluster: pip is only used on Debian We only need to purge packages installed by pip on Debian systems. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-05-31 09:03:44 -05:00
Andrew Schoen	8e322d4825	purge-docker-cluster: default raw_journal_devices to [] If we're purging a containerized cluster that did not use the raw_multi_journal OSD scenario then raw_journal_devices will not be defined which causes the playbook to fail. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1455187 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-05-25 07:30:25 -05:00
Guillaume Abrioux	ddfe019342	Refact code `ceph-docker-common`: At the moment there is a lot of duplicated tasks in each `./roles/ceph-<role>/tasks/docker/main.yml` that could be refactored in `./roles/ceph-docker-common/tasks/main.yml`. `_containerized_deployment` variables: All `_containerized_deployment` have been refactored to a single variable `containerized_deployment` duplicate `cephx` variables in `group_vars/* have been removed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-05-24 15:55:41 +02:00
Sébastien Han	90389864d8	rolling-update: set/unset flags on the right container Problem: we are delegating the set/unset flag to a monitor node but we try to call an osd container Solution: use the right container name. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-05-22 09:38:08 +02:00
Sébastien Han	b93ffe637b	Merge pull request #1476 from WingkaiHo/improve-shrink-osd.yml improve shrink-osd.yml can shrink osd when disk damage	2017-04-27 11:01:27 +02:00
WingkaiHo	0b9f322ca0	improve shrink-osd.yml can shrink osd when disk damage	2017-04-27 10:26:26 +08:00
Andrew Schoen	5a3f95dfc1	purge-cluster: check for any running ceph process after purge Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-25 09:30:22 -05:00
Andrew Schoen	26bdd59f5d	purge-cluster: we don't support sysv or upstart anymore Now that ceph-ansible only supports > jewel we don't need to bother with sysv or upstart Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-21 15:14:38 -07:00
Andrew Schoen	7ca2bddcce	purge-cluster: do not need to check for running ceph processes Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-21 15:12:46 -07:00
Andrew Schoen	aac79df3b3	purge-cluster: no need to remove ceph.target The package uninstalls will stop ceph.target Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-04-21 15:11:03 -07:00
Sébastien Han	dfd8f4d96e	test: add mgr section to the host inventory file Without this, we don't test the mgr role so we need to add it. Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com>	2017-04-15 00:16:10 +02:00
Sébastien Han	17ac1fd464	Merge pull request #1443 from WingkaiHo/osds-journal-migrate Migrate osd(s) journal to ssd	2017-04-13 16:45:57 +02:00
WingkaiHo	9fba41b4ce	Migrate osd(s) journal to ssd	2017-04-13 11:05:58 +08:00
Daniel Lupescu	d5e56c481a	purge-cluster: fix grep match for NVMe and HP Smart Array devices raw_device would return invalid block device names for NVMe and HPSA devices which would cause sgdisk partition deletion to fail $ echo /dev/nvme1n1p3 | egrep -o '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p){1,2}' /dev/nvme1n1p $ echo /dev/cciss/c0d0p2 | egrep -o '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p){1,2}' /dev/cciss/c0d0p	2017-04-11 16:13:28 +03:00
Sébastien Han	c37aaa41f4	playbook: homogenize the way list osd ids Problem: too many different commands to do the same thing. The 'cut' command on infrastructure-playbooks/purge-cluster.yml was also wrong. This sed command from osixia in ceph-docker https://github.com/ceph/ceph-docker/pull/580/ addresses all the scenarios. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-03-30 11:51:38 +02:00
Sébastien Han	35a90ae283	Merge pull request #1386 from WingkaiHo/master Create recover-osds-after-ssd-journal-failure.yml	2017-03-28 09:50:39 +02:00
Konstantin Shalygin	1662976fc0	Resolve issues when groups names not in default value.	2017-03-27 21:44:30 +07:00
WingkaiHo	ac1498b0d7	Merge https://github.com/ceph/ceph-ansible	2017-03-27 10:50:38 +08:00
WingkaiHo	ebb56ccebf	command module instead shell	2017-03-23 17:38:41 +08:00
WingkaiHo	2d44c1cee6	remove service enable	2017-03-23 15:28:14 +08:00
WingkaiHo	14c189fee5	break it into lines since you already use the string block syntax and fix disable it here and enable again in later task	2017-03-23 14:49:10 +08:00
WingkaiHo	62c37042fe	remove this detection and simply rely on {{ cluster }}	2017-03-23 09:22:06 +08:00
WingkaiHo	3d10c5981e	fix some spelling mistakes and writing format, use full device path for device name	2017-03-22 17:48:34 +08:00
WingkaiHo	1e670bdeb0	This assumes ceph as a cluster name. We need detect the name of the cluster	2017-03-22 10:09:06 +08:00
WingkaiHo	83a1ac0c67	This assumes ceph as a cluster name. We need detect the name of the cluster	2017-03-22 10:06:11 +08:00
WingkaiHo	19f9e200d7	Add auto detect the ceph cluster name	2017-03-22 10:00:44 +08:00
WingkaiHo	8602166f6e	Ansible will include host_vars/ansible_hostname.yml itself, no need this task IMO.	2017-03-21 13:50:27 +08:00
WingkaiHo	55725fd01d	fix some syntax error	2017-03-21 11:19:25 +08:00
WingKai Ho	7445113dc4	Create recover-osds-after-ssd-journal-failure.yml This playbook use to recover Ceph OSDs after ssd journal failure.	2017-03-21 11:08:25 +08:00
Anthony D'Atri	6c4911276e	Enhance clean PG check to catch active+clean+scrubbing and active+clean+scrubbing+deep Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>	2017-03-19 00:23:26 -07:00
Daniel Marks	77edd3d40a	Fixing tabs that are breaking the syntax check With the merge of PR #1336 the syntax check fails. This commit replaces the tabs with proper indentation.	2017-03-15 14:15:15 +01:00
Sébastien Han	38ab6de602	Merge pull request #1336 from WingkaiHo/master Load a variable file for devices partition	2017-03-15 11:55:26 +01:00
Sébastien Han	8320c14191	Merge pull request #1317 from ibotty/harmonize-docker-names harmonize docker names	2017-03-14 18:20:20 +01:00
Andrew Schoen	e81d690aa0	switch-to-containers: do not include group vars or role defaults Doing so will override any values set for these in the group_vars directory relative to the users inventory. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-03-08 08:57:09 -06:00
Andrew Schoen	cf702b05cf	purge-docker-cluster: do not include role defaults or group vars Doing so at playbook level overrides whatever values might be set for these in the user's group_vars directory that's relative to their inventory. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-03-08 08:57:09 -06:00
Andrew Schoen	aef54d89d9	switch-to-containers: do not set group name vars at playbook level Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-03-08 08:57:09 -06:00
Andrew Schoen	7289acb6b3	purge-docker-cluster: do not set group names vars at playbook level Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-03-08 08:57:08 -06:00
Andrew Schoen	46f26bec13	rolling-update: do not set group name vars at playbook level Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-03-08 08:57:08 -06:00
Andrew Schoen	4fe6607004	purge-cluster: do not set group name vars at playbook level This has the behavior of overriding custom values set in group_vars. I've added defaults to the rest of the group names so that if they are not overridden in group_vars then defaults will be used. See: https://bugzilla.redhat.com/show_bug.cgi?id=1354700 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-03-08 08:57:08 -06:00
WingKai Ho	0d134b4ad9	Update make-osd-partitions.yml change	2017-03-08 17:46:37 +08:00
WingKai Ho	e2d06068f4	Update make-osd-partitions.yml When ansible do not load the file host_vars/{{ ansible_hostname }}.yml and host_vars/default.yml it will show syntactic, so keyword "skip" to fix it. Exit the playbook if the user not define devices in both host_vars/{{ ansible_hostname }}.yml and host_vars/default.yml	2017-03-06 15:43:09 +08:00
WingKai Ho	2861a483d7	Update make-osd-partitions.yml When ansible do not load the file host_vars/{{ ansible_hostname }}.yml and host_vars/default.yml it will show syntactic err, so add keyword "skip" to fix it. Exit the playbook if the user not define devices in both host_vars/{{ ansible_hostname }}.yml and host_vars/default.yml host_vars/default.yml	2017-03-06 10:33:22 +08:00
WingKai Ho	4cc489f2ba	Update make-osd-partitions.yml fix syntactic error	2017-03-03 17:26:53 +08:00
WingKai Ho	102befa927	Update make-osd-partitions.yml Remove capital `L`	2017-03-02 14:06:41 +08:00
WingKai Ho	c3f170e758	Update make-osd-partitions.yml there is an extra space between 'custom' and 'layout'	2017-03-02 12:24:44 +08:00
WingKai Ho	2967772f6a	Load a variable file for devices partition load device partition file in directory host_vars 1) if the user define host_vars/hostname.yml load the devices partition on this file. 2) otherwise load host_vars/default.yml for default	2017-03-01 17:27:57 +08:00
yangyimincn	8b36cbac64	Update rolling_update.yml The task waiting for the monitor to join the quorum... , the result for ceph -s | grep monmap only contain monmap, not included quorum: # ceph -s --cluster ceph | grep monmap monmap e1: 3 mons at {sh-office-ceph-1=10.12.10.34:6789/0,sh-office-ceph-2=10.12.10.35:6789/0,sh-office-ceph-3=10.12.10.36:6789/0} If want to get monitor, should use this: # ceph -s --cluster ceph | grep election election epoch 80, quorum 0,1 sh-office-ceph-1,sh-office-ceph-2 ceph version: 10.2.5	2017-02-28 16:56:02 +08:00
Sébastien Han	4639d89231	infra: fix cluster name detection The previous command was returning /etc/ceph/ceph.conf, we only need 'ceph' to be returned. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-23 15:40:34 -05:00
Tobias Florek	931027e6f7	harmonize docker names Created containers now are named more or less in the form of <ansible role>-<ansible_hostname>	2017-02-23 09:15:05 +01:00
Sébastien Han	3b633d5ddc	purge-docker: re-implement zap devices We now run the container and wait until it dies. Prior to this we were stopping it before completion so not all the devices were zapped. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-21 15:56:09 -05:00
Sébastien Han	a002508a91	purge-docker: also purge journal devices Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-21 15:54:36 -05:00
Andrew Schoen	5622c94e8b	rolling-update: do not use upstart to stop mons when using systemd Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-21 12:31:26 -06:00
Shengjing Zhu	32923fd217	fix grep match pattern for osd ids Some playbooks use [0-9]*, others use \d+$ The latter is more correct since cluster name may contain numbers. Signed-off-by: Shengjing Zhu <zsj950618@gmail.com>	2017-02-20 16:35:56 +08:00
Andrew Schoen	22f52a9dc6	purge-cluster: also purge dmcrypt dedicated journals See: https://bugzilla.redhat.com/show_bug.cgi?id=1414647 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-15 10:27:17 -06:00
Andrew Schoen	3964929a56	rgw-standalone: also fetch keys from mons This is to allow for ceph-installer usage of this playbook and to ensure that you have the correct keys locally when bootstrapping. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-14 16:12:59 -06:00
Andrew Schoen	c5f561a4e9	purge-cluster: remove calamari-server package See: https://bugzilla.redhat.com/show_bug.cgi?id=1422134 Signed-off-by: Andrew Schoen <aschoen@redhat.com> Resolves rhbz#1422134	2017-02-14 09:24:02 -06:00
Sébastien Han	c2f1dca823	docker: use a better method to pull images We changed the way we declare image. Prior to this patch we must have a "user/image:tag" format, which is incompatible with non docker-hub registry where you usually don't have a "user". On the docker hub a "user" is also identified as a namespace, so for Ceph the user was "ceph". Variables have been simplified with only: * ceph_docker_image * ceph_docker_image_tag 1. For docker hub images: ceph_docker_name: "ceph/daemon" will give you the 'daemon' image of the 'ceph' user. 2. For non docker hub images: ceph_docker_name: "daemon" will simply give you the "daemon" image. Infrastructure playbooks have been modified as well. The file group_vars/all.docker.yml.sample has been removed as well. It is hard to maintain since we have to generate it manually. If you want to configure specific variables for a specific daemon simply edit group_vars/$DAEMON.yml Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1420207 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-09 17:57:18 +01:00
Andrew Schoen	5ddfc4f85c	Merge pull request #1284 from ceph/BZ-1418980 purge-cluster: do not use ceph-detect-init	2017-02-08 08:46:03 -06:00
Andrew Schoen	4ff5908758	Merge pull request #1289 from ceph/fix-1286 rolling-update: detect init system properly	2017-02-08 06:31:30 -06:00
Andrew Schoen	865b4500dc	purge-cluster: set a default value for fetch_directory if not defined Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-08 06:25:43 -06:00
Andrew Schoen	adf6aee643	purge-cluster: remove all include tasks Including variables from role defaults or files in a group_vars directory relative to the playbook is a bad practice. We don't want to do this because including these defaults at the task level overrides values that would be set in a group_vars directory relative to the inventory file, which is the correct usage if you wish to override those default values. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-08 06:25:43 -06:00
Andrew Schoen	0476b24af1	purge-cluster: do not use ceph-detect-init We can not always ensure that ceph-detect-init will be present on the system. See: https://bugzilla.redhat.com/show_bug.cgi?id=1418980 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-02-08 06:24:44 -06:00
Sébastien Han	8f94bfb498	rolling-update: detect init system properly Simply use the ansible_service_mgr fact. Closes: #1286 Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-08 08:52:05 +01:00
Sébastien Han	c34d0a9d28	purge-docker: force image deletion even if non-running containers are using this image as a reference. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-07 22:14:21 +01:00
Sébastien Han	72cd9199ac	purge: ability to purge client role Signed-off-by: Sébastien Han <seb@redhat.com>	2017-02-07 22:14:18 +01:00
Guillaume Abrioux	76ddcbc271	Remove support of releases prior to Jewel. According to #1216, we need to simplify the code by removing the support of anything before Jewel. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2017-01-31 11:00:54 +01:00
Sébastien Han	d5dd658cfa	purge: do not stop ceph.target on each daemon Doing this causes all the daemons to go down at the same time. In a scenario where we colocate a monitor and an osd, the osd will take some time to go down which will make the 'umount' task fail. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-30 14:31:56 +01:00
Sébastien Han	cb57a359ba	purge: do not fail on purge ceph files On systems running docker there is an issue with lxcfs that results in the find command returning 1 but actually did the job. e.g: on a system with docker running find /var will give us the following error: find: '/var/lib/lxcfs/cgroup/devices/lxc/x1/system.slice/systemd-update-utmp.service/devices.deny': Permission denied find: '/var/lib/lxcfs/cgroup/devices/lxc/x1/system.slice/dev-random.mount/devices.allow': Permission denied ... ... However ceph files got deleted so we ignore the error. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-30 14:31:56 +01:00
Sébastien Han	e371bd591c	purge: fix ubuntu purge when not using systemd We now rely on the cli tool ceph-detect-init which will tell us the init system in use on the distribution. We do this instead of the previous lookup for systemd unit files to call the right task depending on the init system. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-30 14:31:56 +01:00
Sébastien Han	0e2e270ab2	purge: allow purge to run multiple times with_items is evaluated before the when so in a second run where the variable is empty it will fail with "'dict object' has no attribute 'stdout_lines'". To fix this we add a default array so with_items does not fail and the task is skipped with the when. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-30 14:31:56 +01:00
Sébastien Han	0d2e580768	Merge pull request #1250 from ceph/new-tests CI testing updates	2017-01-27 14:30:45 +01:00
Andrew Schoen	d3cb8dba4e	purge-cluster: fix failure when raw_multi_journal is not defined Because the purge-cluster.yml playbook does not have access to the roles' default vars, we cannot be sure that raw_multi_journal is defined. For example, if this was purging a dmcrypt journal then raw_multi_journal might not be defined at all in group_vars/all.yml or group_vars/osds.yml. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-01-27 05:23:17 -06:00
Ivan Font	0298354137	Update to use consistent docker extra env vars This playbook was still referencing the old version of the ceph__docker_extra_env but only for Ceph MONs and Ceph NFS. This playbook was not kept up-to-date when updating the ceph__docker_extra_env variables to add the '-e' option to docker. That's because the addition of '-e' breaks this playbook as it requires a comma separated list of variables for the 'env:' docker module parameter. Therefore this change just makes the playbook consistently broken by referencing the same variable throughout.	2017-01-26 15:57:34 -08:00
Andrew Schoen	b2a6f095f1	purge-cluster: fix syntax when deleting dmcrypt devices Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-01-26 11:28:30 -06:00
Sébastien Han	73ca1a7a00	purge: remove dm-crypt devices When running encrypted OSDs, an encrypted device mapper is used (because created by the cryptsetup tool). So before attempting to remove all the partitions on a device we must delete all the encrypted device mappers, then we can delete all the partitions. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-25 22:32:46 +01:00
Sébastien Han	adeb3decf3	purge: remove zap_block_devs variable The name of this variable was a bit confusing since its activation will zap all the block devices no matter which osd scenario we are using. Removing this variable and applying a condition on the OSD scenario is now feasible and easier since we import group_vars variable files for OSDs. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-18 10:55:01 +01:00
Sébastien Han	b7fcbe5ca2	purge: cosmetic cleanup Just applying our writing syntax convention in the playbook. Signed-off-by: Sébastien Han <seb@redhat.com>	2017-01-18 10:53:21 +01:00
Andrew Schoen	dd8389cdf7	purge-cluster: do not include ceph-osd and ceph-common defaults for osds When purging OSDs we do not need to include these defaults as nothing in the following tasks uses them. Also, it has the side effect of overwriting any variables defined in group_vars files that are relative to the inventory you are using with the default values. That behavior was causing the CI tests to fail. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-01-10 16:57:58 -06:00
Andrew Schoen	321cea8ba9	purge-cluster: get journal partitions after zapping osd disks In my testing zapping the osd disks deleted the journal partitions, making the 'zap ceph journal partitions' task fail because the partitions it found previously do not exist anymore. This moves the task that finds the journal partitions after 'zap osd disks' to catch any partitions ceph-disk might have missed. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-01-03 15:57:17 -06:00
Andrew Schoen	c9e5914377	purge-cluster: use ignore_errors: true when including group_vars files Using failed_when will still throw an exception and stop the playbook if the file you're trying to include doesn't exist. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2017-01-03 15:57:17 -06:00
Sébastien Han	cb1c06901e	Merge pull request #1171 from cbodley/wip-libcephfs2 bump package version to libcephfs2	2017-01-03 10:48:56 +01:00
Shengjing Zhu	2dc2e1d48c	infrastructure playbook: add make osd partition Signed-off-by: Shengjing Zhu <zsj950618@gmail.com>	2016-12-15 22:03:38 +08:00
Casey Bodley	acaf01ac17	purge-cluster: add new version of libcephfs2 the libcephfs version was bumped to 2, so we need to check for that as well when we're removing all ceph packages Signed-off-by: Casey Bodley <cbodley@redhat.com>	2016-12-09 16:54:06 -05:00
Sébastien Han	9dac195200	take-over: use more precise ceph.conf detection Prior to this patch we were just looking for any *.conf file, which sometimes could result in multiple matches. The new command looks for a .conf file that must contain [global] and 'fsid' patterns. This will definitely get us the ceph.conf file. We can not directly use ceph.conf because of a different cluster name. Signed-off-by: Sébastien Han <seb@redhat.com>	2016-12-06 16:02:48 +01:00
Sébastien Han	4444d7d78e	git: update gitignore * ignore yml files in general * refactor based on commit f8e043b6ea5ac4e886532d4f2f675c507b44b955 that changed directory layouts Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit ec5c6f5da566611c4e0b88f925cbd26dc90368d6)	2016-12-06 10:18:19 +01:00
chenyanshan	7eab2529ed	this patch fix the regex pattern in infrastructure-playbooks/shrink-osd.yml when the osd's pid num is bigger than 9999 Signed-off-by: chenyanshan <yanshanchen@139.com>	2016-12-05 13:40:38 +08:00
Guillaume Abrioux	b2b7222b3a	[shrink-mon]: force playbook to fail if there is only one mon The playbook will fail if only 1 mon is in the cluster and advise to use the `purge-cluster` playbook instead. Fix #1083	2016-11-25 11:20:11 +01:00
Guillaume Abrioux	a680707f6f	All `include_vars` need to have `.yml`, `.yaml` or `*.json` extension. As introduced in the following PR: - https://github.com/ansible/ansible/pull/17207 we need to refactor our code.	2016-11-24 14:03:49 +01:00
Sébastien Han	829e2b6598	Merge pull request #1077 from font/rolling_update Support containerized rolling update	2016-11-22 16:56:46 +01:00
Sébastien Han	38e846e542	rolling_update: clarify "serial" usage Prior to this commit the serial variable was poorly documented. Now we are making clear that this value should be left untouched as the rolling update mechanism should happen serially. Solves: bz-1396742 Signed-off-by: Sébastien Han <seb@redhat.com>	2016-11-21 14:42:46 +01:00
Ken Dreyer	adfdf6871e	remove apache support for RGW libfcgi is dead upstream (http://tracker.ceph.com/issues/16784) The RGW developers intend to remove libfcgi support entirely before the Luminous release. Since libfcgi gets little-to-no developer attention or testing, remove it entirely from ceph-ansible.	2016-11-18 13:13:12 -07:00
Ivan Font	255e816e28	Rolling update changes for containerized deployments Separate out systemd restart tasks for containerized and non-containerized deployments Signed-off-by: Ivan Font <ifont@redhat.com>	2016-11-17 11:25:25 -08:00
Ivan Font	e72f08080d	Warn user when upgrading cluster with only one mon Signed-off-by: Ivan Font <ifont@redhat.com>	2016-11-17 11:25:25 -08:00
Ivan Font	3ff17f1c8f	Support containerized rolling update - Update rolling update playbook to support containerized deployments for mons, osds, mdss, and rgws - Skip checking if existing cluster is running when performing a rolling update - Fixed bug where we were failing to start the mds container because it was missing the admin keyring. The admin keyring was missing because it was not being pushed from the mon host to the ansible host due to the keyring not being available before running the copy_configs.yml task include file. Now we forcefully wait for the admin keyring to be generated before continuing with the copy_configs.yml task include file - Skip pre_requisite.yml when running on atomic host. This technically no longer requires specifying to skip tasks containing the with_pkg tag - Add missing variables to all.docker.sample - Misc. cleanup Signed-off-by: Ivan Font <ifont@redhat.com>	2016-11-17 11:25:25 -08:00
Alfredo Deza	60ce2311b8	rolling_update: bump retries for osd_check/retries to 20 minutes Signed-off-by: Alfredo Deza <adeza@redhat.com> Resolves: rhbz#1395073	2016-11-17 10:43:58 -05:00
Sébastien Han	81a72cb85d	Merge pull request #1068 from ceph/v2.2 moving to ansible v2.2 compatibility	2016-11-16 16:33:40 +01:00
Andrew Schoen	5f44b118b8	rolling update: stop RGWs before upgrade and start afterwards Signed-off-by: Andrew Schoen <aschoen@redhat.com> Resolves: rhbz#1394929	2016-11-14 14:47:12 -06:00
Andrew Schoen	ded9d9dfd3	rolling update: stop MDSs before upgrading and start afterwards Signed-off-by: Andrew Schoen <aschoen@redhat.com> Resolves: rhbz#1394929	2016-11-14 14:47:12 -06:00
Andrew Schoen	5429c5f8c5	rolling update: stop MONs before upgrading and start afterwards Signed-off-by: Andrew Schoen <aschoen@redhat.com> Resolves: rhbz#1394929	2016-11-14 14:47:12 -06:00
Andrew Schoen	66f09bdac4	rolling update: stop OSDs before upgrading This avoids a bug where OSDs are sometimes restarted twice on upgrades which leaves the OSD process running but not marked up. See: https://bugzilla.redhat.com/show_bug.cgi?id=1394928 https://bugzilla.redhat.com/show_bug.cgi?id=1391675 https://bugzilla.redhat.com/show_bug.cgi?id=1394929 Signed-off-by: Andrew Schoen <aschoen@redhat.com> Resolves: rhbz#1394929	2016-11-14 14:46:58 -06:00
Sébastien Han	991341f525	rolling_update: add variable to upgrade ceph My stupid self removed this crucial variable here: `217ce3ca` thinking it was another hard coded variable import where this is actually the trigger for the upgrade. Closes: #1071 Signed-off-by: Sébastien Han <seb@redhat.com>	2016-11-04 17:31:02 +01:00
Sébastien Han	a2fcd222d2	moving to ansible v2.2 compatibility Signed-off-by: Sébastien Han <seb@redhat.com> Co-Authored-By: Julien Francoz julien@francoz.net	2016-11-04 10:09:38 +01:00
Andrew Schoen	8262ce5e40	rolling update: fix restarts of radosgw Signed-off-by: Andrew Schoen <aschoen@redhat.com> Resolves: rhbz#1391675 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2016-11-03 14:36:42 -05:00
Eduard Egorov	ab5c9f2a67	Adjust 'devices' list check for being not defined in purge-cluster playbook (see PR #1024 ) Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>	2016-11-03 06:36:42 +00:00
Leseb	899c8b309f	Merge pull request #1024 from eduardegorov/egorove_make_devices_optional Make {{ devices }} list optional	2016-11-02 15:12:02 +01:00
Eduard Egorov	e5473ee565	Fix typos Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>	2016-11-01 12:29:21 +00:00
Eduard Egorov	3652bb708b	Fix rbd-mirrors group name Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>	2016-11-01 12:21:47 +00:00
Eduard Egorov	645b5efebf	Fix hard-coded host group names in include tasks for group variables' file paths. Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>	2016-11-01 12:21:40 +00:00
Eduard Egorov	f33c1cd2d2	Make {{ devices }} list optional: define it as empty list by default, remove unnecessary 'default([])' checks Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>	2016-11-01 09:57:25 +00:00
Andrew Schoen	0897c965ff	rolling_update: define mon_group_name when upgrading the mons see: https://bugzilla.redhat.com/show_bug.cgi?id=1389456 Signed-off-by: Andrew Schoen <aschoen@redhat.com> Resolves: rhbz#1389456	2016-10-27 14:17:56 -05:00
Sébastien Han	b0989c700f	rolling_update: fix wrong indent Fixing: https://bugzilla.redhat.com/show_bug.cgi?id=1388295 Also add some notes in the README on how to run infrastructure playbooks. Signed-off-by: Sébastien Han <seb@redhat.com>	2016-10-26 12:51:08 -05:00
Ivan Font	534b188396	Update for infrastructure-playbooks execution - Updates to allow running infrastructure-playbooks both from within its directory or root directory of ceph-ansible. Signed-off-by: Ivan Font <ifont@redhat.com>	2016-10-26 09:43:37 -07:00
Andrew Schoen	bebf412c92	infrastructure-playbooks: fix syntax errors in all playbooks Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2016-10-25 16:56:58 -05:00
Sébastien Han	f49bf2832c	rolling_update: improve variables import we now have pointer to default role so we don't miss any of the variables defined. Signed-off-by: Sébastien Han <seb@redhat.com>	2016-10-06 14:08:04 +02:00
Ivan Font	5a5e185e11	Reworked purge cluster playbook - Separated out one large playbook into multiple playbooks to run host-type by host-type i.e. mdss, rgws, rbdmirrors, nfss, osds, mons. - Combined common tasks into one shared task for all hosts where applicable - Fixed various bugs Signed-off-by: Ivan Font <ivan.font@redhat.com>	2016-10-05 21:32:38 -07:00
Leseb	598d78cef3	Merge pull request #961 from ceph/fix-purge purge: only purge ceph partitions	2016-10-04 18:03:21 +02:00
Sébastien Han	e81ec9c138	purge: only purge ceph partitions Prior to this change we were purging all the partitions on the device when using the raw_journal_devices scenario. This was breaking deployments where other partitions are used for other purposes (ie: OS system). Signed-off-by: Sébastien Han <seb@redhat.com>	2016-10-04 17:58:53 +02:00
Leseb	4bf7e8355a	Merge pull request #953 from jsaintrocc/hammerfix Fixes for Hammer install and added numerical release checks	2016-10-04 11:34:26 +02:00
Sébastien Han	ac2cb9ac2c	upgrade: add custom timeout options This commit introduces the ability to configure delays and retries for cluster health checks, for both monitors and OSDs. Signed-off-by: Sébastien Han <seb@redhat.com>	2016-10-03 11:27:02 +02:00
James Saint-Rossy	982c44d41c	Rebased with upstream master	2016-09-25 23:22:16 -04:00
Sébastien Han	b8158a6554	ability to switch from bare metal to containerized daemons Signed-off-by: Sébastien Han <seb@redhat.com>	2016-09-21 18:07:50 +02:00
Leseb	517196ed66	Merge pull request #977 from ceph/switch-bare-metal-to-container ability to switch from bare metal to containerized daemons	2016-09-21 15:04:06 +02:00
Sébastien Han	5bfa1b0d24	ability to switch from bare metal to containerized daemons Signed-off-by: Sébastien Han <seb@redhat.com>	2016-09-21 14:46:57 +02:00
Sébastien Han	21356c653f	rolling updates: remove mon compact command Users have reported this task to hang. Since this command is not required to perform the upgrade, we remove it. Signed-off-by: Sébastien Han <seb@redhat.com>	2016-09-13 10:09:07 +02:00
James Saint-Rossy	666637f715	Replaced is_before is_after is_ booleans with numerical version dictionary	2016-09-09 17:34:26 -04:00
Rachana Patel	ad5805f03e	rolling_update.yml will not work if cluster name is not 'ceph'. Adding --cluster will solve this problem Fixes issue #969 Signed-off-by: Rachana Patel <rachana83.patel@gmail.com>	2016-09-07 15:38:58 -04:00
James Saint-Rossy	f52be23770	Prevent local_action from requiring root	2016-09-02 19:31:59 -04:00
Ivan Font	05c5d1ea91	Update relative path to include vars Signed-off-by: Ivan Font <ivan.font@redhat.com>	2016-08-24 00:27:54 -07:00
Ivan Font	7c9cb0993e	Include group_vars files in purge cluster playbook - Add all relevant group_vars files in containerized purge cluster playbook and ignore errors if file may not exist. - Also fixing indentation issues. Signed-off-by: Ivan Font <ivan.font@redhat.com>	2016-08-19 09:11:56 -07:00
Ivan Font	c1905bfa23	Update for containerized purge cluster playbook - Added support for purging containerized rbd-mirror node Signed-off-by: Ivan Font <ivan.font@redhat.com>	2016-08-19 09:11:56 -07:00
James Saint-Rossy	449d456086	Rebased and moved multisite/rgw playbooks to infrastructure-playbooks	2016-08-17 13:28:01 -04:00
Sébastien Han	fde819d1a8	create a directory for infrastructure playbooks Since we have a couple of infrastructure related playbooks (in addition to the roles we are using to deploy Ceph), it makes sense to have them located in a separate directory. Signed-off-by: Sébastien Han <seb@redhat.com>	2016-08-17 11:53:34 +02:00


776 Commits (9675e146ee4f4b6162adf6d5d5b995dafcbeafde)