ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	f36306ebf4	add-{mon,osd}: add ceph-container-engine role The ceph-container-engine role is missing from both playbooks, so the container engine (docker, podman) isn't installed, resulting in a failure on the added nodes. fatal: [xxxxx]: FAILED! => changed=false cmd: docker --version msg: '[Errno 2] No such file or directory' rc: 2 Closes: #4634 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `bfb1d6be12`)	2019-10-24 20:01:04 -04:00
Guillaume Abrioux	4a5d3c3c2d	update: add missing quotes Add missing quotes for consistency. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8d72ff8e5e`)	2019-10-21 13:26:37 -04:00
Dimitri Savineau	703c834dab	Move the dashboard playbook in the main directory The [group|host]_vars directories are ignored for the dashboard playbook when the inventory file directory doesn't contain those directories. Closes: #4601 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1761612 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `8426856262`)	2019-10-18 19:32:42 -04:00
Guillaume Abrioux	9bc7f8a7d7	tests: add multimds coverage This commit makes the all_daemons scenario deploy 3 mds in order to cover the multimds case. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `25b98b2ce3`)	2019-10-18 22:09:04 +02:00
Guillaume Abrioux	bc3138eff4	upgrade: fix standby_mdss group creation This commit fixes the standby_mdss group creation by using `{{ item }}`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c4fc8cc878`)	2019-10-18 22:09:04 +02:00
Guillaume Abrioux	c962d87def	update: follow new recommendation to upgrade mds cluster Refactor the mds cluster upgrade code in order to follow the documented recommendation. See: https://github.com/ceph/ceph/blob/master/doc/cephfs/upgrading.rst Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1569689 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `71cebf80a6`)	2019-10-16 12:59:08 -04:00
Dimitri Savineau	0b49538621	Execute common roles once on all nodes The common roles don't need to be executed again in each group play (like mons, osds, etc.). We only need to execute them during the first play. That way, we apply the changes on all nodes in parallel instead of doing it once per group. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `68a3dac7cd`)	2019-10-16 10:41:32 -04:00
Dimitri Savineau	fd759f97fa	dashboard: disable facts gathering This is already done in the main playbooks but absent in the dashboard playbook. The facts are already gathered during the first play of the main playbooks, so we don't need to do it twice. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5ae7304ace`)	2019-10-14 09:45:11 +02:00
Guillaume Abrioux	ebfe7f31ed	dashboard: if no host is available, let's just skip these plays. If there is no host available, let's just skip these plays. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1759917 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0b245bd007`)	2019-10-09 14:47:36 -04:00
Dimitri Savineau	5f91be8740	switch_to_containers: umount osd lockbox partition When switching from a baremetal deployment to a containerized deployment, we only unmount the OSD data partition. If the OSD is encrypted (dmcrypt: true), there's an additional partition (part number 5) used for the lockbox and mounted in the /var/lib/ceph/osd-lockbox/ directory. Because this partition isn't unmounted, the containerized OSDs aren't able to start: the partition is still mounted by the system and can't be remounted from the container. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `19edf707a5`)	2019-10-08 00:57:05 +00:00
Guillaume Abrioux	b325cc386e	switch_to_containers: do not re-set `ceph_uid` This commit refactors the way we set the `ceph_uid` fact in `ceph-facts` and removes all `set_fact` tasks for `ceph_uid` in the switch-to-containers playbook to avoid duplicated code. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `fa9b42e98e`)	2019-10-07 10:18:17 -04:00
Guillaume Abrioux	468aa5d63b	switch_to_containers: optimize ownership change As per https://github.com/ceph/ceph-ansible/pull/4323#issuecomment-538420164, using the `find` command should be faster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1757400 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-Authored-by: Giulio Fidente <gfidente@redhat.com> (cherry picked from commit `c5d0c90bb7`)	2019-10-07 10:18:17 -04:00
Guillaume Abrioux	37fd0b179b	update: import ceph-defaults role in first play Typical error: ``` fatal: [mon0]: FAILED! => msg: |- The conditional check 'not delegate_facts_host | bool or inventory_hostname in groups.get(client_group_name, [])' failed. The error was: error while evaluating conditional (not delegate_facts_host | bool or inventory_hostname in groups.get(client_group_name, [])): 'client_group_name' is undefined ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8138d4193c`)	2019-10-07 11:21:23 +02:00
Guillaume Abrioux	9a4fcfabe1	main: exclude client nodes from facts gathering when delegate_facts_host This commit excludes client nodes from facts gathering: their facts are not needed, and skipping them speeds up this task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `865d2eac9b`)	2019-10-07 11:21:23 +02:00
Dimitri Savineau	ec1c57f690	dashboard: remove useless block section The block sections were used with the dashboard_enabled condition when the code was included in the main playbooks. Because this condition isn't present in the dashboard playbook anymore, we can remove the block sections. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cf47594b47`)	2019-10-04 13:28:37 +02:00
Guillaume Abrioux	9a79ed1bf0	rgw: refactor tasks directory layout This commit moves containerized deployment related files to the `./tasks/` directory. This is needed to make `docker-to-podman.yml` work since we use the `tasks_from:` option. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e08194dd67`)	2019-10-01 18:50:51 +02:00
Guillaume Abrioux	7f902994b3	rbdmirror: refactor tasks directory layout This commit moves containerized deployment related files to the `./tasks/` directory. This is needed to make `docker-to-podman.yml` work since we use the `tasks_from:` option. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c69816c6b7`)	2019-10-01 18:50:51 +02:00
Guillaume Abrioux	d7a06c67db	iscsigw: refactor tasks directory layout This commit moves containerized deployment related files to the `./tasks/` directory. This is needed to make `docker-to-podman.yml` work since we use the `tasks_from:` option. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4636f3f7e2`)	2019-10-01 18:50:51 +02:00
Guillaume Abrioux	b564c37696	upgrade: add an infra playbook to migrate systemd units to podman This commit adds a new playbook to force systemd units for containers to use podman instead of docker. This is needed in the rhel8 upgrade context so that, after the base OS is upgraded, containers can be started using podman. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f2017dcda2`)	2019-10-01 18:50:51 +02:00
Guillaume Abrioux	4afe1b748c	update: reset mon_host after mons upgrade After all mons are upgraded, let's reset mon_host, which is used in the rest of the playbook for setting `container_exec_cmd`, so we are sure to use the right value. Typical error: ``` failed: [mds0 -> mon0] (item={u'path': u'/var/lib/ceph/bootstrap-mds/ceph.keyring', u'name': u'client.bootstrap-mds', u'copy_key': True}) => changed=true ansible_loop_var: item cmd: - docker - exec - ceph-mon-mon2 - ceph - --cluster - ceph - auth - get - client.bootstrap-mds delta: '0:00:00.016294' end: '2019-09-27 13:54:58.828835' item: copy_key: true name: client.bootstrap-mds path: /var/lib/ceph/bootstrap-mds/ceph.keyring msg: non-zero return code rc: 1 start: '2019-09-27 13:54:58.812541' stderr: 'Error response from daemon: No such container: ceph-mon-mon2' stderr_lines: <omitted> stdout: '' stdout_lines: <omitted> ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d84160a170`)	2019-09-28 09:01:16 +02:00
Harald Jensås	5fea830414	Replace ipaddr() with ips_in_ranges() This change implements a filter plugin that is used in the ceph-facts and ceph-validate roles and in infrastructure-playbooks. The new filter plugin returns a list of all IP addresses that reside in any one of the given IP ranges. It replaces the use of the ipaddr filter. ceph.conf already supports a comma-separated list of CIDRs for the public_network and cluster_network options. Changes [1] and [2] introduced a regression in ceph-ansible where public_network could no longer be a comma-separated list of CIDRs. With this change, a comma-separated list of subnet CIDRs can also be used for monitor_address_block and radosgw_address_block. [1] commit: `d67230b2a2` [2] commit: `20e4852888` Related-To: https://bugs.launchpad.net/tripleo/+bug/1840030 Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=1740283 Closes: #4333 Please backport to stable-4.0 Signed-off-by: Harald Jensås <hjensas@redhat.com> (cherry picked from commit `e695efcaf7`)	2019-09-27 17:49:46 +02:00
Sam Choraria	7594bc9181	rolling_update.yml: force ceph-volume scan on osds The rolling_update.yml playbook fails when scanning ceph-disk osds while deploying nautilus. The --force flag is required to scan existing osds and rewrite their json metadata. Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk> (cherry picked from commit `7cc9f93680`)	2019-09-26 14:51:59 -04:00
Guillaume Abrioux	96dafd676c	infrastructure-playbooks: add filestore-to-bluestore.yml This playbook helps migrate all osds on a node from the filestore to the bluestore backend. Note that ALL OSDs on the specified OSD nodes will be shrunk and redeployed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3f9ccdaa8a`)	2019-09-26 16:21:54 +02:00
Guillaume Abrioux	26e0f4db97	lv-create: fix a typo This commit fixes a typo. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c785ad3637`)	2019-09-26 16:21:54 +02:00
Mehdy	8c37894109	shrink-rgw.yml: fix confirmation play's name the confirmation play's name should confirm removing rgw instead of monitor Signed-off-by: Mehdy Khoshnoody <mehdy.khoshnoody@gmail.com> (cherry picked from commit `9fa98d79fd`)	2019-09-25 16:37:44 +02:00
Dimitri Savineau	a5775be7c4	shrink-mon: search mon in the quorum_names list If we look for the mon hostname anywhere in the ceph status output, there are some scenarios where the check still matches after the monitor has been removed. If we collocate some services (mons, mgrs, etc.), the hostname of the monitor to shrink will still be present in the ceph status (e.g. in the mgr section). Instead, we should check the hostname only in the mon part of the output. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `734c0dc310`)	2019-09-18 14:47:40 +00:00
Kevin Jones	3a8de9cc36	Set proper ownership command performance improvement By changing the set ownership command from using the file module in combination with a with_items loop to a raw chown command, we can achieve a 98% performance increase here. On a ceph cluster with a significant number of directories and files in /var/lib/ceph, the file module has to check the ownership of all those directories and files to determine whether a change is needed. In this case, we just want to explicitly set the ownership of all these directories and files to the ceph_uid. Added a context note to all set proper ownership tasks. Signed-off-by: Kevin Jones <kevinjones@redhat.com> (cherry picked from commit `47bf47c9d8`)	2019-08-22 12:59:58 +02:00
Guillaume Abrioux	236020fb2b	shrink-mon: refactor 'verify the monitor is out of the cluster' task Use the `from_json` filter instead of piping to `| python` so we can get rid of the `shell` module usage here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5573f17e76`)	2019-08-19 18:47:14 +00:00
Rishabh Dave	b28ed96378	use pre_tasks and post_tasks in shrink-mon.yml too This commit should've been part of commit `2fb12ae554`. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `2034387f57`)	2019-08-19 18:47:14 +00:00
Guillaume Abrioux	2f77704591	common: use discovered_interpreter_python fact in order to use the right binary name when using python cli in command or shell module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `13815ad3ca`)	2019-08-19 18:47:14 +00:00
Dimitri Savineau	f9d9ffac8f	dashboard: run dashboard role on mgr/mon nodes We don't need to execute the ceph-dashboard role on the nodes present in the grafana-server group. This one is dedicated to the grafana and prometheus stack. The ceph-dashboard role needs to be executed where the ceph-mgr is running: either on the dedicated mgr nodes or, if mgr and mon are collocated, implicitly on the mon nodes. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `16939eff9e`)	2019-08-08 13:47:09 +02:00
Rishabh Dave	72a062b6fa	add a playbook to remove rgw from a given node Add a playbook named shrink-rgw.yml to infrastructure-playbooks/ that can remove an RGW from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `632a44bdf2`)	2019-07-31 15:25:15 -04:00
Rishabh Dave	8ca88b41cc	infra-playbooks: rewrite a condition for better readability Use Ansible's built-in facility to check whether a command was executed successfully rather than looking at its return value. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `5aecdd3ba6`)	2019-07-29 15:52:29 +02:00
Guillaume Abrioux	d0ad1cf0f1	dashboard: use dedicated group only There's no need to add complexity by trying to fall back on another group. Let's deploy dashboard on all nodes present in grafana-server group. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d67230b2a2`)	2019-07-29 15:46:58 +02:00
Dimitri Savineau	dd87db70ca	dashboard: move code into a dedicated playbook Move dashboard, grafana/prometheus and node-exporter plays into a dedicated playbook in the infrastructure-playbooks directory. To avoid using the 'dashboard_enabled | bool' condition multiple times in the main playbook, we can just import the dashboard playbook or not. This patch also allows using a single dashboard playbook for both baremetal and container playbooks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `43135840b1`)	2019-07-29 15:46:58 +02:00
Dimitri Savineau	43d625b59a	Remove NBSP characters Some NBSP characters are still present in the yaml files. Add a test in Travis CI. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `07c6695d16`)	2019-07-26 16:23:41 -04:00
Guillaume Abrioux	bee8a31afe	shrink-rbdmirror: check if rbdmirror is well removed from cluster This commit adds a check to ensure the daemon has been removed from the cluster. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `916dc1f52f`)	2019-07-16 15:02:49 +02:00
Rishabh Dave	0a15d1d112	add a playbook that removes rbd-mirror from a node Add a playbook named "shrink-rbdmirror.yml" in infrastructure-playbooks/ that removes a RBD Mirror from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `c4824acb19`)	2019-07-16 15:02:49 +02:00
Rishabh Dave	6197d1c8d9	add a playbook that removes manager from a node Add a playbook, named "shrink-mgr.yml", in infrastructure-playbooks/ that removes a MGR from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `f4ea75051b`)	2019-07-09 15:00:56 +00:00
Guillaume Abrioux	85a448429d	shrink-mds: refactor post tasks This commit refactors the way we check that the "mds_to_kill" node is properly stopped. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `7df62fde34`)	2019-07-09 12:07:47 +02:00
Rishabh Dave	38c2785e95	add a playbook that removes mds from a node Add a playbook, named "shrink-mds.yml", in infrastructure-playbooks/ that removes a MDS from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `235b1fccc6`)	2019-07-09 12:07:47 +02:00
Mike Christie	cf6050d4e6	igw: Support new ceph-iscsi package during purge The ceph-iscsi-config and ceph-iscsi-cli packages were combined into ceph-iscsi and its APIs changed. This fixes up the iscsi purge task to support both the new and the old API. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `b163206db7`)	2019-07-04 00:04:04 +00:00
Guillaume Abrioux	0a0cdc0963	purge: ensure no ceph kernel thread is present This tries to first unmount any cephfs/nfs-ganesha mount point on client nodes, then unmap any mapped rbd devices and finally it tries to remove ceph kernel modules. If it fails it means some resources are still busy and should be cleaned manually before continuing to purge the cluster. This is done early in the playbook so the cluster stays untouched until everything is ready for that operation; otherwise, if you try to redeploy a cluster, it could get confused by leftovers from a previous deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1337915 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `20e4852888`)	2019-06-24 13:20:50 +02:00
Guillaume Abrioux	77d24203fa	upgrade: accept HEALTH_OK and HEALTH_WARN as valid state `3a100cfa52` introduced a check which is a bit too restrictive, let's accept HEALTH_OK and HEALTH_WARN. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6dce51183b`)	2019-06-21 15:47:33 +00:00
Dimitri Savineau	aa197f77fc	remove ceph restapi references The ceph restapi configuration was only available until the Luminous release, so we don't need those leftovers for nautilus+. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `da8b7ab7fb`)	2019-06-20 15:15:10 -04:00
Guillaume Abrioux	b93064c7c8	rolling_update: fail early if cluster state is not OK starting an upgrade if the cluster isn't HEALTH_OK isn't a good idea. Let's check for the cluster status before trying to upgrade. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3a100cfa52`)	2019-06-19 08:41:25 +00:00
Guillaume Abrioux	53dd58e84c	rolling_update: only mask and stop unit in mgr part Otherwise it fails as follows: ``` fatal: [mon0]: FAILED! => changed=false msg: |- Unable to enable service ceph-mgr@mon0: Failed to execute operation: Cannot send after transport endpoint shutdown ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `51b2813e04`)	2019-06-19 08:41:25 +00:00
Dimitri Savineau	6e565b251d	remove ceph-agent role and references The ceph-agent role was used only for RHCS 2 (jewel), so it's not useful anymore. The current code will fail on CentOS distributions because the rhscon package is only available on Red Hat with the RHCS 2 repository, and this ceph release is supported on the stable-3.0 branch. Resolves: #4020 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7503098ca0`)	2019-06-17 15:56:00 -04:00
L3D	1daca1ba83	ansible: use 'bool' filter on boolean conditionals Running ceph-ansible produces a lot of ``[DEPRECATION WARNING]`` messages like these: ``` [DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg. ``` Append ``| bool`` to many of the affected variables. In some places the coding style changed from ``variable|bool`` to ``variable | bool`` (with spaces around the pipe). Closes: #4022 Signed-off-by: L3D <l3d@c3woc.de> (cherry picked from commit `ab54fe20ec`)	2019-06-07 16:05:51 +02:00
Dimitri Savineau	7a384e7ec2	purge-cluster: clean all ceph repo files We currently only purge rh_storage yum repository file but depending on the ceph_repository value we are using, the ceph repository file could have a different name. Resolves: #4056 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `44c63903ca`)	2019-06-07 12:05:40 +00:00
guihecheng	a6312ba9bc	Add section for purging rgw loadbalancer in purge-cluster.yml Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com> (cherry picked from commit `59e702ec39`)	2019-06-06 19:44:30 +00:00
Guillaume Abrioux	16c6d530c6	roles: introduce `ceph-container-engine` role This commit splits the current `ceph-container-common` role. This introduces a new role `ceph-container-engine` which handles the tasks specific to the installation of container tools (docker/podman). This is needed for the ceph-dashboard implementation for 2 main reasons: 1/ Since the ceph-dashboard stack is only containerized, we must install everything needed to run containers even in non containerized deployments. Splitting this role allows us to not have to call the full `ceph-container-common` role which would run a bunch of unneeded tasks that would have been skipped anyway. 2/ The current implementation would have required running `ceph-container-common` on all ceph-clients nodes, which would have conflicted with `9d3517c670` (we don't want to run ceph-container-common on all client nodes, see mentioned commit for more details) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `55420d6253`)	2019-05-22 15:24:11 -04:00
Guillaume Abrioux	d83db2c8ed	switch to ansible 2.8 - remove private attribute with import_role. - update documentation. - update rpm spec requirement. - fix MagicMock python import in unit tests. Closes: #3765 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `72d8315299`)	2019-05-21 09:17:46 +02:00
Dimitri Savineau	023cdffd95	purge-docker-cluster: don't remove data on atomic Because we don't manage the docker service on atomic (yet) via the ceph-container-common role, we can't stop docker and remove the data. For now let's do that only for non atomic hosts. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `638604929b`)	2019-05-17 10:44:52 -04:00
Guillaume Abrioux	e29fd842a6	rename docker_exec_cmd variable This commit renames the `docker_exec_cmd` variable to `container_exec_cmd` so it's more generic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e74d80e72f`)	2019-05-17 16:05:58 +02:00
Zack Cerza	0496ce8e5c	purge-docker-cluster.yml: Default lvm_volumes We were failing when that variable is unset; purge-cluster.yml contains this workaround. Signed-off-by: Zack Cerza <zack@redhat.com> (cherry picked from commit `9b4339a2ba`)	2019-05-17 16:05:58 +02:00
Boris Ranto	5ac7559736	Merge cephmetrics/dashboard-ansible repo This commit will merge dashboard-ansible installation scripts with ceph-ansible. This includes several new roles to setup ceph-dashboard and the underlying technologies like prometheus and grafana server. Signed-off-by: Boris Ranto & Zack Cerza <team-gmeno@redhat.com> Co-authored-by: Zack Cerza <zcerza@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2f141a6e80`)	2019-05-17 16:05:58 +02:00
wumingqiao	30b1ca9aeb	shrink_osd: mark all osd(s) out in one command Signed-off-by: wumingqiao <wumingqiao@beyondcent.com> (cherry picked from commit `5320aa11c4`)	2019-05-15 21:44:30 -04:00
Dimitri Savineau	1e23d853f9	purge-docker-cluster: remove docker data We never clean the content of /var/lib/docker, so some data can still be present in this directory after running the purge playbook. Pip isn't used anymore. Also update the docker package name (especially the python binding one). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `168d7cd016`)	2019-05-14 11:00:30 +02:00
Dimitri Savineau	6814fd5ce5	gather-ceph-logs: fix logs list generation The shell module doesn't have a stdout_lines attribute. Instead of using the shell module, we can use the find module. Also add `become: false` to the local tmp directory creation, otherwise we won't have enough rights to fetch the files into this directory. Resolves: #3966 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ea1f8f551c`)	2019-05-13 10:33:26 -04:00
Mike Christie	78a55a3df3	igw: Fix rolling update service ordering We must stop tcmu-runner after the other rbd-target-* services because they may need to interact with tcmu-runner during shutdown. There is also a bug in some kernels where IO can get stuck in the kernel and by stopping rbd-target-* first we can make sure all IO is flushed. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1659611 Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `d7ef12910e`)	2019-05-10 15:53:44 +02:00
Rishabh Dave	b6d5352783	remove infrastructure-playbooks/rgw-standalone.yml We don't need infrastructure-playbooks/rgw-standalone.yml since site.yml.sample and site-container.yml.sample can add a new RGW node to an already deployed Ceph cluster. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `6e8fb2b3ea`)	2019-05-07 13:11:48 +02:00
letterwuyu	27a8179cd8	Fix comment content Signed-off-by: lishuhao letterwuyu@gmail.com (cherry picked from commit `d57f6fcdc6`)	2019-05-07 11:11:22 +02:00
Rishabh Dave	06b3ab2a6b	improve coding style Keywords requiring only one item shouldn't express it by creating a list with single item. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `739a662c80`) Conflicts: roles/ceph-mon/tasks/ceph_keys.yml roles/ceph-validate/tasks/check_devices.yml	2019-05-06 15:09:06 +00:00
Dimitri Savineau	92340d049c	rolling_update: restart all ceph-iscsi services Currently only rbd-target-gw service is restarted during an update. We also need to restart tcmu-runner and rbd-target-api services during the ceph iscsi upgrade. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f1048627ea`)	2019-04-30 12:09:52 -04:00
Andrew Schoen	f1e04835f4	rolling_update: ceph commands should use --cluster Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `e2529dcd7f`)	2019-04-18 19:12:13 +02:00
Andrew Schoen	545d93aae8	rolling_update: set num_osds to the number of running osds We do this so that the ceph-config role can most accurately report the number of osds for the generation of the ceph.conf file. We don't want to use ceph-volume to determine the number of osds because in an upgrade to nautilus ceph-volume won't be able to accurately count osds created by ceph-disk. Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `67453853ff`)	2019-04-18 19:12:13 +02:00
Andrew Schoen	c28388bb06	rolling_update: migrate ceph-disk osds to ceph-volume When upgrading to nautilus, run ``ceph-volume simple scan`` and ``ceph-volume simple activate --all`` to migrate any running ceph-disk osds to ceph-volume. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460 Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `28c47e4d1b`)	2019-04-18 19:12:13 +02:00
Guillaume Abrioux	35afd6a63a	update: ensure tasks are executed on an upgraded mon These tasks must be run from a monitor which is upgraded otherwise it might fail. See: https://tracker.ceph.com/issues/39355 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `7eb42c9e8e`)	2019-04-18 19:10:10 +02:00
Guillaume Abrioux	495711f296	update: ensure ceph command returns 0 These commands could return something other than 0. Let's ensure all retries have been done before actually failing. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ed84325b1d`)	2019-04-18 19:10:10 +02:00
Guillaume Abrioux	4a678ac102	update: set osd flags before upgrading any mon Typical error: ``` failed: [mon0 -> mon2] (item=noout) => changed=true cmd: - ceph - --cluster - ceph - osd - set - noout delta: '0:00:00.293756' end: '2019-04-17 06:31:57.552386' item: noout msg: non-zero return code rc: 1 start: '2019-04-17 06:31:57.258630' stderr: |- Traceback (most recent call last): File "/bin/ceph", line 1222, in <module> retval = main() File "/bin/ceph", line 1146, in main sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli') File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs cmd['sig'] = parse_funcsig(cmd['sig']) File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig raise JsonFormat(s) ceph_argparse.JsonFormat: unknown type CephBool stderr_lines: - 'Traceback (most recent call last):' - ' File "/bin/ceph", line 1222, in <module>' - ' retval = main()' - ' File "/bin/ceph", line 1146, in main' - ' sigdict = parse_json_funcsigs(outbuf.decode(''utf-8''), ''cli'')' - ' File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs' - ' cmd[''sig''] = parse_funcsig(cmd[''sig''])' - ' File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig' - ' raise JsonFormat(s)' - 'ceph_argparse.JsonFormat: unknown type CephBool' stdout: '' stdout_lines: <omitted> ``` Having mixed versions of monitors seems to cause this error. Moving these tasks before any monitor gets upgraded seems to be enough to get around this issue. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `543d1e2e41`)	2019-04-18 19:10:10 +02:00
Rishabh Dave	72309b49fe	allow adding a monitor to a deployed cluster Add a playbook that deploys a new monitor on a new node, adds that node to the Ceph cluster and the monitor to the quorum and updates the ceph configuration file on OSD nodes. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `d5967af7fb`)	2019-04-16 11:14:21 +02:00
Dimitri Savineau	1c3fbe5a60	purge-cluster: remove python-ceph-argparse package When using the purge-cluster playbook with nautilus, the python-ceph-argparse package is still installed on the host, preventing the reinstallation of a ceph cluster with a different version (like luminous or mimic). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `eb658b3af6`)	2019-04-15 17:32:22 +02:00
Dimitri Savineau	f90c051589	switch-from-non-containerized: stop all osds `e6bfb84` introduced a regression in the switch from non containerized to container deployment. We need to stop all previous OSDs services. We just don't need the ceph-disk pattern in the regex. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `150acba8c5`)	2019-04-12 00:45:21 +00:00
Guillaume Abrioux	f8c544c4a8	purge: remove references to ceph-disk as of stable-4.0, ceph-disk is no longer supported. These tasks aren't needed anymore. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a1254d767c`)	2019-04-12 00:45:21 +00:00
Guillaume Abrioux	f1ede335e4	shrink-osd: remove legacy playbook as of stable-4.0, ceph-disk is no longer supported. Let's remove this legacy version of the playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `73aa788459`)	2019-04-12 00:45:21 +00:00
Guillaume Abrioux	f5478dcc0b	switch_to_containers: remove ceph-disk references as of stable-4.0, ceph-disk is no longer supported. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e6bfb843f4`)	2019-04-12 00:45:21 +00:00
Guillaume Abrioux	4a663e1fc0	osd: remove variable osd_scenario As of stable-4.0, the only valid scenario is `lvm`. Thus, this makes this variable useless. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4d35e9eeed`)	2019-04-12 00:45:21 +00:00
Guillaume Abrioux	2581c4d511	update: fix undefined error when no mgr group is declared if mgr group isn't defined in inventory, that task will fail with undefined error. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c1e4529b0e`)	2019-04-11 09:20:22 -04:00
Dimitri Savineau	532d749b2e	rolling_update: Remove ceph aliases ceph aliases were introduced in stable-3.2 during the ceph deployment. On master they have been removed, but we don't handle this removal in the upgrade from stable-3.2 to master via the rolling_update playbook. Also remove the task from purge-docker-cluster missing from `d9e7835` Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `57b4e76d11`)	2019-04-10 00:02:35 +00:00
Guillaume Abrioux	b723ef3fa2	purge: fix lvm-batch purge osd `lvm_volumes` and/or `devices` variable(s) can be undefined depending on the scenario chosen. These tasks should run only if these variables are defined; otherwise they fail with undefined variable errors. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0180738313`)	2019-04-04 03:38:52 +02:00
Guillaume Abrioux	f55e2b08be	remove all NBSPs on master branch Similar to #3658 Since there are too many changes between the master and stable branches, let's commit directly in each branch instead of trying to backport this commit. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-28 11:57:55 +00:00
Dimitri Savineau	c8442f3705	rolling_update: Update systemd unit regex for nvme The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-26 12:01:00 +00:00
Guillaume Abrioux	78aac3e96a	update: followup on `edfdc49` all rgw instances should be stopped according to the multiple rgw instances support added in rolling_update.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	f6e0185146	update: add containerized deployment upgrade support (L->N) Add a couple of fixes to allow containerized deployments to be upgraded from luminous/mimic to nautilus. - pass CEPH_CONTAINER_IMAGE and CEPH_CONTAINER_BINARY environment variables to the ceph_key module, - fix the docker exec command in the 'waiting for the containerized monitor to join the quorum' task according to the `delegate_to` parameter, - override `docker_exec_cmd` in `ceph-facts` with `mon_host` when rolling_update is `True`, - do not unnecessarily run `create_mds_filesystems.yml` when performing an upgrade. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	1816b876ee	update: add missing hosts in facts gathering iscsigws were missing. The 'complete upgrade' couldn't complete because rolling_update was set to False for iscsigw nodes. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	45ba90c169	update: remove rbdmirror legacy task This task is no longer needed for next release. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	0ea0adf039	update: show all daemons version at the end Let's display the version of all daemons at the end of the playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	f31d6d9485	update: enable new nautilus-only functionality once the cluster is upgraded to nautilus, we can complete the process by disallowing pre-nautilus OSDs and enabling all new nautilus-only functionality Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	afdaa70a63	update: enable msgr2 protocol This commit enables the msgr2 protocol when the cluster is fully upgraded to nautilus Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	ef096dd021	update: ensure mgrs are upgraded after ALL monitors As of `1c760904b0`, ceph-ansible implicitly bootstraps managers on monitors. mgrs must be upgraded only after all monitors; therefore, this commit refactors the way mgrs are upgraded to be sure we don't upgrade a mgr during the monitors upgrade. This commit also ensures we handle the case where managers are split onto dedicated nodes. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	7fa2434f0f	update: ensure /var/lib/ceph/bootstrap-rbd-mirror is present This directory is created by ceph-config node by node. In the upgrade context we need it to be created on ALL monitors from the first iteration, because the task right after it creates and sends the keyrings to all monitors. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	82764afe8d	update: mask systemd service units during upgrade This prevents the packaging from restarting services before we actually need to restart them in the rolling update sequence. We want to handle service restarts in the rolling_update playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	8add55451c	update: set osd flags only once There is no need to set osd flags (noout, norebalance) each time we upgrade a mon. This commit moves up those tasks (before stopping the mon) so we don't need to delegate them. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	f7c6f4e0b6	update: fix tasks waiting for the node to join the quorum We actually want to ensure the node being upgraded joins the quorum, rather than the monitor picked up earlier. Indeed, `mon_host` is used only in `delegate_to:` so we can still run ceph commands while the monitor being upgraded is stopped. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	32569b79e2	update: remove an old parameter in ceph_key module call The `containerized` parameter in the ceph_key module doesn't exist anymore. This was making the module fail, but the failure was hidden because of `ignore_errors: True`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Dimitri Savineau	b23c05ae52	add-osd.yml: Add become flag for ceph-validate The check_devices task fails if the ceph-validate role isn't executed as a privileged user (Permission denied). failed: [osd0] (item=/dev/sdb) => {"changed": false, "err": "Error: Error opening /dev/sdb: Permission denied\n", "item": "/dev/sdb", "msg": "Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdb -- unit 'MiB' print'", "out": "", "rc": 1} Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-09 05:54:46 +00:00
Guillaume Abrioux	a440878533	add-osd: gather facts in second part of playbook Otherwise, it will fail with an error like the following: ``` FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'"} ``` because facts won't have been gathered. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670663 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-04 14:44:27 +01:00
Guillaume Abrioux	47ebef374f	purge: fix rbd-mirror group name the default is rbdmirrors in ceph-defaults Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-01 20:31:14 +00:00
Guillaume Abrioux	a915308477	purge: fix rbd mirror purge as of `b70d54ac80` the service launched isn't ceph-rbd-mirror@admin.service. it's now `ceph-rbd-mirror@rbd-mirror.{{ ansible_hostname }}` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-01 20:31:14 +00:00
Guillaume Abrioux	3849f30f58	purge: do not remove /var/lib/apt/lists/* Removing the content of this directory seems a bit aggressive and causes a redeployment to fail after a purge on Debian-based distributions. Typical error: ``` fatal: [mon0]: FAILED! => changed=false attempts: 3 msg: No package matching 'ceph' is available ``` The following task considers the cache still valid, so apt doesn't refresh it: ``` - name: update apt cache if cache_valid_time has expired apt: update_cache: yes cache_valid_time: 3600 register: result until: result is succeeded ``` Since the task installing ceph packages has `update_cache: no`, it fails: ``` - name: install ceph for debian apt: name: "{{ debian_ceph_pkgs | unique }}" update_cache: no state: "{{ (upgrade_ceph_packages|bool) | ternary('latest','present') }}" default_release: "{{ ceph_stable_release_uca | default('') }}{{ ansible_distribution_release ~ '-backports' if ceph_origin == 'distro' and ceph_use_distro_backports else '' }}" register: result until: result is succeeded ``` /tmp/* isn't specific to ceph either, so we shouldn't remove everything in this directory. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-01 20:31:14 +00:00
Guillaume Abrioux	89f77589fa	purge: fix purge of lvm devices Using the `shell` module seems to be the only way to make this task work on both rhel-based AND debian-based distributions. On ubuntu, using the `command` ansible module fails like the following (whether or not `sudo` is used): ``` ok: [osd1] => changed=false cmd: command -v ceph-volume failed_when_result: false msg: '[Errno 2] No such file or directory: ''command'': ''command''' rc: 2 ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-01 20:31:14 +00:00
Guillaume Abrioux	69310a5cd6	switch_to_containers: support multiple rgw instances per host add multiple rgw instances per host in switch_to_containers playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-13 09:42:27 +01:00
Guillaume Abrioux	70f1eea9b2	switch_to_containers: remove non-containerized systemd unit files Remove old (non-containerized) systemd unit files during the switch_to_containers transition. We have sometimes seen the old unit started instead of the newly generated systemd unit. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-13 09:42:27 +01:00
Guillaume Abrioux	4064035a54	switch_to_containers: use ceph binary from container use the ceph binary from the container instead of the host. If the ceph CLI version isn't compatible between host and container image, it can cause the CLI to hang. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-13 09:42:27 +01:00
Guillaume Abrioux	7e0a70f7a8	switch_to_containers: do not try to redeploy monitors `ceph-mon` tries to redeploy monitors because it assumes it was not yet deployed since `mon_socket_stat` and `ceph_mon_container_stat` are undefined (indeed, we stop the daemon before calling `ceph-mon` in the switch_to_containers playbook). Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-02-13 09:42:27 +01:00
John Fulton	37b5d1084a	Make python print statements python3 compatible The restart_osd_daemon.sh generated from the j2 template contains a python call which uses 'print x' instead of 'print(x)'. Add the missing parentheses to make this call compatible with both 2 and 3. Also add parentheses to other python print calls found in roles/ceph-client/defaults/main.yml and infrastructure-playbooks/cluster-os-migration.yml. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1671721 Signed-off-by: John Fulton <fulton@redhat.com>	2019-02-01 15:23:27 +00:00
Noah Watkins	9a43674d2e	shrink_osd: use cv zap by fsid to remove parts/lvs Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1569413 https://bugzilla.redhat.com/show_bug.cgi?id=1572933 Signed-off-by: Noah Watkins <noahwatkins@gmail.com>	2019-01-24 16:34:13 +01:00
Guillaume Abrioux	edfdc49488	rolling_update: support multiple rgw instance `1ac94c048f` introduced support for multiple rgw instances on a single host but failed to implement this feature in rolling_update. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-01-22 13:45:38 +01:00
Giulio Fidente	ff8dbe114c	Preserve rolling_update backward compatibility with ansible < 2.5 Signed-off-by: Giulio Fidente <gfidente@redhat.com>	2019-01-21 14:05:45 +01:00
guihecheng	1ac94c048f	rgw: add support for multiple rgw instances on a single host With this, we can have multiple rgw instances on a single host with a single run, without having to use rgw-standalone.yml, which does not seem able to bind ports separately. If you want multiple rgw instances, just set 'radosgw_instances' to the number you want; it defaults to 1. Not compatible with Multi-Site yet. Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>	2019-01-18 11:12:28 +01:00
Guillaume Abrioux	268f2cef82	update: do not enforce `serial: 1` on client nodes There is no need to enforce `serial: 1` on client nodes. Let's make it parameterizable by introducing a new extra variable `client_update_batch`, if not filled this will default to `{{ ansible_forks }}`. NOTE: this is only usable as an extra variable passed with `-e client_update_batch=<num>` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650184 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-01-02 16:55:08 +00:00
Daniel-Pivonka	ba149972be	Example ceph_add_users_buckets playbook This example playbook shows how to bulk add rgw users and buckets Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>	2018-12-20 14:23:25 +01:00
Guillaume Abrioux	d7e77012ef	retry on packages and repositories failures Add register/until on all packaging-related tasks to avoid spurious CI failures. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-19 14:48:27 +00:00
Noah Watkins	110049e825	playbook: report storage device inventory Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-12-18 10:51:31 +01:00
Andrew Schoen	ffd56177e7	purge-cluster: skip tasks that use ceph-volume if it's not installed This will allow the playbook to be idempotent. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1656935 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-12-13 11:27:27 +01:00
Guillaume Abrioux	a12de3e048	purge-container: move facts gathering after ceph-defaults role import This task has to be called after the role `ceph-defaults` has been played, otherwise, `mon_group_name` will never be known. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-12 16:50:24 +00:00
Guillaume Abrioux	d0b3cb7f85	purge-container: fix wrong syntax we want a default value for `mon_group_name`, not for `groups[mon_group_name]`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-12 11:33:57 +01:00
Guillaume Abrioux	0eb56e36f8	introduce new role ceph-facts sometimes we play the whole role `ceph-defaults` just to access the default value of some variables. It means we play the `facts.yml` part in this role while it's not desired. Splitting this role will speedup the playbook. Closes: #3282 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-12 11:18:01 +01:00
Guillaume Abrioux	ae7f3d66a6	purge-docker: do not call ceph-osd role calling ceph-osd role in purge playbook is not needed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-11 09:59:25 +01:00
Guillaume Abrioux	1a4a6ec855	purge: gather monitors facts in OSD purge The OSD part of the purge delegates commands to a monitor node, so we need to gather monitors facts to know the `ansible_hostname` fact that is used in the `docker_exec_cmd` fact. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	62111ff53c	purge-container: gather fact before calling ceph-defaults ceph-defaults relies on facts so we must gather facts before running it. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	fc6ebd8ebb	purge-cluster: add support for mon/mgr collocation Recently we introduced the default collocation of mon/mgr without the need of a dedicated mgrs section. This means we have to stop the mgr process on that machine too. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	3a154fa0ad	purge-cluster: remove support for other init system We only support systemd and use the service module anyway. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	325a159415	purge-docker-cluster: add support for mgr/mon collocation Recently we introduced the collocation of mon and mgr by default, so we don't need to have an explicit mgrs section for this. This means we have to remove the mgr container on the mon machines too. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	2bcc00896f	purge-docker-cluster: add a task to check hosts It's useful when running on CI to see what might remain on the machines. So we list all the containers and images. We expect the list to be empty. We fail if we see containers running. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Sébastien Han	1751885bc9	purge-docker-cluster: add ceph-volume support This commit adds support for purging clusters that were deployed with ceph-volume. It also uses a block instruction to cleanly separate the work to do depending on whether lvm is used or not. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-11 09:59:25 +01:00
Rishabh Dave	2fb12ae554	use pre_tasks and post_tasks when necessary Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-12-05 08:17:10 +00:00
Rishabh Dave	e4f0af2b78	don't use private option for import_role Since sharing variables amongst roles has been made default since Ansible 2.6, private option has been deprecated; so stop using it. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-12-04 23:45:59 +00:00
Ramana Raja	cb784c601d	rolling_update: fail if less than 3 MONs ... for non-containerized deployments as well. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1655470 Signed-off-by: Ramana Raja <rraja@redhat.com>	2018-12-04 14:28:49 +00:00
Sébastien Han	896676ee80	fix json data type JSON is a structure that is always typed as a string, whereas before this we were declaring a dict, which is not a valid JSON structure. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-04 12:34:54 +01:00
Guillaume Abrioux	78116fa6db	purge: add iscsi support add iscsi support for both non containerized and containerized deployment in purge playbooks. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1651054 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-12-03 17:35:21 +01:00
Sébastien Han	1c760904b0	site: collocated mon and mgr by default This will speed up the deployment and also deploy mon and mgr collocated, as recommended. This won't prevent you from adding more dedicated machines for mgr if needed. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-03 14:39:43 +01:00
Sébastien Han	bb7bfca113	rolling-update: remove old condition This failure condition was only valid at the time where clusters didn't have ceph-mgr activated. Now since we collocate the ceph-mgr with the mon by default, if the daemon wasn't present it will be created during the upgrade. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-12-03 14:39:43 +01:00
Guillaume Abrioux	a952122c38	rolling_update: create missing keyring only on running mon try to create the potentially missing keys only on monitors that are actually running. The current node being played is stopped before this task. By the way, delegating the command on all nodes but the current node being played ensures that the generated keys will be present on all monitors. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-29 16:40:46 +00:00
Sébastien Han	61fb6972ec	rolling_update: default ceph json output to empty dict So we can avoid the following failure: The conditional check 'hostvars[mon_host]['ansible_hostname'] in (ceph_health_raw.stdout | from_json)["quorum_names"] or hostvars[mon_host]['ansible_fqdn'] in (ceph_health_raw.stdout | from_json)["quorum_names"] ' failed. The error was: No JSON object could be decoded We just need to set a default; the next iteration will have a more complete json since the command won't fail. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-29 10:46:15 +00:00
Guillaume Abrioux	73287f91bc	mgr: fix mgr keyring error on rolling_update When upgrading from RHCS 2.5 to 3.2, it fails because the task `create ceph mgr keyring(s) when mon is containerized` has a when condition `inventory_hostname == groups[mon_group_name]|last`. First, this is incorrect because `inventory_hostname` refers to a mgr node, which means this condition would never have been satisfied. Then, this condition + `serial: 1` causes the mgr keyring creation to be skipped on the first node. Further, the `ceph-mgr` role tries to copy the mgr keyring (it's not aware we are running `serial: 1`), which leads to a failure like the following: ``` TASK [ceph-mgr : copy ceph keyring(s) if needed] *** task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10 Tuesday 27 November 2018 12:03:34 +0000 (0:00:00.296) 0:11:01.290 **** An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring' failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"} ``` The ceph_key module is idempotent, so there is no need to have such a condition. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-27 18:19:56 +01:00
Sébastien Han	e5d5dffeb5	shrink-osd: add missing CEPH_BINARY We need to add the right binary to do the docker exec. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	4f57e44f9c	defaults: declare container_binary Always declare container_binary and assign it a correct value. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	49e0e19056	rolling_update: update ceph_key task for container Use the new way to create keys on containerized env as introduced by: 1098b71bda90db3dad19ac179f0ba900ccb0f953 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Sébastien Han	2814d36c93	infra playbooks: use the right container binary Use podman or docker depending on whether they are available. podman will be prioritized over docker if present. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-27 16:47:40 +00:00
Guillaume Abrioux	7c99b6df6d	update: fix a typo `hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo. That should be `hostvars[mon_host]['ansible_hostname']` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-26 18:22:20 +01:00
Guillaume Abrioux	af78173584	rolling_update: refact set_fact `mon_host` Each monitor node should select another monitor which isn't itself. Otherwise, one node in the monitor group won't set this fact, causing a failure. Typical error: ``` TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] * task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200 Thursday 22 November 2018 14:02:30 +0000 (0:00:07.493) 0:02:50.005 *** fatal: [mon1]: FAILED! => {} MSG: The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2' ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-26 18:22:20 +01:00
Sébastien Han	4e267bee4f	rolling_update: create rbd and rbd-mirror keyrings During an upgrade, ceph won't create keys that did not exist in the previous version. So after an upgrade from, let's say, Jewel to Luminous, once all the monitors have the new version they should get or create the keys. It's ok for the task to fail, especially for the rbd-mirror key, which only appears in Nautilus. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-26 18:22:20 +01:00
Sébastien Han	c14f9b78ff	switch: do not look for devices anymore It's easier to look up a directory instead of the block devices, especially because ceph-volume and ceph-disk handle devices differently. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-23 07:56:23 +00:00
Sébastien Han	cd56dad9fa	switch: disable all ceph units Prior to this commit we were only disabling ceph-osd units, but forgot the ceph.target which is controlling everything and will restart the ceph-osd units at each reboot. Now that everything gets disabled there won't be any conflicts between the old non-container and the new container units. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-23 07:56:23 +00:00
Sébastien Han	fe1d09925a	switch: do not mask systemd unit If we mask it we won't be able to start the OSD container since now the osd container use the osd ID as a name such as: ceph-osd@0 Fixes the error: Failed to execute operation: Cannot send after transport endpoint shutdown Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-23 07:56:23 +00:00
Guillaume Abrioux	c783bc70da	docker-common: rename role rename `ceph-docker-common` role to `ceph-container-common` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-11-12 10:51:48 +01:00
Rishabh Dave	90f222f6a5	add quotes around package names added in `da6f384` Add quotes around package names added in the commit `da6f384223` so that the difference between the Ansible variables and package names is clear. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-11-09 12:59:08 +00:00
Rishabh Dave	d72340abbe	pass the list of packages to package management modules Instead of looping over a list of packages or repeating the task separately for different packages, pass the list of packages to the task performing package management. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-11-09 12:59:08 +00:00
Sébastien Han	53910de43b	ceph_key: add fetch_initial_keys capability This is needed for Nautilus since the ceph-create-keys script goes away. (https://github.com/ceph/ceph/pull/21305) Now, if the module is called with 'state: fetch_initial_keys', it will look up keys generated by the monitor and write them to the right location on the filesystem (/etc/ceph and /var/lib/ceph/bootstrap*). This is not applicable to containers since keys are generated by the container only. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-09 12:45:52 +01:00
Mike Christie	b523a44a1a	igw: stop tcmu-runner on iscsi purge When the iscsi purge playbook is run, we stop the gw and api daemons but not tcmu-runner, which I forgot in the previous PR. Fixes Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1621255 Signed-off-by: Mike Christie <mchristi@redhat.com>	2018-11-09 10:02:16 +01:00
Noah Watkins	b848d2be4c	don't use "role" or "roles" to include roles see `3f62fc585f` Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-11-08 17:45:37 +01:00
Noah Watkins	9c47950961	Fix comments in shrink-osd-ceph-disk playbook Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-11-08 17:45:37 +01:00
Noah Watkins	f5dacbf7de	Add a ceph-volume aware shrink-osd playbook Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-11-08 17:45:37 +01:00
Noah Watkins	0782cfc546	Rename ceph-disk version of shrink-osd playbook This will be replaced by a ceph-volume aware version. Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-11-08 17:45:37 +01:00
Sébastien Han	b82995df58	Revert "ceph_key: add fetch_initial_keys capability" This reverts commit `17883e09ba`.	2018-11-08 13:34:47 +00:00
Sébastien Han	17883e09ba	ceph_key: add fetch_initial_keys capability This is needed for Nautilus since the ceph-create-keys script goes away. (https://github.com/ceph/ceph/pull/21305) Now, if the module is called with 'state: fetch_initial_keys', it will look up keys generated by the monitor and write them to the right location on the filesystem (/etc/ceph and /var/lib/ceph/bootstrap*). This is not applicable to containers since keys are generated by the container only. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-11-08 13:32:18 +00:00
Rishabh Dave	da6f384223	don't loop over a task using package management modules For tasks using (Ansible) modules for package management utilities, pass the list of packages to be installed instead of repeating the task for each package. Using the latter manner of installing a list of packages leads to a deprecation warning by ansible-playbook command. Fixes: https://github.com/ceph/ceph-ansible/issues/3293 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-11-08 08:38:10 +00:00
Rishabh Dave	640cad3fd8	remove configuration files for ceph packages on ubuntu clusters For apt-get, purge command needs to be used, instead of remove command, to remove related configuration files. Otherwise, packages might be shown as installed while running dpkg command even after removing them. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1640061 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-11-07 15:52:53 +01:00
Guillaume Abrioux	f7d4651186	playbook: remove jinja syntax in when statement This syntax is deprecated Closes: #3281 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-31 13:45:41 +01:00
Rishabh Dave	3f62fc585f	don't use "role" or "roles" to include roles Since import_role and include_role are more readable, explicit (about the nature of inclusion) and flexible (they allow placing the inclusion anywhere amongst the tasks), use them instead of the roles or role keywords. Besides, import_role and include_role also accept more arguments. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-10-31 09:38:59 +01:00
Rishabh Dave	8edbda96df	use blocks directives to group tasks Using block directives simplifies the playbooks and makes them more readable. Fixes: https://github.com/ceph/ceph-ansible/issues/2835 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-10-31 09:37:43 +01:00
Guillaume Abrioux	d8d3e55006	remove restapi role As of `mimic`, restapi is no longer available because of manager daemon. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-30 14:19:13 +01:00
Ali Maredia	219fa8f919	infrastructure playbooks: ensure nvme_device is defined in lv-create.yml Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-10-29 08:41:42 +00:00
Mike Christie	0904860032	igw: stop daemons on purge all calls When purging the entire igw config (lio and rbd), stop and disable the api and gw daemons. Fixes Red Hat BZ https://bugzilla.redhat.com/show_bug.cgi?id=1621255 Signed-off-by: Mike Christie <mchristi@redhat.com>	2018-10-25 12:59:18 +02:00
Sébastien Han	44d0da0dd4	rolling_update: fix upgrade when using fqdn Clusters that were deployed using 'mon_use_fqdn' have a different unit name, so it must be used during the upgrade; otherwise the upgrade will fail looking for a unit that does not exist. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1597516 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-19 13:06:56 +00:00
Guillaume Abrioux	b8418ebd17	add-osds: followup on `3632b26` Three fixes: - fix a typo in vagrant_variables that causes a networking issue for the containerized scenario. - add containerized_deployment: true - remove a useless block of code: the fact docker_exec_cmd is set in ceph-defaults, which is played right after. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-17 17:07:25 +02:00
Sébastien Han	d6e79044ef	infra: add a gather-ceph-logs.yml playbook Add a gather-ceph-logs.yml which will log onto all the machines from your inventory and will gather ceph logs. This is not intended to work on containerized environments since the logs are stored in journald. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582280 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-17 13:52:19 +00:00
Sébastien Han	fbd878c8d5	infra: rename osd-configure to add-osd and improve it The playbook has various improvements: * run ceph-validate role before doing anything * run ceph-fetch-keys only on the first monitor of the inventory list * set noup flag so PGs get distributed once all the new OSDs have been added to the cluster and unset it when they are up and running Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1624962 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-17 11:26:11 +00:00
Guillaume Abrioux	40b7747af7	remove jewel support As of now, we should no longer support Jewel in ceph-ansible. The latest ceph-ansible release supporting Jewel is `stable-3.1`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-12 23:38:17 +00:00
Sébastien Han	9fccffa1ca	switch: allow switching big clusters (more than 99 osds) The current regex had a limitation of 99 OSDs; now this limit has been removed and, regardless of the number of OSDs, they will all be collected. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1630430 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-10 16:35:30 -04:00
Noah Watkins	8dcc8d1434	Stringify ceph_docker_image_tag This could be a numeric input, but is treated like a string leading to runtime errors. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1635823 Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Noah Watkins	306e308f13	Avoid using tests as filter Fixes the deprecation warning: [DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result|search` use `result is search`. Signed-off-by: Noah Watkins <nwatkins@redhat.com>	2018-10-10 04:26:33 +00:00
Guillaume Abrioux	79bd06ad28	rolling_update: add ceph-handler role since the introduction of ceph-handler, it has to be added in rolling_update playbook as well Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-05 13:48:04 +00:00
Rishabh Dave	b5d2ea269f	don't use "static" field while including tasks Instead, use "import_tasks" and "include_tasks" to tell whether tasks must be included statically or dynamically. Fixes: https://github.com/ceph/ceph-ansible/issues/2998 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-10-04 07:44:28 +00:00
Sébastien Han	bae0f41705	switch: copy initial mon keyring We need to copy this key into /etc/ceph so when ceph-docker-common runs it can fetch it to the ansible server. Previously the task wasn't failing because `fail_on_missing` was False before Ansible 2.5; it is now True, hence the failure. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-10-03 13:58:53 +00:00
Guillaume Abrioux	03e76af7b4	switch: add missing call to ceph-handler role Add the missing call to the ceph-handler role; otherwise other roles can't reference variables registered by ceph-handler. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-03 13:58:53 +00:00
Guillaume Abrioux	54b02fe187	switch: support migration when cluster is scrubbing Similar to `c13a3c3`, we must allow scrubbing when running this playbook. In clusters with a large number of PGs, some of them can be expected to be scrubbing; it's a normal operation. Preventing scrubbing would force setting the noscrub flag. This commit allows switching from a non-containerized to a containerized environment even while PGs are scrubbing. Closes: #3182 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-10-03 13:58:53 +00:00
Andrew Schoen	9747f3dbd5	purge-cluster: zap devices used with the lvm scenario Fixes: https://github.com/ceph/ceph-ansible/issues/3156 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-09-28 14:49:56 +02:00
wumingqiao	5da71e1ca1	purge-cluster: recursively remove ceph-related files, symlinks and directories under /etc/systemd/system. fix: https://github.com/ceph/ceph-ansible/issues/3166 Signed-off-by: wumingqiao <wumingqiao@beyondcent.com>	2018-09-28 14:49:22 +02:00
Rishabh Dave	380168dadc	don't use "include" to include tasks Use "import_tasks" or "include_tasks" instead. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2018-09-27 17:53:40 +02:00
Guillaume Abrioux	144c92b21f	purge: actually remove /var/lib/ceph/* `38dc20e74b` introduced a bug in the purge playbooks because using `*` in the `command` module doesn't work (the module does not run through a shell, so the glob is not expanded). Files under `/var/lib/ceph/` were therefore not purged, leaving leftovers. When trying to redeploy a cluster, it failed because the monitor daemon detected an existing keyring and therefore assumed a cluster already existed. Typical error (from container output): ``` Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16 /entrypoint.sh: Existing mon, trying to rejoin cluster... Sep 26 13:18:16 mon0 docker[31316]: 2018-09-26 13:18:16.9323937f15b0d74700 -1 auth: unable to find a keyring on /etc/ceph/test.client.admin.keyring,/etc/ceph/test.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:(2) No such file or directory Sep 26 13:18:23 mon0 docker[31316]: 2018-09-26 13:18:23 /entrypoint.sh: SUCCESS ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1633563 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-27 17:45:21 +02:00
Guillaume Abrioux	179c4d00d7	rolling_update: ensure pgs_by_state has at least 1 entry Previous commit `c13a3c3` has removed a condition. This commit brings back this condition which is essential to ensure we won't hit a false positive result in the `when` condition for the check PGs task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-25 14:58:54 +00:00
Guillaume Abrioux	c13a3c3492	upgrade: consider all 'active+clean' states as valid pgs In clusters with a large number of PGs, some of them can be expected to be scrubbing; it's a normal operation. Preventing scrubbing would force setting the noscrub flag before a rolling update, which is a problem because it pauses an important data integrity operation until the end of the rolling upgrade. This commit allows an upgrade even while PGs are scrubbing. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616066 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-25 12:12:06 +00:00
Guillaume Abrioux	57f0b6a476	shrink-osd: follow up on `36fb3cde` - Adds a loop in bash to satisfy the 1:n relation between `osd_hosts` and the different device lists. - Fixes some container names which were using the host's hostname instead of the actual container one. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-18 07:27:41 +00:00
Sébastien Han	735e1917db	shrink-osd: purge dedicated devices Once the OSD is destroyed we also have to purge the associated devices; this means purging the journal, db, and wal partitions too. This now works for both container and non-container deployments. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572933 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-18 07:27:41 +00:00
Guillaume Abrioux	4159326a18	shrink-osd: fix purge osd on containerized deployment `ce1dd8d` introduced the purge osd on containers but it was incorrect. The `resolve parent device` and `zap ceph osd disks` tasks must be delegated to their respective OSD nodes. They were run on the Ansible node, so parent devices were being resolved on that node instead of on the OSD nodes. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1612095 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-09-13 18:14:01 +02:00
Sébastien Han	38dc20e74b	purge: only purge /var/lib/ceph content Sometimes /var/lib/ceph is mounted on a device, so we won't be able to remove it (device busy); let's remove its content only. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1615872 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-09-03 10:51:24 +02:00
Ali Maredia	561ec9203d	infrastructure-playbooks: add comments for lv_vars.yml Add comments telling users that devices used in playbooks must not have GPT/FS/RAID signatures Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-29 21:10:20 +00:00
Ali Maredia	77eb459a88	infrastructure playbooks: remove lv-create error msg remove error message when PV creation fails Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-29 21:10:20 +00:00
Ali Maredia	e1ff438800	infrastructure-playbooks: failure msg for pvcreate Add a message for when PV creation fails. This message alerts users that FS/GPT/RAID signatures could still be on the device and be the reason for the failure. `wipefs -a $device` needs to be run to fix this issue. Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-28 20:21:42 +00:00
Sébastien Han	2e6e885bb7	rolling_upgrade: set sortbitwise properly Running 'osd set sortbitwise' when we detect a version 12 of Ceph is wrong. When OSDs are getting updated, even though the package is updated they won't send their updated version (12) and will stick with 10 if the command is not applied. So we have to check if OSDs are sending a version 10 and then run the command to unlock the OSDs. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-21 12:22:32 +00:00
Sébastien Han	77a3a682f3	iscsi group name preserve backward compatibility Recently we renamed the group_name for iscsi to iscsigws; previously it was named iscsi-gws. Existing deployments with a host file section with iscsi-gws must continue to work. This commit adds the old group name for backward compatibility; no error from Ansible should be expected, and if the hostgroup is not found nothing is played. Close: https://bugzilla.redhat.com/show_bug.cgi?id=1619167 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-20 23:52:19 +02:00
Sébastien Han	b738706810	take-over-existing-cluster: do not call var_files We were using var_files long ago when default variables were not in ceph-defaults; now that the role exists this is not needed. Moreover, having these two var files added: - roles/ceph-defaults/defaults/main.yml - group_vars/all.yml will create collisions and override necessary variables. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1555305 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-20 14:47:04 +02:00
Andrew Schoen	04df3f0802	lv-create: use copy instead of the template module The copy module does in fact do variable interpolation so we do not need to use the template module or keep a template in the source. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	131796f275	lv-create: add an example logfile_path config option in lv_vars.yml Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	b0bfc17351	lv-teardown: fail silently if lv_vars.yml is not found This allows users to opt out of using lv_vars.yml and load configuration from other sources. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	8424858b40	lv-teardown: set become: true at the playbook level Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	e43eec57bb	lv-create: fail silently if lv_vars.yml is not found If a user decides not to use the lv_vars.yml file then it should fail silently so that configuration can be picked up from other places. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	fde47be13c	lv-create: set become: true at the playbook level Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Andrew Schoen	35301b35af	lv-create: use the template module to write log file The copy module will not expand the template and render the variables included, so we must use template. Creating a temp file and using it locally means that you must run the playbook with sudo privileges, which I don't think we want to require. This introduces a logfile_path variable that the user can use to control where the logfile is written to, defaulting to the cwd. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	909b38da82	infrastructure-playbooks/vars/lv_vars.yaml: minor fixes Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	f65f3ea89f	infrastructure-playbooks/lv-create.yml: use tempfile to create logfile Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	65fdad0723	infrastructure-playbooks/lv-create.yml: add lvm_volumes to suggested paste Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	50a6d8141c	infrastructure-playbooks/lv-create.yml: copy without using a template file Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	186c4e11c7	infrastructure-playbooks/lv-create.yml: don't use action to copy Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	9d43806df9	infrastructure-playbooks: standardize variable usage with a space after brackets Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Neha Ojha	e0293de3e7	vars/lv_vars.yaml: remove journal_device Signed-off-by: Neha Ojha <nojha@redhat.com>	2018-08-16 16:38:23 +02:00
Ali Maredia	1f018d8612	infrastructure-playbooks: playbooks for creating LVs for bucket indexes and journals These playbooks create and tear down logical volumes for OSD data on HDDs and for a bucket index and journals on 1 NVMe device. Users should follow the guidelines set in var/lv_vars.yaml After the lv-create.yml playbook is run, output is sent to /tmp/logfile.txt for copy and paste into osds.yml Signed-off-by: Ali Maredia <amaredia@redhat.com>	2018-08-16 16:38:23 +02:00
Sébastien Han	dad10e8f3f	rolling_update: register container osd units Before running the upgrade, let's call systemd to collect unit names instead of relying on the device list. This is more accurate and fixes the osd_auto_discovery scenario too. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-08-16 11:13:12 +02:00
Jeffrey Zhang	85cc61a6d9	Use /var/lib/ceph/osd folder to filter osd mount point In some cases, users may mount a partition on /var/lib/ceph; unmounting it would fail, and there is no need to do so anyway. Signed-off-by: Jeffrey Zhang <zhang.lei.fly@gmail.com>	2018-08-14 13:00:24 +00:00
Sébastien Han	b3266c5be2	rolling_update: set osd sortbitwise An upgrade from RHCS 2 to RHCS 3 will fail if the cluster still has sortnibblewise set; it stays stuck on "TASK [waiting for clean pgs...]" because RHCS 3 OSDs will not start if nibblewise is set. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-24 17:19:02 +02:00
Sébastien Han	ce1dd8d2b3	shrink-osd: purge osd on containerized deployment Prior to this commit we were only stopping the container, but now we also purge the devices. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572933 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-07-18 14:26:22 +00:00
Guillaume Abrioux	d0746e0858	common: switch from docker module to docker_container As of ansible 2.4, `docker` module has been removed (was deprecated since ansible 2.1). We must switch to `docker_container` instead. See: https://docs.ansible.com/ansible/latest/modules/docker_module.html#docker-module Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-07-10 20:08:07 +00:00
Vishal Kanaujia	44d514850a	Rolling upgrades: Migrate to ceph-key module This change moves ceph-mgr upgrades to using ceph-key library. Fixes: #2758 Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>	2018-07-03 18:22:14 +02:00
Sébastien Han	20c8065e48	ceph-iscsi: rename group iscsi_gws Let's try to avoid using dashes as testinfra needs to be able to read the groups. Typically, with iscsi-gws we can't add a marker for these iscsi nodes, using an underscore fixes the issue. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-06-08 10:21:54 +02:00
Guillaume Abrioux	232a16d77f	rolling_update: fix facts gathering delegation This is a follow-up on what has been done in #2560. See #2560 and #2553 for details. Closes: #2708 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-06-06 16:36:30 +08:00
Vishal Kanaujia	08d9432454	Rolling upgrades should use norebalance flag for OSDs The rolling upgrades playbook should have norebalance flag set for OSDs upgrades to wait only for recovery. Fixes: #2657 Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>	2018-06-04 10:59:01 +02:00
Sébastien Han	e91648a7af	rolling_update: add role ceph-iscsi-gw Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575829 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-26 02:38:47 -07:00
Paul Cuzner	2890b57cfc	Add privilege escalation to iscsi purge tasks Without the escalation, invocation from non-root users will fail when accessing the rados config object, or when attempting to log to /var/log Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1549004 Signed-off-by: Paul Cuzner <pcuzner@redhat.com>	2018-05-25 03:50:24 -07:00
Sébastien Han	da5b104098	rolling_update: fix get fsid for containers When running ansible2.4-update_docker_cluster there is an issue on the "get current fsid" task. The current task only works for non-containerized deployments but runs all the time (even for containerized ones). This currently results in the following error: TASK [get current fsid] ****************************************************** task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-luminous-ansible2.4-update_docker_cluster/rolling_update.yml:214 Tuesday 22 May 2018 22:48:32 +0000 (0:00:02.615) 0:11:01.035 ********* fatal: [mgr0 -> mon0]: FAILED! => { "changed": true, "cmd": [ "ceph", "--cluster", "test", "fsid" ], "delta": "0:05:00.260674", "end": "2018-05-22 22:53:34.555743", "rc": 1, "start": "2018-05-22 22:48:34.295069" } STDERR: 2018-05-22 22:48:34.495651 7f89482c6700 0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000 2018-05-22 22:48:34.495684 7f89482c6700 0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).fault This is not really representative of the real error since the 'ceph' cli is available on that machine. On other environments we will have something like "command not found: ceph". Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-23 04:44:12 +02:00
Guillaume Abrioux	9801bde4d4	purge_cluster: fix dmcrypt purge dmcrypt devices aren't closed properly; therefore, redeploying after a purge may fail. Typical errors: ``` ceph-disk: Cannot discover filesystem type: device /dev/sdb1: Command '/sbin/blkid' returned non-zero exit status 2 ``` ``` ceph-disk: Error: unable to read dm-crypt key: /var/lib/ceph/osd-lockbox/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf: /etc/ceph/dmcrypt-keys/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf.luks.key ``` Properly closing dmcrypt devices allows redeploying without error. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-21 08:23:10 +02:00
Guillaume Abrioux	415dc0a29b	take-over: fix bug when trying to override variable A customer has been facing an issue when trying to override `monitor_interface` in the inventory host file. In their use case, all nodes had the same `monitor_interface` name except one. Therefore, they tried to override this variable for that node in the inventory host file, but the take-over-existing-cluster playbook was failing when trying to generate the new ceph.conf file because of an undefined variable. Typical error: ``` fatal: [srvcto103cnodep01]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute u'ansible_bond0.15'"} ``` Including variables like this, `include_vars: group_vars/all.yml`, prevents us from overriding anything in the inventory host file because it overwrites everything you would have defined in the inventory. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575915 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-18 10:10:08 +02:00
Sébastien Han	49a4712485	switch: disable ceph-disk units During the transition from jewel non-container to container old ceph units are disabled. ceph-disk can still remain in some cases and will appear as 'loaded failed', this is not a problem although operators might not like to see these units failing. That's why we remove them if we find them. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1577846 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-17 08:48:28 +02:00
Guillaume Abrioux	a9247c4de7	purge_cluster: wipe all partitions In order to ensure there is no leftover after having purged a cluster, we must wipe all partitions properly. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-17 08:37:17 +02:00
Guillaume Abrioux	9cad113e2f	purge_cluster: fix bug when building device list There are leftovers on devices when purging OSDs because of an invalid device list construction. Typical error: ``` changed: [osd3] => (item=/dev/sda sda1) => { "changed": true, "cmd": "# if the disk passed is a raw device AND the boot system disk\n if parted -s \"/dev/sda sda1\" print | grep -sq boot; then\n echo \"Looks like /dev/sda sda1 has a boot partition,\"\n echo \"if you want to delete specific partitions point to the partition instead of the raw device\"\n echo \"Do not use your system disk!\"\n exit 1\n fi\n echo sgdisk -Z \"/dev/sda sda1\"\n echo dd if=/dev/zero of=\"/dev/sda sda1\" bs=1M count=200\n echo udevadm settle --timeout=600", "delta": "0:00:00.015188", "end": "2018-05-16 12:41:40.408597", "item": "/dev/sda sda1", "rc": 0, "start": "2018-05-16 12:41:40.393409" } STDOUT: sgdisk -Z /dev/sda sda1 dd if=/dev/zero of=/dev/sda sda1 bs=1M count=200 udevadm settle --timeout=600 STDERR: Error: Could not stat device /dev/sda sda1 - No such file or directory. ``` The devices list in the task `resolve parent device` isn't built properly because the command used to resolve the parent device doesn't return the expected output, e.g.: ``` changed: [osd3] => (item=/dev/sda1) => { "changed": true, "cmd": "echo /dev/$(lsblk -no pkname \"/dev/sda1\")", "delta": "0:00:00.013634", "end": "2018-05-16 12:41:09.068166", "item": "/dev/sda1", "rc": 0, "start": "2018-05-16 12:41:09.054532" } STDOUT: /dev/sda sda1 ``` For instance, it will result in a devices list like: `['/dev/sda sda1', '/dev/sdb', '/dev/sdc sdc1']` where we expect to have: `['/dev/sda', '/dev/sdb', '/dev/sdc']` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-17 08:37:17 +02:00
Sébastien Han	d80a871a07	rolling_update: move osd flag section During a minor update from a jewel to a higher jewel version (10.2.9 to 10.2.10 for example) osd flags don't get applied because they were set in the mgr section, which is skipped in jewel since this daemon does not exist. Moving the set flag section after all the mons have been updated solves that problem. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1548071 Co-authored-by: Tomas Petr <tpetr@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-17 08:17:16 +02:00
Guillaume Abrioux	1b4c3f292d	rolling_update: fix dest path for mgr keys fetching The role `ceph-mgr` that is played later in the playbook fails because the destination path for the fetched keys is wrong. This patch fixes the destination path used in the task `fetch ceph mgr key(s)` so there is no mismatch. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-15 19:30:34 +02:00
Guillaume Abrioux	3b89f1bfb1	rolling_update: get fsid in mgr pre_task {{ fsid }} points to {{ cluster_uuid.stdout }} which is not defined in this part of the rolling_update playbook. Since we need to call {{ fsid }} we must get the fsid and register it to `cluster_uuid`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-05-15 09:01:42 +02:00
Sébastien Han	52fc8a0385	rolling_update: move mgr key creation Until all the mons have been updated to Luminous, there is no way to create a key. So we should do the key creation in the mon role only if we are not part of an update. If we are, then the key creation is done after the mons upgrade to Luminous. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-05-15 09:01:42 +02:00
Guillaume Abrioux	adeecc51f8	switch: fix ceph_uid fact for osd In addition to b324c17, this commit fixes the ceph uid for the osd role in the switch from non-containerized to containerized playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-30 08:15:18 +02:00
Sébastien Han	5fa92804f9	switch: resolve device path so we can umount the osd data dir If we don't do this, umounting devices declared like /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001 will fail like: umount: /dev/disk/by-id/ata-QEMU_HARDDISK_QM000011: mountpoint not found Since we append '1' (partition 1), this won't work. So we need to resolve the link to get something like /dev/sdb and then append 1 to get /dev/sdb1 Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-30 08:15:18 +02:00
Sébastien Han	767abb5de0	switch: fix ceph_uid fact Latest is now centos not ubuntu anymore so the condition was wrong. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-30 08:15:18 +02:00
Sébastien Han	85732d11b9	mon/client: remove acl code Applying ACL on the keyrings is not used anymore so let's remove this code. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-23 18:34:58 +02:00
Sébastien Han	66c1ea8cd5	shrink-osd: ability to shrink NVMe drives Now if the service name contains nvme we know we need to remove the last 2 characters instead of 1. If nvme, then osd_to_kill_disks is nvme0n1 and we need nvme0. If ssd or hdd, then osd_to_kill_disks is sda1 and we need sda. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1561456 Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-20 15:08:29 +02:00
Sébastien Han	641f141c0f	selinux: remove chcon calls We now bind mount with the :z option at the end of the -v command, so this will basically run the exact same command as we used to run, namely: chcon -Rt svirt_sandbox_file_t /var/lib/ceph Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-19 14:59:37 +02:00
Sébastien Han	473939d215	infra: add playbook example for ceph_key module Helper playbook to manage CephX keys. Signed-off-by: Sébastien Han <seb@redhat.com>	2018-04-11 12:18:34 +02:00
Andrew Schoen	08f4875533	ceph_volume: refactor to not run ceph osd destroy This changes state to action and gives the options 'create' or 'zap'. The zap parameter is also removed. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-04-10 14:19:21 +02:00
Andrew Schoen	c6e8f8fb11	purge-cluster: no need to use objectstore for ceph_volume module When zapping objectstore is not required. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-04-10 14:19:21 +02:00
Andrew Schoen	c29a75ac7f	purge-cluster: use ceph_volume module to zap and destroy OSDs Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-04-10 14:19:21 +02:00
Randy J. Martinez	d1f2d64b15	purge-docker: added conditionals needed to successfully re-run purge Added 'ignore_errors: true' to multiple tasks which run docker commands, since they run even in cases where docker is no longer installed. Because of this, certain tasks in purge-docker-cluster.yml would cause the playbook to fail if re-run and stop the purge. This leaves behind a dirty environment, and a playbook which can no longer be run. Fix Regex line 275: Sometimes 'list-units' will output 4 spaces between loaded+active. The update will account for both scenarios. purge fetch_directory: in other roles fetch_directory is hard linked, e.g.: "{{ fetch_directory }}"/"{{ somedir }}". That being said, fetch_directory will never have a trailing slash in all.yml, so this task was never being run (causing failures when trying to re-deploy). Signed-off-by: Randy J. Martinez <ramartin@redhat.com>	2018-04-10 13:39:14 +02:00
Guillaume Abrioux	e32a177af8	purge-docker: remove redundant task The `remove_packages` prompt is redundant to the `ireallymeanit` prompt since it does exactly the same thing. I guess the only goal of this task was to make a break to warn user about `--skip-tags=with_pkg` feature. This warning should be part of the first prompt. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-04-03 11:54:42 +02:00
Andy McCrae	60d4b75f51	Cleanup plugins directories and references Having callback_plugins, and action plugins in random locations causes a lot of disparity. We should centralize this into one place in the plugins directory and fix up the ansible.cfg to reflect this. Additionally, since the ansible.cfg already reflects action_plugins, we don't need a link to action_plugins in the base of the repository.	2018-03-14 11:15:39 +01:00
jtudelag	691f7c5146	Adds handy ceph aliases for containerized installations. Same approach as openshift-ansible etcdctl: * https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/tasks/auxiliary/drop_etcdctl.yml * https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/etcdctl.sh	2018-03-08 13:56:39 +01:00
Guillaume Abrioux	c04e67347c	update: look for short and fqdn in ceph_health_raw Depending on the hostname configuration, the task waiting for mons to be in quorum might fail. The idea here is to look for both the shortname and the fqdn in `ceph_health_raw` instead of just `ansible_hostname` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1546127 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-19 10:27:47 +01:00
Andrew Schoen	699c777e68	rolling update: fix undefined jewel_minor_update failure Variables set at the play level with ``vars`` do not carry over into the next play in the playbook. The var jewel_minor_update was set in a previous play but used in this one and was failing because it was not defined. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1544029 Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-02-13 17:03:05 +01:00
Andrew Schoen	7c7017ebe6	infra: do not include host_vars/* in take-over-existing-cluster.yml These are better collected by ansible automatically. This would also fail if the host_var file didn't exist. Signed-off-by: Andrew Schoen <aschoen@redhat.com>	2018-02-12 11:48:47 +01:00
Guillaume Abrioux	3b2f6c34e4	purge-docker: fix ceph-osd-zap name container The `zap ceph osd disks` task should iterate over `resolved_parent_device` instead of `combined_devices_list`; the former contains only the base device name (vs. the full path name in `combined_devices_list`). This fixes the issue where docker complains about the container name because of illegal characters such as `/` : ``` "/usr/bin/docker-current: Error response from daemon: Invalid container name (ceph-osd-zap-magna074-/dev/sdb1), only [a-zA-Z0-9][a-zA-Z0-9_.-] are allowed.","See '/usr/bin/docker-current run --help'." "" ``` Having the basename of the device path is enough for the container name. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-02-02 22:09:11 +01:00
Guillaume Abrioux	dd0c98c5a2	common: do not use `shell` module when it is not needed There is no need here to use `shell` instead of `command` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2018-01-31 10:45:34 +01:00
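Commit c13a3c3492, together with its follow-up 179c4d00d7, treats every PG state beginning with `active+clean` (for example `active+clean+scrubbing`) as healthy during an upgrade, and requires `pgs_by_state` to have at least one entry. The Python sketch below illustrates that check under stated assumptions: it assumes the `pgmap.pgs_by_state` and `pgmap.num_pgs` fields of `ceph status --format json`, and the function name `all_pgs_active_clean` is hypothetical. It is not the playbook's actual Jinja condition.

```python
import json


def all_pgs_active_clean(status_json: str) -> bool:
    """Return True when every PG reports a state starting with 'active+clean'.

    Assumes the `ceph status --format json` layout: pgmap.pgs_by_state is a
    list of {"state_name": ..., "count": ...} and pgmap.num_pgs is the total.
    """
    pgmap = json.loads(status_json)["pgmap"]
    states = pgmap.get("pgs_by_state", [])
    # With an empty list the sum below is 0, which could match a reported
    # total of 0 and give a false positive, so require at least one entry.
    if not states:
        return False
    clean = sum(entry["count"] for entry in states
                if entry["state_name"].startswith("active+clean"))
    return clean == pgmap["num_pgs"]


sample = json.dumps({"pgmap": {"num_pgs": 96, "pgs_by_state": [
    {"state_name": "active+clean", "count": 90},
    {"state_name": "active+clean+scrubbing", "count": 6},
]}})
print(all_pgs_active_clean(sample))  # True
```

A prefix match is the simplest way to accept scrubbing PGs, but it also accepts any other state that happens to start with `active+clean`, so a stricter variant could whitelist the exact state names instead.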
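Commit 9cad113e2f shows `echo /dev/$(lsblk -no pkname "/dev/sda1")` producing `/dev/sda sda1`. One plausible reading is that lsblk printed more than one name (one line per device in the tree below the partition) and the shell joined them with a space. The hypothetical Python helper below keeps only the first name. It illustrates the expected input and output and is not necessarily the fix the commit applied.

```python
def parent_device(lsblk_pkname_output: str) -> str:
    """Turn the output of `lsblk -no pkname <partition>` into a /dev path.

    Assumption: the first name printed is the parent of the queried
    partition; any further names (such as a trailing "sda1") are dropped.
    """
    names = lsblk_pkname_output.split()
    if not names:
        raise ValueError("lsblk did not report a parent device")
    return "/dev/" + names[0]


# Raw outputs shaped like the ones described in the commit message.
outputs = ["sda\nsda1\n", "sdb\n", "sdc sdc1"]
print([parent_device(out) for out in outputs])
# ['/dev/sda', '/dev/sdb', '/dev/sdc']
```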
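Commit 5fa92804f9 resolves a `/dev/disk/by-id/ata-QEMU_HARDDISK_QM00001` style symlink before appending the partition number. The same idea can be sketched in Python with `os.path.realpath`. The helper name `data_partition` is hypothetical, and plain concatenation assumes sdX-style naming; NVMe partitions would need a `p` separator (for example `nvme0n1p1`).

```python
import os
import tempfile


def data_partition(device: str, partition: int = 1) -> str:
    """Resolve symlinks in `device`, then append the partition number.

    e.g. /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001 -> /dev/sdb -> /dev/sdb1
    """
    return os.path.realpath(device) + str(partition)


if __name__ == "__main__":
    # Demonstrate with a temporary symlink instead of a real block device.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "sdb")
        open(target, "w").close()
        link = os.path.join(tmp, "ata-QEMU_HARDDISK_QM00001")
        os.symlink(target, link)
        print(data_partition(link))  # prints a path ending in /sdb1
```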
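Commit 3b2f6c34e4 names the zap container after the device basename because Docker rejects `/` in container names. A small Python sketch of that naming rule follows. The regular expression is transcribed from the Docker error quoted in the commit, and the helper name `zap_container_name` is hypothetical; this is not the playbook's own code.

```python
import os
import re

# Character rule quoted in the Docker error from the commit message:
# "only [a-zA-Z0-9][a-zA-Z0-9_.-] are allowed".
VALID_CONTAINER_NAME = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]*")


def zap_container_name(hostname: str, device: str) -> str:
    """Build a zap container name from the device basename, not its full path."""
    name = "ceph-osd-zap-{}-{}".format(hostname, os.path.basename(device))
    if VALID_CONTAINER_NAME.fullmatch(name) is None:
        raise ValueError("invalid container name: " + name)
    return name


print(zap_container_name("magna074", "/dev/sdb1"))  # ceph-osd-zap-magna074-sdb1
# The full device path keeps its "/" characters and is rejected:
print(VALID_CONTAINER_NAME.fullmatch("ceph-osd-zap-magna074-/dev/sdb1"))  # None
```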


692 Commits (e31363ea9b7b39d0ea34a26f693ed58874caa42d)