ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	0a0cdc0963	purge: ensure no ceph kernel thread is present This tries to first unmount any cephfs/nfs-ganesha mount point on client nodes, then unmap any mapped rbd devices and finally it tries to remove ceph kernel modules. If it fails it means some resources are still busy and should be cleaned manually before continuing to purge the cluster. This is done early in the playbook so the cluster stays untouched until everything is ready for that operation, otherwise if you try to redeploy a cluster it could end up confused by leftovers from the previous deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1337915 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `20e4852888`)	2019-06-24 13:20:50 +02:00
Guillaume Abrioux	77d24203fa	upgrade: accept HEALTH_OK and HEALTH_WARN as valid state `3a100cfa52` introduced a check which is a bit too restrictive, let's accept HEALTH_OK and HEALTH_WARN. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6dce51183b`)	2019-06-21 15:47:33 +00:00
Dimitri Savineau	aa197f77fc	remove ceph restapi references The ceph restapi configuration was only available until Luminous release so we don't need those leftovers for nautilus+. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `da8b7ab7fb`)	2019-06-20 15:15:10 -04:00
Guillaume Abrioux	b93064c7c8	rolling_update: fail early if cluster state is not OK starting an upgrade if the cluster isn't HEALTH_OK isn't a good idea. Let's check for the cluster status before trying to upgrade. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3a100cfa52`)	2019-06-19 08:41:25 +00:00
Guillaume Abrioux	53dd58e84c	rolling_update: only mask and stop unit in mgr part Otherwise it fails as follows: ``` fatal: [mon0]: FAILED! => changed=false msg: |- Unable to enable service ceph-mgr@mon0: Failed to execute operation: Cannot send after transport endpoint shutdown ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `51b2813e04`)	2019-06-19 08:41:25 +00:00
Dimitri Savineau	6e565b251d	remove ceph-agent role and references The ceph-agent role was used only for RHCS 2 (jewel) so it's not useful anymore. The current code will fail on CentOS distribution because the rhscon package is only available on Red Hat with the RHCS 2 repository and this ceph release is supported on stable-3.0 branch. Resolves: #4020 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7503098ca0`)	2019-06-17 15:56:00 -04:00
L3D	1daca1ba83	ansible: use 'bool' filter on boolean conditionals Running ceph-ansible produces a lot of ``[DEPRECATION WARNING]`` messages like these: ``` [DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg. ``` Now appended ``| bool`` on a lot of the affected variables. Sometimes the coding style changed from ``variable|bool`` to ``variable | bool`` (with spaces at the pipe). Closes: #4022 Signed-off-by: L3D <l3d@c3woc.de> (cherry picked from commit `ab54fe20ec`)	2019-06-07 16:05:51 +02:00
Dimitri Savineau	7a384e7ec2	purge-cluster: clean all ceph repo files We currently only purge rh_storage yum repository file but depending on the ceph_repository value we are using, the ceph repository file could have a different name. Resolves: #4056 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `44c63903ca`)	2019-06-07 12:05:40 +00:00
guihecheng	a6312ba9bc	Add section for purging rgw loadbalancer in purge-cluster.yml Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com> (cherry picked from commit `59e702ec39`)	2019-06-06 19:44:30 +00:00
Guillaume Abrioux	16c6d530c6	roles: introduce `ceph-container-engine` role This commit splits the current `ceph-container-common` role. This introduces a new role `ceph-container-engine` which handles the tasks specific to the installation of containers tools (docker/podman). This is needed for the ceph-dashboard implementation for 2 main reasons: 1/ Since the ceph-dashboard stack is only containerized, we must install everything needed to run containers even in non containerized deployments. Splitting this role allows us to not have to call the full `ceph-container-common` role which would run a bunch of unneeded tasks that would have been skipped anyway. 2/ The current implementation would have required to run `ceph-container-common` on all ceph-clients nodes which would have been conflicting with `9d3517c670` (we don't want to run ceph-container-common on all client nodes, see mentioned commit for more details) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `55420d6253`)	2019-05-22 15:24:11 -04:00
Guillaume Abrioux	d83db2c8ed	switch to ansible 2.8 - remove private attribute with import_role. - update documentation. - update rpm spec requirement. - fix MagicMock python import in unit tests. Closes: #3765 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `72d8315299`)	2019-05-21 09:17:46 +02:00
Dimitri Savineau	023cdffd95	purge-docker-cluster: don't remove data on atomic Because we don't manage the docker service on atomic (yet) via the ceph-container-common role then we can't stop docker and remove the data. For now let's do that only for non atomic hosts. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `638604929b`)	2019-05-17 10:44:52 -04:00
Guillaume Abrioux	e29fd842a6	rename docker_exec_cmd variable This commit renames the `docker_exec_cmd` variable to `container_exec_cmd` so it's more generic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e74d80e72f`)	2019-05-17 16:05:58 +02:00
Zack Cerza	0496ce8e5c	purge-docker-cluster.yml: Default lvm_volumes We were failing when that variable is unset; purge-cluster.yml contains this workaround. Signed-off-by: Zack Cerza <zack@redhat.com> (cherry picked from commit `9b4339a2ba`)	2019-05-17 16:05:58 +02:00
Boris Ranto	5ac7559736	Merge cephmetrics/dashboard-ansible repo This commit will merge dashboard-ansible installation scripts with ceph-ansible. This includes several new roles to setup ceph-dashboard and the underlying technologies like prometheus and grafana server. Signed-off-by: Boris Ranto & Zack Cerza <team-gmeno@redhat.com> Co-authored-by: Zack Cerza <zcerza@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2f141a6e80`)	2019-05-17 16:05:58 +02:00
wumingqiao	30b1ca9aeb	shrink_osd: mark all osd(s) out in one command Signed-off-by: wumingqiao <wumingqiao@beyondcent.com> (cherry picked from commit `5320aa11c4`)	2019-05-15 21:44:30 -04:00
Dimitri Savineau	1e23d853f9	purge-docker-cluster: remove docker data We never clean the content of /var/lib/docker so we can still have some data present in this directory after running the purge playbook. Pip isn't used anymore. Also update the docker package name (especially the python binding one). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `168d7cd016`)	2019-05-14 11:00:30 +02:00
Dimitri Savineau	6814fd5ce5	gather-ceph-logs: fix logs list generation The shell module doesn't have a stdout_lines attribute. Instead of using the shell module, we can use the find module. Also adding `become: false` to the local tmp directory creation otherwise we won't have sufficient rights to fetch the files into this directory. Resolves: #3966 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ea1f8f551c`)	2019-05-13 10:33:26 -04:00
Mike Christie	78a55a3df3	igw: Fix rolling update service ordering We must stop tcmu-runner after the other rbd-target-* services because they may need to interact with tcmu-runner during shutdown. There is also a bug in some kernels where IO can get stuck in the kernel and by stopping rbd-target-* first we can make sure all IO is flushed. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1659611 Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `d7ef12910e`)	2019-05-10 15:53:44 +02:00
Rishabh Dave	b6d5352783	remove infrastructure-playbooks/rgw-standalone.yml We don't need infrastructure-playbooks/rgw-standalone.yml since site.yml.sample and site-container.yml.sample can add a new RGW node to an already deployed Ceph cluster. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `6e8fb2b3ea`)	2019-05-07 13:11:48 +02:00
letterwuyu	27a8179cd8	Fix comment content Signed-off-by: lishuhao letterwuyu@gmail.com (cherry picked from commit `d57f6fcdc6`)	2019-05-07 11:11:22 +02:00
Rishabh Dave	06b3ab2a6b	improve coding style Keywords requiring only one item shouldn't express it by creating a list with single item. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `739a662c80`) Conflicts: roles/ceph-mon/tasks/ceph_keys.yml roles/ceph-validate/tasks/check_devices.yml	2019-05-06 15:09:06 +00:00
Dimitri Savineau	92340d049c	rolling_update: restart all ceph-iscsi services Currently only rbd-target-gw service is restarted during an update. We also need to restart tcmu-runner and rbd-target-api services during the ceph iscsi upgrade. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f1048627ea`)	2019-04-30 12:09:52 -04:00
Andrew Schoen	f1e04835f4	rolling_update: ceph commands should use --cluster Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `e2529dcd7f`)	2019-04-18 19:12:13 +02:00
Andrew Schoen	545d93aae8	rolling_update: set num_osds to the number of running osds We do this so that the ceph-config role can most accurately report the number of osds for the generation of the ceph.conf file. We don't want to use ceph-volume to determine the number of osds because in an upgrade to nautilus ceph-volume won't be able to accurately count osds created by ceph-disk. Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `67453853ff`)	2019-04-18 19:12:13 +02:00
Andrew Schoen	c28388bb06	rolling_update: migrate ceph-disk osds to ceph-volume When upgrading to nautilus run ``ceph-volume simple scan`` and ``ceph-volume simple activate --all`` to migrate any running ceph-disk osds to ceph-volume. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460 Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `28c47e4d1b`)	2019-04-18 19:12:13 +02:00
Guillaume Abrioux	35afd6a63a	update: ensure tasks are executed on an upgraded mon These tasks must be run from a monitor which is upgraded otherwise it might fail. See: https://tracker.ceph.com/issues/39355 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `7eb42c9e8e`)	2019-04-18 19:10:10 +02:00
Guillaume Abrioux	495711f296	update: ensure ceph command returns 0 these commands could return something other than 0. Let's ensure all retries have been done before actually failing. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ed84325b1d`)	2019-04-18 19:10:10 +02:00
Guillaume Abrioux	4a678ac102	update: set osd flags before upgrading any mon Typical error: ``` failed: [mon0 -> mon2] (item=noout) => changed=true cmd: - ceph - --cluster - ceph - osd - set - noout delta: '0:00:00.293756' end: '2019-04-17 06:31:57.552386' item: noout msg: non-zero return code rc: 1 start: '2019-04-17 06:31:57.258630' stderr: |- Traceback (most recent call last): File "/bin/ceph", line 1222, in <module> retval = main() File "/bin/ceph", line 1146, in main sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli') File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs cmd['sig'] = parse_funcsig(cmd['sig']) File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig raise JsonFormat(s) ceph_argparse.JsonFormat: unknown type CephBool stderr_lines: - 'Traceback (most recent call last):' - ' File "/bin/ceph", line 1222, in <module>' - ' retval = main()' - ' File "/bin/ceph", line 1146, in main' - ' sigdict = parse_json_funcsigs(outbuf.decode(''utf-8''), ''cli'')' - ' File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs' - ' cmd[''sig''] = parse_funcsig(cmd[''sig''])' - ' File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig' - ' raise JsonFormat(s)' - 'ceph_argparse.JsonFormat: unknown type CephBool' stdout: '' stdout_lines: <omitted> ``` Having mixed versions of monitors seems to cause this error. Moving these tasks before any monitor gets upgraded seems to be enough to get around this issue. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `543d1e2e41`)	2019-04-18 19:10:10 +02:00
Rishabh Dave	72309b49fe	allow adding a monitor to a deployed cluster Add a playbook that deploys a new monitor on a new node, adds that node to the Ceph cluster and the monitor to the quorum and updates the ceph configuration file on OSD nodes. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `d5967af7fb`)	2019-04-16 11:14:21 +02:00
Dimitri Savineau	1c3fbe5a60	purge-cluster: remove python-ceph-argparse package When using purge-cluster playbook with nautilus, there's still the python-ceph-argparse package installed on the host, which prevents reinstalling a ceph cluster with a different version (like luminous or mimic) Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `eb658b3af6`)	2019-04-15 17:32:22 +02:00
Dimitri Savineau	f90c051589	switch-from-non-containerized: stop all osds `e6bfb84` introduced a regression in the switch from non containerized to container deployment. We need to stop all previous OSDs services. We just don't need the ceph-disk pattern in the regex. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `150acba8c5`)	2019-04-12 00:45:21 +00:00
Guillaume Abrioux	f8c544c4a8	purge: remove references to ceph-disk as of stable-4.0, ceph-disk is no longer supported. These tasks aren't needed anymore. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a1254d767c`)	2019-04-12 00:45:21 +00:00
Guillaume Abrioux	f1ede335e4	shrink-osd: remove legacy playbook as of stable-4.0, ceph-disk is no longer supported. Let's remove this legacy version of the playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `73aa788459`)	2019-04-12 00:45:21 +00:00
Guillaume Abrioux	f5478dcc0b	switch_to_containers: remove ceph-disk references as of stable-4.0, ceph-disk is no longer supported. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e6bfb843f4`)	2019-04-12 00:45:21 +00:00
Guillaume Abrioux	4a663e1fc0	osd: remove variable osd_scenario As of stable-4.0, the only valid scenario is `lvm`. Thus, this makes this variable useless. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4d35e9eeed`)	2019-04-12 00:45:21 +00:00
Guillaume Abrioux	2581c4d511	update: fix undefined error when no mgr group is declared if mgr group isn't defined in inventory, that task will fail with undefined error. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c1e4529b0e`)	2019-04-11 09:20:22 -04:00
Dimitri Savineau	532d749b2e	rolling_update: Remove ceph aliases ceph aliases have been introduced in stable-3.2 during the ceph deployment. On master this has been removed but we don't handle this removal in the upgrade from stable-3.2 to master via the rolling_update playbook. Also remove the task from purge-docker-cluster missing from `d9e7835` Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `57b4e76d11`)	2019-04-10 00:02:35 +00:00
Guillaume Abrioux	b723ef3fa2	purge: fix lvm-batch purge osd `lvm_volumes` and/or `devices` variable(s) can be undefined depending on the scenario chosen. These tasks should be run only if these variables are defined, otherwise it ends up with undefined variable errors. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0180738313`)	2019-04-04 03:38:52 +02:00
Guillaume Abrioux	f55e2b08be	remove all NBSPs on master branch Similar to #3658 Since there are too many changes between master and stable branches let's commit directly in each branch instead of trying to backport this commit. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-28 11:57:55 +00:00
Dimitri Savineau	c8442f3705	rolling_update: Update systemd unit regex for nvme The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2019-03-26 12:01:00 +00:00
Guillaume Abrioux	78aac3e96a	update: followup on `edfdc49` all rgw instances should be stopped according to the multiple rgw instances support added in rolling_update.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	f6e0185146	update: add containerized deployment upgrade support (L->N) Add a couple of fixes to allow containerized deployments upgrade support to upgrade from luminous/mimic to nautilus. - pass CEPH_CONTAINER_IMAGE and CEPH_CONTAINER_BINARY environment variable to the ceph_key module, - fix the docker exec command in 'waiting for the containerized monitor to join the quorum' task according to the `delegate_to` parameter, - override `docker_exec_cmd` in `ceph-facts` with `mon_host` when rolling_update is `True`, - do not run unnecessarily `create_mds_filesystems.yml` when performing an upgrade. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	1816b876ee	update: add missing hosts in facts gathering iscsigws were missing. The 'complete upgrade' couldn't complete because rolling_update was set to False for iscsigw nodes. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	45ba90c169	update: remove rbdmirror legacy task This task is no longer needed for next release. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	0ea0adf039	update: show all daemons version at the end Let's display all daemons version at the end of the playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	f31d6d9485	update: enable new nautilus-only functionality once the cluster is upgraded to nautilus, we can complete the process by disallowing pre-nautilus OSDs and enabling all new nautilus-only functionality Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	afdaa70a63	update: enable msgr2 protocol This commit enables the msgr2 protocol when the cluster is fully upgraded to nautilus Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	ef096dd021	update: ensure mgrs are upgraded after ALL monitors As of `1c760904b0`, ceph-ansible implicitly bootstraps managers on monitors. mgrs must be upgraded only after all monitors, therefore, this commit refactors the way mgrs are upgraded to be sure we don't upgrade a mgr during the monitors upgrade. This commit also ensures we handle the case where we split managers on dedicated nodes. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00
Guillaume Abrioux	7fa2434f0f	update: ensure /var/lib/ceph/bootstrap-rbd-mirror is present This directory is created by ceph-config node by node. In the upgrade context we need it to be created on ALL monitors as soon as the first iteration because of the task right after which creates and sends the keyrings on all monitors. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2019-03-25 16:02:56 -04:00


450 Commits (df0d146166e3bfa25e1b3de2e6fbd6c2472aee95)