ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Kevin Jones	3a8de9cc36	Set proper ownership command performance improvement By changing the set ownership command from using the file module in combination with a with_items loop to a raw chown command, we can achieve a 98% performance increase here. On a ceph cluster with a significant amount of directories and files in /var/lib/ceph, the file module has to run checks on ownership of all those directories and files to determine whether a change is needed. In this case, we just want to explicitly set the ownership of all these directories and files to the ceph_uid Added context note to all set proper ownership tasks Signed-off-by: Kevin Jones <kevinjones@redhat.com> (cherry picked from commit `47bf47c9d8`)	2019-08-22 12:59:58 +02:00
Guillaume Abrioux	236020fb2b	shrink-mon: refact 'verify the monitor is out of the cluster' task use `from_json` filter instead of a `\| python` so we can get rid of the `shell` module usage here. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5573f17e76`)	2019-08-19 18:47:14 +00:00
Rishabh Dave	b28ed96378	use pre_tasks and post_tasks in shrink-mon.yml too This commit should've been part of commit `2fb12ae554`. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `2034387f57`)	2019-08-19 18:47:14 +00:00
Guillaume Abrioux	2f77704591	common: use discovered_interpreter_python fact in order to use the right binary name when using python cli in command or shell module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `13815ad3ca`)	2019-08-19 18:47:14 +00:00
Dimitri Savineau	f9d9ffac8f	dashboard: run dashboard role on mgr/mon nodes We don't need to execute the ceph-dashboard role on the nodes present in the grafana-server group. This one is dedicated to the grafana and prometheus stack. The ceph-dashboard needs to be executed where the ceph-mgr is running. It is either on the dedicated mgr nodes or, if mgr and mon are collocated, implicitly on the mon nodes. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `16939eff9e`)	2019-08-08 13:47:09 +02:00
Rishabh Dave	72a062b6fa	add a playbook that removes rgw from a given node Add a playbook named shrink-rgw.yml to infrastructure-playbooks/ that can remove an RGW from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `632a44bdf2`)	2019-07-31 15:25:15 -04:00
Rishabh Dave	8ca88b41cc	infra-playbooks: rewrite a condition for better readability Use the facility built into Ansible to check whether a command was executed successfully rather than looking at its return value. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `5aecdd3ba6`)	2019-07-29 15:52:29 +02:00
Guillaume Abrioux	d0ad1cf0f1	dashboard: use dedicated group only There's no need to add complexity and trying to fallback on other group. Let's deploy dashboard on all nodes present in grafana-server group. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d67230b2a2`)	2019-07-29 15:46:58 +02:00
Dimitri Savineau	dd87db70ca	dashboard: move code into a dedicated playbook Move dashboard, grafana/prometheus and node-exporter plays into a dedicated playbook in infrastructure-playbook directory. To avoid using 'dashboard_enabled \| bool' condition multiple time in the main playbook we can just import the dashboard playbook or not. This patch also allows to use an unique dashboard playbook for both baremetal and container playbooks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `43135840b1`)	2019-07-29 15:46:58 +02:00
Dimitri Savineau	43d625b59a	Remove NBSP characters Some NBSP are still present in the yaml files. Adding a test in travis CI. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `07c6695d16`)	2019-07-26 16:23:41 -04:00
Guillaume Abrioux	bee8a31afe	shrink-rbdmirror: check if rbdmirror is well removed from cluster This commit adds a check to ensure the daemon has been removed from the cluster. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `916dc1f52f`)	2019-07-16 15:02:49 +02:00
Rishabh Dave	0a15d1d112	add a playbook that removes rbd-mirror from a node Add a playbook named "shrink-rbdmirror.yml" in infrastructure-playbooks/ that removes a RBD Mirror from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `c4824acb19`)	2019-07-16 15:02:49 +02:00
Rishabh Dave	6197d1c8d9	add a playbook that removes manager from a node Add a playbook, named "shrink-mgr.yml", in infrastructure-playbooks/ that removes a MGR from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `f4ea75051b`)	2019-07-09 15:00:56 +00:00
Guillaume Abrioux	85a448429d	shrink-mds: refact post tasks This commit refacts the way we check the "mds_to_kill" node is well stopped. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `7df62fde34`)	2019-07-09 12:07:47 +02:00
Rishabh Dave	38c2785e95	add a playbook that removes mds from a node Add a playbook, named "shrink-mds.yml", in infrastructure-playbooks/ that removes a MDS from a node in an already deployed Ceph cluster. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `235b1fccc6`)	2019-07-09 12:07:47 +02:00
Mike Christie	cf6050d4e6	igw: Support new ceph-iscsi package during purge The ceph-iscsi-config and ceph-iscsi-cli packages were combined into ceph-iscsi and its APIs changed. This fixes up the iscsi purge task to support the new API and old one. Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `b163206db7`)	2019-07-04 00:04:04 +00:00
Guillaume Abrioux	0a0cdc0963	purge: ensure no ceph kernel thread is present This tries to first unmount any cephfs/nfs-ganesha mount point on client nodes, then unmap any mapped rbd devices and finally it tries to remove ceph kernel modules. If it fails it means some resources are still busy and should be cleaned manually before continuing to purge the cluster. This is done early in the playbook so the cluster stays untouched until everything is ready for that operation, otherwise if you try to redeploy a cluster it could end up by getting confused by leftover from previous deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1337915 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `20e4852888`)	2019-06-24 13:20:50 +02:00
Guillaume Abrioux	77d24203fa	upgrade: accept HEALTH_OK and HEALTH_WARN as valid state `3a100cfa52` introduced a check which is a bit too restrictive, let's accept HEALTH_OK and HEALTH_WARN. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6dce51183b`)	2019-06-21 15:47:33 +00:00
Dimitri Savineau	aa197f77fc	remove ceph restapi references The ceph restapi configuration was only available until Luminous release so we don't need those leftovers for nautilus+. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `da8b7ab7fb`)	2019-06-20 15:15:10 -04:00
Guillaume Abrioux	b93064c7c8	rolling_update: fail early if cluster state is not OK starting an upgrade if the cluster isn't HEALTH_OK isn't a good idea. Let's check for the cluster status before trying to upgrade. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3a100cfa52`)	2019-06-19 08:41:25 +00:00
Guillaume Abrioux	53dd58e84c	rolling_update: only mask and stop unit in mgr part Otherwise it fails like following: ``` fatal: [mon0]: FAILED! => changed=false msg: \|- Unable to enable service ceph-mgr@mon0: Failed to execute operation: Cannot send after transport endpoint shutdown ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `51b2813e04`)	2019-06-19 08:41:25 +00:00
Dimitri Savineau	6e565b251d	remove ceph-agent role and references The ceph-agent role was used only for RHCS 2 (jewel) so it's not useful anymore. The current code will fail on CentOS distribution because the rhscon package is only available on Red Hat with the RHCS 2 repository and this ceph release is supported on stable-3.0 branch. Resolves: #4020 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7503098ca0`)	2019-06-17 15:56:00 -04:00
L3D	1daca1ba83	ansible: use 'bool' filter on boolean conditionals By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these: ``` [DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable, this behaviour will go away and you might need to add \|bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg. ``` Now appended ``\| bool`` on a lot of the affected variables. Sometimes the coding style from ``variable\|bool`` changed to ``variable \| bool`` (with spaces at the pipe). Closes: #4022 Signed-off-by: L3D <l3d@c3woc.de> (cherry picked from commit `ab54fe20ec`)	2019-06-07 16:05:51 +02:00
Dimitri Savineau	7a384e7ec2	purge-cluster: clean all ceph repo files We currently only purge rh_storage yum repository file but depending on the ceph_repository value we are using, the ceph repository file could have a different name. Resolves: #4056 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `44c63903ca`)	2019-06-07 12:05:40 +00:00
guihecheng	a6312ba9bc	Add section for purging rgw loadbalancer in purge-cluster.yml Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com> (cherry picked from commit `59e702ec39`)	2019-06-06 19:44:30 +00:00
Guillaume Abrioux	16c6d530c6	roles: introduce `ceph-container-engine` role This commit splits the current `ceph-container-common` role. This introduces a new role `ceph-container-engine` which handles the tasks specific to the installation of containers tools (docker/podman). This is needed for the ceph-dashboard implementation for 2 main reasons: 1/ Since the ceph-dashboard stack is only containerized, we must install everything needed to run containers even in non containerized deployments. Splitting this role allows us to not have to call the full `ceph-container-common` role which would run a bunch of unneeded tasks that would have been skipped anyway. 2/ The current implementation would have required to run `ceph-container-common` on all ceph-clients nodes which would have been conflicting with `9d3517c670` (we don't want to run ceph-container-common on all client nodes, see mentioned commit for more details) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `55420d6253`)	2019-05-22 15:24:11 -04:00
Guillaume Abrioux	d83db2c8ed	switch to ansible 2.8 - remove private attribute with import_role. - update documentation. - update rpm spec requirement. - fix MagicMock python import in unit tests. Closes: #3765 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `72d8315299`)	2019-05-21 09:17:46 +02:00
Dimitri Savineau	023cdffd95	purge-docker-cluster: don't remove data on atomic Because we don't manage the docker service on atomic (yet) via the ceph-container-common role then we can't stop docker and remove the data. For now let's do that only for non atomic hosts. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `638604929b`)	2019-05-17 10:44:52 -04:00
Guillaume Abrioux	e29fd842a6	rename docker_exec_cmd variable This commit renames the `docker_exec_cmd` variable to `container_exec_cmd` so it's more generic. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e74d80e72f`)	2019-05-17 16:05:58 +02:00
Zack Cerza	0496ce8e5c	purge-docker-cluster.yml: Default lvm_volumes We were failing when that variable is unset; purge-cluster.yml contains this workaround. Signed-off-by: Zack Cerza <zack@redhat.com> (cherry picked from commit `9b4339a2ba`)	2019-05-17 16:05:58 +02:00
Boris Ranto	5ac7559736	Merge cephmetrics/dashboard-ansible repo This commit will merge dashboard-ansible installation scripts with ceph-ansible. This includes several new roles to setup ceph-dashboard and the underlying technologies like prometheus and grafana server. Signed-off-by: Boris Ranto & Zack Cerza <team-gmeno@redhat.com> Co-authored-by: Zack Cerza <zcerza@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2f141a6e80`)	2019-05-17 16:05:58 +02:00
wumingqiao	30b1ca9aeb	shrink_osd: mark all osd(s) out in one command Signed-off-by: wumingqiao <wumingqiao@beyondcent.com> (cherry picked from commit `5320aa11c4`)	2019-05-15 21:44:30 -04:00
Dimitri Savineau	1e23d853f9	purge-docker-cluster: remove docker data We never clean the content of /var/lib/docker so we can still have some data present in this directory after running the purge playbook. Pip isn't used anymore. Also update the docker package name (especially the python binding one). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `168d7cd016`)	2019-05-14 11:00:30 +02:00
Dimitri Savineau	6814fd5ce5	gather-ceph-logs: fix logs list generation The shell module doesn't have a stdout_lines attribute. Instead of using the shell module, we can use the find module. Also adding `become: false` to the local tmp directory creation otherwise we won't have enough rights to fetch the files into this directory. Resolves: #3966 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ea1f8f551c`)	2019-05-13 10:33:26 -04:00
Mike Christie	78a55a3df3	igw: Fix rolling update service ordering We must stop tcmu-runner after the other rbd-target-* services because they may need to interact with tcmu-runner during shutdown. There is also a bug in some kernels where IO can get stuck in the kernel and by stopping rbd-target-* first we can make sure all IO is flushed. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1659611 Signed-off-by: Mike Christie <mchristi@redhat.com> (cherry picked from commit `d7ef12910e`)	2019-05-10 15:53:44 +02:00
Rishabh Dave	b6d5352783	remove infrastructure-playbooks/rgw-standalone.yml We don't need infrastructure-playbooks/rgw-standalone.yml since site.yml.sample and site-container.yml.sample can add a new RGW node to an already deployed Ceph cluster. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `6e8fb2b3ea`)	2019-05-07 13:11:48 +02:00
letterwuyu	27a8179cd8	Fix comment content Signed-off-by: lishuhao letterwuyu@gmail.com (cherry picked from commit `d57f6fcdc6`)	2019-05-07 11:11:22 +02:00
Rishabh Dave	06b3ab2a6b	improve coding style Keywords requiring only one item shouldn't express it by creating a list with single item. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `739a662c80`) Conflicts: roles/ceph-mon/tasks/ceph_keys.yml roles/ceph-validate/tasks/check_devices.yml	2019-05-06 15:09:06 +00:00
Dimitri Savineau	92340d049c	rolling_update: restart all ceph-iscsi services Currently only rbd-target-gw service is restarted during an update. We also need to restart tcmu-runner and rbd-target-api services during the ceph iscsi upgrade. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f1048627ea`)	2019-04-30 12:09:52 -04:00
Andrew Schoen	f1e04835f4	rolling_update: ceph commands should use --cluster Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `e2529dcd7f`)	2019-04-18 19:12:13 +02:00
Andrew Schoen	545d93aae8	rolling_update: set num_osds to the number of running osds We do this so that the ceph-config role can most accurately report the number of osds for the generation of the ceph.conf file. We don't want to use ceph-volume to determine the number of osds because in an upgrade to nautilus ceph-volume won't be able to accurately count osds created by ceph-disk. Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `67453853ff`)	2019-04-18 19:12:13 +02:00
Andrew Schoen	c28388bb06	rolling_update: migrate ceph-disk osds to ceph-volume When upgrading to nautilus run ``ceph-volume simple scan`` and ``ceph-volume simple activate --all`` to migrate any running ceph-disk osds to ceph-volume. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460 Signed-off-by: Andrew Schoen <aschoen@redhat.com> (cherry picked from commit `28c47e4d1b`)	2019-04-18 19:12:13 +02:00
Guillaume Abrioux	35afd6a63a	update: ensure tasks are executed on an upgraded mon These tasks must be run from a monitor which is upgraded otherwise it might fail. See: https://tracker.ceph.com/issues/39355 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `7eb42c9e8e`)	2019-04-18 19:10:10 +02:00
Guillaume Abrioux	495711f296	update: ensure ceph command returns 0 these commands could return something else than 0. Let's ensure all retries have been done before actually failing. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ed84325b1d`)	2019-04-18 19:10:10 +02:00
Guillaume Abrioux	4a678ac102	update: set osd flags before upgrading any mon Typical error: ``` failed: [mon0 -> mon2] (item=noout) => changed=true cmd: - ceph - --cluster - ceph - osd - set - noout delta: '0:00:00.293756' end: '2019-04-17 06:31:57.552386' item: noout msg: non-zero return code rc: 1 start: '2019-04-17 06:31:57.258630' stderr: \|- Traceback (most recent call last): File "/bin/ceph", line 1222, in <module> retval = main() File "/bin/ceph", line 1146, in main sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli') File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs cmd['sig'] = parse_funcsig(cmd['sig']) File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig raise JsonFormat(s) ceph_argparse.JsonFormat: unknown type CephBool stderr_lines: - 'Traceback (most recent call last):' - ' File "/bin/ceph", line 1222, in <module>' - ' retval = main()' - ' File "/bin/ceph", line 1146, in main' - ' sigdict = parse_json_funcsigs(outbuf.decode(''utf-8''), ''cli'')' - ' File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs' - ' cmd[''sig''] = parse_funcsig(cmd[''sig''])' - ' File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig' - ' raise JsonFormat(s)' - 'ceph_argparse.JsonFormat: unknown type CephBool' stdout: '' stdout_lines: <omitted> ``` Having mixed versions of monitors seems to cause this error. Moving these tasks before any monitor gets upgraded seems to be enough to get around this issue. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `543d1e2e41`)	2019-04-18 19:10:10 +02:00
Rishabh Dave	72309b49fe	allow adding a monitor to a deployed cluster Add a playbook that deploys a new monitor on a new node, adds that node to the Ceph cluster and the monitor to the quorum and updates the ceph configuration file on OSD nodes. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit `d5967af7fb`)	2019-04-16 11:14:21 +02:00
Dimitri Savineau	1c3fbe5a60	purge-cluster: remove python-ceph-argparse package When using purge-cluster playbook with nautilus, there's still the python-ceph-argparse package installed on the host preventing to reinstall a ceph cluster with a different version (like luminous or mimic) Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `eb658b3af6`)	2019-04-15 17:32:22 +02:00
Dimitri Savineau	f90c051589	switch-from-non-containerized: stop all osds `e6bfb84` introduced a regression in the switch from non containerized to container deployment. We need to stop all previous OSDs services. We just don't need the ceph-disk pattern in the regex. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `150acba8c5`)	2019-04-12 00:45:21 +00:00
Guillaume Abrioux	f8c544c4a8	purge: remove references to ceph-disk as of stable-4.0, ceph-disk is no longer supported. These tasks aren't needed anymore. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `a1254d767c`)	2019-04-12 00:45:21 +00:00
Guillaume Abrioux	f1ede335e4	shrink-osd: remove legacy playbook as of stable-4.0, ceph-disk is no longer supported. Let's remove this legacy version of the playbook. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `73aa788459`)	2019-04-12 00:45:21 +00:00

466 Commits (0d55eeba79dd50618e4a0f0b1de7e87424257167)