ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Benoît Knecht	f9478472af	ceph-handler: Fix osd handler in check mode Run the Ceph commands that only gather information (without making any changes to the cluster) when running Ansible in check mode. This allows the tasks that depend on the variables set by those tasks to succeed in check mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `498acd7527`)	2021-08-02 15:54:04 +02:00
Dimitri Savineau	9ee44013c5	library: remove unused module import Move the import at the top of the file and remove unused module import. - E402 module level import not at top of file - F401 'xxxx' imported but unused This also removes the '# noqa E402' statement from the code. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `2138a00a32`)	2021-08-02 15:51:39 +02:00
Wong Hoi Sing Edison	c475d84310	library: flake8 ceph-ansible modules This commit ensures all ceph-ansible modules pass flake8 properly. Signed-off-by: Wong Hoi Sing Edison <hswong3i@pantarei-design.com> (cherry picked from commit `beda1fe773`)	2021-08-02 15:51:39 +02:00
Dimitri Savineau	d7edc71fd5	ceph-defaults: update grafana dashboards source We currently download the grafana dashboards from the ceph@master branch for all ceph releases. We should use the right ceph branch according to the ceph release. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-27 11:44:50 -04:00
Dimitri Savineau	3e8d9b4a1f	ceph-defaults: add missing grafana dashboards The radosgw-sync-overview and rbd-details grafana dashboards were missing from the list. Closes: #6758 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f0ccf3ebf0`)	2021-07-27 10:53:47 -04:00
Guillaume Abrioux	b9cc91f622	update: check the ceph release Check early which Ceph release is going to be deployed and fail if it doesn't correspond to the ceph-ansible version being used. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `eec38784ec`)	2021-07-26 13:39:20 -04:00
Dimitri Savineau	f5ee8dfb26	alertmanager: allow disable dashboard tls verify When using self-signed/untrusted CA certificates, alertmanager displays an error in logs. With this commit this should make those messages disappear. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1936299 Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com> Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9f77b929d1`)	2021-07-25 22:02:16 -04:00
Guillaume Abrioux	f085f681f0	purge: support osd_auto_discovery This adds a task that zaps by osd id so we can support the scenario where osds were deployed with `osd_auto_discovery` set to true. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876860 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4144074a50`)	2021-07-22 17:10:01 -04:00
Guillaume Abrioux	0ef447704f	purge: merge playbooks This refactor merges the two playbooks so we only have to maintain 1 playbook. (Symlink the old purge-container-cluster.yml playbook for backward compatibility). Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `17cd83bf3a`)	2021-07-22 17:10:01 -04:00
Guillaume Abrioux	b2b2871ccd	purge: drop variables from 'hosts' sections Those variables are useless given that it is not possible to override them. Let's replace them with the hardcoded name instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6b50401d0c`)	2021-07-22 17:10:01 -04:00
Dimitri Savineau	88e07f0bbc	multisite: use node fqdn for endpoints when https When the rgw_multisite_proto variable is set to https then we shouldn't use the IP address in the zone endpoints list but the node FQDN to match the TLS certificate CN. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1965504 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ad05a08160`)	2021-07-22 22:48:03 +02:00
Dimitri Savineau	06158c2ac5	common: remove unnecessary run_once statements `1303611` introduced tasks for disabling the pg_autoscaler on pools and the balancer but those tasks are already executed on the first monitor node so we don't need to add the run_once statement. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `738fa9428a`)	2021-07-21 10:01:25 -04:00
Dimitri Savineau	f9d60644ad	common: fix py2 pool_list from_json when skipped When using python 2 and the task with a loop is skipped then it generates an error. Unexpected templating type error occurred on ({{ (pool_list.stdout | from_json)['pools'] }}): expected string or buffer Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cf6e33346e`)	2021-07-21 08:57:53 -04:00
Guillaume Abrioux	f3a9135241	common: disable/enable pg_autoscaler The PG autoscaler can disrupt the PG checks so the idea here is to disable it and re-enable it back after the restart is done. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `13036115e2`)	2021-07-20 11:48:39 -04:00
Dimitri Savineau	7434157891	ceph-mgr: move mgr module list to common Populating the ceph_mgr_modules list in the mgr_modules doesn't make sense since that file is only executed if the list isn't empty or we're using the dashboard. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cd06e7c046`)	2021-07-19 15:02:55 -04:00
Dimitri Savineau	925e3efc35	ceph-nfs: allow overriding NFS_CORE_PARAM We already have config override variables for existing block (like ganesha_ceph_export_overrides, ganesha_log_overrides, etc...) or a global one (ganesha_conf_overrides) but redefining the NFS_CORE_PARAM block in that variable will erase all previous values (currently only Bind_Addr). ganesha_core_param_overrides: | Enable_UDP = false; NFS_Port = 2050; Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1941775 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9817d29543`)	2021-07-19 14:13:02 -04:00
Guillaume Abrioux	559b379f73	purge: reindent playbook This commit reindents the playbook. Also improve readability by adding an extra line between plays. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `60aa70a128`)	2021-07-13 17:02:45 -04:00
Guillaume Abrioux	710176e36a	lib/ceph-volume: support zapping by osd_id This commit adds the support for zapping an osd by osd_id in the ceph_volume module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `70f1d6e2cd`)	2021-07-13 18:04:17 +02:00
Dimitri Savineau	876fa07175	rolling_update: check quorum state before upgrade If one of the monitors is out of the quorum then nothing prevents the upgrade playbook from running. We only check if we have at least three monitor nodes but we should also check if those monitor nodes are correctly present in the quorum. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1952571 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `97148dd58c`)	2021-07-12 12:59:48 -04:00
Neelaksh Singh	9c04909d9c	Sensitive key data now hidden in output log Fixes: #6529 Signed-off-by: Neelaksh Singh <neelaksh48@gmail.com> (cherry picked from commit `d18a9860cd`)	2021-07-12 08:49:49 -04:00
Guillaume Abrioux	a14a3e56c0	update: fail the playbook if straw2 conversion failed It's better to fail the playbook so the user is aware the straw2 migration has failed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c396122ad9`)	2021-07-09 16:32:56 -04:00
Guillaume Abrioux	361f373e18	update: followup on pr #6689 add missing 'osd' command. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4eb4268dee`)	2021-07-09 11:34:22 +02:00
Guillaume Abrioux	a0087c425b	update: convert straw bucket After an upgrade, the presence of straw buckets will produce the following warning (HEALTH_WARN): ``` crush map has legacy tunables (require firefly, min is hammer) ``` because straw bucket is a firefly feature it needs to be converted to straw2. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967964 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `eee576477c`)	2021-07-09 09:15:42 +02:00
Guillaume Abrioux	867376c30b	dashboard: remove "certificate is valid for" error When deploying dashboard with ssl certificates generated by ceph-ansible, we enforce the CN to 'ceph-dashboard' which can make applications such as alertmanager complain as follows: `err="Post https://mgr0:8443/api/prometheus_receiver: x509: certificate is valid for ceph-dashboard, not mgr0" context_err="context deadline exceeded"` The idea here is to add alternative names matching all mgr/mon instances in the certificate so this error won't appear in logs. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `72a0336c71`)	2021-07-07 17:19:11 +02:00
Dimitri Savineau	b3dde31a06	infra: add playbook to purge dashboard/monitoring The dashboard/monitoring stack can be deployed via the dashboard_enabled variable. But there's nothing similar if we want to remove that part only and keep the ceph cluster up and running. The current purge playbooks remove everything. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `8e4ef7d6da`)	2021-07-06 11:40:38 -04:00
Guillaume Abrioux	d5784c01c0	dashboard: support dedicated network for the dashboard This introduces a new variable `dashboard_network` in order to support deploying the dashboard on a different subnet. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1927574 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f4f73b6197`)	2021-07-06 14:54:12 +02:00
Dimitri Savineau	bbaa09ec2d	ceph-crash: add install checkpoint The ceph crash install checkpoint callback was missing in the main playbooks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `993d06c4d9`)	2021-07-05 18:11:42 +02:00
Guillaume Abrioux	4366cb3b30	cephadm_adopt: add any_errors_fatal on play Add any_errors_fatal: true in cephadm-adopt playbook. We should stop the playbook execution when a task throws an error. Otherwise it can lead to unexpected behavior. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1976179 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3b804a61dd`)	2021-07-03 11:59:27 +02:00
Dimitri Savineau	c97f90af92	ceph-facts: move device facts to its own file Instead of reusing the condition 'inventory_hostname in groups[osds]' on each device facts tasks then we can move all the tasks into a dedicated file and set the condition on the import_tasks statement. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d704b05e52`)	2021-07-02 22:21:32 +02:00
Dimitri Savineau	efd80b07de	ceph-validate: check logical volumes We currently don't check if the logical volume used in lvm_volumes list for either bluestore data/db/wal or filestore data/journal exists. We're only doing this on raw devices for batch scenario. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `55bca07cb6`)	2021-07-02 22:21:32 +02:00
Dimitri Savineau	b61b20f25b	ceph-validate: check db/journal/wal devices too When using dedicated devices for db/journal/wal objectstore with ceph-volume lvm batch then we should also validate that those devices exist and don't use a gpt partition table in addition to the devices and lvm_volume.data variables. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `808e7106de`)	2021-07-02 22:21:32 +02:00
Dimitri Savineau	71c0f2d6e1	ceph-validate: use root device from ansible_mounts Instead of using findmnt command to find the device associated to the root mount point then we can use the ansible_mounts fact. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7e50380f7f`)	2021-07-02 22:21:32 +02:00
Dimitri Savineau	5476d606a3	ceph-validate: do not resolve devices This is already done in the ceph-facts role. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `0df99dda8d`)	2021-07-02 22:21:32 +02:00
Dimitri Savineau	11bb1ece2e	ceph-validate: check block presence first Instead of doing two parted calls we can check first if the device exists and then test the partition table. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `14d458b3b4`)	2021-07-02 22:21:32 +02:00
Dimitri Savineau	a3b5b15160	ceph-validate: check devices from lvm_volumes `2888c08` introduced a regression as the check_devices tasks file was only included based on the devices variable. But that file also validates some devices from the lvm_volumes variable. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1906022 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ac0342b72e`)	2021-07-02 22:21:32 +02:00
Dimitri Savineau	6bda641584	prometheus: fix prometheus target url The prometheus service isn't binding on localhost. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1933560 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `1d56818658`)	2021-07-02 14:37:49 -04:00
Guillaume Abrioux	21a6cc2cdf	purge: add monitoring group in final cleanup play This adds the monitoring group in the "final cleanup play" so any cid files generated are well removed when purging the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1974536 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `037d8cd05e`)	2021-07-02 14:37:09 -04:00
Dimitri Savineau	4ce49927f7	container: set tcmalloc value by default All ceph daemons need to have the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment variable set to 128MB by default in container setup. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970913 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9758e3c513`)	2021-07-01 15:46:06 +02:00
Guillaume Abrioux	22fd0846a7	update: do not gather facts on each play There's no benefit to gather facts again on each play in rolling_update.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2c77d0094c`)	2021-06-30 20:39:50 +02:00
Guillaume Abrioux	2957b69c8b	ceph_key: handle error in a better way When calling the `ceph_key` module with `state: info`, if the ceph command called fails, the actual error is hidden by the module which makes it pretty difficult to troubleshoot. The current code always states that if rc is not equal to 0 the keyring doesn't exist. `state: info` should always return the actual rc, stdout and stderr. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1964889 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d58500ade0`)	2021-06-30 20:34:30 +02:00
Boris Ranto	a348116bf7	dashboard: Add new prometheus alert It was requested for us to update our alerting definitions to include a slow OSD Ops health check. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1951664 Signed-off-by: Boris Ranto <branto@redhat.com> (cherry picked from commit `2491d4e004`)	2021-06-30 15:30:44 +02:00
Dimitri Savineau	1a6d48cdab	tox: add ceph_stable_release to switch2container We need to set the ceph_stable_release variable during the switch2container playbook. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-06-30 09:30:12 +02:00
Dimitri Savineau	0b273bbac6	switch2container: run ceph-validate role This adds the ceph-validate role before starting the switch to a containerized deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1968177 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `fc160b3be1`)	2021-06-30 09:30:12 +02:00
Guillaume Abrioux	99902e33e9	workflows: test against 1 python version only Let's drop py3.6 and py3.7 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d191ba38d3`)	2021-06-30 08:28:49 +02:00
Guillaume Abrioux	0e52ea0f99	workflows: add signed-off check This adds a github workflow for checking the signed off line in commit messages. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8c09497567`)	2021-06-30 08:28:49 +02:00
Guillaume Abrioux	ac77bc5c83	workflow: add group_vars/defaults checks let's use github workflow for checking defaults values. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d71db816c6`)	2021-06-30 08:28:49 +02:00
Guillaume Abrioux	fc33d9c2d2	workflow: add syntax check This adds the ansible --syntax-check test in the ansible-lint workflow Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5ed423ad88`)	2021-06-30 08:28:49 +02:00
Guillaume Abrioux	b576b009e2	tests: remove legacy file This inventory isn't used anywhere. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `304d1cbb97`)	2021-06-29 17:52:33 +02:00
Guillaume Abrioux	fe2e057b51	shrink-mgr: modify existing mgr check Do not rely on the inventory aliases in order to check if the selected manager to be removed is present. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967897 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `26a7256c4c`)	2021-06-29 17:52:33 +02:00
Guillaume Abrioux	768677610c	cephadm-adopt/rgw: add host target in svc_id If multi-realms were deployed with several instances belonging to the same realm and zone using the same port on different nodes, the service id expected by cephadm will be the same and therefore only one service will be deployed. We need to create a service called `<node>.<realm>.<zone>.<port>` to be sure the service name will be unique and well deployed on the expected node in order to preserve backward compatibility with the rgws instances that were deployed with ceph-ansible. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `31311b03ed`)	2021-06-29 15:19:02 +02:00


5742 Commits (5d1a7d60b3ed78e084687f4b8d2caf8b63db5574)