ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	02750a94cc	dashboard: subj_alt_names fact refactor the current way the variable is built results in: ``` 2021-08-03 04:18:23,020 - ceph.ceph - INFO - ok: [ceph-sangadi-4x-indpt6-node1-installer] => changed=false ansible_facts: subj_alt_names: \|- subjectAltName=ceph-sangadi-4x-indpt6-node1-installer/subjectAltName=10.0.210.223/subjectAltName=ceph-sangadi-4x-indpt6-node1-installersubjectAltName=ceph-sangadi-4x-indpt6-node2/subjectAltName=10.0.210.252/subjectAltName=ceph-sangadi-4x-indpt6-node2/ ``` which is incorrect. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6f1a0634f7`)	2021-08-09 15:14:48 -04:00
VasishtaShastry	4ae9f321ac	Fixes typo in rgw-add-users-buckets playbook Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com> (cherry picked from commit `478d9fdcb6`)	2021-08-09 14:31:55 -04:00
Dimitri Savineau	e1e22933a7	add-osd: use container_exec_cmd fact from mon host Because we're delegating the task to the first monitor node, we need to be sure that the container_exec_cmd fact is the one from that node too otherwise we could have a mismatch on the ceph-mon container name. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1990772 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-09 15:48:23 +02:00
Teoman ONAY	3d4e15cebf	podman pids.max default value is 2048, docker's one is 4096 which are sufficient for the default value (512) of rgw thread pool size. But if its value is increased near to the pids-limit value, it does not leave place for the other processes to spawn and run within the container and the container crashes. pids-limit set to unlimited regardless of the container engine. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987041 Signed-off-by: Teoman ONAY <tonay@redhat.com> (cherry picked from commit `9b5d97adb9`)	2021-08-05 11:04:31 -04:00
Dimitri Savineau	03ed9e111c	infra: use dedicated variables for balancer status The balancer status is registered during the cephadm-adopt, rolling_update and switch2container playbooks. But it is also used in the ceph-handler role which is included in those playbooks too. Even if the ceph-handler tasks are skipped for rolling_update and switch2container, the balancer_status variable is erased with the skip task result. play1: register: balancer_status play2: register: balancer_status <-- skipped play3: when: (balancer_status.stdout \| from_json)['active'] \| bool This leads to issues like: The conditional check '(balancer_status.stdout \| from_json)['active'] \| bool' failed. The error was: Unexpected templating type error occurred on ({% if (balancer_status.stdout \| from_json)['active'] \| bool %} True {% else %} False {% endif %}): expected string or buffer. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1982054 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `386661699b`)	2021-08-04 11:48:13 -04:00
Dimitri Savineau	1044940304	osds: use osd pool ls instead of osd dump command The ceph osd pool ls detail command is a subset of the ceph osd dump command. $ ceph osd dump --format json\|wc -c 10117 $ ceph osd pool ls detail --format json\|wc -c 4740 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `06471a4b82`)	2021-08-03 14:03:35 -04:00
Dimitri Savineau	5c6921e553	rolling_update: get ceph version when mons exist `eec3878` introduced a regression for upgrade scenarios where there's no monitor nodes at all (like ganesha standalone, external clients, etc..) TASK [get the ceph release being deployed] ********************************** task path: infrastructure-playbooks/rolling_update.yml:121 Thursday 29 July 2021 15:55:29 +0000 (0:00:00.484) 0:00:15.802 ******* fatal: [client0]: FAILED! => msg: '''dict object'' has no attribute ''mons''' Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `e87a47cf0c`)	2021-08-03 12:42:28 -04:00
Benoît Knecht	bacaa654b1	infrastructure-playbooks: Get Ceph info in check mode In the `set osd flags` block, run the Ceph commands that gather information from the cluster (and don't make any changes to it) even when running in check mode. This allows the tasks that depend on the variables set by those tasks to succeed in check mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `d7653dca95`)	2021-08-02 15:54:34 +02:00
Benoît Knecht	9668137daf	ceph-handler: Fix osd handler in check mode Run the Ceph commands that only gather information (without making any changes to the cluster) when running Ansible in check mode. This allows the tasks that depend on the variables set by those tasks to succeed in check mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `498acd7527`)	2021-08-02 15:54:34 +02:00
Dimitri Savineau	1270d5964a	library: remove unused module import Move the import at the top of the file and remove unused module import. - E402 module level import not at top of file - F401 'xxxx' imported but unused This also removes the '# noqa E402' statement from the code. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `2138a00a32`)	2021-08-02 15:53:00 +02:00
Wong Hoi Sing Edison	5bf73d4766	library: flake8 ceph-ansible modules This commit ensures all ceph-ansible modules pass flake8 properly. Signed-off-by: Wong Hoi Sing Edison <hswong3i@pantarei-design.com> (cherry picked from commit `beda1fe773`)	2021-08-02 15:53:00 +02:00
Dimitri Savineau	deac21c6bf	ceph-defaults: add missing grafana dashboards The radosgw-sync-overview and rbd-details grafana dashboards were missing from the list. Closes: #6758 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f0ccf3ebf0`)	2021-07-27 10:53:54 -04:00
Guillaume Abrioux	77171216fb	update: check the ceph release Check early which Ceph release is going to be deployed and fail if it doesn't correspond to the ceph-ansible version being used. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `eec38784ec`)	2021-07-26 14:10:24 -04:00
Dimitri Savineau	c39e7cb151	alertmanager: allow disable dashboard tls verify When using self-signed/untrusted CA certificates, alertmanager displays an error in logs. With this commit this should make those messages disappear. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1936299 Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com> Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9f77b929d1`)	2021-07-26 13:19:13 -04:00
Guillaume Abrioux	72bbc8285e	dashboard: support dedicated network for the dashboard This introduces a new variable `dashboard_network` in order to support deploying the dashboard on a different subnet. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1927574 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f4f73b6197`)	2021-07-26 13:19:03 -04:00
Dimitri Savineau	00e0ebc911	multisite: use node fqdn for endpoints when https When the rgw_multisite_proto variable is set to https then we shouldn't use the IP address in the zone endpoints list but the node FQDN to match the TLS certificate CN. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1965504 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ad05a08160`)	2021-07-26 17:54:13 +02:00
Guillaume Abrioux	907fb08956	purge: support osd_auto_discovery This adds a task that zaps by osd id so we can support the scenario where osds were deployed with `osd_auto_discovery` set to true. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876860 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4144074a50`)	2021-07-26 17:53:06 +02:00
Guillaume Abrioux	3dcfbc2edf	purge: merge playbooks This refactor merges the two playbooks so we only have to maintain 1 playbook. (Symlink the old purge-container-cluster.yml playbook for backward compatibility). Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `17cd83bf3a`)	2021-07-26 17:53:06 +02:00
Guillaume Abrioux	e4fea521d9	purge: drop variables from 'hosts' sections Those variables are useless given this is not possible to override them. Let's replace them with the hardcoded name instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6b50401d0c`)	2021-07-26 17:53:06 +02:00
Guillaume Abrioux	cf812d06e3	purge: reindent playbook This commit reindents the playbook. Also improve readability by adding an extra line between plays. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `60aa70a128`)	2021-07-26 17:53:06 +02:00
Dimitri Savineau	eba580320c	ceph-mgr: don't install dashboard pkg by default This is a partial backport of `2547ab60`. We are currently installing the ceph-mgr-dashboard package even if the dashboard_enabled variable is set to false. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-26 17:50:42 +02:00
Dimitri Savineau	8d58c50f45	ceph-mgr: move mgr module list to common Populating the ceph_mgr_modules list in the mgr_modules doesn't make sense since that file is only executed if the list isn't empty or we're using the dashboard. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cd06e7c046`)	2021-07-26 17:50:35 +02:00
Dimitri Savineau	364186a86e	ceph-nfs: allow overriding NFS_CORE_PARAM We already have config override variables for existing block (like ganesha_ceph_export_overrides, ganesha_log_overrides, etc...) or a global one (ganesha_conf_overrides) but redefining the NFS_CORE_PARAM block in that variable will erase all previous values (currently only Bind_Addr). ganesha_core_param_overrides: \| Enable_UDP = false; NFS_Port = 2050; Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1941775 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9817d29543`)	2021-07-26 17:50:05 +02:00
Guillaume Abrioux	79b8f3a74d	lib/ceph-volume: support zapping by osd_id This commit adds the support for zapping an osd by osd_id in the ceph_volume module. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `70f1d6e2cd`)	2021-07-26 17:49:42 +02:00
Dimitri Savineau	f433d06a93	rolling_update: check quorum state before upgrade If one of the monitors is out of the quorum then nothing prevents the upgrade playbook from running. We only check if we have at least three monitor nodes but we should also check if those monitor nodes are correctly present in the quorum. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1952571 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `97148dd58c`)	2021-07-26 17:49:23 +02:00
Dimitri Savineau	ddc3df9f9a	ceph-facts: move device facts to its own file Instead of reusing the condition 'inventory_hostname in groups[osds]' on each device facts tasks then we can move all the tasks into a dedicated file and set the condition on the import_tasks statement. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d704b05e52`)	2021-07-26 17:49:03 +02:00
Dimitri Savineau	50447e89fb	ceph-validate: check logical volumes We currently don't check if the logical volume used in lvm_volumes list for either bluestore data/db/wal or filestore data/journal exist. We're only doing this on raw devices for batch scenario. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `55bca07cb6`)	2021-07-26 17:49:03 +02:00
Dimitri Savineau	ceca225344	ceph-validate: check db/journal/wal devices too When using dedicated devices for db/journal/wal objectstore with ceph-volume lvm batch then we should also validate that those devices exist and don't use a gpt partition table in addition to the devices and lvm_volume.data variables. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `808e7106de`)	2021-07-26 17:49:03 +02:00
Dimitri Savineau	fe070fc19d	ceph-validate: use root device from ansible_mounts Instead of using findmnt command to find the device associated to the root mount point then we can use the ansible_mounts fact. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7e50380f7f`)	2021-07-26 17:49:03 +02:00
Dimitri Savineau	f317df92ac	ceph-validate: do not resolve devices This is already done in the ceph-facts role. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `0df99dda8d`)	2021-07-26 17:49:03 +02:00
Dimitri Savineau	c67bfe84eb	ceph-validate: check block presence first Instead of doing two parted calls we can check first if the device exist and then test the partition table. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `14d458b3b4`)	2021-07-26 17:49:03 +02:00
Dimitri Savineau	5ef1d630d8	ceph-validate: check devices from lvm_volumes `2888c08` introduced a regression as the check_devices tasks file was only included based on the devices variable. But that file also validates some devices from the lvm_volumes variable. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1906022 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ac0342b72e`)	2021-07-26 17:49:03 +02:00
Dimitri Savineau	c8ca73f620	infra: add playbook to purge dashboard/monitoring The dashboard/monitoring stack can be deployed via the dashboard_enabled variable. But there's nothing similar if we want to remove that part only and keep the ceph cluster up and running. The current purge playbooks remove everything. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `8e4ef7d6da`)	2021-07-26 17:48:32 +02:00
Dimitri Savineau	4695df6d2b	monitoring: use config_template module for config The alertmanager, grafana and prometheus configuration files are generated with the template module which doesn't allow for using config overrides. Instead we could use the config_template plugin action and add a new variable for overrides (one for each component). With this patch, one should be able to add configuration to prometheus with the following: --- alertmanager_conf_overrides: global: smtp_smarthost: 'localhost:25' ... Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902999 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5a41026347`)	2021-07-26 17:47:51 +02:00
Dimitri Savineau	8e939dc377	common: remove unnecessary run_once statements `1303611` introduced tasks for disabling the pg_autoscaler on pools and the balancer but those tasks are already executed on the first monitor node so we don't need to add the run_once statement. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `738fa9428a`)	2021-07-21 10:03:36 -04:00
Dimitri Savineau	17b9ff03d2	common: fix py2 pool_list from_json when skipped When using python 2 and the task with a loop is skipped then it generates an error. Unexpected templating type error occurred on ({{ (pool_list.stdout \| from_json)['pools'] }}): expected string or buffer Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cf6e33346e`)	2021-07-21 09:54:46 -04:00
Guillaume Abrioux	f7882bbc02	common: disable/enable pg_autoscaler The PG autoscaler can disrupt the PG checks so the idea here is to disable it and re-enable it back after the restart is done. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `13036115e2`)	2021-07-21 09:40:18 -04:00
Neelaksh Singh	5213612eaf	Sensitive key data now hidden in output log Fixes: #6529 Signed-off-by: Neelaksh Singh <neelaksh48@gmail.com> (cherry picked from commit `d18a9860cd`)	2021-07-12 09:43:12 +02:00
Guillaume Abrioux	f0cd3c4f48	update: fail the playbook if straw2 conversion failed It's better to fail the playbook so the user is aware the straw2 migration has failed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c396122ad9`)	2021-07-09 17:29:54 -04:00
Guillaume Abrioux	65ce69567a	update: followup on pr #6689 add missing 'osd' command. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4eb4268dee`)	2021-07-09 11:34:46 +02:00
Guillaume Abrioux	1179ea8b2f	update: convert straw bucket After an upgrade, the presence of straw buckets will produce the following warning (HEALTH_WARN): ``` crush map has legacy tunables (require firefly, min is hammer) ``` because straw bucket is a firefly feature it needs to be converted to straw2. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967964 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `eee576477c`)	2021-07-09 11:34:46 +02:00
Dimitri Savineau	58dddf586e	Revert "ceph-validate: check devices from lvm_volumes" This reverts commit `3557497336`. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-07 17:19:35 +02:00
Dimitri Savineau	a684a26428	Revert "ceph-validate: check block presence first" This reverts commit `4f89cdcd45`. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-07 17:19:35 +02:00
Dimitri Savineau	57f9553798	Revert "ceph-validate: do not resolve devices" This reverts commit `2020b1310c`. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-07 17:19:35 +02:00
Dimitri Savineau	bc570619b6	Revert "ceph-validate: use root device from ansible_mounts" This reverts commit `b1542fd340`. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-07 17:19:35 +02:00
Dimitri Savineau	e9123dda35	Revert "ceph-validate: check db/journal/wal devices too" This reverts commit `d6f3e6eac3`. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-07 17:19:35 +02:00
Dimitri Savineau	c096ec4033	Revert "ceph-validate: check logical volumes" This reverts commit `d7cefe0536`. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-07 17:19:35 +02:00
Dimitri Savineau	b82f4edb38	Revert "ceph-facts: move device facts to its own file" This reverts commit `9f1ec38bbf`. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-07 17:19:35 +02:00
Guillaume Abrioux	928d7c75a4	dashboard: remove "certificate is valid for" error When deploying dashboard with ssl certificates generated by ceph-ansible, we enforce the CN to 'ceph-dashboard' which can make applications such as alertmanager complain as follows: `err="Post https://mgr0:8443/api/prometheus_receiver: x509: certificate is valid for ceph-dashboard, not mgr0" context_err="context deadline exceeded"` The idea here is to add alternative names matching all mgr/mon instances in the certificate so this error won't appear in logs. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `72a0336c71`)	2021-07-07 17:19:22 +02:00
Dimitri Savineau	2bec707870	ceph-crash: add install checkpoint The ceph crash install checkpoint callback was missing in the main playbooks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `993d06c4d9`)	2021-07-05 18:11:51 +02:00
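
The `subj_alt_names` refactor in commit 02750a94cc describes a string that loses the separator between hosts. The Python sketch below is only an illustration of that class of bug, not the ceph-ansible implementation: the helper names (`san_entries`, `build_naive`, `build_flat`) are hypothetical, the host data is copied from the log excerpt in that commit message, and the `/` separator is inferred from the same excerpt.

```python
# Illustration only: hypothetical helpers, not code from ceph-ansible.
# Host names and addresses are copied from the log excerpt in commit 02750a94cc.
hosts = [
    {"hostname": "ceph-sangadi-4x-indpt6-node1-installer", "ip": "10.0.210.223"},
    {"hostname": "ceph-sangadi-4x-indpt6-node2", "ip": "10.0.210.252"},
]


def san_entries(host):
    """Return the subjectAltName entries for one host (name and address)."""
    return [
        "subjectAltName={}".format(host["hostname"]),
        "subjectAltName={}".format(host["ip"]),
    ]


def build_naive(hosts):
    """Join each host's entries separately, then concatenate the results.

    The '/' between the last entry of one host and the first entry of the
    next host is lost, which resembles the malformed value in the log.
    """
    return "".join("/".join(san_entries(h)) for h in hosts)


def build_flat(hosts):
    """Flatten all entries into one list and join once.

    Every pair of neighbouring entries is separated by exactly one '/',
    and there is no trailing separator.
    """
    entries = [entry for h in hosts for entry in san_entries(h)]
    return "/".join(entries)


if __name__ == "__main__":
    print(build_naive(hosts))
    print(build_flat(hosts))
```

Building the flat list first and joining once keeps the separator logic in a single place, so adding a third host or a third entry per host cannot reintroduce the missing or trailing `/`.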


5535 Commits (6485e1a69ed90b446c2cf2d9fdbf703cd8105d6d) All Branches