ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	ea8f0c7bcb	update: convert straw bucket After an upgrade, the presence of straw buckets will produce the following warning (HEALTH_WARN): `crush map has legacy tunables (require firefly, min is hammer)` because straw bucket is a firefly feature, it needs to be converted to straw2. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967964 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `eee576477c`)	2021-07-09 09:15:24 +02:00
Guillaume Abrioux	27755f9bff	dashboard: remove "certificate is valid for" error When deploying dashboard with ssl certificates generated by ceph-ansible, we enforce the CN to 'ceph-dashboard', which can make applications such as alertmanager complain as follows: `err="Post https://mgr0:8443/api/prometheus_receiver: x509: certificate is valid for ceph-dashboard, not mgr0" context_err="context deadline exceeded"` The idea here is to add alternative names matching all mgr/mon instances in the certificate so this error won't appear in logs. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `72a0336c71`)	2021-07-07 17:19:00 +02:00
Wong Hoi Sing Edison	35264b7381	library: flake8 ceph-ansible modules This commit ensures all ceph-ansible modules pass flake8 properly. Signed-off-by: Wong Hoi Sing Edison <hswong3i@pantarei-design.com> (cherry picked from commit `beda1fe773`)	2021-07-07 13:16:04 +02:00
Dimitri Savineau	ec648981e6	infra: add playbook to purge dashboard/monitoring The dashboard/monitoring stack can be deployed via the dashboard_enabled variable. But there's nothing similar if we want to remove that part only and keep the ceph cluster up and running. The current purge playbooks remove everything. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `8e4ef7d6da`)	2021-07-06 11:40:31 -04:00
Guillaume Abrioux	a7d2a53b37	dashboard: support dedicated network for the dashboard This introduces a new variable `dashboard_network` in order to support deploying the dashboard on a different subnet. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1927574 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f4f73b6197`)	2021-07-06 14:54:00 +02:00
Dimitri Savineau	083b55a760	ceph-crash: add install checkpoint The ceph crash install checkpoint callback was missing in the main playbooks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `993d06c4d9`)	2021-07-05 18:11:32 +02:00
Guillaume Abrioux	f80837c23e	cephadm_adopt: add any_errors_fatal on play Add any_errors_fatal: true in cephadm-adopt playbook. We should stop the playbook execution when a task throws an error. Otherwise it can lead to unexpected behavior. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1976179 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3b804a61dd`)	2021-07-03 11:58:57 +02:00
Dimitri Savineau	b3cf8212fa	ceph-facts: move device facts to its own file Instead of reusing the condition 'inventory_hostname in groups[osds]' on each device facts tasks then we can move all the tasks into a dedicated file and set the condition on the import_tasks statement. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `d704b05e52`)	2021-07-02 22:21:20 +02:00
Dimitri Savineau	255b3763ef	ceph-validate: check logical volumes We currently don't check whether the logical volumes used in the lvm_volumes list for either bluestore data/db/wal or filestore data/journal exist. We're only doing this on raw devices for the batch scenario. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `55bca07cb6`)	2021-07-02 22:21:20 +02:00
Dimitri Savineau	80d5bff7e5	ceph-validate: check db/journal/wal devices too When using dedicated devices for db/journal/wal objectstore with ceph-volume lvm batch, we should also validate that those devices exist and don't use a gpt partition table, in addition to the devices and lvm_volume.data variables. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `808e7106de`)	2021-07-02 22:21:20 +02:00
Dimitri Savineau	f8ecb08ec2	ceph-validate: use root device from ansible_mounts Instead of using findmnt command to find the device associated to the root mount point then we can use the ansible_mounts fact. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `7e50380f7f`)	2021-07-02 22:21:20 +02:00
Dimitri Savineau	23ddab7f53	ceph-validate: do not resolve devices This is already done in the ceph-facts role. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `0df99dda8d`)	2021-07-02 22:21:20 +02:00
Dimitri Savineau	727aa93292	ceph-validate: check block presence first Instead of doing two parted calls we can check first if the device exists and then test the partition table. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `14d458b3b4`)	2021-07-02 22:21:20 +02:00
Dimitri Savineau	65b8f46a43	ceph-validate: check devices from lvm_volumes `2888c08` introduced a regression as the check_devices tasks file was only included based on the devices variable. But that file also validates some devices from the lvm_volumes variable. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1906022 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ac0342b72e`)	2021-07-02 22:21:20 +02:00
Dimitri Savineau	04b1665e5e	prometheus: fix prometheus target url The prometheus service isn't binding on localhost. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1933560 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `1d56818658`)	2021-07-02 14:37:34 -04:00
Guillaume Abrioux	0d4b029057	purge: add monitoring group in final cleanup play This adds the monitoring group in the "final cleanup play" so any cid files generated are well removed when purging the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1974536 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `037d8cd05e`)	2021-07-02 14:36:48 -04:00
Dimitri Savineau	3bd3dddcc2	container: set tcmalloc value by default All ceph daemons need to have the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment variable set to 128MB by default in container setup. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970913 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9758e3c513`)	2021-07-01 15:45:54 +02:00
Guillaume Abrioux	676aad9ea2	update: do not gather facts on each play There's no benefit to gather facts again on each play in rolling_update.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2c77d0094c`)	2021-06-30 20:39:25 +02:00
Guillaume Abrioux	adfb9d3b2a	ceph_key: handle error in a better way When calling the `ceph_key` module with `state: info`, if the ceph command called fails, the actual error is hidden by the module which makes it pretty difficult to troubleshoot. The current code always states that if rc is not equal to 0 the keyring doesn't exist. `state: info` should always return the actual rc, stdout and stderr. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1964889 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d58500ade0`)	2021-06-30 20:34:17 +02:00
Dimitri Savineau	48f47e7023	rhcs: remove ISO install method Starting RHCS 5, there's no ISO available anymore. This removes all ISO variables and the ceph_repository_type variable. Closes: #6626 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a05730b38a`)	2021-06-30 20:33:44 +02:00
Boris Ranto	5b18429c07	dashboard: Add new prometheus alert It was requested for us to update our alerting definitions to include a slow OSD Ops health check. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1951664 Signed-off-by: Boris Ranto <branto@redhat.com> (cherry picked from commit `2491d4e004`)	2021-06-30 15:30:31 +02:00
Dimitri Savineau	a71a80c167	tox: remove ceph_dev variables and dev_setup calls Since pacific is a stable release then we don't need to use shaman for getting the pacific build. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-06-30 09:29:58 +02:00
Dimitri Savineau	f64a4258ea	switch2container: run ceph-validate role This adds the ceph-validate role before starting the switch to a containerized deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1968177 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `fc160b3be1`)	2021-06-30 09:29:58 +02:00
Guillaume Abrioux	5b8d0b11d2	workflows: test against 1 python version only Let's drop py3.6 and py3.7 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d191ba38d3`)	2021-06-30 08:17:40 +02:00
Guillaume Abrioux	5fd24a3793	workflows: add signed-off check This adds a github workflow for checking the signed off line in commit messages. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8c09497567`)	2021-06-30 08:17:40 +02:00
Guillaume Abrioux	3e894ca899	workflow: add group_vars/defaults checks let's use github workflow for checking defaults values. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d71db816c6`)	2021-06-30 08:17:40 +02:00
Guillaume Abrioux	51612aa7d3	workflow: add syntax check This adds the ansible --syntax-check test in the ansible-lint workflow Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `5ed423ad88`)	2021-06-30 08:17:40 +02:00
Guillaume Abrioux	5787048599	tests: remove legacy file This inventory isn't used anywhere. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `304d1cbb97`)	2021-06-29 17:52:22 +02:00
Guillaume Abrioux	16dc991351	shrink-mgr: modify existing mgr check Do not rely on the inventory aliases in order to check if the selected manager to be removed is present. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967897 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `26a7256c4c`)	2021-06-29 17:52:22 +02:00
Guillaume Abrioux	0856d3e47f	cephadm-adopt/rgw: add host target in svc_id If multi-realms were deployed with several instances belonging to the same realm and zone using the same port on different nodes, the service id expected by cephadm will be the same and therefore only one service will be deployed. We need to create a service called `<node>.<realm>.<zone>.<port>` to be sure the service name will be unique and well deployed on the expected node in order to preserve backward compatibility with the rgws instances that were deployed with ceph-ansible. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `31311b03ed`)	2021-06-29 15:18:49 +02:00
Guillaume Abrioux	aa332ac64d	cephadm-adopt: support rgw multisite adoption We need to support rgw multisite deployments. This commit makes the adoption playbook support this kind of deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `fc784fc44c`)	2021-06-24 09:48:27 +02:00
Guillaume Abrioux	1d0651e465	nfs: do no copy client.bootstrap-rgw when using mds There's no need to copy this keyring when using nfs with mds Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8dbee99882`)	2021-06-17 08:15:36 +02:00
Guillaume Abrioux	0a26f118f1	multisite: fix bug during switch2containers When running the switch-to-containers playbook with multisite enabled, the fact "rgw_instances" is only set for the node being processed (serial: 1). As a consequence, the set_fact of 'rgw_instances_all' can't iterate over all rgw nodes in order to look up each 'rgw_instances_host'. Adding a condition checking whether hostvars[item]["rgw_instances_host"] is defined fixes this issue. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967926 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `8279d14d32`)	2021-06-17 08:15:09 +02:00
VasishtaShastry	e49c38f8b7	Container: Fixing service name lvm2-lvmetad Playbook failing saying: msg: 'Could not find the requested service lvmetad: host' Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955040 Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>	2021-06-17 01:50:27 +02:00
Guillaume Abrioux	93f1765259	update: block upgrade when nfs+rgw is deployed This is an unsupported configuration since there are issues with RGW+NFS upgraded from Nautilus to Pacific. This approach might be seen as a bit aggressive but it is preferable to wait before upgrading in that case. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970003 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-16 19:39:42 +02:00
Guillaume Abrioux	1bfedb8b8f	tests: use nfs + cephfs instead of rgw in update job Since nfs+rgw isn't going to be supported in Ceph Pacific, let's not cover this in the CI. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-16 19:39:42 +02:00
Guillaume Abrioux	9b57f33e78	rolling_update: fix mon+rgw/multisite collocation When monitors and rgw are collocated with multisite enabled, the rolling_update playbook fails because during the workflow, we run some radosgw-admin commands very early on the first mon even though this is the monitor being upgraded, it means the container doesn't exist since it was stopped. This block is relevant only for scaling out rgw daemons or initial deployment. In rolling_update workflow, it is not needed so let's skip it. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970232 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f7166cccbf`)	2021-06-14 13:58:50 +02:00
Guillaume Abrioux	17f9780274	cephadm-adopt: fix mgr placement hosts task When no `[mgrs]` group is defined in the inventory, mgr daemons are implicitly collocated with monitors. This task currently relies on the length of the mgr group in order to tell cephadm to deploy mgr daemons. If there's no `[mgrs]` group defined in the inventory, it will ask cephadm to deploy 0 mgr daemons, which doesn't make sense and will throw an error. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970313 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f9a73149a4`)	2021-06-14 13:55:45 +02:00
Guillaume Abrioux	b5214b29fc	tests: use CentOS 8.4 image CentOS 8.4 vagrant image is available at https://cloud.centos.org let's use it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c2aaa96fc7`)	2021-06-11 10:49:51 +02:00
Guillaume Abrioux	8440ccabe1	dashboard: set cookie_secure in grafana When using grafana behind https `cookie_secure` should be set to `true`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1966880 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4daed1f137`)	2021-06-07 15:12:19 +02:00
Guillaume Abrioux	8dda6d0b4d	fs2bs: use match filter in selectattr() `0990ae4109` changed the filter in selectattr() from 'match' to 'equalto' but due to an incompatibility with the Jinja2 version for python 2.7 on el7 we must stick to using 'match' filter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d6745e9cd9`)	2021-05-26 09:15:43 +02:00
Guillaume Abrioux	b2759c0c51	fs2bs: fix wrong filter when setting osd_ids using 'match' filter in that task will lead to bad behavior if I have the following node names for instance: - node1 - node11 - node111 with `selectattr('name', 'match', inventory_hostname)` it will match 'node1' along with 'node11' and 'node111'. using 'equalto' filter will make sure we only match the target node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1963066 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0990ae4109`)	2021-05-25 20:50:10 +02:00
Guillaume Abrioux	11f953a15f	tests: pull images from cloud.centos.org temporary work around vagrant cloud issue which seems broken at the time of pushing this commit. Let's pull images from cloud.centos.org for now since vagrant cloud hosted images return a 403 error. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `9efca34ac3`)	2021-05-25 18:59:03 +02:00
Guillaume Abrioux	e0bcd59c04	prometheus: enforce osd nodes in templates When osd nodes are collocated in the clients group (HCI context for instance), the current logic will exclude osd nodes since they are present in the client group. The best fix would be to exclude client nodes only when they are not members of another group but for now, as a workaround, we can enforce the addition of osd nodes to fix this specific case. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1947695 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `664dae0564`)	2021-05-25 18:59:03 +02:00
Guillaume Abrioux	01256ffe1b	container: conditionnally disable lvmetad Enabling lvmetad in containerized deployments on el7 based OS might cause issues. This commit makes it possible to disable this service if needed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955040 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-25 16:50:30 +02:00
Dimitri Savineau	e91e9d6502	group_vars: fix default values It looks like the generate_group_vars_sample.sh script wasn't executed during previous PRs that were modifying the default values. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `83a8dd5a6a`)	2021-05-21 13:28:41 +02:00
Guillaume Abrioux	f453e4737d	nfs: get org.ganesha.nfsd.conf from container Since we need to revert `33bfb10`, this is an alternative to initial approach. We can avoid maintaining this file since it is present in container image. The idea is to simply get it from the image container and write it to the host. Fixes: #6501 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `e6d8b058ba`)	2021-05-07 16:34:33 +02:00
Dimitri Savineau	dfcb6ed45f	ceph-rgw: fix pg_autoscale_mode for pool The pg_autoscale_mode for rgw pools introduced in `9f03a52` was wrong and was missing a `value` keyword because `rgw_create_pools` is a dict. Fixes: #6516 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a670982a38`)	2021-05-07 13:36:09 +02:00
Guillaume Abrioux	d319da14c8	update: fix ceph-crash stop task This is a workaround for an issue in ansible. When trying to stop/mask/disable this service in one task, the stop didn't actually happen, the task doesn't fail but for some reason the container is still present and running. Then the task starting the service in the role ceph-crash fails because it can't start the container since it's already running with the same name. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955393 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3db1ea7ec4`)	2021-05-04 15:59:46 +02:00
Benoît Knecht	0ce27a73b6	ceph-mon: Fix check mode for deploy monitor tasks Skip the `get initial keyring when it already exists` task when both commands whose `stdout` output it requires have been skipped (e.g. when running in check mode). Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `2437f14581`)	2021-04-30 15:04:12 +02:00
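
Note on commits b2759c0c51 and 8dda6d0b4d: the over-matching described for `selectattr('name', 'match', inventory_hostname)` can be reproduced outside Ansible. The sketch below is only an approximation: it assumes the `match` test behaves like Python's `re.match` (a regex anchored at the start of the string only) and that `equalto` is a plain equality check. The helper `match_test` and the `hosts` list are hypothetical names used for illustration; they are not taken from the playbook.

```python
import re


def match_test(value, pattern):
    """Approximate a 'match' test: the regex is anchored at the start only."""
    return re.match(pattern, value) is not None


# Hypothetical node list, using the host names quoted in the commit message.
hosts = [{"name": "node1"}, {"name": "node11"}, {"name": "node111"}]
inventory_hostname = "node1"

# 'match'-style selection: every name that starts with "node1" is selected.
by_match = [h["name"] for h in hosts if match_test(h["name"], inventory_hostname)]

# 'equalto'-style selection: only the exact host name is selected.
by_equalto = [h["name"] for h in hosts if h["name"] == inventory_hostname]

print(by_match)
print(by_equalto)
```

Under these assumptions the prefix-style selection returns all three nodes while the equality check returns only `node1`. As 8dda6d0b4d records, the playbook nevertheless went back to `match` because of a Jinja2 incompatibility with python 2.7 on el7.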


5920 Commits (5cd25ea8c1df3d193410ac2f5234c712e4496742)