ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Dimitri Savineau	c5a2239e5e	workflow: add dashboard playbook to ansible-lint The dashboard.yml playbook was missing from the ansible-lint workflow. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-06 09:03:48 +02:00
Dimitri Savineau	8e4ef7d6da	infra: add playbook to purge dashboard/monitoring The dashboard/monitoring stack can be deployed via the dashboard_enabled variable. But there's nothing similar if we want to remove that part only and keep the ceph cluster up and running. The current purge playbooks remove everything. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-06 09:02:37 +02:00
Guillaume Abrioux	f4f73b6197	dashboard: support dedicated network for the dashboard This introduces a new variable `dashboard_network` in order to support deploying the dashboard on a different subnet. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1927574 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-05 21:34:43 +02:00
Dimitri Savineau	993d06c4d9	ceph-crash: add install checkpoint The ceph crash install checkpoint callback was missing in the main playbooks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-05 18:03:13 +02:00
Guillaume Abrioux	3b804a61dd	cephadm_adopt: add any_errors_fatal on play Add any_errors_fatal: true in cephadm-adopt playbook. We should stop the playbook execution when a task throws an error. Otherwise it can lead to unexpected behavior. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1976179 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-02 22:15:07 +02:00
Guillaume Abrioux	037d8cd05e	purge: add monitoring group in final cleanup play This adds the monitoring group in the "final cleanup play" so any cid files generated are well removed when purging the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1974536 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-02 13:37:15 -04:00
Dimitri Savineau	1d56818658	prometheus: fix prometheus target url The prometheus service isn't binding on localhost. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1933560 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-02 17:20:02 +02:00
Dimitri Savineau	d704b05e52	ceph-facts: move device facts to its own file Instead of repeating the condition 'inventory_hostname in groups[osds]' on each device facts task, we can move all the tasks into a dedicated file and set the condition on the import_tasks statement. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-02 14:02:30 +02:00
Dimitri Savineau	55bca07cb6	ceph-validate: check logical volumes We currently don't check whether the logical volumes used in the lvm_volumes list for either bluestore data/db/wal or filestore data/journal exist. We're only doing this on raw devices for the batch scenario. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-02 14:02:30 +02:00
Dimitri Savineau	808e7106de	ceph-validate: check db/journal/wal devices too When using dedicated devices for db/journal/wal objectstore with ceph-volume lvm batch, we should also validate that those devices exist and don't use a gpt partition table, in addition to the devices and lvm_volume.data variables. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-02 14:02:30 +02:00
Dimitri Savineau	7e50380f7f	ceph-validate: use root device from ansible_mounts Instead of using the findmnt command to find the device associated with the root mount point, we can use the ansible_mounts fact. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-02 14:02:30 +02:00
Dimitri Savineau	0df99dda8d	ceph-validate: do not resolve devices This is already done in the ceph-facts role. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-02 14:02:30 +02:00
Dimitri Savineau	14d458b3b4	ceph-validate: check block presence first Instead of doing two parted calls we can check first if the device exists and then test the partition table. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-02 14:02:30 +02:00
Dimitri Savineau	ac0342b72e	ceph-validate: check devices from lvm_volumes `2888c08` introduced a regression as the check_devices tasks file was only included based on the devices variable. But that file also validates some devices from the lvm_volumes variable. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1906022 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-02 14:02:30 +02:00
Dimitri Savineau	9758e3c513	container: set tcmalloc value by default All ceph daemons need to have the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment variable set to 128MB by default in container setup. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970913 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-06-30 20:30:55 +02:00
Dimitri Savineau	a05730b38a	rhcs: remove ISO install method Starting RHCS 5, there's no ISO available anymore. This removes all ISO variables and the ceph_repository_type variable. Closes: #6626 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-06-30 18:03:03 +02:00
Wong Hoi Sing Edison	beda1fe773	library: flake8 ceph-ansible modules This commit ensures all ceph-ansible modules pass flake8 properly. Signed-off-by: Wong Hoi Sing Edison <hswong3i@pantarei-design.com>	2021-06-30 15:39:48 +02:00
Guillaume Abrioux	d191ba38d3	workflows: test against 1 python version only Let's drop py3.6 and py3.7 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-30 01:47:07 +02:00
Guillaume Abrioux	8c09497567	workflows: add signed-off check This adds a github workflow for checking the signed off line in commit messages. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-30 01:47:07 +02:00
Guillaume Abrioux	d71db816c6	workflow: add group_vars/defaults checks Let's use a github workflow for checking default values. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-30 01:47:07 +02:00
Guillaume Abrioux	5ed423ad88	workflow: add syntax check This adds the ansible --syntax-check test in the ansible-lint workflow Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-30 01:47:07 +02:00
Guillaume Abrioux	304d1cbb97	tests: remove legacy file This inventory isn't used anywhere. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-29 14:53:19 +02:00
Guillaume Abrioux	26a7256c4c	shrink-mgr: modify existing mgr check Do not rely on the inventory aliases in order to check if the selected manager to be removed is present. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967897 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-29 14:53:19 +02:00
Guillaume Abrioux	31311b03ed	cephadm-adopt/rgw: add host target in svc_id If multi-realms were deployed with several instances belonging to the same realm and zone using the same port on different nodes, the service id expected by cephadm will be the same and therefore only one service will be deployed. We need to create a service called `<node>.<realm>.<zone>.<port>` to be sure the service name will be unique and well deployed on the expected node in order to preserve backward compatibility with the rgws instances that were deployed with ceph-ansible. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-29 14:41:09 +02:00
Dimitri Savineau	fc160b3be1	switch2container: run ceph-validate role This adds the ceph-validate role before starting the switch to a containerized deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1968177 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-06-28 18:06:53 +02:00
Wong Hoi Sing Edison	793d529302	library/ceph_key.py: rewrite for generate_ceph_cmd() Also code lint with flake8 Signed-off-by: Wong Hoi Sing Edison <hswong3i@pantarei-design.com>	2021-06-24 09:46:29 +02:00
Boris Ranto	2491d4e004	dashboard: Add new prometheus alert It was requested for us to update our alerting definitions to include a slow OSD Ops health check. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1951664 Signed-off-by: Boris Ranto <branto@redhat.com>	2021-06-24 09:02:21 +02:00
Guillaume Abrioux	fc784fc44c	cephadm-adopt: support rgw multisite adoption We need to support rgw multisite deployments. This commit makes the adoption playbook support this kind of deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-23 22:01:59 +02:00
Guillaume Abrioux	8279d14d32	multisite: fix bug during switch2containers When running the switch-to-containers playbook with multisite enabled, the fact "rgw_instances" is only set for the node being processed (serial: 1). As a consequence, the set_fact of 'rgw_instances_all' can't iterate over all rgw nodes in order to look up each 'rgw_instances_host'. Adding a condition checking whether hostvars[item]["rgw_instances_host"] is defined fixes this issue. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967926 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-17 01:49:29 +02:00
David Galloway	3eba2a1584	tests: Retry generating SSH vagrant config. Also add some debug. Signed-off-by: David Galloway <dgallowa@redhat.com>	2021-06-16 18:57:11 +02:00
Guillaume Abrioux	8dbee99882	nfs: do not copy client.bootstrap-rgw when using mds There's no need to copy this keyring when using nfs with mds Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-16 06:32:43 +02:00
Guillaume Abrioux	38bfad46e8	container: conditionally disable lvmetad Enabling lvmetad in containerized deployments on el7 based OS might cause issues. This commit makes it possible to disable this service if needed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955040 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-15 20:16:38 +02:00
Guillaume Abrioux	d58500ade0	ceph_key: handle error in a better way When calling the `ceph_key` module with `state: info`, if the ceph command called fails, the actual error is hidden by the module which makes it pretty difficult to troubleshoot. The current code always states that if rc is not equal to 0 the keyring doesn't exist. `state: info` should always return the actual rc, stdout and stderr. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1964889 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-14 23:46:20 +02:00
Guillaume Abrioux	f9a73149a4	cephadm-adopt: fix mgr placement hosts task When no `[mgrs]` group is defined in the inventory, mgr daemons are implicitly collocated with monitors. This task currently relies on the length of the mgr group in order to tell cephadm to deploy mgr daemons. If there's no `[mgrs]` group defined in the inventory, it will ask cephadm to deploy 0 mgr daemons which doesn't make sense and will throw an error. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970313 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-14 10:38:37 +02:00
Guillaume Abrioux	b49cdea750	tests: allocate more memory for all_in_one job Since we fire up far fewer VMs than other jobs, we can afford to allocate more memory for this job. Each VM hosts more daemons, so 1024MB can be too little. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-14 10:10:15 +02:00
Guillaume Abrioux	f7166cccbf	rolling_update: fix mon+rgw/multisite collocation When monitors and rgw are collocated with multisite enabled, the rolling_update playbook fails because during the workflow, we run some radosgw-admin commands very early on the first mon even though this is the monitor being upgraded, it means the container doesn't exist since it was stopped. This block is relevant only for scaling out rgw daemons or initial deployment. In rolling_update workflow, it is not needed so let's skip it. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970232 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-11 10:50:50 +02:00
Guillaume Abrioux	c2aaa96fc7	tests: use CentOS 8.4 image CentOS 8.4 vagrant image is available at https://cloud.centos.org let's use it. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-11 06:53:41 +02:00
Neelaksh Singh	d18a9860cd	Sensitive key data now hidden in output log Fixes: #6529 Signed-off-by: Neelaksh Singh <neelaksh48@gmail.com>	2021-06-08 20:46:37 +02:00
Guillaume Abrioux	d4dfa204d2	Revert "tests: disable test_mgr_dashboard_is_listening" This reverts commit `2e19d1705e`. A new build of ceph@master including the fix is available so this is not needed anymore. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-08 09:03:20 +02:00
Guillaume Abrioux	2e19d1705e	tests: disable test_mgr_dashboard_is_listening Due to a recent commit that has introduced a regression in ceph, this test is failing. Temporarily disabling it to unblock the CI. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-04 14:01:28 +02:00
Guillaume Abrioux	4daed1f137	dashboard: set cookie_secure in grafana When using grafana behind https `cookie_secure` should be set to `true`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1966880 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-04 14:01:28 +02:00
Guillaume Abrioux	d6745e9cd9	fs2bs: use match filter in selectattr() `0990ae4109` changed the filter in selectattr() from 'match' to 'equalto' but due to an incompatibility with the Jinja2 version for python 2.7 on el7 we must stick to using 'match' filter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-26 08:14:38 +02:00
Guillaume Abrioux	0990ae4109	fs2bs: fix wrong filter when setting osd_ids using 'match' filter in that task will lead to bad behavior if I have the following node names for instance: - node1 - node11 - node111 with `selectattr('name', 'match', inventory_hostname)` it will match 'node1' along with 'node11' and 'node111'. using 'equalto' filter will make sure we only match the target node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1963066 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-25 16:59:30 +02:00
Guillaume Abrioux	664dae0564	prometheus: enforce osd nodes in templates When osd nodes are collocated in the clients group (HCI context for instance), the current logic will exclude osd nodes since they are present in the clients group. The best fix would be to exclude client nodes only when they are not members of another group but for now, as a workaround, we can enforce the addition of osd nodes to fix this specific case. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1947695 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-25 16:53:49 +02:00
Guillaume Abrioux	43b1c7bea9	vagrant_up: fix bash legacy syntax This commit rewrites the deprecated syntax used in vagrant_up.sh Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-25 10:57:00 +02:00
Guillaume Abrioux	9efca34ac3	tests: pull images from cloud.centos.org Temporarily work around a vagrant cloud issue: vagrant cloud seems broken at the time of pushing this commit. Let's pull images from cloud.centos.org for now since vagrant cloud hosted images return a 403 error. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-25 10:17:37 +02:00
Guillaume Abrioux	2c77d0094c	update: do not gather facts on each play There's no benefit to gather facts again on each play in rolling_update.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-22 08:33:44 +02:00
Guillaume Abrioux	e6d8b058ba	nfs: get org.ganesha.nfsd.conf from container Since we need to revert `33bfb10`, this is an alternative to initial approach. We can avoid maintaining this file since it is present in container image. The idea is to simply get it from the image container and write it to the host. Fixes: #6501 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-07 13:35:37 +02:00
Dimitri Savineau	a670982a38	ceph-rgw: fix pg_autoscale_mode for pool The pg_autoscale_mode for rgw pools introduced in `9f03a52` was wrong and was missing a `value` keyword because `rgw_create_pools` is a dict. Fixes: #6516 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-05-06 10:15:13 +02:00
Guillaume Abrioux	3db1ea7ec4	update: fix ceph-crash stop task This is a workaround for an issue in ansible. When trying to stop/mask/disable this service in one task, the stop didn't actually happen, the task doesn't fail but for some reason the container is still present and running. Then the task starting the service in the role ceph-crash fails because it can't start the container since it's already running with the same name. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955393 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-04 13:06:47 +02:00
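
The `fs2bs` commits `0990ae4109` and `d6745e9cd9` above turn on the difference between a prefix-style regex match and strict equality when selecting entries by host name. The following is a minimal Python sketch of that difference, assuming the `match` test behaves like `re.match` (anchored at the start of the string only). The function names and the `nodes` list are hypothetical and are not taken from the ceph-ansible code; the anchored variant is a general regex technique, not necessarily what the repository does.

```python
import re

# Hypothetical data mirroring the node names from the commit message.
nodes = [{"name": "node1"}, {"name": "node11"}, {"name": "node111"}]


def select_match(items, hostname):
    """Regex 'match' semantics: the pattern only has to match at the start."""
    return [i for i in items if re.match(hostname, i["name"])]


def select_equalto(items, hostname):
    """Strict equality: only the exact host name is selected."""
    return [i for i in items if i["name"] == hostname]


def select_anchored_match(items, hostname):
    """Regex match with an escaped, fully anchored pattern (assumed workaround)."""
    pattern = "^" + re.escape(hostname) + "$"
    return [i for i in items if re.match(pattern, i["name"])]


def names(items):
    return [i["name"] for i in items]


print(names(select_match(nodes, "node1")))           # ['node1', 'node11', 'node111']
print(names(select_equalto(nodes, "node1")))         # ['node1']
print(names(select_anchored_match(nodes, "node1")))  # ['node1']
```

Where only a regex-style test is available, anchoring the pattern on both ends is one way to approximate equality.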


5887 Commits (1c740c424a5c397489e9b5f67bc5c0e5167a3a81)