ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	47adc2bb08	osd: add pg autoscaler support This commit adds the pg autoscaler support. The structure for pool definition now has two additional attributes `pg_autoscale_mode` and `target_size_ratio`, e.g.: ``` test: name: "test" pg_num: "{{ osd_pool_default_pg_num }}" pgp_num: "{{ osd_pool_default_pg_num }}" rule_name: "replicated_rule" application: "rbd" type: 1 erasure_profile: "" expected_num_objects: "" size: "{{ osd_pool_default_size }}" min_size: "{{ osd_pool_default_min_size }}" pg_autoscale_mode: False target_size_ratio: 0.1 ``` when `pg_autoscale_mode` is `True` the user has to set a decent value in `target_size_ratio`. Given that it's a new feature, it's still disabled by default. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-04 09:29:01 -05:00
Guillaume Abrioux	bf1f125d71	osd: refact osd pool creation Currently, the command executed is wrong, eg: ``` cmd: - podman - exec - ceph-mon-controller-0 - ceph - --cluster - ceph - osd - pool - create - volumes - '32' - '32' - replicated_rule - '1' delta: '0:00:01.625525' end: '2020-02-27 16:41:05.232705' item: ``` From documentation, the osd pool creation command is : ``` ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \ [crush-rule-name] [expected-num-objects] ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \ [erasure-code-profile] [crush-rule-name] [expected_num_objects] ``` it means we pass '1' (from item.type) as value for `expected_num_objects` by default which is very likely not what we want. Also, this commit modifies the default value when no `rule_name` is set to use the existing variable `osd_pool_default_crush_rule` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-04 09:29:01 -05:00
Guillaume Abrioux	896d00b50e	tests: add lvm batch filestore testing This commit adds an OSD node in lvm-batch scenario in order to test filestore backend. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-03 13:50:19 -05:00
Guillaume Abrioux	0fc99bb6fa	tests: increase journal_size value Looks like we are still seeing issue [1]. Let's increase this value to unlock the CI (however, it still needs to be investigated). Typical error (see [1] for further details) : ``` [root@osd2 ~]# ceph-volume --cluster ceph lvm batch --filestore --yes --journal-size '2048' /dev/sda /dev/sdb --journal-devices /dev/sdc Running command: /sbin/vgcreate --force --yes ceph-journals-817ef90b-77ac-4f52-b8a9-30893849fb78 /dev/sdc stdout: Physical volume "/dev/sdc" successfully created. stdout: Volume group "ceph-journals-817ef90b-77ac-4f52-b8a9-30893849fb78" successfully created --> Refusing to continue with configured size for journal --> RuntimeError: journal sizes must be larger than 2GB, detected: 1024.00 MB ``` [1] https://tracker.ceph.com/issues/41374 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-03 13:23:57 -05:00
Guillaume Abrioux	50939369ca	library: fix bug in ceph_volume This commit fixes a regression introduced by `0326d992c2`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-03-03 13:23:57 -05:00
Dimitri Savineau	d1316ce77b	shrink-rbdmirror: fix presence after removal We should add retry/delay to check the presence of the rbdmirror daemon in the cluster status because the status takes some time to be updated. Also the metadata.hostname isn't a good key to check because it doesn't reflect the ansible_hostname fact. We should use metadata.id instead. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-03 10:32:15 +01:00
Dimitri Savineau	a664159061	shrink-mgr: fix systemd condition This playbook was using mds systemd condition. Also a command task was using pipeline which is not allowed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-03 10:32:15 +01:00
Dimitri Savineau	2f4413f5ce	tox: update shrink scenario configuration The shrink scenarios don't need the docker variables (except for OSD). Removing pytest for shrink-mgr. Adding environment variables for xxx_to_kill ansible variable. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-03 10:32:15 +01:00
Dimitri Savineau	08ac2e3034	shrink: don't use localhost node The ceph-facts are running on localhost so if this node is using a different OS/release than the ceph node we can have a mismatch between docker/podman container binary. This commit also reduces the scope of the ceph-facts role because we only need the container_binary tasks. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-03 10:32:15 +01:00
Dimitri Savineau	be8b315102	ceph-validate: add key format validation If the user manually provides the key value for a specific keyring then there's no validation of the content, which could lead to unexpected failures in the ceph_key module. Closes: #5104 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-03 10:01:58 +01:00
Dimitri Savineau	9d3b49293d	purge: stop rgw instances by iteration It looks like the service module doesn't support wildcards anymore for stopping/disabling multiple services. fatal: [rgw0]: FAILED! => changed=false msg: 'This module does not currently support using glob patterns, found '''' in service name: ceph-radosgw@' ...ignoring Instead we should iterate over the rgw_instances list. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-02 16:32:06 +01:00
Dimitri Savineau	90b1fc8fe9	ceph-infra: install firewalld python bindings When using the firewalld ansible module we need to be sure that the python bindings are installed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-02 16:32:06 +01:00
Dimitri Savineau	45fb9241c0	ceph-infra: split firewalld tasks Since ansible 2.9 the firewalld task can no longer be used with service and source at the same time. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-02 16:32:06 +01:00
Dimitri Savineau	aefba82a2e	Add ansible 2.9 support This commit adds ansible 2.9 support in addition to 2.8. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-03-02 16:32:06 +01:00
Guillaume Abrioux	0326d992c2	osd: add journal option in ceph_volume call (batch) This commit adds the journal option to the ceph_volume call when scenario is lvm batch Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-28 17:29:59 -05:00
Guillaume Abrioux	a2d2e70ac2	requirements: enforce ansible version requirement See https://github.com/advisories/GHSA-3m93-m4q6-mc6v Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-27 09:28:17 -05:00
Guillaume Abrioux	a084a2a347	common: support OSDs with more than 2 digits When running environment with OSDs having ID with more than 2 digits, some tasks don't match the system units and therefore, playbook can fail. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-27 09:48:36 +01:00
Guillaume Abrioux	1de2bf9991	shrink-osd: support shrinking ceph-disk prepared osds This commit adds the ceph-disk prepared osds support Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796453 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-26 11:45:41 -05:00
Guillaume Abrioux	55970b18f1	shrink-osd: don't run ceph-facts entirely We need to call ceph-facts only for setting `container_binary`. Since this task has been isolated we can use `tasks_from` to only execute the needed task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-26 11:45:41 -05:00
Dimitri Savineau	535da53d69	filestore-to-bluestore: reuse dedicated journal If the filestore configuration was using a dedicated journal with either a partition or a LV/VG then we need to reuse this for bluestore DB. When filestore is using raw devices then we shouldn't destroy everything (data + journal) but only data, otherwise the journal partition won't exist anymore. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790479 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-25 16:07:21 +01:00
Dimitri Savineau	195944b123	doc: update infra playbooks statements We don't need to copy the infrastructure playbooks in the root ceph-ansible directory. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-25 15:27:52 +01:00
Dimitri Savineau	44e750ee5d	ceph-rgw: increase connection timeout to 10 A 5s connection timeout could be too low in some setups. Let's increase it to 10s. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-24 16:01:36 +01:00
Francesco Pantano	15ed9eebf1	Configure ceph dashboard backend and dashboard_frontend_vip This change introduces a new set of tasks to configure the ceph dashboard backend and listen just on the mgr related subnet (and not on '*'). For the same reason the proper server address is added in both prometheus and alertmanager systemd units. This patch also adds the "dashboard_frontend_vip" parameter to make sure we're able to support the HA model when multiple grafana instances are deployed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792230 Signed-off-by: Francesco Pantano <fpantano@redhat.com>	2020-02-19 17:52:53 -05:00
Benoît Knecht	8b3df4e418	infrastructure-playbooks: Run shrink-osd tasks on monitor Instead of running shrink-osd tasks on localhost and delegating most of them to the first monitor, run all of them on the first monitor directly. This has the added advantage of becoming root on the monitor only, not on localhost. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2020-02-19 11:16:30 -05:00
Dimitri Savineau	ac0f68ccf0	ceph-dashboard: update create/get rgw user tasks Since [1] if a rgw user already exists then the radosgw-admin user create command will return an error instead of modifying the current user. We were already doing separated tasks for create and get operation but only for multisite configuration but it's not enough. Instead we should do the get task first and depending on the result execute the create. This commit also adds missing run_once and delegate_to statement. [1] https://github.com/ceph/ceph/commit/269e9b9 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-18 10:22:21 +01:00
Sam Choraria	2a2656a985	ceph-rgw: allow SSL certificate content to be supplied Allow SSL certificate & key contents to be written to the path specified by radosgw_frontend_ssl_certificate. This permits a certificate to be deployed & renewal of expired certificates through ceph-ansible. Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>	2020-02-17 16:22:11 +01:00
Dimitri Savineau	c644ea9041	ceph-defaults: remove bootstrap_dirs_xxx vars Both bootstrap_dirs_owner and bootstrap_dirs_group variables aren't used anymore in the code. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 16:17:40 +01:00
Ali Maredia	1834c1e48d	rgw: extend automatic rgw pool creation capability Add support for erasure code pools. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731148 Signed-off-by: Ali Maredia <amaredia@redhat.com> Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 16:07:43 +01:00
Florian Faltermeier	9d081e2453	ceph-rgw-loadbalancer: Fix SSL newline issue The `ad7a5da` commit introduced a regression when using TLS on haproxy via the haproxy_frontend_ssl_certificate variable. This causes the "stats socket" and the "tune.ssl.default-dh-param" parameters to be on the same line, resulting in haproxy failing to start. [ALERT] 351/140240 (21388) : parsing [xxxxx] : 'stats socket' : unknown keyword 'tune.ssl.default-dh-param'. Registered [ALERT] 351/140240 (21388) : Fatal errors found in configuration. Fixes: #4869 Signed-off-by: Florian Faltermeier <florian.faltermeier@uibk.ac.at>	2020-02-17 16:05:42 +01:00
Dimitri Savineau	16e12bf2bb	rgw: don't create user on secondary zones The rgw user creation for the Ceph dashboard integration shouldn't be created on secondary rgw zones. Closes: #4707 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794351 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 15:08:11 +01:00
Dimitri Savineau	100e3a044e	purge-cluster: update package list to remove We only support python3 so renaming all ceph python packages. Some ceph packages were missing from the list (ceph-mon, ceph-osd or rbd-mirror) or didn't exist anymore (ceph-fs-common, libcephfs1). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 11:33:15 +01:00
Dimitri Savineau	85d7102a95	Revert "vagrant: temp workaround for CentOS 8 cloud image" The CentOS 8 vagrant image download is now fixed. This reverts commit `a5385e1048`. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 11:30:39 +01:00
John Fulton	e4bf4857f5	The _filtered_clients list should intersect with ansible_play_batch Client configuration with --limit fails without this patch because certain tasks are only done to the first host in the _filtered_clients list and it's likely that first host will not be included in what's specified with --limit. To fix this the _filtered_clients list should be built from all clients in the inventory that are also in the running play. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798781 Signed-off-by: John Fulton <fulton@redhat.com>	2020-02-17 11:29:18 +01:00
Dimitri Savineau	779a4a6d71	tests: don't install s3cmd on containerized setup The s3cmd package should only be installed on non containerized deployment. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 11:27:52 +01:00
Dimitri Savineau	6dd9b25565	ceph-iscsi: don't use ceph_dev_xxx variables Using ceph_dev_branch and ceph_dev_sha1 for configuring ceph-iscsi repositories from shaman doesn't make sense because the ceph devel branches and sha1 aren't compatible with ceph-iscsi devel. Instead we could rely on the master branch and the latest sha1. Currently it's not possible to use a custom ceph branch/sha1 value with iscsi setup, otherwise the repository setup will fail. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:56:52 +01:00
Dimitri Savineau	10951eeea8	ceph-nfs: fix ceph_nfs_ceph_user variable The ceph_nfs_ceph_user variable is a string for the ceph-nfs role but a list in ceph-client role. `6a6785b` introduced a confusion between both variable type in the ceph-nfs role for external ceph with ganesha. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1801319 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:56:05 +01:00
Dimitri Savineau	0a3e85e8ca	ceph-nfs: add nfs-ganesha-rados-urls package Since nfs-ganesha 2.8.3 the rados-urls library has been moved to a dedicated package. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:52:30 +01:00
Dimitri Savineau	1fc6b33714	ceph-{mon,osd}: move default crush variables Since `ed36a11` we moved the crush rules creation code from the ceph-mon to the ceph-osd role. To keep backward compatibility we kept the possibility to set the crush variables on the mons side but we didn't move the default values. As a result, when crush_rule_config is set to true and the default values for crush_rules are wanted, the crush rule creation task will fail. "msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'crush_rules'" This patch moves the default crush variables from the ceph-mon to the ceph-osd role and also uses those default values when nothing is defined on the mons side. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:50:53 +01:00
Dimitri Savineau	15bd4cd189	ceph-grafana: fix grafana_{crt,key} condition The grafana_{crt,key} aren't boolean variables but strings. The default value is an empty string so we should do the conditional on the string length instead of the bool filter Closes: #5053 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:49:08 +01:00
Dimitri Savineau	b9d975385c	ceph-prometheus: add alertmanager HA config When using multiple alertmanager nodes (via the grafana-server group) then we need to specify the other peers in the configuration. https://prometheus.io/docs/alerting/alertmanager/#high-availability https://github.com/prometheus/alertmanager#high-availability Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792225 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-17 10:46:21 +01:00
Dimitri Savineau	5a03e0ee1c	containers: add KillMode=none to systemd templates Because we are relying on docker|podman for managing containers then we don't need systemd to manage the process (like kill). Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-13 16:11:33 +01:00
Dimitri Savineau	c6e96699f7	dashboard: allow configuring multiple grafana host When using multiple grafana hosts, we set the grafana and prometheus URL and push the dashboard layout to a single node. grafana_server_addrs is the list of all grafana nodes and used during the ceph-dashboard role (on mgr/mon nodes). grafana_server_addr is the current grafana node used during the ceph-grafana and ceph-prometheus role (on grafana-server nodes). We don't have the grafana_server_addr fact duplication code between external vs collocated nodes. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1784011 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-10 11:18:45 -05:00
Guillaume Abrioux	3700aa5385	switch_to_containers: increase health check values This commit increases the default values for the following variable consumed in switch-from-non-containerized-to-containerized-ceph-daemons.yml playbook. This also moves these variables in `ceph-defaults` role so the user can set different values if needed. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783223 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-07 14:59:14 -05:00
Dimitri Savineau	cba0c8c063	Revert "rhcs: update container image name" This wasn't necessary. The container image was fixed on Red Hat's registry. This reverts commit `3bd250c742`. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-05 15:22:56 +01:00
Dimitri Savineau	3bd250c742	rhcs: update container image name The RHCS 4 container image is rhceph/rhceph-4 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1797743 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-04 16:10:05 -05:00
Guillaume Abrioux	910fc61fdc	tests: remove legacy `osd_scenario` variable As of stable-4.0 most of these references aren't needed anymore. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2020-02-04 10:05:33 +01:00
Dimitri Savineau	298ba0bf03	ceph-facts: set devices osd_auto_discovery on OSDs We only need to set the devices fact with osd_auto_discovery on OSD nodes. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-03 16:23:38 +01:00
Dimitri Savineau	ed461544a7	ceph-facts: remove is_podman fact This was used before the CentOS 8 requirement when using CentOS 7 atomic which has both docker and podman installed. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2020-02-03 10:11:03 -05:00
wujie1993	d8b0b3cbd9	purge: fix purge cluster failed Fix the purge cluster failure when local container images do not exist. Purge node-exporter and grafana-server only when dashboard_enabled is set to True. Signed-off-by: wujie1993 <qq594jj@gmail.com>	2020-01-31 12:09:46 -05:00
Mike Christie	77f3b5d51b	iscsi: Fix crashes during rolling update During a rolling update we will run the ceph iscsigw tasks that start the daemons then run the configure_iscsi.yml tasks which can create iscsi objects like targets, disks, clients, etc. The problem is that once the daemons are started they will accept configuration requests, or may want to update the system themselves. Those operations can then conflict with the configure_iscsi.yml tasks that set up objects and we can end up in crashes due to the kernel being in an unsupported state. This could also happen during creation, but is less likely due to no objects being set up yet, so there are no watchers or users accessing the gws yet. The fix in this patch works for both update and initial setup. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1795806 Signed-off-by: Mike Christie <mchristi@redhat.com>	2020-01-31 11:15:36 -05:00
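
Commit bf1f125d71 above describes an argument-order bug: the value of `item.type` ('1') ended up in the position the documentation reserves for `expected_num_objects`. The following is a minimal sketch of assembling the arguments in the documented order, not the code from the commit. The function name `build_pool_create_cmd`, the `default_rule` parameter (standing in for `osd_pool_default_crush_rule`) and the numeric type ids (1 for replicated, 3 for erasure) are assumptions made for illustration.

```python
from typing import Dict, List

# Assumed numeric pool type ids; the pool definition in the log uses `type: 1`.
REPLICATED, ERASURE = 1, 3


def build_pool_create_cmd(pool: Dict, default_rule: str = "replicated_rule") -> List[str]:
    """Hypothetical helper: build argv for `ceph osd pool create` in documented order."""
    pg_num = str(pool["pg_num"])
    pgp_num = str(pool.get("pgp_num") or pg_num)
    cmd = ["ceph", "osd", "pool", "create", pool["name"], pg_num, pgp_num]
    # Fall back to the default rule when no rule_name is set.
    rule = pool.get("rule_name") or default_rule

    if pool.get("type", REPLICATED) == ERASURE:
        cmd.append("erasure")
        profile = pool.get("erasure_profile")
        if not profile:
            # Without a profile the later positional arguments cannot be placed safely.
            return cmd
        cmd += [profile, rule]
    else:
        # Pass the literal keyword, never the numeric type id.
        cmd += ["replicated", rule]

    expected = pool.get("expected_num_objects")
    if expected not in (None, ""):
        cmd.append(str(expected))
    return cmd


volumes = {"name": "volumes", "pg_num": 32, "pgp_num": 32,
           "rule_name": "replicated_rule", "type": 1, "expected_num_objects": ""}
print(" ".join(build_pool_create_cmd(volumes)))
```

With the `volumes` pool from the failing command, the numeric type no longer appears as a trailing argument, and `expected_num_objects` is only appended when it is actually set.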

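Commit e4bf4857f5 above fixes `--limit` runs by building `_filtered_clients` from the inventory clients that are also part of the running play. The intersection can be sketched in plain Python as follows. This is an illustration of the idea only, not the playbook code; `filter_clients` and the host names are hypothetical.

```python
from typing import Iterable, List


def filter_clients(inventory_clients: Iterable[str], play_batch: Iterable[str]) -> List[str]:
    """Keep inventory order, but drop clients that are not in the running play."""
    batch = set(play_batch)
    return [host for host in inventory_clients if host in batch]


# Hypothetical inventory and a play limited to two clients plus a monitor.
clients = ["client0", "client1", "client2"]
limited = filter_clients(clients, ["client1", "client2", "mon0"])

# Tasks that run only on the "first" client now pick a host that is really in the play.
first_client = limited[0] if limited else None
print(limited, first_client)
```

Preserving the inventory order keeps the choice of the first host deterministic, and an empty result signals that no client is in the current play.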
5180 Commits (e3ba664ca5c197356acda1c883862c8048e76169)