ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	e28c486e52	backup-and-restore: fix a typo Typo introduced during initial implementation. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-06-15 10:51:54 +02:00
Guillaume Abrioux	aa68b06c99	ansible: bump to ansible 2.12 Add required changes to support ansible 2.12 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-06-15 08:09:10 +02:00
Guillaume Abrioux	41d62596fc	cephadm_adopt: set autotune_memory_target_ratio This adds a task that sets `autotune_memory_target_ratio` depending on the value of `is_hci`. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2028693 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-05-30 14:56:42 +02:00
Francesco Pantano	0e9b3902b0	Add ceph_infra tag to rolling_update When the upgrade from Ceph 4 to 5 is performed in the OpenStack context, ceph-ansible triggers the rolling_update playbook, which is supposed to roll out new Ceph containers. The ceph-infra role tries to take care of the firewall, ntp config and logrotate; however, TripleO manages them through tripleo-heat-templates. This patch just adds an additional tag to skip the ceph-infra role in the OpenStack context. Closes: https://bugzilla.redhat.com/2090456 Signed-off-by: Francesco Pantano <fpantano@redhat.com>	2022-05-27 15:05:16 +02:00
Guillaume Abrioux	5ab46f836d	purge: reset-failed ceph-crash This ensures we always reset-failed the ceph-crash service. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2055992 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-05-23 10:37:42 +02:00
Guillaume Abrioux	c1649862a9	common: move to `ansible.utils.ipwrap` ipwrap has moved to ansible.utils see `db4920ebf6` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-05-12 22:51:31 +02:00
Guillaume Abrioux	6e2ebe857d	cephadm-adopt: remove legacy directory after adoption When this directory is left after the osd adoption, it leads to the following error: ``` [WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices host axdesec2ocs1n002.ecommerce.inditex.grp `cephadm ceph-volume` failed: cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/41555360-e96b-4b16-a37c-873e0c940091/mon.axdesec2ocs1n002/config ERROR: [Errno 2] No such file or directory: '/var/lib/ceph/41555360-e96b-4b16-a37c-873e0c940091/mon.axdesec2ocs1n002/config'. ``` this is because of an unexpected behavior regarding 'config inferring' when a legacy directory is present in /var/lib/ceph. Note: this doesn't fix the root cause, this is a workaround. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2075510 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-05-12 09:58:14 +02:00
Guillaume Abrioux	ed0bba4d77	contrib: add a playbook this playbook can backup or restore some ceph files. (/etc/ceph, /var/lib/ceph, ...) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-04-28 22:57:27 +02:00
Teoman ONAY	f851d3232c	Using another user than root for cephadm ssh connections fails Fixes commit `da42f3d139` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2048734 Signed-off-by: Teoman ONAY <tonay@redhat.com>	2022-03-20 12:51:16 +01:00
Guillaume Abrioux	51bc8cb636	upgrade: block upgrade when rgw multisite is active With this commit, upgrading a cluster from Nautilus to Pacific with active rgw multisite replication will be blocked. This is because a lot of bugs are currently present in Pacific regarding RGW multisite. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063702 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-03-20 08:13:38 +01:00
Guillaume Abrioux	266b6e739c	adopt: fix node labelling When using group of group, the playbook will apply undesired labels on nodes. This commit fixes it by applying only the expected labels. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2057528 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-03-03 15:52:00 +01:00
Teoman ONAY	f8c6bba657	Add cluster custom name support When using cluster custom names, cephadm commands are executed using the default admin keyring name which fails. Signed-off-by: Teoman ONAY <tonay@redhat.com>	2022-03-03 15:52:00 +01:00
Teoman ONAY	da42f3d139	Enable user to change the account used for ssh connection By default cephadm uses root account to connect remotely to other nodes in the cluster. This change allows to choose another account. This commit also allows to use a dedicated subnet for cephadm mgmt. Signed-off-by: Teoman ONAY <tonay@redhat.com>	2022-03-03 15:52:00 +01:00
Guillaume Abrioux	2f11982590	purge: ceph-crash purge fixes This fixes the service file removal and makes the playbook call `systemctl reset-failed` on the service because in Ceph Nautilus, ceph-crash doesn't handle `SIGTERM` signal. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2055992 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-03-02 10:08:35 +01:00
Guillaume Abrioux	f08129edf2	switch2containers: fail if less than 3 monitors This playbook doesn't support less than 3 monitors present in the inventory. Just like the rolling_update playbook, let's fail if less than 3 monitors are present. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2049132 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-02-21 21:07:27 +01:00
Guillaume Abrioux	94e51d5c14	adopt: fix rbd-mirror adoption We can't use `{{ cephadm_cmd }}` here because the monitors aren't yet adopted. We must use `{{ ceph_cmd }}` instead. This also fixes some filters `| default()` (they must be moved before `| from_json()`) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967440 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-02-09 20:58:27 +01:00
Guillaume Abrioux	f30767432b	adopt: fix bug in mon_ip_list set_fact `default('{}')` must be before `| from_json` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-02-09 11:32:00 +01:00
Guillaume Abrioux	ddae06e1a2	adopt: check for POOL_APP_NOT_ENABLED warning This commit makes the cephadm-adopt playbook fail if the cluster has the `POOL_APP_NOT_ENABLED` warning raised. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2040243 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-02-09 11:32:00 +01:00
jowsiewski	1dfd195c7e	Remove the remaining packages Signed-off-by: jowsiewski <owsiewski@gmail.com>	2022-02-04 10:00:44 +01:00
Francesco Pantano	12dd8b5df1	Add with_pkg tag on package related tasks In the OpenStack context we let the integration tool (TripleO) deal with repositories and packages. This change just adds the with_pkg tag to allow TripleO skipping both the repositories and packages installation. Signed-off-by: Francesco Pantano <fpantano@redhat.com>	2022-02-01 16:04:10 +01:00
Guillaume Abrioux	7f517cdd22	adopt: create nfs exports at the user level The current implementation is wrong. ceph-ansible lists all existing buckets and tries to create an export for each of them. Instead, it's easier to create the export at the user level. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2037691 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-01-28 15:16:30 +01:00
Dmitriy Rabotyagov	2eb0a88a67	Use upstream config_template collection In order to reduce need of module internal maintenance and to join forces on plugin development, it's proposed to switch to using upstream version of config_template module. As it's shipped as collection, it's installation for end-users is trivial and aligns with general approach of shipping extra modules. Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@ya.ru>	2022-01-18 20:22:10 +01:00
Guillaume Abrioux	aee1f06497	cephadm-adopt: use named args in rgw export creation In order to avoid breaking changes, let's use named argument instead of positional argument syntax in the command line used to create rgw export. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2037691 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2022-01-06 15:56:07 +01:00
Guillaume Abrioux	817c03bc0e	update: speed up client play wip Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-12-15 08:42:23 +01:00
Guillaume Abrioux	7ece59b41d	cephadm-adopt: ensure /etc/ceph is present on monitoring node When deploying the monitoring stack on a dedicated node, the directory `/etc/ceph` has never been created. Therefore, the play for adopting the monitoring stack fails because it can't write the minimal config file. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2029697 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-12-07 22:06:46 +01:00
Guillaume Abrioux	20035852a4	purge: remove ceph directories on client nodes Otherwise any ceph directories are left over on client nodes after the purge. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2024815 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-12-02 08:53:24 +01:00
Guillaume Abrioux	c4fdf956bd	cephadm-adopt: bindmount /var/lib/ceph with 'ro' When collocating osds with iscsigw daemons, cephadm bindmounts the following: ``` -v /var/lib/ceph/6126c064-6a9e-4092-8a64-977930df0843/iscsi.rbd.ceph-ameenasuhani-4fs3bq-node5.vomtqb/configfs:/sys/kernel/config ``` this prevents cephadm-adopt playbook from running container and bindmounting `/var/lib/ceph:/var/lib/ceph:z` since 'ro' is enough in this playbook, let's replace the ':z' option on this bindmount with ':ro' Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2027411 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-11-30 18:39:31 +01:00
Guillaume Abrioux	e5ea2ece99	adopt: fix ceph_origin and ceph_repository defaults This is overriding those variables because the precedence at the 'block var' level is greater than the group_vars/host_vars. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2026861 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-11-29 13:40:00 +01:00
Dimitri Savineau	c41241244e	cephadm-adopt: remove logrotate configuration cephadm uses its own logrotate configuration file so ceph-ansible needs to remove that custom file during the cephadm-adopt playbook. Closes: #6944 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-11-03 08:49:24 +01:00
Guillaume Abrioux	e5edcc4214	update: move a set_fact ceph-facts roles makes decisions based on the fact `rolling_update` so it must be called before we run this role. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-11-03 08:48:09 +01:00
Guillaume Abrioux	82eee4303b	update: support --limit on monitor nodes Change needed in order to support --limit on mon nodes. Otherwise, a call to `hostvars[groups[mon_group_name][0]]['_current_monitor_address']` throws an error: ``` "The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_current_monitor_address'" ``` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304#c28 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-28 21:47:01 +02:00
Guillaume Abrioux	4f2c2af9b4	cephadm: support adding hosts with ipv6 The current implementation doesn't support adding hosts when using ipv6 addresses. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-28 16:37:22 +02:00
Guillaume Abrioux	2f34531304	cephadm: use public_network when adding hosts When adding host, using ansible_facts['default_ipv4']['address'] might not be the desired network, we shouldn't enforce the subnet with the default route. Let's use the public_network instead. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2006415 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-28 16:37:22 +02:00
Guillaume Abrioux	9aa9b4dda2	Revert "cephadm: use public_network when adding host" This reverts commit `7a12b854c4`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-28 16:37:22 +02:00
Guillaume Abrioux	7a12b854c4	cephadm: use public_network when adding host When adding host, using `ansible_facts['default_ipv4']['address']` might not be the desired network, we shouldn't enforce the subnet with the default route. Let's use the public_network instead. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2006415 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-26 21:09:06 +02:00
Guillaume Abrioux	9c794aa9bc	adopt: fix rbd mirror adoption The rbd mirroring is broken because cephadm doesn't bindmount /etc/ceph anymore. It means the keyrings and ceph config file aren't available after the migration. The idea here is to remove the current rbd mirror peer and add it back to the mon config store so we aren't bound to the /etc/ceph directory. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967440 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-25 15:45:17 +02:00
Guillaume Abrioux	4257410dcd	adopt: use mgr/nfs volume use the mgr 'nfs' module to recreate nfs exports. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1954971 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-25 15:44:53 +02:00
Guillaume Abrioux	50a21d695e	rolling_update: modify default health_osd_check_* let's do more retries with a shorter delay. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-25 15:44:17 +02:00
Guillaume Abrioux	fc9f87c45f	rolling_update: fix pre and post osd upgrade play when using --limit osds, the plays before and after the osd upgrade are skipped because we use `hosts: "{{ mon_group_name | default('mons') }}[0]"`. Using `hosts: "{{ osds_group_name | default('osds') }}"` with `delegate_to` to the first monitor addresses this issue. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-25 14:23:00 +02:00
Guillaume Abrioux	e5cf9db2b0	update: support upgrading a subset of nodes It can be useful in a large cluster deployment to split the upgrade and only upgrade a group of nodes at a time. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-21 20:51:14 +02:00
Per Abildgaard Toft	84118a3063	shrink-osd: fix regression because of a wrong regex `968891f449` introduced a regression. The regex is wrong because it doesn't allow to shrink osds with id greater than 9 Fixes: #6950 Signed-off-by: Per Abildgaard Toft <per@minfejl.dk>	2021-10-21 10:01:23 +02:00
Seena Fallah	ae6be71b08	cephadm: set ssh configs at bootstrap step Add support ssh_user and ssh_config to cephadm bootstrap plugin Signed-off-by: Seena Fallah <seenafallah@gmail.com>	2021-10-15 14:40:51 +02:00
Guillaume Abrioux	968891f449	shrink-osd: check osd id format This adds a check early in order to ensure the format of osd ids passed is correct. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2005734 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-12 18:26:18 +02:00
Seena Fallah	5822936252	cephadm: install cephadm from repository Signed-off-by: Seena Fallah <seenafallah@gmail.com>	2021-10-08 16:56:47 +02:00
Seena Fallah	339212a7c6	cephadm-adopt: configure repository for cephadm installation Configure repository for cephadm installation and use package install in both containerized and non containerized deployment Signed-off-by: Seena Fallah <seenafallah@gmail.com>	2021-10-08 16:56:47 +02:00
Seena Fallah	0b78faa723	cephadm: use cephadm_ssh_user for ssh user Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts Signed-off-by: Seena Fallah <seenafallah@gmail.com>	2021-10-01 21:08:13 +02:00
Francesco Pantano	b7299f258b	Add ceph_nfs_adopt tag to the cephadm-adopt playbook There are existing OpenStack scenarios where nfs is still not managed by cephadm. For this reason it is sometimes useful to skip the nfs part of the adoption playbook and leave this daemon unmanaged. The purpose of this patch is to provide a tag that lets OpenStack operators skip this playbook section. Closes: https://bugzilla.redhat.com/2009212 Signed-off-by: Francesco Pantano <fpantano@redhat.com>	2021-10-01 21:03:02 +02:00
Guillaume Abrioux	b555f1d1cd	cephadm: add admin label on mon nodes This is needed if you want a copy of the admin keyring on the admin nodes. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-10-01 17:17:01 +02:00
Guillaume Abrioux	0a3b916ee7	cephadm-adopt: add no_log: true Let's add a `no_log: true` on the `cephadm registry-login` task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-09-28 08:11:03 +02:00
Guillaume Abrioux	d12efa1ab4	adopt: stop iscsi services in the first place If old containers are still running, it can make tcmu-runner process unable to open devices and there's nothing else to do than restarting the container. Also, as per discussion with iscsi experts, iscsi should be migrated before OSDs. (the client should be closed before the server) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000412 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-09-27 19:46:37 +02:00
Daniel Pivonka	1c50dc29cf	cephadm-adopt: set cephadm registry login info registry login info needs to be stored in cluster for cephadm and future hosts Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000103 Signed-off-by: Daniel Pivonka <dpivonka@redhat.com>	2021-09-13 11:14:22 +02:00
Seena Fallah	ff39c8d70b	purge: add remove_docker tag This can help to skip docker removal tasks Signed-off-by: Seena Fallah <seenafallah@gmail.com>	2021-09-09 13:25:45 +02:00
Seena Fallah	a51ce767ca	purge: add container_binary needed for zap osds `container_binary` isn't set anymore in the purge osd play because of a regression introduced by `60aa70a`. The CI didn't catch it because the play purging node-exporter sets this variable for all nodes before we run the purge osd play. This commit fixes this regression. Signed-off-by: Seena Fallah <seenafallah@gmail.com>	2021-09-09 11:12:02 +02:00
Dimitri Savineau	cddc23f511	purge-dashboard: remove cid files This adds the service cid file cleanup as supported in the classic purge playbook since `b9dd253` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-09-08 15:40:46 +02:00
Dimitri Savineau	2630f8d47a	cephadm-adopt: fix orch host add with FQDN When a node is configured with FQDN as the hostname value then the `ceph orch host add` command will fail because the `ansible_hostname` used by that command contains the short hostname which won't match the current hostname (FQDN) Instead we can use the ansible_nodename fact. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1997083 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-26 15:50:32 -04:00
Dimitri Savineau	8ba6101bbb	cephadm-adopt: remove ceph-nfs.target This systemd target doesn't exist at all. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-18 20:08:22 +02:00
Guillaume Abrioux	09ef465f62	containers: introduce target systemd unit This adds ceph-*.target systemd unit files support for containerized deployments. This also fixes a regression introduced by PR #6719 (rgw and nfs systemd units not getting purged) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-08-18 11:08:50 -04:00
Seena Fallah	67389d08d4	cephadm-adopt: use cephadm_ssh_user for ssh user Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts Signed-off-by: Seena Fallah <seenafallah@gmail.com>	2021-08-18 09:10:56 +02:00
Guillaume Abrioux	c14e9114ba	update: gather facts only one time this play doesn't need to gather facts from localhost Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-08-17 14:41:17 -04:00
VasishtaShastry	478d9fdcb6	Fixes typo in rgw-add-users-buckets playbook Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>	2021-08-09 15:35:55 +02:00
Guillaume Abrioux	930fc4c850	adopt: import rgw ssl certificate into kv store Without this, when rgw is managed by cephadm, it fails to start because the ssl certificate isn't present in the kv store. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987010 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1988404 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-05 13:02:25 -04:00
Dimitri Savineau	7c38e64681	cephadm-adopt: remove nfs pool and namespace This has been removed from the code (orch apply name). The default pool name is now .nfs Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-05 16:59:54 +02:00
Dimitri Savineau	386661699b	infra: use dedicated variables for balancer status The balancer status is registered during the cephadm-adopt, rolling_update and switch2container playbooks. But it is also used in the ceph-handler role which is included in those playbooks too. Even if the ceph-handler tasks are skipped for rolling_update and switch2container, the balancer_status variable is erased with the skip task result. play1: register: balancer_status play2: register: balancer_status <-- skipped play3: when: (balancer_status.stdout | from_json)['active'] | bool This leads to issues like: The conditional check '(balancer_status.stdout | from_json)['active'] | bool' failed. The error was: Unexpected templating type error occurred on ({% if (balancer_status.stdout | from_json)['active'] | bool %} True {% else %} False {% endif %}): expected string or buffer. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1982054 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-04 17:39:54 +02:00
Dimitri Savineau	06471a4b82	osds: use osd pool ls instead of osd dump command The ceph osd pool ls detail command is a subset of the ceph osd dump command. $ ceph osd dump --format json|wc -c 10117 $ ceph osd pool ls detail --format json|wc -c 4740 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-02 15:51:01 +02:00
Dimitri Savineau	e87a47cf0c	rolling_update: get ceph version when mons exist `eec3878` introduced a regression for upgrade scenarios where there's no monitor nodes at all (like ganesha standalone, external clients, etc..) TASK [get the ceph release being deployed] ********************************** task path: infrastructure-playbooks/rolling_update.yml:121 Thursday 29 July 2021 15:55:29 +0000 (0:00:00.484) 0:00:15.802 ******* fatal: [client0]: FAILED! => msg: '''dict object'' has no attribute ''mons''' Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-02 15:47:56 +02:00
Benoît Knecht	d7653dca95	infrastructure-playbooks: Get Ceph info in check mode In the `set osd flags` block, run the Ceph commands that gather information from the cluster (and don't make any changes to it) even when running in check mode. This allows the tasks that depend on the variables set by those tasks to succeed in check mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2021-07-28 14:04:54 +02:00
Guillaume Abrioux	eec38784ec	update: check the ceph release Check early which Ceph release is going to be deployed and fail if it doesn't correspond to the ceph-ansible version being used. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-26 18:11:22 +02:00
Guillaume Abrioux	4144074a50	purge: support osd_auto_discovery This adds a task that zaps by osd id so we can support the scenario where osds were deployed with `osd_auto_discovery` is true. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876860 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-22 10:49:44 -04:00
Guillaume Abrioux	17cd83bf3a	purge: merge playbooks This refactor merges the two playbooks so we only have to maintain 1 playbook. (Symlink the old purge-container-cluster.yml playbook for backward compatibility). Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-22 10:49:44 -04:00
Guillaume Abrioux	6b50401d0c	purge: drop variables from 'hosts' sections Those variables are useless given this is not possible to override them. Let's replace them with the hardcoded name instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-22 10:49:44 -04:00
Dimitri Savineau	738fa9428a	common: remove unnecessary run_once statements `1303611` introduced tasks for disabling the pg_autoscaler on pools and the balancer but those tasks are already executed on the first monitor node so we don't need to add the run_once statement. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-21 09:55:21 -04:00
Dimitri Savineau	cf6e33346e	common: fix py2 pool_list from_json when skipped When using python 2 and the task with a loop is skipped then it generates an error. Unexpected templating type error occurred on ({{ (pool_list.stdout | from_json)['pools'] }}): expected string or buffer Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-21 08:17:58 +02:00
Guillaume Abrioux	13036115e2	common: disable/enable pg_autoscaler The PG autoscaler can disrupt the PG checks so the idea here is to disable it and re-enable it back after the restart is done. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-20 07:37:07 +02:00
Guillaume Abrioux	60aa70a128	purge: reindent playbook This commit reindents the playbook. Also improve readability by adding an extra line between plays. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-13 09:47:30 -04:00
Dimitri Savineau	a305296384	cephadm-adopt: enable osd memory autotune for HCI This enables the osd_memory_target_autotune option on HCI environment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1973149 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-12 18:17:37 +02:00
Dimitri Savineau	97148dd58c	rolling_update: check quorum state before upgrade If one of the monitors is out of the quorum then nothing prevents the upgrade playbook from running. We only check if we have at least three monitor nodes but we should also check if those monitor nodes are correctly present in the quorum. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1952571 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-12 18:16:22 +02:00
Guillaume Abrioux	c396122ad9	update: fail the playbook if straw2 conversion failed It's better to fail the playbook so the user is aware the straw2 migration has failed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-09 11:44:06 -04:00
Guillaume Abrioux	4eb4268dee	update: followup on pr #6689 add missing 'osd' command. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-09 10:01:45 +02:00
Guillaume Abrioux	eee576477c	update: convert straw bucket After an upgrade, the presence of straw buckets will produce the following warning (HEALTH_WARN): ``` crush map has legacy tunables (require firefly, min is hammer) ``` because straw bucket is a firefly feature it needs to be converted to straw2. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967964 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-09 08:28:46 +02:00
Dimitri Savineau	aeb9f562e5	cephadm-adopt: set application on ganesha pool Set the nfs application to the ganesha pool. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1956840 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-08 20:35:58 +02:00
Dimitri Savineau	8e4ef7d6da	infra: add playbook to purge dashboard/monitoring The dashboard/monitoring stack can be deployed via the dashboard_enabled variable. But there's nothing similar if we want to remove that part only and keep the ceph cluster up and running. The current purge playbooks remove everything. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-06 09:02:37 +02:00
Guillaume Abrioux	3b804a61dd	cephadm_adopt: add any_errors_fatal on play Add any_errors_fatal: true in cephadm-adopt playbook. We should stop the playbook execution when a task throws an error. Otherwise it can lead to unexpected behavior. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1976179 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-02 22:15:07 +02:00
Guillaume Abrioux	037d8cd05e	purge: add monitoring group in final cleanup play This adds the monitoring group in the "final cleanup play" so any cid files generated are well removed when purging the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1974536 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-02 13:37:15 -04:00
Dimitri Savineau	a05730b38a	rhcs: remove ISO install method Starting RHCS 5, there's no ISO available anymore. This removes all ISO variables and the ceph_repository_type variable. Closes: #6626 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-06-30 18:03:03 +02:00
Guillaume Abrioux	26a7256c4c	shrink-mgr: modify existing mgr check Do not rely on the inventory aliases in order to check if the selected manager to be removed is present. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967897 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-29 14:53:19 +02:00
Guillaume Abrioux	31311b03ed	cephadm-adopt/rgw: add host target in svc_id If multi-realms were deployed with several instances belonging to the same realm and zone using the same port on different nodes, the service id expected by cephadm will be the same and therefore only one service will be deployed. We need to create a service called `<node>.<realm>.<zone>.<port>` to be sure the service name will be unique and well deployed on the expected node in order to preserve backward compatibility with the rgws instances that were deployed with ceph-ansible. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-29 14:41:09 +02:00
Dimitri Savineau	fc160b3be1	switch2container: run ceph-validate role This adds the ceph-validate role before starting the switch to a containerized deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1968177 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-06-28 18:06:53 +02:00
Guillaume Abrioux	fc784fc44c	cephadm-adopt: support rgw multisite adoption We need to support rgw multisite deployments. This commit makes the adoption playbook support this kind of deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-23 22:01:59 +02:00
Guillaume Abrioux	f9a73149a4	cephadm-adopt: fix mgr placement hosts task When no `[mgrs]` group is defined in the inventory, mgr daemon are implicitly collocated with monitors. This task currently relies on the length of the mgr group in order to tell cephadm to deploy mgr daemons. If there's no `[mgrs]` group defined in the inventory, it will ask cephadm to deploy 0 mgr daemon which doesn't make sense and will throw an error. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970313 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-14 10:38:37 +02:00
Guillaume Abrioux	d6745e9cd9	fs2bs: use match filter in selectattr() `0990ae4109` changed the filter in selectattr() from 'match' to 'equalto' but due to an incompatibility with the Jinja2 version for python 2.7 on el7 we must stick to using 'match' filter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-26 08:14:38 +02:00
Guillaume Abrioux	0990ae4109	fs2bs: fix wrong filter when setting osd_ids using 'match' filter in that task will lead to bad behavior if I have the following node names for instance: - node1 - node11 - node111 with `selectattr('name', 'match', inventory_hostname)` it will match 'node1' along with 'node11' and 'node111'. using 'equalto' filter will make sure we only match the target node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1963066 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-25 16:59:30 +02:00
Guillaume Abrioux	2c77d0094c	update: do not gather facts on each play There's no benefit to gather facts again on each play in rolling_update.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-22 08:33:44 +02:00
Guillaume Abrioux	3db1ea7ec4	update: fix ceph-crash stop task This is a workaround for an issue in ansible. When trying to stop/mask/disable this service in one task, the stop didn't actually happen, the task doesn't fail but for some reason the container is still present and running. Then the task starting the service in the role ceph-crash fails because it can't start the container since it's already running with the same name. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955393 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-05-04 13:06:47 +02:00
Guillaume Abrioux	22c18e82f0	cephadm_adopt: fix ceph-crash migration ceph-ansible leaves a ceph-crash container in containerized deployment. It means we end up with 2 ceph-crash containers running after the migration playbook is complete. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1954614 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-28 19:53:01 +02:00
Guillaume Abrioux	1f40c12502	cephadm_adopt: fix rgw placement task Due to a recent breaking change in ceph, this command must be modified to add the <svc_id> parameter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-27 13:37:56 +02:00
Guillaume Abrioux	bb7d37fb6a	cephadm_adopt: create a 'nfs-ganesha' pool When migrating from a cluster with no MDS nodes deployed, `{{ cephfs_data_pool.name }}` doesn't exist so we need to create a pool for storing nfs export objects. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1950403 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-27 13:37:56 +02:00
Guillaume Abrioux	ddbc11c4a9	switch-to-containers: only chown corresponding files When collocating daemons, if we chown all files under `/var/lib/ceph` it can cause issues for the collocated daemons that wouldn't have been migrated yet. This commit makes the playbook chown only the files corresponding to the daemon being migrated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-14 21:32:20 +02:00
Guillaume Abrioux	3d4267051f	fs2bs: add a final play This removes the fact `skipped_nodes` which is useless when we run with `--limit` since it gets reset when a new iteration is made. Instead, let's print within a final play which node has been skipped reusing the `skip_this_node` fact. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-14 14:56:02 +02:00
Guillaume Abrioux	a9220654f5	cephadm_adopt: support nfs-ganesha adoption This commit adds the nfs-ganesha adoption support in the `cephadm-adopt.yml` playbook. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1944504 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-12 14:43:19 +02:00
Guillaume Abrioux	1ffc4df6b6	cephadm_adopt: modify placement policy for rgw the adoption playbook should use `radosgw_num_instances` in order to determine how many rgw instances it should recreate. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1943170 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-04-12 14:43:19 +02:00


842 Commits (0b7f8fa6e3420b9ff0f7828906ce8c7dc6e831e6)
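
The two `fs2bs` entries in the log (`0990ae4109` and `d6745e9cd9`) hinge on the difference between a start-anchored regex match and strict equality when selecting a node by name. The Python sketch below illustrates that difference. It assumes the `match` test behaves like Python's `re.match` (anchored at the start of the string only). The helper names and the node list are hypothetical and not part of ceph-ansible, and the fully anchored pattern is only one possible way to get exact selection while still using a regex; the log does not say how the playbook handles it.

```python
import re

# Hypothetical node list, reusing the names from the commit message example.
NODES = [{"name": "node1"}, {"name": "node11"}, {"name": "node111"}]


def select_by_match(nodes, hostname):
    """Start-anchored regex selection: 'node1' also matches 'node11'."""
    return [n["name"] for n in nodes if re.match(hostname, n["name"])]


def select_by_equalto(nodes, hostname):
    """Strict equality: only the exact node name is selected."""
    return [n["name"] for n in nodes if n["name"] == hostname]


def select_by_anchored_match(nodes, hostname):
    """Regex selection made exact by escaping and anchoring the pattern
    (an illustrative assumption, not taken from the playbook)."""
    pattern = "^" + re.escape(hostname) + "$"
    return [n["name"] for n in nodes if re.match(pattern, n["name"])]


if __name__ == "__main__":
    print(select_by_match(NODES, "node1"))
    print(select_by_equalto(NODES, "node1"))
    print(select_by_anchored_match(NODES, "node1"))
```

With the unanchored pattern, `node1` selects all three nodes, which is the over-matching described in `0990ae4109`; the other two helpers select only `node1`.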