ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	d196881ebb	cephadm-adopt: add no_log: true Let's add a `no_log: true` on the `cephadm registry-login` task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0a3b916ee7`)	2021-09-28 21:15:02 +02:00
Guillaume Abrioux	a053adbe84	adopt: stop iscsi services in the first place If old containers are still running, it can make tcmu-runner process unable to open devices and there's nothing else to do than restarting the container. Also, as per discussion with iscsi experts, iscsi should be migrated before OSDs. (the client should be closed before the server) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000412 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d12efa1ab4`)	2021-09-28 18:46:49 +02:00
Seena Fallah	cb5a675e49	cephadm-adopt: use cephadm_ssh_user for ssh user Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `67389d08d4`)	2021-09-13 16:26:24 +02:00
Daniel Pivonka	969e41fa2e	cephadm-adopt: set cephadm registry login info registry login info needs to be stored in cluster for cephadm and future hosts Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000103 Signed-off-by: Daniel Pivonka <dpivonka@redhat.com> (cherry picked from commit `1c50dc29cf`)	2021-09-13 16:18:40 +02:00
Seena Fallah	432ab37c6b	purge: add remove_docker tag This can help to skip docker removal tasks Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `ff39c8d70b`)	2021-09-09 16:41:32 +02:00
Seena Fallah	0897c08518	purge: add container_binary needed for zap osds `container_binary` isn't set anymore in the purge osd play because of a regression introduced by `60aa70a`. The CI didn't catch it because the play purging node-exporter sets this variable for all nodes before we run the purge osd play. This commit fixes this regression. Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `a51ce767ca`)	2021-09-09 14:40:30 +02:00
Dimitri Savineau	ac6604ab61	purge-dashboard: remove cid files This adds the service cid file cleanup as supported in the classic purge playbook since `b9dd253` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cddc23f511`)	2021-09-08 12:05:22 -04:00
Dimitri Savineau	ac5353a2d8	cephadm-adopt: fix orch host add with FQDN When a node is configured with FQDN as the hostname value then the `ceph orch host add` command will fail because the `ansible_hostname` used by that command contains the short hostname which won't match the current hostname (FQDN) Instead we can use the ansible_nodename fact. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1997083 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `2630f8d47a`)	2021-08-26 17:10:55 -04:00
Dimitri Savineau	e3e849378e	cephadm-adopt: remove ceph-nfs.target This systemd target doesn't exist at all. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `8ba6101bbb`)	2021-08-18 15:29:03 -04:00
Guillaume Abrioux	d7311aeefc	containers: introduce target systemd unit This adds ceph-*.target systemd unit files support for containerized deployments. This also fixes a regression introduced by PR #6719 (rgw and nfs systemd units not getting purged) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `09ef465f62`)	2021-08-18 13:42:50 -04:00
Guillaume Abrioux	056b18aa0e	update: gather facts only one time this play doesn't need to gather facts from localhost Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c14e9114ba`)	2021-08-17 15:31:34 -04:00
VasishtaShastry	6ed0919796	Fixes typo in rgw-add-users-buckets playbook Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com> (cherry picked from commit `478d9fdcb6`)	2021-08-09 14:31:42 -04:00
Guillaume Abrioux	6e9cf80747	adopt: import rgw ssl certificate into kv store Without this, when rgw is managed by cephadm, it fails to start because the ssl certificate isn't present in the kv store. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987010 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1988404 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `930fc4c850`)	2021-08-05 14:47:47 -04:00
Dimitri Savineau	2377da8f9b	infra: use dedicated variables for balancer status The balancer status is registered during the cephadm-adopt, rolling_update and switch2container playbooks. But it is also used in the ceph-handler role which is included in those playbooks too. Even if the ceph-handler tasks are skipped for rolling_update and switch2container, the balancer_status variable is erased with the skip task result. play1: register: balancer_status play2: register: balancer_status <-- skipped play3: when: (balancer_status.stdout | from_json)['active'] | bool This leads to issues like: The conditional check '(balancer_status.stdout | from_json)['active'] | bool' failed. The error was: Unexpected templating type error occurred on ({% if (balancer_status.stdout | from_json)['active'] | bool %} True {% else %} False {% endif %}): expected string or buffer. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1982054 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `386661699b`)	2021-08-04 11:43:47 -04:00
Dimitri Savineau	31cc8bd2aa	osds: use osd pool ls instead of osd dump command The ceph osd pool ls detail command is a subset of the ceph osd dump command. $ ceph osd dump --format json|wc -c 10117 $ ceph osd pool ls detail --format json|wc -c 4740 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `06471a4b82`)	2021-08-03 13:57:14 -04:00
Dimitri Savineau	7f5b986e01	rolling_update: get ceph version when mons exist `eec3878` introduced a regression for upgrade scenarios where there's no monitor nodes at all (like ganesha standalone, external clients, etc..) TASK [get the ceph release being deployed] ********************************** task path: infrastructure-playbooks/rolling_update.yml:121 Thursday 29 July 2021 15:55:29 +0000 (0:00:00.484) 0:00:15.802 ******* fatal: [client0]: FAILED! => msg: '''dict object'' has no attribute ''mons''' Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `e87a47cf0c`)	2021-08-03 12:17:47 -04:00
Benoît Knecht	c8348ab0d9	infrastructure-playbooks: Get Ceph info in check mode In the `set osd flags` block, run the Ceph commands that gather information from the cluster (and don't make any changes to it) even when running in check mode. This allows the tasks that depend on the variables set by those tasks to succeed in check mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `d7653dca95`)	2021-08-02 15:53:49 +02:00
Guillaume Abrioux	76f68843e5	update: check the ceph release Check early which Ceph release is going to be deployed and fail if it doesn't correspond to the ceph-ansible version being used. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `eec38784ec`)	2021-07-26 13:35:30 -04:00
Guillaume Abrioux	036b03a7bb	purge: support osd_auto_discovery This adds a task that zaps by osd id so we can support the scenario where osds were deployed with `osd_auto_discovery` set to true. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876860 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4144074a50`)	2021-07-22 12:43:57 -04:00
Guillaume Abrioux	b36e8ec935	purge: merge playbooks This refactor merges the two playbooks so we only have to maintain 1 playbook. (Symlink the old purge-container-cluster.yml playbook for backward compatibility). Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `17cd83bf3a`)	2021-07-22 12:43:57 -04:00
Guillaume Abrioux	fc5440b71c	purge: drop variables from 'hosts' sections Those variables are useless given this is not possible to override them. Let's replace them with the hardcoded name instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6b50401d0c`)	2021-07-22 12:43:57 -04:00
Dimitri Savineau	e54c8e93ee	cephadm-adopt: set application on ganesha pool Set the nfs application to the ganesha pool. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1956840 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `aeb9f562e5`)	2021-07-21 16:22:25 +02:00
Dimitri Savineau	3ec8e90b34	cephadm-adopt: enable osd memory autotune for HCI This enables the osd_memory_target_autotune option on HCI environment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1973149 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a305296384`)	2021-07-21 16:22:10 +02:00
Dimitri Savineau	f6cd8b9816	common: remove unnecessary run_once statements `1303611` introduced tasks for disabling the pg_autoscaler on pools and the balancer but those tasks are already executed on the first monitor node so we don't need to add the run_once statement. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `738fa9428a`)	2021-07-21 10:01:15 -04:00
Dimitri Savineau	cf734e19b7	common: fix py2 pool_list from_json when skipped When using python 2 and the task with a loop is skipped then it generates an error. Unexpected templating type error occurred on ({{ (pool_list.stdout | from_json)['pools'] }}): expected string or buffer Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cf6e33346e`)	2021-07-21 14:00:30 +02:00
Guillaume Abrioux	3cc8c667d0	common: disable/enable pg_autoscaler The PG autoscaler can disrupt the PG checks so the idea here is to disable it and re-enable it back after the restart is done. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `13036115e2`)	2021-07-20 11:04:25 -04:00
Guillaume Abrioux	fb825e0659	purge: reindent playbook This commit reindents the playbook. Also improve readability by adding an extra line between plays. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `60aa70a128`)	2021-07-13 14:47:44 -04:00
Dimitri Savineau	e08cb421d4	rolling_update: check quorum state before upgrade If one of the monitors is out of the quorum then nothing prevents the upgrade playbook from running. We only check if we have at least three monitor nodes but we should also check if those monitor nodes are correctly present in the quorum. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1952571 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `97148dd58c`)	2021-07-12 12:58:02 -04:00
Guillaume Abrioux	bf5d0b7374	update: fail the playbook if straw2 conversion failed It's better to fail the playbook so the user is aware the straw2 migration has failed. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c396122ad9`)	2021-07-09 16:32:47 -04:00
Guillaume Abrioux	0a348bd396	update: followup on pr #6689 add missing 'osd' command. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4eb4268dee`)	2021-07-09 11:34:12 +02:00
Guillaume Abrioux	ea8f0c7bcb	update: convert straw bucket After an upgrade, the presence of straw buckets will produce the following warning (HEALTH_WARN): `crush map has legacy tunables (require firefly, min is hammer)` because straw bucket is a firefly feature it needs to be converted to straw2. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967964 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `eee576477c`)	2021-07-09 09:15:24 +02:00
Dimitri Savineau	ec648981e6	infra: add playbook to purge dashboard/monitoring The dashboard/monitoring stack can be deployed via the dashboard_enabled variable. But there's nothing similar if we want to remove that part only and keep the ceph cluster up and running. The current purge playbooks remove everything. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `8e4ef7d6da`)	2021-07-06 11:40:31 -04:00
Guillaume Abrioux	f80837c23e	cephadm_adopt: add any_errors_fatal on play Add any_errors_fatal: true in cephadm-adopt playbook. We should stop the playbook execution when a task throws an error. Otherwise it can lead to unexpected behavior. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1976179 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3b804a61dd`)	2021-07-03 11:58:57 +02:00
Guillaume Abrioux	0d4b029057	purge: add monitoring group in final cleanup play This adds the monitoring group in the "final cleanup play" so any cid files generated are well removed when purging the cluster. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1974536 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `037d8cd05e`)	2021-07-02 14:36:48 -04:00
Guillaume Abrioux	676aad9ea2	update: do not gather facts on each play There's no benefit to gather facts again on each play in rolling_update.yml Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `2c77d0094c`)	2021-06-30 20:39:25 +02:00
Dimitri Savineau	48f47e7023	rhcs: remove ISO install method Starting RHCS 5, there's no ISO available anymore. This removes all ISO variables and the ceph_repository_type variable. Closes: #6626 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `a05730b38a`)	2021-06-30 20:33:44 +02:00
Dimitri Savineau	f64a4258ea	switch2container: run ceph-validate role This adds the ceph-validate role before starting the switch to a containerized deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1968177 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `fc160b3be1`)	2021-06-30 09:29:58 +02:00
Guillaume Abrioux	16dc991351	shrink-mgr: modify existing mgr check Do not rely on the inventory aliases in order to check if the selected manager to be removed is present. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967897 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `26a7256c4c`)	2021-06-29 17:52:22 +02:00
Guillaume Abrioux	0856d3e47f	cephadm-adopt/rgw: add host target in svc_id If multi-realms were deployed with several instances belonging to the same realm and zone using the same port on different nodes, the service id expected by cephadm will be the same and therefore only one service will be deployed. We need to create a service called `<node>.<realm>.<zone>.<port>` to be sure the service name will be unique and well deployed on the expected node in order to preserve backward compatibility with the rgws instances that were deployed with ceph-ansible. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `31311b03ed`)	2021-06-29 15:18:49 +02:00
Guillaume Abrioux	aa332ac64d	cephadm-adopt: support rgw multisite adoption We need to support rgw multisite deployments. This commit makes the adoption playbook support this kind of deployment. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967455 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `fc784fc44c`)	2021-06-24 09:48:27 +02:00
Guillaume Abrioux	93f1765259	update: block upgrade when nfs+rgw is deployed This is an unsupported configuration since there are issues with RGW+NFS upgraded from Nautilus to Pacific. This approach might be seen as a bit aggressive but it is preferable to wait before upgrading in that case. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970003 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-06-16 19:39:42 +02:00
Guillaume Abrioux	17f9780274	cephadm-adopt: fix mgr placement hosts task When no `[mgrs]` group is defined in the inventory, mgr daemons are implicitly collocated with monitors. This task currently relies on the length of the mgr group in order to tell cephadm to deploy mgr daemons. If there's no `[mgrs]` group defined in the inventory, it will ask cephadm to deploy 0 mgr daemons which doesn't make sense and will throw an error. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1970313 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f9a73149a4`)	2021-06-14 13:55:45 +02:00
Guillaume Abrioux	8dda6d0b4d	fs2bs: use match filter in selectattr() `0990ae4109` changed the filter in selectattr() from 'match' to 'equalto' but due to an incompatibility with the Jinja2 version for python 2.7 on el7 we must stick to using 'match' filter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d6745e9cd9`)	2021-05-26 09:15:43 +02:00
Guillaume Abrioux	b2759c0c51	fs2bs: fix wrong filter when setting osd_ids using 'match' filter in that task will lead to bad behavior if I have the following node names for instance: - node1 - node11 - node111 with `selectattr('name', 'match', inventory_hostname)` it will match 'node1' along with 'node11' and 'node111'. using 'equalto' filter will make sure we only match the target node. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1963066 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0990ae4109`)	2021-05-25 20:50:10 +02:00
Guillaume Abrioux	d319da14c8	update: fix ceph-crash stop task This is a workaround for an issue in ansible. When trying to stop/mask/disable this service in one task, the stop didn't actually happen, the task doesn't fail but for some reason the container is still present and running. Then the task starting the service in the role ceph-crash fails because it can't start the container since it's already running with the same name. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1955393 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3db1ea7ec4`)	2021-05-04 15:59:46 +02:00
Guillaume Abrioux	747d259511	cephadm_adopt: fix ceph-crash migration ceph-ansible leaves a ceph-crash container in containerized deployment. It means we end up with 2 ceph-crash containers running after the migration playbook is complete. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1954614 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `22c18e82f0`)	2021-04-29 07:14:17 +02:00
Guillaume Abrioux	60c0fb8a7a	cephadm_adopt: fix rgw placement task Due to a recent breaking change in ceph, this command must be modified to add the <svc_id> parameter. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `1f40c12502`)	2021-04-27 15:17:28 +02:00
Guillaume Abrioux	a1f445cc73	cephadm_adopt: create a 'nfs-ganesha' pool When migrating from a cluster with no MDS nodes deployed, `{{ cephfs_data_pool.name }}` doesn't exist so we need to create a pool for storing nfs export objects. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1950403 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `bb7d37fb6a`)	2021-04-27 15:17:28 +02:00
Guillaume Abrioux	e332051b46	switch-to-containers: only chown corresponding files When collocating daemons, if we chown all files under `/var/lib/ceph` it can cause issues for the collocated daemons that wouldn't have been migrated yet. This commit makes the playbook chown only the files corresponding to the daemon being migrated. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `ddbc11c4a9`)	2021-04-15 05:24:12 +02:00
Guillaume Abrioux	fd0da6f43c	fs2bs: add a final play This removes the fact `skipped_nodes` which is useless when we run with `--limit` since it gets reset when a new iteration is made. Instead, let's print within a final play which node has been skipped reusing the `skip_this_node` fact. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3d4267051f`)	2021-04-14 16:46:31 +02:00
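
The node-name problem described in commit b2759c0c51 (`fs2bs: fix wrong filter when setting osd_ids`) can be illustrated with a small Python sketch. It assumes the `match` test behaves like Python's `re.match` (anchored only at the start of the string) and that `equalto` is plain equality; the `nodes` list and both helper functions are hypothetical stand-ins, not code from ceph-ansible.

```python
import re

# Hypothetical entries standing in for the items filtered by selectattr();
# only the "name" attribute matters for this illustration.
nodes = [{"name": "node1"}, {"name": "node11"}, {"name": "node111"}]


def select_match(items, pattern):
    """Approximate a 'match' test: the regex is anchored at the start only,
    so any name that merely begins with the pattern is selected."""
    return [item["name"] for item in items if re.match(pattern, item["name"])]


def select_equalto(items, value):
    """Approximate an 'equalto' test: exact string equality."""
    return [item["name"] for item in items if item["name"] == value]


print(select_match(nodes, "node1"))    # every name starting with "node1"
print(select_equalto(nodes, "node1"))  # only the exact name "node1"
```

Under these assumptions the prefix-style match selects all three nodes, while the equality check selects only the target node. Commit 8dda6d0b4d later reverted to `match` because of a Jinja2 incompatibility with python 2.7 on el7.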


745 Commits (c204166696cac5dd955f579d3cdb46ed02e46c0b)