ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Seena Fallah	12f0f711f4	ceph-defaults: set ceph_stable_release default to the stable branch release ceph_stable_release is a legacy from the time where a single branch of ceph-ansible supported more than one release of ceph Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `fb99626987`)	2021-10-02 20:45:35 +02:00
Francesco Pantano	642a83dc6b	Add ceph_nfs_adopt tag to the cephadm-adopt playbook There are existing OpenStack scenarios where nfs is still not managed by cephadm. For this reason it is sometimes useful to skip the nfs part of the adoption playbook and leave this daemon unmanaged. The purpose of this patch is to provide a tag that lets OpenStack operators skip this playbook section. Closes: https://bugzilla.redhat.com/2009212 Signed-off-by: Francesco Pantano <fpantano@redhat.com> (cherry picked from commit `b7299f258b`)	2021-10-01 23:32:33 +02:00
Seena Fallah	c3fe1a6206	cephadm: use cephadm_ssh_user for ssh user Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `0b78faa723`)	2021-10-01 23:31:39 +02:00
Guillaume Abrioux	4b5a0c0443	cephadm: add admin label on mon nodes This is needed if you want a copy of the admin keyring on the admin nodes. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b555f1d1cd`)	2021-10-01 23:23:06 +02:00
Guillaume Abrioux	af964565bc	dashboard: retry setting rgw-credentials for some reason, this task can fail in the CI. Adding a retry can help to avoid this failure. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `f8d49827a4`)	2021-09-30 18:30:38 +02:00
Guillaume Abrioux	c204166696	tests: add osd node in collocation we update the pool size from 1 to 2 in idempotency test but only 1 node is available. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b6c470c7e2`)	2021-09-30 18:30:38 +02:00
Guillaume Abrioux	a4e979df09	tests: set rgw_instances in collect-logs.yml in order to gather rgw logs, we need rgw_instances to be set. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c2e46fe5a5`)	2021-09-30 17:52:41 +02:00
Guillaume Abrioux	7f6cb83f51	tests: update collect-logs.yml playbook - change `ceph -s` output to json-pretty. - gather rgw logs - add `health detail` command Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `b2ccc7234a`)	2021-09-30 17:48:21 +02:00
Guillaume Abrioux	7ecc85cc41	tests: move collect-logs.yml to ceph-ansible repo related ceph-build PR: ceph/ceph-build#1914 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `702564518b`)	2021-09-29 16:41:16 +02:00
Alex Lambert	fe617bed09	dashboard: allow disabling of unused features Unconfigured dashboard features can lead to empty tabs in the dashboard containing no meaningful content. Allow users to disable dashboard features they know will not be used. A list of features to be disabled allows the user to define a streamlined dashboard as standard across deployments. Defaults to disabling no features, ensuring that users are sure they do not need the dashboard feature before disabling it. Signed-off-by: Alex Lambert <lamberta@microsoft.com> (cherry picked from commit `a9680ab17f`)	2021-09-29 16:31:34 +02:00
Guillaume Abrioux	dbc19729bd	tests: fix container-cephadm job add missing variable `containerized_deployment` in group_vars Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `66f3eb377c`)	2021-09-29 09:57:59 +02:00
Guillaume Abrioux	d196881ebb	cephadm-adopt: add no_log: true Let's add a `no_log: true` on the `cephadm registry-login` task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `0a3b916ee7`)	2021-09-28 21:15:02 +02:00
Guillaume Abrioux	a053adbe84	adopt: stop iscsi services in the first place If old containers are still running, it can make tcmu-runner process unable to open devices and there's nothing else to do than restarting the container. Also, as per discussion with iscsi experts, iscsi should be migrated before OSDs. (the client should be closed before the server) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000412 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `d12efa1ab4`)	2021-09-28 18:46:49 +02:00
Dimitri Savineau	942420088a	ceph-dashboard: fix object gateway integration Since [1] multiple ceph dashboard commands have been removed and this is breaking the current ceph-ansible dashboard with RGW automation. This removes the following dashboard rgw commands: - ceph dashboard set-rgw-api-access-key - ceph dashboard set-rgw-api-secret-key - ceph dashboard set-rgw-api-host - ceph dashboard set-rgw-api-port - ceph dashboard set-rgw-api-scheme Which are replaced by `ceph dashboard set-rgw-credentials` The RGW user creation task is also removed. Finally, this moves the delegate_to statement from the rgw tasks to the block level. [1] https://github.com/ceph/ceph/pull/42252 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `2ee2194ee0`)	2021-09-18 06:35:33 +02:00
Seena Fallah	cb5a675e49	cephadm-adopt: use cephadm_ssh_user for ssh user Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `67389d08d4`)	2021-09-13 16:26:24 +02:00
Daniel Pivonka	969e41fa2e	cephadm-adopt: set cephadm registry login info registry login info needs to be stored in cluster for cephadm and future hosts Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000103 Signed-off-by: Daniel Pivonka <dpivonka@redhat.com> (cherry picked from commit `1c50dc29cf`)	2021-09-13 16:18:40 +02:00
Seena Fallah	432ab37c6b	purge: add remove_docker tag This can help to skip docker removal tasks Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `ff39c8d70b`)	2021-09-09 16:41:32 +02:00
Seena Fallah	0897c08518	purge: add container_binary needed for zap osds `container_binary` isn't set anymore in the purge osd play because of a regression introduced by `60aa70a`. The CI didn't catch it because the play purging node-exporter sets this variable for all nodes before we run the purge osd play. This commit fixes this regression. Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `a51ce767ca`)	2021-09-09 14:40:30 +02:00
Dimitri Savineau	121bb58f20	ceph-defaults: set quay.io as the default registry Because the ceph container images are now only pushed to the quay.io registry, this updates the default registry value. The docker.io registry can still be used but doesn't receive updated container images. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `e7b43c1fc6`)	2021-09-09 13:42:49 +02:00
Dimitri Savineau	ac6604ab61	purge-dashboard: remove cid files This adds the service cid file cleanup as supported in the classic purge playbook since `b9dd253` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `cddc23f511`)	2021-09-08 12:05:22 -04:00
Seena Fallah	1626caaf6a	ceph-container-engine: allow override container_package_name and container_service_name Only include specific variables when they are undefined Signed-off-by: Seena Fallah <seenafallah@gmail.com> (cherry picked from commit `95bce32270`)	2021-09-08 15:35:04 +02:00
Dimitri Savineau	feb0ba9dcc	tests/rgw: use json format output for user info If the radosgw user already exists then we need to have the output in json format because we are expecting to load the output with json.loads() Otherwise we have pytest failure like: ```console self = <json.decoder.JSONDecoder object at 0x7fa2f00a5fd0>, s = '', idx = 0 def raw_decode(self, s, idx=0): """Decode a JSON document from ``s`` (a ``str`` beginning with a JSON document) and return a 2-tuple of the Python representation and the index in ``s`` where the document ended. This can be used to decode a JSON document from a string that may have extraneous data at the end. """ try: obj, end = self.scan_once(s, idx) except StopIteration as err: > raise JSONDecodeError("Expecting value", s, err.value) from None E json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) ``` Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f2bd8ae70f`)	2021-08-27 14:32:56 -04:00
Dimitri Savineau	0186819855	tests/rgw: add timeout 5s to radosgw-admin command If the radosgw daemons aren't up and running correctly (like not registered in the servicemap or the OSD are down) then the radosgw-admin will hang forever. Jenkins will kill the jobs after 3h but we don't want to wait until this global timeout. Adding the timeout 5 command to the radosgw-admin commands (which is already present on other ceph calls) allows the job to fail earlier. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f01ae82eec`)	2021-08-27 14:32:56 -04:00
Dimitri Savineau	ac5353a2d8	cephadm-adopt: fix orch host add with FQDN When a node is configured with FQDN as the hostname value then the `ceph orch host add` command will fail because the `ansible_hostname` used by that command contains the short hostname which won't match the current hostname (FQDN) Instead we can use the ansible_nodename fact. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1997083 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `2630f8d47a`)	2021-08-26 17:10:55 -04:00
Dimitri Savineau	f71b172d2b	container: explicitly pull monitoring images We don't pull the monitoring container images (alertmanager, prometheus, node-exporter and grafana) in a dedicated task like we're doing for the ceph container image. This means that the container image pull is done during the start of the systemd service. By doing this, pulling the image behind a proxy isn't working with podman. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1995574 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5bb7240f87`)	2021-08-23 16:08:10 -04:00
Guillaume Abrioux	299c81c58b	iscsi: don't set default value for trusted_ip_list It restricts access to the iSCSI API. It can be left empty if the API isn't going to be accessed from outside the gateway node. Even though this seems to be a limited use case, it's better to leave it empty by default than to have a meaningless default value. We could make this variable mandatory but that would be a breaking change. Let's just add logic in the template in order to set this variable in the configuration file only if it was specified by users. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1994930 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `6802b8dddd`)	2021-08-19 12:06:42 -04:00
Dimitri Savineau	e3e849378e	cephadm-adopt: remove ceph-nfs.target This systemd target doesn't exist at all. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `8ba6101bbb`)	2021-08-18 15:29:03 -04:00
Guillaume Abrioux	d7311aeefc	containers: introduce target systemd unit This adds ceph-*.target systemd unit files support for containerized deployments. This also fixes a regression introduced by PR #6719 (rgw and nfs systemd units not getting purged) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `09ef465f62`)	2021-08-18 13:42:50 -04:00
Guillaume Abrioux	a4f8bc688d	roles: remove leftover from pr #4319 pr #4319 introduced some useless `become: true` on systemd tasks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `1db8fa8989`)	2021-08-18 11:08:21 -04:00
Guillaume Abrioux	d3ab0a1ca7	Vagrantfile: fallback on 'vagrant_variables.yml.sample' When using a vagrant command from the root directory of the repo, it throws an error if no 'vagrant_variables.yml' file is present. ``` Message: Errno::ENOENT: No such file or directory @ rb_sysopen - /home/guits/workspaces/ceph-ansible/vagrant_variables.yml ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `3d27f9e7dc`)	2021-08-18 11:07:58 -04:00
Guillaume Abrioux	056b18aa0e	update: gather facts only one time this play doesn't need to gather facts from localhost Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `c14e9114ba`)	2021-08-17 15:31:34 -04:00
Dimitri Savineau	490a1c6ba6	ceph-mon: do not log monitor keyring We don't want to display the keyring in the ansible log. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `e44075abd6`)	2021-08-12 13:30:50 +02:00
Guillaume Abrioux	634baa9b63	common: do not log keyring secret let's not display any keyring secret by default in ansible log. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1980744 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `7511195738`)	2021-08-11 14:59:25 -04:00
Dimitri Savineau	4b090b661e	ceph-dashboard: fix TLS cert openssl generation With OpenSSL versions prior to 1.1.1 (like CentOS 7 with 1.0.2k), the -addext option doesn't exist. As a solution, this uses the default openssl.cnf configuration file as a template and adds the subjectAltName in the v3_ca section. This temp openssl configuration file is removed after the TLS certificate creation. This patch also moves the run_once statement to the block level. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `5e0ace7e54`)	2021-08-09 15:14:30 -04:00
Guillaume Abrioux	f02295be85	dashboard: subj_alt_names fact refactor the current way the variable is built results in: ``` 2021-08-03 04:18:23,020 - ceph.ceph - INFO - ok: [ceph-sangadi-4x-indpt6-node1-installer] => changed=false ansible_facts: subj_alt_names: |- subjectAltName=ceph-sangadi-4x-indpt6-node1-installer/subjectAltName=10.0.210.223/subjectAltName=ceph-sangadi-4x-indpt6-node1-installersubjectAltName=ceph-sangadi-4x-indpt6-node2/subjectAltName=10.0.210.252/subjectAltName=ceph-sangadi-4x-indpt6-node2/ ``` which is incorrect. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `6f1a0634f7`)	2021-08-09 15:14:30 -04:00
VasishtaShastry	6ed0919796	Fixes typo in rgw-add-users-buckets playbook Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com> (cherry picked from commit `478d9fdcb6`)	2021-08-09 14:31:42 -04:00
Guillaume Abrioux	6e9cf80747	adopt: import rgw ssl certificate into kv store Without this, when rgw is managed by cephadm, it fails to start because the ssl certificate isn't present in the kv store. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987010 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1988404 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `930fc4c850`)	2021-08-05 14:47:47 -04:00
Teoman ONAY	f8facde33a	podman pids.max default value is 2048, docker's one is 4096 which are sufficient for the default value (512) of rgw thread pool size. But if its value is increased near to the pids-limit value, it does not leave place for the other processes to spawn and run within the container and the container crashes. pids-limit set to unlimited regardless of the container engine. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987041 Signed-off-by: Teoman ONAY <tonay@redhat.com> (cherry picked from commit `9b5d97adb9`)	2021-08-05 11:04:18 -04:00
Dimitri Savineau	2377da8f9b	infra: use dedicated variables for balancer status The balancer status is registered during the cephadm-adopt, rolling_update and switch2container playbooks. But it is also used in the ceph-handler role which is included in those playbooks too. Even if the ceph-handler tasks are skipped for rolling_update and switch2container, the balancer_status variable is overwritten by the skipped task result. play1: register: balancer_status play2: register: balancer_status <-- skipped play3: when: (balancer_status.stdout | from_json)['active'] | bool This leads to an issue like: The conditional check '(balancer_status.stdout | from_json)['active'] | bool' failed. The error was: Unexpected templating type error occurred on ({% if (balancer_status.stdout | from_json)['active'] | bool %} True {% else %} False {% endif %}): expected string or buffer. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1982054 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `386661699b`)	2021-08-04 11:43:47 -04:00
Dimitri Savineau	31cc8bd2aa	osds: use osd pool ls instead of osd dump command The ceph osd pool ls detail command is a subset of the ceph osd dump command. $ ceph osd dump --format json|wc -c 10117 $ ceph osd pool ls detail --format json|wc -c 4740 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `06471a4b82`)	2021-08-03 13:57:14 -04:00
Dimitri Savineau	17884d9848	library: exit on user creation failure When the ceph dashboard user creation fails then the issue is hidden as we don't check the return code and don't print the error message in the module output. This ends up with a failure on the ceph dashboard set roles command saying that the user doesn't exist. By failing on the user creation, we will have an explicit explanation of the issue (like weak password). Closes: #6197 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `17784624e0`)	2021-08-03 13:27:54 -04:00
Dimitri Savineau	7f5b986e01	rolling_update: get ceph version when mons exist `eec3878` introduced a regression for upgrade scenarios where there's no monitor nodes at all (like ganesha standalone, external clients, etc..) TASK [get the ceph release being deployed] ********************************** task path: infrastructure-playbooks/rolling_update.yml:121 Thursday 29 July 2021 15:55:29 +0000 (0:00:00.484) 0:00:15.802 ******* fatal: [client0]: FAILED! => msg: '''dict object'' has no attribute ''mons''' Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `e87a47cf0c`)	2021-08-03 12:17:47 -04:00
Benoît Knecht	c8348ab0d9	infrastructure-playbooks: Get Ceph info in check mode In the `set osd flags` block, run the Ceph commands that gather information from the cluster (and don't make any changes to it) even when running in check mode. This allows the tasks that depend on the variables set by those tasks to succeed in check mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `d7653dca95`)	2021-08-02 15:53:49 +02:00
Benoît Knecht	39fa5e2f2c	ceph-handler: Fix osd handler in check mode Run the Ceph commands that only gather information (without making any changes to the cluster) when running Ansible in check mode. This allows the tasks that depend on the variables set by those tasks to succeed in check mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch> (cherry picked from commit `498acd7527`)	2021-08-02 15:53:49 +02:00
Dimitri Savineau	877b99b17e	ceph-defaults: update grafana dashboards source We currently download the grafana dashboards from the ceph@master branch for all ceph releases. We should use the right ceph branch according to the ceph release. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-27 11:44:43 -04:00
Dimitri Savineau	6f5f1a1955	ceph-defaults: add missing grafana dashboards The radosgw-sync-overview and rbd-details grafana dashboards were missing from the list. Closes: #6758 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `f0ccf3ebf0`)	2021-07-27 10:53:40 -04:00
Guillaume Abrioux	76f68843e5	update: check the ceph release Check early which Ceph release is going to be deployed and fail if it doesn't correspond to the ceph-ansible version being used. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `eec38784ec`)	2021-07-26 13:35:30 -04:00
Dimitri Savineau	d0a122e296	alertmanager: allow disable dashboard tls verify When using self-signed/untrusted CA certificates, alertmanager displays an error in logs. With this commit this should make those messages disappear. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1936299 Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com> Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `9f77b929d1`)	2021-07-25 22:01:43 -04:00
Dimitri Savineau	ebc961f7ff	multisite: use node fqdn for endpoints when https When the rgw_multisite_proto variable is set to https then we shouldn't use the IP address in the zone endpoints list but the node FQDN to match the TLS certificate CN. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1965504 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com> (cherry picked from commit `ad05a08160`)	2021-07-22 22:47:50 +02:00
Guillaume Abrioux	036b03a7bb	purge: support osd_auto_discovery This adds a task that zaps by osd id so we can support the scenario where osds were deployed with `osd_auto_discovery` set to true. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876860 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> (cherry picked from commit `4144074a50`)	2021-07-22 12:43:57 -04:00

5885 Commits (0162fdc30db085bb70f4c5cc51a3d8d04519b230)