ceph-ansible

Commit Graph

Author	SHA1	Message	Date
Guillaume Abrioux	0a3b916ee7	cephadm-adopt: add no_log: true Let's add a `no_log: true` on the `cephadm registry-login` task. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-09-28 08:11:03 +02:00
Guillaume Abrioux	d12efa1ab4	adopt: stop iscsi services in the first place If old containers are still running, they can leave the tcmu-runner process unable to open devices, and the only remedy is to restart the container. Also, as per discussion with iscsi experts, iscsi should be migrated before OSDs (the client should be closed before the server). Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000412 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-09-27 19:46:37 +02:00
Dimitri Savineau	9125bba48d	tests: auth_allow_insecure_global_id_reclaim false Otherwise the clients won't be able to reconnect after the reboot in the all_daemons and collocation jobs. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-09-17 07:34:40 +02:00
Guillaume Abrioux	66f3eb377c	tests: fix container-cephadm job add missing variable `containerized_deployment` in group_vars Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-09-16 16:57:16 +02:00
Guillaume Abrioux	c49d6804bd	common: install ceph-volume package After the pacific release, ceph-volume has its own package, so ceph-ansible has to explicitly install it on osd nodes. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-09-16 15:12:08 +02:00
Daniel Pivonka	1c50dc29cf	cephadm-adopt: set cephadm registry login info registry login info needs to be stored in the cluster for cephadm and future hosts Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000103 Signed-off-by: Daniel Pivonka <dpivonka@redhat.com>	2021-09-13 11:14:22 +02:00
Guillaume Abrioux	c42ad1f487	Revert "tests: rename grafana to monitoring" This reverts commit `a36586a777`. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-09-09 10:10:13 -04:00
Dimitri Savineau	a36586a777	tests: rename grafana to monitoring Since the grafana-server group has been renamed to monitoring, this changes the associated tests. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-09-09 13:27:27 +02:00
Seena Fallah	ff39c8d70b	purge: add remove_docker tag This can help to skip docker removal tasks Signed-off-by: Seena Fallah <seenafallah@gmail.com>	2021-09-09 13:25:45 +02:00
Seena Fallah	a51ce767ca	purge: add container_binary needed for zap osds `container_binary` isn't set anymore in the purge osd play because of a regression introduced by `60aa70a`. The CI didn't catch it because the play purging node-exporter sets this variable for all nodes before we run the purge osd play. This commit fixes this regression. Signed-off-by: Seena Fallah <seenafallah@gmail.com>	2021-09-09 11:12:02 +02:00
Dimitri Savineau	e7b43c1fc6	ceph-defaults: set quay.io as the default registry Because the ceph container images are now only pushed to the quay.io registry, this updates the default registry value. The docker.io registry can still be used but doesn't receive updated container images. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-09-09 10:56:09 +02:00
Dimitri Savineau	cddc23f511	purge-dashboard: remove cid files This adds the service cid file cleanup as supported in the classic purge playbook since `b9dd253` Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-09-08 15:40:46 +02:00
Dimitri Savineau	f2bd8ae70f	tests/rgw: use json format output for user info If the radosgw user already exists, we need to have the output in json format because we expect to load the output with json.loads(). Otherwise we get a pytest failure like: ```console self = <json.decoder.JSONDecoder object at 0x7fa2f00a5fd0>, s = '', idx = 0 def raw_decode(self, s, idx=0): """Decode a JSON document from ``s`` (a ``str`` beginning with a JSON document) and return a 2-tuple of the Python representation and the index in ``s`` where the document ended. This can be used to decode a JSON document from a string that may have extraneous data at the end. """ try: obj, end = self.scan_once(s, idx) except StopIteration as err: > raise JSONDecodeError("Expecting value", s, err.value) from None E json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) ``` Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-27 13:50:20 -04:00
Dimitri Savineau	f01ae82eec	tests/rgw: add timeout 5s to radosgw-admin command If the radosgw daemons aren't up and running correctly (e.g. not registered in the servicemap, or the OSDs are down), radosgw-admin will hang forever. Jenkins will kill the jobs after 3h but we don't want to wait until this global timeout. Adding the timeout 5 command to the radosgw-admin commands (which is already present on other ceph calls) allows the job to fail earlier. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-27 13:50:20 -04:00
Dimitri Savineau	2630f8d47a	cephadm-adopt: fix orch host add with FQDN When a node is configured with an FQDN as its hostname, the `ceph orch host add` command fails because the `ansible_hostname` fact used by that command contains the short hostname, which doesn't match the current hostname (FQDN). Instead, we can use the ansible_nodename fact. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1997083 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-26 15:50:32 -04:00
Dimitri Savineau	5bb7240f87	container: explicitly pull monitoring images We don't pull the monitoring container images (alertmanager, prometheus, node-exporter and grafana) in a dedicated task like we're doing for the ceph container image. This means that the container image pull is done during the start of the systemd service. As a result, pulling the image behind a proxy doesn't work with podman. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1995574 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-23 14:12:45 -04:00
Dimitri Savineau	3905fd2126	Revert "tests: use old build of ceph@master" This reverts commit `47a451426a`. This build isn't available on shaman anymore. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-20 09:49:42 -04:00
Guillaume Abrioux	6802b8dddd	iscsi: don't set default value for trusted_ip_list It restricts access to the iSCSI API. It can be left empty if the API isn't going to be accessed from outside the gateway node. Even though this seems to be a limited use case, it's better to leave it empty by default than to have a meaningless default value. We could make this variable mandatory but that would be a breaking change. Let's just add some logic in the template in order to set this variable in the configuration file only if it was specified by users. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1994930 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-19 09:28:08 -04:00
Dimitri Savineau	8ba6101bbb	cephadm-adopt: remove ceph-nfs.target This systemd target doesn't exist at all. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-18 20:08:22 +02:00
Guillaume Abrioux	09ef465f62	containers: introduce target systemd unit This adds ceph-*.target systemd unit files support for containerized deployments. This also fixes a regression introduced by PR #6719 (rgw and nfs systemd units not getting purged) Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-08-18 11:08:50 -04:00
Guillaume Abrioux	3d27f9e7dc	Vagrantfile: fall back on 'vagrant_variables.yml.sample' When using a vagrant command from the root directory of the repo, it throws an error if no 'vagrant_variables.yml' file is present. ``` Message: Errno::ENOENT: No such file or directory @ rb_sysopen - /home/guits/workspaces/ceph-ansible/vagrant_variables.yml ``` Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-08-18 09:12:40 +02:00
Seena Fallah	95bce32270	ceph-container-engine: allow overriding container_package_name and container_service_name Only include specific variables when they are undefined Signed-off-by: Seena Fallah <seenafallah@gmail.com>	2021-08-18 09:12:00 +02:00
Seena Fallah	67389d08d4	cephadm-adopt: use cephadm_ssh_user for ssh user Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts Signed-off-by: Seena Fallah <seenafallah@gmail.com>	2021-08-18 09:10:56 +02:00
Guillaume Abrioux	1db8fa8989	roles: remove leftover from pr #4319 pr #4319 introduced some useless `become: true` on systemd tasks. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-08-18 09:10:15 +02:00
Guillaume Abrioux	c14e9114ba	update: gather facts only one time this play doesn't need to gather facts from localhost Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-08-17 14:41:17 -04:00
Dimitri Savineau	2ee2194ee0	ceph-dashboard: fix object gateway integration Since [1], multiple ceph dashboard commands have been removed and this is breaking the current ceph-ansible dashboard with RGW automation. This removes the following dashboard rgw commands: - ceph dashboard set-rgw-api-access-key - ceph dashboard set-rgw-api-secret-key - ceph dashboard set-rgw-api-host - ceph dashboard set-rgw-api-port - ceph dashboard set-rgw-api-scheme Which are replaced by `ceph dashboard set-rgw-credentials`. The RGW user creation task is also removed. Finally, this moves the delegate_to statement from the rgw tasks to the block level. [1] https://github.com/ceph/ceph/pull/42252 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-17 12:53:58 -04:00
Dimitri Savineau	687b20fb22	ceph-volume: hide OSD keyring during creation When using ceph-volume lvm create/prepare/batch, the keyring of each OSD created is displayed in the output. Let's replace those by some '*' chars. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-13 01:10:03 +02:00
Guillaume Abrioux	47a451426a	tests: use old build of ceph@master to unblock the CI. This is intended to be reverted. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-08-13 01:05:02 +02:00
Dimitri Savineau	e44075abd6	ceph-mon: do not log monitor keyring We don't want to display the keyring in the ansible log. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-12 08:42:05 +02:00
Guillaume Abrioux	7511195738	common: do not log keyring secret let's not display any keyring secret by default in ansible log. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1980744 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-08-11 17:33:34 +02:00
Dimitri Savineau	5e0ace7e54	ceph-dashboard: fix TLS cert openssl generation With OpenSSL versions prior to 1.1.1 (like CentOS 7 with 1.0.2k), the -addext option doesn't exist. As a solution, this uses the default openssl.cnf configuration file as a template and adds the subjectAltName in the v3_ca section. This temporary openssl configuration file is removed after the TLS certificate creation. This patch also moves the run_once statement to the block level. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-09 14:19:17 -04:00
VasishtaShastry	478d9fdcb6	Fixes typo in rgw-add-users-buckets playbook Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>	2021-08-09 15:35:55 +02:00
Guillaume Abrioux	6f1a0634f7	dashboard: subj_alt_names fact refactor the current way the variable is built results in: ``` 2021-08-03 04:18:23,020 - ceph.ceph - INFO - ok: [ceph-sangadi-4x-indpt6-node1-installer] => changed=false ansible_facts: subj_alt_names: |- subjectAltName=ceph-sangadi-4x-indpt6-node1-installer/subjectAltName=10.0.210.223/subjectAltName=ceph-sangadi-4x-indpt6-node1-installersubjectAltName=ceph-sangadi-4x-indpt6-node2/subjectAltName=10.0.210.252/subjectAltName=ceph-sangadi-4x-indpt6-node2/ ``` which is incorrect. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-08-05 18:53:38 -04:00
Guillaume Abrioux	930fc4c850	adopt: import rgw ssl certificate into kv store Without this, when rgw is managed by cephadm, it fails to start because the ssl certificate isn't present in the kv store. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987010 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1988404 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com> Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-05 13:02:25 -04:00
Dimitri Savineau	7c38e64681	cephadm-adopt: remove nfs pool and namespace This has been removed from the code (orch apply name). The default pool name is now .nfs Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-05 16:59:54 +02:00
Dimitri Savineau	386661699b	infra: use dedicated variables for balancer status The balancer status is registered during the cephadm-adopt, rolling_update and switch2container playbooks. But it is also used in the ceph-handler role, which is included in those playbooks too. Even if the ceph-handler tasks are skipped for rolling_update and switch2container, the balancer_status variable is overwritten with the skipped task's result. play1: register: balancer_status play2: register: balancer_status <-- skipped play3: when: (balancer_status.stdout | from_json)['active'] | bool This leads to issues like: The conditional check '(balancer_status.stdout | from_json)['active'] | bool' failed. The error was: Unexpected templating type error occurred on ({% if (balancer_status.stdout | from_json)['active'] | bool %} True {% else %} False {% endif %}): expected string or buffer. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1982054 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-04 17:39:54 +02:00
Teoman ONAY	9b5d97adb9	podman's default pids.max value is 2048 and docker's is 4096, both sufficient for the default rgw thread pool size (512). But if the thread pool size is increased close to the pids-limit value, it leaves no room for other processes to spawn and run within the container, and the container crashes. pids-limit is now set to unlimited regardless of the container engine. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987041 Signed-off-by: Teoman ONAY <tonay@redhat.com>	2021-08-04 10:20:25 +02:00
Dimitri Savineau	b02cc6931f	ceph-defaults: remove radosgw_civetweb_ variables radosgw_civetweb_xxx variables are legacy variables and users should have switched to radosgw_frontend_xxx variables instead. Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-04 09:13:08 +02:00
Dimitri Savineau	06471a4b82	osds: use osd pool ls instead of osd dump command The ceph osd pool ls detail command is a subset of the ceph osd dump command. $ ceph osd dump --format json | wc -c 10117 $ ceph osd pool ls detail --format json | wc -c 4740 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-02 15:51:01 +02:00
Dimitri Savineau	17784624e0	library: exit on user creation failure When the ceph dashboard user creation fails, the issue is hidden as we don't check the return code and don't print the error message in the module output. This ends up with a failure on the ceph dashboard set roles command saying that the user doesn't exist. By failing on the user creation, we will have an explicit explanation of the issue (like a weak password). Closes: #6197 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-02 15:50:02 +02:00
Dimitri Savineau	e87a47cf0c	rolling_update: get ceph version when mons exist `eec3878` introduced a regression for upgrade scenarios where there are no monitor nodes at all (like ganesha standalone, external clients, etc.) TASK [get the ceph release being deployed] ********************************** task path: infrastructure-playbooks/rolling_update.yml:121 Thursday 29 July 2021 15:55:29 +0000 (0:00:00.484) 0:00:15.802 ******* fatal: [client0]: FAILED! => msg: '''dict object'' has no attribute ''mons''' Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-08-02 15:47:56 +02:00
Benoît Knecht	d7653dca95	infrastructure-playbooks: Get Ceph info in check mode In the `set osd flags` block, run the Ceph commands that gather information from the cluster (and don't make any changes to it) even when running in check mode. This allows the tasks that depend on the variables set by those tasks to succeed in check mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2021-07-28 14:04:54 +02:00
Benoît Knecht	498acd7527	ceph-handler: Fix osd handler in check mode Run the Ceph commands that only gather information (without making any changes to the cluster) when running Ansible in check mode. This allows the tasks that depend on the variables set by those tasks to succeed in check mode. Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>	2021-07-28 14:04:54 +02:00
Dimitri Savineau	f0ccf3ebf0	ceph-defaults: add missing grafana dashboards The radosgw-sync-overview and rbd-details grafana dashboards were missing from the list. Closes: #6758 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-27 10:49:05 -04:00
Guillaume Abrioux	eec38784ec	update: check the ceph release Check early which Ceph release is going to be deployed and fail if it doesn't correspond to the ceph-ansible version being used. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978643 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-26 18:11:22 +02:00
Dimitri Savineau	9f77b929d1	alertmanager: allow disable dashboard tls verify When using self-signed/untrusted CA certificates, alertmanager displays an error in logs. This commit should make those messages disappear. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1936299 Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com> Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-25 02:56:18 +02:00
Dimitri Savineau	ad05a08160	multisite: use node fqdn for endpoints when https When the rgw_multisite_proto variable is set to https, we shouldn't use the IP address in the zone endpoints list but the node FQDN, to match the TLS certificate CN. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1965504 Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>	2021-07-22 21:22:12 +02:00
Guillaume Abrioux	4144074a50	purge: support osd_auto_discovery This adds a task that zaps by osd id so we can support the scenario where osds were deployed with `osd_auto_discovery` set to true. Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876860 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-22 10:49:44 -04:00
Guillaume Abrioux	17cd83bf3a	purge: merge playbooks This refactor merges the two playbooks so we only have to maintain 1 playbook. (Symlink the old purge-container-cluster.yml playbook for backward compatibility). Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-22 10:49:44 -04:00
Guillaume Abrioux	6b50401d0c	purge: drop variables from 'hosts' sections Those variables are useless given that it is not possible to override them. Let's replace them with the hardcoded name instead. Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>	2021-07-22 10:49:44 -04:00
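
The pytest failure quoted in commit `f2bd8ae70f` is exactly what Python's `json.loads()` raises when it is handed an empty string. The sketch below reproduces that error and shows one way a test helper could fail with a clearer message instead; `load_rgw_output()` is a hypothetical name used for illustration, not a helper from the ceph-ansible test suite.

```python
import json


def load_rgw_output(stdout: str) -> dict:
    """Parse radosgw-admin output that is expected to be JSON.

    Hypothetical helper: an empty string is rejected up front with an
    explicit message instead of the bare JSONDecodeError from json.loads().
    """
    if not stdout.strip():
        raise ValueError("radosgw-admin returned no output; was JSON output requested?")
    return json.loads(stdout)


# json.loads('') fails the same way as the traceback in the commit message.
try:
    json.loads("")
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 1 column 1 (char 0)

print(load_rgw_output('{"user_id": "test", "keys": []}')["user_id"])  # test
```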

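Commit `2630f8d47a` relies on the difference between two Ansible facts. As we understand Ansible's platform fact collection, `ansible_nodename` is the full node name (what Python's `platform.node()` returns), while `ansible_hostname` keeps only the part before the first dot, so on a host whose hostname is an FQDN only the former matches. The functions `hostname_facts()` and `orch_host_name()` below are hypothetical and only approximate that behavior.

```python
import platform


def hostname_facts(nodename: str) -> dict:
    """Approximate how the two Ansible facts relate to the node name.

    ansible_nodename keeps the full name; ansible_hostname keeps only the
    first dot-separated label (the short hostname).
    """
    return {
        "ansible_nodename": nodename,
        "ansible_hostname": nodename.split(".")[0],
    }


def orch_host_name(nodename: str) -> str:
    # Use the full node name so the name passed to `ceph orch host add`
    # matches the hostname configured on the node, even when it is an FQDN.
    return hostname_facts(nodename)["ansible_nodename"]


facts = hostname_facts("osd0.example.com")
print(facts["ansible_hostname"])           # osd0
print(orch_host_name("osd0.example.com"))  # osd0.example.com
print(hostname_facts(platform.node()))     # depends on the local machine
```

On a host named with a short hostname both facts are identical, which is why the issue only shows up on FQDN-configured nodes.
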

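Commit `687b20fb22` replaces OSD keyrings in ceph-volume output with '*' characters. A minimal sketch of that kind of masking with regular expressions follows. It assumes the secret appears either after `--add-key` or as a `key=` token; that is an assumption about the output format, not a description of the actual ceph-ansible implementation, and the sample line below is made up and uses a fake key.

```python
import re

# Assumption: secrets appear as the argument of `--add-key` or as a
# `key=...` token. The real ceph-volume output format may differ.
_SECRET_PATTERNS = [
    re.compile(r"(--add-key\s+)(\S+)"),
    re.compile(r"(\bkey=)([A-Za-z0-9+/=]+)"),
]


def mask_keyrings(output: str) -> str:
    """Replace keyring secrets in command output with '*' characters."""
    for pattern in _SECRET_PATTERNS:
        # Keep the prefix (group 1) and star out the secret (group 2).
        output = pattern.sub(lambda m: m.group(1) + "*" * len(m.group(2)), output)
    return output


# Hypothetical output line with a fake key, for illustration only.
line = "Running command: ceph-authtool /tmp/keyring --name osd.0 --add-key FAKEKEY=="
print(mask_keyrings(line))
# Running command: ceph-authtool /tmp/keyring --name osd.0 --add-key *********
```
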
5801 Commits (0a3b916ee75277e14358af9e8a8aff4ffa194ee6)
