Commit Graph

5821 Commits (e5cf9db2b04f55196d867f5a7248b455307f4407)
 

Author SHA1 Message Date
Guillaume Abrioux e5cf9db2b0 update: support upgrading a subset of nodes
It can be useful in a large cluster deployment to split the upgrade and
only upgrade a group of nodes at a time.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-10-21 20:51:14 +02:00
Guillaume Abrioux fb8a66149b tests: add new scenario subset_update
new scenario in order to test the subset upgrade approach using tags.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-10-21 20:51:14 +02:00
Per Abildgaard Toft 84118a3063 shrink-osd: fix regression because of a wrong regex
968891f449 introduced a regression.
The regex is wrong because it doesn't allow to shrink osds with id
greater than 9

Fixes: #6950

Signed-off-by: Per Abildgaard Toft <per@minfejl.dk>
2021-10-21 10:01:23 +02:00
Seena Fallah ae6be71b08 cephadm: set ssh configs at bootstrap step
Add support ssh_user and ssh_config to cephadm bootstrap plugin

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2021-10-15 14:40:51 +02:00
Guillaume Abrioux 968891f449 shrink-osd: check osd id format
This adds a check early in order to ensure the format of osd ids passed
is correct.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2005734

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-10-12 18:26:18 +02:00
Seena Fallah 5822936252 cephadm: install cephadm from repository
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2021-10-08 16:56:47 +02:00
Seena Fallah 339212a7c6 cephadm-adopt: configure repository for cephadm installation
Configure repository for cephadm installation and use package install in both containerized and non containerized deployment

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2021-10-08 16:56:47 +02:00
Seena Fallah 4f6da9d92f ceph-validate: export validate repository vars as a task
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2021-10-08 16:56:47 +02:00
Seena Fallah e79bda9a05 ceph-common: export repository configuration to a single task
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2021-10-08 16:56:47 +02:00
Seena Fallah 0b78faa723 cephadm: use cephadm_ssh_user for ssh user
Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2021-10-01 21:08:13 +02:00
Francesco Pantano b7299f258b Add ceph_nfs_adopt tag to the cephadm-adopt playbook
There are existing OpenStack scenarios where nfs is still not managed
by cephadm. For this reason sometimes is useful skip the nfs part of
the adoption playbook and leave this daemon unmanaged.
The purpose of this patch is providing a tag to enable the OpenStack
operators to skip this playbook section.

Closes: https://bugzilla.redhat.com/2009212
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
2021-10-01 21:03:02 +02:00
Guillaume Abrioux b555f1d1cd cephadm: add admin label on mon nodes
This is needed if you want a copy of the admin keyring on the admin
nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-10-01 17:17:01 +02:00
Guillaume Abrioux f277a39dfe tests: remove all references to ceph_stable_release
this is legacy and not needed anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-09-30 16:13:55 +02:00
Seena Fallah fb99626987 ceph-defaults: set ceph_stable_release default to the stable branch release
ceph_stable_release is a legacy from the time where a single branch of ceph-ansible supported more than one release of ceph

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2021-09-30 16:13:55 +02:00
Guillaume Abrioux c2e46fe5a5 tests: set rgw_instances in collect-logs.yml
in order to gather rgw logs, we need rgw_instances to be set.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-09-30 16:13:34 +02:00
Guillaume Abrioux b2ccc7234a tests: update collect-logs.yml playbook
- change `ceph -s` output to json-pretty.
- gather rgw logs
- add `health detail` command

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-09-30 08:31:22 +02:00
Guillaume Abrioux 702564518b tests: move collect-logs.yml to ceph-ansible repo
related ceph-build PR: ceph/ceph-build#1914

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-09-29 15:18:45 +02:00
Alex Lambert a9680ab17f dashboard: allow disabling of unused features
Unconfigured dashboard features can lead to empty tabs in the dashboard
containing no meaningful content. Allow users to disable dashboard features
they know will not be used.

A list of features to be disabled allows the user to define a streamlined
dashboard as standard across deployments. Defaults to disabling no features,
ensuring that users are sure they do not need the dashboard feature before
disabling it.

Signed-off-by: Alex Lambert <lamberta@microsoft.com>
2021-09-29 12:02:16 +02:00
Guillaume Abrioux f8d49827a4 dashboard: retry setting rgw-credentials
for some reason, this task can fail in the CI.
Adding a retry can help to avoid this failure.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-09-29 10:29:42 +02:00
Guillaume Abrioux b6c470c7e2 tests: add osd node in collocation
we update the pool size from 1 to 2 in idempotency test
but only 1 node is available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-09-29 10:29:42 +02:00
Guillaume Abrioux 0a3b916ee7 cephadm-adopt: add no_log: true
Let's add a `no_log: true` on the `cephadm registry-login` task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-09-28 08:11:03 +02:00
Guillaume Abrioux d12efa1ab4 adopt: stop iscsi services in the first place
If old containers are still running, it can make tcmu-runner process
unable to open devices and there's nothing else to do than restarting
the container.

Also, as per discussion with iscsi experts, iscsi should be migrated before
OSDs. (the client should be closed before the server)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000412

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-09-27 19:46:37 +02:00
Dimitri Savineau 9125bba48d tests: auth_allow_insecure_global_id_reclaim false
Otherwise the clients won't be able to reconnect after the reboot in the
all_daemons and collocation jobs.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-09-17 07:34:40 +02:00
Guillaume Abrioux 66f3eb377c tests: fix container-cephadm job
add missing variable `containerized_deployment` in group_vars

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-09-16 16:57:16 +02:00
Guillaume Abrioux c49d6804bd common: install ceph-volume package
After pacific release, ceph-volume has its own package.
ceph-ansible has to explicitly install it on osd nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-09-16 15:12:08 +02:00
Daniel Pivonka 1c50dc29cf cephadm-adopt: set cephadm registry login info
registry login info needs to be stored in cluster for cephadm and future hosts

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000103
Signed-off-by: Daniel Pivonka <dpivonka@redhat.com>
2021-09-13 11:14:22 +02:00
Guillaume Abrioux c42ad1f487 Revert "tests: rename grafana to monitoring"
This reverts commit a36586a777.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-09-09 10:10:13 -04:00
Dimitri Savineau a36586a777 tests: rename grafana to monitoring
Since the grafana-server group has been renamed to monitoring then
changing the associated tests.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-09-09 13:27:27 +02:00
Seena Fallah ff39c8d70b purge: add remove_docker tag
This can help to skip docker removal tasks

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2021-09-09 13:25:45 +02:00
Seena Fallah a51ce767ca purge: add container_binary needed for zap osds
`container_binary` isn't set anymore in the purge osd play because of a
regression introduced by 60aa70a.
The CI didn't catch it because the play purging node-exporter sets this
variable for all nodes before we run the purge osd play.

This commit fixes this regression.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2021-09-09 11:12:02 +02:00
Dimitri Savineau e7b43c1fc6 ceph-defaults: set quay.io as the default registry
Because the ceph container images are now only pushed to the quay.io
registry then this updates the default registry value.
The docker.io registry can still be used but doesn't receive updated
container images.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-09-09 10:56:09 +02:00
Dimitri Savineau cddc23f511 purge-dashboard: remove cid files
This adds the service cid file cleanup as supported in the classic purge
playbook since b9dd253

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-09-08 15:40:46 +02:00
Dimitri Savineau f2bd8ae70f tests/rgw: use json format output for user info
If the radosgw user already exists then we need to have the output in json
format because we are expecting to load the output with json.loads()
Otherwise we have pytest failure like:

```console
self = <json.decoder.JSONDecoder object at 0x7fa2f00a5fd0>, s = '', idx = 0

    def raw_decode(self, s, idx=0):
        """Decode a JSON document from ``s`` (a ``str`` beginning with
        a JSON document) and return a 2-tuple of the Python
        representation and the index in ``s`` where the document ended.

        This can be used to decode a JSON document from a string that may
        have extraneous data at the end.

        """
        try:
            obj, end = self.scan_once(s, idx)
        except StopIteration as err:
>           raise JSONDecodeError("Expecting value", s, err.value) from None
E           json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-08-27 13:50:20 -04:00
Dimitri Savineau f01ae82eec tests/rgw: add timeout 5s to radosgw-admin command
If the radosgw daemons aren't up and running correctly (like not registered
in the servicemap or the OSD are down) then the radosgw-admin will hang
forever.
Jenkins will kill the jobs after 3h but we don't want to wait until this global
timeout.
Adding the timeout 5 command to the radosgw-admin commands (which is already
present on other ceph calls) allows the job to fail earlier.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-08-27 13:50:20 -04:00
Dimitri Savineau 2630f8d47a cephadm-adopt: fix orch host add with FQDN
When a node is configured with FQDN as the hostname value then the
`ceph orch host add` command will fail because the `ansible_hostname` used
by that command contains the short hostname which won't match the current
hostname (FQDN)
Instead we can use the ansible_nodename fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1997083

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-08-26 15:50:32 -04:00
Dimitri Savineau 5bb7240f87 container: explicitly pull monitoring images
We don't pull the monitoring container images (alertmanager, prometheus,
node-exporter and grafana) in a dedicated task like we're doing for the
ceph container image.
This means that the container image pull is done during the start of the
systemd service.
By doing this, pulling the image behind a proxy isn't working with podman.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1995574

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-08-23 14:12:45 -04:00
Dimitri Savineau 3905fd2126 Revert "tests: use old build of ceph@master"
This reverts commit 47a451426a.

This build isn't available on shaman anymore.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-08-20 09:49:42 -04:00
Guillaume Abrioux 6802b8dddd iscsi: don't set default value for trusted_ip_list
It restricts access to the iSCSI API.
It can be left empty if the API isn't going to be access from outside the
gateway node

Even though this seems to be a limited use case, it's better to leave it
empty by default than having a meaningless default value.

We could make this variable mandatory but that would be a breaking
change. Let's just add a logic in the template in order to set this
variable in the configuration file only if it was specified by users.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1994930

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2021-08-19 09:28:08 -04:00
Dimitri Savineau 8ba6101bbb cephadm-adopt: remove ceph-nfs.target
This systemd target doesn't exist at all.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-08-18 20:08:22 +02:00
Guillaume Abrioux 09ef465f62 containers: introduce target systemd unit
This adds ceph-*.target systemd unit files support for containerized
deployments.
This also fixes a regression introduced by PR #6719 (rgw and nfs systemd
units not getting purged)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-08-18 11:08:50 -04:00
Guillaume Abrioux 3d27f9e7dc Vagrantfile: fallback on 'varant_variables.yml.sample'
When using a vagrant command from the root directory of the repo, it
throws an error if no 'vagrant_variables.yml' file is present.

```
Message: Errno::ENOENT: No such file or directory @ rb_sysopen - /home/guits/workspaces/ceph-ansible/vagrant_variables.yml
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-08-18 09:12:40 +02:00
Seena Fallah 95bce32270 ceph-container-engine: allow override container_package_name and container_service_name
Only include specific variables when they are undefined

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2021-08-18 09:12:00 +02:00
Seena Fallah 67389d08d4 cephadm-adopt: use cephadm_ssh_user for ssh user
Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2021-08-18 09:10:56 +02:00
Guillaume Abrioux 1db8fa8989 roles: remove leftover from pr #4319
pr #4319 introduced some uesless `become: true` on systemd tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-08-18 09:10:15 +02:00
Guillaume Abrioux c14e9114ba update: gather facts only one time
this play doesn't need to gather facts from localhost

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-08-17 14:41:17 -04:00
Dimitri Savineau 2ee2194ee0 ceph-dashboard: fix oject gateway integration
Since [1] multiple ceph dashboard commands have been removed and this is
breaking the current ceph-ansible dashboard with RGW automation.
This removes the following dashboard rgw commands:

- ceph dashboard set-rgw-api-access-key
- ceph dashboard set-rgw-api-secret-key
- ceph dashboard set-rgw-api-host
- ceph dashboard set-rgw-api-port
- ceph dashboard set-rgw-api-scheme

Which are replaced by `ceph dashboard set-rgw-credentials`

The RGW user creation task is also removed.

Finally moving the delegate_to statement from the rgw tasks at the block
level.

[1] https://github.com/ceph/ceph/pull/42252

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-08-17 12:53:58 -04:00
Dimitri Savineau 687b20fb22 ceph-volume: hide OSD keyring during creation
When using ceph-volume lvm create/prepare/batch then the keyring of each
OSD created is displayed in the output.
Let's replace those by some '*' chars.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-08-13 01:10:03 +02:00
Guillaume Abrioux 47a451426a tests: use old build of ceph@master
for unlocking the ci.
this is intended to be reverted.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-08-13 01:05:02 +02:00
Dimitri Savineau e44075abd6 ceph-mon: do not log monitor keyring
We don't want to display the keyring in the ansible log.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2021-08-12 08:42:05 +02:00
Guillaume Abrioux 7511195738 common: do not log keyring secret
let's not display any keyring secret by default in ansible log.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1980744

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2021-08-11 17:33:34 +02:00