5702 Commits (f2eab356d6d9c0506eec6412200f7e7fa9d7886c)
 

Author SHA1 Message Date
Guillaume Abrioux f2eab356d6 ceph_volume: support overriding bind-mounts
This makes it possible to call `podman run` with custom bind-mounts.

cephadm-adopt.yml playbook needs it for a very specific use case:

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2027411

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b02d71c307)

# Conflicts:
#	library/ceph_volume.py
2021-12-02 08:52:05 +01:00
Guillaume Abrioux 9423ec3eb6 adopt: fix ceph_origin and ceph_repository defaults
This is overriding those variables because the precedence at the 'block
var' level is greater than the group_vars/host_vars.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2026861

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5ea2ece99)
2021-11-30 10:57:34 +01:00
Guillaume Abrioux 53dc75d29c validate: fix bug when using vault
Since a variable encrypted with vault is no longer a string but an
encrypted object, we can't apply the `| length` filter to it directly; we
have to convert it to a string first.
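
To illustrate why the conversion matters, here is a minimal Python sketch. `EncryptedValue` and `safe_length` are hypothetical names, not Ansible's actual vault type or filter; the class only mimics an object that can be rendered as a string but has no length of its own.

```python
class EncryptedValue:
    """Hypothetical stand-in for a vault-encrypted variable (assumption)."""

    def __init__(self, plaintext):
        self._plaintext = plaintext

    def __str__(self):
        # Converting to a string yields the underlying text.
        return self._plaintext


def safe_length(value):
    """Return the length of a value after converting it to a string."""
    return len(str(value))


secret = EncryptedValue("s3cr3t")
try:
    len(secret)  # no __len__ is defined, so this raises TypeError
    direct_length_works = True
except TypeError:
    direct_length_works = False
```

In a template, the equivalent idea would be to apply a string conversion before the length check.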

Fixes: #6991

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6ad7e52869)
2021-11-29 13:42:24 +01:00
Guillaume Abrioux efc93f5669 cephadm: support adding hosts with ipv6
The current implementation doesn't support adding hosts when using ipv6
addresses.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4f2c2af9b4)
2021-11-08 10:36:27 +01:00
Guillaume Abrioux d06c856fca cephadm: use public_network when adding hosts
When adding a host, ansible_facts['default_ipv4']['address'] might not be
on the desired network; we shouldn't enforce the subnet of the
default route.
Let's use the public_network instead.
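
The selection logic can be sketched in Python as follows. This is an illustration only, not the playbook's implementation: `address_in_network` is a hypothetical helper, and the addresses in the usage note are made up.

```python
import ipaddress


def address_in_network(addresses, public_network):
    """Return the first address that belongs to public_network, or None."""
    network = ipaddress.ip_network(public_network)
    for candidate in addresses:
        ip = ipaddress.ip_address(candidate)
        # Compare versions first so IPv4 and IPv6 values are never mixed.
        if ip.version == network.version and ip in network:
            return candidate
    return None
```

For example, `address_in_network(["10.0.0.5", "192.168.1.10"], "192.168.1.0/24")` picks the second address even when the default route goes through the first one.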

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2006415

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f34531304)
2021-11-08 10:36:27 +01:00
Guillaume Abrioux 5f7ad182f9 update: move a set_fact
The ceph-facts role makes decisions based on the fact `rolling_update`, so
the set_fact must run before we run this role.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5edcc4214)
2021-11-03 11:50:38 +01:00
Guillaume Abrioux e63df909af update: support --limit on monitor nodes
Change needed in order to support --limit on mon nodes.
Otherwise, a call to `hostvars[groups[mon_group_name][0]]['_current_monitor_address']`
throws an error:

```
"The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_current_monitor_address'"
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304#c28

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 82eee4303b)
2021-11-03 08:48:51 +01:00
Guillaume Abrioux 9526425111 rolling_update: modify default health_osd_check_*
let's do more retries with a shorter delay.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 50a21d695e)
2021-10-25 20:38:09 +02:00
Guillaume Abrioux 0d1c0c2813 rolling_update: fix pre and post osd upgrade play
When using --limit osds, the plays before and after the osd upgrade are
skipped because we use `hosts: "{{ mon_group_name | default('mons') }}[0]"`.
Using `hosts: "{{ osds_group_name | default('osds') }}"` with
`delegate_to` set to the first monitor addresses this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fc9f87c45f)
2021-10-25 20:15:17 +02:00
Guillaume Abrioux 120ed2b7f3 tests: add new scenario subset_update
new scenario in order to test the subset upgrade approach using tags.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fb8a66149b)
2021-10-25 20:15:17 +02:00
Guillaume Abrioux 1019c7bf25 update: support upgrading a subset of nodes
It can be useful in a large cluster deployment to split the upgrade and
only upgrade a group of nodes at a time.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5cf9db2b0)
2021-10-25 20:15:17 +02:00
Guillaume Abrioux d73dde0fc7 adopt: fix rbd mirror adoption
The rbd mirroring is broken because cephadm doesn't bindmount /etc/ceph anymore.
It means the keyrings and ceph config file aren't available after the
migration.
The idea here is to remove the current rbd mirror peer and add it back
to the mon config store so we aren't bound to the /etc/ceph directory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967440

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9c794aa9bc)
2021-10-25 20:14:24 +02:00
Per Abildgaard Toft 4271670a83 shrink-osd: fix regression because of a wrong regex
968891f449 introduced a regression.
The regex is wrong because it doesn't allow shrinking osds with an id
greater than 9.
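
A sketch of how this kind of regression can arise, in Python. Both patterns are assumptions chosen for illustration; they are not the expressions used in the playbook, and `is_valid_osd_list` is a hypothetical helper.

```python
import re

# Hypothetical patterns for a comma-separated list of OSD ids (assumptions,
# not the expressions from the playbook).
SINGLE_DIGIT_ONLY = re.compile(r"\d(,\d)*")  # rejects ids above 9
MULTI_DIGIT = re.compile(r"\d+(,\d+)*")      # accepts ids of any length


def is_valid_osd_list(value, pattern=MULTI_DIGIT):
    """Return True when the whole string matches the given pattern."""
    return pattern.fullmatch(value) is not None
```

With the single-digit pattern, `"10"` is rejected even though it is a valid id; the multi-digit pattern accepts it while still rejecting malformed input such as `"1,,2"`.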

Fixes: #6950

Signed-off-by: Per Abildgaard Toft <per@minfejl.dk>
(cherry picked from commit 84118a3063)
2021-10-21 12:38:45 +02:00
Guillaume Abrioux c9582945fa adopt: import rgw ssl certificate into kv store
Without this, when rgw is managed by cephadm, it fails to start because
the ssl certificate isn't present in the kv store.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987010
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1988404

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 930fc4c850)
(cherry picked from commit 6e9cf80747)
2021-10-18 18:38:47 +02:00
Dimitri Savineau 4ab40842df library: make cephadm_adopt module idempotent
Running the cephadm_adopt module on an already adopted daemon will
fail because the cephadm adopt command isn't idempotent.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918424

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ff9d314305)
2021-10-18 18:38:47 +02:00
Dimitri Savineau 864acaae10 cephadm-adopt: make the playbook idempotent
If the cephadm-adopt.yml fails during the first execution and some
daemons have already been adopted by cephadm then we can't rerun
the playbook because the old container won't exist anymore.

Error: no container with name or ID ceph-mon-xxx found: no such container

If the daemons are adopted then the old systemd unit doesn't exist anymore
so any call to that unit with systemd will fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918424

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6886700a00)
2021-10-18 18:38:47 +02:00
Seena Fallah 360cfb156d cephadm: install cephadm from repository
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 5822936252)
2021-10-18 18:38:47 +02:00
Seena Fallah 5e5f45d633 cephadm-adopt: configure repository for cephadm installation
Configure the repository for cephadm installation and use package install in both
containerized and non-containerized deployments

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 339212a7c6)
2021-10-18 18:38:47 +02:00
Seena Fallah 075b1a94d5 ceph-validate: export validate repository vars as a task
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 4f6da9d92f)
2021-10-18 18:38:47 +02:00
Seena Fallah 110b08c290 ceph-common: export repository configuration to a single task
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit e79bda9a05)
2021-10-18 18:38:47 +02:00
Seena Fallah 057f8e4315 cephadm: set ssh configs at bootstrap step
Add support for ssh_user and ssh_config to the cephadm bootstrap plugin

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit ae6be71b08)
2021-10-15 15:13:18 +02:00
Guillaume Abrioux 21a4c16b06 shrink-osd: check osd id format
This adds a check early in order to ensure the format of osd ids passed
is correct.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2005734

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 968891f449)
2021-10-15 14:35:34 +02:00
Guillaume Abrioux 5e40cb8957 tests: remove all references to ceph_stable_release
this is legacy and not needed anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f277a39dfe)
2021-10-02 15:50:24 +02:00
Seena Fallah 59c7238741 ceph-defaults: set ceph_stable_release default to the stable branch release
ceph_stable_release is a legacy from the time when a single branch of ceph-ansible supported more than one release of ceph

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit fb99626987)
2021-10-02 15:50:24 +02:00
Francesco Pantano 2e93c80f73 Add ceph_nfs_adopt tag to the cephadm-adopt playbook
There are existing OpenStack scenarios where nfs is still not managed
by cephadm. For this reason, it is sometimes useful to skip the nfs part of
the adoption playbook and leave this daemon unmanaged.
The purpose of this patch is to provide a tag that lets OpenStack
operators skip this playbook section.

Closes: https://bugzilla.redhat.com/2009212
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit b7299f258b)
2021-10-01 23:32:47 +02:00
Seena Fallah d2da6f8974 cephadm: use cephadm_ssh_user for ssh user
Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 0b78faa723)
2021-10-01 23:32:16 +02:00
Guillaume Abrioux 16e41d3a81 tests: add osd node in collocation
we update the pool size from 1 to 2 in idempotency test
but only 1 node is available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b6c470c7e2)
2021-09-30 18:30:54 +02:00
Guillaume Abrioux f6fc6dcf7e tests: set rgw_instances in collect-logs.yml
in order to gather rgw logs, we need rgw_instances to be set.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c2e46fe5a5)
2021-09-30 17:53:01 +02:00
Guillaume Abrioux 77f8d7dfaa tests: update collect-logs.yml playbook
- change `ceph -s` output to json-pretty.
- gather rgw logs
- add `health detail` command

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b2ccc7234a)
2021-09-30 17:48:35 +02:00
Guillaume Abrioux 4682334924 tests: move collect-logs.yml to ceph-ansible repo
related ceph-build PR: ceph/ceph-build#1914

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 702564518b)
2021-09-29 16:41:28 +02:00
Alex Lambert de17b232e6 dashboard: allow disabling of unused features
Unconfigured dashboard features can lead to empty tabs in the dashboard
containing no meaningful content. Allow users to disable dashboard features
they know will not be used.

A list of features to be disabled allows the user to define a streamlined
dashboard as standard across deployments. Defaults to disabling no features,
ensuring that users are sure they do not need the dashboard feature before
disabling it.

Signed-off-by: Alex Lambert <lamberta@microsoft.com>
(cherry picked from commit a9680ab17f)
2021-09-29 14:28:26 +02:00
Guillaume Abrioux 5904a25684 tests: fix container-cephadm job
add missing variable `containerized_deployment` in group_vars

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 66f3eb377c)
2021-09-29 09:58:14 +02:00
Guillaume Abrioux da10c22500 cephadm-adopt: add no_log: true
Let's add a `no_log: true` on the `cephadm registry-login` task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0a3b916ee7)
2021-09-28 21:15:29 +02:00
Guillaume Abrioux 276b9fd49e adopt: stop iscsi services in the first place
If old containers are still running, they can leave the tcmu-runner process
unable to open devices, and the only remedy is to restart
the container.

Also, as per discussion with iscsi experts, iscsi should be migrated before
OSDs. (the client should be closed before the server)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000412

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d12efa1ab4)
2021-09-28 18:47:02 +02:00
Seena Fallah 25e078f685 purge: add remove_docker tag
This can help to skip docker removal tasks

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit ff39c8d70b)
2021-09-14 20:49:55 +02:00
Seena Fallah eef429a75b cephadm-adopt: use cephadm_ssh_user for ssh user
Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 67389d08d4)
2021-09-14 20:49:33 +02:00
Daniel Pivonka c8cadaa154 cephadm-adopt: set cephadm registry login info
registry login info needs to be stored in cluster for cephadm and future hosts

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000103
Signed-off-by: Daniel Pivonka <dpivonka@redhat.com>
(cherry picked from commit 1c50dc29cf)
2021-09-13 16:18:53 +02:00
Seena Fallah c8841cdf41 purge: add container_binary needed for zap osds
`container_binary` isn't set anymore in the purge osd play because of a
regression introduced by 60aa70a.
The CI didn't catch it because the play purging node-exporter sets this
variable for all nodes before we run the purge osd play.

This commit fixes this regression.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit a51ce767ca)
2021-09-09 14:40:42 +02:00
Dimitri Savineau 380d25a752 ceph-defaults: set quay.io as the default registry
Because the ceph container images are now only pushed to the quay.io
registry, this updates the default registry value.
The docker.io registry can still be used but doesn't receive updated
container images.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e7b43c1fc6)
2021-09-09 13:43:02 +02:00
Dimitri Savineau befe57d017 purge-dashboard: remove cid files
This adds the service cid file cleanup as supported in the classic purge
playbook since b9dd253

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cddc23f511)
2021-09-08 12:05:25 -04:00
Seena Fallah 688a673c48 ceph-container-engine: allow override container_package_name and container_service_name
Only include specific variables when they are undefined

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 95bce32270)
2021-09-08 15:35:19 +02:00
Dimitri Savineau d054864366 tests/rgw: use json format output for user info
If the radosgw user already exists, we need the output in json
format because we load it with json.loads().
Otherwise we get a pytest failure like:

```console
self = <json.decoder.JSONDecoder object at 0x7fa2f00a5fd0>, s = '', idx = 0

    def raw_decode(self, s, idx=0):
        """Decode a JSON document from ``s`` (a ``str`` beginning with
        a JSON document) and return a 2-tuple of the Python
        representation and the index in ``s`` where the document ended.

        This can be used to decode a JSON document from a string that may
        have extraneous data at the end.

        """
        try:
            obj, end = self.scan_once(s, idx)
        except StopIteration as err:
>           raise JSONDecodeError("Expecting value", s, err.value) from None
E           json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
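
The failure is easy to reproduce: `json.loads('')` raises `JSONDecodeError` because an empty string is not a JSON document. A minimal sketch follows; `parse_user_info` is a hypothetical helper and the `user_id` key in the usage note is only illustrative.

```python
import json


def parse_user_info(output):
    """Parse command output as JSON; return None when it is not valid JSON."""
    try:
        return json.loads(output)
    except json.JSONDecodeError:
        return None
```

`parse_user_info("")` returns `None`, while `parse_user_info('{"user_id": "test"}')` returns the decoded dictionary.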

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f2bd8ae70f)
2021-08-27 14:40:32 -04:00
Dimitri Savineau dc4b8445fa tests/rgw: add timeout 5s to radosgw-admin command
If the radosgw daemons aren't up and running correctly (for example, not
registered in the servicemap, or the OSDs are down) then the radosgw-admin
command will hang forever.
Jenkins will kill the jobs after 3h but we don't want to wait until this global
timeout.
Adding the timeout 5 command to the radosgw-admin commands (which is already
present on other ceph calls) allows the job to fail earlier.
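
The same fail-early idea can be sketched in Python. Note the swap: the change above prefixes the command with the `timeout 5` utility, whereas this sketch uses the `timeout` argument of `subprocess.run`. `run_with_timeout` is a hypothetical helper, and the sleeping child process merely stands in for a hanging command.

```python
import subprocess
import sys


def run_with_timeout(command, seconds):
    """Run a command and report whether it finished before the timeout."""
    try:
        subprocess.run(command, timeout=seconds, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return False


# Stand-in for a hanging command: a Python child process that sleeps.
hanging = [sys.executable, "-c", "import time; time.sleep(5)"]
```

Calling `run_with_timeout(hanging, 0.5)` returns `False` after about half a second instead of waiting for the child to finish.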

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f01ae82eec)
2021-08-27 14:40:32 -04:00
Dimitri Savineau 6baa6e6b84 container: explicitly pull monitoring images
We don't pull the monitoring container images (alertmanager, prometheus,
node-exporter and grafana) in a dedicated task like we're doing for the
ceph container image.
This means that the container image pull is done during the start of the
systemd service.
As a result, pulling the image from behind a proxy doesn't work with podman.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1995574

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5bb7240f87)
2021-08-23 16:08:16 -04:00
Guillaume Abrioux 6892e02a30 iscsi: don't set default value for trusted_ip_list
trusted_ip_list restricts access to the iSCSI API.
It can be left empty if the API isn't going to be accessed from outside the
gateway node.

Even though this seems to be a limited use case, it's better to leave it
empty by default than having a meaningless default value.

We could make this variable mandatory but that would be a breaking
change. Let's just add logic to the template in order to set this
variable in the configuration file only if it was specified by users.
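
The conditional rendering can be sketched in Python as follows. This is not the actual template: `render_gateway_config` is a hypothetical function, and the `[config]` section header and comma-separated value format are assumptions about the file layout.

```python
def render_gateway_config(trusted_ip_list=None):
    """Build config lines, adding trusted_ip_list only when it is set."""
    lines = ["[config]"]
    if trusted_ip_list:
        # Emit the option only for a non-empty, user-supplied value.
        lines.append("trusted_ip_list = " + ",".join(trusted_ip_list))
    return "\n".join(lines)
```

With no argument, the option is omitted entirely rather than written with a placeholder value.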

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1994930

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6802b8dddd)
2021-08-19 12:06:50 -04:00
Guillaume Abrioux afe442a18f containers: introduce target systemd unit
This adds ceph-*.target systemd unit files support for containerized
deployments.
This also fixes a regression introduced by PR #6719 (rgw and nfs systemd
units not getting purged)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 09ef465f62)
2021-08-18 13:42:56 -04:00
Guillaume Abrioux e7d9d0a7d4 roles: remove leftover from pr #4319
pr #4319 introduced some useless `become: true` on systemd tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1db8fa8989)
2021-08-18 11:08:28 -04:00
Guillaume Abrioux da54ea555e Vagrantfile: fallback on 'vagrant_variables.yml.sample'
When using a vagrant command from the root directory of the repo, it
throws an error if no 'vagrant_variables.yml' file is present.

```
Message: Errno::ENOENT: No such file or directory @ rb_sysopen - /home/guits/workspaces/ceph-ansible/vagrant_variables.yml
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3d27f9e7dc)
2021-08-18 11:08:06 -04:00
Guillaume Abrioux 492c2b5389 update: gather facts only one time
this play doesn't need to gather facts from localhost

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c14e9114ba)
2021-08-17 15:31:41 -04:00
Dimitri Savineau a6b6706fdb ceph-mon: do not log monitor keyring
We don't want to display the keyring in the ansible log.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e44075abd6)
2021-08-12 13:31:00 +02:00