shrink_osd has its own tox config file (tox-shrink_osd.ini)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6623f34679)
CentOS 8 is EOL as of December 2021.
Let's use CentOS stream 8 instead.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bc36f60e8d)
Configure repository for cephadm installation and use package install in both
containerized and non containerized deployment
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 339212a7c6)
The dashboard/monitoring stack can be deployed via the dashboard_enabled
variable. But there's nothing similar if we can to remove that part only
and keep the ceph cluster up and running.
The current purge playbooks remove everything.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8e4ef7d6da)
Since we use the rerun plugin in tox, we shouldn't need to add these
`sleep` commands.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e835c77a0e)
In order to avoid false positive in the CI that I've been unable to
reproduce.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f7fd1c2298)
Add the possibility to deploy rgw multisite configuration with a mix of
secondary and primary zones on a same rgw node.
Before that, on a same node, all instances were either primary
zones *OR* secondary.
Now you can define a rgw instance like following:
```
rgw_instances:
- instance_name: 'rgw0'
rgw_zonemaster: false
rgw_zonesecondary: true
rgw_zonegroupmaster: false
rgw_realm: 'france'
rgw_zonegroup: 'zonegroup-france'
rgw_zone: paris-00
radosgw_address: "{{ _radosgw_address }}"
radosgw_frontend_port: 8080
rgw_zone_user: jacques.chirac
rgw_zone_user_display_name: "Jacques Chirac"
system_access_key: P9Eb6S8XNyo4dtZZUUMy
system_secret_key: qqHCUtfdNnpHq3PZRHW5un9l0bEBM812Uhow0XfB
endpoint: http://192.168.101.12:8080
```
Basically it's now possible to define `rgw_zonemaster`,
`rgw_zonesecondary` and `rgw_zonegroupmaster` at the intsance
level instead of the whole node level.
Also, this commit adds an option `deploy_secondary_zones` (default True)
which can be set to `False` in order to explicitly ask the playbook to
not deploy secondary zones in case where the corresponding endpoint are
not deployed yet.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1915478
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 71a5e666e3)
When deploying the ceph OSD via the packages then the ceph-osd@.service
unit is configured as enabled-runtime.
This means that each ceph-osd service will inherit from that state.
The enabled-runtime systemd state doesn't survive after a reboot.
For non containerized deployment the OSD are still starting after a
reboot because there's the ceph-volume@.service and/or ceph-osd.target
units that are doing the job.
$ systemctl list-unit-files|egrep '^ceph-(volume|osd)'|column -t
ceph-osd@.service enabled-runtime
ceph-volume@.service enabled
ceph-osd.target enabled
When switching to containerized deployment we are stopping/disabling
ceph-osd@XX.servive, ceph-volume and ceph.target and then removing the
systemd unit files.
But the new systemd units for containerized ceph-osd service will still
inherit from ceph-osd@.service unit file.
As a consequence, if an OSD host is rebooting after the playbook execution
then the ceph-osd service won't come back because they aren't enabled at
boot.
This patch also adds a reboot and testinfra run after running the switch
to container playbook.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881288
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fa2bb3af86)
This playbook isn't needed anymore, we can achieve this operation by
running main playbook with `--limit` option.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 20718582da)
This adds an optional cephadm_adopt scenario which is based on
all_daemons.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 14eed63921)
This commit makes all jobs authenticating to docker hub in order to
avoid the rate limit.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 40307f810c)
Otherwise it might try to use the system installed version of ansible
when there's one available.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6d9acb5e6d)
The add-mgrs scenario is still using the testinfra command instead of
pytest so the tests exectution are failling.
ERROR: InvocationError for command could not find executable testinfra
This also adds the missing --ssh-config option to testinfra.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 92f538f1af)
This commits removes these two symlinks.
They were there for backward compatibility and were marked deprecated as
of stable-4.0
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9219991441)
The shrink scenarios don't need the docker variables (except for OSD).
Removing pytest for shrink-mgr.
Adding environment variables for xxx_to_kill ansible variable.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit replaces the playbook used for add_osds job given
accordingly to the add-osd.yml playbook removal
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds a playbook to be played before we run purge playbook,
it first creates an rbd image then map an rbd device on client0 so the
purge playbook will try to unmap it.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
It doesn't make sense to test the old 3.0.x container images with
nautilus+ ceph releases.
Also disable the dashboard deployment and switch to bluestore backend.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The secondary vagrant variables didn't have the grafana vm variable
set which create an vagrant error.
There was an error loading a Vagrantfile. The file being loaded
and the error message are shown below. This is usually caused by
an invalid or undefined variable.
This patch also changes the ssh-extra-args parameter to ssh-common-args
to get the same values for ssh/sftp/scp. Otherwise we can see warnings
from ansible and some tasks are failing.
[WARNING]: sftp transfer mechanism failed on [mon0]. Use ANSIBLE_DEBUG=1
to see detailed information
It also updates the ssh-common-args value for the rgw-multisite scenario
to reflect the ANSIBLE_SSH_ARGS environment variable value.
Finally changing the IP addresses due to the Vagrant refact done in the
commit 778c51a
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This change implements a filter_plugin that is used in the
ceph-facts, ceph-validate roles and infrastucture-playbooks.
The new filter plugin will return a list of all IP address
that reside in any one of the given IP ranges. The new filter
replaces the use of the ipaddr filter.
ceph.conf already support a comma separated list of CIDRs
for the public_network and cluster_network options.
Changes: [1] and [2] introduced a regression in ceph-ansible
where public_network can no longer be a comma separated list
of cidrs.
With this change a comma separated list of subnet CIDRs can
also be used for monitor_address_block and radosgw_address_block.
[1] commit: d67230b2a2
[2] commit: 20e4852888
Related-To: https://bugs.launchpad.net/tripleo/+bug/1840030
Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=1740283Closes: #4333
Please backport to stable-4.0
Signed-off-by: Harald Jensås <hjensas@redhat.com>
The ANSIBLE_CONFIG value wasn't set correctly for two scenarios. This
environment variable doesn't use '-F'.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
test switch_to_containers job against the latest ceph@master
ceph-container image tag available.
In order to be sure the ceph release deployed in the first step (non
containerized deployment) isn't newer than the tag used for the
containerized migration (which would mean we try to downgrade the
version).
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and RGW and then runs shrink-rgw.yml to test it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
this setting is not needed here since we explicitely set it for
container and non container context.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a new functional test that deploys Ceph cluster with three nodes for
MON, OSD and RBD Mirror and, then, runs shrink-rbdmirror.yml to test it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and MGR and then runs shrink-mgr.yml to test it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and MDS and then runs shrink-mds.yml to test it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
adding back a sleep 30s after nodes have rebooted before running
testinfra.
This was removed accidentally by d5be83e
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Because we're using vagrant, a ssh config file will be created for
each nodes with options like user, host, port, identity, etc...
But via tox we're override ANSIBLE_SSH_ARGS to use this file. This
remove the default value set in ansible.cfg.
Also adding PreferredAuthentications=publickey because CentOS/RHEL
servers are configured with GSSAPIAuthenticationis enabled for ssh
server forcing the client to make a PTR DNS query.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
the rhel8 image used is an outdated beta version, it is not worth it to
maintain this image upstream, since it's possible to test podman with a
newer version of centos/atomic-host image.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds `pytest-rerunfailures` in requirements.txt so we can
retry failing test in testinfra to avoid false positive. (eg: sometimes it
can happen for some reason a service takes too much time to start)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
In order to support ansible 2.8 with testinfra we need to use the
latest release (3.0.x).
Adding ssh-config option to py.test.
Also bumping the pytest and xdist version.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since a1a871c we don't need to copy the infrastructure playbooks
under the ceph-ansible root directory.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The current lvm_osds only tests filestore on one OSD node.
We also have bs_lvm_osds to test bluestore and encryption.
Let's use only one scenario to test filestore/bluestore and with or
without dmcrypt on four OSD nodes.
Also use validate_dmcrypt_bool_value instead of types.boolean on
dmcrypt validation via notario.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Add a playbook that deploys manager on a new node and adds that node to
the already deployed Ceph cluster.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Add a tox scenario that adds a new RGW node as a part of already
deployed Ceph cluster and deploys RGW there.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>