This commit fixes a bug when trying to scale out osd nodes with
`crush_rule_config` is enabled.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1822599
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4bcc52cb2a)
We were not testing the right ansible_distribution fact value for RHEL
distribution.
This commit also updates the minial RHEL version supported by RHCS.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5de74fe512)
Since 15ed9ee the ceph-mgr daemon binds on the IP address on the public
network instead of binding on all addresses.
This commit updates the testinfra code to reflect that change.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0f0a14772c)
Unregister marks generates warnings like:
PytestUnknownMarkWarning: Unknown pytest.mark.docker - is this a typo?
You can register custom marks to avoid this warning
https://docs.pytest.org/en/latest/mark.html
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ac4f8763aa)
The workflow in this playbook should be the same than in rolling_update,
we should first set noout and nodeep-scrub flags before migrating the
first osd and unset osd flags after the last osd is migrated.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cfaa056e0)
When setting/unsetting osd flags, we can use `tasks_from` when importing
`ceph-facts` role to save some times given that we only need this role
for setting `container_binary`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6df7887f87)
The dashboard SSO mgr module requires the saml python library to be
installed. This is only a valid scenario for RHCS deployment because
the saml python library isn't available in other classic repositories.
This package is present in RHCS Tools repository so we also need to
enable it on the mgr nodes.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1820233
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6617d90733)
Fetch the key when it is present in the cluster but not on the node.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ccfa249919)
553584cbd0 introduced a regression when no
secret is passed, it overwrites the secret each time the task is run.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 003defec03)
The add-mgrs scenario is still using the testinfra command instead of
pytest so the tests exectution are failling.
ERROR: InvocationError for command could not find executable testinfra
This also adds the missing --ssh-config option to testinfra.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 92f538f1af)
The current centos/8 vagrant image (libvirt) is still using the
CentOS 8.0 release (1905) while the 8.1 release (1911) is already
available since few months.
Using an update CentOS 8 release fixes slow ceph-volume/lvm commands.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6264f6979e)
We must call `ceph-osd` role from `container_options_facts.yml` because
ceph-osd-run.sh.j2 needs variables set in this file.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1819681
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4a4f54f6ee)
With this change, the state `present` is enough to update a keyring.
If the keyring already exist, it will be updated if caps or secret
passed to the module are different.
If the keyring doen't exist, it will be created.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808367
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 553584cbd0)
When `rule_name` isn't set in `crush_rules` the osd pool creation will
fail.
This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with
the default crush rule name.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1bb9860dfd)
Creating crush rules even with no crush hierarchy configuration is a
valid scenario so we shouldn't be bound to the first task result (which
configure crush hierarchy) to be able to add new crush rules.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b0476385c)
This commits removes these two symlinks.
They were there for backward compatibility and were marked deprecated as
of stable-4.0
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9219991441)
Adding `--all` to the `systemctl list-units` command in order to get
*all* osds id on the node (including stoppped osds). Otherwise, it will
purge the cluster but there will be leftover after that.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814542
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5e7962ccf6)
as of V3.2 / octopus, the url has changed.
This commit updates the url in the playbook accordingly.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
In master, even though we are using dev repo, the value here should be closer
from the last stable released.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds el8 repo configuration since we are still missing these
packages in el8.
This is only intended to make CI passing on stable-5.0 and *must* be
reverted once these packages will be available.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When using the lvm batch ceph-volume subcommand with dedicated devices
for filestore (journal) or bluestore (db/wal) then the list of devices
is convert to a string instead of being extended via an iterable.
This was working with only one dedicated device but starting with more
then the ceph_volume module fails.
TASK [ceph-osd : use ceph-volume lvm batch to create bluestore osds] **
fatal: [xxxxxx]: FAILED! => changed=true
cmd:
- ceph-volume
- --cluster
- ceph
- lvm
- batch
- --bluestore
- --yes
- --prepare
- --osds-per-device
- '4'
- /dev/nvme2n1
- /dev/nvme3n1
- /dev/nvme4n1
- /dev/nvme5n1
- /dev/nvme6n1
- --db-devices
- /dev/nvme0n1 /dev/nvme1n1
- --report
- --format=json
msg: non-zero return code
rc: 2
stderr: |2-
stderr: lsblk: /dev/nvme0n1 /dev/nvme1n1: not a block device
stderr: error: /dev/nvme0n1 /dev/nvme1n1: No such file or directory
stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
[--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
[--journal-devices [JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
[--no-auto] [--bluestore] [--filestore]
[--report] [--yes] [--format {json,pretty}]
[--dmcrypt]
[--crush-device-class CRUSH_DEVICE_CLASS]
[--no-systemd]
[--osds-per-device OSDS_PER_DEVICE]
[--block-db-size BLOCK_DB_SIZE]
[--block-wal-size BLOCK_WAL_SIZE]
[--journal-size JOURNAL_SIZE] [--prepare]
[--osd-ids [OSD_IDS [OSD_IDS ...]]]
[DEVICES [DEVICES ...]]
ceph-volume lvm batch: error: Unable to proceed with non-existing device: /dev/nvme0n1 /dev/nvme1n1
So the dedicated device list is considered as a single string.
This commit also adds the journal_devices, block_db_devices and
wal_devices documentation to the ceph_volume module.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816713
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 760b6cd7b0)
Since Ceph Octopus is python3 only we don't need to specify the max open
files anymore with the container engine.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 64701437de)
Support for debian with RHCS has been dropped starting RHCS 4
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4ac99223b2)
RHCS 5 will be based on Ceph Octopus release and only supported on
RHEL 8.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 90ad110861)
This is no longer true, let's remove this comment given that this option
is not ignored in containerized deployments.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e551b5ba1a)
The rgw_instances_all fact is supposed to be the list of all radosgw
instances from all rgw nodes.
But the fact is always using the local rgw_instances variable so this
won't work on multiple nodes.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0487d21938)
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We don't use the registry name when using the community dashboard
container images (grafana, prometheus, alertmanager & node exporter).
This commit adds the docker.io registry explicitly in the default
dashboard container image name values.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.
[1] registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We are planning to release updated grafana image for ceph dashboard in
RHCS 4.1. We need to update the rhcs edut to point to the new image
then.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786107
Signed-off-by: Boris Ranto <branto@redhat.com>
This option has been deprecated (As of 0.51).
By the way, ceph-ansible already sets the
auth_{service,client,cluster}_required variables.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When using the radosgw multi instances configuration then the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since [1] we can't use osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size cli.
[1] https://github.com/ceph/ceph/commit/21508bd
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit make that task retrying 5 times to start the service
firewalld to avoid failure like following:
```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020 08:58:48 +0000 (0:00:00.963) 0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
msg: |-
Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
Failed to execute operation: Connection reset by peer
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Instead of volumes as a static string the openstack_cinder_pool.name
variable should be used as with the other keys.
Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>