When using the lvm batch ceph-volume subcommand with dedicated devices
for bluestore (db/wal) then the list of devices is convert to a string
instead of being extended via an iterable.
This was working with only one dedicated device but starting with more
then the ceph_volume module fails.
TASK [ceph-osd : use ceph-volume lvm batch to create bluestore osds] **
fatal: [xxxxxx]: FAILED! => changed=true
cmd:
- ceph-volume
- --cluster
- ceph
- lvm
- batch
- --bluestore
- --yes
- --prepare
- --osds-per-device
- '4'
- /dev/nvme2n1
- /dev/nvme3n1
- /dev/nvme4n1
- /dev/nvme5n1
- /dev/nvme6n1
- --db-devices
- /dev/nvme0n1 /dev/nvme1n1
- --report
- --format=json
msg: non-zero return code
rc: 2
stderr: |2-
stderr: lsblk: /dev/nvme0n1 /dev/nvme1n1: not a block device
stderr: error: /dev/nvme0n1 /dev/nvme1n1: No such file or directory
stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
[--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
[--journal-devices [JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
[--no-auto] [--bluestore] [--filestore]
[--report] [--yes] [--format {json,pretty}]
[--dmcrypt]
[--crush-device-class CRUSH_DEVICE_CLASS]
[--no-systemd]
[--osds-per-device OSDS_PER_DEVICE]
[--block-db-size BLOCK_DB_SIZE]
[--block-wal-size BLOCK_WAL_SIZE]
[--journal-size JOURNAL_SIZE] [--prepare]
[--osd-ids [OSD_IDS [OSD_IDS ...]]]
[DEVICES [DEVICES ...]]
ceph-volume lvm batch: error: Unable to proceed with non-existing device: /dev/nvme0n1 /dev/nvme1n1
So the dedicated device list is considered as a single string.
This commit also adds the block_db_devices and wal_devices documentation
to the ceph_volume module.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816713
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 760b6cd7b0)
The latest Ceph stable release is now Octopus so the "latest" container
image tag is pointing to Octopus and not Nautilus anymore.
This commit updates the ceph_docker_image_tag with "latest-nautilus".
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Support for debian with RHCS has been dropped starting RHCS 4
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4ac99223b2)
This is no longer true, let's remove this comment given that this option
is not ignored in containerized deployments.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e551b5ba1a)
The rgw_instances_all fact is supposed to be the list of all radosgw
instances from all rgw nodes.
But the fact is always using the local rgw_instances variable so this
won't work on multiple nodes.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0487d21938)
In fc02fc9 the group_vars samples have been generated but only for
monitor_address variable not radosgw_address.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b46839bad0)
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fb69f6990c)
Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.
[1] registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b97a4d5201)
We are planning to release updated grafana image for ceph dashboard in
RHCS 4.1. We need to update the rhcs edut to point to the new image
then.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786107
Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit 8e8aa735e0)
This option has been deprecated (As of 0.51).
By the way, ceph-ansible already sets the
auth_{service,client,cluster}_required variables.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 152c2caa9f)
We don't need to copy the infrastructure playbooks in the root
ceph-ansible directory.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 195944b123)
This commit adds the rgw multi-instances support in ceph-handler
(restart_rgw_daemons.sh.j2)
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3626c688cf)
When using the radosgw multi instances configuration then the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e8bf0a0cf2)
This commit make that task retrying 5 times to start the service
firewalld to avoid failure like following:
```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020 08:58:48 +0000 (0:00:00.963) 0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
msg: |-
Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
Failed to execute operation: Connection reset by peer
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b3d943fe9f)
We only disable the ceph-osd services but not the ceph-volume lvm
services during the filestore to bluestore migration.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 38a683e5bf)
If the filestore configuration was using a dedicated journal with either
a partition or a LV/VG then we need to reuse this for bluestore DB.
When filestore is using a raw devices then we shouldn't destroy
everything (data + journal) but only data otherwise the journal
partition won't exist anymore.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790479
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 535da53d69)
3.4 is the latest testinfra release available but python2 is dropped
starting 4.0.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ccec67aa6a)
Sometimes, these task can timeout for some reason.
Adding these retries can help to avoid unexcepted failures.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7a8a719e75)
Update this scenario due to recent changes.
e361c0ff94 added more nodes in order to
cover code erasure pool creation in the CI. Therefore, we must add more
nodes in the 3.2 deployment too.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There's no need to run this part of the role when upgrading clients
node. Let's skip it when rolling_update.yml is being run.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eac207091b)
This commit adds more osd nodes in all_daemons scenario in order to test
erasure pool creation.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9f0c6df94f)
This commit changes the value passed for the attribute 'rule_name' in
openstack_pools definition. It doesn't make sense to have emptry string
as passed value here.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 248978596a)
This commit adds condition in order to not try to customize pools size
when its type is erasure.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e17c79b871)
This commit makes the CI testing an OSD pool erasure creation due to the
recent refact of the OSD pool creation tasks in the playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8cacba1f54)
This commit adds the pg autoscaler support.
The structure for pool definition has now two additional attributes
`pg_autoscale_mode` and `target_size_ratio`, eg:
```
test:
name: "test"
pg_num: "{{ osd_pool_default_pg_num }}"
pgp_num: "{{ osd_pool_default_pg_num }}"
rule_name: "replicated_rule"
application: "rbd"
type: 1
erasure_profile: ""
expected_num_objects: ""
size: "{{ osd_pool_default_size }}"
min_size: "{{ osd_pool_default_min_size }}"
pg_autoscale_mode: False
target_size_ratio": 0.1
```
when `pg_autoscale_mode` is `True` user has to set a decent value in
`target_size_ratio`.
Given that it's a new feature, it's still disabled by default.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47adc2bb08)
Currently, the command executed is wrong, eg:
```
cmd:
- podman
- exec
- ceph-mon-controller-0
- ceph
- --cluster
- ceph
- osd
- pool
- create
- volumes
- '32'
- '32'
- replicated_rule
- '1'
delta: '0:00:01.625525'
end: '2020-02-27 16:41:05.232705'
item:
```
From documentation, the osd pool creation command is :
```
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
[crush-rule-name] [expected-num-objects]
ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \
[erasure-code-profile] [crush-rule-name] [expected_num_objects]
```
it means we pass '1' (from item.type) as value for
`expected_num_objects` by default which is very likely not what we want.
Also, this commit modifies the default value when no `rule_name` is set
to use the existing variable `osd_pool_default_crush_rule`
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bf1f125d71)
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.
The rgw hosts now need to be grouped into zones
and realms in the inventory.
.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.
Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands
remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 71f55bd54d)
We should add retry/delay to check the presence of the rbdmirror daemon
in the cluster status because the status takes some time to be updated.
Also the metadata.hostname isn't a good key to check because it doesn't
reflect the ansible_hostname fact. We should use metadata.id instead.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d1316ce77b)
This playbook was using mds systemd condition.
Also a command task was using pipeline which is not allowed.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a664159061)
The shrink scenarios don't need the docker variables (except for OSD).
Removing pytest for shrink-mgr.
Adding environment variables for xxx_to_kill ansible variable.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2f4413f5ce)
The ceph-facts are running on localhost so if this node is using a
different OS/release that the ceph node we can have a mismatch between
docker/podman container binary.
This commit also reduces the scope of the ceph-facts role because we only
need the container_binary tasks.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 08ac2e3034)
It looks like that the service module doesn't support wildcard anymore
for stopping/disabling multiple services.
fatal: [rgw0]: FAILED! => changed=false
msg: 'This module does not currently support using glob patterns,
found ''*'' in service name: ceph-radosgw@*'
...ignoring
Instead we should iterate over the rgw_instances list.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9d3b49293d)
When using the firewalld ansible module we need to be sure that the
python bindings are installed.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 90b1fc8fe9)
Since ansible 2.9 the firewalld task could not be used with service and
source in the same time anymore.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 45fb9241c0)
It doesn't make sense to start validating configuration if the ansible
version isn't the good one.
This commit moves the check_system task as the first task in the
ceph-validate role.
The ansible version test tasks are moved at the top of this file.
Also moving the iscsi kernel tests from check_system to check_iscsi
file.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1a77dd7e91)
When running environment with OSDs having ID with more than 2 digits,
some tasks don't match the system units and therefore, playbook can fail.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a084a2a347)
We need to call ceph-facts only for setting `container_binary`.
Since this task has been isolated we can use `tasks_from` to only execute the
needed task.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 55970b18f1)
Instead of running shring-osd tasks on localhost and delegating most of
them to the first monitor, run all of them on the first monitor
directly.
This has the added advantage of becoming root on the monitor only, not
on localhost.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 8b3df4e418)
This change introduces a new set of tasks to configure the
ceph dashboard backend and listen just on the mgr related
subnet (and not on '*'). For the same reason the proper
server address is added in both prometheus and alertmanger
systemd units.
This patch also adds the "dashboard_frontend_vip" parameter
to make sure we're able to support the HA model when multiple
grafana instances are deployed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792230
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit 15ed9eebf1)