Since 15ed9ee the ceph-mgr daemon binds on the IP address on the public
network instead of binding on all addresses.
This commit updates the testinfra code to reflect that change.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Unregister marks generates warnings like:
PytestUnknownMarkWarning: Unknown pytest.mark.docker - is this a typo?
You can register custom marks to avoid this warning
https://docs.pytest.org/en/latest/mark.html
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We don't use the registry name when using the community dashboard
container images (grafana, prometheus, alertmanager & node exporter).
This commit adds the docker.io registry explicitly in the default
dashboard container image name values.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.
[1] registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We are planning to release updated grafana image for ceph dashboard in
RHCS 4.1. We need to update the rhcs edut to point to the new image
then.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786107
Signed-off-by: Boris Ranto <branto@redhat.com>
This option has been deprecated (As of 0.51).
By the way, ceph-ansible already sets the
auth_{service,client,cluster}_required variables.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When using the radosgw multi instances configuration then the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since [1] we can't use osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size cli.
[1] https://github.com/ceph/ceph/commit/21508bd
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit make that task retrying 5 times to start the service
firewalld to avoid failure like following:
```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020 08:58:48 +0000 (0:00:00.963) 0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
msg: |-
Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
Failed to execute operation: Connection reset by peer
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Instead of volumes as a static string the openstack_cinder_pool.name
variable should be used as with the other keys.
Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>
In the Demo part of the Document, for both Vagrant and Bare metal the description was mentioned as "Deployment from scratch on bare metal machines".
Changed "bare metal" to "vagrant" for Vagrant section
Signed-off-by: Nizamudeen <nia@redhat.com>
Sometimes, these task can timeout for some reason.
Adding these retries can help to avoid unexcepted failures.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We only disable the ceph-osd services but not the ceph-volume lvm
services during the filestore to bluestore migration.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
There's no need to run this part of the role when upgrading clients
node. Let's skip it when rolling_update.yml is being run.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.
The rgw hosts now need to be grouped into zones
and realms in the inventory.
.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.
Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands
remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
This commit changes the value passed for the attribute 'rule_name' in
openstack_pools definition. It doesn't make sense to have emptry string
as passed value here.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds condition in order to not try to customize pools size
when its type is erasure.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit makes the CI testing an OSD pool erasure creation due to the
recent refact of the OSD pool creation tasks in the playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds the pg autoscaler support.
The structure for pool definition has now two additional attributes
`pg_autoscale_mode` and `target_size_ratio`, eg:
```
test:
name: "test"
pg_num: "{{ osd_pool_default_pg_num }}"
pgp_num: "{{ osd_pool_default_pg_num }}"
rule_name: "replicated_rule"
application: "rbd"
type: 1
erasure_profile: ""
expected_num_objects: ""
size: "{{ osd_pool_default_size }}"
min_size: "{{ osd_pool_default_min_size }}"
pg_autoscale_mode: False
target_size_ratio": 0.1
```
when `pg_autoscale_mode` is `True` user has to set a decent value in
`target_size_ratio`.
Given that it's a new feature, it's still disabled by default.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Currently, the command executed is wrong, eg:
```
cmd:
- podman
- exec
- ceph-mon-controller-0
- ceph
- --cluster
- ceph
- osd
- pool
- create
- volumes
- '32'
- '32'
- replicated_rule
- '1'
delta: '0:00:01.625525'
end: '2020-02-27 16:41:05.232705'
item:
```
From documentation, the osd pool creation command is :
```
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
[crush-rule-name] [expected-num-objects]
ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \
[erasure-code-profile] [crush-rule-name] [expected_num_objects]
```
it means we pass '1' (from item.type) as value for
`expected_num_objects` by default which is very likely not what we want.
Also, this commit modifies the default value when no `rule_name` is set
to use the existing variable `osd_pool_default_crush_rule`
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Looks like we are still seeing issue [1].
Let's increase this value to unlock the CI (however, it still needs to
be investigated).
Typical error (see [1] for further details) :
```
[root@osd2 ~]# ceph-volume --cluster ceph lvm batch --filestore --yes --journal-size '2048' /dev/sda /dev/sdb --journal-devices /dev/sdc
Running command: /sbin/vgcreate --force --yes ceph-journals-817ef90b-77ac-4f52-b8a9-30893849fb78 /dev/sdc
stdout: Physical volume "/dev/sdc" successfully created.
stdout: Volume group "ceph-journals-817ef90b-77ac-4f52-b8a9-30893849fb78" successfully created
--> Refusing to continue with configured size for journal
--> RuntimeError: journal sizes must be larger than 2GB, detected: 1024.00 MB
```
[1] https://tracker.ceph.com/issues/41374
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We should add retry/delay to check the presence of the rbdmirror daemon
in the cluster status because the status takes some time to be updated.
Also the metadata.hostname isn't a good key to check because it doesn't
reflect the ansible_hostname fact. We should use metadata.id instead.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This playbook was using mds systemd condition.
Also a command task was using pipeline which is not allowed.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The shrink scenarios don't need the docker variables (except for OSD).
Removing pytest for shrink-mgr.
Adding environment variables for xxx_to_kill ansible variable.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ceph-facts are running on localhost so if this node is using a
different OS/release that the ceph node we can have a mismatch between
docker/podman container binary.
This commit also reduces the scope of the ceph-facts role because we only
need the container_binary tasks.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
If the user provides manually the key value for a specific keyring then
there's not valation on the content which could lead to unexpected
failures in the ceph_key module.
Closes: #5104
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
It looks like that the service module doesn't support wildcard anymore
for stopping/disabling multiple services.
fatal: [rgw0]: FAILED! => changed=false
msg: 'This module does not currently support using glob patterns,
found ''*'' in service name: ceph-radosgw@*'
...ignoring
Instead we should iterate over the rgw_instances list.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using the firewalld ansible module we need to be sure that the
python bindings are installed.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since ansible 2.9 the firewalld task could not be used with service and
source in the same time anymore.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When running environment with OSDs having ID with more than 2 digits,
some tasks don't match the system units and therefore, playbook can fail.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>