Temporarily disable iscsigw testing for containerized deployments
because it's broken upstream on ceph@master.
non-containerized deployments use stable build for iscsigw to get around
this issue.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since it is broken at the moment with dev repos, let's test against
stable builds so the CI is unlocked.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Due to [1], ceph-volume has now a dependency on pyyaml but it's not
installed by default via the package dependency.
This patch only add the required package on non containerized
deployment and as temporary workaround for the CI.
[1] https://tracker.ceph.com/issues/46759
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
when rerunning lvm_setup.yml on existing cluster with OSDs already
deployed, it fails like following:
```
fatal: [osd0]: FAILED! => changed=false
msg: Sorry, no shrinking of data-lv2 to 0 permitted.
```
because we are asking `lvol` module to create a volume on an empty VG
with size extents = `100%FREE`.
The default behavior of `lvol` is to shrink the volume if the LV's current
size is greater than the requested size.
Given the requested size is calculated like this:
`size_requested = size_percent * this_vg['free'] / 100`
in our case, it is similar to:
`size_requested = 100 * 0 / 100` which basically means `0`
So the current LV size is well greater than the requested size which
leads the module to attempt to shrink it to 0 which isn't obviously now
allowed.
Adding `shrink: false` to the module calls fixes this issue.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit introduces a new role `ceph-crash` in order to deploy
everything needed for the ceph-crash daemon.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This adds a new playbook for deploying ceph via cephadm.
This also adds a new dedicated tox file for CI purpose.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since we only have one scenario since nautilus then we can just move
the container start command from ceph-osd-run.sh to the systemd unit
service.
As a result, the ceph-osd-run.sh.j2 template and the
ceph_osd_docker_run_script_path variable are removed.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This fixes a long standing fail in ceph-volumes lvm test suite.
Otherwise the default behaviour should not change.
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
setting attributes with empty string is a bad user input.
Also, removing `rule_name` attribute when creating a code erasure pool.
(this rule isnt intended for code erasure pool type).
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus)
were not manage during the docker to podman migration.
This adds the systemd container template of those services to a dedicated
file (systemd.yml) in order to include it in the docker2podman playbook.
This also adds the dashboard container images pull from docker to podman.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The CentOS 7 distribution could still be used be deploying ceph if
- it's a containerized deployment
- it's a non containerized deployment without the dashboard (due to
missing python3 libraries).
The ceph_stable_redhat_distro variable has been remove because we can
rely on the ansible_distribution_major_version fact instead.
The copr el8 repository configuration is only applied for CentOS 8.
The ceph-mgr-dashboard package is only installed when the
dashboard_enabled variable is set to true.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since 15ed9ee the ceph-mgr daemon binds on the IP address on the public
network instead of binding on all addresses.
This commit updates the testinfra code to reflect that change.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since [1] we can't use osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size cli.
[1] https://github.com/ceph/ceph/commit/21508bd
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.
The rgw hosts now need to be grouped into zones
and realms in the inventory.
.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.
Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands
remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
This commit changes the value passed for the attribute 'rule_name' in
openstack_pools definition. It doesn't make sense to have emptry string
as passed value here.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit makes the CI testing an OSD pool erasure creation due to the
recent refact of the OSD pool creation tasks in the playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Looks like we are still seeing issue [1].
Let's increase this value to unlock the CI (however, it still needs to
be investigated).
Typical error (see [1] for further details) :
```
[root@osd2 ~]# ceph-volume --cluster ceph lvm batch --filestore --yes --journal-size '2048' /dev/sda /dev/sdb --journal-devices /dev/sdc
Running command: /sbin/vgcreate --force --yes ceph-journals-817ef90b-77ac-4f52-b8a9-30893849fb78 /dev/sdc
stdout: Physical volume "/dev/sdc" successfully created.
stdout: Volume group "ceph-journals-817ef90b-77ac-4f52-b8a9-30893849fb78" successfully created
--> Refusing to continue with configured size for journal
--> RuntimeError: journal sizes must be larger than 2GB, detected: 1024.00 MB
```
[1] https://tracker.ceph.com/issues/41374
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since [1] if a rgw user already exists then the radosgw-admin user create
command will return an error instead of modifying the current user.
We were already doing separated tasks for create and get operation but
only for multisite configuration but it's not enough.
Instead we should do the get task first and depending on the result
execute the create.
This commit also adds missing run_once and delegate_to statement.
[1] https://github.com/ceph/ceph/commit/269e9b9
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When osd_auto_discovery is set then we need to refresh the
ansible_devices fact between after the filestore OSD purge
otherwise the devices fact won't be populated.
Also remove the gpt header on ceph_disk_osds_devices because
the devices is empty at this point for osd_auto_discovery.
Adding the bool filter when needed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We still need --destroy when using a raw device otherwise we won't be
able to recreate the lvm stack on that device with bluestore.
Running command: /usr/sbin/vgcreate -s 1G --force --yes ceph-bdc67a84-894a-4687-b43f-bcd76317580a /dev/sdd
stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
Unable to add physical volume '/dev/sdd' to volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
/dev/sdd: physical volume not initialized.
--> Was unable to complete a new OSD, will rollback changes
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792227
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The nobarrier mount flag doesn't exist anymoer on XFS in the EL 8
kernel. That's why the task wasn't working on those systems.
We can still use the other options instead of skipping the task.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Some docker commands were hardcoded in tests playbooks and some
conditions were not taking care of the containerized_deployment
variable but only the atomic fact.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit adds a new job in order to test the
filestore-to-bluestore.yml infrastructure playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Having max_mds value equals to the number of mds nodes generates a
warning in the ceph cluster status:
cluster:
id: 6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a'
health: HEALTH_WARN'
insufficient standby MDS daemons available'
(...)
services:
mds: cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}'
Let's use 2 active and 1 standby mds.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The current ceph cluster health is in warning state:
health: HEALTH_WARN
13 pool(s) have no replicas configured
2 pool(s) have non-power-of-two pg_num
Because we're using only 1 replica then we need to disable the redundancy
check.
The pool pg num should be a power of two number (like 16).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit adds a playbook to be played before we run purge playbook,
it first creates an rbd image then map an rbd device on client0 so the
purge playbook will try to unmap it.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit removes the backslash in allow command parameter, this was
needed before the ceph_key module integration.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
It doesn't make sense to test the old 3.0.x container images with
nautilus+ ceph releases.
Also disable the dashboard deployment and switch to bluestore backend.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The commit replaces the pv/vg/lv commands used with the ansible command
module by the lvg and lvol modules.
This also fixes the size of the second data LV because we were only using
50% of the remaining space instead of 100%.
With a 50G device, the result was:
- data-lv1 was 25G
- data-lv2 was 12.5G
Instead of:
- data-lv1 was 25G
- data-lv2 was 25G
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The secondary vagrant variables didn't have the grafana vm variable
set which create an vagrant error.
There was an error loading a Vagrantfile. The file being loaded
and the error message are shown below. This is usually caused by
an invalid or undefined variable.
This patch also changes the ssh-extra-args parameter to ssh-common-args
to get the same values for ssh/sftp/scp. Otherwise we can see warnings
from ansible and some tasks are failing.
[WARNING]: sftp transfer mechanism failed on [mon0]. Use ANSIBLE_DEBUG=1
to see detailed information
It also updates the ssh-common-args value for the rgw-multisite scenario
to reflect the ANSIBLE_SSH_ARGS environment variable value.
Finally changing the IP addresses due to the Vagrant refact done in the
commit 778c51a
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This was added for debugging purpose.
It's generating very large log output, let's remove this now.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since the pg_autoscaler has been enabled recently in ceph, this check
should stick to validate the requested pools are well created only.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We don't use multiple grafana nodes for the moment on the others
scenarios and I don't think this is supposed to be working.
We can often see failure on grafana on that scenario.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This reverts commit 83940e624b.
Because nfs-ganesha@master (2.9-dev) build has been fixed by [1] then
we can test nfs-ganesha in the CI for master/octopus.
[1] https://github.com/ceph/ceph-build/pull/1346
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The shrink_rgw scenario has been merge just after the PR about enable
ceph dashboard by default.
So right now the shrink_rgw scenrio doesn't have nodes in the grafana
group and fails.
We just need to set dashboard_enabled to false.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
there's no dedicated nodes for mgr, let's use monitor nodes.
The mgr0 instance spawned isn't used, so if this node is part of the
inventory for this scenario, testinfra will complain because there's no
ceph.conf on this node.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and RGW and then runs shrink-rgw.yml to test it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
nfs-ganesha repositories @ dev are broken, this commit disables the
nfs-ganesha deployment so the CI isn't stuck.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The Vagrant dashboard scenario creates a dedicated grafana node but
was not use in the ansible inventory.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Add a new functional test that deploys Ceph cluster with three nodes for
MON, OSD and RBD Mirror and, then, runs shrink-rbdmirror.yml to test it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and MGR and then runs shrink-mgr.yml to test it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and MDS and then runs shrink-mds.yml to test it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
gateway_ip_list is depreciated and is only used when using the old
ceph-iscsi-config/cli packages that are no longer being developed
(GH repos are archived). Because ceph-iscsi-config/cli is no longer
being worked on, this modifies the tests to stress the ceph-iscsi
based installs.
Signed-off-by: Mike Christie <mchristi@redhat.com>
- clean some leftover.
- move nfs_ganesha_[stable|dev] in group_vars so dev_setup.yml can modify them.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The ceph restapi configuration was only available until Luminous
release so we don't need those leftovers for nautilus+.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
CI is facing issues where docker pull reach the timeout, let's increase
this to avoid CI failures.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The definitions of cephfs pools should match openstack pools.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Co-Authored-by: Simone Caronni <simone.caronni@teralytics.net>
if we don't assign the rbd application tag on this pool,
the cluster will get `HEALTH_WARN` state like following:
```
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
application not enabled on pool 'rbd'
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Few fixes on systemd unit templates for node_exporter and
alertmanager container parameters.
Added the ability to use a dedicated instance to deploy the
dashboard components (prometheus and grafana).
This commit also introduces the grafana_group_name variable
to refer grafana group and keep consistency with the other
groups.
During the integration with TripleO some grafana/prometheus
template variables resulted undefined. This commit adds the
ability to check if the group exist and create, accordingly,
different job groups in prometheus template.
Signed-off-by: fmount <fpantano@redhat.com>
By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these:
```
[DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable,
this behaviour will go away and you might need to add |bool to the expression
in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This
feature will be removed in version 2.12. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
```
Now appended ``| bool`` on a lot of the affected variables.
Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` *(with spaces at the pipe)*.
Closes: #4022
Signed-off-by: L3D <l3d@c3woc.de>
the rhel8 image used is an outdated beta version, it is not worth it to
maintain this image upstream, since it's possible to test podman with a
newer version of centos/atomic-host image.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
In order to support ansible 2.8 with testinfra we need to use the
latest release (3.0.x).
Adding ssh-config option to py.test.
Also bumping the pytest and xdist version.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
c907ec41ae introduced a typo.
This commit fixes it.
```
[WARNING]: While constructing a mapping from /home/guits/ceph-ansible/tests/functional/dev_setup.yml, line 21, column 9, found a duplicate dict key (replace).
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The current lvm_osds only tests filestore on one OSD node.
We also have bs_lvm_osds to test bluestore and encryption.
Let's use only one scenario to test filestore/bluestore and with or
without dmcrypt on four OSD nodes.
Also use validate_dmcrypt_bool_value instead of types.boolean on
dmcrypt validation via notario.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Add a playbook that deploys manager on a new node and adds that node to
the already deployed Ceph cluster.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Add a tox scenario that adds a new RGW node as a part of already
deployed Ceph cluster and deploys RGW there.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Add a tox scenario that adds a new RBD mirror node as a part of already
deployed Ceph cluster and deploys RBD mirror there.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Instead of creating a dedicated test and using the same testinfra
module we can group them into a single test to avoid multiple ansible
connections and testinfra module execution.
This patch also adds parametrize pytest decorator when possible.
Finally fixing some flake minor issue.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This test deploys a luminous cluster with ceph-disk created osds
and then upgrades to nautilus and migrates those osds to ceph-volume.
The nodes are then rebooted and cluster state verified.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Since there's only only scenario available we don't need lvm_scenario
and no_lvm_scenario.
Also add missing assert for ceph-volume tests.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
In the CI jobs we can change the mount options of the main partition
to avoid extra operations on disk.
Adding jmespath to tests/requirements.txt due to the json_query
filter usage.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since nautilus and msgr2 the monitors also bind on port 3300 in
addition of 6789.
This patch updates test_mons to reflect that change.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>