Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and MDS and then runs shrink-mds.yml to test it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 324b3b4a6c)
# Conflicts:
# tox.ini
gateway_ip_list is depreciated and is only used when using the old
ceph-iscsi-config/cli packages that are no longer being developed
(GH repos are archived). Because ceph-iscsi-config/cli is no longer
being worked on, this modifies the tests to stress the ceph-iscsi
based installs.
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 1e64efc2f0)
The gateway_ip_list is not used in container setups, so drop it
for that case.
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit b7b2213be1)
- clean some leftover.
- move nfs_ganesha_[stable|dev] in group_vars so dev_setup.yml can modify them.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 45041f52fd)
Add back the nfs-ganesha deployment testing which was removed because of
broken dependencies.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 013ae62177)
this commit bring back the nfs-ganesha testing in containerized
deployment.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9201674b5b)
The ceph restapi configuration was only available until Luminous
release so we don't need those leftovers for nautilus+.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit da8b7ab7fb)
The definitions of cephfs pools should match openstack pools.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Co-Authored-by: Simone Caronni <simone.caronni@teralytics.net>
(cherry picked from commit 67071c3169)
CI is facing issues where docker pull reach the timeout, let's increase
this to avoid CI failures.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1019e3b3dc)
if we don't assign the rbd application tag on this pool,
the cluster will get `HEALTH_WARN` state like following:
```
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
application not enabled on pool 'rbd'
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4cf17a6fdd)
Few fixes on systemd unit templates for node_exporter and
alertmanager container parameters.
Added the ability to use a dedicated instance to deploy the
dashboard components (prometheus and grafana).
This commit also introduces the grafana_group_name variable
to refer grafana group and keep consistency with the other
groups.
During the integration with TripleO some grafana/prometheus
template variables resulted undefined. This commit adds the
ability to check if the group exist and create, accordingly,
different job groups in prometheus template.
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 069076bbfd)
By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these:
```
[DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable,
this behaviour will go away and you might need to add |bool to the expression
in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This
feature will be removed in version 2.12. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
```
Now appended ``| bool`` on a lot of the affected variables.
Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` *(with spaces at the pipe)*.
Closes: #4022
Signed-off-by: L3D <l3d@c3woc.de>
(cherry picked from commit ab54fe20ec)
the rhel8 image used is an outdated beta version, it is not worth it to
maintain this image upstream, since it's possible to test podman with a
newer version of centos/atomic-host image.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a78fb209b1)
In order to support ansible 2.8 with testinfra we need to use the
latest release (3.0.x).
Adding ssh-config option to py.test.
Also bumping the pytest and xdist version.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit de147469d7)
This commit add a new scenario to test the dashboard deployment via
ceph-ansible.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 17634fc3df)
c907ec41ae introduced a typo.
This commit fixes it.
```
[WARNING]: While constructing a mapping from /home/guits/ceph-ansible/tests/functional/dev_setup.yml, line 21, column 9, found a duplicate dict key (replace).
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2798774e96)
The current lvm_osds only tests filestore on one OSD node.
We also have bs_lvm_osds to test bluestore and encryption.
Let's use only one scenario to test filestore/bluestore and with or
without dmcrypt on four OSD nodes.
Also use validate_dmcrypt_bool_value instead of types.boolean on
dmcrypt validation via notario.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 52b9f3fb28)
Add a playbook that deploys manager on a new node and adds that node to
the already deployed Ceph cluster.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit d2cfd8b780)
Add a tox scenario that adds a new RGW node as a part of already
deployed Ceph cluster and deploys RGW there.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit f201222447)
Conflicts:
tox.ini
replaced "dev" and "nautilus" during cherry-pick.
Add a tox scenario that adds a new RBD mirror node as a part of already
deployed Ceph cluster and deploys RBD mirror there.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 221b2b4988)
Conflicts:
tox.ini
"dev" was to replaced by "nautilus" in "envlist"
Keywords requiring only one item shouldn't express it by creating a
list with single item.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 739a662c80)
Conflicts:
roles/ceph-mon/tasks/ceph_keys.yml
roles/ceph-validate/tasks/check_devices.yml
Instead of creating a dedicated test and using the same testinfra
module we can group them into a single test to avoid multiple ansible
connections and testinfra module execution.
This patch also adds parametrize pytest decorator when possible.
Finally fixing some flake minor issue.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 564ec9c992)
This test deploys a luminous cluster with ceph-disk created osds
and then upgrades to nautilus and migrates those osds to ceph-volume.
The nodes are then rebooted and cluster state verified.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 399a821439)
Since nautilus and msgr2 the monitors also bind on port 3300 in
addition of 6789.
This patch updates test_mons to reflect that change.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c84a74592a)
Since there's only only scenario available we don't need lvm_scenario
and no_lvm_scenario.
Also add missing assert for ceph-volume tests.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f601549a8a)
Add a playbook that deploys a new monitor on a new node, adds that node
to the Ceph cluster and the monitor to the quorum and updates the ceph
configuration file on OSD nodes.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit d5967af7fb)
this test is related to ceph-disk which is dropped as of stable-4.0
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 83e84c6a4a)
As of stable-4.0, the only valid scenario is `lvm`.
Thus, this makes this variable useless.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4d35e9eeed)
It's usefull to have logs in debug mode enabled in order to have
more information for developpers.
Also reindent to json file.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d25af1b872)
We don't need to have multiple ceph-override.json copies. We
currently already have symlink to all_daemons/ceph-override.json so
we can do it for all scenarios.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a19054be18)
ceph-volume didn't work when the devices where passed by path.
Since it now support it, let's allow this feature in ceph-ansible
Closes: #3812
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7e0adca7a4)
Add a tox scenario that adds an new MDS node as a part of already
deployed Ceph cluster and deploys MDS there.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit c0dfa9b61a)
On stable-4.0 branch we don't want to use dev setup but stable
release (nautilus).
Also update the container image tag to reflect this change.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
os.path.join adds the separator (i.e. '/') between the provided path
components only if needed. Providing a single path component doesn't
lead to any checks.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
otherwise a bunch of jobs will fail like following:
```
[WARNING]: Unable to parse /home/jenkins-build/build/workspace/ceph-ansible-nightly-luminous-ubuntu-container-stable-3.2-bluestore_lvm_osds/tests/functional/bs-lvm-osds/container/hosts-ubuntu as an inventory source
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`test_mon_host_line_has_correct_value()` will cover this test in
anycase. It doesn't worth to have a dedicated test for this.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
nfs-ganesha v2.5 and 2.6 have hit EOL. Install nfs-ganesha v2.7
stable that is currently being maintained.
Signed-off-by: Ramana Raja <rraja@redhat.com>
using `!` mark in tox.ini doesn't work on comma separated list.
The idea here is to skip all containerized scenario in dev_setup.yml and
use the `!` for the update scenario.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
fix the wrong path used in various rgw testinfra tests.
set `1` as default value for `radosgw_num_instances`: if
`ansible_vars.get(radosgw_num_instances)` returns `None`, we can assume
there's only 1 instance since it's the default value in ceph-defaults.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
since we now set this variable in inventory host, the regexp needs to be
updated, the assignment operator is `=`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit reorganizes the testing directory layout.
The idea is to have more consistency with the names of scenario and
their corresponding path, eg: non-container vs. container: each scenario
has a subdirectory for container deployment.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
this commit refacts the way the environment are named by adding a factor
`{non_container,container}`. This will avoid a lot of duplicate
definition in tox.ini and bring kind of consistency.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
With this, we could have multiple rgw instances on a single host
with a single run, don't have to use rgw-standalone.yml which does not
seems able to bind ports separately.
If you want to have multiple rgw instances, just change 'radosgw_instances'
to the number you want, which defaults to 1.
Not compatible with Multi-Site yet.
Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
to avoid duplicating code in `site.yml.sample`, `site-docker.yml.sample`
and `setup.yml`, let's isolate this part of the code and simply include
it each time we need it.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since msgr2 changes got merged, the OSDs in master (to be nautilus) will
double the amount of ports they listen to.
Signed-off-by: Alfredo Deza <adeza@redhat.com>
Based on https://github.com/ceph/ceph-container/pull/1269 and given
there are no stable packages and reliable repository, we disable nfs
ganesha temporarly.
Signed-off-by: Sébastien Han <seb@redhat.com>
Since we now collocated mgrs and mons on the same machine we have to
remove the mgrs section, they are not needed anymore.
Signed-off-by: Sébastien Han <seb@redhat.com>
This will speed up the deployment and also deploy mon and mgr collocated
just as recommended.
This won't prevent you of adding more and dedicaded machines for mgr if
needed.
Signed-off-by: Sébastien Han <seb@redhat.com>
bring the recent refact about `osd_pool_default_pg_num` and
`osd_pool_default_size` into podman scenario as well.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since we are now testing on docker and podman our functionnal tests must
reflect that. So now, if we detect the podman binary we will use it,
otherwise we default to docker.
Signed-off-by: Sébastien Han <seb@redhat.com>
We run an initial deployment with `osd_pool_default_size: 1` in
`ceph_conf_overrides`.
When re-running the playbook to test idempotency and handlers, we reset
`ceph_conf_overrides`, we must append a new value instead of just
overwritting it, otherwise, this can lead to error in the CI.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specifc group of nodes, it
will fail when trying to access this variables since it wouldn't be
defined.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
setting this setting to 1 makes the CI covering the related code in the
playbook without breaking the upgrade scenarios.
Those scenarios were broken because there is a check `TASK [waiting for
clean pgs...]` in rolling_update.yml, since the pool size for
`cephfs_metadata` and `cephfs_data` are updated to `2` in
`ceph-override.json` and there is not enough osd to honor this size,
some PGs are degraded and make the mentioned check failing.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
** configuration seems to be for filestore:
[ERROR]: [ceph-osd0] Validation failed for variable: lvm_volumes
** Removing `radosgw_interface: eth1` to resolve:
The task includes an option with an undefined variable. The error was:
'ansible.vars.hostvars.HostVarsVars object' has no attribute
u'ansible_eth1'
The error appears to have been in
'/home/nwatkins/src/ceph-ansible/roles/ceph-defaults/tasks/set_radosgw_address.yml':
line 21, column 5, but may be elsewhere in the file depending on the
exact syntax problem.
The offending line appears to be:
- name: set_fact _radosgw_address to radosgw_interface - ipv4
^ here
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Adding more memory to VMs for rgw_multisite scenarios could avoid this error
I have recently hit in the CI:
(It is worth it to set 1024Mb since there is only 2 nodes in those
scenarios.)
```
fatal: [osd0]: FAILED! => {
"changed": false,
"cmd": [
"docker",
"run",
"--rm",
"--entrypoint",
"/usr/bin/ceph",
"docker.io/ceph/daemon:latest-luminous",
"--version"
],
"delta": "0:00:04.799084",
"end": "2018-10-29 17:10:39.136602",
"rc": 1,
"start": "2018-10-29 17:10:34.337518"
}
STDERR:
Traceback (most recent call last):
File "/usr/bin/ceph", line 125, in <module>
import rados
ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a playbook that will upload a file on the master then try to get
info from the secondary node, this way we can check if the replication
is ok.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This will setup 2 cluster with rgw multisite enabled.
First cluster will act as the 'master', the 2nd will be the secondary
one.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
ceph-disk is now deprecated in ceph-ansible so let's convert all the ci
tests to use lvm instead of ceph-disk.
Signed-off-by: Sébastien Han <seb@redhat.com>
Since we are removing the ceph-disk test from the ci in master then
there is no need to have the functionnal tests in master anymore.
Signed-off-by: Sébastien Han <seb@redhat.com>
since we set `configure_firewall: true` in
`ceph-defaults/defaults/main.yml` there is no need to explicitly set it
in `centos7_cluster` and `docker_cluster` testing scenarios.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This approach doesn't work with all scenarios because it's comparing a
local OSD number expected to a global OSD number found in the whole
cluster.
This reverts commit b8ad35ceb9.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Let's get the osd tree from mons instead on osds.
This way we don't have to predict an OSD container name.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Three fixes:
- fix a typo in vagrant_variables that cause a networking issue for
containerized scenario.
- add containerized_deployment: true
- remove a useless block of code: the fact docker_exec_cmd is set in
ceph-defaults which is played right after.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Adding testing scenarios for day-2-operation playbook.
Steps:
- deploys a cluster,
- run testinfra,
- test idempotency,
- add a new osd node,
- run testinfra
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As of now, we should no longer support Jewel in ceph-ansible.
The latest ceph-ansible release supporting Jewel is `stable-3.1`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>