Commit Graph

422 Commits (6f3d6967429bf08fb348e6914566d9d1b3e428f0)

Author SHA1 Message Date
Dimitri Savineau edfeb98593 tests: add mgr nodes to shrink_mon inventory
Since 306ce82 we explicitly fail when there's no mgr node preent in the
inventory.

fatal: [mon0]: FAILED! => {
    "changed": false
}

MSG:

Please add a mgr host to your inventory.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-04-02 22:02:35 +02:00
Guillaume Abrioux a0f01db800 tests: add inventory host for 4.0 upgrade job
This inventory is intended to be used in the upgrade scenario in
stable-4.0 branch.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-04 23:18:43 +01:00
Dimitri Savineau 2d2cec99fc tests: pg num should be a power of two number
This patch changes the pg_num value of the rgw pools foo and bar to be
a power of two number.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-02-17 14:52:09 -05:00
Guillaume Abrioux b7a21d94d3 tests: retry to fire up VMs on vagrant failure
Add a script to retry several times to fire up VMs to avoid vagrant
failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 1ecb3a9352)
2020-02-03 10:20:19 +01:00
Guillaume Abrioux 523a93b0e1 tests: add external_clients scenario
This commit adds a new 'external ceph clients' scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 641729357e)
2020-02-03 10:20:19 +01:00
Guillaume Abrioux ce7503a3a6 tests: add 'all_in_one' scenario
Add new scenario 'all_in_one' in order to catch more collocated related
issues.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3e7dbb4b16)
2020-01-31 11:26:40 +01:00
Guillaume Abrioux 01095f1f4c tests: add coverage on purge playbook
This commit adds a playbook to be played before we run purge playbook,
it first creates an rbd image then map an rbd device on client0 so the
purge playbook will try to unmap it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit db77fbda15)
2020-01-13 14:50:29 -05:00
Dimitri Savineau af57597df6 ceph-osd: add device class to crush rules
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ef2cb99f73)
2020-01-13 16:54:01 +01:00
Guillaume Abrioux 195c49eaa9 tests: add shrink-osd-legacy testing
This commit introduce back testing against ceph-disk deployed osds.

In stable-3.2 which is the most common version used at customers
(downstream pov), a bunch of OSDs are still deployed using ceph-disk.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-01-09 09:24:22 +01:00
Dimitri Savineau 193ce4f572 ceph-iscsi: add ceph-iscsi stable repositories
This commit adds the support of the ceph-iscsi stable repository when
use ceph_repository community instead of always using the devel
repositories.
We're still using the devel repositories for rtslib and tcmu-runner in
both cases (dev and community).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-01-08 17:47:52 +01:00
Dimitri Savineau c409d6e960 tests: add lvm-auto-discovery scenario
This adds the lvm-auto-discovery scenario.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-12-09 09:32:55 +01:00
Dimitri Savineau 825429658b tests: reduce max_mds from 3 to 2
Having max_mds value equals to the number of mds nodes generates a
warning in the ceph cluster status:

cluster:
id:     6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a'
health: HEALTH_WARN'
        insufficient standby MDS daemons available'
(...)
services:
  mds:     cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}'

Let's use 2 active and 1 standby mds.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a6d19dae2)
2019-12-04 17:57:33 -05:00
Dimitri Savineau 52bba29a7f tests: fix the size on the second data LV
The commit replaces the pv/vg/lv commands used with the ansible command
module by the lvg and lvol modules.
This also fixes the size of the second data LV because we were only using
50% of the remaining space instead of 100%.

With a 50G device, the result was:
  - data-lv1 was 25G
  - data-lv2 was 12.5G
Instead of:
  - data-lv1 was 25G
  - data-lv2 was 25G

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2c03c6fcd3)
2019-10-18 14:49:57 -04:00
Guillaume Abrioux 9dad8fc201 tests: add multimds coverage
This commit makes the all_daemons scenario deploying 3 mds in order to
cover the multimds case.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-18 00:34:48 +02:00
Dimitri Savineau 1eea339f87 Remove validate action and notario dependency
The current ceph-validate role is using both validate action and fail
module tasks to validate the ceph configuration.
The validate action is based on the notario python library. When one of
the notario validation fails then a python stack trace is reported to the
ansible task. This output isn't understandable by users.

This patch removes the validate action and the notario depencendy. The
validation is now done with only fail ansible module.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654790

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-15 18:05:16 +02:00
Dimitri Savineau 475d2b1557 tests: fix rgw multisite vagrant variables
The secondary vagrant variables didn't have the grafana vm variable
set which create an vagrant error.

There was an error loading a Vagrantfile. The file being loaded
and the error message are shown below. This is usually caused by
an invalid or undefined variable.

This patch also changes the ssh-extra-args parameter to ssh-common-args
to get the same values for ssh/sftp/scp. Otherwise we can see warnings
from ansible and some tasks are failing.

[WARNING]: sftp transfer mechanism failed on [mon0]. Use ANSIBLE_DEBUG=1
to see detailed information

It also updates the ssh-common-args value for the rgw-multisite scenario
to reflect the ANSIBLE_SSH_ARGS environment variable value.

Finally changing the IP addresses due to the Vagrant refact done in the
commit 778c51a

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 010158ff84)
2019-10-14 09:46:38 +02:00
Guillaume Abrioux d9f6b37ae6 tests: set gateway_ip_list dynamically
so we dont' have to hardcode this in the tests

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-04 07:39:41 +02:00
Dimitri Savineau ff8e3a5a2e tests: update dedidated mgr node all_daemons
5b29144 change the mgr node to a dedicated node instead of the first
monitor node.
But the change didn't update the switch-to-containers inventory which
cause this playbook to fail.
Also update the ubuntu inventory to have the same configuration.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-30 15:19:33 -04:00
Guillaume Abrioux 5b29144bbd tests: deploy mgr on a dedicated node (all_daemons scenario)
let's deploy mgr on a dedicated node.
This makes update job failing on stable-4.0 branch since there's a
mismatch between the two inventories.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-08 13:43:29 +02:00
Dimitri Savineau bf8bd4c0f1 tests: Update ooo-collocation scenario
The ooo-collocation scenario was still using an old container image and
doesn't match the requirement on latest stable-3.2 code. We need to use
at least the container image v3.2.5.
Also updating the OSD tests to reflect the changes introduced by the
commit bedc0ab because we don't have the OSD systemd unit script using
device name anymore.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-30 08:27:13 +02:00
Ramana Raja 9097f9847c Install nfs-ganesha stable v2.7
nfs-ganesha v2.5 and 2.6 have hit EOL. Install nfs-ganesha v2.7
stable that is currently being maintained.

Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit dfff89ce67)
2019-07-10 22:09:14 +02:00
Guillaume Abrioux 27aad73471 tests: add nfs-ganesha testing
This was removed because of broken repositories which made the CI
failing. That doesn't make sense anymore so adding back it

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-24 10:35:42 +02:00
Guillaume Abrioux 64659d2c82 iscsi: assign application (rbd) to pool 'rbd'
if we don't assign the rbd application tag on this pool,
the cluster will get `HEALTH_WARN` state like following:

```
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'rbd'
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4cf17a6fdd)
2019-06-13 14:43:25 +02:00
Dimitri Savineau 8a74928a19 tox: Refact lvm_osds scenario
The current lvm_osds only tests filestore on one OSD node.
We also have bs_lvm_osds to test bluestore and encryption.
Let's use only one scenario to test filestore/bluestore and with or
without dmcrypt on four OSD nodes.
Also use validate_dmcrypt_bool_value instead of types.boolean on
dmcrypt validation via notario.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 52b9f3fb28)
2019-05-10 11:24:32 +02:00
Guillaume Abrioux 5053f32c15 osds: allow passing devices by path
ceph-volume didn't work when the devices where passed by path.
Since it now support it, let's allow this feature in ceph-ansible

Closes: #3812

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8f2c45dfd3)
2019-05-09 14:21:43 +02:00
Dimitri Savineau f3785ef7dd tests: Add debug to ceph-override.json
It's usefull to have logs in debug mode enabled in order to have
more information for developpers.
Also reindent to json file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d25af1b872)
2019-04-11 15:38:14 +00:00
Dimitri Savineau e3e6285aa9 tests/functional: use ceph-override.json symlink
We don't need to have multiple ceph-override.json copies. We
currently already have symlink to all_daemons/ceph-override.json so
we can do it for all scenarios.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a19054be18)
2019-04-11 15:38:14 +00:00
Ali Maredia e943288cae rgw multisite: add more than 1 rgw to the master or secondary zone
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1664869

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 37f46a8c5d)
2019-04-06 08:50:30 +00:00
Guillaume Abrioux 68a832e3c8 tests: pin pytest-xdist to 1.27.0
looks like newer version of pytest-xdist requires pytest>=4.4.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ba0a95211c)
2019-04-04 10:36:34 +00:00
Guillaume Abrioux f200f1ca87 tests: refact update scenario (stable-3.2)
refact the update scenario like it has been made in master.
(see f0e616962)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-01 16:35:24 +02:00
Guillaume Abrioux 005cb09ba9 tests: add mgr and nfs nodes in all_daemons
even not used, we need to fire up those VMs to be able to perform the
upgrade in the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-28 15:40:43 +01:00
Guillaume Abrioux 224bab0d70 tests: add mgrs section in non_container-collocation
No mgrs are deployed in this scenario, causing the testinfra jobs to
fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-05 10:49:45 +01:00
Guillaume Abrioux 36fafadc67 tests: fix collocation scenario
ceph_origin and ceph_repository are mandatory variables.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-05 10:49:45 +01:00
Guillaume Abrioux 1209fb1874 tests: pin testinfra version
As of testinfra 2.0.0, the binary name is `py.test`.

But let's pin the version to 1.19.0.
Indeed, migrating to 2.0.0 requires our current testing to be reworked a bit.
Since we don't have the bandwidth ATM for this, it's better to simply
keep testing with testinfra 1.19.0.

Note that I've replaced all `testinfra` occurences by `py.test` anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b42250332a)
2019-03-04 15:48:44 +00:00
Guillaume Abrioux b7f5233d07 tests: add lvm bluestore dmcrypt support
Add coverage for container / non container lvm bluestore dmcrypt OSDs

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 207fae38d4)
2019-02-28 13:48:39 +00:00
Guillaume Abrioux 15b1f22ca3 tests: do not deploy iscsigw on ubuntu
not supported on non rhel based distribution

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-06 14:48:21 +01:00
Guillaume Abrioux 2738a945a3 tests: add inventory file
add missing inventory file for ubuntu-container-all_daemons job

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-06 14:48:21 +01:00
Noah Watkins ebd72708b1 test: add missing test dependency
[nwatkins@smash ceph-ansible]$ virtualenv env
[nwatkins@smash ceph-ansible]$ env/bin/pip install -r tests/requirements.txt
[nwatkins@smash ceph-ansible]$ env/bin/python -c "import mock"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'mock'

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit 8a5530ee98)
2019-02-06 00:37:11 +00:00
Guillaume Abrioux 1877e1b330 tests: run lvm_setup.yml only when osd_scenario is lvm
especially for ooo_collocation scenario which is still using ceph-disk
testing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-31 00:33:10 +01:00
Guillaume Abrioux 2abde600cd tests: add nodes for container-all_daemons scenario
add back iscsigw and rbdmirror vm in all_daemons testing

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-30 14:58:59 +01:00
Noah Watkins fc6bae26ac Fixup shrink_osd[_container] scenario config
** configuration seems to be for filestore:

[ERROR]: [ceph-osd0] Validation failed for variable: lvm_volumes

** Removing `radosgw_interface: eth1` to resolve:

The task includes an option with an undefined variable. The error was:
'ansible.vars.hostvars.HostVarsVars object' has no attribute
u'ansible_eth1'

The error appears to have been in
'/home/nwatkins/src/ceph-ansible/roles/ceph-defaults/tasks/set_radosgw_address.yml':
line 21, column 5, but may be elsewhere in the file depending on the
exact syntax problem.

The offending line appears to be:

  - name: set_fact _radosgw_address to radosgw_interface - ipv4
    ^ here

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit 50255b9640)
2019-01-30 14:58:59 +01:00
Guillaume Abrioux 299baed635 tests: refact testing in stable-3.2
Apply the same refact recently introduced in master to stable-3.2

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-30 14:58:59 +01:00
Sébastien Han 50fe56044e disable nfs scenario
The packages are broken, so let's remove it, until this solved.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a502327e52)
2018-12-04 14:39:05 +00:00
Sébastien Han fa8bd10cac test: disable nfs for containers
Based on https://github.com/ceph/ceph-container/pull/1269 and given
there are no stable packages and reliable repository, we disable nfs
ganesha temporarly.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 6c3ef90ebe)
2018-12-04 14:39:05 +00:00
Guillaume Abrioux 68b2ad11ee mon: move `osd_pool_default_pg_num` in `ceph-defaults`
`osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specifc group of nodes, it
will fail when trying to access this variables since it wouldn't be
defined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d4c0960f04)
2018-11-29 01:49:05 +00:00
Guillaume Abrioux e8dd6b8993 tests: change default pools size
default pool size in our test should be explicitly set to 1

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-29 01:49:05 +00:00
Guillaume Abrioux 30cec03ae7 tests: do not fully override previous ceph_conf_overrides
We run an initial deployment with `osd_pool_default_size: 1` in
`ceph_conf_overrides`.
When re-running the playbook to test idempotency and handlers, we reset
`ceph_conf_overrides`, we must append a new value instead of just
overwritting it, otherwise, this can lead to error in the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f290e49df8)
2018-11-29 01:49:05 +00:00
Guillaume Abrioux 133615471a tests: set pool size to 1 in ceph-override.json
setting this setting to 1 makes the CI covering the related code in the
playbook without breaking the upgrade scenarios.

Those scenarios were broken because there is a check `TASK [waiting for
clean pgs...]` in rolling_update.yml, since the pool size for
`cephfs_metadata` and `cephfs_data` are updated to `2` in
`ceph-override.json` and there is not enough osd to honor this size,
some PGs are degraded and make the mentioned check failing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3ac6619fb9)
2018-11-28 23:11:46 +01:00
Guillaume Abrioux f52344300a tests: add more memory for rgw_multsite scenarios
Adding more memory to VMs for rgw_multisite scenarios could avoid this error
I have recently hit in the CI:

(It is worth it to set 1024Mb since there is only 2 nodes in those
scenarios.)

```
fatal: [osd0]: FAILED! => {
    "changed": false,
    "cmd": [
        "docker",
        "run",
        "--rm",
        "--entrypoint",
        "/usr/bin/ceph",
        "docker.io/ceph/daemon:latest-luminous",
        "--version"
    ],
    "delta": "0:00:04.799084",
    "end": "2018-10-29 17:10:39.136602",
    "rc": 1,
    "start": "2018-10-29 17:10:34.337518"
}

STDERR:

Traceback (most recent call last):
  File "/usr/bin/ceph", line 125, in <module>
    import rados
ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 37970a5b3c tests: add rgw_multisite functional test
Add a playbook that will upload a file on the master then try to get
info from the secondary node, this way we can check if the replication
is ok.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00