Commit Graph

377 Commits (26cd62b4e6d5c2edc567d499b0c2cb40742e88ae)

Author SHA1 Message Date
Rishabh Dave 8edbda96df use block directives to group tasks
Using block directives simplifies the playbooks and makes them more
readable.

Fixes: https://github.com/ceph/ceph-ansible/issues/2835
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-10-31 09:37:43 +01:00
Guillaume Abrioux 62c314e2ba tests: test master against ansible 2.7
Let's test ceph-ansible master against ansible 2.7 to catch any
potential issue with this ansible version early.

Closes: #3148

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 17:07:05 +01:00
Guillaume Abrioux d8d3e55006 remove restapi role
As of `mimic`, restapi is no longer available because of the manager daemon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:19:13 +01:00
Guillaume Abrioux f52344300a tests: add more memory for rgw_multisite scenarios
Adding more memory to the VMs for rgw_multisite scenarios could avoid this
error I recently hit in the CI:

(It is worth setting 1024MB since there are only 2 nodes in those
scenarios.)

```
fatal: [osd0]: FAILED! => {
    "changed": false,
    "cmd": [
        "docker",
        "run",
        "--rm",
        "--entrypoint",
        "/usr/bin/ceph",
        "docker.io/ceph/daemon:latest-luminous",
        "--version"
    ],
    "delta": "0:00:04.799084",
    "end": "2018-10-29 17:10:39.136602",
    "rc": 1,
    "start": "2018-10-29 17:10:34.337518"
}

STDERR:

Traceback (most recent call last):
  File "/usr/bin/ceph", line 125, in <module>
    import rados
ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 37970a5b3c tests: add rgw_multisite functional test
Add a playbook that uploads a file on the master, then tries to get
info about it from the secondary node; this way we can check whether
the replication is working.
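
Purely as an illustration of the check (endpoints, credentials, bucket name,
and the boto3 tooling here are hypothetical; the actual playbook may do this
differently):

```python
# Hypothetical sketch of the replication check, not the actual playbook tasks.
import boto3

master = boto3.client('s3', endpoint_url='http://rgw-master:8080',
                      aws_access_key_id='ACCESS', aws_secret_access_key='SECRET')
secondary = boto3.client('s3', endpoint_url='http://rgw-secondary:8080',
                         aws_access_key_id='ACCESS', aws_secret_access_key='SECRET')

# Upload an object through the master zone ...
master.create_bucket(Bucket='multisite-test')
master.put_object(Bucket='multisite-test', Key='probe.txt', Body=b'hello')

# ... then ask the secondary zone about it (with a retry in practice, since
# replication is asynchronous); if the metadata comes back, replication works.
secondary.head_object(Bucket='multisite-test', Key='probe.txt')
```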

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 4d464c1003 rgw: add testing scenario for rgw multisite
This will set up 2 clusters with rgw multisite enabled.
The first cluster will act as the 'master'; the second will be the
secondary one.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Sébastien Han 22aed97266 testinfra: change test osds for containers
We do not use @<device> anymore, so we don't need to perform the
readlink check anymore.

Also, we make an exception for ooo, which is still using ceph-disk.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-29 18:31:17 +01:00
Sébastien Han 1cdec4069a test_osd: dynamically get the osd container
Do not enforce the container name since this will fail when we have
multiple VMs running OSDs.
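
A minimal sketch of the idea, with illustrative names (not the actual test
code), assuming the standard testinfra `host` fixture:

```python
# Hedged sketch: discover the OSD container at runtime instead of hardcoding
# its name, so the test works on whichever OSD VM it runs against.
def _get_osd_container_id(host):
    # List running containers whose name matches "ceph-osd" and take the
    # first one; the exact suffix differs from one OSD VM to another.
    cmd = "docker ps -q --filter='name=ceph-osd'"
    return host.check_output(cmd).split('\n')[0].strip()
```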

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-29 15:33:12 +01:00
Sébastien Han 876f6ced74 test: convert all the tests to use lvm
ceph-disk is now deprecated in ceph-ansible, so let's convert all the CI
tests to use lvm instead of ceph-disk.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-29 15:33:12 +01:00
Sébastien Han 2fd7da12bb test: remove ceph-disk CI tests
Since we are removing the ceph-disk tests from the CI in master, there is
no need to keep the ceph-disk functional tests in master anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-29 15:33:12 +01:00
Rishabh Dave ee2d52d33d allow custom pool size
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1596339
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-10-22 16:00:21 +02:00
Guillaume Abrioux c47aa2e83b tests: remove unnecessary variables definition
Since we set `configure_firewall: true` in
`ceph-defaults/defaults/main.yml`, there is no need to explicitly set it
in the `centos7_cluster` and `docker_cluster` testing scenarios.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-19 15:12:45 +02:00
Guillaume Abrioux 1f9090884e Revert "tests: test `test_all_docker_osds_are_up_and_in()` from mon nodes"
This approach doesn't work with all scenarios because it compares an
expected local OSD count with the global OSD count found in the whole
cluster.

This reverts commit b8ad35ceb9.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-19 00:12:43 +00:00
Guillaume Abrioux cb35cac926 tests: set configure_firewall: true in centos7|docker_cluster
This way the CI will cover this part of the code.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-19 00:12:43 +00:00
Guillaume Abrioux b8ad35ceb9 tests: test `test_all_docker_osds_are_up_and_in()` from mon nodes
Let's get the osd tree from the mons instead of the osds.
This way we don't have to predict an OSD container name.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-17 17:07:25 +02:00
Guillaume Abrioux b8418ebd17 add-osds: followup on 3632b26
Three fixes:

- fix a typo in vagrant_variables that causes a networking issue for the
containerized scenario.
- add `containerized_deployment: true`.
- remove a useless block of code: the fact docker_exec_cmd is set in
ceph-defaults, which is played right after.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-17 17:07:25 +02:00
Guillaume Abrioux 3632b26005 tests: add tests for day-2-operation playbook
Adding testing scenarios for day-2-operation playbook.

Steps:
- deploys a cluster,
- run testinfra,
- test idempotency,
- add a new osd node,
- run testinfra

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-17 11:26:11 +00:00
Guillaume Abrioux 40b7747af7 remove jewel support
As of now, we should no longer support Jewel in ceph-ansible.
The latest ceph-ansible release supporting Jewel is `stable-3.1`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-12 23:38:17 +00:00
Sébastien Han fa38b86cf8 test: fix docker test for lvm
The CI is still running ceph-disk tests upstream. So until
https://github.com/ceph/ceph-ansible/pull/3187 is merged nothing will
pass anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-12 20:33:01 +00:00
Sébastien Han 31a0438cb2 ceph_volume: refactor
This commit does a couple of things:

* Avoid code duplication
* Clarify the code
* Add more unit tests
* Add myself as an author of the module

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Guillaume Abrioux d2ca24eca8 tests: do not install lvm2 on atomic host
We need to detect whether we are running on Atomic Host so that we do not
try to install the lvm2 package.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han 90c66a5848 ci: test lvm in containerized
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han 0735d39518 tests: osd adjust osd name
Now we use the id of the OSD instead of the device name.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Guillaume Abrioux cc6f41f76a tests: fix lvm2 setup issue
Not gathering facts causes the `package` module to fail because it needs
to detect which OS we are running on to select the right package manager.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-09 16:12:54 -04:00
Alfredo Deza 3e488e8298 tests: install lvm2 before setting up ceph-volume/LVM tests
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-10-09 13:48:50 -04:00
Andrew Schoen a68c680225 tests: remove journal_size from lvm-batch testing scenario
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-10-09 10:09:50 -04:00
Sébastien Han 9fe86c2268 test: use osd_objectstore default value
Do not force filestore in our tests; use whatever the default of
`osd_objectstore` is.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-09-27 21:23:49 +00:00
Guillaume Abrioux 3285b47703 tests: add an RGW node on osd0 for ooo-collocation
Get more coverage by adding an RGW daemon collocated on osd0.
We've missed a bug in the past which could have been caught earlier in
the CI; let's add this additional daemon in order to have better
coverage.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-24 14:35:25 +02:00
Guillaume Abrioux 3382c5226c tests: fix monitor_address for shrink_osd scenario
b89cc1746 introduced a typo; this commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-13 18:14:01 +02:00
Andrew Schoen b36f3e06b5 ceph_volume: adds the osds_per_device parameter
If this is set to anything other than the default value of 1, then the
--osds-per-device flag will be passed to the batch command to define how
many OSDs will be created per device.
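
Roughly, inside the module this boils down to something like the following
sketch (illustrative, not the module's exact code):

```python
def batch_args(params):
    # `params` stands in for the Ansible module's parameters (illustrative).
    cmd = ['ceph-volume', '--cluster', params.get('cluster', 'ceph'),
           'lvm', 'batch']
    osds_per_device = params.get('osds_per_device', 1)
    # Only pass --osds-per-device when it differs from ceph-volume's default of 1.
    if osds_per_device != 1:
        cmd.extend(['--osds-per-device', str(osds_per_device)])
    cmd.extend(params['devices'])
    return cmd
```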

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-09-12 20:27:14 +00:00
Alfredo Deza 58b2308036 tests: use new 'num_osds' variable in tests
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-08-31 21:23:20 +00:00
Alfredo Deza e5fcb0d2d2 tests: allow defining arbitrary number of OSDs
Some tests might want to set this since the number of devices will not
necessarily map to the number of OSDs.

Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-08-31 21:23:20 +00:00
Sébastien Han 7012835d2b ci: stop using different images on the same run
There is no point in mixing hosts running on Atomic AND hosts running on
CentOS in the same run, so let's run containerized scenarios on Atomic only.

This solves the following error:

```
fatal: [client2]: FAILED! => {
    "failed": true
}

MSG:

The conditional check 'ceph_current_status.rc == 0' failed. The error was: error while evaluating conditional (ceph_current_status.rc == 0): 'dict object' has no attribute 'rc'

The error appears to have been in '/home/jenkins-build/build/workspace/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/roles/ceph-defaults/tasks/facts.yml': line 74, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: set_fact ceph_current_status (convert to json)
  ^ here
```

From https://2.jenkins.ceph.com/view/ceph-ansible-stable3.1/job/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/37/consoleFull#1765217701b5dd38fa-a56e-4233-a5ca-584604e56e3a

What's happening here is that all the hosts except the clients are running Atomic, so here: https://github.com/ceph/ceph-ansible/blob/master/site-docker.yml.sample#L62
the condition will skip all the nodes except the clients. Thus, when running ceph-defaults, the task "is ceph running already?" is skipped, but the task above needs the rc of that skipped task.
This is not an error in the playbook; it's a CI setup issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-23 16:13:54 +02:00
Andrew Schoen 810cc47892 tests: adds a testing scenario for lv-create and lv-teardown
Using an explicitly named testing environment allows us to have a
specific [testenv] block for this test. This greatly simplifies how it will
work, as it doesn't really need anything from the ceph cluster tests.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-08-16 16:38:23 +02:00
Andrew Schoen 647bbd8f1e tests: adds crush_device_class to lvm-batch scenario
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-08-09 09:41:58 -04:00
Andrew Schoen 6d431ec22d ceph-volume: implement the 'lvm batch' subcommand
This adds the action 'batch' to the ceph-volume module so that we can
run the new 'ceph-volume lvm batch' subcommand. A functional test is
also included.

If devices is defined and osd_scenario is lvm, then the 'ceph-volume lvm
batch' command will be used to create the OSDs.
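
A rough sketch of that decision (variable and function names are
illustrative, not the real ones):

```python
def osd_creation_command(devices, osd_scenario):
    # When a plain list of devices is given with the lvm scenario, delegate
    # OSD creation to the new 'ceph-volume lvm batch' subcommand.
    if devices and osd_scenario == 'lvm':
        return ['ceph-volume', 'lvm', 'batch', '--yes'] + list(devices)
    # Otherwise keep the existing per-OSD 'ceph-volume lvm create' path.
    return None
```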

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-08-09 09:41:58 -04:00
Sébastien Han 77d4023fbe test: follow up on osd_crush_location for containers
This was fixed by
578aa5c2d5
for non-containerized deployments; we need to apply the same fix for containers.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-07 16:20:13 +00:00
Sébastien Han 50be3fd9e8 test: remove osd_crush_location from shrink scenarios
This is not needed since this is already covered by docker_cluster and
centos_cluster scenarios.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-07 16:20:13 +00:00
Guillaume Abrioux 578aa5c2d5 tests: leave an OSD node in default crush root
Jewel used to create a default `rbd` pool in the default crush root
`default`; we need to have at least 1 OSD there to satisfy the PGs of
this created pool, otherwise the cluster will be in HEALTH_ERR state
because of `pgs stuck unclean`/`pgs stuck inactive`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-26 18:47:10 +00:00
Guillaume Abrioux 0a88bccf87 tests: followup on b89cc1746f
Update network subnets in group_vars/all

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-24 16:55:15 +02:00
Guillaume Abrioux b89cc1746f tests: do not deploy all daemons for shrink osds scenarios
Let's create a dedicated environment for these scenarios; there is no
need to deploy everything.
Doing so will also save some time.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-23 18:30:06 +02:00
Guillaume Abrioux af82e7523d tests: test master against ansible 2.6
Ansible 2.4 is currently end-of-life.
Ansible 2.5 will go end-of-life after Ansible 2.7 is released.

Fixes: #2901

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-23 11:59:15 +00:00
Guillaume Abrioux 0c863a3783 tests: add support of 'ooo-collocation' scenario when testing against ceph dev
The group_vars/all file is not available in the 'ooo-collocation'
scenario, which makes `dev_setup.yml` fail because this path is hardcoded.

The idea here is to check whether the pattern 'ooo-collocation' is present
in the `change_dir` variable so we can set this path properly according to
the scenario being run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-20 07:47:33 +02:00
Guillaume Abrioux d8281e50f1 tests: support update scenarios in test_rbd_mirror_is_up()
`test_rbd_mirror_is_up()` is failing on update scenarios because it
assumes `ceph_stable_release` is still set to the value of the
original ceph release; as a result it doesn't enter the right branch of
the condition and fails.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-20 07:46:41 +02:00
Guillaume Abrioux cc71bb96cc tests: followup on #2656
34f70428 introduced a fix using the `command` module while this could
have been achieved by using the `lvol` module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-13 07:55:14 +00:00
Guillaume Abrioux 9a65ec231d tests: fix `_get_osd_id_from_host()` in TestOSDs()
We must initialize the `children` variable in `_get_osd_id_from_host()`;
otherwise, if for any reason the deployment has failed and results in an
osd host with no OSD registered, we never enter the condition, so
`children` is never set and the function tries to return something
undefined.

Typical error:
```
E       UnboundLocalError: local variable 'children' referenced before assignment
```

Fixes: #2860

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-10 13:06:23 +00:00
Guillaume Abrioux b6d09b510f tests: refact ci testing master
We should test ceph-ansible against the latest ansible stable version on
master.

This commit also removes the pinning of testinfra to version 1.7.1
because ansible 2.5 requires a newer version.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-06 16:31:49 +00:00
Guillaume Abrioux 09d795b5b7 tests: add mimic support for test_rbd_mirror_is_up()
Prior to mimic, the data structure returned by `ceph -s -f json`, which is
used to gather information about rbd-mirror daemons, looked like below:

```
  "servicemap": {
    "epoch": 8,
    "modified": "2018-07-05 13:21:06.207483",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "ceph-nano-luminous-faa32aebf00b": {
            "start_epoch": 8,
            "start_stamp": "2018-07-05 13:21:04.668450",
            "gid": 14107,
            "addr": "172.17.0.2:0/2229952892",
            "metadata": {
              "arch": "x86_64",
              "ceph_version": "ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)",
              "cpu": "Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-nano-luminous-faa32aebf00b",
              "instance_id": "14107",
              "kernel_description": "#1 SMP Wed Mar 14 15:12:16 UTC 2018",
              "kernel_version": "4.9.87-linuxkit-aufs",
              "mem_swap_kb": "1048572",
              "mem_total_kb": "2046652",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This part has changed as of mimic and became:
```
  "servicemap": {
    "epoch": 2,
    "modified": "2018-07-04 09:54:36.164786",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "14151": {
            "start_epoch": 2,
            "start_stamp": "2018-07-04 09:54:35.541272",
            "gid": 14151,
            "addr": "192.168.1.80:0/240942528",
            "metadata": {
              "arch": "x86_64",
              "ceph_release": "mimic",
              "ceph_version": "ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)",
              "ceph_version_short": "13.2.0",
              "cpu": "Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-rbd-mirror0",
              "id": "ceph-rbd-mirror0",
              "instance_id": "14151",
              "kernel_description": "#1 SMP Wed May 9 18:05:47 UTC 2018",
              "kernel_version": "3.10.0-862.2.3.el7.x86_64",
              "mem_swap_kb": "1572860",
              "mem_total_kb": "1015548",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This patch modifies the function `test_rbd_mirror_is_up()` in
`test_rbd_mirror.py` so it works with `mimic` and keeps backward compatibility
with `luminous`.
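
A simplified sketch of the kind of compatibility handling involved (not the
verbatim test code):

```python
def rbd_mirror_daemon_hostnames(servicemap):
    # In luminous the daemon key is the hostname; in mimic the key is the
    # gid and the hostname has to be read from the daemon's metadata.
    daemons = servicemap['services']['rbd-mirror']['daemons']
    hostnames = []
    for key, daemon in daemons.items():
        if key == 'summary':  # present in both formats, not a daemon entry
            continue
        metadata = daemon.get('metadata', {})
        hostnames.append(metadata.get('hostname', key))
    return hostnames

# A test can then assert that the rbd-mirror node's hostname is in that
# list, whatever the ceph release.
```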

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-06 14:39:13 +02:00
Guillaume Abrioux f2e57a56db tests: factorize docker tests using docker_exec_cmd logic
Avoid duplicating tests unnecessarily just because of the docker exec syntax.
Using the same logic as in the playbook with `docker_exec_cmd` allows us
to execute the same test on both containerized and non-containerized environments.

The idea is to set a variable `docker_exec_cmd` to the
'docker exec <container-name>' string when containerized and
to '' when non-containerized.
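
A minimal sketch of that idea (helper and variable names are illustrative):

```python
def get_docker_exec_cmd(containerized, container_name):
    # 'docker exec <container-name>' when containerized, '' otherwise.
    return 'docker exec {}'.format(container_name) if containerized else ''

# Usage in a test: the same command line works in both environments.
# cmd = '{} ceph --cluster ceph -s -f json'.format(docker_exec_cmd).strip()
# output = host.check_output(cmd)
```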

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-27 07:00:14 +00:00
Guillaume Abrioux fe79a5d240 tests: refact test_all_*_osds_are_up_and_in
These tests are skipped on bluestore osds scenarios.
They were going to fail anyway since they are run on mon nodes while
`devices` is defined in the inventory for each osd node. It means
`num_devices * num_osd_hosts` returns `0`.
The result is that the test expects to have 0 OSDs up.

The idea here is to move these tests so they are run on OSD nodes.
Each OSD node checks that its own OSDs are up: if a node has 2
devices defined in the `devices` variable, it means we are checking for 2
OSDs to be up on that node; if each node has all its OSDs up, we can say
all OSDs are up.
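
A simplified sketch of the per-node check (illustrative names, assuming the
testinfra `host` fixture and a `node` dict carrying the inventory data for
that host):

```python
import json

def test_all_osds_are_up_and_in(host, node):
    # One OSD expected per entry in this node's `devices` variable.
    expected = len(node['devices'])
    tree = json.loads(host.check_output('sudo ceph --cluster ceph osd tree -f json'))
    # OSD ids that belong to this host in the crush tree.
    my_osds = next((n['children'] for n in tree['nodes']
                    if n['type'] == 'host' and n['name'] == node['hostname']), [])
    up = [n for n in tree['nodes']
          if n['type'] == 'osd' and n['id'] in my_osds and n.get('status') == 'up']
    assert len(up) == expected
```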

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-26 15:23:39 +00:00