Commit Graph

482 Commits (4e42d085f7147d38100b7535bc16667e747e9c52)

Author SHA1 Message Date
Guillaume Abrioux 5601af8de2 tests: change default pools size
default pool size in our test should be explicitly set to 1

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-21 18:23:07 +00:00
Guillaume Abrioux d4c0960f04 mon: move `osd_pool_default_pg_num` in `ceph-defaults`
`osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specifc group of nodes, it
will fail when trying to access this variables since it wouldn't be
defined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-21 15:42:50 +00:00
Guillaume Abrioux 3ac6619fb9 tests: set pool size to 1 in ceph-override.json
setting this setting to 1 makes the CI covering the related code in the
playbook without breaking the upgrade scenarios.

Those scenarios were broken because there is a check `TASK [waiting for
clean pgs...]` in rolling_update.yml, since the pool size for
`cephfs_metadata` and `cephfs_data` are updated to `2` in
`ceph-override.json` and there is not enough osd to honor this size,
some PGs are degraded and make the mentioned check failing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-12 10:51:48 +01:00
Sébastien Han e552026418 rbd-mirror: use the new rbd-mirror key
Instead of using the old rbd key let's use the new rbr-mirror key to
bootstrap the rbd -mirror daemon.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-09 12:45:52 +01:00
Noah Watkins 50255b9640 Fixup shrink_osd[_container] scenario config
** configuration seems to be for filestore:

[ERROR]: [ceph-osd0] Validation failed for variable: lvm_volumes

** Removing `radosgw_interface: eth1` to resolve:

The task includes an option with an undefined variable. The error was:
'ansible.vars.hostvars.HostVarsVars object' has no attribute
u'ansible_eth1'

The error appears to have been in
'/home/nwatkins/src/ceph-ansible/roles/ceph-defaults/tasks/set_radosgw_address.yml':
line 21, column 5, but may be elsewhere in the file depending on the
exact syntax problem.

The offending line appears to be:

  - name: set_fact _radosgw_address to radosgw_interface - ipv4
    ^ here

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2018-11-08 17:45:37 +01:00
Rishabh Dave 8edbda96df use blocks directives to group tasks
Using block directives simplifies the playbooks and makes them more
readable.

Fixes: https://github.com/ceph/ceph-ansible/issues/2835
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-10-31 09:37:43 +01:00
Guillaume Abrioux 62c314e2ba tests: test master against ansible 2.7
Let's test ceph-ansible master against ansible 2.7 to catch early any
potential issue with this ansible version.

Closes: #3148

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 17:07:05 +01:00
Guillaume Abrioux d8d3e55006 remove restapi role
As of `mimic`, restapi is no longer available because of manager daemon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:19:13 +01:00
Guillaume Abrioux f52344300a tests: add more memory for rgw_multsite scenarios
Adding more memory to VMs for rgw_multisite scenarios could avoid this error
I have recently hit in the CI:

(It is worth it to set 1024Mb since there is only 2 nodes in those
scenarios.)

```
fatal: [osd0]: FAILED! => {
    "changed": false,
    "cmd": [
        "docker",
        "run",
        "--rm",
        "--entrypoint",
        "/usr/bin/ceph",
        "docker.io/ceph/daemon:latest-luminous",
        "--version"
    ],
    "delta": "0:00:04.799084",
    "end": "2018-10-29 17:10:39.136602",
    "rc": 1,
    "start": "2018-10-29 17:10:34.337518"
}

STDERR:

Traceback (most recent call last):
  File "/usr/bin/ceph", line 125, in <module>
    import rados
ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 37970a5b3c tests: add rgw_multisite functional test
Add a playbook that will upload a file on the master then try to get
info from the secondary node, this way we can check if the replication
is ok.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 4d464c1003 rgw: add testing scenario for rgw multisite
This will setup 2 cluster with rgw multisite enabled.
First cluster will act as the 'master', the 2nd will be the secondary
one.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Sébastien Han 22aed97266 testinfra: change test osds for containers
We do not use  @<device> anymore so we don't need to perform the
readlink check anymore.

Also we are making an exception for ooo which is still using ceph-disk.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-29 18:31:17 +01:00
Sébastien Han 1cdec4069a test_osd: dynamically get the osd container
Do not enforce the container name since this will fail when we have
multiple VMs running OSDs.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-29 15:33:12 +01:00
Sébastien Han 876f6ced74 test: convert all the tests to use lvm
ceph-disk is now deprecated in ceph-ansible so let's convert all the ci
tests to use lvm instead of ceph-disk.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-29 15:33:12 +01:00
Sébastien Han 2fd7da12bb test: remove ceph-disk CI tests
Since we are removing the ceph-disk test from the ci in master then
there is no need to have the functionnal tests in master anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-29 15:33:12 +01:00
Rishabh Dave ee2d52d33d allow custom pool size
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1596339
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-10-22 16:00:21 +02:00
Guillaume Abrioux c47aa2e83b tests: remove unnecessary variables definition
since we set `configure_firewall: true` in
`ceph-defaults/defaults/main.yml` there is no need to explicitly set it
in `centos7_cluster` and `docker_cluster` testing scenarios.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-19 15:12:45 +02:00
Guillaume Abrioux 1f9090884e Revert "tests: test `test_all_docker_osds_are_up_and_in()` from mon nodes"
This approach doesn't work with all scenarios because it's comparing a
local OSD number expected to a global OSD number found in the whole
cluster.

This reverts commit b8ad35ceb9.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-19 00:12:43 +00:00
Guillaume Abrioux cb35cac926 tests: set configure_firewall: true in centos7|docker_cluster
This way the CI will cover this part of the code.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-19 00:12:43 +00:00
Guillaume Abrioux b8ad35ceb9 tests: test `test_all_docker_osds_are_up_and_in()` from mon nodes
Let's get the osd tree from mons instead on osds.
This way we don't have to predict an OSD container name.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-17 17:07:25 +02:00
Guillaume Abrioux b8418ebd17 add-osds: followup on 3632b26
Three fixes:

- fix a typo in vagrant_variables that cause a networking issue for
containerized scenario.
- add containerized_deployment: true
- remove a useless block of code: the fact docker_exec_cmd is set in
ceph-defaults which is played right after.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-17 17:07:25 +02:00
Guillaume Abrioux 3632b26005 tests: add tests for day-2-operation playbook
Adding testing scenarios for day-2-operation playbook.

Steps:
- deploys a cluster,
- run testinfra,
- test idempotency,
- add a new osd node,
- run testinfra

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-17 11:26:11 +00:00
Guillaume Abrioux 40b7747af7 remove jewel support
As of now, we should no longer support Jewel in ceph-ansible.
The latest ceph-ansible release supporting Jewel is `stable-3.1`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-12 23:38:17 +00:00
Sébastien Han fa38b86cf8 test: fix docker test for lvm
The CI is still running ceph-disk tests upstream. So until
https://github.com/ceph/ceph-ansible/pull/3187 is merged nothing will
pass anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-12 20:33:01 +00:00
Sébastien Han 31a0438cb2 ceph_volume: refactor
This commit does a couple of things:

* Avoid code duplication
* Clarify the code
* add more unit tests
* add myself to the author of the module

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Guillaume Abrioux d2ca24eca8 tests: do not install lvm2 on atomic host
we need to detect whether we are running on atomic host to not try to
install lvm2 package.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han 90c66a5848 ci: test lvm in containerized
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Sébastien Han 0735d39518 tests: osd adjust osd name
Now we use id of the OSD instead of the device name.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-10 16:08:41 -04:00
Guillaume Abrioux cc6f41f76a tests: fix lvm2 setup issue
not gathering fact causes `package` module to fail because it needs to
detect which OS we are running on to select the right package manager.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-09 16:12:54 -04:00
Alfredo Deza 3e488e8298 tests: install lvm2 before setting up ceph-volume/LVM tests
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-10-09 13:48:50 -04:00
Andrew Schoen a68c680225 tests: remove journal_size from lvm-batch testing scenario
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-10-09 10:09:50 -04:00
Sébastien Han 9fe86c2268 test: use osd_objecstore default value
Do not force filestore on our test but whatever is the default of
osd_objecstore.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-09-27 21:23:49 +00:00
Guillaume Abrioux 3285b47703 tests: add an RGW node on osd0 for ooo-collocation
get more coverage by adding an RGW daemon collocated on osd0.
We've missed a bug in the past which could have been caught earlier in
the CI.
Let's add this additional daemon in order to have a better coverage.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-24 14:35:25 +02:00
Guillaume Abrioux 3382c5226c tests: fix monitor_address for shrink_osd scenario
b89cc1746 introduced a typo. This commit fixes it

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-09-13 18:14:01 +02:00
Andrew Schoen b36f3e06b5 ceph_volume: adds the osds_per_device parameter
If this is set to anything other than the default value of 1 then the
--osds-per-device flag will be used by the batch command to define how
many osds will be created per device.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-09-12 20:27:14 +00:00
Alfredo Deza 58b2308036 tests: use new 'num_osds' variable in tests
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-08-31 21:23:20 +00:00
Alfredo Deza e5fcb0d2d2 tests: allow defining arbitrary number of OSDs
Some tests might want to set this since number of devices will not
necessarily map to number of OSDs

Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-08-31 21:23:20 +00:00
Sébastien Han 7012835d2b ci: stop using different images on the same run
There is no point of using hosts running on atomic AND centos hosts. So
let's run containerized scenarios on Atomic only.

This solves this error here:

```
fatal: [client2]: FAILED! => {
    "failed": true
}

MSG:

The conditional check 'ceph_current_status.rc == 0' failed. The error was: error while evaluating conditional (ceph_current_status.rc == 0): 'dict object' has no attribute 'rc'

The error appears to have been in '/home/jenkins-build/build/workspace/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/roles/ceph-defaults/tasks/facts.yml': line 74, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: set_fact ceph_current_status (convert to json)
  ^ here
```

From https://2.jenkins.ceph.com/view/ceph-ansible-stable3.1/job/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/37/consoleFull#1765217701b5dd38fa-a56e-4233-a5ca-584604e56e3a

What's happening here is all the hosts excepts the clients are running atomic, so here: https://github.com/ceph/ceph-ansible/blob/master/site-docker.yml.sample#L62
The condition will skipped all the nodes excepts the clients, thus when running ceph-default, the task "is ceph running already?" is skipped but the task above needs the rc of the skipped task.
This is not an error from the playbook, it's a CI setup issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-23 16:13:54 +02:00
Andrew Schoen 810cc47892 tests: adds a testing scenario for lv-create and lv-teardown
Using an explicitly named testing environment name allows us to have a
specific [testenv] block for this test. This greatly simplifies how it will
work as it doesn't really anything from the ceph cluster tests.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-08-16 16:38:23 +02:00
Andrew Schoen 647bbd8f1e tests: adds crush_device_class to lvm-batch scenario
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-08-09 09:41:58 -04:00
Andrew Schoen 6d431ec22d ceph-volume: implement the 'lvm batch' subcommand
This adds the action 'batch' to the ceph-volume module so that we can
run the new 'ceph-volume lvm batch' subcommand. A functional test is
also included.

If devices is defind and osd_scenario is lvm then the 'ceph-volume lvm
batch' command will be used to create the OSDs.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-08-09 09:41:58 -04:00
Sébastien Han 77d4023fbe test: follow up on osd_crush_location for containers
This was fixed by
578aa5c2d5
on non-container, we need to apply the same fix for containers.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-07 16:20:13 +00:00
Sébastien Han 50be3fd9e8 test: remove osd_crush_location from shrink scenarios
This is not needed since this is already covered by docker_cluster and
centos_cluster scenarios.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-07 16:20:13 +00:00
Guillaume Abrioux 578aa5c2d5 tests: leave an OSD node in default crush root
jewel used to create a default `rbd` pool in the default crush root
`default`, we need to have at least 1 osd to satisfy the PGs for this
created pool, otherwise the cluster will be in HEALTH_ERR state because
of `pgs stuck unclean`/`pgs stuck inactive`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-26 18:47:10 +00:00
Guillaume Abrioux 0a88bccf87 tests: followup on b89cc1746f
Update network subnets in group_vars/all

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-24 16:55:15 +02:00
Guillaume Abrioux b89cc1746f tests: do not deploy all daemons for shrink osds scenarios
Let's create a dedicated environment for these scenarios, there is no
need to deploy everything.
By the way, doing so will save some times.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-23 18:30:06 +02:00
Guillaume Abrioux af82e7523d tests: test master against ansible 2.6
Ansible 2.4 is currently end-of-life.
Ansible 2.5 will go end-of-life after Ansible 2.7 is released.

Fixes: #2901

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-23 11:59:15 +00:00
Guillaume Abrioux 0c863a3783 tests: add support of 'ooo-collocation' scenario when testing against ceph dev
The group_vars/all file is not available on 'ooo-collocation' scenario,
it's making the `dev_setup.yml` failing because this path is hardcoded.

The idea here is to check if the pattern 'ooo-collocation' is present in
`change_dir` variable so we can set this path properly according to the
scenario being run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-20 07:47:33 +02:00
Guillaume Abrioux d8281e50f1 tests: support update scenarios in test_rbd_mirror_is_up()
`test_rbd_mirror_is_up()` is failing on update scenarios because it
assumes the `ceph_stable_release` is still set to the value of the
original ceph release, it means it won't enter in the right part of the
condition and fails.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-20 07:46:41 +02:00
Guillaume Abrioux cc71bb96cc tests: followup on #2656
34f70428 has introduced a fix using `command` module while this could
have been achieved by using `lvol` module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-13 07:55:14 +00:00
Guillaume Abrioux 9a65ec231d tests: fix `_get_osd_id_from_host()` in TestOSDs()
We must initialize `children` variable in `_get_osd_id_from_host()`,
otherwise, if for any reason the deployment has failed and result with
an osd host with no OSD registered, we won't enter in the condition,
therefore, `children` is never set and the function tries to return
something undefined.

Typical error:
```
E       UnboundLocalError: local variable 'children' referenced before assignment
```

Fixes: #2860

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-10 13:06:23 +00:00
Guillaume Abrioux b6d09b510f tests: refact ci testing master
We should test ceph-ansible against the latest ansible stable version on
master.

This commit also remove the pinning to 1.7.1 version of testinfra
because ansible 2.5 requires a newer version.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-06 16:31:49 +00:00
Guillaume Abrioux 09d795b5b7 tests: add mimic support for test_rbd_mirror_is_up()
prior mimic, the data structure returned by `ceph -s -f json` used to
gather information about rbd-mirror daemons looked like below:

```
  "servicemap": {
    "epoch": 8,
    "modified": "2018-07-05 13:21:06.207483",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "ceph-nano-luminous-faa32aebf00b": {
            "start_epoch": 8,
            "start_stamp": "2018-07-05 13:21:04.668450",
            "gid": 14107,
            "addr": "172.17.0.2:0/2229952892",
            "metadata": {
              "arch": "x86_64",
              "ceph_version": "ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)",
              "cpu": "Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-nano-luminous-faa32aebf00b",
              "instance_id": "14107",
              "kernel_description": "#1 SMP Wed Mar 14 15:12:16 UTC 2018",
              "kernel_version": "4.9.87-linuxkit-aufs",
              "mem_swap_kb": "1048572",
              "mem_total_kb": "2046652",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This part has changed from mimic and became:
```
  "servicemap": {
    "epoch": 2,
    "modified": "2018-07-04 09:54:36.164786",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "14151": {
            "start_epoch": 2,
            "start_stamp": "2018-07-04 09:54:35.541272",
            "gid": 14151,
            "addr": "192.168.1.80:0/240942528",
            "metadata": {
              "arch": "x86_64",
              "ceph_release": "mimic",
              "ceph_version": "ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)",
              "ceph_version_short": "13.2.0",
              "cpu": "Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-rbd-mirror0",
              "id": "ceph-rbd-mirror0",
              "instance_id": "14151",
              "kernel_description": "#1 SMP Wed May 9 18:05:47 UTC 2018",
              "kernel_version": "3.10.0-862.2.3.el7.x86_64",
              "mem_swap_kb": "1572860",
              "mem_total_kb": "1015548",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This patch modifies the function `test_rbd_mirror_is_up()` in
`test_rbd_mirror.py` so it works with `mimic` and keeps backward compatibility
with `luminous`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-06 14:39:13 +02:00
Guillaume Abrioux f2e57a56db tests: factorize docker tests using docker_exec_cmd logic
avoid duplicating test unnecessarily just because of docker exec syntax.
Using the same logic than in the playbook with `docker_exec_cmd` allow us
to execute the same test on both containerized and non containerized environment.

The idea is to set a variable `docker_exec_cmd` with the
'docker exec <container-name>' string when containerized and
set it to '' when non containerized.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-27 07:00:14 +00:00
Guillaume Abrioux fe79a5d240 tests: refact test_all_*_osds_are_up_and_in
these tests are skipped on bluestore osds scenarios.
they were going to fail anyway since they are run on mon nodes and
`devices` is defined in inventory for each osd node. It means
`num_devices * num_osd_hosts` returns `0`.
The result is that the test expects to have 0 OSDs up.

The idea here is to move these tests so they are run on OSD nodes.
Each OSD node checks their respective OSD to be UP, if an OSD has 2
devices defined in `devices` variable, it means we are checking for 2
OSD to be up on that node, if each node has all its OSD up, we can say
all OSD are up.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-26 15:23:39 +00:00
Guillaume Abrioux 2d560b562a tests: skip tests for node iscsi-gw when deploying jewel
CI is deploying a iscsigw node anyway but its not deployed let's skip
test accordingly

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-26 15:23:39 +00:00
Guillaume Abrioux 1c3dae4a90 tests: skip rgw_tuning_pools_are_set when rgw_create_pools is not defined
since ooo_collocation scenario is supposed to be the same scenario than the
one tested by OSP and they are not passing `rgw_create_pools` the test
`test_docker_rgw_tuning_pools_are_set` will fail:
```
>       pools = node["vars"]["rgw_create_pools"]
E       KeyError: 'rgw_create_pools'
```

skipping this test if `node["vars"]["rgw_create_pools"]` is not defined
fixes this failure.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-26 15:23:39 +00:00
Guillaume Abrioux d83b24d271 tests: fix broken test when collocated daemons scenarios
At the moment, a lot of tests are skipped when daemons are collocated.
Our tests consider a node belong to only 1 group while it's possible for
certain scenario it can belong to multiple groups.

Also pinning to pytest 3.6.1 so we can use `request.node.iter_markers()`

Co-Authored-by: Alfredo Deza <adeza@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-26 15:23:39 +00:00
Guillaume Abrioux f68936ca7e tests: fix *_has_correct_value tests
It might happen that the list of ips/hosts in following line (ceph.conf)
- `mon initial memebers = <hosts>`
- `mon host = <ips>`

are not ordered the same way depending on deployment.

This patch makes the tests looking for each ip or hostname in respective
lines.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-20 08:01:57 +02:00
Guillaume Abrioux 481c14455a tests: add more nodes in ooo testing scenario
adding more node in this scenario could help to have a better coverage
so we can catch more potential bugs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-18 16:44:23 +02:00
Guillaume Abrioux 21894655a7 tests: keep same ceph release during handlers/idempotency test
since `latest` points to `mimic`, we need to force the test to keep the
same ceph release when testing anything else than `mimic`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-15 11:45:51 -04:00
Guillaume Abrioux bbb8691335 tests: increase memory to 1024Mb for centos7_cluster scenario
we see more and more failure like `fatal: [mon0]: UNREACHABLE! => {}` in
`centos7_cluster` scenario, Since we have 30Gb RAM on hypervisors, we
can give monitors a bit more RAM. By the way, nodes on containerized cluster
testing scenario have already 1024Mb memory allocated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-11 23:52:15 +08:00
Sébastien Han 6035978ed9 test: only on containerized iscsi
We don't have the same service running on non-container for now, this
will change soon but for let's only run the test on container.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-11 08:34:48 +02:00
Sébastien Han 20c8065e48 ceph-iscsi: rename group iscsi_gws
Let's try to avoid using dashes as testinfra needs to be able to read
the groups.
Typically, with iscsi-gws we can't add a marker for these iscsi nodes,
using an underscore fixes the issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han c00fb12497 ci: add functionnal tests for iscsi
We test if:

* packages are installed
* services are runnning
* service units are enabled

Also fix linting issues

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han 5ff2f03e3f ci: add iscsi test
Add iscsi CI coverage, this will now deploy iscsi gateways in container.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Guillaume Abrioux 28d21b4e9c tests: update ooo inventory hostfile
Update the inventory host for tripleo testing scenario so it's the same
parameters than in tripleo CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 17:26:35 +02:00
Guillaume Abrioux 5eacc8f8d8 tests: add a dummy value for 'dev' release
Functional tests are broken when testing against 'dev' release (ceph).
Adding a dummy value here will make it possible to run ceph-ansible CI
against dev ceph release.

Typical error:

```
>       if request.node.get_marker("from_luminous") and ceph_release_num[ceph_stable_release] < ceph_release_num['luminous']:
E       KeyError: 'dev'
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fd1487d93f21b609a637053f5b33cd2a4e408d00)
2018-06-07 13:59:17 +02:00
Guillaume Abrioux c94ada69e8 tests: improve mds tests
the expected number of mds daemon consist of number of daemons that are
'up' + number of daemons 'up:standby'.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 14:01:58 +08:00
Guillaume Abrioux f0cd4b0651 tests: skip disabling fastest mirror detection on atomic host
There is no need to execute this task on atomic hosts.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-05 15:39:37 +02:00
Guillaume Abrioux 47276764f7 tests: fix rgw tests
41b4632 has introduced a change in functionnals tests.
Since the admin keyring isn't copied on rgw nodes anymore in tests, let's use
the rgw keyring to achieve them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-05 15:24:32 +02:00
Sébastien Han 41b4632abc test: do not always copy admin key
The admin key must be copied on the osd nodes only when we test the
shrink scenario. Shrink relies on ceph-disk commands that require the
admin key on the node where it's being executed.

Now we only copy the key when running on the shrink-osd scenario.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-05 09:39:30 +02:00
Guillaume Abrioux 2cf06b515f rgw: refact rgw pools creation
Refact of 8704144e31
There is no need to have duplicated tasks for this. The rgw pools
creation should be delegated on a monitor node se we don't have to care
if the admin keyring is present on rgw node.
By the way, only one task is needed to create the pools, we just need to
use the `docker_exec_cmd` fact already defined in `ceph-defaults` to
achieve it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1550281

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-05 15:00:20 +08:00
Erwan Velu 493f615eae ceph-defaults: Enable local epel repository
During the tests, the remote epel repository is generating a lots of
errors leading to broken jobs (issue #2666)

This patch is about using a local repository instead of a random one.
To achieve that, we make a preliminary install of epel-release, remove
the metalink and enforce a baseurl to our local http mirror.

That should speed up the build process but also avoid the random errors
we face.

This patch is part of a patch series that tries to remove all possible yum failures.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-06-04 08:11:35 +02:00
jtudelag 600e1e2c26 rgws: renames create_pools variable with rgw_create_pools.
Renamed to be consistent with the role (rgw) and have a meaningful name.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
2018-06-04 06:23:42 +02:00
jtudelag 8704144e31 Adds RGWs pool creation to containerized installation.
ceph command has to be executed from one of the monitor containers
if not admin copy present in RGWs. Task has to be delegated then.

Adds test to check proper RGW pool creation for Docker container scenarios.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
2018-06-04 06:23:42 +02:00
Guillaume Abrioux c68126d6fd mdss: do not make pg_num a mandatory params
When playing ceph-mds role, mon nodes have set a fact with the default
pg num for osd pools, we can simply default to this value for cephfs
pools (`cephfs_pools` variable).

At the moment the variable definition for `cephfs_pools` looks like:

```
cephfs_pools:
  - { name: "{{ cephfs_data }}", pgs: "" }
  - { name: "{{ cephfs_metadata }}", pgs: "" }
```

and we have a task in `ceph-validate` to ensure `pgs` has been set to a
valid value.

We could simply avoid this check by setting the default value of `pgs`
to `hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num']` and
let to users the possibility to override this value.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1581164

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-30 16:20:34 +02:00
Guillaume Abrioux 6f489015e4 tests: fix broken symlink
`requirements2.5.txt` is pointing to `tests/requirements2.4.txt` while
it should point to `requirements2.4.txt` since they are in the same
directory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-30 16:13:47 +02:00
Guillaume Abrioux 34f7042852 tests: resize root partition when atomic host
For a few moment we can see failures in the CI for containerized
scenarios because VMs are running out of space at some point.

The default in the images used is to have only 3Gb for root partition
which doesn't sound like a lot.

Typical error seen:

```
STDERR:

failed to register layer: Error processing tar file(exit status 1): open /usr/share/zoneinfo/Atlantic/Canary: no space left on device
```

Indeed, on the machine we can see:
```
Every 2.0s: df -h                                                                                                                                                                                                                                       Tue May 29 17:21:13 2018
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  3.0G  3.0G   14M 100% /
```

The idea here is to expand this partition with all the available space
remaining by issuing an `lvresize` followed by an `xfs_growfs`.

```
-bash-4.2# lvresize -l +100%FREE /dev/atomicos/root
  Size of logical volume atomicos/root changed from <2.93 GiB (750 extents) to 9.70 GiB (2484 extents).
  Logical volume atomicos/root successfully resized.
```

```
-bash-4.2# xfs_growfs /
meta-data=/dev/mapper/atomicos-root isize=512    agcount=4, agsize=192000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=768000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 768000 to 2543616
```

```
-bash-4.2# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  9.7G  1.4G  8.4G  14% /
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-30 10:54:35 +02:00
Guillaume Abrioux 98cb6ed8f6 tests: avoid yum failures
In the CI we can see at many times failures like following:

`Failure talking to yum: Cannot find a valid baseurl for repo:
base/7/x86_64`

It seems the fastest mirror detection is sometimes counterproductive and
leads yum to fail.

This fix has been added in the `setup.yml`.
This playbook was used until now only just before playing `testinfra`
and could be used before running ceph-ansible so we can add some
provisionning tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Erwan Velu <evelu@redhat.com>
2018-05-28 22:04:35 +02:00
Guillaume Abrioux a10e73d78d tests: move cephfs_pools variable
let's move this variable in group_vars/all.yml in all testing scenarios
accordingly to this commit 1f15a81c48 so
we keep consistency between the playbook and the tests.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-24 09:39:38 -07:00
Guillaume Abrioux 564a662baf osds: move openstack pools creation in ceph-osd
When deploying a large number of OSD nodes it can be an issue because the
protection check [1] won't pass since it tries to create pools before all
OSDs are active.

The idea here is to move openstack pools creation at the end of `ceph-osd` role.

[1] e59258943b/src/mon/OSDMonitor.cc (L5673)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-24 09:39:38 -07:00
Luigi Toscano 43e96c1f98 ceph-radosgw: disable NSS PKI db when SSL is disabled
The NSS PKI database is needed only if radosgw_keystone_ssl
is explicitly set to true, otherwise the SSL integration is
not enabled.

It is worth noting that the PKI support was removed from Keystone
starting from the Ocata release, so some code paths should be
changed anyway.

Also, remove radosgw_keystone, which is not useful anymore.
This variable was used until fcba2c801a.
Now profiles drives the setting of rgw keystone *.

Signed-off-by: Luigi Toscano <ltoscano@redhat.com>
2018-05-23 23:24:09 -07:00
Andrew Schoen dea1ea93d5 tests: use notario>=0.0.13 when testing
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Andrew Schoen 9f68dad2ff validate: first pass at validating the install options
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-05-18 17:58:24 +02:00
Guillaume Abrioux a68091c923 tests: update the type for the rule used in pools
As of ceph 12.2.5 the type of the parameter `type` is not a name anymore but
an id, therefore an `int` is expected otherwise it will fail with the
following error

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-30 08:15:18 +02:00
Sébastien Han 71efa2eaf4 ci: bump client nodes to 2
In order to test the key distribution is correct we must have 2 client
nodes.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-23 18:34:58 +02:00
Sébastien Han 203c9af0ac ci: test ansible 2.5
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-23 10:17:24 +02:00
Guillaume Abrioux 77831ccb7a tests: update tests for mds to cover multimds case
in case of multimds we must check for the number of mds up instead of
just checking if the hostname of the node is in the fsmap.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-12 18:20:58 +02:00
Sébastien Han 82589021e0 ci: fix tripleO scenario
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-11 12:18:34 +02:00
Sébastien Han 2011ec3bcd ci: client copy admin key
If we don't copy the admin key we can't add the key into ceph.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-11 12:18:34 +02:00
Sébastien Han cf73647e7a ci: remove useless tests
These are already handled by ceph-client/defaults/main.yml so the keys
will be created once user_config is set to True.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-11 12:18:34 +02:00
Andrew Schoen 98e237d234 tests: no need to remove partitions in lvm_setup.yml
Now that we are using ceph_volume_zap the partitions are
kept around and should be able to be reused.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-04-10 14:19:21 +02:00
Sébastien Han f3caee8460 ceph-iscsi: fix certificates generation and distribution
Prior to this patch, the certificates where being generated on a single
node only (because of the run_once: true). Thus certificates were not
distributed on all the gateway nodes.

This would require a second ansible run to work. This patches fix the
creation and keys's distribution on all the nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-04 09:27:39 +02:00
John Fulton e6e6bd078a Refer to expected-num-ojects as expected_num_objects, not size
Follow up patch to PR 2432 [1] which replaces "size" (sorry if
the original bug used that term, which can be confusing) with
expected_num_objects as is used in the Ceph documentation [2].

[1] https://github.com/ceph/ceph-ansible/pull/2432/files
[2] http://docs.ceph.com/docs/jewel/rados/operations/pools
2018-03-26 15:41:51 +02:00
Sébastien Han 3ab89ab48c ci: re-arrange group_vars files
We should stop putting everything in 'all'. This is too easy and this is
error prone as well for those who are separating variables into host
type, things that you should do.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han d5f8cac820 ci: remove left over iscsi_gws file
Wrong file that is not used, only iscsi-ggw that is present is correct.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 8000ae342e remove unsed ceph_rgw_civetweb_port variable
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han f119b25bbe client: implement proper pools creation
Just like we did for the monitor and openstack_config we now have the
ability to precisely create pools.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han e302c1baae mon: add support for erasure code pool
You can now specify type: erasure and   erasure_profile to use when
declaring the pool dictionnary.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 4806ff4ff8 ci: test pool creation on container
On containerized scenario we also want to test pool creation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han fc0fa48e0d test: add tests for creating crush tree
We now run tests on the newly created ceph_crush module. Now the CI will
create a specific hierarchy for the OSD.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-06 15:24:31 +00:00
Sébastien Han fd94840a6e ci: add copy_admin_key test to container scenario
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-02 20:59:10 +00:00
Sébastien Han 165d9dec10 remove kernel.pid_max
This is now managed by Ceph packages.

See: https://github.com/ceph/ceph/pull/18544/files

http://tracker.ceph.com/issues/21929

Closes: https://github.com/ceph/ceph-ansible/issues/2410

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-02-23 13:57:57 +01:00
Guillaume Abrioux 4a8986459f tests: change ceph_docker_image_tag for 2nd run
The ceph-ansible upstream CI runs severals tests, including a
'idempotency/handlers' test. It means the playbook is run a first time
and then a second time with an other container image version to ensure the
handlers run properly and the containers are well restarted.
This can cause issues.
For instance, in that specific case which drove me to submit this commit,
I've hit the case where `latest` image ships ceph 12.2.3 while the `stable-3.0`
(which is the image used for the second run) ships ceph 12.2.2.

The goal of this test is not to verify we can upgrade from a specific
version to another but to ensure handlers are working even if it's a valid
failure here.
It should be caught by a test dedicated to that usecase.

We just need to have a container image which has a different id for
the upstream CI, we need the same content in container imagebut a different
image id in the registry since the test relies on image id to decide whether
the container should be restarted.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-02-23 13:54:32 +01:00
Guillaume Abrioux 707458c979 ci: add tripleo scenario testing
This should help to see earlier any failure in a tripleo deployment scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-02-23 13:54:32 +01:00
Sébastien Han 7d690878df test: add test for containers resources changes
We change the ceph_mon_docker_memory_limit on the second run, this
should trigger a restart of services.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-02-14 02:01:29 +01:00
Sébastien Han 79864a8936 test: add test for restart on new container image
Since we have a task to test the handlers we can test a new container to
validate the service restart on a new container image.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-02-14 02:01:29 +01:00
Guillaume Abrioux deaf273b25 syntax: change local_action syntax
Use a nicer syntax for `local_action` tasks.
We used to have oneliner like this:
```
local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }}
```

The usual syntax:
```
    local_action:
      module: wait_for
      port: 22
      host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}"
      state: started
      delay: 10
      timeout: 500
```
is nicer and kind of way to keep consistency regarding the whole
playbook.

This also fix a potential issue about missing quotation :

```
Traceback (most recent call last):
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module>
    main()
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main
    rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin)
  File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command
  File "/usr/lib64/python2.7/shlex.py", line 279, in split
    return list(lex)                                                                                                                                                                                                                                                                                                            File "/usr/lib64/python2.7/shlex.py", line 269, in next
    token = self.get_token()
  File "/usr/lib64/python2.7/shlex.py", line 96, in get_token
    raw = self.read_token()
  File "/usr/lib64/python2.7/shlex.py", line 172, in read_token
    raise ValueError, "No closing quotation"
ValueError: No closing quotation
```

writing `local_action: shell echo {{ fsid }} | tee {{ fetch_directory }}/ceph_cluster_uuid.conf`
can cause trouble because it's complaining with missing quotes, this fix solves this issue.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-01-31 10:45:34 +01:00
Andrew Schoen cfb75b8e29 tests: remove crush_device_class from lvm tests
The --crush-device-class flag for ceph-volume is not available in luminous so lets
remove this testing option for now until it's more widely available.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-01-18 15:03:38 +01:00
Andrew Schoen 64f5772140 tests: adds crush_device_class to lvm tests
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-01-17 13:49:29 +01:00
Sébastien Han 39f2bfd5d5 fix jewel scenarios on container
When deploying Jewel from master we still need to enable this code since
the container image has such check. This check still exists because
ceph-disk is not able to create a GPT label on a drive that does not
have one.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-12-20 13:43:19 +01:00
Guillaume Abrioux ab1dd3027a client: don't try to generate keys
the entrypoint to generate users keyring is `ceph-authtool`, therefore,
it can expand the `$(ceph-authtool --gen-print-key)` inside the
container. Users must generate a keyring themselves.
This commit also adds a check to ensure keyring are properly filled when
`user_config: true`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-12-14 17:22:07 +01:00
Guillaume Abrioux aa0b1ed118 tests: remove OSD_FORCE_ZAP variable from tests
according to ceph/ceph-container#840, this variable is no longer needed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-11-14 17:55:01 +01:00
Sébastien Han d05206236c
Merge pull request #2124 from ceph/lvm-setup-fix
test: when creating the /dev/sdc2 partition specify label as gpt
2017-10-31 16:51:16 +01:00
Andrew Schoen 37a48209cc test: when creating the /dev/sdc2 partition specify label as gpt
ansible==2.4 requires that label be set to gpt, or it will be defaulted
to msdos.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2017-10-31 09:38:47 -05:00
Guillaume Abrioux c28882c1cd tests: add missing test for rbd
Add a missing test `test_rbd_mirror_service_is_running_from_luminous()`.
Also using bash -c "<cmd>" to make testinfra aware that later in
the upgrade process we are now running `luminous` ceph release so we
must skip the rbd tests related to `jewel` ceph release.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-30 19:44:56 +01:00
Guillaume Abrioux 97b1cb0258 tests: followup on testing against ansible2.4
ceph-ansible is now being testing against ansible2.2 and ansible2.4. We
need to update tox.ini so we use the right version of testinfra
regarding which ansible version we are using.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-30 16:40:39 +01:00
Sébastien Han b3c9de90f4 Merge pull request #2090 from ceph/ansible-2.4
[skip ci] Test ansible 2.4.1
2017-10-27 17:39:05 +02:00
Sébastien Han faccd0acf0 Merge pull request #2100 from ceph/lvm-bluestore
ceph-volume lvm bluestore support
2017-10-27 17:36:16 +02:00
Sébastien Han c4ad247718 Test ansible 2.4.1
We now test with Ansible 2.4. We had to change testinfra's version since
only recent versions work with 2.4. See:
https://github.com/philpep/testinfra/issues/249

Closes: https://github.com/ceph/ceph-ansible/issues/2087
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-27 15:20:13 +02:00
Major Hayden f73232caa4
Use check_mode instead of always_run
This patch changes the `always_run: yes` task option to
`check_mode: no` to avoid Ansible warnings.
2017-10-25 09:53:34 -05:00
Major Hayden c2b5118c1b
Revert "Avoid deprecated always_run"
This reverts commit 620fb37dd4.
2017-10-25 09:48:09 -05:00
Alfredo Deza 027d57dd29 tests create a bluestore osd scenario
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2017-10-25 06:46:39 -04:00
Sébastien Han a53aa9e8b4 ci: new osd scenarios
This commit add new osd scenarios, it aims to simplify the CI setup and
brings a better coverage on the OSD scenarios.
We decided to differentiate between filestore and bluestore, thinking
ahead when filestore won't be supported anymore.
So we now have two classes of tests:

* Filestore
* Bluestore

In each of those classes we have container and non-container.
Then for each we test the following:

* collocated
* collocated dmcrypt
* non-collocated
* non-collocated dmcrypt
* auto discovery collocated
* auto discovery collocated dmcrypt

This gives us a nice coverage and also reduces the footprint on the CI.
We are now up to 4 scenarios, each containing 6 OSD VMs.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-18 09:26:06 +02:00
Guillaume Abrioux 7ee9aa94b5 Merge pull request #1963 from ceph/pull-in-para
site-docker.yml try to fetch images in //
2017-10-13 19:35:11 +02:00
Sébastien Han 71d819620c mds: fix fs pool creation
1. add the variables to docker_collocation
2. trigger the check when a MDS is part of the inventory file, not when
we run on an MDS...

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-13 16:03:04 +02:00
Sébastien Han 90ce4276ca ci: use a container client VM
The client won't run on centos7 anymore but on Atomic host just like the
rest of the daemons.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-13 15:26:03 +02:00
Sébastien Han 3e058bff06 ci: reboot with ansible instead of vagrant reload
vagrant is serialized and takes a lot of time compare to simple reboot.
See the benchmarks below for 3 VMs:

[leseb@rick docker]$ time ANSIBLE_SSH_ARGS="-F
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/vagrant_ssh_config"  ansible-playbook -i /home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/hosts reboot.yml

PLAY [mons]
****************************************************************************************************************************************************************************************************

TASK [Gathering Facts]
*****************************************************************************************************************************************************************************************
ok: [mon1]
ok: [mon2]
ok: [mon0]

TASK [restart machine]
*****************************************************************************************************************************************************************************************
changed: [mon2]
changed: [mon1]
changed: [mon0]

TASK [wait for server to boot]
*********************************************************************************************************************************************************************************
ok: [mon2 -> localhost]
ok: [mon0 -> localhost]
ok: [mon1 -> localhost]

TASK [uptime]
**************************************************************************************************************************************************************************************************
changed: [mon2]
changed: [mon0]
changed: [mon1]

PLAY RECAP
*****************************************************************************************************************************************************************************************************
mon0                       : ok=4    changed=2    unreachable=0
failed=0
mon1                       : ok=4    changed=2    unreachable=0
failed=0
mon2                       : ok=4    changed=2    unreachable=0
failed=0

real    0m35.112s
user    0m5.737s
sys     0m1.849s

[leseb@rick docker]$ time vagrant reload
==> mon0: Halting domain...
==> mon0: Starting domain.
==> mon0: Waiting for domain to get an IP address...
==> mon0: Waiting for SSH to become available...
==> mon0: Creating shared folders metadata...
==> mon0: Rsyncing folder:
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/
=> /home/vagrant/sync
==> mon0: Machine already provisioned. Run `vagrant provision` or use
the `--provision`
==> mon0: flag to force provisioning. Provisioners marked to run always
will still run.
==> mon1: Halting domain...
==> mon1: Starting domain.
==> mon1: Waiting for domain to get an IP address...
==> mon1: Waiting for SSH to become available...
==> mon1: Creating shared folders metadata...
==> mon1: Rsyncing folder:
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/
=> /home/vagrant/sync
==> mon1: Machine already provisioned. Run `vagrant provision` or use
the `--provision`
==> mon1: flag to force provisioning. Provisioners marked to run always
will still run.
==> mon2: Halting domain...
==> mon2: Starting domain.
==> mon2: Waiting for domain to get an IP address...
==> mon2: Waiting for SSH to become available...
==> mon2: Creating shared folders metadata...
==> mon2: Rsyncing folder:
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/
=> /home/vagrant/sync
==> mon2: Machine already provisioned. Run `vagrant provision` or use
the `--provision`
==> mon2: flag to force provisioning. Provisioners marked to run always
will still run.

real    1m31.850s
user    0m7.387s
sys     0m0.796s

Reboot via Ansible: 0m35.112s
Reboot via vagrant: 1m31.850s

We save 1/3 time.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-13 09:04:26 +02:00
Guillaume Abrioux 17623a2157 Merge pull request #2036 from ceph/cephfs-pool
mds: precisely define cephfs pool
2017-10-12 17:47:10 +02:00
Sébastien Han 6bd152d555 Merge pull request #2037 from major/remove-always-run
Avoid deprecated always_run
2017-10-12 17:15:28 +02:00
Sébastien Han b49f9bda21 mds: precisely define cephfs pool
We now have a variable called ceph_pools that is mandatory when
deploying a MDS.
It's a dictionnary that contains a pool name and a PG count. PG count is
mandatory and must be set, the playbook will fail otherwise.

Closes: https://github.com/ceph/ceph-ansible/issues/2017
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-12 15:56:04 +02:00
Major Hayden 620fb37dd4
Avoid deprecated always_run
The `always_run` key is deprecated and being removed in Ansible 2.4.
Using it causes a warning to be displayed:

    [DEPRECATION WARNING]: always_run is deprecated.

This patch changes all instances of `always_run` to use the `always`
tag, which causes the task to run each time the playbook runs.
2017-10-12 08:29:44 -05:00
Guillaume Abrioux a179e312fd tests: add missing override for collocation scenario
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-12 14:43:25 +02:00
Guillaume Abrioux a2880e6345 tests: rbd/rgw adapt testinfra for jewel
- the rbd-mirror unit systemd name is not the same when running jewel vs
luminous.
- servicemap is not available on jewel.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-12 00:06:08 +02:00
Guillaume Abrioux a1ea6e7f59 tests: adapt current testing for collocation scenario
Since we introduced collocation testing scenario, we need to adapt
current tests to this new scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-09 17:25:45 +02:00
Sébastien Han 6d7b73fa91 ci: re-add osd_pool_default_size to 1 with the override
If we don't do this the client will create pools with a replica 3 since
osd_pool_default_size was gone in ceph-override.json. This was making
switch_to_containers failing.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-09 17:25:45 +02:00
Sébastien Han abb8c374cf ci: use by-id instead of by-path
by-id relies on the disk WWID which is more reliable then by-path
(pointing to the PCI info)

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-07 03:39:09 +02:00
Guillaume Abrioux 680ec8758e tests: skip tests for nfs nodes when release is jewel
nfs nodes are not deployed on jewel so we should skip the tests on them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-06 12:49:39 +02:00
Sébastien Han b6b24a5ca9 iscsi: fix wrong group name for iscsi
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498490
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-05 17:25:32 +02:00
Guillaume Abrioux 53a69640c9 tests: disable shared folder
Shared folder is not required for tests.
We should avoid hitting the error :
```
uninitialized constant VagrantPlugins::ProviderLibvirt::Action::ShareFolders
```
Also, disabling it might reduce the needed time in certains cases for the VMs
to be started.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-05 15:07:38 +02:00
Guillaume Abrioux 6aa7050acd tests: make all subnet uniq per scenario
If two environments are using the same subnet, we will get trouble
because of ips addresses conflicts.
This commit ensures each scenario has a uniq subnet for both public and cluster
network so we can setup several test environment at a time on a same hypervisor.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-05 15:07:38 +02:00
Guillaume Abrioux 635111bf6a tests: add ceph-override.json for ubuntu/cluster
in addition to 18e2ab4d this commit adds the same file for ubuntu
testing scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-05 12:59:29 +02:00
Guillaume Abrioux 4135091c98 tests: fix broken osd test for xenial_cluster
the path `/dev/disk/by-path/pci-0000:00:01.1-ata-1.0` doesn't exist.
it has to be changed to `/dev/disk/by-path/pci-0000:00:01.1-ata-1`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-05 11:03:41 +02:00
Guillaume Abrioux cdb5023d84 tests: fix brokens tests for mds
5968cf0 broke the test on mds because of leftover.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-04 16:48:23 +02:00
Guillaume Abrioux 2c4258a0fd Refact code for set_osd_pool_default_*
This commit refacts the code regarding all `set_osd_pool_default_*`
related tasks by avoiding usage of useless `set_fact` to determine
whether a key is present in `ceph_conf_overrides`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2017-10-04 15:40:10 +02:00
Sébastien Han 5968cf09b1 ci: add collocation scenario
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-04 11:19:12 +02:00
Sébastien Han 3bd341f6c0 osd: container use id instead of dev name
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1494127
Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-03 14:44:00 +02:00
Sébastien Han 18e2ab4d07 test: add handler support
Add idempotency and handler test.

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-03 14:44:00 +02:00
Sébastien Han 39ee25637b test: add test for device with 'by-path'
We now test devices to be passed like:
/dev/disk/by-path/pci-0000:00:01.1-ata-1.0

Signed-off-by: Sébastien Han <seb@redhat.com>
2017-10-03 14:43:57 +02:00