Commit Graph

290 Commits (50be3fd9e8c0944cdddbd88bc8287e65765b0c63)

Author SHA1 Message Date
Sébastien Han 50be3fd9e8 test: remove osd_crush_location from shrink scenarios
This is not needed since this is already covered by docker_cluster and
centos_cluster scenarios.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-07 16:20:13 +00:00
Guillaume Abrioux 578aa5c2d5 tests: leave an OSD node in default crush root
jewel used to create a default `rbd` pool in the default crush root
`default`, we need to have at least 1 osd to satisfy the PGs for this
created pool, otherwise the cluster will be in HEALTH_ERR state because
of `pgs stuck unclean`/`pgs stuck inactive`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-26 18:47:10 +00:00
Guillaume Abrioux 0a88bccf87 tests: followup on b89cc1746f
Update network subnets in group_vars/all

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-24 16:55:15 +02:00
Guillaume Abrioux b89cc1746f tests: do not deploy all daemons for shrink osds scenarios
Let's create a dedicated environment for these scenarios, there is no
need to deploy everything.
By the way, doing so will save some times.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-23 18:30:06 +02:00
Guillaume Abrioux 0c863a3783 tests: add support of 'ooo-collocation' scenario when testing against ceph dev
The group_vars/all file is not available on 'ooo-collocation' scenario,
it's making the `dev_setup.yml` failing because this path is hardcoded.

The idea here is to check if the pattern 'ooo-collocation' is present in
`change_dir` variable so we can set this path properly according to the
scenario being run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-20 07:47:33 +02:00
Guillaume Abrioux d8281e50f1 tests: support update scenarios in test_rbd_mirror_is_up()
`test_rbd_mirror_is_up()` is failing on update scenarios because it
assumes the `ceph_stable_release` is still set to the value of the
original ceph release, it means it won't enter in the right part of the
condition and fails.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-20 07:46:41 +02:00
Guillaume Abrioux cc71bb96cc tests: followup on #2656
34f70428 has introduced a fix using `command` module while this could
have been achieved by using `lvol` module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-13 07:55:14 +00:00
Guillaume Abrioux 9a65ec231d tests: fix `_get_osd_id_from_host()` in TestOSDs()
We must initialize `children` variable in `_get_osd_id_from_host()`,
otherwise, if for any reason the deployment has failed and result with
an osd host with no OSD registered, we won't enter in the condition,
therefore, `children` is never set and the function tries to return
something undefined.

Typical error:
```
E       UnboundLocalError: local variable 'children' referenced before assignment
```

Fixes: #2860

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-10 13:06:23 +00:00
Guillaume Abrioux 09d795b5b7 tests: add mimic support for test_rbd_mirror_is_up()
prior mimic, the data structure returned by `ceph -s -f json` used to
gather information about rbd-mirror daemons looked like below:

```
  "servicemap": {
    "epoch": 8,
    "modified": "2018-07-05 13:21:06.207483",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "ceph-nano-luminous-faa32aebf00b": {
            "start_epoch": 8,
            "start_stamp": "2018-07-05 13:21:04.668450",
            "gid": 14107,
            "addr": "172.17.0.2:0/2229952892",
            "metadata": {
              "arch": "x86_64",
              "ceph_version": "ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)",
              "cpu": "Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-nano-luminous-faa32aebf00b",
              "instance_id": "14107",
              "kernel_description": "#1 SMP Wed Mar 14 15:12:16 UTC 2018",
              "kernel_version": "4.9.87-linuxkit-aufs",
              "mem_swap_kb": "1048572",
              "mem_total_kb": "2046652",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This part has changed from mimic and became:
```
  "servicemap": {
    "epoch": 2,
    "modified": "2018-07-04 09:54:36.164786",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "14151": {
            "start_epoch": 2,
            "start_stamp": "2018-07-04 09:54:35.541272",
            "gid": 14151,
            "addr": "192.168.1.80:0/240942528",
            "metadata": {
              "arch": "x86_64",
              "ceph_release": "mimic",
              "ceph_version": "ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)",
              "ceph_version_short": "13.2.0",
              "cpu": "Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-rbd-mirror0",
              "id": "ceph-rbd-mirror0",
              "instance_id": "14151",
              "kernel_description": "#1 SMP Wed May 9 18:05:47 UTC 2018",
              "kernel_version": "3.10.0-862.2.3.el7.x86_64",
              "mem_swap_kb": "1572860",
              "mem_total_kb": "1015548",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This patch modifies the function `test_rbd_mirror_is_up()` in
`test_rbd_mirror.py` so it works with `mimic` and keeps backward compatibility
with `luminous`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-06 14:39:13 +02:00
Guillaume Abrioux f2e57a56db tests: factorize docker tests using docker_exec_cmd logic
avoid duplicating test unnecessarily just because of docker exec syntax.
Using the same logic than in the playbook with `docker_exec_cmd` allow us
to execute the same test on both containerized and non containerized environment.

The idea is to set a variable `docker_exec_cmd` with the
'docker exec <container-name>' string when containerized and
set it to '' when non containerized.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-27 07:00:14 +00:00
Guillaume Abrioux fe79a5d240 tests: refact test_all_*_osds_are_up_and_in
these tests are skipped on bluestore osds scenarios.
they were going to fail anyway since they are run on mon nodes and
`devices` is defined in inventory for each osd node. It means
`num_devices * num_osd_hosts` returns `0`.
The result is that the test expects to have 0 OSDs up.

The idea here is to move these tests so they are run on OSD nodes.
Each OSD node checks their respective OSD to be UP, if an OSD has 2
devices defined in `devices` variable, it means we are checking for 2
OSD to be up on that node, if each node has all its OSD up, we can say
all OSD are up.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-26 15:23:39 +00:00
Guillaume Abrioux 1c3dae4a90 tests: skip rgw_tuning_pools_are_set when rgw_create_pools is not defined
since ooo_collocation scenario is supposed to be the same scenario than the
one tested by OSP and they are not passing `rgw_create_pools` the test
`test_docker_rgw_tuning_pools_are_set` will fail:
```
>       pools = node["vars"]["rgw_create_pools"]
E       KeyError: 'rgw_create_pools'
```

skipping this test if `node["vars"]["rgw_create_pools"]` is not defined
fixes this failure.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-26 15:23:39 +00:00
Guillaume Abrioux f68936ca7e tests: fix *_has_correct_value tests
It might happen that the list of ips/hosts in following line (ceph.conf)
- `mon initial memebers = <hosts>`
- `mon host = <ips>`

are not ordered the same way depending on deployment.

This patch makes the tests looking for each ip or hostname in respective
lines.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-20 08:01:57 +02:00
Guillaume Abrioux 481c14455a tests: add more nodes in ooo testing scenario
adding more node in this scenario could help to have a better coverage
so we can catch more potential bugs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-18 16:44:23 +02:00
Guillaume Abrioux 21894655a7 tests: keep same ceph release during handlers/idempotency test
since `latest` points to `mimic`, we need to force the test to keep the
same ceph release when testing anything else than `mimic`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-15 11:45:51 -04:00
Guillaume Abrioux bbb8691335 tests: increase memory to 1024Mb for centos7_cluster scenario
we see more and more failure like `fatal: [mon0]: UNREACHABLE! => {}` in
`centos7_cluster` scenario, Since we have 30Gb RAM on hypervisors, we
can give monitors a bit more RAM. By the way, nodes on containerized cluster
testing scenario have already 1024Mb memory allocated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-11 23:52:15 +08:00
Sébastien Han 6035978ed9 test: only on containerized iscsi
We don't have the same service running on non-container for now, this
will change soon but for let's only run the test on container.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-11 08:34:48 +02:00
Sébastien Han 20c8065e48 ceph-iscsi: rename group iscsi_gws
Let's try to avoid using dashes as testinfra needs to be able to read
the groups.
Typically, with iscsi-gws we can't add a marker for these iscsi nodes,
using an underscore fixes the issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han c00fb12497 ci: add functionnal tests for iscsi
We test if:

* packages are installed
* services are runnning
* service units are enabled

Also fix linting issues

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Sébastien Han 5ff2f03e3f ci: add iscsi test
Add iscsi CI coverage, this will now deploy iscsi gateways in container.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-08 10:21:54 +02:00
Guillaume Abrioux 28d21b4e9c tests: update ooo inventory hostfile
Update the inventory host for tripleo testing scenario so it's the same
parameters than in tripleo CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 17:26:35 +02:00
Guillaume Abrioux c94ada69e8 tests: improve mds tests
the expected number of mds daemon consist of number of daemons that are
'up' + number of daemons 'up:standby'.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-07 14:01:58 +08:00
Guillaume Abrioux f0cd4b0651 tests: skip disabling fastest mirror detection on atomic host
There is no need to execute this task on atomic hosts.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-05 15:39:37 +02:00
Guillaume Abrioux 47276764f7 tests: fix rgw tests
41b4632 has introduced a change in functionnals tests.
Since the admin keyring isn't copied on rgw nodes anymore in tests, let's use
the rgw keyring to achieve them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-05 15:24:32 +02:00
Sébastien Han 41b4632abc test: do not always copy admin key
The admin key must be copied on the osd nodes only when we test the
shrink scenario. Shrink relies on ceph-disk commands that require the
admin key on the node where it's being executed.

Now we only copy the key when running on the shrink-osd scenario.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-06-05 09:39:30 +02:00
Guillaume Abrioux 2cf06b515f rgw: refact rgw pools creation
Refact of 8704144e31
There is no need to have duplicated tasks for this. The rgw pools
creation should be delegated on a monitor node se we don't have to care
if the admin keyring is present on rgw node.
By the way, only one task is needed to create the pools, we just need to
use the `docker_exec_cmd` fact already defined in `ceph-defaults` to
achieve it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1550281

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-06-05 15:00:20 +08:00
Erwan Velu 493f615eae ceph-defaults: Enable local epel repository
During the tests, the remote epel repository is generating a lots of
errors leading to broken jobs (issue #2666)

This patch is about using a local repository instead of a random one.
To achieve that, we make a preliminary install of epel-release, remove
the metalink and enforce a baseurl to our local http mirror.

That should speed up the build process but also avoid the random errors
we face.

This patch is part of a patch series that tries to remove all possible yum failures.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-06-04 08:11:35 +02:00
jtudelag 600e1e2c26 rgws: renames create_pools variable with rgw_create_pools.
Renamed to be consistent with the role (rgw) and have a meaningful name.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
2018-06-04 06:23:42 +02:00
jtudelag 8704144e31 Adds RGWs pool creation to containerized installation.
ceph command has to be executed from one of the monitor containers
if not admin copy present in RGWs. Task has to be delegated then.

Adds test to check proper RGW pool creation for Docker container scenarios.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
2018-06-04 06:23:42 +02:00
Guillaume Abrioux c68126d6fd mdss: do not make pg_num a mandatory params
When playing ceph-mds role, mon nodes have set a fact with the default
pg num for osd pools, we can simply default to this value for cephfs
pools (`cephfs_pools` variable).

At the moment the variable definition for `cephfs_pools` looks like:

```
cephfs_pools:
  - { name: "{{ cephfs_data }}", pgs: "" }
  - { name: "{{ cephfs_metadata }}", pgs: "" }
```

and we have a task in `ceph-validate` to ensure `pgs` has been set to a
valid value.

We could simply avoid this check by setting the default value of `pgs`
to `hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num']` and
let to users the possibility to override this value.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1581164

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-30 16:20:34 +02:00
Guillaume Abrioux 34f7042852 tests: resize root partition when atomic host
For a few moment we can see failures in the CI for containerized
scenarios because VMs are running out of space at some point.

The default in the images used is to have only 3Gb for root partition
which doesn't sound like a lot.

Typical error seen:

```
STDERR:

failed to register layer: Error processing tar file(exit status 1): open /usr/share/zoneinfo/Atlantic/Canary: no space left on device
```

Indeed, on the machine we can see:
```
Every 2.0s: df -h                                                                                                                                                                                                                                       Tue May 29 17:21:13 2018
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  3.0G  3.0G   14M 100% /
```

The idea here is to expand this partition with all the available space
remaining by issuing an `lvresize` followed by an `xfs_growfs`.

```
-bash-4.2# lvresize -l +100%FREE /dev/atomicos/root
  Size of logical volume atomicos/root changed from <2.93 GiB (750 extents) to 9.70 GiB (2484 extents).
  Logical volume atomicos/root successfully resized.
```

```
-bash-4.2# xfs_growfs /
meta-data=/dev/mapper/atomicos-root isize=512    agcount=4, agsize=192000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=768000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 768000 to 2543616
```

```
-bash-4.2# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  9.7G  1.4G  8.4G  14% /
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-30 10:54:35 +02:00
Guillaume Abrioux 98cb6ed8f6 tests: avoid yum failures
In the CI we can see at many times failures like following:

`Failure talking to yum: Cannot find a valid baseurl for repo:
base/7/x86_64`

It seems the fastest mirror detection is sometimes counterproductive and
leads yum to fail.

This fix has been added in the `setup.yml`.
This playbook was used until now only just before playing `testinfra`
and could be used before running ceph-ansible so we can add some
provisionning tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Erwan Velu <evelu@redhat.com>
2018-05-28 22:04:35 +02:00
Guillaume Abrioux a10e73d78d tests: move cephfs_pools variable
let's move this variable in group_vars/all.yml in all testing scenarios
accordingly to this commit 1f15a81c48 so
we keep consistency between the playbook and the tests.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-24 09:39:38 -07:00
Guillaume Abrioux 564a662baf osds: move openstack pools creation in ceph-osd
When deploying a large number of OSD nodes it can be an issue because the
protection check [1] won't pass since it tries to create pools before all
OSDs are active.

The idea here is to move openstack pools creation at the end of `ceph-osd` role.

[1] e59258943b/src/mon/OSDMonitor.cc (L5673)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-05-24 09:39:38 -07:00
Luigi Toscano 43e96c1f98 ceph-radosgw: disable NSS PKI db when SSL is disabled
The NSS PKI database is needed only if radosgw_keystone_ssl
is explicitly set to true, otherwise the SSL integration is
not enabled.

It is worth noting that the PKI support was removed from Keystone
starting from the Ocata release, so some code paths should be
changed anyway.

Also, remove radosgw_keystone, which is not useful anymore.
This variable was used until fcba2c801a.
Now profiles drives the setting of rgw keystone *.

Signed-off-by: Luigi Toscano <ltoscano@redhat.com>
2018-05-23 23:24:09 -07:00
Guillaume Abrioux a68091c923 tests: update the type for the rule used in pools
As of ceph 12.2.5 the type of the parameter `type` is not a name anymore but
an id, therefore an `int` is expected otherwise it will fail with the
following error

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-30 08:15:18 +02:00
Sébastien Han 71efa2eaf4 ci: bump client nodes to 2
In order to test the key distribution is correct we must have 2 client
nodes.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-23 18:34:58 +02:00
Guillaume Abrioux 77831ccb7a tests: update tests for mds to cover multimds case
in case of multimds we must check for the number of mds up instead of
just checking if the hostname of the node is in the fsmap.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-04-12 18:20:58 +02:00
Sébastien Han 82589021e0 ci: fix tripleO scenario
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-11 12:18:34 +02:00
Sébastien Han 2011ec3bcd ci: client copy admin key
If we don't copy the admin key we can't add the key into ceph.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-11 12:18:34 +02:00
Sébastien Han cf73647e7a ci: remove useless tests
These are already handled by ceph-client/defaults/main.yml so the keys
will be created once user_config is set to True.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-11 12:18:34 +02:00
Andrew Schoen 98e237d234 tests: no need to remove partitions in lvm_setup.yml
Now that we are using ceph_volume_zap the partitions are
kept around and should be able to be reused.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-04-10 14:19:21 +02:00
Sébastien Han f3caee8460 ceph-iscsi: fix certificates generation and distribution
Prior to this patch, the certificates where being generated on a single
node only (because of the run_once: true). Thus certificates were not
distributed on all the gateway nodes.

This would require a second ansible run to work. This patches fix the
creation and keys's distribution on all the nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-04-04 09:27:39 +02:00
John Fulton e6e6bd078a Refer to expected-num-ojects as expected_num_objects, not size
Follow up patch to PR 2432 [1] which replaces "size" (sorry if
the original bug used that term, which can be confusing) with
expected_num_objects as is used in the Ceph documentation [2].

[1] https://github.com/ceph/ceph-ansible/pull/2432/files
[2] http://docs.ceph.com/docs/jewel/rados/operations/pools
2018-03-26 15:41:51 +02:00
Sébastien Han 3ab89ab48c ci: re-arrange group_vars files
We should stop putting everything in 'all'. This is too easy and this is
error prone as well for those who are separating variables into host
type, things that you should do.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han d5f8cac820 ci: remove left over iscsi_gws file
Wrong file that is not used, only iscsi-ggw that is present is correct.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 8000ae342e remove unsed ceph_rgw_civetweb_port variable
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han f119b25bbe client: implement proper pools creation
Just like we did for the monitor and openstack_config we now have the
ability to precisely create pools.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han e302c1baae mon: add support for erasure code pool
You can now specify type: erasure and   erasure_profile to use when
declaring the pool dictionnary.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00
Sébastien Han 4806ff4ff8 ci: test pool creation on container
On containerized scenario we also want to test pool creation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-03-14 14:22:00 +01:00