Commit Graph

3981 Commits (90c66a5848284fdc90e6e0993abb678f30440a49)
 

Author SHA1 Message Date
Sébastien Han 50be3fd9e8 test: remove osd_crush_location from shrink scenarios
This is not needed since this is already covered by docker_cluster and
centos_cluster scenarios.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-08-07 16:20:13 +00:00
Christian Berendt d168d81c16 docs: overall syntax improvements
* add some missing dots and ``
* add/remove line breaks
* consistent use of shell prompt in consoles outpus
* fix block indents Bearbeiten
* use code blocks

Signed-off-by: Christian Berendt <berendt@b1-systems.de>
2018-08-07 10:33:48 +00:00
Artur Fijalkowski 52d9d406b1 Fix in regular expression matching OSD ID on non-contenerized
deployment.
restart_osd_daemon.sh is used to discover and restart all OSDs on a
host. To do it the scripts loops the list of ceph-osd@ services in the
system. This commit fixes bug in the regular expression responsile for
extraction of OSDs - prior version uses `[0-9]{1,2}` expression
which is ignoring all OSDS which numbers are greater than 99 (thus
longer than 2 digits). Fix removed upper limit of digits in the number.
This problem existed in two places in the script.

Closes: #2964

Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
2018-08-06 15:53:49 +00:00
Guillaume Abrioux 1164cdc002 iscsigw: install ceph-iscsi-cli package
Install ceph-iscsi-cli in order to provide the `gwcli` command tool.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1602785

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-08-06 14:11:52 +02:00
Guillaume Abrioux 0a6ff6bbf8 defaults: backward compatibility with fqdn deployments
This commit ensures we are backward compatible with fqdn deployments.
Since ceph-container enforces deployment to be done with shortname, we
must keep backward compatibility with clusters already deployed with
fqdn configuration

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-08-06 10:14:58 +00:00
Sébastien Han ea9e60d48d config: enforce socket name
This was introduced by
59ee2e8d3b
and made our socket checks impossible to run. The PID could be found,
but the cctid cannot.
This happens during upgrade to mimic and on cluster running on mimic.

So let's force the admin socket the way it was so we can properly check
for existing instances also the line $cluster-$name.$pid.$cctid.asok
is only needed when running multiple instances of the same daemon,
thing ceph-ansible cannot do at the time of writing

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610220
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-31 10:58:04 +02:00
Mike Christie 6f72f96dad igw: do not fail purge on rbd removal errors
Instead of failing the entire purge operation when the rbd command fails
just log an error. This will allow the higher level target and config
cleanup to complete, and the user only has to manually delete the rbd
images.

Signed-off-by: Mike Christie <mchristi@redhat.com>
2018-07-31 10:08:26 +02:00
Mike Christie d572a9a602 igw: fix image removal during purge
We were not passing in the ceph conf info into the rbd image removal
command, so if the clustername was not the default igw purge would fail
due to the rbd rm command failing.

This just fixes the bug by passing in the ceph conf info which has the
clustername to use.

This fixes Red Hat bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1601949

Signed-off-by: Mike Christie <mchristi@redhat.com>
2018-07-31 10:08:26 +02:00
Sébastien Han b334cdcbe5 restapi: disable it when ceph version > luminous
ceph-rest-api binary has been removed in mimic so we cannot deploy it
anymore. We just keep the role and the compability for existing users.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-30 13:18:30 +00:00
Sébastien Han 2ca8c51906 osd: do not remove expose_partition container
The container runs with --rm which means it will be deleted by Docker
when exiting. Also 'docker rm -f' is not idempotent and returns 1 if the
container does not exist.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1609007
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-30 10:38:15 +02:00
Sébastien Han 1f341e69d1 site: report ceph -s status at the end of the deployment
We now show the output of 'ceph -s'. Example output below:

TASK [display post install message] **********************************************************************************************************************************************************************************************************
ok: [localhost] => {
    "msg": [
        "  cluster:",
        "    id:     753212df-f32a-4cc9-a097-2db6fe89a251",
        "    health: HEALTH_OK",
        " ",
        "  services:",
        "    mon: 1 daemons, quorum ceph-nano-lul-faa32aebf00b",
        "    mgr: ceph-nano-lul-faa32aebf00b(active)",
        "    osd: 1 osds: 1 up, 1 in",
        " ",
        "  data:",
        "    pools:   4 pools, 32 pgs",
        "    objects: 224 objects, 2546 bytes",
        "    usage:   1027 MB used, 9212 MB / 10240 MB avail",
        "    pgs:     32 active+clean",
        " "
    ]
}

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1602910
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-27 14:49:42 +00:00
Guillaume Abrioux 578aa5c2d5 tests: leave an OSD node in default crush root
jewel used to create a default `rbd` pool in the default crush root
`default`, we need to have at least 1 osd to satisfy the PGs for this
created pool, otherwise the cluster will be in HEALTH_ERR state because
of `pgs stuck unclean`/`pgs stuck inactive`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-26 18:47:10 +00:00
Guillaume Abrioux a1ca2c8fd3 iscsigw: do not run common roles when deploying jewel
Let's not deploy common roles when iscsigw nodes for jewel deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-26 18:47:10 +00:00
Guillaume Abrioux 1ecbbbdcfa rbd-mirror: bring back compatibility with jewel deployment
rbd-mirror can't start when deploying jewel because it needs admin
keyring.
Getting back this task brings backward compatibility for jewel
deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-26 18:47:10 +00:00
Guillaume Abrioux 053709da97 ceph-osds: backward compatibility with jewel for osp pools creation
If we want to be backward compatible with release prior to luminous, we
have to set the rule name accordingly to default values used in jewel.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-26 18:47:10 +00:00
Guillaume Abrioux 2597a557c5 client: fix an incorrect title in a task
This task would be run on both containerized *and* non containerized
deployment.
Let's have a proper title to avoid confusion.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-26 15:57:41 +02:00
Arata Notsu 2bbb4acca6 site.yml.sample: fix install python2
Check `systempython2.stat` instead of `systempython2.stat.exists`.

Without this change, in the case that python2 is not installed, the `stat`
task fails without defining `systempython2.stat`. It leads that the next
installation tasks fail because of undefined `systempython2.stat`.

An example error output (edited for readability):

```
TASK [check for python2] ***********************************************
Wednesday 25 July 2018  14:52:47 +0900 (0:00:00.182)       0:00:00.182 *
fatal: [ceph-osd1.vlan221.vtj]: FAILED! => {
"changed": false, "module_stderr": "/bin/sh: 1: /usr/bin/python: not
found\n", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 127}
...ignoring

TASK [install python2 for debian based systems] ************************
Wednesday 25 July 2018  14:51:00 +0900 (0:00:01.742)       0:00:01.926 *
fatal: [ceph-mon2]: FAILED! => {
"msg": "The conditional check 'systempython2.stat.exists is undefined or
systempython2.stat.exists == false' failed. The error was: error while
evaluating conditional (systempython2.stat.exists is undefined or
systempython2.stat.exists == false): 'dict object' has no attribute 'stat'
\n\n The error appears to have been in
'/Users/arata/git/ceph-ansible/site.yml.sample': line 36, column 7, but
may\n be elsewhere in the file depending on the exact syntax problem.\n\n
The offending line appears to be:\n\n\n
    - name: install python2 for debian based systems\n
      ^ here\n
"}
...ignoring
```

Fixes: #2930
Signed-off-by: Arata Notsu <arata776@gmail.com>
2018-07-25 16:59:37 +00:00
Sébastien Han e2ea5bac51 rgw: add more config option for civetweb frontend
In containerized deployments we now inherite from the
radosgw_civetweb_options options when bootstrapping the container.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-25 13:19:14 +00:00
Giulio Fidente e85e5ea781 Run creation of empty rados index object to first monitor
When distributing ceph-nfs role, creation of rados index object
fails as it assumes availability of client.admin locally.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1607970
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
2018-07-25 11:40:11 +02:00
Guillaume Abrioux 0640e2aa1e main: update requirements.txt
update requirements.txt accordingly to the last ansible version tested
on master.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-24 18:23:04 +02:00
Sébastien Han 235d1b3f55 validate: add checks for interfaces
Check if the interface provided:

* exists in the gathered facts (thus on the system)
* is active
* has an IP address (depending on ip_version )

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600227
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-24 17:59:30 +02:00
Sébastien Han b3266c5be2 rolling_update: set osd sortbitwise
upgrade RHCS 2 -> RHCS 3 will fail if cluster has still set
sortnibblewise,
it stay stuck on "TASK [waiting for clean pgs...]" as RHCS 3 osds will
not start if nibblewise is set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-24 17:19:02 +02:00
Guillaume Abrioux 0a88bccf87 tests: followup on b89cc1746f
Update network subnets in group_vars/all

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-24 16:55:15 +02:00
Guillaume Abrioux b89cc1746f tests: do not deploy all daemons for shrink osds scenarios
Let's create a dedicated environment for these scenarios, there is no
need to deploy everything.
By the way, doing so will save some times.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-23 18:30:06 +02:00
Guillaume Abrioux af82e7523d tests: test master against ansible 2.6
Ansible 2.4 is currently end-of-life.
Ansible 2.5 will go end-of-life after Ansible 2.7 is released.

Fixes: #2901

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-23 11:59:15 +00:00
Guillaume Abrioux 0c863a3783 tests: add support of 'ooo-collocation' scenario when testing against ceph dev
The group_vars/all file is not available on 'ooo-collocation' scenario,
it's making the `dev_setup.yml` failing because this path is hardcoded.

The idea here is to check if the pattern 'ooo-collocation' is present in
`change_dir` variable so we can set this path properly according to the
scenario being run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-20 07:47:33 +02:00
Guillaume Abrioux d8281e50f1 tests: support update scenarios in test_rbd_mirror_is_up()
`test_rbd_mirror_is_up()` is failing on update scenarios because it
assumes the `ceph_stable_release` is still set to the value of the
original ceph release, it means it won't enter in the right part of the
condition and fails.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-20 07:46:41 +02:00
Guillaume Abrioux 651ee469f6 tests: stop hardcoding ansible version
In addition to ceph/ceph-build#1082

Let's set the ansible version in each ceph-ansible branch's respective
requirements.txt.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-19 23:10:37 +02:00
Sébastien Han 7fc13bc9d5 validate: only run osd test on osd node
Do not run device validation on every hosts, only on OSD nodes.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-19 12:46:18 +00:00
Guillaume Abrioux 774980227a docs: update doc
- stable-2.1 and stable-2.2 shouldn't be referenced anymore.
- add stable-3.1 branch reference
- update the differents ansible version supported

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-18 17:49:44 +02:00
Sébastien Han ce1dd8d2b3 shrink-osd: purge osd on containerized deployment
Prior to this commit we were only stopping the container, but now we
also purge the devices.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572933
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-18 14:26:22 +00:00
Sébastien Han cf01e596b6 valide: improve device check
We know make sure that:

* devices are actually block special files
* length of dedicated_device is identical to devices

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-18 14:26:22 +00:00
Guillaume Abrioux 05852b0301 tests: add latest-bis-jewel for jewel tests
since no latest-bis-jewel exists, it's using latest-bis which points to
ceph mimic. In our testing, using it for idempotency/handlers tests
means upgrading from jewel to mimic which is not what we want do.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-17 09:10:15 +00:00
Guillaume Abrioux 1a626d3c61 nfs: change default stable branch for nfs-ganesha repo
Since `V2.6-stable` is available and has packages for `mimic`, let's
update this default value accordingly so nfs nodes can be deployed with
mimic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-13 08:20:27 +00:00
Guillaume Abrioux cc71bb96cc tests: followup on #2656
34f70428 has introduced a fix using `command` module while this could
have been achieved by using `lvol` module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-13 07:55:14 +00:00
Sébastien Han e61ca882a1 validate: force ansible version
We currently only support Ansible 2.4.X so let's fail if the version is
different.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-13 07:52:56 +00:00
Guillaume Abrioux 5ef5fcd0b6 client: do not rely on copy_admin_key to import keys
Relying on `copy_admin_key` to import created keys on client nodes makes
us obliged to copy admin key on those nodes which is not something we might
want.
We should use the fact `condition_copy_admin_key` which will be set to
`True` when the delegated node is a mon which means we can import keys
without taking care of admin keyring.

Fixes: #2867

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-13 06:52:00 +00:00
Andy McCrae a1b3d5b7c3 Sync config_template with upstream for Ansible 2.6
The original_basename option in the copy module changed to be
_original_basename in Ansible 2.6+, this PR resyncs the config_template
module to allow this to work with both Ansible 2.6+ and before.

Additionally, this PR removes the _v1_config_template.py file, since
ceph-ansible no longer supports versions of Ansible before version 2,
and so we shouldn't continue to carry that code.

Closes: #2843
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
2018-07-12 21:07:41 +00:00
Guillaume Abrioux ce5ac930c5 mgr: fix condition to add modules to ceph-mgr
Follow up on #2784

We must check in the generated fact `_disabled_ceph_mgr_modules` to
enable disabled mgr module.
Otherwise, this task will be skipped because it's not comparing the
right list.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600155

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-12 21:04:01 +00:00
Sébastien Han c45195a18a Update issue templates
Introduce templates for issues and feature requests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-12 14:10:15 +02:00
Guillaume Abrioux 9f54b3b4a7 mon: ensure socker is purged when mon is stopped
On containerized deployment, if a mon is stopped, the socket is not
purged and can cause failure when a cluster is redeployed after the
purge playbook has been run.

Typical error:

```
fatal: [osd0]: FAILED! => {}

MSG:

'dict object' has no attribute 'osd_pool_default_pg_num'
```

the fact is not set because of this previous failure earlier:

```
ok: [mon0] => {
    "changed": false,
    "cmd": "docker exec ceph-mon-mon0 ceph --cluster test daemon mon.mon0 config get osd_pool_default_pg_num",
    "delta": "0:00:00.217382",
    "end": "2018-07-09 22:25:53.155969",
    "failed_when_result": false,
    "rc": 22,
    "start": "2018-07-09 22:25:52.938587"
}

STDERR:

admin_socket: exception getting command descriptions: [Errno 111] Connection refused

MSG:

non-zero return code
```

This failure happens when the ceph-mon service is stopped, indeed, since
the socket isn't purged, it's a leftover which is confusing the process.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-10 20:08:07 +00:00
Guillaume Abrioux d0746e0858 common: switch from docker module to docker_container
As of ansible 2.4, `docker` module has been removed (was deprecated
since ansible 2.1).
We must switch to `docker_container` instead.

See: https://docs.ansible.com/ansible/latest/modules/docker_module.html#docker-module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-10 20:08:07 +00:00
Guillaume Abrioux 9a65ec231d tests: fix `_get_osd_id_from_host()` in TestOSDs()
We must initialize `children` variable in `_get_osd_id_from_host()`,
otherwise, if for any reason the deployment has failed and result with
an osd host with no OSD registered, we won't enter in the condition,
therefore, `children` is never set and the function tries to return
something undefined.

Typical error:
```
E       UnboundLocalError: local variable 'children' referenced before assignment
```

Fixes: #2860

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-10 13:06:23 +00:00
Shilpa Jagannath 07852ed039 Remove zone from zonegroup and update period before deleting the zone to avoid inconsistent period information across other zones.
When you delete a zone without removing from zonegroup, the period update would
fail since that command needs to load the zone and zonegroup to be able to
update the master. Period update would fail with an error like this:

radosgw-admin period update --commit
-1 Cannot find zone id= (name=), switching to local zonegroup configuration
-1 Cannot find zone id= (name=)

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
2018-07-09 12:27:24 +00:00
Sébastien Han b9f7df7ba2 common: remove hdparm
As of Kraken, the journal code does not use the hdparm command anymore
so we can remove it from our package dependency list.

Fixes: https://github.com/ceph/ceph-ansible/issues/1402
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f6910efa24389c264062963b2054c7cd29ffebb3)
2018-07-07 08:53:47 +00:00
Guillaume Abrioux 103529c172 tox: test mimic deployment
Let's try to deploy mimic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d2cdfd830c)
2018-07-06 16:31:49 +00:00
Guillaume Abrioux b6d09b510f tests: refact ci testing master
We should test ceph-ansible against the latest ansible stable version on
master.

This commit also remove the pinning to 1.7.1 version of testinfra
because ansible 2.5 requires a newer version.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-06 16:31:49 +00:00
Guillaume Abrioux 09d795b5b7 tests: add mimic support for test_rbd_mirror_is_up()
prior mimic, the data structure returned by `ceph -s -f json` used to
gather information about rbd-mirror daemons looked like below:

```
  "servicemap": {
    "epoch": 8,
    "modified": "2018-07-05 13:21:06.207483",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "ceph-nano-luminous-faa32aebf00b": {
            "start_epoch": 8,
            "start_stamp": "2018-07-05 13:21:04.668450",
            "gid": 14107,
            "addr": "172.17.0.2:0/2229952892",
            "metadata": {
              "arch": "x86_64",
              "ceph_version": "ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)",
              "cpu": "Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-nano-luminous-faa32aebf00b",
              "instance_id": "14107",
              "kernel_description": "#1 SMP Wed Mar 14 15:12:16 UTC 2018",
              "kernel_version": "4.9.87-linuxkit-aufs",
              "mem_swap_kb": "1048572",
              "mem_total_kb": "2046652",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This part has changed from mimic and became:
```
  "servicemap": {
    "epoch": 2,
    "modified": "2018-07-04 09:54:36.164786",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "14151": {
            "start_epoch": 2,
            "start_stamp": "2018-07-04 09:54:35.541272",
            "gid": 14151,
            "addr": "192.168.1.80:0/240942528",
            "metadata": {
              "arch": "x86_64",
              "ceph_release": "mimic",
              "ceph_version": "ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)",
              "ceph_version_short": "13.2.0",
              "cpu": "Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-rbd-mirror0",
              "id": "ceph-rbd-mirror0",
              "instance_id": "14151",
              "kernel_description": "#1 SMP Wed May 9 18:05:47 UTC 2018",
              "kernel_version": "3.10.0-862.2.3.el7.x86_64",
              "mem_swap_kb": "1572860",
              "mem_total_kb": "1015548",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This patch modifies the function `test_rbd_mirror_is_up()` in
`test_rbd_mirror.py` so it works with `mimic` and keeps backward compatibility
with `luminous`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-07-06 14:39:13 +02:00
Sébastien Han 713b9fcf9b ceph-config: do not log cluster log on container
The container image recently merged both cluster and mon log into a
single stream. Following this, we now see this warning coming from the
container image:

2018-06-19 13:44:01.542990 7ff75b024700  1 mon.vm02@1(peon).log
v57928205 unable to write to '/var/log/ceph/ceph.log' for channel
'cluster': (2) No such file or directory

So we now tell the mon to not log cluster log on the filesystem.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591771
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-05 15:11:45 +00:00
Sébastien Han fcf11ecc35 ceph-common: fix rhcs condition
We forgot to add mgr_group_name when checking for the mon repo, thus the
conditional on the next task was failing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1598185
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-07-04 17:17:21 +02:00