Commit Graph

4132 Commits (44afe1656802152c56de28cc4a6acbf0419485e8)
 

Author SHA1 Message Date
Guillaume Abrioux e8dd6b8993 tests: change default pools size
default pool size in our test should be explicitly set to 1

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-29 01:49:05 +00:00
Guillaume Abrioux 292d967d2f update: fix a typo
`hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo.
That should be `hostvars[mon_host]['ansible_hostname']`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7c99b6df6d)
2018-11-29 01:49:05 +00:00
Guillaume Abrioux 30cec03ae7 tests: do not fully override previous ceph_conf_overrides
We run an initial deployment with `osd_pool_default_size: 1` in
`ceph_conf_overrides`.
When re-running the playbook to test idempotency and handlers, we reset
`ceph_conf_overrides`, we must append a new value instead of just
overwritting it, otherwise, this can lead to error in the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f290e49df8)
2018-11-29 01:49:05 +00:00
Guillaume Abrioux 1f4cf61058 rolling_update: refact set_fact `mon_host`
each monitor node should select another monitor which isn't itself.
Otherwise, one node in the monitor group won't set this fact and causes
failure.

Typical error:
```
TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] ***
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200
Thursday 22 November 2018  14:02:30 +0000 (0:00:07.493)       0:02:50.005 *****
fatal: [mon1]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2'
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit af78173584)
2018-11-29 01:49:05 +00:00
Sébastien Han d4f1f12bd0 rolling_update: create rbd and rbd-mirror keyrings
During an upgrade ceph won't create keys that were not existing on the
previous version. So after the upgrade of let's Jewel to Luminous, once
all the monitors have the new version they should get or create the
keys. It's ok to have the task fails, especially for the rbd-mirror
key, which only appears in Nautilus.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4e267bee4f)
2018-11-29 01:49:05 +00:00
Sébastien Han ee96454980 ceph_key: add a get_key function
When checking if a key exists we also have to ensure that the key exists
on the filesystem, the key can change on Ceph but still have an outdated
version on the filesystem. This solves this issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 691f373543)
2018-11-29 01:49:05 +00:00
Sébastien Han 26ea96424c switch: do not look for devices anymore
It's easier lookup a directoriy instead of the block devices,
especially because of ceph-volume and ceph-disk have a different way to
handle devices.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit c14f9b78ff)
2018-11-29 00:31:47 +01:00
Sébastien Han 57ac7b94c0 switch: disable all ceph units
Prior to this commit we were only disabling ceph-osd units, but forgot
the ceph.target which is controlling everything and will restart the
ceph-osd units at each reboot.
Now that everything gets disabled there won't be any conflicts between
the old non-container and the new container units.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit cd56dad9fa)
2018-11-29 00:31:47 +01:00
Sébastien Han 8d0379b4d9 switch: do not mask systemd unit
If we mask it we won't be able to start the OSD container since now the
osd container use the osd ID as a name such as: ceph-osd@0

Fixes the error:  Failed to execute operation: Cannot send after transport endpoint shutdown

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit fe1d09925a)
2018-11-29 00:31:47 +01:00
Sébastien Han 9b5a93e3a5 osd: re-introduce disk_list check
This commit
4cc1506303 (diff-51bbe3572e46e3b219ad726da44b64ebL13)
accidentally removed this check.

This is a must have for ceph-disk based containerized OSDs.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-29 00:31:13 +01:00
Guillaume Abrioux 659f2c60b5 validate: change default value for `radosgw_address`
change default value of `radosgw_address` to keep consistency with
`monitor_address`.
Moreover, `ceph-validate` checks if the value is '0.0.0.0' to determine
if it has to run `check_eth_rgw.yml`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600227

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e4869ac8bd)
2018-11-28 23:54:06 +01:00
Guillaume Abrioux 968e6f5854 tests: rgw_multisite allow clusters to talk to each other
Adding this rule on the hypervisor will allow cluster to talk to each
other.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 96ce8761ba)
2018-11-28 23:53:58 +01:00
Guillaume Abrioux 133615471a tests: set pool size to 1 in ceph-override.json
setting this setting to 1 makes the CI covering the related code in the
playbook without breaking the upgrade scenarios.

Those scenarios were broken because there is a check `TASK [waiting for
clean pgs...]` in rolling_update.yml, since the pool size for
`cephfs_metadata` and `cephfs_data` are updated to `2` in
`ceph-override.json` and there is not enough osd to honor this size,
some PGs are degraded and make the mentioned check failing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3ac6619fb9)
2018-11-28 23:11:46 +01:00
Guillaume Abrioux 4cc1506303 osd: commonize start_osd code
since `ceph-volume` introduction, there is no need to split those tasks.

Let's refact this part of the code so it's clearer.

By the way, this was breaking rolling_update.yml when `openstack_config:
true` playbook because nothing ensured OSDs were started in ceph-osd role (In
`openstack_config.yml` there is a check ensuring all OSD are UP which was
obviously failing) and resulted with OSDs on the last OSD node not started
anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f7fcc012e9)
2018-11-28 23:11:46 +01:00
Guillaume Abrioux b72d806f4c mgr: fix mgr keyring error on rolling_update
when upgrading from RHCS 2.5 to 3.2, it fails because the task `create
ceph mgr keyring(s) when mon is containerized` has a when condition
`inventory_hostname == groups[mon_group_name]|last`.
First, this is incorrect because `inventory_hostname` is referring to a
mgr node, it means this condition would have never been satisfied.
Then, this condition + `serial: 1` makes the mgr keyring creating skipped on
the first node. Further, the `ceph-mgr` role tries to copy the mgr
keyring (it's not aware we are running `serial: 1`) this leads to a
failure like the following:

```
TASK [ceph-mgr : copy ceph keyring(s) if needed] ***************************************************************************************************************************************************************************************************************************************************************************
task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10
Tuesday 27 November 2018  12:03:34 +0000 (0:00:00.296)       0:11:01.290 ******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'
failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"}
```

The ceph_key module is idempotent, so there is no need to have such a
condition.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 73287f91bc)
2018-11-28 23:11:46 +01:00
Guillaume Abrioux 3ead8a2586 tests: apply dev_setup on the secondary cluster for rgw_multisite
we must apply this playbook before deploying the secondary cluster.
Otherwise, there will be a mismatch between the two deployed cluster.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3d8f4e6304)
2018-11-28 12:56:57 +00:00
Sébastien Han 2fca8555cc handler: show unit logs on error
This will tremendously help debugging daemons that fail on restart by
showing the systemd unit logs.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a9b337ba66)
2018-11-27 12:44:15 +00:00
Andrew Schoen 59524c7246 ceph-volume: be idempotent when the batch strategy changes
If you deploy with 2 HDDs and 1 SDD then each subsequent deploy both
HDD drives will be filtered out, because they're already used by ceph.
ceph-volume will report this as a 'strategy change' because the device
list went from a mixed type of HDD and SDD to a single type of only SDD.

This situation results in a non-zero exit code from ceph-volume. We want
to handle this situation gracefully and report that nothing will be changed.
A similar json structure to what would have been given by ceph-volume is
returned in the 'stdout' key.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650306

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit e13f32c1c5)
2018-11-27 00:23:21 +00:00
Guillaume Abrioux 1a1886a442 config: convert _osd_memory_target to int
ceph.conf doesn't accept float value.

Typical error seen:
```
$ sudo ceph daemon osd.2 config get osd_memory_target
Can't get admin socket path: unable to get conf option admin_socket for osd.2:
parse error setting 'osd_memory_target' to '7823740108,8' (strict_si_cast:
unit prefix not recognized)
```

This commit ensures the value inserted in ceph.conf will be an integer.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 68dde424f6)
2018-11-21 15:35:55 +00:00
Guillaume Abrioux abdc245ceb infra: don't restart firewalld if unit is masked
if firewalld.service systemd unit is masked, the handler will fail when
trying to restart it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650281

(cherry picked from commit 63b9835cbb)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-19 17:32:44 +01:00
Neha Ojha c96af4bac9 osd_memory_target: standardize unit and fix calculation
* The default value of osd_memory_target used by ceph is 4294967296 bytes,
so use the same as ceph-ansible default.

* Convert ansible_memtotal_mb to bytes to calculate osd_memory_target

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 10538e9a23)
2018-11-19 10:51:05 +00:00
Guillaume Abrioux f5d8701ed8 client: fix a typo in create_users_keys.yml
cd1e4ee024 introduced a typo.
This commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 393ab94728)
2018-11-17 20:59:11 +00:00
Guillaume Abrioux 62d2ddafd4 validate: allow stable-3.2 to run with ansible 2.4
Although this is not officially supported, this commit allows
`stable-3.2` to run against ansible 2.4.
This should ease the transition in RHOSP.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-16 08:57:00 +00:00
Jason Dillaman 3b40e2bc87 igw: add support for IPv6
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 0aff0e9ede)

Conflicts:
	library/igw_purge.py: trivial resolution
	roles/ceph-iscsi-gw/library/igw_purge.py: trivial resolution
2018-11-13 17:35:58 +00:00
Mike Christie 702f2baccc igw: open iscsi target port
Open the port the iscsi target uses for iscsi traffic.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 5ba7d1671e)
2018-11-12 10:46:41 +00:00
Mike Christie 44ee5c7495 igw: use api_port variable for firewall port setting
Don't hard code api port because it might be overridden by the user.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit e2f1f81de4)
2018-11-12 10:46:41 +00:00
Mike Christie db576f6f0e igw: fix firewall iscsi_group_name check
The firewall setup for igw is not getting setup because iscsi_group_name
does not it exist. It should be iscsi_gw_group_name.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit a4ff52842c)
2018-11-12 10:46:41 +00:00
Mike Christie c843ea1d92 igw: Fix default api port
The default igw api port is 5000 in the manual setup docs and
ceph-iscsi-config package so this syncs up ansible.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit a10853c5f8)
2018-11-12 10:46:41 +00:00
VasishtaShastry f17140c03d ceph-validate : Added functions to accept true and flase
ceph-validate used to throw error for setting flags as 'true' or 'false' for True and False
Now user can set the flags 'dmcrypt' and 'osd_auto_discovery' as 'true' or 'false'

Will fix - Bug 1638325

Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
(cherry picked from commit 098f42f233)
2018-11-09 16:47:57 +00:00
Rishabh Dave a74f4204cd remove configuration files for ceph packages on ubuntu clusters
For apt-get, purge command needs to be used, instead of remove command,
to remove related configuration files. Otherwise, packages might be
shown as installed while running dpkg command even after removing them.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1640061
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 640cad3fd8)
2018-11-09 16:50:25 +01:00
Mike Christie 77de54025b igw: stop tcmu-runner on iscsi purge
When the iscsi purge playbook is run we stop the gw and api daemons but
not tcmu-runner which I forgot on the previous PR.

Fixes Red Hat BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1621255

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit b523a44a1a)
2018-11-09 16:50:04 +01:00
Guillaume Abrioux 93cdbddd78 tests: test ooo_collocation agasint v3.0.3 ceph-container image
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 811f043947)
2018-11-09 16:48:35 +01:00
Sébastien Han 12ce311da5 rbd-mirror: enable ceph-rbd-mirror.target
Without this the daemon will never start after reboot.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit b7a791e902)
2018-11-09 16:48:35 +01:00
Andrew Schoen ee883aa9f2 validate: do not validate ceph_repository if deploying containers
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1630975

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 9cd8ecf0cc)
2018-11-09 15:14:40 +00:00
Guillaume Abrioux d5409109fb rgw: move multisite default variables in ceph-defaults
Move all rgw multisite variables in ceph-defaults so ceph-validate can
go through them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 17:41:35 +01:00
Guillaume Abrioux f52344300a tests: add more memory for rgw_multsite scenarios
Adding more memory to VMs for rgw_multisite scenarios could avoid this error
I have recently hit in the CI:

(It is worth it to set 1024Mb since there is only 2 nodes in those
scenarios.)

```
fatal: [osd0]: FAILED! => {
    "changed": false,
    "cmd": [
        "docker",
        "run",
        "--rm",
        "--entrypoint",
        "/usr/bin/ceph",
        "docker.io/ceph/daemon:latest-luminous",
        "--version"
    ],
    "delta": "0:00:04.799084",
    "end": "2018-10-29 17:10:39.136602",
    "rc": 1,
    "start": "2018-10-29 17:10:34.337518"
}

STDERR:

Traceback (most recent call last):
  File "/usr/bin/ceph", line 125, in <module>
    import rados
ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 547e90f281 rgw: move multisite related tasks after docker/main.yml
We must play this task after the container has started otherwise
rgw_multisite tasks will fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 710e11668d rgw: add rgw_multisite for containerized deployments
run commands on containers when containerized deployments.
(At the moment, all commands are run on the host only)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 37970a5b3c tests: add rgw_multisite functional test
Add a playbook that will upload a file on the master then try to get
info from the secondary node, this way we can check if the replication
is ok.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 4d464c1003 rgw: add testing scenario for rgw multisite
This will setup 2 cluster with rgw multisite enabled.
First cluster will act as the 'master', the 2nd will be the secondary
one.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux fe88c89c9c validate: remove check on rgw_multisite_endpoint_addr definition
since `rgw_multisite_endpoint_addr` has a default value to
`{{ ansible_fqdn }}`, it shouldn't be mandatory to set this variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Ali Maredia 59e6d04f9b rgw: add ceph-validate tasks for multisite, other fixes
- updated README-MULTISITE
- re-added destroy.yml
- added tasks in ceph-validate to make sure the
rgw multisite vars are set

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 77d5d128c3 rgw: add a dedicated variable for multisite endpoint
We should give users the possibility to set the IP they want as
multisite endpoint, setting the default value to `{{ ansible_fqdn }}` to
not force them to set this variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Ali Maredia 474f151450 rgw: update rgw multisite tasks
- remove destroy tasks
- cleanup conditionals and syntax
- remove unnecessary realm pulls
- enable multisite to be tested in automated
testing infra
- add multisite related vars to main.yml and
group_vars
- update README-MULTISITE
- ensure all `radosgw-admin` commands are being run
on a mon

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2018-10-30 14:00:28 +01:00
Sébastien Han 9e87a5ae5e travis: add ansible-galaxy integration
This instructs Travis to notify Galaxy when a build completes. Since 3.0
the ansible-galaxy has the ability to build and push roles from repos
with multiple roles.

Closes: https://github.com/ceph/ceph-ansible/issues/3165
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-30 13:45:30 +01:00
Sébastien Han 49d4b65751 gitignore: add mergify and travis as exceptions
Git must notice changes from .travis.yml and .mergify.yml

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-30 13:45:30 +01:00
Sébastien Han b8a203bacf contrib: rm script push-roles-to-ansible-galaxy.sh
The script is not used anymore and soon Travis CI will do this job of
pushing the role into the galaxy.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-30 13:45:30 +01:00
Sébastien Han 0e659caf77 cleanup repos's root
Remove old files and move scripts to the contrib directory.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-30 10:37:48 +00:00
Maciej Naruszewicz 252d0f9cf2 ceph-volume: fix TypeError exception when setting osds-per-device > 1
osds-per-device needs to be passed to run_command as a string.
Otherwise, expandvars method will try to iterate over an integer.

Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>
2018-10-29 21:56:37 +01:00
Sébastien Han 22aed97266 testinfra: change test osds for containers
We do not use  @<device> anymore so we don't need to perform the
readlink check anymore.

Also we are making an exception for ooo which is still using ceph-disk.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-29 18:31:17 +01:00