Commit Graph

387 Commits (8c8ec636337e85efea5095ecd5b1fab17fdd51e6)

Author SHA1 Message Date
Guillaume Abrioux 7e0a70f7a8 switch_to_containers: do not try to redeploy monitors
`ceph-mon` tries to redeploy monitors because it assumes it was not yet
deployed since `mon_socket_stat` and `ceph_mon_container_stat` are
undefined (indeed, we stop the daemon before calling `ceph-mon` in the
switch_to_containers playbook).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-02-13 09:42:27 +01:00
John Fulton 37b5d1084a Make python print statements python3 compatible
The restart_osd_daemon.sh generated from the j2 template
contains a python call which uses 'print x' instead of
'print(x)'. Add the missing parentheses to make this call
compatible with both 2 and 3.

Also add parentheses to other python print calls found
in roles/ceph-client/defaults/main.yml and
infrastructure-playbooks/cluster-os-migration.yml.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1671721
Signed-off-by: John Fulton <fulton@redhat.com>
2019-02-01 15:23:27 +00:00
Noah Watkins 9a43674d2e shrink_osd: use cv zap by fsid to remove parts/lvs
Fixes:
  https://bugzilla.redhat.com/show_bug.cgi?id=1569413
  https://bugzilla.redhat.com/show_bug.cgi?id=1572933

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2019-01-24 16:34:13 +01:00
Guillaume Abrioux edfdc49488 rolling_update: support multiple rgw instance
1ac94c048f introduced the support of
multiple rgw instances on a single host but somehow has missed to
implement this feature in rolling_update.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-22 13:45:38 +01:00
Giulio Fidente ff8dbe114c Preserve rolling_update backward compatibility with ansible < 2.5
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
2019-01-21 14:05:45 +01:00
guihecheng 1ac94c048f rgw: add support for multiple rgw instances on a single host
With this, we could have multiple rgw instances on a single host
with a single run, don't have to use rgw-standalone.yml which does not
seems able to bind ports separately.
If you want to have multiple rgw instances, just change 'radosgw_instances'
to the number you want, which defaults to 1.
Not compatible with Multi-Site yet.

Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
2019-01-18 11:12:28 +01:00
Guillaume Abrioux 268f2cef82 update: do not enforce `serial: 1` on client nodes
There is no need to enforce `serial: 1` on client nodes.
Let's make it parameterizable by introducing a new *extra* variable
`client_update_batch`, if not filled this will default to `{{
ansible_forks }}`.

NOTE: this is only usable as an extra variable passed with
`-e client_update_batch=<num>`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650184

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-02 16:55:08 +00:00
Daniel-Pivonka ba149972be Example ceph_add_users_buckets playbook
This is example playbook will show how to bulk add rgw users and buckets

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
2018-12-20 14:23:25 +01:00
Guillaume Abrioux d7e77012ef retry on packages and repositories failures
add register/until on all packaging related tasks to avoid non valid CI
failure.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-12-19 14:48:27 +00:00
Noah Watkins 110049e825 playbook: report storage device inventory
Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2018-12-18 10:51:31 +01:00
Andrew Schoen ffd56177e7 purge-cluster: skip tasks that use ceph-volume if it's not installed
This will allow the playbook to be idempotent.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1656935

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-12-13 11:27:27 +01:00
Guillaume Abrioux a12de3e048 purge-container: move facts gathering after ceph-defaults role import
This task has to be called after the role `ceph-defaults` has been
played, otherwise, `mon_group_name` will never be known.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-12-12 16:50:24 +00:00
Guillaume Abrioux d0b3cb7f85 purge-container: fix wrong syntax
we want a default value for `mon_group_name`, not for
`groups[mon_group_name]`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-12-12 11:33:57 +01:00
Guillaume Abrioux 0eb56e36f8 introduce new role ceph-facts
sometimes we play the whole role `ceph-defaults` just to access the
default value of some variables. It means we play the `facts.yml` part
in this role while it's not desired. Splitting this role will speedup
the playbook.

Closes: #3282

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-12-12 11:18:01 +01:00
Guillaume Abrioux ae7f3d66a6 purge-docker: do not call ceph-osd role
calling ceph-osd role in purge playbook is not needed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-12-11 09:59:25 +01:00
Guillaume Abrioux 1a4a6ec855 purge: gather monitors facts in OSD purge
the OSD part of the purge delegates commands on monitor node, we need to
gather monitors facts to know the `ansible_hostname` fact that is used
in the `docker_exec_cmd` fact.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-12-11 09:59:25 +01:00
Sébastien Han 62111ff53c purge-container: gather fact before calling ceph-defaults
ceph-defaults relies on facts so we must gather facts before running it.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-11 09:59:25 +01:00
Sébastien Han fc6ebd8ebb purge-cluster: add support for mon/mgr collocation
Recently we introduced the default collocation of mon/mgr without the
need of a dedicated mgrs section. This means we have to stop the mgr
process on that machine too.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-11 09:59:25 +01:00
Sébastien Han 3a154fa0ad purge-cluster: remove support for other init system
We only support systemd and use the service module anyway.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-11 09:59:25 +01:00
Sébastien Han 325a159415 purge-docker-cluster: add support for mgr/mon collocation
Recently we introduced the collocation of mon and mgr by default, so we
don't need to have an explicit mgrs section for this. This means we have
to remove the mgr container on the mon machines too.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-11 09:59:25 +01:00
Sébastien Han 2bcc00896f purge-docker-cluste: add a task to check hosts
It's useful when running on CI to see what might remain on the machines.
So we list all the containers and images. We expect the list to be
empty.

We fail if we see containers running.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-11 09:59:25 +01:00
Sébastien Han 1751885bc9 purge-docker-cluster: add ceph-volume support
This commits adds the support for purging cluster that were deployed
with ceph-volume. It also separates nicely with a block intruction the
work to do when lvm is used or not.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-11 09:59:25 +01:00
Rishabh Dave 2fb12ae554 use pre_tasks and post_tasks when necessary
Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-12-05 08:17:10 +00:00
Rishabh Dave e4f0af2b78 don't use private option for import_role
Since sharing variables amongst roles has been made default since
Ansible 2.6, private option has been deprecated; so stop using it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-12-04 23:45:59 +00:00
Ramana Raja cb784c601d rolling_update: fail if less than 3 MONs
... for non-containerized deployments as well.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1655470

Signed-off-by: Ramana Raja <rraja@redhat.com>
2018-12-04 14:28:49 +00:00
Sébastien Han 896676ee80 fix json data type
Json is a type structure which is always typed as a string, where before
this we were declaring a dict, which is not a json valid structure.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-04 12:34:54 +01:00
Guillaume Abrioux 78116fa6db purge: add iscsi support
add iscsi support for both non containerized and containerized
deployment in purge playbooks.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1651054

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-12-03 17:35:21 +01:00
Sébastien Han 1c760904b0 site: collocated mon and mgr by default
This will speed up the deployment and also deploy mon and mgr collocated
just as recommended.
This won't prevent you of adding more and dedicaded machines for mgr if
needed.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-03 14:39:43 +01:00
Sébastien Han bb7bfca113 rolling-update: remove old condition
This failure condition was only valid at the time where clusters didn't
have ceph-mgr activated. Now since we collocate the ceph-mgr with the
mon by default, if the daemon wasn't present it will be created during
the upgrade.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-12-03 14:39:43 +01:00
Guillaume Abrioux a952122c38 rolling_update: create missing keyring only on running mon
try to create the potentially missing keys only on monitors that are
actually running.
The current node being played is stopped before this task.
By the way, delegating the command on all nodes but the current node
being played ensures that the generated keys will be present on all
monitors.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-29 16:40:46 +00:00
Sébastien Han 61fb6972ec rolling_update: default ceph json output to empty dict
So we can avoid the following failure:

The conditional check 'hostvars[mon_host]['ansible_hostname'] in (ceph_health_raw.stdout | from_json)["quorum_names"] or hostvars[mon_host]['ansible_fqdn'] in (ceph_health_raw.stdout | from_json)["quorum_names"]
' failed. The error was: No JSON object could be decoded

We just need to set a default, the next iteration will have a more
complete json since the command won't fail.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-29 10:46:15 +00:00
Guillaume Abrioux 73287f91bc mgr: fix mgr keyring error on rolling_update
when upgrading from RHCS 2.5 to 3.2, it fails because the task `create
ceph mgr keyring(s) when mon is containerized` has a when condition
`inventory_hostname == groups[mon_group_name]|last`.
First, this is incorrect because `inventory_hostname` is referring to a
mgr node, it means this condition would have never been satisfied.
Then, this condition + `serial: 1` makes the mgr keyring creating skipped on
the first node. Further, the `ceph-mgr` role tries to copy the mgr
keyring (it's not aware we are running `serial: 1`) this leads to a
failure like the following:

```
TASK [ceph-mgr : copy ceph keyring(s) if needed] ***************************************************************************************************************************************************************************************************************************************************************************
task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10
Tuesday 27 November 2018  12:03:34 +0000 (0:00:00.296)       0:11:01.290 ******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'
failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"}
```

The ceph_key module is idempotent, so there is no need to have such a
condition.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-27 18:19:56 +01:00
Sébastien Han e5d5dffeb5 shrink-osd: add missing CEPH_BINARY
We need to add the right binary to do the docker exec.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-27 16:47:40 +00:00
Sébastien Han 4f57e44f9c defaults: declare container_binary
Always declare container_binary and assign it a correct value.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-27 16:47:40 +00:00
Sébastien Han 49e0e19056 rolling_update: update ceph_key task for container
Use the new way to create keys on containerized env as introduced by: 1098b71bda90db3dad19ac179f0ba900ccb0f953

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-27 16:47:40 +00:00
Sébastien Han 2814d36c93 infra playbooks: use the right container binary
Use podman or docker wether they are available or not. podman will be
prioritized over docker if present.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-27 16:47:40 +00:00
Guillaume Abrioux 7c99b6df6d update: fix a typo
`hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo.
That should be `hostvars[mon_host]['ansible_hostname']`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-26 18:22:20 +01:00
Guillaume Abrioux af78173584 rolling_update: refact set_fact `mon_host`
each monitor node should select another monitor which isn't itself.
Otherwise, one node in the monitor group won't set this fact and causes
failure.

Typical error:
```
TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] ***
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200
Thursday 22 November 2018  14:02:30 +0000 (0:00:07.493)       0:02:50.005 *****
fatal: [mon1]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2'
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-26 18:22:20 +01:00
Sébastien Han 4e267bee4f rolling_update: create rbd and rbd-mirror keyrings
During an upgrade ceph won't create keys that were not existing on the
previous version. So after the upgrade of let's Jewel to Luminous, once
all the monitors have the new version they should get or create the
keys. It's ok to have the task fails, especially for the rbd-mirror
key, which only appears in Nautilus.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-26 18:22:20 +01:00
Sébastien Han c14f9b78ff switch: do not look for devices anymore
It's easier lookup a directoriy instead of the block devices,
especially because of ceph-volume and ceph-disk have a different way to
handle devices.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-23 07:56:23 +00:00
Sébastien Han cd56dad9fa switch: disable all ceph units
Prior to this commit we were only disabling ceph-osd units, but forgot
the ceph.target which is controlling everything and will restart the
ceph-osd units at each reboot.
Now that everything gets disabled there won't be any conflicts between
the old non-container and the new container units.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-23 07:56:23 +00:00
Sébastien Han fe1d09925a switch: do not mask systemd unit
If we mask it we won't be able to start the OSD container since now the
osd container use the osd ID as a name such as: ceph-osd@0

Fixes the error:  Failed to execute operation: Cannot send after transport endpoint shutdown

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-23 07:56:23 +00:00
Guillaume Abrioux c783bc70da docker-common: rename role
rename `ceph-docker-common` role to `ceph-container-common`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-12 10:51:48 +01:00
Rishabh Dave 90f222f6a5 add quotes around package names added in da6f384
Add quotes around package names added in the commit
da6f384223 so that the difference between
the Ansible variables and package names is clear.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-11-09 12:59:08 +00:00
Rishabh Dave d72340abbe pass the list of packages to package management modules
Instead of looping over a list of packages or repeating the task
separately for different packages, pass the list of packages to the
task performing package management.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2018-11-09 12:59:08 +00:00
Sébastien Han 53910de43b ceph_key: add fetch_initial_keys capability
This is needed for Nautilus since the ceph-create-keys script goes away.
(https://github.com/ceph/ceph/pull/21305)
Now the module if called with 'state: fetch_initial_keys' will lookup
keys generated by the monitor and write them down on the filesystem to
the right location (/etc/ceph and /var/lib/ceph/boostrap*).
This is not applicable to container since keys are generated by the
container only.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-09 12:45:52 +01:00
Mike Christie b523a44a1a igw: stop tcmu-runner on iscsi purge
When the iscsi purge playbook is run we stop the gw and api daemons but
not tcmu-runner which I forgot on the previous PR.

Fixes Red Hat BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1621255

Signed-off-by: Mike Christie <mchristi@redhat.com>
2018-11-09 10:02:16 +01:00
Noah Watkins b848d2be4c don't use "role" or "roles" to include roles
see 3f62fc585f

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2018-11-08 17:45:37 +01:00
Noah Watkins 9c47950961 Fix comments in shrink-osd-ceph-disk playbook
Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2018-11-08 17:45:37 +01:00
Noah Watkins f5dacbf7de Add a ceph-volume aware shrink-osd playbook
Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2018-11-08 17:45:37 +01:00