When using fqdn in inventory, that playbook fails because of some tasks
using the result of ceph osd tree (which returns shortname) to get
some datas in hostvars[].
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779021
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6d9ca6b05b)
When using the ceph dashboard with iscsi gateways nodes we also need to
remove the nodes from the ceph dashboard list.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 931a842f21)
When an OSD is stopped, it leaves partitions mounted.
We must umount them before zapping them, otherwise error like "Device is
busy" will show up.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8056514134)
We only need to set `container_binary`.
Let's use `tasks_from` option.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0ae0a9ce28)
The command is delegated on the first monitor so we must use the fact
`container_binary` from this node.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 77b39d235b)
This commit prevent from shrinking an mds node when max_mds wouldn't be
honored after that operation.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cfe5a04bf)
This commit adds a task to ensure device mappers are well closed when
lvm batch scenario is used.
Otherwise, OSDs can't be redeployed given that devices that are rejected
by ceph-volume because they are locked.
Adding a condition `devices | default([]) | length > 0` to remove these
dm only when using lvm batch scenario.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8e6ef818a2)
Otherwise, sometimes it can take a while for an OSD to be seen as down
and causes the `ceph osd purge` command to fail.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51d601193e)
Do not use `--destroy` when zapping a device.
Otherwise, it destroys VGs while they are still needed to redeploy the
OSDs.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e3305e6bb6)
This commit adds the non containerized context support to the
filestore-to-bluestore.yml infrastructure playbook.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4833b85e04)
In containerized context, containers aren't stopped early in the
sequence.
It means they aren't restarted after the upgrade because the task is
just checking the daemon status is started (eg: `state: started`).
This commit also removes the task which ensure services are started
because it's already done in the role ceph-iscsigw.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c7708eb458)
when upgrading from RHCS 3, dashboard has obviously never been deployed
and it forces us to deploy it later manually.
This commit adds the dashboard deployment as part of the upgrade to
RHCS 4.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779092
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 451c5ca934)
The podman support was added to the purge-container-cluster playbook but
containers are always used for the dashboard even on non containerized
deployment.
This commits adds the podman support on purging the dashboard resources
in the purge-cluster playbook.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 89f6cc54a2)
Since we now support podman, let's rename the playbook so it's more
generic.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7bc7e3669d)
If the new mon/osd node doesn't have python installed then we need to
execute the tasks from raw_install_python.yml.
Closes: #4368
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 34b03d1873)
When a container is already running on a non containerized node then the
umount ceph partition task is skipped.
This is due to the container ps command which always returns 0 even if
the filter matches nothing.
We should run the umount task when:
1/ the container command is failing (not installed) : rc != 0
2/ the container command reports running ceph-osd containers : rc == 0
Also we should not fail on the ceph directory listing.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 39cfe0aa65)
If the container binary is podman, we shouldn't try to stop docker here.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b18476a1a6)
in order to be able to call container_binary without having to run the
whole ceph-facts role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe5ffe589e)
All containers are removed when systemd stops them.
There is no need to call this module in purge container playbook.
This commit also removes all docker_image task and remove all container
images in the final cleanup play.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d23383a820)
This is needed to avoid following error:
```
ERROR! The requested handler 'restart ceph mons' was not found in either the main handlers list nor in the listening handlers list
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a43a872105)
let's use `client_group_name` instead of hardcoding the name.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7fe0d55eff)
We must import this role in the first play otherwise the first call to
`client_group_name`fails.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6526a25ab5)
in containerized context, using the binary provided in atomic os won't
work because it's an old version provided by ceph-common based on
10.2.5.
Using a container could be an idea but for large cluster with hundreds
of client nodes, that would require to pull the image of each of them
just to unmap the rbd devices.
Let's use the sysfs method in order to avoid any issue related to ceph
version that is shipped on the host.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766064
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3cfcc7a105)
This commit adds a default value in the `with_dict` because when using
python 2.7, if a task using a `with_dict` has a condition, it is
evaluated anyway whereas in python 3 it isn't.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766499
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e9823f319b)
There's no need to use the default filter on active/standby groups
because if the group doesn't exist then the play is just skipped.
Currently this generates warnings like:
[WARNING]: Could not match supplied host pattern, ignoring: |
[WARNING]: Could not match supplied host pattern, ignoring: default([])
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2ca79fcc99)
The active mds host should be based on the inventory hostname and not on
the ansible hostname.
The value returns under the mdsmap structure is based on the OS hostname
so we need to find the right node in the inventory with this value when
doing operation on inventory nodes.
Othewise we could see error like:
The task includes an option with an undefined variable. The error was:
"hostvars[foobar]" is undefined
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1f2352c79)
mon_host should use the inventory hostname and not the node hostname.
Fix creates an issue when the inventory and node hostname are different.
Closes: #4670
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 650bc0c3f0)
Without the become flag set to true, we can't executed the roles
successfully.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 77b212833e)
This must be consistent with what is used in `name` parameter.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d06057ebd2)
Let's skip this part of the code if there's no mds node in the
inventory.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ec906c3af)
The ceph-container-engine role is missing from both playbooks so the
container engine (docker, podman) isn't install resulting in a failure
on the added nodes.
fatal: [xxxxx]: FAILED! => changed=false
cmd: docker --version
msg: '[Errno 2] No such file or directory'
rc: 2
Closes: #4634
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bfb1d6be12)
The [group|host]_vars directories are ignored for the dashboard playbook
when the inventory file directory doesn't contain those directories.
Closes: #4601
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1761612
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8426856262)
This commit makes the all_daemons scenario deploying 3 mds in order to
cover the multimds case.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 25b98b2ce3)
This commit fixes the standby_mdss group creation by using `{{ item }}`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c4fc8cc878)
The common roles don't need to be executed again on each group plays
(like mons, osds, etc..).
We only need to execute them during the first play. That wat, we will
apply the changes on all nodes in parallel instead of doing it once per
group.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 68a3dac7cd)
This is already done in the main playbooks but absent in the dashboard
playbook.
The facts are already gathered during the first play of the main
playbooks so we don't need to doing twice.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5ae7304ace)
When switching from a baremetal deployment to a containerized deployment
we only umount the OSD data partition.
If the OSD is encrypted (dmcrypt: true) then there's an additional
partition (part number 5) used for the lockbox and mount in the
/var/lib/ceph/osd-lockbox/ directory.
Because this partition isn't umount then the containerized OSD aren't
able to start. The partition is still mount by the system and can't be
remount from the container.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 19edf707a5)
This commit refacts the way we set `ceph_uid` fact in `ceph-facts` and
removes all `set_fact` tasks for `ceph_uid` in switch-to-containers playbook
to avoid duplicated code.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fa9b42e98e)
This commit excludes client nodes from facts gathering, they are not
needed and can speed up this task.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 865d2eac9b)
The block section were used with the dashboard_enabled condition when
the code was included in the main playbooks.
Because this condition isn't present in the dashboard playbook anymore
we can remove the block section.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf47594b47)
This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e08194dd67)
This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c69816c6b7)
This commit moves containerized deployment related files to `./tasks/
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4636f3f7e2)
this commit adds a new playbook to force systemd units for containers to
use podman instead of docker.
This is needed in the rhel8 upgrade context so after the base OS is upgraded
containers can be started using podman.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f2017dcda2)
after all mon are upgraded, let's reset mon_host which is used in the
rest of the playbook for setting `container_exec_cmd` so we are sure to
use the right value.
Typical error:
```
failed: [mds0 -> mon0] (item={u'path': u'/var/lib/ceph/bootstrap-mds/ceph.keyring', u'name': u'client.bootstrap-mds', u'copy_key': True}) => changed=true
ansible_loop_var: item
cmd:
- docker
- exec
- ceph-mon-mon2
- ceph
- --cluster
- ceph
- auth
- get
- client.bootstrap-mds
delta: '0:00:00.016294'
end: '2019-09-27 13:54:58.828835'
item:
copy_key: true
name: client.bootstrap-mds
path: /var/lib/ceph/bootstrap-mds/ceph.keyring
msg: non-zero return code
rc: 1
start: '2019-09-27 13:54:58.812541'
stderr: 'Error response from daemon: No such container: ceph-mon-mon2'
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d84160a170)
This change implements a filter_plugin that is used in the
ceph-facts, ceph-validate roles and infrastucture-playbooks.
The new filter plugin will return a list of all IP address
that reside in any one of the given IP ranges. The new filter
replaces the use of the ipaddr filter.
ceph.conf already support a comma separated list of CIDRs
for the public_network and cluster_network options.
Changes: [1] and [2] introduced a regression in ceph-ansible
where public_network can no longer be a comma separated list
of cidrs.
With this change a comma separated list of subnet CIDRs can
also be used for monitor_address_block and radosgw_address_block.
[1] commit: d67230b2a2
[2] commit: 20e4852888
Related-To: https://bugs.launchpad.net/tripleo/+bug/1840030
Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=1740283Closes: #4333
Please backport to stable-4.0
Signed-off-by: Harald Jensås <hjensas@redhat.com>
(cherry picked from commit e695efcaf7)
The rolling_update.yml playbook fails when scanning ceph-disk osds while
deploying nautilus. The --force flag is required to scan existing osds
and rewrite their json metadata.
Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>
(cherry picked from commit 7cc9f93680)
This playbook helps to migrate all osds on a node from filestore to
bluestore backend.
Note that *ALL* osd on the specified osd nodes will be shrinked and
redeployed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3f9ccdaa8a)
the confirmation play's name should confirm removing rgw instead of
monitor
Signed-off-by: Mehdy Khoshnoody <mehdy.khoshnoody@gmail.com>
(cherry picked from commit 9fa98d79fd)
If we're looking at the mon hostname in the ceph status output then
there's some scenarios where this could be true.
If we collocate some services (mons, mgrs, etc..) then the hostname of
the monitor to shrink will still be present in the ceph status (like
in mgrs or other).
Instead we should check the hostame only in the mon part of the output.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 734c0dc310)
By changing the set ownership command from using the file module in combination with a with_items loop to a raw chown command, we can achieve a 98% performance increase here.
On a ceph cluster with a significant amount of directories and files in /var/lib/ceph, the file module has to run checks on ownership of all those directories and files to determine whether a change is needed.
In this case, we just want to explicitly set the ownership of all these directories and files to the ceph_uid
Added context note to all set proper ownership tasks
Signed-off-by: Kevin Jones <kevinjones@redhat.com>
(cherry picked from commit 47bf47c9d8)
use `from_json` filter instead of a `| python` so we can get rid of the
`shell` module usage here.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5573f17e76)
in order to use the right binary name when using python cli in command
or shell module.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 13815ad3ca)
We don't need to execute the ceph-dashboard role on the nodes present
in the grafana-server group. This one is dedicated to the grafana and
prometheus stack.
The ceph-dashboard needs to executed where the ceph-mgr is running. It
is either on the dedicated mgr nodes or if mgr and mon are collocated
implicitly on the mon nodes.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 16939eff9e)
Add a playbook named shrink-rgw.yml to infrastructure-playbooks/ that
can remove a RGW from a node in an already deployed Ceph cluster.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 632a44bdf2)
Use facility built-in in Ansible to check whether a command was executed
successfully rather looking at its return value.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 5aecdd3ba6)
There's no need to add complexity and trying to fallback on other group.
Let's deploy dashboard on all nodes present in grafana-server group.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d67230b2a2)
Move dashboard, grafana/prometheus and node-exporter plays into a
dedicated playbook in infrastructure-playbook directory.
To avoid using 'dashboard_enabled | bool' condition multiple time
in the main playbook we can just import the dashboard playbook or
not.
This patch also allows to use an unique dashboard playbook for
both baremetal and container playbooks.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 43135840b1)
Some NBSP are still present in the yaml files.
Adding a test in travis CI.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 07c6695d16)
This commits adds a check to ensure the daemon has been removed from the
cluster.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 916dc1f52f)
Add a playbook named "shrink-rbdmirror.yml" in infrastructure-playbooks/
that removes a RBD Mirror from a node in an already deployed Ceph
cluster.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit c4824acb19)
Add a playbook, named "shrink-mgr.yml", in infrastructure-playbooks/
that removes a MGR from a node in an already deployed Ceph cluster.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit f4ea75051b)
This commit refacts the way we check the "mds_to_kill" node is well
stopped.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 7df62fde34)
Add a playbook, named "shrink-mds.yml", in infrastructure-playbooks/
that removes a MDS from a node in an already deployed Ceph cluster.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 235b1fccc6)
The ceph-iscsi-config and ceph-iscsi-cli packages were combined into
ceph-iscsi and its APIs changed. This fixes up the iscsi purge task to
support the new API and old one.
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit b163206db7)
This tries to first unmount any cephfs/nfs-ganesha mount point on client
nodes, then unmap any mapped rbd devices and finally it tries to remove
ceph kernel modules.
If it fails it means some resources are still busy and should be cleaned
manually before continuing to purge the cluster.
This is done early in the playbook so the cluster stays untouched until
everything is ready for that operation, otherwise if you try to redeploy
a cluster it could end up by getting confused by leftover from previous
deployment.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1337915
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 20e4852888)
3a100cfa52 introduced a check which is a
bit too restrictive, let's accept HEALTH_OK and HEALTH_WARN.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6dce51183b)
The ceph restapi configuration was only available until Luminous
release so we don't need those leftovers for nautilus+.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit da8b7ab7fb)
starting an upgrade if the cluster isn't HEALTH_OK isn't a good idea.
Let's check for the cluster status before trying to upgrade.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3a100cfa52)
Otherwise it fails like following:
```
fatal: [mon0]: FAILED! => changed=false
msg: |-
Unable to enable service ceph-mgr@mon0: Failed to execute operation: Cannot send after transport endpoint shutdown
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51b2813e04)
The ceph-agent role was used only for RHCS 2 (jewel) so it's not
usefull anymore.
The current code will fail on CentOS distribution because the rhscon
package is only avaible on Red Hat with the RHCS 2 repository and
this ceph release is supported on stable-3.0 branch.
Resolves: #4020
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7503098ca0)
By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these:
```
[DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable,
this behaviour will go away and you might need to add |bool to the expression
in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This
feature will be removed in version 2.12. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
```
Now appended ``| bool`` on a lot of the affected variables.
Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` *(with spaces at the pipe)*.
Closes: #4022
Signed-off-by: L3D <l3d@c3woc.de>
(cherry picked from commit ab54fe20ec)
We currently only purge rh_storage yum repository file but depending
on the ceph_repository value we are using, the ceph repository file
could have a different name.
Resolves: #4056
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 44c63903ca)
This commit splits the current `ceph-container-common` role.
This introduces a new role `ceph-container-engine` which handles the
tasks specific to the installation of containers tools (docker/podman).
This is needed for the ceph-dashboard implementation for 2 main reasons:
1/ Since the ceph-dashboard stack is only containerized, we must install
everything needed to run containers even in non containerized
deployments. Splitting this role allows us to not have to call the full
`ceph-container-common` role which would run a bunch of unneeded tasks
that would have been skipped anyway.
2/ The current implementation would have required to run
`ceph-container-common` on all ceph-clients nodes which would have been
conflicting with 9d3517c670 (we don't want
to run ceph-container-common on all client nodes, see mentioned commit
for more details)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 55420d6253)
Because we don't manage the docker service on atomic (yet) via the
ceph-container-common role then we can't stop docker dans remove
the data.
For now let's do that only for non atomic hosts.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 638604929b)
This commit renames the `docker_exec_cmd` variable to
`container_exec_cmd` so it's more generic.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e74d80e72f)
We were failing when that variable is unset; purge-cluster.yml contains
this workaround.
Signed-off-by: Zack Cerza <zack@redhat.com>
(cherry picked from commit 9b4339a2ba)
This commit will merge dashboard-ansible installation scripts with
ceph-ansible. This includes several new roles to setup ceph-dashboard
and the underlying technologies like prometheus and grafana server.
Signed-off-by: Boris Ranto & Zack Cerza <team-gmeno@redhat.com>
Co-authored-by: Zack Cerza <zcerza@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f141a6e80)
We never clean the content of /var/lib/docker so we can still have
some data present in this directory after run the purge playbook.
Pip isn't used anymore.
Also update the docker package name (especially the python binding
one).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 168d7cd016)
The shell module doesn't have a stdout_lines attributes. Instead of
using the shell module, we can use the find modules.
Also adding `become: false` to the local tmp directory creation
otherwise we won't have enough right to fetch the files into this
directory.
Resolves: #3966
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ea1f8f551c)
We must stop tcmu-runner after the other rbd-target-* services
because they may need to interact with tcmu-runner during shutdown.
There is also a bug in some kernels where IO can get stuck in the
kernel and by stopping rbd-target-* first we can make sure all IO is
flushed.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1659611
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit d7ef12910e)
We don't need infrastructure-playbooks/rgw-standalone.yml since
site.yml.sample and site-cotainer.yml.sample can add a new RGW node to
an already deployed Ceph cluster.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 6e8fb2b3ea)
Keywords requiring only one item shouldn't express it by creating a
list with single item.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 739a662c80)
Conflicts:
roles/ceph-mon/tasks/ceph_keys.yml
roles/ceph-validate/tasks/check_devices.yml
Currently only rbd-target-gw service is restarted during an update.
We also need to restart tcmu-runner and rbd-target-api services
during the ceph iscsi upgrade.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1048627ea)
We do this so that the ceph-config role can most accurately
report the number of osds for the generation of the ceph.conf
file.
We don't want to use ceph-volume to determine the number of
osds because in an upgrade to nautilus ceph-volume won't be able to
accurately count osds created by ceph-disk.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 67453853ff)
When upgrading to nautlius run ``ceph-volume simple scan`` and
``ceph-volume simple activate --all`` to migrate any running
ceph-disk osds to ceph-volume.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 28c47e4d1b)
These tasks must be run from a monitor which is upgraded otherwise it
might fail.
See: https://tracker.ceph.com/issues/39355
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7eb42c9e8e)
these commands could return something else than 0.
Let's ensure all retries have been done before actually failing.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ed84325b1d)
Add a playbook that deploys a new monitor on a new node, adds that node
to the Ceph cluster and the monitor to the quorum and updates the ceph
configuration file on OSD nodes.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit d5967af7fb)
When using purge-cluster playbook with nautilus, there's still the
python-ceph-argparse package installed on the host preventing to
reinstall a ceph cluster with a different version (like luminous or
mimic)
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit eb658b3af6)
e6bfb84 introduced a regression in the switch from non containerized
to container deployment.
We need to stop all previous OSDs services. We just don't need the
ceph-disk pattern in the regex.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 150acba8c5)
as of stable-4.0, ceph-disk is no longer supported.
These tasks aren't needed anymore.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a1254d767c)
as of stable-4.0, ceph-disk is no longer supported.
Let's remove this legacy version of the playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 73aa788459)
As of stable-4.0, the only valid scenario is `lvm`.
Thus, this makes this variable useless.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4d35e9eeed)
if mgr group isn't defined in inventory, that task will fail with
undefined error.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c1e4529b0e)
ceph aliases have been introduced in stable-3.2 during the ceph
deployment. On master this has been removed but we don't handle
this removal in the upgrade from stable-3.2 to master via the
rolling_update playbook.
Also remove the task from purge-docker-cluster missing from
d9e7835
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 57b4e76d11)
`lvm_volumes` and/or `devices` variable(s) can be undefined depending on
the scenario chosen.
These tasks should be run only if these variable are defined, otherwise
it ends up with undefined variable errors.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0180738313)
Similar to #3658
Since there's too many changes between master and stable branches let's
commit directly in each branches instead of trying to backport this
commit.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
all rgw instances should be stopped according to the multiple rgw
instances support added in rolling_update.yml
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a couple of fixes to allow containerized deployments upgrade support
to upgrade from luminous/mimic to nautilus.
- pass CEPH_CONTAINER_IMAGE and CEPH_CONTAINER_BINARY environment
variable to the ceph_key module,
- fix the docker exec command in 'waiting for the containerized monitor
to join the quorum' task according to the `delegate_to` parameter,
- override `docker_exec_cmd` in `ceph-facts` with `mon_host` when
rolling_update is `True`,
- do not run unnecessarily `create_mds_filesystems.yml` when performing an
upgrade.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
iscsigws were missing.
The 'complete upgrade' couldn't complete because rolling_update was set
to False for iscsigw nodes.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
once the cluster is upgraded to nautilus, we can complete the process by
disallowing pre-nautilus OSDs and enabling all new nautilus-only functionality
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As of 1c760904b0, ceph-ansible implicitly
bootstrap managers on monitors.
mgrs must be upgraded only after all monitors, therefore, this commit
refact the way mgrs are upgraded to be sure we don't upgrade a mgr
during the monitors upgrade.
This commit also ensure we handle the case were we split managers on
dedicated nodes.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This directory is created by ceph-config node by node.
In the upgrade context we need it to be created on ALL monitors as soon
as the first iteration because of the task right after which creates and sends
the keyrings on all monitors.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This prevents the packaging from restarting services before we do need
to restart them in the rolling update sequence.
We want to handle services restart at rolling_update playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There is no need to set osd flags (noout, norebalance) each time we
upgrade a mon.
This commit moves up those tasks (before stopping the mon) so we don't need
to delegate them.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We actually want to ensure the node being upgraded is joining the quorum
instead of the monitor picked up earlier.
Indeed, the `mon_host`is used only in `delegate_to:` so we can still run ceph
commands while the monitor being upgraded is stopped.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
the `containerized` parameter in ceph_key module doesn't exist anymore.
This was making the module failing but was hidden because of the
`ignore_errors: True`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
otherwise, it will end up with error like following:
```
FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'"}
```
because facts won't have been gathered.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670663
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
as of b70d54ac80 the service launched isn't
ceph-rbd-mirror@admin.service.
it's now `ceph-rbd-mirror@rbd-mirror.{{ ansible_hostname }}`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
removing the content of this directory seems a bit agressive and cause a
redeployment to fail after a purge on debian based distrubition.
Typical error:
```
fatal: [mon0]: FAILED! => changed=false
attempts: 3
msg: No package matching 'ceph' is available
```
The following task will consider the cache is still valid, so apt
doesn't refresh it:
```
- name: update apt cache if cache_valid_time has expired
apt:
update_cache: yes
cache_valid_time: 3600
register: result
until: result is succeeded
```
since the task installing ceph packages has a `update_cache: no` it
fails:
```
- name: install ceph for debian
apt:
name: "{{ debian_ceph_pkgs | unique }}"
update_cache: no
state: "{{ (upgrade_ceph_packages|bool) | ternary('latest','present') }}"
default_release: "{{ ceph_stable_release_uca | default('') }}{{ ansible_distribution_release ~ '-backports' if ceph_origin == 'distro' and ceph_use_distro_backports else '' }}"
register: result
until: result is succeeded
```
/tmp/* isn't specific to ceph as well, so we shouldn't remove everything
in this directory.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
using `shell` module seems to be the only way to make this task working
on rhel based distribution AND debian based distributions.
on ubuntu, using `command` ansible module fails like following
(not due to `sudo` usage or not):
```
ok: [osd1] => changed=false
cmd: command -v ceph-volume
failed_when_result: false
msg: '[Errno 2] No such file or directory: ''command'': ''command'''
rc: 2
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
remove old systemd unit files (non-containerized) during the
switch_to_containers transition.
We have seen sometimes the unit started is the old one instead of the
new systemd unit generated.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
use the ceph binary from the container instead of the host.
If the ceph CLI version isn't compatible between host and container
image, it can cause the CLI to hang.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`ceph-mon` tries to redeploy monitors because it assumes it was not yet
deployed since `mon_socket_stat` and `ceph_mon_container_stat` are
undefined (indeed, we stop the daemon before calling `ceph-mon` in the
switch_to_containers playbook).
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The restart_osd_daemon.sh generated from the j2 template
contains a python call which uses 'print x' instead of
'print(x)'. Add the missing parentheses to make this call
compatible with both 2 and 3.
Also add parentheses to other python print calls found
in roles/ceph-client/defaults/main.yml and
infrastructure-playbooks/cluster-os-migration.yml.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1671721
Signed-off-by: John Fulton <fulton@redhat.com>
1ac94c048f introduced the support of
multiple rgw instances on a single host but somehow has missed to
implement this feature in rolling_update.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
With this, we could have multiple rgw instances on a single host
with a single run, don't have to use rgw-standalone.yml which does not
seems able to bind ports separately.
If you want to have multiple rgw instances, just change 'radosgw_instances'
to the number you want, which defaults to 1.
Not compatible with Multi-Site yet.
Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
There is no need to enforce `serial: 1` on client nodes.
Let's make it parameterizable by introducing a new *extra* variable
`client_update_batch`, if not filled this will default to `{{
ansible_forks }}`.
NOTE: this is only usable as an extra variable passed with
`-e client_update_batch=<num>`
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650184
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This task has to be called after the role `ceph-defaults` has been
played, otherwise, `mon_group_name` will never be known.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>