when `dashboard_enabled` is `True`, let's append `dashboard` and
`prometheus` modules to `ceph_mgr_modules` so they are automatically
loaded.
Closes: #4026
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a2b6f44665)
As the bz1721914 describes, the grafana_server_addr
fact is not defined if ip_version used is ipv6.
This commit adds the ip_version condition to set
correctly this fact.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721914
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit e655038743)
If no grafana-server group is defined while an mgr group is, that task
will fail because `hostvars[groups[grafana_server_group_name][0]` can't
return anything since `groups['grafana-server']` will be a non existing
key.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 366b309c12)
To address this warning:
```
[DEPRECATION WARNING]: evaluating nfs_ganesha_dev as a bare variable, this
behaviour will go away and you might need to add |bool to the expression in the
future
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2b9fb377a8)
This task is already present in pre_requisite_non_container.yml
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit edb8d42596)
There's two big issues with the current OSD restart script.
1/ We try to test if the ceph osd daemon socket exists but we use a
wildcard for the socket name : /var/run/ceph/*.asok.
This fails because we usually have multiple ceph osd sockets (or
other ceph daemon collocated) present in /var/run/ceph directory.
Currently the test fails with:
bash: line xxx: [: too many arguments
But it doesn't stop the script execution.
Instead we can specify the full ceph osd socket name because we
already know the OSD id.
2/ The container filter pattern is wrong and could matches multiple
containers resulting the script to fail.
We use the filter with two different patterns. One is with the device
name (sda, sdb, ..) and the other one is with the OSD id (ceph-osd-0,
ceph-osd-15, ..).
In both case we could match more than needed.
$ docker container ls
CONTAINER ID IMAGE NAMES
958121a7cc7d ceph-daemon:latest ceph-osd-strg0-sda
589a982d43b5 ceph-daemon:latest ceph-osd-strg0-sdb
46c7240d71f3 ceph-daemon:latest ceph-osd-strg0-sdaa
877985ec3aca ceph-daemon:latest ceph-osd-strg0-sdab
$ docker container ls -q -f "name=sda"
958121a7cc7d
46c7240d71f3
877985ec3aca
$ docker container ls
CONTAINER ID IMAGE NAMES
2db399b3ee85 ceph-daemon:latest ceph-osd-5
099dc13f08f1 ceph-daemon:latest ceph-osd-13
5d0c2fe8f121 ceph-daemon:latest ceph-osd-17
d6c7b89db1d1 ceph-daemon:latest ceph-osd-1
$ docker container ls -q -f "name=ceph-osd-1"
099dc13f08f1
5d0c2fe8f121
d6c7b89db1d1
Adding an extra '$' character at the end of the pattern solves the
problem.
Finally removing the get_container_osd_id function because it's not
used in the script at all.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 45d46541cb)
The ansible_lsb fact is based on the lsb package (lsb-base,
lsb-release or redhat-lsb-core).
If the package isn't installed on the remote host then the fact isn't
populated.
--------
"ansible_lsb": {},
--------
Switching to the ansible_distribution_release fact instead.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dc187ea6fa)
As per bz1718981, this commit adds higher values to check
the quorum status. This is helpful for several OSP deployments
that fail during the scale up.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1718981
Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit ba73dc7b21)
The ceph-volume lvm list command takes ages to complete when having
a lot of LV devices on containerized deployment.
For instance, with 25 OSDs on a node it takes 3 mins 44s to list the
OSD.
Adding the max open files limit to the container engine cli when
executing the ceph-volume command seems to improve a lot thee
execution time ~30s.
This was impacting the OSDs creation with ceph-volume (both filestore
and bluestore) when using multiple LV devices.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b987534881)
We already set the become flag to true at a play level in the site*
playbooks so we don't need to set it at a task level.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7c3640177b)
`parted_results` isn't used anymore in the playbook.
By the way, `parted` seems to cause issue because it changes the
ownership on devices:
```
root@osd0 ~]# ls -l /dev/sdc*
brw-rw----. 1 root disk 8, 32 Jun 11 08:53 /dev/sdc
brw-rw----. 1 ceph ceph 8, 33 Jun 11 08:53 /dev/sdc1
brw-rw----. 1 ceph ceph 8, 34 Jun 11 08:53 /dev/sdc2
[root@osd0 ~]# parted -s /dev/sdc print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdc: 53.7GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 1075MB 1074MB ceph block.db
2 1075MB 2149MB 1074MB ceph block.db
[root@osd0 ~]# #We can see ownerships have changed from ceph:ceph to root:disk:
[root@osd0 ~]# ls -l /dev/sdc*
brw-rw----. 1 root disk 8, 32 Jun 11 08:57 /dev/sdc
brw-rw----. 1 root disk 8, 33 Jun 11 08:57 /dev/sdc1
brw-rw----. 1 root disk 8, 34 Jun 11 08:57 /dev/sdc2
[root@osd0 ~]#
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eece362b38)
The definitions of cephfs pools should match openstack pools.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Co-Authored-by: Simone Caronni <simone.caronni@teralytics.net>
(cherry picked from commit 67071c3169)
The ceph-agent role was used only for RHCS 2 (jewel) so it's not
usefull anymore.
The current code will fail on CentOS distribution because the rhscon
package is only avaible on Red Hat with the RHCS 2 repository and
this ceph release is supported on stable-3.0 branch.
Resolves: #4020
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7503098ca0)
Because we're using vagrant, a ssh config file will be created for
each nodes with options like user, host, port, identity, etc...
But via tox we're override ANSIBLE_SSH_ARGS to use this file. This
remove the default value set in ansible.cfg.
Also adding PreferredAuthentications=publickey because CentOS/RHEL
servers are configured with GSSAPIAuthenticationis enabled for ssh
server forcing the client to make a PTR DNS query.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 34f9d51178)
Since timesyncd is not available on RHEL-based OSs, change the default
to chronyd for RHEL-based OSs. Also, chronyd is chrony on Ubuntu, so
set the Ansible fact accordingly.
Fixes: https://github.com/ceph/ceph-ansible/issues/3628
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9d88d3199f)
if we don't assign the rbd application tag on this pool,
the cluster will get `HEALTH_WARN` state like following:
```
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
application not enabled on pool 'rbd'
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4cf17a6fdd)
Ubuntu-based CI jobs often fail with error code 404 while installing
NTP daemons. Updating cache beforehand should fix the issue.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit d1c266e6c7)
069076b introduced a bug in the systemd unit script template. This
commit fixes the options used by the node-exporter container.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d0840217f3)
Add a variable to support the allow_embedding support.
See ceph/ceph-ansible/issues/4084 for details.
Fixes: #4084
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 27856cc499)
This setting must be set to something resolvable.
See: ceph/ceph-ansible/issues/4085 for details
Fixes: #4085
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2c9cd9d9e7)
We're using fuser command to see if a process is using a ceph unix
socket file. But the fuser command runs through every PID present in
/proc/<PID> to see if one of them is using the file.
On a system running thousands processes, the fuser command can take
a long time to finish.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1717011
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit da9891da1e)
Instead of using the modprobe command from the path in the systemd
unit script, we can use the modprobe ansible module.
That way we don't have to manage the binary path based on the linux
distribution.
Resolves: #4072
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dbf81b6b5b)
Few fixes on systemd unit templates for node_exporter and
alertmanager container parameters.
Added the ability to use a dedicated instance to deploy the
dashboard components (prometheus and grafana).
This commit also introduces the grafana_group_name variable
to refer grafana group and keep consistency with the other
groups.
During the integration with TripleO some grafana/prometheus
template variables resulted undefined. This commit adds the
ability to check if the group exist and create, accordingly,
different job groups in prometheus template.
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 069076bbfd)
Currently we're only able to use podman on ubuntu if podman's
installation is done manually before the ceph-ansible execution
because the deb package is present in an external repository.
We already manage the docker-ce installation via an external
repository so we should be able to allow the podman installation
with the same mechanism too.
https://github.com/containers/libpod/blob/master/install.md#ubuntuResolves: #3947
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 518ab794fb)
When using podman, the systemd unit scripts don't have a dependency
on the network. So we're not sure that the network is up and running
when the containers are starting.
With docker this behaviour is already handled because the systemd
unit scripts depend on docker service which is started after the
network.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f49090df7e)
By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these:
```
[DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable,
this behaviour will go away and you might need to add |bool to the expression
in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This
feature will be removed in version 2.12. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
```
Now appended ``| bool`` on a lot of the affected variables.
Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` *(with spaces at the pipe)*.
Closes: #4022
Signed-off-by: L3D <l3d@c3woc.de>
(cherry picked from commit ab54fe20ec)
This add support for rgw loadbalancer based on HAProxy and Keepalived.
We define a single role ceph-rgw-loadbalancer and include HAProxy and
Keepalived configurations all in this.
A single haproxy backend is used to balance all RGW instances and
a single frontend is exported via a single port, default 80.
Keepalived is used to maintain the high availability of all haproxy
instances. You are free to use any number of VIPs. A single VIP is
shared across all keepalived instances and there will be one
master for one VIP, selected sequentially, and others serve as
backups.
This assumes that each keepalived instance is on the same node as
one haproxy instance and we use a simple check script to detect
the state of each haproxy instance and trigger the VIP failover
upon its failure.
Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
(cherry picked from commit 35d40c65f8)
if `nfs_obj_gw` is True when deploying an internal ganesha with an
external ceph cluster, `ceph_nfs_rgw_access_key` and
`ceph_nfs_rgw_secret_key` must be provided so the
ganesha configuration file can be generated.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 003aeea45a)
This commits allows to deploy an internal ganesha with an external ceph
cluster.
This requires to define `external_cluster_mon_ips` with a comma
separated list of external monitors.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710358
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6a6785b719)
Otherwise content in /run/udev is mislabeled and prevent some services
like NetworkManager from starting.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 80875adba7)
the rhel8 image used is an outdated beta version, it is not worth it to
maintain this image upstream, since it's possible to test podman with a
newer version of centos/atomic-host image.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a78fb209b1)
789cef7 introduces a regression in the ganesha configuration file
generation. The new config_template module version broke it.
But the ganesha.conf file isn't an ini file and doesn't really
need to use the config_template module. Instead we can use the
classic template module.
Resolves: #4045
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 616c484698)
The fsid generation is done via a python command. When the ansible
controller node only have python3 available (like RHEL 8) then the
python command isn't necessarily present causing the fsid generation
to fail.
We already do some resource creation (like ceph keyring secret) with
the python command too but from the mon node so we should do the same
for fsid.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1714631
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit daf92a9e1f)
This commit splits the current `ceph-container-common` role.
This introduces a new role `ceph-container-engine` which handles the
tasks specific to the installation of containers tools (docker/podman).
This is needed for the ceph-dashboard implementation for 2 main reasons:
1/ Since the ceph-dashboard stack is only containerized, we must install
everything needed to run containers even in non containerized
deployments. Splitting this role allows us to not have to call the full
`ceph-container-common` role which would run a bunch of unneeded tasks
that would have been skipped anyway.
2/ The current implementation would have required to run
`ceph-container-common` on all ceph-clients nodes which would have been
conflicting with 9d3517c670 (we don't want
to run ceph-container-common on all client nodes, see mentioned commit
for more details)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 55420d6253)
The ceph mgr dashboard requires routes python library to be installed
on the system.
Resolves: #3995
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f37edfa113)
gpg package isn't available for all Debian/Ubuntu distribution but
gnupg is.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 622d9feae9)
As of nautilus, if you set `ms bind ipv6 = True` you must explicitly set
`ms bind ipv4 = False` too, otherwise OSDs will still try to pick up an
IPv4 address.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710319
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6ca7372a2d)
Because ansible_distribution_version doesn't return minor version on
CentOS with ansible 2.8 we can apply the selinux anyway but only for
CentOS/RHEL 7.
Starting RHEL 8, there's a dedicated package for selinux called
nfs-ganesha-selinux [1].
Also replace the command module + semanage by the selinux_permissive
module.
[1] https://github.com/nfs-ganesha/nfs-ganesha/commit/a7911f
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0ee833432e)
Ceph iSCSI gateway requires Red Hat Enterprise Linux or CentOS 7.5
or later.
Because we can not check the ansible_distribution_version fact for
CentOS with ansible 2.8 (returns only the major version) we can
fallback by checking the kernel option.
- CONFIG_TARGET_CORE=m
- CONFIG_TCM_USER2=m
- CONFIG_ISCSI_TARGET=m
http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0c7fd79865)
When using a minimal Debian/Ubuntu distribution there's no
ca-certificates and gpg packages installed so the apt modules will
fail:
Failed to find required executable gpg in paths:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
apt.cache.FetchFailedException:
W:https://download.ceph.com/debian-luminous/dists/bionic/InRelease:
No system certificates available. Try installing ca-certificates.
Resolves: #3994
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 494746b7a6)
There is no need to have default values for these variables in each roles
since there is no corresponding host groups
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9f0d4d6847)
This commit renames the `docker_exec_cmd` variable to
`container_exec_cmd` so it's more generic.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e74d80e72f)
This commit aligns the way the different containers are managed with how
it's currently done with the other ceph daemon.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cc285c417a)
make `dashboard_rgw_api_no_ssl_verify` a bool variable since it seems to
be used as it.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cd5f3fca64)
since stable-4.0 isn't to deploy ceph releases prior to nautilus,
there's no need to add this complexity here.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4405f50c85)
use site.yml to deploy ceph-container-common in order to install docker
even in non-containerized deployments since there's no RPM available to
deploy the differents applications needed for ceph-dashboard.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cdff0da7d4)
there is no need to add more complexity for this, let's use
`containerized_deployment` in order to detect if we are running a
containerized deployment.
The idea is to use `container_exec_cmd` the same way we do in the rest of
the playbook to run the different ceph commands needed to deploy the
ceph-dashboard role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 742bb6214c)
This is needed for the ceph-dashboard implementation since it requires
to run containerized application which aren't packaged as RPMs.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6d9dbb1d39)
add .j2 to all templates file related to dashboard roles.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3578d576a4)
This adds support for podman in dashboard-related roles. It also drops
the creation of custom network for the dashboard-related roles as this
functionality works in a different way with podman.
Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit b4d1c3693b)
We cannot use the old fashioned config-key way, here. It was not
supported when the option was introduced (post 14.2.0). Since the option
is not always supported we can simply ignore the potential failure on
ceph clusters that do not support it.
Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit e737a1f83e)
This commit adds a list of alerting rules for ceph-dashboard from the
old cephmetrics project. It also installs the configuration file so that
the rules get recognized by the prometheus server.
Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit 8f77caa932)
This commit will merge dashboard-ansible installation scripts with
ceph-ansible. This includes several new roles to setup ceph-dashboard
and the underlying technologies like prometheus and grafana server.
Signed-off-by: Boris Ranto & Zack Cerza <team-gmeno@redhat.com>
Co-authored-by: Zack Cerza <zcerza@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f141a6e80)
Currently podman installation is very tied to RHEL 8 even if we're
able to install it on Debian/Ubuntu distribution.
This patch changes the way we are starting or not the (fat) container
daemon. Before the condition was based on the distribution release
and now on the container_service_name variable.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d2ad191eca)
If we do this in one line we get the error described in #3968fixes#3968
Signed-off-by: Bruceforce <markus.greis@gmx.de>
(cherry picked from commit c3b0ee30a1)
RHCS 4 will be based on Nautilus and only usable on RHEL 8.
Updated the default ceph_rhcs_version to 4 and update the rhcs
repositories to rhcs 4 with RHEL 8.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ba49225eab)
The old condition would resolve to
"when": "nfs_ganesha_stable - ceph_repository == 'community'"
now it is
"when": [
"nfs_ganesha_stable",
"ceph_repository == 'community'"
]
Please backport to stable-4.0
Signed-off-by: Bruceforce <markus.greis@gmx.de>
(cherry picked from commit 29f2c953b4)
Set the application to rgw for pools created from rgw_create_pools. On Ceph Nautilus the heath is set to HEALTH_WARN with the message "application not enabled on X pool(s)" if an application isn't specified for a pool.
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
(cherry picked from commit 381c58ca3e)
Use blocks for similar tasks in main.yml. And move when keywords before
block keywords.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 121b5e4184)
running an external ceph cluster deployment with (obviously) no
monitors defined in inventory breaks with an undefined error because
`_monitor_addresses` never get defined.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1707460
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 936c6fca78)
Except for some corner case, it's not correct to access some other
node's copy of variable docker_exec_cmd. Therefore replace
"hostvars[groups[mon_group_name][0]]['docker_exec_cmd']" by
"docker_exec_cmd".
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 89748d579a)
Add code in ceph-mgr for creating a keyring for manager in so that
managers can be deployed on a separate node too.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 56bfec7c58)
Adds "check_mode: no" to commands which register cluster state in a
variable and don't modify anything. These commands have to run in order
to support running the playbook in check mode.
Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 3c8987c7a5)
Keywords requiring only one item shouldn't express it by creating a
list with single item.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 739a662c80)
Conflicts:
roles/ceph-mon/tasks/ceph_keys.yml
roles/ceph-validate/tasks/check_devices.yml
This will be removed in ansible 2.8 and breaks the playbook execution
with this release.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ae266c6f2b)
In containerized deployment the default mds cpu quota is too low
for production environment.
This is causing performance degradation compared to bare-metal.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695850
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1999cf3d19)
In containerized deployment the default osd cpu quota is too low
for production environment using NVMe devices.
This is causing performance degradation compared to bare-metal.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c17106874c)
Only rbd-target-api and rbd-target-gw were started/enabled for non
containerized deployment.
The issue doesn't happen with containerized setup.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4ae5ce399b)
Otherwise the reader is forced to search for "when" when blocks are too
long.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit e0beaf123a)
Conflicts:
roles/ceph-config/tasks/main.yml
roles/ceph-container-common/tasks/pre_requisites/prerequisites.yml
roles/ceph-validate/tasks/check_devices.yml
1/ The OSD already supports cpuset to be used for containerized deployments
through the use of the ceph_osd_docker_cpuset_cpus variable. This adds similar
support to the RGW service for containerized deployments by setting a new
variable named ceph_rgw_docker_cpuset_cpus. Like the OSD, there are times where
using distinct cores has advantages over using the CFS in kernel scheduler.
ceph_rgw_docker_cpuset_cpus accepts a comma delimited set of CPU ids
2/ Add support for specifying --cpuset-mem variable to restrict the cgroup's memory
allocations to a particular numa node, which should typically correspond with
the cpu ids of that numa node that were provided with --cpuset-cpus. To ensure
the correct cpu ids are used one can run `numactl --hardware` to list the nodes
and which cpu ids correspond to each.
Signed-off-by: Kyle Bader <kbader@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0bee90b201)
Until now it was not possible to install a specific container package
because it was somehow hardcoded.
This patch allows to override the container package name (docker.io
vs docker-ce) and refacts the package installation. This could be
achieve via the container_package_name variable.
Instead of using one task per distribution we can set the package and
service name in vars. This allows to have a unified package task.
Also refactorize the debian_prerequisites tasks because the content
was outdated.
https://docs.docker.com/install/linux/docker-ce/debian/https://docs.docker.com/install/linux/docker-ce/ubuntu/Resolves: #3609
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8105a1cefb)
We do this so that the ceph-config role can most accurately
report the number of osds for the generation of the ceph.conf
file.
We don't want to use ceph-volume to determine the number of
osds because in an upgrade to nautilus ceph-volume won't be able to
accurately count osds created by ceph-disk.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 67453853ff)
When performing a rolling update do not try to create
any new osds with `ceph-volume lvm batch`. This is troublesome
because when upgrading to nautilus the devices list might contain
devices that are currently being used by ceph-disk and have GPT
headers on them, which will cause ceph-volume to fail when
trying to use such a device. Any devices originally created
by ceph-disk will need to be removed from the devices list
before any new osds can be created.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 5e3dfe5021)
Since Nautilus there's mgr extra modules not present in ceph-mgr
package but in dedicated packages.
Resolves: #3860
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 86315272c7)
this commit refact the msgr2 protocol introduction.
If it's a fresh install, let's go with v2 only.
If we upgrade to nautilus, we should go with v2+v1 syntax to ensure
nothing breaks.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a4bc7bda51)
The library directory that contain the custom ceph modules in present
in the ceph-ansible root directory.
All igw_* mocules are already present there so we don't need the one
present in roles/ceph-iscsi-gw/library.
Also remove the associated spec file.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c8814d1331)
this task has nothing to do in stable-4.0 and after.
Let's remove it since stable-4.0 and after aren't intended to deploy
luminous.
Closes: #3873
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 58f3851573)
Currently we only support ansible 2.7
We plan to use 2.8 when it will be release so we have to support both
2.7 and 2.8.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1700548
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e471bce76b)
Because 5c98e361df could be seen as a non
backward compatible change this commit reverts it and bring back package
dependencies installation support.
Let's just modify the default value instead.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit edfa4310d3)
These packages aren't needed anymore.
They were needed for ceph-init-detect buti as of ceph-init-detect doesn't exist
anymore.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683885
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5c98e361df)
When adding a new monitor, we must reuse the existing initial monitor
keyring. Otherwise, the new monitor will issue its 'mkfs' with a new
monitor keyring and it will result with a mismatch between them. The
new monitor will be unable to join the quorum in the end.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit edf1ee2073)
these files aren't needed anymore since we only use lvm scenario.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4f68462009)
This variable was related to ceph-disk scenarios.
Since we are entirely dropping ceph-disk support as of stable-4.0, let's
remove this variable.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f0416c8892)
As of stable-4.0, the only valid scenario is `lvm`.
Thus, this makes this variable useless.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4d35e9eeed)
ceph_disk_cli_options_facts.yml is not used anymore, let's remove it.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4d5637fd8a)
We only validate the devices that are passed if there is a list of
devices to validate.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2888c0825f)
osd_scenario has become obsolete and defaults to lvm. With lvm there is
no such things has collocated and non-collocated.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 52df15895b)
ceph-disk is not supported anymore, so all the newly created OSDs will
be configured using ceph-volume.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 9ea1e49407)
We don't support the preparation of OSD with ceph-disk. ceph-volume is
only supported. However, the start operation of OSD is still supported.
So let's say you change a config option, the handlers will be able to
restart all the OSDs via their respective systemd unit files.
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e2a5aa062e)
We don't need to use the cephfs variable for the application pool
name because it's always cephfs.
If the cephfs variable is set to something else than the default
value it will break the appplication pool task.
Resolves: #3790
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d2efb7f02b)
ceph-volume didn't work when the devices where passed by path.
Since it now support it, let's allow this feature in ceph-ansible
Closes: #3812
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7e0adca7a4)
As discussed in ceph/ceph#26599, beast is now the default frontend
for rados gateway with nautilus release.
Add rgw_thread_pool_size variable with 512 as default value and keep
backward compatibility with num_threads option when using civetweb.
Update radosgw_civetweb_num_threads to reflect rgw_thread_pool_size
change.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d17b1b48b6)
Let's use a condition to run this task only on the first mon.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 631e5d3144)
The Ubuntu Cloud Archive-related (UCA) defaults in
roles/ceph-defaults/defaults/main.yml were commented out, which means
if you set `ceph_repository` to "uca", you get undefined variable
errors, e.g.
```
The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined
The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: add ubuntu cloud archive repository
^ here
```
Unfortunately, uncommenting these results in some other breakage,
because further roles were written that use the fact of
`ceph_stable_release_uca` being defined as a proxy for "we're using
UCA", so try and install packages from the bionic-updates/queens
release, for example, which doesn't work. So there are a few `apt` tasks
that need modifying to not use `ceph_stable_release_uca` unless
`ceph_origin` is `repository` and `ceph_repository` is `uca`.
Closes: #3475
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 9dd913cf8a)
docker daemon is automatically started during package installation
but the service isn't enabled on boot.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 37816570c6)
Add a tox scenario that adds an new MDS node as a part of already
deployed Ceph cluster and deploys MDS there.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit c0dfa9b61a)
When using monitor_address_block or radosgw_address_block variables
to configure the mon/rgw address we're getting the first ip address
from the ansible facts present in that cidr.
When there's VIP on that network the first filter could return the
wrong value.
This seems to affect only IPv6 setup because the VIP addresses are
added to the ansible facts at the beginning of the list. This is the
opposite (at the end) when using IPv4.
This causes the mon/rgw processes to bind on the VIP address.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680155
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fd4b0ec7eb)
The path of the RGW environment file (in the /var/lib/ceph/radosgw/
directory) depends on the Ceph clustername. It was not taken into
account in the Ansible role `ceph-rgw`.
Signed-off-by: flaf <francois.lafont.1978@gmail.com>
(cherry picked from commit 4c3e77d869)
When mgrs are implicitly collocated on monitors (no mgrs in mgrs group).
That include was skipped because of this condition :
`inventory_hostname == groups[mgr_group_name][0]`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cbfdbab177)
before managing mgr modules, we must ensure all mgr are available
otherwise we can hit failure like following:
```
stdout:Error ENOENT: all mgr daemons do not support module 'restful', pass --force to force enablement
```
It happens because all mgr are not yet available when trying to manage
with mgr modules.
Closes: #3100
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f596cc1711)
According to rdo testing https://review.rdoproject.org/r/#/c/18721
a check on the output of the ceph_health value is added to
allow the playbook to make several attempts (according to the
retry/delay variables) when waiting the cluster quorum or
when the container bootstrap is not ended.
It avoids the failure of the command execution when it doesn't
receive a valid json object to decode (because cluster is too
slow to boostrap compared to ceph-ansible task execution).
Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit afbb90e4ac)
In containerized deployment the default radosgw quota is too low
for production environment.
This is causing performance degradation compared to bare-metal.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680171
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d3ae9fd05f)
Since https://github.com/ceph/ceph/commit/77912c0 ceph-volume uses
stdout encoding based on LC_CTYPE and PYTHONIOENCODING environment
variables.
Thoses variables aren't set when using ansible.
Currently this commit breaks non containerized deployment on Ubuntu.
TASK [use ceph-volume to create bluestore osds] ********************
cmd:
- ceph-volume
- --cluster
- ceph
- lvm
- create
- --bluestore
- --data
- /dev/sdb
rc: 1
stderr: |-
Traceback (most recent call last):
(...)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in
position 132: ordinal not in range(128)
Note that the task is failing on ansible side due to the stdout
decoding but the osd creation is successful.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7e5e4229b7)
Similar to #3658
Since there's too many changes between master and stable branches let's
commit directly in each branches instead of trying to backport this
commit.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When installing python-minimal on Ubuntu bionic, this will add the
/usr/bin/python symlink to the default python interpreter.
On bionic, this isn't python2 but python3.
$ /usr/bin/python --version
Python 3.6.7
The python docker library is only installed for python2 which causes
issues when running the purge-docker-cluster playbook. This playbook
uses the ansible docker modules and requires to have python bindings
installed on the remote host.
Without the bindings we can see python error reported by the docker
module.
msg: Failed to import docker or docker-py - No module named 'docker'.
Try `pip install docker` or `pip install docker-py` (Python 2.6)
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
sometimes those tasks might fail because of a timeout.
I've been facing this several times in the CI, adding this retry might
help and won't hurt in any case.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a couple of fixes to allow containerized deployments upgrade support
to upgrade from luminous/mimic to nautilus.
- pass CEPH_CONTAINER_IMAGE and CEPH_CONTAINER_BINARY environment
variable to the ceph_key module,
- fix the docker exec command in 'waiting for the containerized monitor
to join the quorum' task according to the `delegate_to` parameter,
- override `docker_exec_cmd` in `ceph-facts` with `mon_host` when
rolling_update is `True`,
- do not run unnecessarily `create_mds_filesystems.yml` when performing an
upgrade.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This prevents the packaging from restarting services before we do need
to restart them in the rolling update sequence.
We want to handle services restart at rolling_update playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As of nautilus, the initial keyrings list has changed, it means when
upgrading from Luminous or Mimic, it is expected there's a mismatch
between what is found on the cluster and the expected initial keyring
list hardcoded in ceph_key module. We shouldn't fail when upgrading to
nautilus.
str_to_bool() took from ceph-volume.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-Authored-by: Alfredo Deza <adeza@redhat.com>
rolling_update playbook already takes care of stopping/starting services
during the sequence. There's no need to trigger potential unwanted
services restart.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When using osd_scenario lvm, we never check if the lvm2 package is
present on the host.
When using containerized deployment and docker on CentOS/RedHat this
package will be automatically installed as a dependency but not for
Ubuntu distribution.
OSD deployed via ceph-volume require the lvmetad.socket to be active
and running.
Resolves: #3728
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since all files in container image have moved to `/opt/ceph-container`
this check must look for new AND the old path so it's backward
compatible. Otherwise it could end up by templating an inconsistent
`ceph-osd-run.sh`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When using monitor_address_block to determine the ip address of the
monitor node, we need an ip address available in that cidr to be
present in the ansible facts (ansible_all_ipv[46]_addresses).
Currently we don't check if there's an ip address available during
the ceph-validate role.
As a result, the ceph-config role fails due to an empty list during
ceph.conf template creation but the error isn't explicit.
TASK [ceph-config : generate ceph.conf configuration file] *****
fatal: [0]: FAILED! => {"msg": "No first item, sequence was empty."}
With this patch we will fail before the ceph deployment with an
explicit failure message.
Resolves: rhbz#1673687
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using community repository we need to set the priority on the
ceph repositories because we could have some conflict with EPEL
packages.
In order to set the priority on the ceph repositories, we need to
install the yum-plugin-priorities package.
http://docs.ceph.com/docs/master/install/get-packages/#rpm-packages
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
the task will be delegated to mons[0] for all mgr hosts, so we can just run it on the first host and have the same effect.
Signed-off-by: wumingqiao <wumingqiao@beyondcent.com>
Currently the default crush rule value is added to the ceph config
on the mon nodes as an extra configuration applied after the template
generation via the ansible ini module.
This implies two behaviors:
1/ On each ceph-ansible run, the ceph.conf will be regenerated via
ceph-config+template and then ceph-mon+ini_file. This leads to a
non necessary daemons restart.
2/ When other ceph daemons are collocated on the monitor nodes
(like mgr or rgw), the default crush rule value will be erased by
the ceph.conf template (mon -> mgr -> rgw).
This patch adds the osd_pool_default_crush_rule config to the ceph
template and only for the monitor nodes (like crush_rules.yml).
The default crush rule id is read (if exist) from the current ceph
configuration.
The default configuration is -1 (ceph default).
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1638092
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
With 3e32dce we can run OSD containers with numactl support.
When using numactl command in a containerized deployment we need to
be sure that the corresponding package is installed on the host.
The package installation is only executed when the
ceph_osd_numactl_opts variable isn't empty.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We don't need to set After=docker.service when the container_binary
variable isn't set to docker.
It doesn't break anything currently but it could be confusing when
using podman.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Instead of using subscription-manager with command module we can use
the rhsm_repository ansible module.
This module already uses repos list feature to determine if a
repository is enabled or not. That way this module is idempotent so
we don't need changed_when: false anymore.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ceph stable community repository only enables the basearch
packages url.
Adding the noarch url because starting with nautilus release, some
packages are added there and useful for mgr or grafana.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
After b8d580b and e9e5d5a we could have either item.min_size or
osd_pool_default_min_size using string instead of int causing the
condition to be true when it's false.
As a result, the task could try to set the pool min_size value to
0 which leads to:
Error EINVAL: pool min_size must be between 1 and 1
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
b8d580b3f4 introduced a bug when
`min_size` isn't set (default to 0).
Typical error:
```
Error EINVAL: pool min_size must be between 1 and 1
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The following lint issues have been resolved:
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-mon/tasks/ceph_keys.yml:2
[305] Use shell only when shell functionality is required
/home/travis/build/ceph/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:47
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:2
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:7
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:14
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:19
[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:24
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
Tuned name of a task and error message to make it more user understandable
Fixes BZ 1648168 - ceph-validate : devices are not validated in non-collocated and lvm_batch scenario
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
The "get osd ids" statement only registers the osd_ids_non_container variable. Running "ls /var/lib/ceph/osd/ | sed 's/.*-//'" should never produce a change on the system. Adding changed_when: false prevents irrelevant change messages from Ansible.
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>