Before this patch, the lvm2 package installation was done during the
ceph-osd role.
However we were running ceph-volume command in the ceph-config role
before ceph-osd. If lvm2 wasn't installed then the ceph-volume command
fails:
error checking path "/run/lock/lvm": stat /run/lock/lvm: no such file or
directory
This wasn't visible before because lvm2 was automatically installed as
docker dependency but it's not the same for podman on CentOS 8.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Since we don't have nfs-ganesha builds available on CentOS 8 at the
moment on shaman then we can use the alternative repository at [1]
[1] https://download.nfs-ganesha.org/3/LATEST/CentOS
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Currently the keepalived template only works when system hostnames exactly match the Ansible inventory name. If these are different, all generated templates become BACKUP without a MASTER assigned. Using the inventory_hostname in the template file resolves this issue.
Signed-off-by: Stanley Lam stanleylam_604@hotmail.com
The grafana-server group name was hardcoded for the grafana/prometheus
firewalld tasks condition.
We should we the associated variable : grafana_server_group_name
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Instead of using multiple dashboard_enabled condition in the
configure_firewall file we could just have the condition once
and include the dedicated tasks list.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When there's no mgr group defined in the ansible inventory then the
mgrs are deployed implicitly on the mons nodes.
If the dashboard is enabled then we need to open the dashboard port on
the node that is running the ceph mgr process (mgr or mon).
The current code only allow to open that port on the mgr nodes when they
are present explicitly in the inventory but not implicitly.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783520
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
To avoid confusion, let's change the default value from `0.0.0.0` to
`x.x.x.x`.
Users might think setting `0.0.0.0` will make the daemon binding on all
interfaces.
Fixes: #4827
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We don't need to set the use_new_ceph_iscsi fact on other nodes than
those present in the iscsigws group.
Also remove the duplicate iscsi_gw_group_name condition already present
on the include_task.
Finally validate the ansible distribution as the first task.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit isolates and adds an explicit comment about variables not
intended to be modified by the user.
Fixes: #4828
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Typical error:
```
type=AVC msg=audit(1575367499.582:3210): avc: denied { search } for pid=26680 comm="node_exporter" name="1" dev="proc" ino=11528 scontext=system_u:system_r:container_t:s0:c100,c1014 tcontext=system_u:system_r:init_t:s0 tclass=dir permissive=0
```
node_exporter needs to be run as privileged to avoid avc denied error
since it gathers lot of information on the host.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762168
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
It doesn't make sense to start validating configuration if the ansible
version isn't the good one.
This commit moves the check_system task as the first task in the
ceph-validate role.
The ansible version test tasks are moved at the top of this file.
Also moving the iscsi kernel tests from check_system to check_iscsi
file.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ntp/chrony facts are only used in the ceph-infra role so we don't
really need to set them in the ceph-facts roles.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
A recent change in ceph/ceph prevent from having username in the
password:
`Error EINVAL: Password cannot contain username.`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The md devices (RAID software) aren't excluded from the devices list in
the auto discovery scenario.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764601
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using the shortname, the URL for active alert launches with short
hostname and fails to connect to the server.
This commit changes the template in order to use the fqdn.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
All containers are removed when systemd stops them.
There is no need to call this module in purge container playbook.
This commit also removes all docker_image task and remove all container
images in the final cleanup play.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This is needed to avoid following error:
```
ERROR! The requested handler 'restart ceph mons' was not found in either the main handlers list nor in the listening handlers list
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
cf8c6a3 moves the 'wait for all osds' task from openstack_config to the
main tasks list.
But the openstack_config code was executed only on the last OSD node.
We don't need to do this check on all OSD node so we need to add set
run_once to true on that task.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using `osd_auto_discovery`, `devices` is built multiple times due
to multiple runs of `ceph-facts` role. It end up with duplicate
instances of a same device in the list.
Using `unique` filter when building the list fixes this issue.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit makes the ceph-dashboard role only printing ceph-dashboard
URL of the nodes present in grafana-server group
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762163
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When creating crush rules with device class parameter we need to be sure
that all OSDs are up and running because the device class list is
is populated with this information.
This is now enable for all scenario not openstack_config only.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The wait_for ansible module doesn't support the backets on IPv6 address
so need to remove them.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769710
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
this file is provided by the packaging (nfs-ganesha) so there's no need
to maintain it in ceph-ansible
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
At the moment, we bindmount the dbus socket from the host, this requires
to run the container with --privileged.
Since we now run a dedicated dbus daemon inside the same container, we
can stop running privileged nfs-ganesha containers
Related ceph-container PR : ceph/ceph-container#1517
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1725254
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Configuration of cephfs with an existing cluster using --limit used to fail
at different tasks while running with site-docker.yml
This commit addresses both of those tasks
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1773489
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
If we want to create crush rules with the create-replicated sub command
and device class then we need to have the OSD created before the crush
rules otherwise the device classes won't exist.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
In addition to the grafana container tag change, we need to do the same
for the prometheus container stack based on the release present in the
OSE 4.1 container image.
$ docker run --rm openshift4/ose-prometheus-node-exporter:v4.1 --version
node_exporter, version 0.17.0
build user: root@67fee13ed48f
build date: 20191023-14:38:12
go version: go1.11.13
$ docker run --rm openshift4/ose-prometheus-alertmanager:4.1 --version
alertmanager, version 0.16.2
build user: root@70b79a3f29b6
build date: 20191023-14:57:30
go version: go1.11.13
$ docker run --rm openshift4/ose-prometheus:4.1 --version
prometheus, version 2.7.2
build user: root@12da054778a3
build date: 20191023-14:39:36
go version: go1.11.13
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This will prevent failure of site-docker.yml with configs in doc.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769760
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
When ceph_rbd_mirror_configure is set to true we need to ensure that
the required variables aren't empty.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1760553
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
If for some reason, there's an old rgw socket file present in the
/var/run/ceph/ directory then the test command could fail with
test: xxxxxxxxx.asok: binary operator expected
$ ls -hl /var/run/ceph/
total 0
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94153614631472.asok
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94240997655088.asok
We can check the radosgw socket in /proc/net/unix to avoid using wildcard
in the socket name.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression isn't
evaluated.
This string output "(osd_objectstore == 'bluestore')" is always true
because item.enable condition only matches non empty string. So the
sysctl value was applyied for both filestore and bluestore backend.
[2] added the bool filter to the condition but the filter always returns
false on string and the sysctl wasn't applyed at all.
This commit fixes the enable key value by evaluating the value instead
of using the string.
[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The latest grafana container tag is using grafana 6.x release which could
cause issue with the ceph dashboard integration.
Considering that the grafana container in RHCS 3 is based on 5.x then we
should use the same version.
$ docker run --rm rhceph/rhceph-3-dashboard-rhel7:3 -v
Version 5.2.4 (commit: unknown-dev)
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Even if this improves ceph-disk/ceph-volume performances then it also
impact the ceph-osd process.
The ceph-osd process shouldn't use 1024:4096 value for the max open
files.
Removing the ulimit option from the container engine and doing this kind
of change on the container side [1].
[1] https://github.com/ceph/ceph-container/pull/1497
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This change adds two tasks to set grafana-api user and password
that are required to inject dashboard layouts to the external
grafana instance.
Without these two parameters the ceph-ansible playbook fails
showing an authorization error (HTTPError: 401 Client Error:
Unauthorized").
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1767365
Signed-off-by: fmount <fpantano@redhat.com>
This brings the possibility to modify the rgw nfs user to use specific
keys when those are defined.
Signed-off-by: Radu Toader <radu.m.toader@gmail.com>
Add ceph_docker_registry_username and ceph_docker_registry_password
variables in ceph-defaults role so they will be present in the group_vars
samples but commented.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
since c09b82a80a392ccd0da7677c7b424ce5cd3fa5d6 in ceph/ceph we must call
mon_status from asok instead.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds the support of the ceph-iscsi stable repository when
use ceph_repository community instead of always using the devel
repositories.
We're still using the devel repositories for rtslib and tcmu-runner in
both cases (dev and community).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>