This commit adds a new job in order to test the
filestore-to-bluestore.yml infrastructure playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We don't need to set the use_new_ceph_iscsi fact on other nodes than
those present in the iscsigws group.
Also remove the duplicate iscsi_gw_group_name condition already present
on the include_task.
Finally validate the ansible distribution as the first task.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
There's no need to enforce PreferredAuthentications by default.
Users can still choose to override the ansible.cfg with any additional
parameter like this one to fit their infrastructure.
Fixes: #4826
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit isolates and adds an explicit comment about variables not
intended to be modified by the user.
Fixes: #4828
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When using fqdn in inventory, that playbook fails because of some tasks
using the result of ceph osd tree (which returns shortname) to get
some datas in hostvars[].
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779021
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Typical error:
```
type=AVC msg=audit(1575367499.582:3210): avc: denied { search } for pid=26680 comm="node_exporter" name="1" dev="proc" ino=11528 scontext=system_u:system_r:container_t:s0:c100,c1014 tcontext=system_u:system_r:init_t:s0 tclass=dir permissive=0
```
node_exporter needs to be run as privileged to avoid avc denied error
since it gathers lot of information on the host.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762168
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
It doesn't make sense to start validating configuration if the ansible
version isn't the good one.
This commit moves the check_system task as the first task in the
ceph-validate role.
The ansible version test tasks are moved at the top of this file.
Also moving the iscsi kernel tests from check_system to check_iscsi
file.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The ntp/chrony facts are only used in the ceph-infra role so we don't
really need to set them in the ceph-facts roles.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
A recent change in ceph/ceph prevent from having username in the
password:
`Error EINVAL: Password cannot contain username.`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
In containerized context, containers aren't stopped early in the
sequence.
It means they aren't restarted after the upgrade because the task is
just checking the daemon status is started (eg: `state: started`).
This commit also removes the task which ensure services are started
because it's already done in the role ceph-iscsigw.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
when upgrading from RHCS 3, dashboard has obviously never been deployed
and it forces us to deploy it later manually.
This commit adds the dashboard deployment as part of the upgrade to
RHCS 4.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779092
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The md devices (RAID software) aren't excluded from the devices list in
the auto discovery scenario.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764601
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The podman support was added to the purge-container-cluster playbook but
containers are always used for the dashboard even on non containerized
deployment.
This commits adds the podman support on purging the dashboard resources
in the purge-cluster playbook.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Having max_mds value equals to the number of mds nodes generates a
warning in the ceph cluster status:
cluster:
id: 6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a'
health: HEALTH_WARN'
insufficient standby MDS daemons available'
(...)
services:
mds: cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}'
Let's use 2 active and 1 standby mds.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
ceph/ceph-ansible#4805 introduced a symlink to
purge-container-cluster.yml playbook which is broken.
This commit fixes it.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When using the shortname, the URL for active alert launches with short
hostname and fails to connect to the server.
This commit changes the template in order to use the fqdn.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
All containers are removed when systemd stops them.
There is no need to call this module in purge container playbook.
This commit also removes all docker_image task and remove all container
images in the final cleanup play.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This is needed to avoid following error:
```
ERROR! The requested handler 'restart ceph mons' was not found in either the main handlers list nor in the listening handlers list
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We must import this role in the first play otherwise the first call to
`client_group_name`fails.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
When a container is already running on a non containerized node then the
umount ceph partition task is skipped.
This is due to the container ps command which always returns 0 even if
the filter matches nothing.
We should run the umount task when:
1/ the container command is failing (not installed) : rc != 0
2/ the container command reports running ceph-osd containers : rc == 0
Also we should not fail on the ceph directory listing.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
cf8c6a3 moves the 'wait for all osds' task from openstack_config to the
main tasks list.
But the openstack_config code was executed only on the last OSD node.
We don't need to do this check on all OSD node so we need to add set
run_once to true on that task.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using `osd_auto_discovery`, `devices` is built multiple times due
to multiple runs of `ceph-facts` role. It end up with duplicate
instances of a same device in the list.
Using `unique` filter when building the list fixes this issue.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit makes the ceph-dashboard role only printing ceph-dashboard
URL of the nodes present in grafana-server group
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762163
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The current ceph cluster health is in warning state:
health: HEALTH_WARN
13 pool(s) have no replicas configured
2 pool(s) have non-power-of-two pg_num
Because we're using only 1 replica then we need to disable the redundancy
check.
The pool pg num should be a power of two number (like 16).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When creating crush rules with device class parameter we need to be sure
that all OSDs are up and running because the device class list is
is populated with this information.
This is now enable for all scenario not openstack_config only.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The wait_for ansible module doesn't support the backets on IPv6 address
so need to remove them.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769710
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit reverts the following change:
fcf181342a (diff-23b6f443c01ea2efcb4f36eedfea9089R7-R14)
this is causing CI failures so this commit is intended to unlock the CI.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
this file is provided by the packaging (nfs-ganesha) so there's no need
to maintain it in ceph-ansible
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
At the moment, we bindmount the dbus socket from the host, this requires
to run the container with --privileged.
Since we now run a dedicated dbus daemon inside the same container, we
can stop running privileged nfs-ganesha containers
Related ceph-container PR : ceph/ceph-container#1517
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1725254
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
1. set noout and nodeep-scrub flags,
2. upgrade each OSD node, one by one, wait for active+clean pgs
3. after all osd nodes are upgraded, unset flags
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rachana Patel <racpatel@redhat.com>
Configuration of cephfs with an existing cluster using --limit used to fail
at different tasks while running with site-docker.yml
This commit addresses both of those tasks
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1773489
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
If we execute the site-container.yml playbook with specific tags (like
ceph_update_config) then we need to be sure to gather the facts otherwise
we will see error like:
The task includes an option with an undefined variable. The error was:
'ansible_hostname' is undefined
This commit also adds missing 'gather_facts: false' to mons plays.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1754432
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
If we want to create crush rules with the create-replicated sub command
and device class then we need to have the OSD created before the crush
rules otherwise the device classes won't exist.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
In addition to the grafana container tag change, we need to do the same
for the prometheus container stack based on the release present in the
OSE 4.1 container image.
$ docker run --rm openshift4/ose-prometheus-node-exporter:v4.1 --version
node_exporter, version 0.17.0
build user: root@67fee13ed48f
build date: 20191023-14:38:12
go version: go1.11.13
$ docker run --rm openshift4/ose-prometheus-alertmanager:4.1 --version
alertmanager, version 0.16.2
build user: root@70b79a3f29b6
build date: 20191023-14:57:30
go version: go1.11.13
$ docker run --rm openshift4/ose-prometheus:4.1 --version
prometheus, version 2.7.2
build user: root@12da054778a3
build date: 20191023-14:39:36
go version: go1.11.13
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This will prevent failure of site-docker.yml with configs in doc.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769760
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
when `import_key` is enabled, if the key already exists, it will only be
fetched using ceph cli, if the mode specified in the `ceph_key` task is
different from what is applied by the ceph cli, the mode isn't restored because
we don't call `module.set_fs_attributes_if_different()` before
`module.exit_json(**result)`
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1734513
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>