This commit refacts the way we check the "mds_to_kill" node is well
stopped.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 7df62fde34)
Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and MDS and then runs shrink-mds.yml to test it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 324b3b4a6c)
# Conflicts:
# tox.ini
Add a playbook, named "shrink-mds.yml", in infrastructure-playbooks/
that removes a MDS from a node in an already deployed Ceph cluster.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 235b1fccc6)
c90f605b5 introduces the default ceph cluster name value in the rgw
socket path for the rgw restart script. But this should use the
`cluster` variable instead.
This commit also fixes this in the osd restart script.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit de7f948b75)
The ability to add nodes with the monitor role to an existing cluster
whose name differs from the default name is fixed.
Signed-off-by: ilyashestopalov <usr.tester@yandex.ru>
(cherry picked from commit 904532c5e2)
According to the OSP pattern, we need the package-install tag
to control what is installed on the host. This commit just add
the missing tag to meet the TripleO requirements.
See: /issues/4197 for details
Fixes: #4197
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 95bd002b35)
On containerized deployment we need to bind mount the ceph-iscsi
directory to avoid writing the logs in the container.
The /var/log/ceph directory isn't use by rbd-targe-api/gw services
because they have their own log directories.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 91bef94b6c)
This commit moves some old variables into ceph-defaults so we can move
the `use_new_ceph_iscsi` fact in ceph-facts role in order.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a781ce881c)
If the user is still using the older packages and does not setup
the target iqn you will just get a vague error message later on.
This adds a check during the validate task, so it is clear to the
user.
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 08a6d10c32)
If the user has manually installed ceph-iscsi but is trying to setup a
iscsi object in iscsigws.yml you will just a python crash. This patch
adds a check and more user friendly error message for the case.
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 2e1a4328a8)
gateway_ip_list is depreciated and is only used when using the old
ceph-iscsi-config/cli packages that are no longer being developed
(GH repos are archived). Because ceph-iscsi-config/cli is no longer
being worked on, this modifies the tests to stress the ceph-iscsi
based installs.
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 1e64efc2f0)
Update iscsigws.yml.sample to document that we cannot use ansible to
setup iSCSI objects and use the new ceph-iscsi package.
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 75fee55d19)
This adds support for the ceph-iscsi package during install. ceph-iscsi
does not support setting up targets/gws, luns and clients with the
current library/igw_* code. Going forward those tasks should be done with
gwcli or dashboard. ceph-iscsi will only be used if the user has no iscsi
objects setup so we do not break existing setups.
The next patch will update the iscsigws.yml.sample to document that
users must not setup any iscsi object if they want to use the new
package and tools.
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit cbe66cec52)
The gateway_ip_list is not used in container setups, so drop it
for that case.
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit b7b2213be1)
The ceph-iscsi-config and ceph-iscsi-cli packages were combined into
ceph-iscsi and its APIs changed. This fixes up the iscsi purge task to
support the new API and old one.
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit b163206db7)
adding back a sleep 30s after nodes have rebooted before running
testinfra.
This was removed accidentally by d5be83e
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ca84a5359f)
Since Mimic the radosgw socket has two extra fields in the socket
name (before the .asok suffix): <pid>.<ctid>
Before:
/var/run/ceph/ceph-client.rgw.cephaio-1.asok
After:
/var/run/ceph/ceph-client.rgw.cephaio-1.16913.23928832.asok
The radosgw restart script doesn't handle this and could fail during
an upgrade.
If the SOCKETS variable isn't defined in the script then the test
command won't fail because the return code is 0
$ test -S
$ echo $?
0
There multiple issues in that script:
- The default SOCKETS value isn't defined due to a typo
SOCKET vs SOCKETS.
- Because the socket name uses the pid then we need to check the
socket name after the service restart.
- After restarting the radosgw service we need to wait few seconds
otherwise the socket won't be created.
- Update the wget parameters because the command is doing a loop.
We now use the same option than curl.
- The check_rest function doesn't test the radosgw at all due to
a wrong test command (test against a string) and always returns 0.
This needs to use the DOCKER_EXECS variable in order to execute the
command.
$ test 'wget http://192.168.100.11:8080'
$ echo $?
0
Also remove the test based on the ansible_fqdn because we only use
the ansible_hostname + rgw instance name.
Finally group all for loop into a single one.
Resolves: #3926
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c90f605b51)
This is necessary when configuring RGW with SSL because
in addition to passing specific frontend options, civetweb
appends the 's' character to the binding port and beast uses
ssl_endpoint instead of endpoint.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1722071
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit d526803c6c)
The first py.test call didn't have the reruns option. The commit
fixes it.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3f92323f28)
Even if dashboard feature is disabled then the installer status will
still report dashboard, grafana and node-exporter roles timing.
INSTALLER STATUS **********************************
Install Ceph Monitor : Complete (0:01:21)
Install Ceph Manager : Complete (0:00:49)
Install Ceph Dashboard : Complete (0:00:00)
Install Ceph Grafana : Complete (0:00:02)
When need to set the dashboard_enabled condition on those installer
phase pre/post tasks.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0604275c11)
This environment variable was added in cb381b4 but was removed in
4d35e9e.
This commit reintroduces the change.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 02fbe76e62)
This commit moves the package installation into ceph-dashboard role.
This is needed to install ceph dasboard json file in
`/etc/grafana/dashboards/ceph-dashboard/`.
Closes: #4026
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6e2e30db54)
- There is no need to open ports 3000, 8234, 9283 on all nodes.
- Add missing rule for alertmanager (port 9093)
Closes: #4023
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 14f5fc3c86)
when `dashboard_enabled` is `True`, let's append `dashboard` and
`prometheus` modules to `ceph_mgr_modules` so they are automatically
loaded.
Closes: #4026
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a2b6f44665)
As the bz1721914 describes, the grafana_server_addr
fact is not defined if ip_version used is ipv6.
This commit adds the ip_version condition to set
correctly this fact.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721914
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit e655038743)
If no grafana-server group is defined while an mgr group is, that task
will fail because `hostvars[groups[grafana_server_group_name][0]` can't
return anything since `groups['grafana-server']` will be a non existing
key.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 366b309c12)
Alphabetized ceph_repository_uca keys due to errors validating when
using UCA/queens repository on Ubuntu 16.04
An exception occurred during task execution. To see the full
traceback, use -vvv. The error was:
SchemaError: -> ceph_stable_repo_uca schema item is not
alphabetically ordered
Closes: #4154
Signed-off-by: Gabriel Ramirez <gabrielramirez1109@gmail.com>
(cherry picked from commit 82262c6e8c)
- clean some leftover.
- move nfs_ganesha_[stable|dev] in group_vars so dev_setup.yml can modify them.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 45041f52fd)
To address this warning:
```
[DEPRECATION WARNING]: evaluating nfs_ganesha_dev as a bare variable, this
behaviour will go away and you might need to add |bool to the expression in the
future
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2b9fb377a8)
This task is already present in pre_requisite_non_container.yml
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit edb8d42596)
Add back the nfs-ganesha deployment testing which was removed because of
broken dependencies.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 013ae62177)
this commit bring back the nfs-ganesha testing in containerized
deployment.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9201674b5b)
This tries to first unmount any cephfs/nfs-ganesha mount point on client
nodes, then unmap any mapped rbd devices and finally it tries to remove
ceph kernel modules.
If it fails it means some resources are still busy and should be cleaned
manually before continuing to purge the cluster.
This is done early in the playbook so the cluster stays untouched until
everything is ready for that operation, otherwise if you try to redeploy
a cluster it could end up by getting confused by leftover from previous
deployment.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1337915
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 20e4852888)
There's two big issues with the current OSD restart script.
1/ We try to test if the ceph osd daemon socket exists but we use a
wildcard for the socket name : /var/run/ceph/*.asok.
This fails because we usually have multiple ceph osd sockets (or
other ceph daemon collocated) present in /var/run/ceph directory.
Currently the test fails with:
bash: line xxx: [: too many arguments
But it doesn't stop the script execution.
Instead we can specify the full ceph osd socket name because we
already know the OSD id.
2/ The container filter pattern is wrong and could matches multiple
containers resulting the script to fail.
We use the filter with two different patterns. One is with the device
name (sda, sdb, ..) and the other one is with the OSD id (ceph-osd-0,
ceph-osd-15, ..).
In both case we could match more than needed.
$ docker container ls
CONTAINER ID IMAGE NAMES
958121a7cc7d ceph-daemon:latest ceph-osd-strg0-sda
589a982d43b5 ceph-daemon:latest ceph-osd-strg0-sdb
46c7240d71f3 ceph-daemon:latest ceph-osd-strg0-sdaa
877985ec3aca ceph-daemon:latest ceph-osd-strg0-sdab
$ docker container ls -q -f "name=sda"
958121a7cc7d
46c7240d71f3
877985ec3aca
$ docker container ls
CONTAINER ID IMAGE NAMES
2db399b3ee85 ceph-daemon:latest ceph-osd-5
099dc13f08f1 ceph-daemon:latest ceph-osd-13
5d0c2fe8f121 ceph-daemon:latest ceph-osd-17
d6c7b89db1d1 ceph-daemon:latest ceph-osd-1
$ docker container ls -q -f "name=ceph-osd-1"
099dc13f08f1
5d0c2fe8f121
d6c7b89db1d1
Adding an extra '$' character at the end of the pattern solves the
problem.
Finally removing the get_container_osd_id function because it's not
used in the script at all.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 45d46541cb)
The ansible_lsb fact is based on the lsb package (lsb-base,
lsb-release or redhat-lsb-core).
If the package isn't installed on the remote host then the fact isn't
populated.
--------
"ansible_lsb": {},
--------
Switching to the ansible_distribution_release fact instead.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dc187ea6fa)
3a100cfa52 introduced a check which is a
bit too restrictive, let's accept HEALTH_OK and HEALTH_WARN.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6dce51183b)
As per bz1718981, this commit adds higher values to check
the quorum status. This is helpful for several OSP deployments
that fail during the scale up.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1718981
Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit ba73dc7b21)
The ceph-volume lvm list command takes ages to complete when having
a lot of LV devices on containerized deployment.
For instance, with 25 OSDs on a node it takes 3 mins 44s to list the
OSD.
Adding the max open files limit to the container engine cli when
executing the ceph-volume command seems to improve a lot thee
execution time ~30s.
This was impacting the OSDs creation with ceph-volume (both filestore
and bluestore) when using multiple LV devices.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b987534881)
We already set the become flag to true at a play level in the site*
playbooks so we don't need to set it at a task level.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7c3640177b)
The ceph restapi configuration was only available until Luminous
release so we don't need those leftovers for nautilus+.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit da8b7ab7fb)
`parted_results` isn't used anymore in the playbook.
By the way, `parted` seems to cause issue because it changes the
ownership on devices:
```
root@osd0 ~]# ls -l /dev/sdc*
brw-rw----. 1 root disk 8, 32 Jun 11 08:53 /dev/sdc
brw-rw----. 1 ceph ceph 8, 33 Jun 11 08:53 /dev/sdc1
brw-rw----. 1 ceph ceph 8, 34 Jun 11 08:53 /dev/sdc2
[root@osd0 ~]# parted -s /dev/sdc print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdc: 53.7GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 1075MB 1074MB ceph block.db
2 1075MB 2149MB 1074MB ceph block.db
[root@osd0 ~]# #We can see ownerships have changed from ceph:ceph to root:disk:
[root@osd0 ~]# ls -l /dev/sdc*
brw-rw----. 1 root disk 8, 32 Jun 11 08:57 /dev/sdc
brw-rw----. 1 root disk 8, 33 Jun 11 08:57 /dev/sdc1
brw-rw----. 1 root disk 8, 34 Jun 11 08:57 /dev/sdc2
[root@osd0 ~]#
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eece362b38)
starting an upgrade if the cluster isn't HEALTH_OK isn't a good idea.
Let's check for the cluster status before trying to upgrade.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3a100cfa52)
Otherwise it fails like following:
```
fatal: [mon0]: FAILED! => changed=false
msg: |-
Unable to enable service ceph-mgr@mon0: Failed to execute operation: Cannot send after transport endpoint shutdown
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51b2813e04)
This commits adds the support of the installer phase for dashboard,
grafana and node-exporter roles.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c7a5967a6f)
The definitions of cephfs pools should match openstack pools.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Co-Authored-by: Simone Caronni <simone.caronni@teralytics.net>
(cherry picked from commit 67071c3169)
The ceph-agent role was used only for RHCS 2 (jewel) so it's not
usefull anymore.
The current code will fail on CentOS distribution because the rhscon
package is only avaible on Red Hat with the RHCS 2 repository and
this ceph release is supported on stable-3.0 branch.
Resolves: #4020
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7503098ca0)