We were not testing the right ansible_distribution fact value for RHEL
distribution.
This commit also updates the minial RHEL version supported by RHCS.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5de74fe512)
This commit fixes a bug when trying to scale out osd nodes with
`crush_rule_config` is enabled.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1822599
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4bcc52cb2a)
The dashboard SSO mgr module requires the saml python library to be
installed. This is only a valid scenario for RHCS deployment because
the saml python library isn't available in other classic repositories.
This package is present in RHCS Tools repository so we also need to
enable it on the mgr nodes.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1820233
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6617d90733)
When `rule_name` isn't set in `crush_rules` the osd pool creation will
fail.
This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with
the default crush rule name.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1bb9860dfd)
Creating crush rules even with no crush hierarchy configuration is a
valid scenario so we shouldn't be bound to the first task result (which
configure crush hierarchy) to be able to add new crush rules.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b0476385c)
The latest Ceph stable release is now Octopus so the "latest" container
image tag is pointing to Octopus and not Nautilus anymore.
This commit updates the ceph_docker_image_tag with "latest-nautilus".
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Support for debian with RHCS has been dropped starting RHCS 4
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4ac99223b2)
This is no longer true, let's remove this comment given that this option
is not ignored in containerized deployments.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e551b5ba1a)
The rgw_instances_all fact is supposed to be the list of all radosgw
instances from all rgw nodes.
But the fact is always using the local rgw_instances variable so this
won't work on multiple nodes.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0487d21938)
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fb69f6990c)
Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.
[1] registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b97a4d5201)
This option has been deprecated (As of 0.51).
By the way, ceph-ansible already sets the
auth_{service,client,cluster}_required variables.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 152c2caa9f)
This commit adds the rgw multi-instances support in ceph-handler
(restart_rgw_daemons.sh.j2)
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3626c688cf)
When using the radosgw multi instances configuration then the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e8bf0a0cf2)
This commit make that task retrying 5 times to start the service
firewalld to avoid failure like following:
```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020 08:58:48 +0000 (0:00:00.963) 0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
msg: |-
Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
Failed to execute operation: Connection reset by peer
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b3d943fe9f)
Sometimes, these task can timeout for some reason.
Adding these retries can help to avoid unexcepted failures.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7a8a719e75)
There's no need to run this part of the role when upgrading clients
node. Let's skip it when rolling_update.yml is being run.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eac207091b)
This commit adds condition in order to not try to customize pools size
when its type is erasure.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e17c79b871)
This commit adds the pg autoscaler support.
The structure for pool definition has now two additional attributes
`pg_autoscale_mode` and `target_size_ratio`, eg:
```
test:
name: "test"
pg_num: "{{ osd_pool_default_pg_num }}"
pgp_num: "{{ osd_pool_default_pg_num }}"
rule_name: "replicated_rule"
application: "rbd"
type: 1
erasure_profile: ""
expected_num_objects: ""
size: "{{ osd_pool_default_size }}"
min_size: "{{ osd_pool_default_min_size }}"
pg_autoscale_mode: False
target_size_ratio": 0.1
```
when `pg_autoscale_mode` is `True` user has to set a decent value in
`target_size_ratio`.
Given that it's a new feature, it's still disabled by default.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47adc2bb08)
Currently, the command executed is wrong, eg:
```
cmd:
- podman
- exec
- ceph-mon-controller-0
- ceph
- --cluster
- ceph
- osd
- pool
- create
- volumes
- '32'
- '32'
- replicated_rule
- '1'
delta: '0:00:01.625525'
end: '2020-02-27 16:41:05.232705'
item:
```
From documentation, the osd pool creation command is :
```
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
[crush-rule-name] [expected-num-objects]
ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \
[erasure-code-profile] [crush-rule-name] [expected_num_objects]
```
it means we pass '1' (from item.type) as value for
`expected_num_objects` by default which is very likely not what we want.
Also, this commit modifies the default value when no `rule_name` is set
to use the existing variable `osd_pool_default_crush_rule`
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bf1f125d71)
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.
The rgw hosts now need to be grouped into zones
and realms in the inventory.
.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.
Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands
remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 71f55bd54d)
It looks like that the service module doesn't support wildcard anymore
for stopping/disabling multiple services.
fatal: [rgw0]: FAILED! => changed=false
msg: 'This module does not currently support using glob patterns,
found ''*'' in service name: ceph-radosgw@*'
...ignoring
Instead we should iterate over the rgw_instances list.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9d3b49293d)
When using the firewalld ansible module we need to be sure that the
python bindings are installed.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 90b1fc8fe9)
Since ansible 2.9 the firewalld task could not be used with service and
source in the same time anymore.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 45fb9241c0)
It doesn't make sense to start validating configuration if the ansible
version isn't the good one.
This commit moves the check_system task as the first task in the
ceph-validate role.
The ansible version test tasks are moved at the top of this file.
Also moving the iscsi kernel tests from check_system to check_iscsi
file.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1a77dd7e91)
When running environment with OSDs having ID with more than 2 digits,
some tasks don't match the system units and therefore, playbook can fail.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a084a2a347)
This change introduces a new set of tasks to configure the
ceph dashboard backend and listen just on the mgr related
subnet (and not on '*'). For the same reason the proper
server address is added in both prometheus and alertmanger
systemd units.
This patch also adds the "dashboard_frontend_vip" parameter
to make sure we're able to support the HA model when multiple
grafana instances are deployed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792230
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit 15ed9eebf1)
5s as a connection timeout could be low in some setup. Let's increase
it to 10s.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 44e750ee5d)
Because we are relying on docker|podman for managing containers then we
don't need systemd to manage the process (like kill).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5a03e0ee1c)
The ad7a5da commit introduced a regression when using TLS on haproxy
via the haproxy_frontend_ssl_certificate variable.
This cause the "stats socket" and the "tune.ssl.default-dh-param"
parameters to be on the same line resulting haproxy failing to start.
[ALERT] 351/140240 (21388) : parsing [xxxxx] : 'stats socket' : unknown
keyword 'tune.ssl.default-dh-param'. Registered
[ALERT] 351/140240 (21388) : Fatal errors found in configuration.
Fixes: #4869
Signed-off-by: Florian Faltermeier <florian.faltermeier@uibk.ac.at>
(cherry picked from commit 9d081e2453)
Both bootstrap_dirs_owner and bootstrap_dirs_group variables aren't
used anymore in the code.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c644ea9041)
The rgw user creation for the Ceph dashboard integration shouldn't be
created on secondary rgw zones.
Closes: #4707
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794351
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 16e12bf2bb)
Since ed36a11 we move the crush rules creation code from the ceph-mon to
the ceph-osd role.
To keep the backward compatibility we kept the possibility to set the
crush variables on the mons side but we didn't move the default values.
As a result, when using crush_rule_config set to true and wanted to use
the default values for crush_rules then the crush rule ansible task
creation will fail.
"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute
'crush_rules'"
This patch move the default crush variables from ceph-mon to ceph-osd
role but also use those default values when nothing is defined on the
mons side.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1fc6b33714)
The grafana_{crt,key} aren't boolean variables but strings. The default
value is an empty string so we should do the conditional on the string
length instead of the bool filter
Closes: #5053
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 15bd4cd189)
Client configuration with --limit fails without this patch
because certain tasks are only done to the first host in the
_filtered_clients list and it's likely that first host will
not be included in what's sepcified with --limit. To fix this
the _filtered_clients list should be built from all clients
in the inventory that are also in the running play.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798781
Signed-off-by: John Fulton <fulton@redhat.com>
(cherry picked from commit e4bf4857f5)
Since nfs-ganesha 2.8.3 the rados-urls library has been move to a
dedicated package.
We don't have the same nfs-ganesha 2.8.x between the community and rhcs
repositories.
community: 2.8.1
rhcs: 2.8.3
As a workaround we will install that package only for rhcs setup.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0a3e85e8ca)
The ceph_nfs_ceph_user variable is a string for the ceph-nfs role but a
list in ceph-client role.
6a6785b introduced a confusion between both variable type in the ceph-nfs
role for external ceph with ganesha.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1801319
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 10951eeea8)
When using multiple grafana hosts then we push set the grafana and
prometheus URL and push the dashboard layout to a single node.
grafana_server_addrs is the list of all grafana nodes and used during
the ceph-dashboard role (on mgr/mon nodes).
grafana_server_addr is the current grafana node used during the
ceph-grafana and ceph-prometheus role (on grafana-server nodes).
We don't have the grafana_server_addr fact duplication code between
external vs collocated nodes.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1784011
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c6e96699f7)
This commit increases the default values for the following variable
consumed in switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook.
This also moves these variables in `ceph-defaults` role so the user can
set different values if needed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783223
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3700aa5385)
If for some reason, there's an old rgw socket file present in the
/var/run/ceph/ directory then the test command could fail with
test: xxxxxxxxx.asok: binary operator expected
$ ls -hl /var/run/ceph/
total 0
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94153614631472.asok
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94240997655088.asok
We can check the radosgw socket in /proc/net/unix to avoid using wildcard
in the socket name.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 60cbfdc2a6)
Since de8f2a9 the lvm2 package installation has been moved from ceph-osd
role to ceph-container-engine role.
But the scope wasn't limited to the OSD nodes only.
This commit fixes this behaviour.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fa8aa8c864)
RHCS 4 is available for both RHEL 7 and 8 so we should also enable the
cdn repositories for that distribution.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796853
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9b40a959b9)
During a rolling update we will run the ceph iscsigw tasks that start
the daemons then run the configure_iscsi.yml tasks which can create
iscsi objects like targets, disks, clients, etc. The problem is that
once the daemons are started they will accept confifguration requests,
or may want to update the system themself. Those operations can then
conflict with the configure_iscsi.yml tasks that setup objects and we
can end up in crashes due to the kernel being in a unsupported state.
This could also happen during creation, but is less likely due to no
objects being setup yet, so there are no watchers or users accessing the
gws yet. The fix in this patch works for both update and initial setup.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1795806
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 77f3b5d51b)
When no monitor group is present in the inventory, this task fails.
This affects only non-containerized deployments.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e7bc079405)
The [rgw] section in the ceph.conf file or via the ceph_conf_overrides
variable doesn't exist and has no effect.
To apply overrides to all radosgw instances we should use either the
[global] or [client] sections.
Overrides per radosgw instance should still use the
[client.rgw.{instance-name}] section.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794552
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2f07b85131)
Otherwise, if the variables contains a '$' it will be interpreted as a BASH
variable.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8c3759f8ce)
This commit adds a task to make sure user set a custom password for
`grafana_admin_password` and `dashboard_admin_password` variables.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1795509
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 99328545de)
When using different name between the inventory_hostname and the
ansible_hostname then the _container_exec_cmd fact will get a wrong
value based on the inventory_hostname instead of the ansible_hostname.
This happens when the ceph cluster is already running (update/upgrade).
Later the container exec commands will fail because the container name
is wrong.
We should always set the _container_exec_cmd based on the
ansible_hostname fact.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1795792
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1fcafffdad)
We must call `container_exec_cmd` from the right monitor node otherwise
the value of the fact might mistmatch between the delegated node and the
node being played.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794900
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f919f8971)
Given that we delegate to the first monitor, we must read the value of
`container_exec_cmd` from this node.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eb9112d8fb)
fact 'running_mon' is set once 'grep' successfully exits with 'rc == 0'
Signed-off-by: Vytenis Sabaliauskas <vytenis.sabaliauskas@protonmail.com>
(cherry picked from commit ed1eaa1f38)
Because we need to manage legacy ceph-disk based OSD with ceph-volume
then we need a way to know the osd_objectstore in the container.
This was done like this previously with ceph-disk so we should also
do it with ceph-volume.
Note that this won't have any impact for ceph-volume lvm based OSD.
Rename docker_env_args fact to container_env_args and move the container
condition on the include_tasks call.
Remove OSD_DMCRYPT env variable from the ceph-osd template because it's
now included in the container_env_args variable.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792122
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c9e1fe3d92)
In 3c31b19ab3, I fixed the `customize pool
size` task by replacing `item.size` with `item.value.size`. However, I
missed the same issue in the `when` condition.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 3842aa1a30)
When unsetting the noup flag, we must call container_exec_cmd from the
delegated node (first mon member)
Also, adding a `run_once: true` because this task needs to be run only 1
time.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 22865cde9c)
Iterating over all monitors in order to delegate a `
{{ container_binary }}` fails when collocating mgrs with mons, because
ceph-facts reset `container_exec_cmd` to point to the first member of
the monitor group.
The idea is to force `container_exec_cmd` to be reset in ceph-mgr.
This commit also removes the `container_exec_cmd_mgr` fact.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1791282
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8dcbcecd71)
The trusted_ip_list parameter for the rbd-target-api service doesn't
support ipv6 address with bracket.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787531
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd87d69183)
Before this patch, the lvm2 package installation was done during the
ceph-osd role.
However we were running ceph-volume command in the ceph-config role
before ceph-osd. If lvm2 wasn't installed then the ceph-volume command
fails:
error checking path "/run/lock/lvm": stat /run/lock/lvm: no such file or
directory
This wasn't visible before because lvm2 was automatically installed as
docker dependency but it's not the same for podman on CentOS 8.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit de8f2a9f83)
since fd1718f379, we must use `_devices`
when deploying with lvm batch scenario.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5558664f37)
There is no need to run this part of the playbook when upgrading the
cluter.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit af6875706a)
This commit lets add-osd.yml in place but mark the deprecation of the
playbook.
Scaling up OSDs is now possible using --limit
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3496a0efa2)
We don't need to executed the grafana fact everytime but only during
the dashboard deployment.
Especially for ceph-grafana, ceph-prometheus and ceph-dashboard roles.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790303
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f940e695ab)
To avoid confusion, let's change the default value from `0.0.0.0` to
`x.x.x.x`.
Users might think setting `0.0.0.0` will make the daemon binding on all
interfaces.
Fixes: #4827
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fc02fc98eb)
This commit refact the condition in the loop of that task so all
potential osd ids found are well started.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790212
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 58e6bfed2d)
When ceph_rbd_mirror_configure is set to true we need to ensure that
the required variables aren't empty.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1760553
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a065cebd7)
cf8c6a3 moves the 'wait for all osds' task from openstack_config to the
main tasks list.
But the openstack_config code was executed only on the last OSD node.
We don't need to do this check on all OSD node so we need to add set
run_once to true on that task.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5bd1cf40eb)
When creating crush rules with device class parameter we need to be sure
that all OSDs are up and running because the device class list is
is populated with this information.
This is now enable for all scenario not openstack_config only.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf8c6a3849)
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ef2cb99f73)
If we want to create crush rules with the create-replicated sub command
and device class then we need to have the OSD created before the crush
rules otherwise the device classes won't exist.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ed36a11eab)
We must exclude the devices already used and prepared by ceph-disk when
doing the lvm batch report. Otherwise it fails because ceph-volume
complains about GPT header.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786682
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fd1718f379)
We must pick up a mon which actually exists in ceph-facts in order to
detect if a cluster is running. Otherwise, it will state no cluster is
already running which will end up deploying a new monitor isolated in a
new quorum.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86f3eeb717)
Only the ipv4 addresses from the nodes running the dashboard mgr module
were added to the trusted_ip_list configuration file on the iscsigws
nodes.
This also add the iscsi gateways with ipv6 configuration to the ceph
dashboard.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787531
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 70eba66182)
RadosGW pools can be created by setting
```yaml
rgw_create_pools:
.rgw.root:
pg_num: 512
size: 2
```
for instance. However, doing so would create pools of size
`osd_pool_default_size` regardless of the `size` value. This was due to
the fact that the Ansible task used
```
{{ item.size | default(osd_pool_default_size) }}
```
as the pool size value, but `item.size` is always undefined; the
correct variable is `item.value.size`.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 3c31b19ab3)
411bd07d54 introduced a bug in handlers
using `handler_*_status` instead of `hostvars[item]['handler_*_status']`
causes handlers to be triggered in anycase even though
`handler_*_status` was set to `False` on a specific node.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 30200802d9)
Since RHEL 8.1 we need to add the ganesha_t type to the permissive
SELinux list.
Otherwise the nfs-ganesha service won't start.
This was done on RHEL 7 previously and part of the nfs-ganesha-selinux
package on RHEL 8.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786110
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d758125290)
The grafana-server group name was hardcoded for the grafana/prometheus
firewalld tasks condition.
We should we the associated variable : grafana_server_group_name
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2c06678cde)
Instead of using multiple dashboard_enabled condition in the
configure_firewall file we could just have the condition once
and include the dedicated tasks list.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f4c261ef90)
When there's no mgr group defined in the ansible inventory then the
mgrs are deployed implicitly on the mons nodes.
If the dashboard is enabled then we need to open the dashboard port on
the node that is running the ceph mgr process (mgr or mon).
The current code only allow to open that port on the mgr nodes when they
are present explicitly in the inventory but not implicitly.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783520
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4535985188)
that task is delegated on the first mon so we should always use the
`discovered_interpreter_python` from that node.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5adb735c78)
A recent change in ceph/ceph prevent from having username in the
password:
`Error EINVAL: Password cannot contain username.`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0756fa467d)
This commit isolates and adds an explicit comment about variables not
intended to be modified by the user.
Fixes: #4828
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a234338eff)
Typical error:
```
type=AVC msg=audit(1575367499.582:3210): avc: denied { search } for pid=26680 comm="node_exporter" name="1" dev="proc" ino=11528 scontext=system_u:system_r:container_t:s0:c100,c1014 tcontext=system_u:system_r:init_t:s0 tclass=dir permissive=0
```
node_exporter needs to be run as privileged to avoid avc denied error
since it gathers lot of information on the host.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762168
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d245eb7e7d)
The md devices (RAID software) aren't excluded from the devices list in
the auto discovery scenario.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764601
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 014f51c2a4)
When using `osd_auto_discovery`, `devices` is built multiple times due
to multiple runs of `ceph-facts` role. It end up with duplicate
instances of a same device in the list.
Using `unique` filter when building the list fixes this issue.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 23b1f43897)
The wait_for ansible module doesn't support the backets on IPv6 address
so need to remove them.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769710
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 55adc10be3)
In addition to the grafana container tag change, we need to do the same
for the prometheus container stack based on the release present in the
OSE 4.1 container image.
$ docker run --rm openshift4/ose-prometheus-node-exporter:v4.1 --version
node_exporter, version 0.17.0
build user: root@67fee13ed48f
build date: 20191023-14:38:12
go version: go1.11.13
$ docker run --rm openshift4/ose-prometheus-alertmanager:4.1 --version
alertmanager, version 0.16.2
build user: root@70b79a3f29b6
build date: 20191023-14:57:30
go version: go1.11.13
$ docker run --rm openshift4/ose-prometheus:4.1 --version
prometheus, version 2.7.2
build user: root@12da054778a3
build date: 20191023-14:39:36
go version: go1.11.13
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3e29b8d5ff)
in order to be able to call container_binary without having to run the
whole ceph-facts role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe5ffe589e)
All containers are removed when systemd stops them.
There is no need to call this module in purge container playbook.
This commit also removes all docker_image task and remove all container
images in the final cleanup play.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d23383a820)
When using the shortname, the URL for active alert launches with short
hostname and fails to connect to the server.
This commit changes the template in order to use the fqdn.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a8d76d72d7)
This commit makes the ceph-dashboard role only printing ceph-dashboard
URL of the nodes present in grafana-server group
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762163
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cc0c1ce301)
This is needed to avoid following error:
```
ERROR! The requested handler 'restart ceph mons' was not found in either the main handlers list nor in the listening handlers list
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a43a872105)
Configuration of cephfs with an existing cluster using --limit used to fail
at different tasks while running with site-docker.yml
This commit addresses both of those tasks
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1773489
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
(cherry picked from commit 72c43cc5d9)
This will prevent failure of site-docker.yml with configs in doc.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769760
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9a1f1626c3)
[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression isn't
evaluated.
This string output "(osd_objectstore == 'bluestore')" is always true
because item.enable condition only matches non empty string. So the
sysctl value was applyied for both filestore and bluestore backend.
[2] added the bool filter to the condition but the filter always returns
false on string and the sysctl wasn't applyed at all.
This commit fixes the enable key value by evaluating the value instead
of using the string.
[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ece46d33be)
The latest grafana container tag is using grafana 6.x release which could
cause issue with the ceph dashboard integration.
Considering that the grafana container in RHCS 3 is based on 5.x then we
should use the same version.
$ docker run --rm rhceph/rhceph-3-dashboard-rhel7:3 -v
Version 5.2.4 (commit: unknown-dev)
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2037fb87b6)
Even if this improves ceph-disk/ceph-volume performances then it also
impact the ceph-osd process.
The ceph-osd process shouldn't use 1024:4096 value for the max open
files.
Removing the ulimit option from the container engine and doing this kind
of change on the container side [1].
[1] https://github.com/ceph/ceph-container/pull/1497
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9a996aef7f)
This change adds two tasks to set grafana-api user and password
that are required to inject dashboard layouts to the external
grafana instance.
Without these two parameters the ceph-ansible playbook fails
showing an authorization error (HTTPError: 401 Client Error:
Unauthorized").
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1767365
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 41b8c17356)
Add ceph_docker_registry_username and ceph_docker_registry_password
variables in ceph-defaults role so they will be present in the group_vars
samples but commented.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b33c476f16)
This commit adds the support of the ceph-iscsi stable repository when
use ceph_repository community instead of always using the devel
repositories.
We're still using the devel repositories for rtslib and tcmu-runner in
both cases (dev and community).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f2cb937193)
When deploying with packages then the ceph-container-common role isn't
executed so the registry authentication task is ignored.
Closes: #4636
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9ad000618f)
If ansible-lint reports an error then it's skipped. We should fail in
this case.
This patch also fixes the pipefail lint in the rbd mirror role
[306] Shells that use pipes should set the pipefail option
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3969470fca)
[303] mktemp used in place of tempfile module
[602] Don't compare to empty string
[701] No 'galaxy_info' found
[702] Use 'galaxy_tags' rather than 'categories'
This patch also changes the ansible log_path value via the
ANSIBLE_LOG_PATH environment variable in the travis configuration to
avoid warnings.
[WARNING]: log file at /home/travis/ansible/ansible.log is not writeable
and we cannot create it, aborting
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f7fd0b6d4f)
This task is failing when `ceph_docker_registry_auth` is enabled and
`ceph_docker_registry_username` is undefined with an ansible error
instead of the expected message.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit da4215e9c0)
Otherwise it fails like following:
```
TASK [ceph-mds : allow multimds] **************************************************************************************************************************************************
Monday 22 July 2019 16:37:38 +0800 (0:00:03.269) 0:13:25.651 ***********
fatal: [rhel7u6clone1]: FAILED! => {"msg": "The conditional check 'ceph_release_num[ceph_release] == ceph_release_num.luminous' failed. The error was: error while evaluating conditional (ceph_release_num[ceph_release] == ceph_release_num.luminous): 'dict object' has no attribute u'dummy'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 43, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: allow multimds\n ^ here\n"}
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4e9504c939)
this task is a leftover and no longer needed.
It even causes bug when collocating nfs with mon.
Closes: #4609
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b63bd13073)
When using python3 the name of the rtslib rpm is python3-rtslib. The
packages that use rtslib already have code that detects the python
version and distro deps, so drop it from the ceph iscsi gw task list and
let the ceph-iscsi rpm dependency handle it.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1760930
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit ba141298d7)
Due the 'failed_when: false' statement present in the peer task then
the playbook continues to ran even if the peer task was failing (like
incorrect remote peer format.
"stderr": "rbd: invalid spec 'admin@cluster1'"
This patch adds a task to list the peer present and add the peer only if
it's not already added. With this we don't need the failed_when statement
anymore.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0b1e9c0737)
When the iscsi gateway or the ceph configuration file change then we
need to notify the rbd target api/gw services to be restarted.
This patch also merges the rbd-target-api and rbd-target-gw handler
into a single file and listen.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bc701860d5)
There is no need to loop over all mgr nodes to set this fact, it's even
breaking deployments because it tries to copy all mgr keyring on all
mgr.
Closes: #4602
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cb80231725)
We are using multiple listen topics with the handlers. That means that
we are notifying 4 tasks for each handler.
Instead we can group the listen on an include_tasks and based on the
group condition.
Before:
NOTIFIED HANDLER ceph-handler : set _mon_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mon restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mon daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mon_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy osd restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph osds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mds restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rgw restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rgw daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mgr restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mgr daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rbd mirror restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rbd mirror daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called after restart for mon0
After:
NOTIFIED HANDLER ceph-handler : mons handler for mon0
NOTIFIED HANDLER ceph-handler : osds handler for mon0
NOTIFIED HANDLER ceph-handler : mdss handler for mon0
NOTIFIED HANDLER ceph-handler : rgws handler for mon0
NOTIFIED HANDLER ceph-handler : mgrs handler for mon0
NOTIFIED HANDLER ceph-handler : rbdmirrors handler for mon0
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fe9c5b8c68)
This commit merges the two restart tasks into a single one, this way
it's one task less to notify.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 411bd07d54)
The current ceph-validate role is using both validate action and fail
module tasks to validate the ceph configuration.
The validate action is based on the notario python library. When one of
the notario validation fails then a python stack trace is reported to the
ansible task. This output isn't understandable by users.
This patch removes the validate action and the notario depencendy. The
validation is now done with only fail ansible module.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654790
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0f978d969b)
Delegating on remote node isn't necessary here since we are already
iterating over the right nodes.
Closes: #4518
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 161170524d)
This commit adds a validation task to prevent from installing an OSD on
the same disk as the OS.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623580
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 80e2d00b16)
This commit removes some legacy tasks.
These tasks aren't needed, they cause the playbook to fail when
collocating daemons.
Closes: #4553
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 273413186a)
If the mgr dashboard doesn't restart fast enough then the inject
dashboard task will fail with a HTTP error 400.
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 914, in _handle_command
return self.handle_command(inbuf, cmd)
File "/usr/share/ceph/mgr/dashboard/module.py", line 450, in handle_command
push_local_dashboards()
File "/usr/share/ceph/mgr/dashboard/grafana.py", line 132, in push_local_dashboards
retry()
File "/usr/share/ceph/mgr/dashboard/grafana.py", line 89, in call
result = self.func(*self.args, **self.kwargs)
File "/usr/share/ceph/mgr/dashboard/grafana.py", line 127, in push
grafana.push_dashboard(body)
File "/usr/share/ceph/mgr/dashboard/grafana.py", line 54, in push_dashboard
response.raise_for_status()
File "/usr/lib/python2.7/site-packages/requests/models.py", line 834, in raise_for_status
raise HTTPError(http_error_msg, response=self)
HTTPError: 400 Client Error: Bad Request
Instead we can trigger this task before the module restart.
Closes: #4565
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3f6ff240b7)
This commit moves this task in order to stop the nfs server service
regardless the deployment type desired (containerized or non
containerized).
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6c6a512a72)
The syntax here wasn't working, this refact fixes this task.
Also, removing the `ignore_errors: true` which was hidding the failure.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47034effe0)
We don't need to have dedicated variables for the RGW integration into
the Ceph Dashboard and need to be manually filled.
Instead we can use the current values from the RGW nodes by using the
IP and port from the first RGW instance of the first RGW node via the
radosgw_address and radosgw_frontend_port variables.
We don't need to specify all RGW nodes, this will be done automatically
with one node.
The RGW api scheme is using the radosgw_frontend_ssl_certificate variable
to determine if the value is http or https. This variable is also reuse
as a condition for the ssl verify task.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b9e93ad7a6)
This commit refacts the way we set `ceph_uid` fact in `ceph-facts` and
removes all `set_fact` tasks for `ceph_uid` in switch-to-containers playbook
to avoid duplicated code.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fa9b42e98e)
This patch moves the https dashboard configuration into a dedicated
block to avoid the multiple occurence of the dashboard_protocol
condition.
It also fixes the dashboard certificate and key variables handling in
the condition introduced by ab54fe2. Those variables aren't boolean but
strings so we can test them via the length filter.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 249764047b)
The ceph dashboard tasks didn't use the cluster option if the cluster
name isn't the default value.
Closes: #4529
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dd526cfe4e)
When using the ansible --limit option on one or few OSD nodes and if the
handler is triggered then we will restart the OSD service on all OSDs
nodes instead of the hosts limited by the limit value.
Even if the play is limited by the --limit value we are using all OSD
nodes from the OSD group.
with_items: '{{ groups[osd_group_name] }}'
Instead we should iterate only on the nodes present in both OSD group and
limit list.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0346871fb5)
e695efc introduced a regression in the _radosgw_address fact when using
the radosgw_address_block variable.
There's no item there because we don't use the items lookup. This is
only used for _monitor_address with monitor_address_block.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1758099
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 780cf36a59)
There is no need to get n * number of nodes the different keyrings.
Adding a `run_once: true` here avoid running a ceph command too many
times which could be impacting large cluster deployment.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9bad239d77)
During the rolling_update scenario, the fsid value is retrieve from the
current ceph cluster configuration via the ceph daemon config command.
This command tries first to resolve the admin socket path via the
ceph-conf command.
Unfortunately this command won't work if you have a duplicate key in the
ceph configuration even if it only produces a warning. As a result the
task will fail.
Can't get admin socket path: unable to get conf option admin_socket for
mon.xxx: warning: line 13: 'osd_memory_target' in section 'osd' redefined
Instead of using ceph daemon we can use the --admin-daemon option
because we already know what the socket admin path value based on the
ceph cluster and mon hostname values.
Closes: #4492
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ec3b687dc4)
Check for gpt header when osd scenario is lvm or lvm batch.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 272d16e101)
This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e08194dd67)
This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c69816c6b7)
This commit moves containerized deployment related files to `./tasks/
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4636f3f7e2)
This commit isolates the systemd unit files generation for containers into
separate yml files in order to be able importing each corresponding roles
without playing all tasks.
This is needed so we can run ceph-ansible to render systemd unit files
so they call podman instead of docker.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bd64167469)
e695efc hasn't been updated with the changes introduced in 9bb11c7 so
the ips_in_ranges filter isn't used for an external grafana instance.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 20b1a464ec)
The old default prometheus port 9090 clashes with cockpit in rhel 8. The
9090 port is reserved for web service administration of machines. We
should change the default to something that does not clash with other
ports used in rhel 8, at least by default. The port 9092 seems like a
good choice in my testing.
Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit b96c6da832)
This reverts commit 58b27ef0b3.
This is breaking debian based OS deployments.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e4444d29e0)
The package python-xml is needed for ansible's zypper module to interact with
the zypper package management tool.
roles/ceph-defaults/defaults/main.yml:
Remove python-xml from variable suse_package_dependencies to only
install python-xml on SUSE/openSUSE if python is not found.
raw_install_python.yml already contains all the logic needed to check
if there is a valid python installation, so this is better suited there.
openSUSE Leap 15.x / SLES 15.x do no longer have /usr/bin/python,
only /usr/bin/python3, which already contains the xml module, so
nothing needs to be installed in that case.
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
(cherry picked from commit 5cf22e9b31)
This change implements a filter_plugin that is used in the
ceph-facts, ceph-validate roles and infrastucture-playbooks.
The new filter plugin will return a list of all IP address
that reside in any one of the given IP ranges. The new filter
replaces the use of the ipaddr filter.
ceph.conf already support a comma separated list of CIDRs
for the public_network and cluster_network options.
Changes: [1] and [2] introduced a regression in ceph-ansible
where public_network can no longer be a comma separated list
of cidrs.
With this change a comma separated list of subnet CIDRs can
also be used for monitor_address_block and radosgw_address_block.
[1] commit: d67230b2a2
[2] commit: 20e4852888
Related-To: https://bugs.launchpad.net/tripleo/+bug/1840030
Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=1740283Closes: #4333
Please backport to stable-4.0
Signed-off-by: Harald Jensås <hjensas@redhat.com>
(cherry picked from commit e695efcaf7)
Depending on the infrastruture (w/o kerberos auth) then the SecType
value could be different.
Currently this value is hardcoded in the NFS Ganesha template. Instead
we can use a variable.
The default value is still the same to avoid breaking the backward
compatibility.
Closes: #4459
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ca77d7bd31)
The set-prometheus-api-host ceph dashboard subcommand was missing in
ceph-dashboard role. Only grafana and alermanager were present.
This commit also remove the trailing slash at the end of the host/url
values.
Closes: #4453
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 74ab59c4f3)
Currently, ceph package only an meta-package that do not contain
actual software, but simply depend on other packages. It's been few
release since debian stretch (official), ubuntu bionic (official),
ubuntu uca repository and upstream debian-jewel.
As we only support nautilus and higher release for master branch,
I propose to drop ceph package and use ceph-base instead for repository
model other than rhcs so debian ceph install will be more minimalis.
Signed-off-by: Anthony Rusdi <33247310+antrusd@users.noreply.github.com>
(cherry picked from commit 58b27ef0b3)
download grafana dashboard files from github when running on Debian based OS
Signed-off-by: liuxu <liuxu623@gmail.com>
(cherry picked from commit 195f70897c)
This change just adds the task to inject from the
ceph dashboard mgr module the required layouts
to show all the cluster metrics on the grafana
instance.
Since we're now able to push grafana layouts through
the ceph mgr module command, the dashboards configuration
template is no longer needed on containerized environments.
This commit also fixes the Vagrantfile IP static assigment
in the grafana section because it generates an issue (it's
the same of the mgr instance).
Finally, considering some deployments that use an external
grafana server instance, we reworked the 'grafana_server_addr'
assignment to address these requirements.
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 9bb11c7b2a)
setting it at extra vars level prevent from setting it per node.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5bb6a4da42)
This commit adds the `wal_devices` option support to the
ceph_volume module.
passing a devices list in `bluestore_wal_devices` will make ceph-volume
creating 1 vg using these devices to create block.wal partitions.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 09e04a9197)
This commit adds the `block_db_devices` option support to the
ceph_volume module.
passing a devices list in `dedicated_devices` will make ceph-volume
creating 1 vg using these devices to create block.db partitions for data
devices.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7b836eaa47)
This commit adds a condition to check whether these variables are empty.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2b97ac921b)
The registry.redhat.io regsitry requires authentication so before pulling
the RHCS 4 container images from the registry we need to do the login
step.
This is done via the new ceph_docker_registry_auth variable. The
default value is false but true for RHCS setup.
When set to true, you need to provide the username and password
for the registry via the associated variables.
This patch also updates the ceph_docker_registry value for RHCS setup.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1748911
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9f4a99fb24)
In containerized deployment, the restart OSD handler couldn't be
triggered in most ansible execution.
This is due to the usage of run_once + a condition on the inventory
hostname and the last filter.
The run_once is triggered first so ansible will pick a node in the
osd group to execute the restart task. But if this node isn't the
last one in the osd group then the task is ignored. There's more
probability that the task will be ignored than executed.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5b1c15653f)
The ceph-rbd-mirror role allows to copy the admin keyring via the
copy_admin_key variable but there's actually no task in that role
doing the job.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1f505628dd)
The admin keyring isn't present by default on the rbd mirror nodes so
the rbd commands related to the mirroring confguration will fail.
Instead we can use the rbd mirror client keyring.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a3d36df025)
Ganesha cannot be operated active/active, in those deployments
where it is managed by pacemaker the container name can be
different than the default.
This change uses "ceph_nfs_service_suffix" where previously
missing to ensure tasks will work with customized names.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1750005
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit d2a2bd7c42)
The rbd mirror configuration was only available for non containerized
deployment and was also imcomplete.
We now enable the mirroring on the pool and add the remote peer in both
scenarios.
The default mirroring mode is set to 'pool' but can be configured via
the ceph_rbd_mirror_mode variable.
This commit also fixes an issue on the rbd mirror command if the ceph
cluster name isn't using the default value (ceph) due to a missing
--cluster parameter to the command.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7e5e21741e)
This change fixes the discovered_interpreter_python variable
name that was "discovered_python_interpreter" and caused a
failure in OSP deployments.
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 81eb091533)
Instead of hardcoding `luminous`, use the `ceph_stable_release` variable
to point to the correct repository.
This is now uncommented in roles/ceph-defaults/defaults/main.yml to be
available, as it is only used if ceph_repository is set to 'obs'.
group_vars/*.sample files have been regenerated using the
./generate_group_vars_sample.sh script.
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
(cherry picked from commit 0cedc4d303)
We have no reason to make grafana container
listen on *:<port>, so this change adds the
http_addr option to the grafana config file
and adds the related option on the wait_for
tasks.
Since grafana_server_addr should exists, we
shouldn't rely on the _current_monitor_addr
default on prometheus/grafana templates.
This change also remove this default value
that is not necessary anymore.
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 8a666bfd15)
[201] Trailing whitespace
[206] Variables should have spaces before and after: {{ var_name }}
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 42082c0a27)
roles/ceph-common/tasks/installs/suse_obs_repository.yml:
ansible's zypper_repository module does not know a parameter 'uri', this is
called 'repo' instead
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
(cherry picked from commit 4711a7d626)
This commit fixes the error [301]:
`[301] Commands should not change things if nothing needs doing`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 327d564106)
This commit fixes the error [306]:
`[306] Shells that use pipes should set the pipefail option`
using `/bin/bash` as executable because Debian/Ubuntu systems use `dash`
by default which doesn't have the `-o pipefail`. (See:
https://github.com/ansible/ansible-lint/issues/497#issue-424623501)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 102edaeb61)
On containerized deployment, the mon container sometimes needs to
access to the radosgw endpoint (via the radosgw-admin command). When
using TLS on the radosgw with self-signed certificates then we need to
access to the CA certification from the mon container.
The CA certificate needs to be added on the host and then the directory
will be bind mount on the container.
Resolves: #4358
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2b0616ecca)
Like the OpenStack keyrings, we can use the profile rbd for the clients
keyring (both mon and osd).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 49aa05b96c)
This reverts commit 2d955757ee.
The "osd blacklist" isn't an osd caps but should be used with mon caps.
Also the correct caps for this is: 'allow command "osd blacklist"'.
The current change is breaking the openstack and clients keyrings.
By using the profile rbd (which is already used) we already rely on the
ability to blacklist dead client.
Resolves: #4385
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 717af83475)
If the user has set the `ansible_python_interpreter`, ansible will not try to
discover python, so `discovered_python_interpreter` will not be set.
Solution: Set `discovered_python_interpreter` to `ansible_python_interpreter`
if `ansible_python_interpreter` is defined
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
(cherry picked from commit bd507fa147)
since the following commit:
commit 1ac94c048f
rgw: add support for multiple rgw instances on a single host
we have multi-instance rgw support on a single host and
the config section name of the rgw changed from
[client.rgw.$(hostname)] -> [client.rgw.$(hostname).rgwX]
when X is the sequence number: 0,1,2,...
So we should assign 'rgw_zone' item to the exact rgw instance
config section in ceph.conf
Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
(cherry picked from commit a0590cae9d)
This commit makes it possible to parametrize the ceph directories modes.
So it changes hardocded mode for ceph related directories from 0755 to
customizable with `ceph_directories_mode` variable.
Closes: #2920
Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 011270ca69)
On containerized deployment, the OSD entrypoint runs some ceph-volume
commands (lvm/simple scan and/or activate) which perform badly without
the ulimit option.
This option was added for all previous ceph-volume commands but not on
the ceph-osd container startup.
Also updating hard limit value to 4096 to reflect default baremetal
value.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9a4ac46d19)
The "run 'ceph-volume lvm batch --report' to see how many osds are to be
created" and "run 'ceph-volume lvm list' to see how many osds have already been
created" statements only register the lvm_batch_report and lvm_list variables.
Running those ceph-volume commands should never produce a change on the system.
Adding changed_when: false prevents irrelevant change messages from Ansible.
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
(cherry picked from commit e11cbbbcb1)
This commit fixes a typo in roles/ceph-facts/tasks/facts.yml
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
(cherry picked from commit e1b9312084)
roles/ceph-validate/tasks/check_nfs.yml: fail on openSUSE Leap
using `ceph_origin = distro`, as the ganesha packages are not available from
the distribution repositories
Fixes: #4342
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
(cherry picked from commit 11aa5dbb58)
Otherwise rgw handler ends up with an error when using https.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9329bbb3af)
install packages on SUSE/openSUSE distributions, using the
same logic as on RedHat-based distributions
Fixes#4340
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
(cherry picked from commit c721cb99cb)
roles/ceph-common/tasks/installs/install_on_suse.yml: remove the task that
installs the dependencies, as this is done later in install_suse_packages.yml
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
(cherry picked from commit 504017d562)
This commits adds the `osd blacklist` cap on all OSP clients keyrings.
Fixes: #2296
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2d955757ee)
openSUSE switched from 'openSUSE 13.x' to 'openSUSE Leap 42.x' and then to
'openSUSE Leap 15.x' to align with SLES15 development.
The previous logic did not correctly allow the current release, as 15.x matched
the 'less than 42.3' condition.
For now only support openSUSE Leap 15.x, and extend support once 16.x is
released (or whatever the exact version will be)
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
(cherry picked from commit 5ee3d96fb4)
just like `ceph_osd_pool_default_size`, a pool size might change after an
initial deployment. Having this condition prevents from customizing the
pool in that case.
This is not needed so let's remove it.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 70cf2a5846)
there is no need to use `shell` in these tasks. Let's use `command`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4df92152c0)