always set these facts on monitor nodes whatever we run with `--limit`.
Otherwise, playbook will fail when using `--limit` on nodes where these
facts are used on a delegated task to monitor.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5e81843e9)
The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus)
were not manage during the docker to podman migration.
This adds the systemd container template of those services to a dedicated
file (systemd.yml) in order to include it in the docker2podman playbook.
This also adds the dashboard container images pull from docker to podman.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 252e78b4e4)
The current ganesha log directory is only present in the container
and not bind mount on the host.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 222fe4abd8)
Same fix as `ceph-rgw` for `rgw_create_pools` pool names that contain Jinja
templates.
See #5348 for details.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 444b46ea24)
It is common to set templated pool names in `rgw_create_pools`, e.g.
```yaml
rgw_create_pools:
"{{ rgw_zone }}.rgw.buckets.index":
pg_num: 16
size: 3
type: replicated
```
This worked fine with Ansible 2.8, but broke in Ansible 2.9 due to a change in
the way `with_dict` works [1].
This commit replaces the use of `with_dict` with
```yaml
loop: "{{ rgw_create_pools | dict2items }}"
```
which works as intended and expands the template in the pool name.
[1]: https://docs.ansible.com/ansible/latest/porting_guides/porting_guide_2.9.html#loopsCloses#5348
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit d2b7670c7d)
When using radosgw_interface and IPv6 setup then the _radosgw_address
fact doesn't use square brackets compared to the radosgw_address and
radosgw_address_block configuration.
Closes: #5325
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ed4f23d530)
This change allows the operator to refresh the
ceph dashboard admin role on multiple ceph-ansible
executions.
In the current state the role is set only when the
user is created, and there's no way to change it if
the user exists.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1826002
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 5eb363e033)
this commit removes the task which enable application on cephfs pools.
See: https://tracker.ceph.com/issues/43761Fixes: #5278
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 86dc6f8206)
The '==' jinja2 operator (or 'equalto') has been introduced in jinja2
2.8.
On EL7, jinja2 version is 2.7 so the operator isn't present creating
templating error like:
The error was: TemplateRuntimeError: no test named '=='
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1747206
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 34e6e8e06c)
Since ea2b654d9 we're not running the rados command from the monitor
nodes but from the ganesha node. Unfortunately we don't have the
required keyring on that node to run the rados command as we don't
import the right keyring.
This commit restores the workflow for internal ganesha deployment like
before ea2b654d9 but keeps the rados commands from the ganesha node for
external deployment until we have a better design.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8a890306ad)
Fix the condition on the keyring copy task that prevent the ganesha
keyring to be created in the /var/lib/ceph directory.
Also ensure that the directory exists first.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1831285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 748ac4b928)
The condition is missing an index here which makes the playbook failing.
Typical error:
```
The conditional check 'not item.get('skipped', False)' failed. The error was: error while evaluating conditional (not item.get('skipped', False)): 'list object' has no attribute 'get'",
```
Also, adds the missing '/keyring' on the `exec_cmd_nfs` fact.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1831342
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cf460274c7)
15ed9ee introduced a regression for the mgr dashboard daemon using
IPv6 since the mgr dashboard configuration doesn't support brackets.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1827299
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1728929cd)
This adds CentOS 8 support for containerized deployment allowing podman
installation as the default container engine for this distribution.
Closes: #5130
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This was removed in Ansible 2.9.
[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
using `result|version_compare` use `result is version_compare`. This
feature will be removed in version 2.9. Deprecation warnings can be
disabled by setting deprecation_warnings=False in ansible.cfg.
Rename 'version_compare' to the function 'version'.
version_compose was renamed to version since ansible 2.5
Signed-off-by: abaird-rh <abaird@redhat.com>
(cherry picked from commit eb71244bfd)
This commit creates an empty rados index object even when deploying
standalone nfs-ganesha.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1822328
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ea2b654d95)
We were not testing the right ansible_distribution fact value for RHEL
distribution.
This commit also updates the minial RHEL version supported by RHCS.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5de74fe512)
This commit fixes a bug when trying to scale out osd nodes with
`crush_rule_config` is enabled.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1822599
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4bcc52cb2a)
The dashboard SSO mgr module requires the saml python library to be
installed. This is only a valid scenario for RHCS deployment because
the saml python library isn't available in other classic repositories.
This package is present in RHCS Tools repository so we also need to
enable it on the mgr nodes.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1820233
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6617d90733)
When `rule_name` isn't set in `crush_rules` the osd pool creation will
fail.
This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with
the default crush rule name.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1bb9860dfd)
Creating crush rules even with no crush hierarchy configuration is a
valid scenario so we shouldn't be bound to the first task result (which
configure crush hierarchy) to be able to add new crush rules.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b0476385c)
The latest Ceph stable release is now Octopus so the "latest" container
image tag is pointing to Octopus and not Nautilus anymore.
This commit updates the ceph_docker_image_tag with "latest-nautilus".
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Support for debian with RHCS has been dropped starting RHCS 4
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4ac99223b2)
This is no longer true, let's remove this comment given that this option
is not ignored in containerized deployments.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e551b5ba1a)
The rgw_instances_all fact is supposed to be the list of all radosgw
instances from all rgw nodes.
But the fact is always using the local rgw_instances variable so this
won't work on multiple nodes.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0487d21938)
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fb69f6990c)
Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.
[1] registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b97a4d5201)
This option has been deprecated (As of 0.51).
By the way, ceph-ansible already sets the
auth_{service,client,cluster}_required variables.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 152c2caa9f)
This commit adds the rgw multi-instances support in ceph-handler
(restart_rgw_daemons.sh.j2)
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3626c688cf)
When using the radosgw multi instances configuration then the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e8bf0a0cf2)
This commit make that task retrying 5 times to start the service
firewalld to avoid failure like following:
```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020 08:58:48 +0000 (0:00:00.963) 0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
msg: |-
Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
Failed to execute operation: Connection reset by peer
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b3d943fe9f)
Sometimes, these task can timeout for some reason.
Adding these retries can help to avoid unexcepted failures.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7a8a719e75)
There's no need to run this part of the role when upgrading clients
node. Let's skip it when rolling_update.yml is being run.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eac207091b)
This commit adds condition in order to not try to customize pools size
when its type is erasure.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e17c79b871)
This commit adds the pg autoscaler support.
The structure for pool definition has now two additional attributes
`pg_autoscale_mode` and `target_size_ratio`, eg:
```
test:
name: "test"
pg_num: "{{ osd_pool_default_pg_num }}"
pgp_num: "{{ osd_pool_default_pg_num }}"
rule_name: "replicated_rule"
application: "rbd"
type: 1
erasure_profile: ""
expected_num_objects: ""
size: "{{ osd_pool_default_size }}"
min_size: "{{ osd_pool_default_min_size }}"
pg_autoscale_mode: False
target_size_ratio": 0.1
```
when `pg_autoscale_mode` is `True` user has to set a decent value in
`target_size_ratio`.
Given that it's a new feature, it's still disabled by default.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47adc2bb08)
Currently, the command executed is wrong, eg:
```
cmd:
- podman
- exec
- ceph-mon-controller-0
- ceph
- --cluster
- ceph
- osd
- pool
- create
- volumes
- '32'
- '32'
- replicated_rule
- '1'
delta: '0:00:01.625525'
end: '2020-02-27 16:41:05.232705'
item:
```
From documentation, the osd pool creation command is :
```
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
[crush-rule-name] [expected-num-objects]
ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \
[erasure-code-profile] [crush-rule-name] [expected_num_objects]
```
it means we pass '1' (from item.type) as value for
`expected_num_objects` by default which is very likely not what we want.
Also, this commit modifies the default value when no `rule_name` is set
to use the existing variable `osd_pool_default_crush_rule`
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bf1f125d71)
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.
The rgw hosts now need to be grouped into zones
and realms in the inventory.
.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.
Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands
remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 71f55bd54d)
It looks like that the service module doesn't support wildcard anymore
for stopping/disabling multiple services.
fatal: [rgw0]: FAILED! => changed=false
msg: 'This module does not currently support using glob patterns,
found ''*'' in service name: ceph-radosgw@*'
...ignoring
Instead we should iterate over the rgw_instances list.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9d3b49293d)
When using the firewalld ansible module we need to be sure that the
python bindings are installed.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 90b1fc8fe9)
Since ansible 2.9 the firewalld task could not be used with service and
source in the same time anymore.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 45fb9241c0)
It doesn't make sense to start validating configuration if the ansible
version isn't the good one.
This commit moves the check_system task as the first task in the
ceph-validate role.
The ansible version test tasks are moved at the top of this file.
Also moving the iscsi kernel tests from check_system to check_iscsi
file.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1a77dd7e91)
When running environment with OSDs having ID with more than 2 digits,
some tasks don't match the system units and therefore, playbook can fail.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a084a2a347)
This change introduces a new set of tasks to configure the
ceph dashboard backend and listen just on the mgr related
subnet (and not on '*'). For the same reason the proper
server address is added in both prometheus and alertmanger
systemd units.
This patch also adds the "dashboard_frontend_vip" parameter
to make sure we're able to support the HA model when multiple
grafana instances are deployed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792230
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit 15ed9eebf1)
5s as a connection timeout could be low in some setup. Let's increase
it to 10s.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 44e750ee5d)
Because we are relying on docker|podman for managing containers then we
don't need systemd to manage the process (like kill).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5a03e0ee1c)
The ad7a5da commit introduced a regression when using TLS on haproxy
via the haproxy_frontend_ssl_certificate variable.
This cause the "stats socket" and the "tune.ssl.default-dh-param"
parameters to be on the same line resulting haproxy failing to start.
[ALERT] 351/140240 (21388) : parsing [xxxxx] : 'stats socket' : unknown
keyword 'tune.ssl.default-dh-param'. Registered
[ALERT] 351/140240 (21388) : Fatal errors found in configuration.
Fixes: #4869
Signed-off-by: Florian Faltermeier <florian.faltermeier@uibk.ac.at>
(cherry picked from commit 9d081e2453)
Both bootstrap_dirs_owner and bootstrap_dirs_group variables aren't
used anymore in the code.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c644ea9041)
The rgw user creation for the Ceph dashboard integration shouldn't be
created on secondary rgw zones.
Closes: #4707
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794351
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 16e12bf2bb)
Since ed36a11 we move the crush rules creation code from the ceph-mon to
the ceph-osd role.
To keep the backward compatibility we kept the possibility to set the
crush variables on the mons side but we didn't move the default values.
As a result, when using crush_rule_config set to true and wanted to use
the default values for crush_rules then the crush rule ansible task
creation will fail.
"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute
'crush_rules'"
This patch move the default crush variables from ceph-mon to ceph-osd
role but also use those default values when nothing is defined on the
mons side.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1fc6b33714)
The grafana_{crt,key} aren't boolean variables but strings. The default
value is an empty string so we should do the conditional on the string
length instead of the bool filter
Closes: #5053
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 15bd4cd189)
Client configuration with --limit fails without this patch
because certain tasks are only done to the first host in the
_filtered_clients list and it's likely that first host will
not be included in what's sepcified with --limit. To fix this
the _filtered_clients list should be built from all clients
in the inventory that are also in the running play.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798781
Signed-off-by: John Fulton <fulton@redhat.com>
(cherry picked from commit e4bf4857f5)
Since nfs-ganesha 2.8.3 the rados-urls library has been move to a
dedicated package.
We don't have the same nfs-ganesha 2.8.x between the community and rhcs
repositories.
community: 2.8.1
rhcs: 2.8.3
As a workaround we will install that package only for rhcs setup.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0a3e85e8ca)
The ceph_nfs_ceph_user variable is a string for the ceph-nfs role but a
list in ceph-client role.
6a6785b introduced a confusion between both variable type in the ceph-nfs
role for external ceph with ganesha.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1801319
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 10951eeea8)
When using multiple grafana hosts then we push set the grafana and
prometheus URL and push the dashboard layout to a single node.
grafana_server_addrs is the list of all grafana nodes and used during
the ceph-dashboard role (on mgr/mon nodes).
grafana_server_addr is the current grafana node used during the
ceph-grafana and ceph-prometheus role (on grafana-server nodes).
We don't have the grafana_server_addr fact duplication code between
external vs collocated nodes.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1784011
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c6e96699f7)
This commit increases the default values for the following variable
consumed in switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook.
This also moves these variables in `ceph-defaults` role so the user can
set different values if needed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783223
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3700aa5385)
If for some reason, there's an old rgw socket file present in the
/var/run/ceph/ directory then the test command could fail with
test: xxxxxxxxx.asok: binary operator expected
$ ls -hl /var/run/ceph/
total 0
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94153614631472.asok
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94240997655088.asok
We can check the radosgw socket in /proc/net/unix to avoid using wildcard
in the socket name.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 60cbfdc2a6)
Since de8f2a9 the lvm2 package installation has been moved from ceph-osd
role to ceph-container-engine role.
But the scope wasn't limited to the OSD nodes only.
This commit fixes this behaviour.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fa8aa8c864)
RHCS 4 is available for both RHEL 7 and 8 so we should also enable the
cdn repositories for that distribution.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796853
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9b40a959b9)
During a rolling update we will run the ceph iscsigw tasks that start
the daemons then run the configure_iscsi.yml tasks which can create
iscsi objects like targets, disks, clients, etc. The problem is that
once the daemons are started they will accept confifguration requests,
or may want to update the system themself. Those operations can then
conflict with the configure_iscsi.yml tasks that setup objects and we
can end up in crashes due to the kernel being in a unsupported state.
This could also happen during creation, but is less likely due to no
objects being setup yet, so there are no watchers or users accessing the
gws yet. The fix in this patch works for both update and initial setup.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1795806
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 77f3b5d51b)
When no monitor group is present in the inventory, this task fails.
This affects only non-containerized deployments.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e7bc079405)
The [rgw] section in the ceph.conf file or via the ceph_conf_overrides
variable doesn't exist and has no effect.
To apply overrides to all radosgw instances we should use either the
[global] or [client] sections.
Overrides per radosgw instance should still use the
[client.rgw.{instance-name}] section.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794552
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2f07b85131)
Otherwise, if the variables contains a '$' it will be interpreted as a BASH
variable.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8c3759f8ce)
This commit adds a task to make sure user set a custom password for
`grafana_admin_password` and `dashboard_admin_password` variables.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1795509
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 99328545de)
When using different name between the inventory_hostname and the
ansible_hostname then the _container_exec_cmd fact will get a wrong
value based on the inventory_hostname instead of the ansible_hostname.
This happens when the ceph cluster is already running (update/upgrade).
Later the container exec commands will fail because the container name
is wrong.
We should always set the _container_exec_cmd based on the
ansible_hostname fact.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1795792
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1fcafffdad)
We must call `container_exec_cmd` from the right monitor node otherwise
the value of the fact might mistmatch between the delegated node and the
node being played.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794900
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f919f8971)
Given that we delegate to the first monitor, we must read the value of
`container_exec_cmd` from this node.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eb9112d8fb)
fact 'running_mon' is set once 'grep' successfully exits with 'rc == 0'
Signed-off-by: Vytenis Sabaliauskas <vytenis.sabaliauskas@protonmail.com>
(cherry picked from commit ed1eaa1f38)
Because we need to manage legacy ceph-disk based OSD with ceph-volume
then we need a way to know the osd_objectstore in the container.
This was done like this previously with ceph-disk so we should also
do it with ceph-volume.
Note that this won't have any impact for ceph-volume lvm based OSD.
Rename docker_env_args fact to container_env_args and move the container
condition on the include_tasks call.
Remove OSD_DMCRYPT env variable from the ceph-osd template because it's
now included in the container_env_args variable.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792122
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c9e1fe3d92)
In 3c31b19ab3, I fixed the `customize pool
size` task by replacing `item.size` with `item.value.size`. However, I
missed the same issue in the `when` condition.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 3842aa1a30)
When unsetting the noup flag, we must call container_exec_cmd from the
delegated node (first mon member)
Also, adding a `run_once: true` because this task needs to be run only 1
time.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 22865cde9c)
Iterating over all monitors in order to delegate a `
{{ container_binary }}` fails when collocating mgrs with mons, because
ceph-facts reset `container_exec_cmd` to point to the first member of
the monitor group.
The idea is to force `container_exec_cmd` to be reset in ceph-mgr.
This commit also removes the `container_exec_cmd_mgr` fact.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1791282
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8dcbcecd71)
The trusted_ip_list parameter for the rbd-target-api service doesn't
support ipv6 address with bracket.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787531
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd87d69183)
Before this patch, the lvm2 package installation was done during the
ceph-osd role.
However we were running ceph-volume command in the ceph-config role
before ceph-osd. If lvm2 wasn't installed then the ceph-volume command
fails:
error checking path "/run/lock/lvm": stat /run/lock/lvm: no such file or
directory
This wasn't visible before because lvm2 was automatically installed as
docker dependency but it's not the same for podman on CentOS 8.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit de8f2a9f83)
since fd1718f379, we must use `_devices`
when deploying with lvm batch scenario.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5558664f37)
There is no need to run this part of the playbook when upgrading the
cluter.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit af6875706a)
This commit lets add-osd.yml in place but mark the deprecation of the
playbook.
Scaling up OSDs is now possible using --limit
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3496a0efa2)
We don't need to executed the grafana fact everytime but only during
the dashboard deployment.
Especially for ceph-grafana, ceph-prometheus and ceph-dashboard roles.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790303
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f940e695ab)
To avoid confusion, let's change the default value from `0.0.0.0` to
`x.x.x.x`.
Users might think setting `0.0.0.0` will make the daemon binding on all
interfaces.
Fixes: #4827
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fc02fc98eb)
This commit refact the condition in the loop of that task so all
potential osd ids found are well started.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790212
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 58e6bfed2d)
When ceph_rbd_mirror_configure is set to true we need to ensure that
the required variables aren't empty.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1760553
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a065cebd7)
cf8c6a3 moves the 'wait for all osds' task from openstack_config to the
main tasks list.
But the openstack_config code was executed only on the last OSD node.
We don't need to do this check on all OSD node so we need to add set
run_once to true on that task.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5bd1cf40eb)
When creating crush rules with device class parameter we need to be sure
that all OSDs are up and running because the device class list is
is populated with this information.
This is now enable for all scenario not openstack_config only.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf8c6a3849)
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ef2cb99f73)
If we want to create crush rules with the create-replicated sub command
and device class then we need to have the OSD created before the crush
rules otherwise the device classes won't exist.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ed36a11eab)
We must exclude the devices already used and prepared by ceph-disk when
doing the lvm batch report. Otherwise it fails because ceph-volume
complains about GPT header.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786682
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fd1718f379)