Commit Graph

2338 Commits (6a5308fa7f267de2b93efd1106ef0c965f163a2c)

Author SHA1 Message Date
Dimitri Savineau 36e18e20d1 ceph-osd: check container engine rc for pools
When creating OpenStack pools, we only check if the return code from
the pool list command isn't 0 (ie: if it doesn't exist). In that case,
the return code will be 2. That's why the next condition is rc != 0 for
the pool creation.
But in containerized deployment, the return code could be different if
there's a failure on the container engine command (like container not
running). In that case, the return code could but either 1 (docker) or
125 (podman) so we should fail at this point and not in the next tasks.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d549fffdd2)
2019-07-31 14:07:41 -04:00
Guillaume Abrioux 51af74face dashboard: fix timeout usage on rgw user creation command
For some reason, this is making the playbook failing like following:

```
TASK [ceph-dashboard : create radosgw system user] ************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
task path: /home/guits/ceph-ansible/roles/ceph-dashboard/tasks/configure_dashboard.yml:106
Tuesday 30 July 2019  10:04:54 +0200 (0:00:01.910)       0:11:22.319 **********
FAILED - RETRYING: create radosgw system user (3 retries left).
FAILED - RETRYING: create radosgw system user (2 retries left).
FAILED - RETRYING: create radosgw system user (1 retries left).
fatal: [mgr0 -> mon0]: FAILED! => changed=true
  attempts: 3
  cmd: timeout 20 podman exec ceph-mon-mon0 radosgw-admin user create --uid=ceph-dashboard --display-name='Ceph dashboard' --system
  delta: '0:00:20.021973'
  end: '2019-07-30 08:06:32.656066'
  msg: non-zero return code
  rc: 124
  start: '2019-07-30 08:06:12.634093'
  stderr: 'exec failed: container_linux.go:336: starting container process caused "process_linux.go:82: copying bootstrap data to pipe caused \"write init-p: broken pipe\""'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

using `timeout -f -s KILL` fixes this issue.

Also, there is no need to use `shell` module here, let's switch to
`command`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c9d80af4e0)
2019-07-30 15:08:46 +02:00
Guillaume Abrioux ea44783f3d validate: add checks for grafana-server group definition
this commit adds two checks:
- check that the `[grafana-server]` group is defined
- check that the `[grafana-server]` contains at least one node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 02beb00916)
2019-07-29 15:46:58 +02:00
Guillaume Abrioux e2b41a17c0 mgr: fix a typo
this tasks isn't using the right container_exec_cmd, that's delegating
to the wrong node.
Let's use the right fact to fix this command.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ec33ee7574)
2019-07-29 15:46:58 +02:00
Guillaume Abrioux 1a9043128c dashboard: remove cfg80211 module installation
According to this comment [1], this seems to be needed to detect wifi
devices.

In node exporter we can see this:

```
--collector.wifi          Enable the wifi collector (default: disabled).
```

since it's enabled by default and we don't even change this in our
systemd templates for node-exporter, we can easily assume in the end
it's not needed. Therefore, let's remove this.

[1] dbf81b6b5b (diff-961545214e21efed3b84a9e178927a08L21-L23)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b9cdf341be)
2019-07-29 15:46:58 +02:00
Guillaume Abrioux d0ad1cf0f1 dashboard: use dedicated group only
There's no need to add complexity and trying to fallback on other group.
Let's deploy dashboard on all nodes present in grafana-server group.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d67230b2a2)
2019-07-29 15:46:58 +02:00
Guillaume Abrioux 93826e061d dashboard: enable dashboard by default
This commit enables dashboard deployment by default.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1726739

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fb1b5b3251)

# Conflicts:
#	tox-dashboard.ini
2019-07-29 15:46:58 +02:00
Dimitri Savineau 43d625b59a Remove NBSP characters
Some NBSP are still present in the yaml files.
Adding a test in travis CI.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 07c6695d16)
2019-07-26 16:23:41 -04:00
Guillaume Abrioux 6ef73b59d2 container: rename docker directories
Those 2 directories should be renamed to be more generic (docker vs.
podman).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 19950b5170)
2019-07-25 13:40:40 +02:00
fmount 15c745d998 Avoid to setup provisioners in a fully containerized environment
This commit adds a when clause to avoid the setup of grafana
provisioners in a fully containerized scenario.
This is needed when the ceph-grafana-dashboards package is not
installed and this task could result in a wrong grafana
configuration that let the container crash.

Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit fac1b030cb)
2019-07-24 14:16:55 +02:00
Dimitri Savineau 367dce2894 ceph-dashboard: enable rgw options conditionally
The dashboard rgw frontend options only need to be applied when there's
some nodes present in the rgw ansible group.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5383c2f7f3)
2019-07-19 20:33:42 +00:00
Dimitri Savineau 87db5aa55c dashboard: use variables for port value
The current port value for alertmanager, grafana, node-exporter and
prometheus is hardcoded in the roles so it's not possible to change the
port binding of those services.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8ab9b719fa)
2019-07-19 20:33:42 +00:00
Giulio Fidente 985165dbf7 Fix backward compat with old cephfs_pools format
Previously cephfs_pools items used to have a pgs: key but not
pgp_num: nor pg_num:

Signed-off-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit edd1420217)
2019-07-19 17:50:57 +00:00
Guillaume Abrioux bbfd6965e0 handler: fix bug in osd handlers
fbf4ed42ae introduced a bug when
container binary is podman.
podman doesn't support ps -f using regular expression, the container id
is never set in the restart script causing the handler to fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721536

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 618dbf271d)
2019-07-18 16:49:14 +00:00
Guillaume Abrioux 4aa4496fc1 validate: fail if gpt header found on unprepared devices
ceph-volume will complain if gpt headers are found on devices.
This commit checks whether a gpt header is present on devices passed in
`devices` variable and fail early.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1730541

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 487d701685)
2019-07-18 10:32:53 +02:00
Dimitri Savineau 2d8ed4cc52 ceph-infra: update handler with daemon variable
Both ntp and chrony daemon use variable for the service name because it
could be different depending on the GNU/Linux distribution.
This has been update in 9d88d3199 for chrony but only for the start part
not for the handler.
The commit fixes this for both ntp and chrony.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0ae0193144)
2019-07-15 16:36:49 +00:00
Dimitri Savineau b87c189299 ceph-infra: Open prometheus port
The Prometheus porrt 9090 isn't open in the firewall configuration.
Also the dashboard task on the grafana node was not required because
it's already present on the mgr node.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 41b44dde85)
2019-07-11 13:41:58 +00:00
Guillaume Abrioux 2742063aee handler: remove legacy condition
since everything is already in a block with the same condition, it's not
needed to leave all of them on these tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ee29f7370a)
2019-07-10 15:53:26 +00:00
Guillaume Abrioux bca8ac39c2 validate: improve message printed in check_devices.yml
The message prints the whole content of the registered variable in the
playbook, this is not needed and makes the message pretty unclear and
unreadable.

```
"msg": "{'_ansible_parsed': True, 'changed': False, '_ansible_no_log': False, u'err': u'Error: Could not stat device /dev/sdf - No such file or directory.\\n', 'item': u'/dev/sdf', '_ansible_item_result': True, u'failed': False, '_ansible_item_label': u'/dev/sdf', u'msg': u\"Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdf -- unit 'MiB' print'\", u'rc': 1, u'invocation': {u'module_args': {u'part_start': u'0%', u'part_end': u'100%', u'name': None, u'align': u'optimal', u'number': None, u'label': u'msdos', u'state': u'info', u'part_type': u'primary', u'flags': None, u'device': u'/dev/sdf', u'unit': u'MiB'}}, 'failed_when_result': False, '_ansible_ignore_errors': None, u'out': u''} is not a block special file!"
```

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1719023

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e6dc3ebd8c)
2019-07-10 09:37:01 -04:00
Boris Ranto 5d5e7d59fd dashboard: Use upstream default port
We are currently using incorrect dashboard default port. The upstream
uses 8443 instead of 8234 by default. This should get us closer to the
upstream project.

Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit 21758fcee8)
2019-07-10 11:49:35 +02:00
Dimitri Savineau 3bdcbb005f ceph-dashboard: remove bool filter for rgw vars
Some dashboard_rgw_api_* variables are using the bool filter but those
variables are strings with an empty string as default value.
So we should test the variable against an empty string instead of a
bool.

dashboard_rgw_api_host: ''
dashboard_rgw_api_port: ''
dashboard_rgw_api_scheme: ''
dashboard_rgw_api_admin_resource: ''

Resolves: #4179

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5413274412)
2019-07-10 11:48:58 +02:00
Dimitri Savineau c040c34d97 ceph-iscsi: Update gateway config/template
- Remove gateway_keyring from the configuration file because it's
not used in ceph-iscsi 3.x release.
- Use config_template instead of template module for iscsi-gateway
configuration file. Because the file is an ini file and we might want
to override more parameters than those present in ceph-ansible.
- Because we can now set the pool name in the configuration, we should
use a variable for that. This is refact with the iscsi_pool_* variables
also used to configure the pool size.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1f2a4f1910)
2019-07-10 09:35:21 +00:00
Dimitri Savineau f13e6642a4 ceph-handler: fix cluster name in socket path
c90f605b5 introduces the default ceph cluster name value in the rgw
socket path for the rgw restart script. But this should use the
`cluster` variable instead.
This commit also fixes this in the osd restart script.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit de7f948b75)
2019-07-08 19:57:08 +00:00
ilyashestopalov 5c6a9e1a96 ceph-mon: Fix cluster name parameter
The ability to add nodes with the monitor role to an existing cluster
whose name differs from the default name is fixed.

Signed-off-by: ilyashestopalov <usr.tester@yandex.ru>
(cherry picked from commit 904532c5e2)
2019-07-08 09:12:37 -04:00
fmount ca378f1da0 Add package-install tag on ceph-grafana-dashboard pkg install.
According to the OSP pattern, we need the package-install tag
to control what is installed on the host. This commit just add
the missing tag to meet the TripleO requirements.

See: /issues/4197 for details

Fixes: #4197

Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 95bd002b35)
2019-07-08 10:42:41 +00:00
Dimitri Savineau cd7156efee ceph-iscsi-gw: Update log directories bind mount
On containerized deployment we need to bind mount the ceph-iscsi
directory to avoid writing the logs in the container.
The /var/log/ceph directory isn't use by rbd-targe-api/gw services
because they have their own log directories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 91bef94b6c)
2019-07-07 07:09:42 +00:00
Guillaume Abrioux 689605b084 iscsi: refact deprecated variables
This commit moves some old variables into ceph-defaults so we can move
the `use_new_ceph_iscsi` fact in ceph-facts role in order.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a781ce881c)
2019-07-04 00:04:04 +00:00
Mike Christie ce62ac7beb igw: Add check for missing iqn
If the user is still using the older packages and does not setup
the target iqn you will just get a vague error message later on.
This adds a check during the validate task, so it is clear to the
user.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 08a6d10c32)
2019-07-04 00:04:04 +00:00
Mike Christie cb8bab06d8 igw: Update iscsigws.yml.sample for ceph-iscsi support
Update iscsigws.yml.sample to document that we cannot use ansible to
setup iSCSI objects and use the new ceph-iscsi package.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 75fee55d19)
2019-07-04 00:04:04 +00:00
Mike Christie 6872f7ee95 igw: Support ceph-iscsi package for install
This adds support for the ceph-iscsi package during install. ceph-iscsi
does not support setting up targets/gws, luns and clients with the
current library/igw_* code. Going forward those tasks should be done with
gwcli or dashboard. ceph-iscsi will only be used if the user has no iscsi
objects setup so we do not break existing setups.

The next patch will update the iscsigws.yml.sample to document that
users must not setup any iscsi object if they want to use the new
package and tools.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit cbe66cec52)
2019-07-04 00:04:04 +00:00
Mike Christie f180eccb84 igw: drop gateway_ip_list for container setups
The gateway_ip_list is not used in container setups, so drop it
for that case.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit b7b2213be1)
2019-07-04 00:04:04 +00:00
Mike Christie f984db5544 igw: move gateway_ip_list check to validate role
Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit d89d3e7cd6)
2019-07-04 00:04:04 +00:00
Dimitri Savineau d4a3e26534 ceph-handler: Fix rgw socket in restart script
Since Mimic the radosgw socket has two extra fields in the socket
name (before the .asok suffix): <pid>.<ctid>

Before:
  /var/run/ceph/ceph-client.rgw.cephaio-1.asok
After:
  /var/run/ceph/ceph-client.rgw.cephaio-1.16913.23928832.asok

The radosgw restart script doesn't handle this and could fail during
an upgrade.
If the SOCKETS variable isn't defined in the script then the test
command won't fail because the return code is 0

$ test -S
$ echo $?
0

There multiple issues in that script:
  - The default SOCKETS value isn't defined due to a typo
SOCKET vs SOCKETS.
  - Because the socket name uses the pid then we need to check the
socket name after the service restart.
  - After restarting the radosgw service we need to wait few seconds
otherwise the socket won't be created.
  - Update the wget parameters because the command is doing a loop.
We now use the same option than curl.
  - The check_rest function doesn't test the radosgw at all due to
a wrong test command (test against a string) and always returns 0.
This needs to use the DOCKER_EXECS variable in order to execute the
command.

$ test 'wget http://192.168.100.11:8080'
$ echo $?
0

Also remove the test based on the ansible_fqdn because we only use
the ansible_hostname + rgw instance name.

Finally group all for loop into a single one.

Resolves: #3926

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c90f605b51)
2019-07-03 15:08:35 +00:00
Giulio Fidente 72e0ac1f44 Add radosgw_frontend_ssl_certificate parameter
This is necessary when configuring RGW with SSL because
in addition to passing specific frontend options, civetweb
appends the 's' character to the binding port and beast uses
ssl_endpoint instead of endpoint.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1722071
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit d526803c6c)
2019-07-02 20:13:09 +00:00
Guillaume Abrioux 2295a4cf0a containers: improve logging
bindmount /var/log/ceph on all containers so it's possible to retrieve
logs from the host.

related ceph-container PR: ceph/ceph-container#1408

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 33eed78d17)
2019-07-02 11:27:34 -04:00
Guillaume Abrioux 381358f439 nfs: clean template
remove legacy options

```
ganesha.nfsd-115[main] config_errs_to_log :CONFIG :WARN :Config File (/etc/ganesha/ganesha.conf:13): Unknown parameter (Dir_Max)
ganesha.nfsd-115[main] config_errs_to_log :CONFIG :WARN :Config File (/etc/ganesha/ganesha.conf:14): Unknown parameter (Cache_FDs)

```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b725b3077e)
2019-07-02 11:01:07 +02:00
Dimitri Savineau 109883e7a5 ceph-osd: Add CONTAINER_IMAGE env variable
This environment variable was added in cb381b4 but was removed in
4d35e9e.
This commit reintroduces the change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 02fbe76e62)
2019-06-27 17:34:24 -04:00
Guillaume Abrioux bcfed47009 dashboard: move ceph-grafana-dashboards package installation
This commit moves the package installation into ceph-dashboard role.
This is needed to install ceph dasboard json file in
`/etc/grafana/dashboards/ceph-dashboard/`.

Closes: #4026

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6e2e30db54)
2019-06-26 12:03:21 -04:00
Guillaume Abrioux df0d146166 infra: refact dashboard firewall rules
- There is no need to open ports 3000, 8234, 9283 on all nodes.
- Add missing rule for alertmanager (port 9093)

Closes: #4023

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 14f5fc3c86)
2019-06-26 12:03:21 -04:00
Guillaume Abrioux 28e1ce0d8c dashboard: append mgr modules to ceph_mgr_modules
when `dashboard_enabled` is `True`, let's append `dashboard` and
`prometheus` modules to `ceph_mgr_modules` so they are automatically
loaded.

Closes: #4026

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a2b6f44665)
2019-06-26 12:03:21 -04:00
fmount 5c009d01b6 Set grafana_server_addr fact for ipv6 scenarios.
As the bz1721914 describes, the grafana_server_addr
fact is not defined if ip_version used is ipv6.
This commit adds the ip_version condition to set
correctly this fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721914

Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit e655038743)
2019-06-26 12:02:29 -04:00
Guillaume Abrioux b9c49227bb facts: fix bug in grafana_server_addr fact setting
If no grafana-server group is defined while an mgr group is, that task
will fail because `hostvars[groups[grafana_server_group_name][0]` can't
return anything since `groups['grafana-server']` will be a non existing
key.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 366b309c12)
2019-06-26 15:08:44 +02:00
Guillaume Abrioux 115b457731 nfs: add missing | bool filters
To address this warning:
```
[DEPRECATION WARNING]: evaluating nfs_ganesha_dev as a bare variable, this
behaviour will go away and you might need to add |bool to the expression in the
 future
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2b9fb377a8)
2019-06-26 13:13:11 +02:00
Guillaume Abrioux bf61b5e823 nfs: remove duplicate task
This task is already present in pre_requisite_non_container.yml

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit edb8d42596)
2019-06-26 13:13:11 +02:00
Dimitri Savineau fbf4ed42ae ceph-handler: Fix OSD restart script
There's two big issues with the current OSD restart script.

1/ We try to test if the ceph osd daemon socket exists but we use a
wildcard for the socket name : /var/run/ceph/*.asok.
This fails because we usually have multiple ceph osd sockets (or
other ceph daemon collocated) present in /var/run/ceph directory.
Currently the test fails with:

bash: line xxx: [: too many arguments

But it doesn't stop the script execution.
Instead we can specify the full ceph osd socket name because we
already know the OSD id.

2/ The container filter pattern is wrong and could matches multiple
containers resulting the script to fail.
We use the filter with two different patterns. One is with the device
name (sda, sdb, ..) and the other one is with the OSD id (ceph-osd-0,
ceph-osd-15, ..).
In both case we could match more than needed.

$ docker container ls
CONTAINER ID IMAGE              NAMES
958121a7cc7d ceph-daemon:latest ceph-osd-strg0-sda
589a982d43b5 ceph-daemon:latest ceph-osd-strg0-sdb
46c7240d71f3 ceph-daemon:latest ceph-osd-strg0-sdaa
877985ec3aca ceph-daemon:latest ceph-osd-strg0-sdab
$ docker container ls -q -f "name=sda"
958121a7cc7d
46c7240d71f3
877985ec3aca

$ docker container ls
CONTAINER ID IMAGE              NAMES
2db399b3ee85 ceph-daemon:latest ceph-osd-5
099dc13f08f1 ceph-daemon:latest ceph-osd-13
5d0c2fe8f121 ceph-daemon:latest ceph-osd-17
d6c7b89db1d1 ceph-daemon:latest ceph-osd-1
$ docker container ls -q -f "name=ceph-osd-1"
099dc13f08f1
5d0c2fe8f121
d6c7b89db1d1

Adding an extra '$' character at the end of the pattern solves the
problem.

Finally removing the get_container_osd_id function because it's not
used in the script at all.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 45d46541cb)
2019-06-21 14:51:29 -04:00
Dimitri Savineau 6fd4902b55 Change ansible_lsb by ansible_distribution_release
The ansible_lsb fact is based on the lsb package (lsb-base,
lsb-release or redhat-lsb-core).
If the package isn't installed on the remote host then the fact isn't
populated.

--------
"ansible_lsb": {},
--------

Switching to the ansible_distribution_release fact instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dc187ea6fa)
2019-06-21 13:36:15 -04:00
fpantano c03a1e49dd Add higher retry/delay defaults to check the quorum status.
As per bz1718981, this commit adds higher values to check
the quorum status. This is helpful for several OSP deployments
that fail during the scale up.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1718981

Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit ba73dc7b21)
2019-06-20 20:03:19 -04:00
Dimitri Savineau 62d98971f2 ceph-volume: Set max open files limit on container
The ceph-volume lvm list command takes ages to complete when having
a lot of LV devices on containerized deployment.
For instance, with 25 OSDs on a node it takes 3 mins 44s to list the
OSD.
Adding the max open files limit to the container engine cli when
executing the ceph-volume command seems to improve a lot thee
execution time ~30s.

This was impacting the OSDs creation with ceph-volume (both filestore
and bluestore) when using multiple LV devices.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b987534881)
2019-06-20 20:00:53 -04:00
Dimitri Savineau 590f6026bb roles: Remove useless become (true) flag
We already set the become flag to true at a play level in the site*
playbooks so we don't need to set it at a task level.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7c3640177b)
2019-06-20 22:00:27 +00:00
Guillaume Abrioux 52ff9ce5d1 facts: add a retry on get current fsid task
sometimes it can happen the following task fails:

```
TASK [ceph-facts : get current fsid] *******************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-centos-container-update/roles/ceph-facts/tasks/facts.yml:78
Wednesday 19 June 2019  18:12:49 +0000 (0:00:00.203)       0:02:39.995 ********
fatal: [mon2 -> mon1]: FAILED! => changed=true
  cmd:
  - timeout
  - --foreground
  - -s
  - KILL
  - 600s
  - docker
  - exec
  - ceph-mon-mon1
  - ceph
  - --cluster
  - ceph
  - daemon
  - mon.mon1
  - config
  - get
  - fsid
  delta: '0:00:00.239339'
  end: '2019-06-19 18:12:49.812099'
  msg: non-zero return code
  rc: 22
  start: '2019-06-19 18:12:49.572760'
  stderr: 'admin_socket: exception getting command descriptions: [Errno 2] No such file or directory'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

not sure exactly why since just before this task, mon1 seems to be well
UP otherwise it wouldn't have passed the task `waiting for the
containerized monitor to join the quorum`.

As a quick fix/workaround, let's add a retry which allows us to get
around this situation:

```
TASK [ceph-facts : get current fsid] *******************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-scenario/roles/ceph-facts/tasks/facts.yml:78
Thursday 20 June 2019  15:35:07 +0000 (0:00:00.201)       0:03:47.288 *********
FAILED - RETRYING: get current fsid (3 retries left).
changed: [mon2 -> mon1] => changed=true
  attempts: 2
  cmd:
  - timeout
  - --foreground
  - -s
  - KILL
  - 600s
  - docker
  - exec
  - ceph-mon-mon1
  - ceph
  - --cluster
  - ceph
  - daemon
  - mon.mon1
  - config
  - get
  - fsid
  delta: '0:00:00.290252'
  end: '2019-06-20 15:35:13.960188'
  rc: 0
  start: '2019-06-20 15:35:13.669936'
  stderr: ''
  stderr_lines: <omitted>
  stdout: |-
    {
        "fsid": "153e159d-7ade-42a7-842c-4d04348b901e"
    }
  stdout_lines: <omitted>
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 46a2683944)
2019-06-20 14:01:33 -04:00