Commit Graph

2559 Commits (e17c79b871600b5488148a32c994e888fff0919f)

Author SHA1 Message Date
Guillaume Abrioux 33bfb10af9 nfs: remove legacy file
this file is provided by the packaging (nfs-ganesha) so there's no need
to maintain it in ceph-ansible

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-11-22 05:11:41 +01:00
Guillaume Abrioux d06158e9d9 nfs: do not run privileged nfs container
At the moment, we bindmount the dbus socket from the host, this requires
to run the container with --privileged.
Since we now run a dedicated dbus daemon inside the same container, we
can stop running privileged nfs-ganesha containers

Related ceph-container PR : ceph/ceph-container#1517

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1725254

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-11-22 05:11:41 +01:00
Dimitri Savineau 19e9a06ab1 tox-podman: use centos 8 vagrant image
Switch the podman scenario from atomic centos 7 to centos 8 (not atomic)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-11-20 10:34:34 +01:00
VasishtaShastry 72c43cc5d9 Fixes failure of cephfs configuration using --limit
Configuration of cephfs with an existing cluster using --limit used to fail
at different tasks while running with site-docker.yml
This commit addresses both of those tasks

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1773489
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
2019-11-18 16:44:47 +01:00
Dimitri Savineau ef2cb99f73 ceph-osd: add device class to crush rules
This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-11-14 16:25:46 +01:00
Dimitri Savineau ed36a11eab move crush rule creation from mon to osd role
If we want to create crush rules with the create-replicated sub command
and device class then we need to have the OSD created before the crush
rules otherwise the device classes won't exist.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-11-14 16:25:46 +01:00
Dimitri Savineau 3e29b8d5ff ceph-defaults: pin prometheus container tags
In addition to the grafana container tag change, we need to do the same
for the prometheus container stack based on the release present in the
OSE 4.1 container image.

$ docker run --rm openshift4/ose-prometheus-node-exporter:v4.1 --version
node_exporter, version 0.17.0
  build user:       root@67fee13ed48f
  build date:       20191023-14:38:12
  go version:       go1.11.13
$ docker run --rm openshift4/ose-prometheus-alertmanager:4.1 --version
alertmanager, version 0.16.2
  build user:       root@70b79a3f29b6
  build date:       20191023-14:57:30
  go version:       go1.11.13
$ docker run --rm openshift4/ose-prometheus:4.1 --version
prometheus, version 2.7.2
  build user:       root@12da054778a3
  build date:       20191023-14:39:36
  go version:       go1.11.13

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-11-14 16:11:14 +01:00
VasishtaShastry 9a1f1626c3 Evades validation of ceph_repository_type in containerized scenario
This will prevent failure of site-docker.yml with configs in doc.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769760

Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
2019-11-14 15:53:22 +01:00
Dimitri Savineau 4a065cebd7 ceph-validate: add rbdmirror validation
When ceph_rbd_mirror_configure is set to true we need to ensure that
the required variables aren't empty.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1760553

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-11-07 08:57:43 -05:00
Dimitri Savineau 60cbfdc2a6 ceph-handler: Use /proc/net/unix for rgw socket
If for some reason, there's an old rgw socket file present in the
/var/run/ceph/ directory then the test command could fail with

test: xxxxxxxxx.asok: binary operator expected

$ ls -hl /var/run/ceph/
total 0
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94153614631472.asok
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94240997655088.asok

We can check the radosgw socket in /proc/net/unix to avoid using wildcard
in the socket name.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-11-07 14:41:11 +01:00
Dimitri Savineau ece46d33be ceph-osd: fix fs.aio-max-nr sysctl condition
[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression isn't
evaluated.
This string output "(osd_objectstore == 'bluestore')" is always true
because item.enable condition only matches non empty string. So the
sysctl value was applyied for both filestore and bluestore backend.

[2] added the bool filter to the condition but the filter always returns
false on string and the sysctl wasn't applyed at all.

This commit fixes the enable key value by evaluating the value instead
of using the string.

[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-11-07 13:51:48 +01:00
Dimitri Savineau 2037fb87b6 ceph-defaults: pin grafana container tag to 5.2.4
The latest grafana container tag is using grafana 6.x release which could
cause issue with the ceph dashboard integration.
Considering that the grafana container in RHCS 3 is based on 5.x then we
should use the same version.

$ docker run --rm rhceph/rhceph-3-dashboard-rhel7:3 -v
Version 5.2.4 (commit: unknown-dev)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-31 18:44:51 -04:00
Dimitri Savineau 9a996aef7f ceph-osd: Remove ulimit nofile on container start
Even if this improves ceph-disk/ceph-volume performances then it also
impact the ceph-osd process.
The ceph-osd process shouldn't use 1024:4096 value for the max open
files.
Removing the ulimit option from the container engine and doing this kind
of change on the container side [1].

[1] https://github.com/ceph/ceph-container/pull/1497

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-31 10:42:09 -04:00
fmount 41b8c17356 Set grafana-server user and password in ceph-dashboard role
This change adds two tasks to set grafana-api user and password
that are required to inject dashboard layouts to the external
grafana instance.
Without these two parameters the ceph-ansible playbook fails
showing an authorization error (HTTPError: 401 Client Error:
Unauthorized").

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1767365
Signed-off-by: fmount <fpantano@redhat.com>
2019-10-31 10:29:57 -04:00
Mihai Plasoianu d3f67d63ae ceph-mon: use --admin-daemon to set default crush rule
Signed-off-by: Mihai Plasoianu <m.plasoianu@vertical.de>
2019-10-29 20:59:32 -04:00
Radu Toader f2573c9e6b nfs: support specific keys for rgw nfs user
This brings the possibility to modify the rgw nfs user to use specific
keys when those are defined.

Signed-off-by: Radu Toader <radu.m.toader@gmail.com>
2019-10-29 14:59:26 -04:00
Dimitri Savineau 15f7c7195a ceph-nfs: add nfs-ganesha-rados-grace explicitly
Since nfs-ganesha V3.0-rc4 and [1] we need to explicitly install the
nfs-ganesha-rados-grace package.

[1] https://github.com/nfs-ganesha/nfs-ganesha/commit/0fea990

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-28 16:27:36 -04:00
Dimitri Savineau b33c476f16 defaults: add user/pass auth registry variables
Add ceph_docker_registry_username and ceph_docker_registry_password
variables in ceph-defaults role so they will be present in the group_vars
samples but commented.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-24 15:11:45 -04:00
Guillaume Abrioux 3d28773da5 mon: call mon_status from asok
since c09b82a80a392ccd0da7677c7b424ce5cd3fa5d6 in ceph/ceph we must call
mon_status from asok instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-24 10:19:16 -04:00
Dimitri Savineau d050391cbb dashboard: add ceph iscsi management
When deploying with ceph-iscsi nodes and dashboard enabled, we need to
add the ceph iscsi gateway endpoints to the dashboard configuration and
add the mgr ip address in the trusted list in the iscsi gateway
configuration file.

Closes: #4638
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764173

https://docs.ceph.com/docs/master/mgr/dashboard/#enabling-iscsi-management

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-22 23:24:17 +02:00
Dimitri Savineau f2cb937193 ceph-iscsi: add ceph-iscsi stable repositories
This commit adds the support of the ceph-iscsi stable repository when
use ceph_repository community instead of always using the devel
repositories.
We're still using the devel repositories for rtslib and tcmu-runner in
both cases (dev and community).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-22 23:24:17 +02:00
Dimitri Savineau fd8d47da98 Revert "iscsigw: install python-requests"
We don't need this since [1]. Also this was only working for python2 and
not supporting python3.

[1] https://github.com/ceph/ceph-iscsi/commit/00f198a

This reverts commit 167737dd3d.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-22 23:24:17 +02:00
Dimitri Savineau 9ad000618f container/dashboard: run the registry auth task
When deploying with packages then the ceph-container-common role isn't
executed so the registry authentication task is ignored.

Closes: #4636

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-22 23:23:32 +02:00
Guillaume Abrioux da4215e9c0 validate: fix credentials validation
This task is failing when `ceph_docker_registry_auth` is enabled and
`ceph_docker_registry_username` is undefined with an ansible error
instead of the expected message.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-21 13:26:55 -04:00
Dimitri Savineau 3969470fca travis: fail on ansible-lint errors
If ansible-lint reports an error then it's skipped. We should fail in
this case.

This patch also fixes the pipefail lint in the rbd mirror role

[306] Shells that use pipes should set the pipefail option

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-21 13:26:02 -04:00
Guillaume Abrioux 4e9504c939 common: do not override ceph_release when using custom repo
Otherwise it fails like following:

```
TASK [ceph-mds : allow multimds] **************************************************************************************************************************************************
Monday 22 July 2019  16:37:38 +0800 (0:00:03.269)       0:13:25.651 ***********
fatal: [rhel7u6clone1]: FAILED! => {"msg": "The conditional check 'ceph_release_num[ceph_release] == ceph_release_num.luminous' failed. The error was: error while evaluating conditional (ceph_release_num[ceph_release] == ceph_release_num.luminous): 'dict object' has no attribute u'dummy'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 43, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: allow multimds\n  ^ here\n"}
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-17 22:58:16 +02:00
Mike Christie ba141298d7 iscsi-gw: Fix rtslib installation
When using python3 the name of the rtslib rpm is python3-rtslib. The
packages that use rtslib already have code that detects the python
version and distro deps, so drop it from the ceph iscsi gw task list and
let the ceph-iscsi rpm dependency handle it.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1760930

Signed-off-by: Mike Christie <mchristi@redhat.com>
2019-10-16 12:59:31 -04:00
Guillaume Abrioux 71cebf80a6 update: follow new recommandation to upgrade mds cluster
Refact the mds cluster upgrade code in order to follow the documented
recommandation.
See: https://github.com/ceph/ceph/blob/master/doc/cephfs/upgrading.rst

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1569689

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-16 11:23:12 -04:00
Guillaume Abrioux b63bd13073 nfs: remove unnecessary set_fact in main.yml
this task is a leftover and no longer needed.
It even causes bug when collocating nfs with mon.

Closes: #4609

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-16 11:23:02 -04:00
Dimitri Savineau 0b1e9c0737 rbd-mirror: fail if the peer is not added
Due the 'failed_when: false' statement present in the peer task then
the playbook continues to ran even if the peer task was failing (like
incorrect remote peer format.

"stderr": "rbd: invalid spec 'admin@cluster1'"

This patch adds a task to list the peer present and add the peer only if
it's not already added. With this we don't need the failed_when statement
anymore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-16 16:27:46 +02:00
Dimitri Savineau bc701860d5 ceph-iscsi: notify rbd target services
When the iscsi gateway or the ceph configuration file change then we
need to notify the rbd target api/gw services to be restarted.
This patch also merges the rbd-target-api and rbd-target-gw handler
into a single file and listen.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-16 16:25:40 +02:00
Guillaume Abrioux cb80231725 mgr: do not copy all keyrings on all mgr
There is no need to loop over all mgr nodes to set this fact, it's even
breaking deployments because it tries to copy all mgr keyring on all
mgr.

Closes: #4602

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-15 15:06:46 -04:00
Dimitri Savineau 0f978d969b Remove validate action and notario dependency
The current ceph-validate role is using both validate action and fail
module tasks to validate the ceph configuration.
The validate action is based on the notario python library. When one of
the notario validation fails then a python stack trace is reported to the
ansible task. This output isn't understandable by users.

This patch removes the validate action and the notario depencendy. The
validation is now done with only fail ansible module.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654790

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-15 11:34:49 +02:00
Dimitri Savineau f7fd0b6d4f lint: fix error [303,602,701,702]
[303] mktemp used in place of tempfile module
 [602] Don't compare to empty string
 [701] No 'galaxy_info' found
 [702] Use 'galaxy_tags' rather than 'categories'

This patch also changes the ansible log_path value via the
ANSIBLE_LOG_PATH environment variable in the travis configuration to
avoid warnings.

[WARNING]: log file at /home/travis/ansible/ansible.log is not writeable
and we cannot create it, aborting

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-15 10:07:52 +02:00
Dimitri Savineau fe9c5b8c68 ceph-handler: group listen topics and condition
We are using multiple listen topics with the handlers. That means that
we are notifying 4 tasks for each handler.
Instead we can group the listen on an include_tasks and based on the
group condition.

Before:

NOTIFIED HANDLER ceph-handler : set _mon_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mon restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mon daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mon_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy osd restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph osds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mds restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rgw restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rgw daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mgr restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mgr daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rbd mirror restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rbd mirror daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called after restart for mon0

After:

NOTIFIED HANDLER ceph-handler : mons handler for mon0
NOTIFIED HANDLER ceph-handler : osds handler for mon0
NOTIFIED HANDLER ceph-handler : mdss handler for mon0
NOTIFIED HANDLER ceph-handler : rgws handler for mon0
NOTIFIED HANDLER ceph-handler : mgrs handler for mon0
NOTIFIED HANDLER ceph-handler : rbdmirrors handler for mon0

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-11 15:43:58 -04:00
Guillaume Abrioux 161170524d mgr: improve mgr keyring creation
Delegating on remote node isn't necessary here since we are already
iterating over the right nodes.

Closes: #4518

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-11 09:40:07 -04:00
Guillaume Abrioux 273413186a common: do not reset `container_exec_cmd`
This commit removes some legacy tasks.

These tasks aren't needed, they cause the playbook to fail when
collocating daemons.

Closes: #4553

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-10 14:38:30 -04:00
Guillaume Abrioux 80e2d00b16 validate: prevent from installing OSD on same disk as the OS
This commit adds a validation task to prevent from installing an OSD on
the same disk as the OS.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623580

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-10 08:33:20 -04:00
Dimitri Savineau 3f6ff240b7 dashboard: update layouts before the restart
If the mgr dashboard doesn't restart fast enough then the inject
dashboard task will fail with a HTTP error 400.

Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 914, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/dashboard/module.py", line 450, in handle_command
    push_local_dashboards()
  File "/usr/share/ceph/mgr/dashboard/grafana.py", line 132, in push_local_dashboards
    retry()
  File "/usr/share/ceph/mgr/dashboard/grafana.py", line 89, in call
    result = self.func(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/grafana.py", line 127, in push
    grafana.push_dashboard(body)
  File "/usr/share/ceph/mgr/dashboard/grafana.py", line 54, in push_dashboard
    response.raise_for_status()
  File "/usr/lib/python2.7/site-packages/requests/models.py", line 834, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
HTTPError: 400 Client Error: Bad Request

Instead we can trigger this task before the module restart.

Closes: #4565

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-09 09:10:27 +02:00
Guillaume Abrioux 6c6a512a72 nfs: stop nfs server service in all context
This commit moves this task in order to stop the nfs server service
regardless the deployment type desired (containerized or non
containerized).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-07 10:24:33 -04:00
Guillaume Abrioux 47034effe0 nfs: stop nfs server service
The syntax here wasn't working, this refact fixes this task.
Also, removing the `ignore_errors: true` which was hidding the failure.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-07 10:24:33 -04:00
Guillaume Abrioux fa9b42e98e switch_to_containers: do not re-set `ceph_uid`
This commit refacts the way we set `ceph_uid` fact in `ceph-facts` and
removes all `set_fact` tasks for `ceph_uid` in switch-to-containers playbook
to avoid duplicated code.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-07 14:15:56 +02:00
Dimitri Savineau b9e93ad7a6 ceph-dashboard: remove rgw api host,port,scheme
We don't need to have dedicated variables for the RGW integration into
the Ceph Dashboard and need to be manually filled.
Instead we can use the current values from the RGW nodes by using the
IP and port from the first RGW instance of the first RGW node via the
radosgw_address and radosgw_frontend_port variables.
We don't need to specify all RGW nodes, this will be done automatically
with one node.
The RGW api scheme is using the radosgw_frontend_ssl_certificate variable
to determine if the value is http or https. This variable is also reuse
as a condition for the ssl verify task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-07 11:22:44 +02:00
Dimitri Savineau 249764047b ceph-dashboard: Improve https configuration
This patch moves the https dashboard configuration into a dedicated
block to avoid the multiple occurence of the dashboard_protocol
condition.
It also fixes the dashboard certificate and key variables handling in
the condition introduced by ab54fe2. Those variables aren't boolean but
strings so we can test them via the length filter.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-07 09:08:16 +02:00
Guillaume Abrioux ccc11cfc93 handler: followup on #4519
This commit adds some missing `| bool` filters.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-04 10:49:15 -04:00
Dimitri Savineau dd526cfe4e ceph-dashboard: add cluster parameter to ceph cmd
The ceph dashboard tasks didn't use the cluster option if the cluster
name isn't the default value.

Closes: #4529

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-04 16:10:22 +02:00
Guillaume Abrioux 411bd07d54 handlers: refact osd handler
This commit merges the two restart tasks into a single one, this way
it's one task less to notify.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-04 09:42:20 -04:00
Dimitri Savineau 0346871fb5 ceph-handler: don't restart all OSDs with limit
When using the ansible --limit option on one or few OSD nodes and if the
handler is triggered then we will restart the OSD service on all OSDs
nodes instead of the hosts limited by the limit value.
Even if the play is limited by the --limit value we are using all OSD
nodes from the OSD group.

  with_items: '{{ groups[osd_group_name] }}'

Instead we should iterate only on the nodes present in both OSD group and
limit list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-03 14:52:27 -04:00
Dimitri Savineau 780cf36a59 ceph-facts: fix _radosgw_address with block
e695efc introduced a regression in the _radosgw_address fact when using
the radosgw_address_block variable.
There's no item there because we don't use the items lookup. This is
only used for _monitor_address with monitor_address_block.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1758099

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-03 16:15:45 +02:00
Guillaume Abrioux 9bad239d77 common: improve keyrings generation
There is no need to get n * number of nodes the different keyrings.
Adding a `run_once: true` here avoid running a ceph command too many
times which could be impacting large cluster deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-02 13:09:50 +02:00
Dimitri Savineau ec3b687dc4 ceph-facts: use --admin-daemon to get fsid
During the rolling_update scenario, the fsid value is retrieve from the
current ceph cluster configuration via the ceph daemon config command.
This command tries first to resolve the admin socket path via the
ceph-conf command.
Unfortunately this command won't work if you have a duplicate key in the
ceph configuration even if it only produces a warning. As a result the
task will fail.

Can't get admin socket path: unable to get conf option admin_socket for
mon.xxx: warning: line 13: 'osd_memory_target' in section 'osd' redefined

Instead of using ceph daemon we can use the --admin-daemon option
because we already know what the socket admin path value based on the
ceph cluster and mon hostname values.

Closes: #4492

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-02 10:07:13 +02:00
Guillaume Abrioux 272d16e101 validate: fix gpt header check
Check for gpt header when osd scenario is lvm or lvm batch.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-01 11:46:17 -04:00
Guillaume Abrioux ed8616aa66 rbdmirror: rename a file
rename this file to be more generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-01 10:27:51 -04:00
Guillaume Abrioux e08194dd67 rgw: refact tasks directory layout
This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-01 10:27:51 -04:00
Guillaume Abrioux c69816c6b7 rbdmirror: refact tasks directory layout
This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-01 10:27:51 -04:00
Guillaume Abrioux 4636f3f7e2 iscsigw: refact tasks directory layout
This commit moves containerized deployment related files to `./tasks/
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-01 10:27:51 -04:00
Guillaume Abrioux bd64167469 container: isolate systemd tasks
This commit isolates the systemd unit files generation for containers into
separate yml files in order to be able importing each corresponding roles
without playing all tasks.
This is needed so we can run ceph-ansible to render systemd unit files
so they call podman instead of docker.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-10-01 10:27:51 -04:00
Dimitri Savineau 20b1a464ec ceph-facts: update external grafana fact filter
e695efc hasn't been updated with the changes introduced in 9bb11c7 so
the ips_in_ranges filter isn't used for an external grafana instance.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-10-01 10:47:14 +02:00
Guillaume Abrioux e4444d29e0 Revert "ceph-common: install only necesarry ceph-* packages on debian"
This reverts commit 58b27ef0b3.
This is breaking debian based OS deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-09-28 08:04:12 +02:00
Boris Ranto b96c6da832 ceph-defaults: Change the default prometheus port
The old default prometheus port 9090 clashes with cockpit in rhel 8. The
9090 port is reserved for web service administration of machines. We
should change the default to something that does not clash with other
ports used in rhel 8, at least by default. The port 9092 seems like a
good choice in my testing.

Signed-off-by: Boris Ranto <branto@redhat.com>
2019-09-28 04:40:42 +02:00
Johannes Kastl 5cf22e9b31 move python-xml to raw_install_python.yml
The package python-xml is needed for ansible's zypper module to interact with
the zypper package management tool.

roles/ceph-defaults/defaults/main.yml:
Remove python-xml from variable suse_package_dependencies to only
install python-xml on SUSE/openSUSE if python is not found.
raw_install_python.yml already contains all the logic needed to check
if there is a valid python installation, so this is better suited there.

openSUSE Leap 15.x / SLES 15.x do no longer have /usr/bin/python,
only /usr/bin/python3, which already contains the xml module, so
nothing needs to be installed in that case.

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
2019-09-27 14:19:32 +02:00
Harald Jensås e695efcaf7 Replace ipaddr() with ips_in_ranges()
This change implements a filter_plugin that is used in the
ceph-facts, ceph-validate roles and infrastucture-playbooks.
The new filter plugin will return a list of all IP address
that reside in any one of the given IP ranges. The new filter
replaces the use of the ipaddr filter.

ceph.conf already support a comma separated list of CIDRs
for the public_network and cluster_network options.

Changes: [1] and [2] introduced a regression in ceph-ansible
where public_network can no longer be a comma separated list
of cidrs.

With this change a comma separated list of subnet CIDRs can
also be used for monitor_address_block and radosgw_address_block.

[1] commit: d67230b2a2
[2] commit: 20e4852888

Related-To: https://bugs.launchpad.net/tripleo/+bug/1840030
Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=1740283

Closes: #4333
Please backport to stable-4.0

Signed-off-by: Harald Jensås <hjensas@redhat.com>
2019-09-27 10:11:53 +02:00
Dimitri Savineau 74ab59c4f3 ceph-dashboard: Add prometheus api host
The set-prometheus-api-host ceph dashboard subcommand was missing in
ceph-dashboard role. Only grafana and alermanager were present.
This commit also remove the trailing slash at the end of the host/url
values.

Closes: #4453

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-09-27 09:16:12 +02:00
Anthony Rusdi 58b27ef0b3 ceph-common: install only necesarry ceph-* packages on debian
Currently, ceph package only an meta-package that do not contain
actual software, but simply depend on other packages. It's been few
release since debian stretch (official), ubuntu bionic (official),
ubuntu uca repository and upstream debian-jewel.
As we only support nautilus and higher release for master branch,
I propose to drop ceph package and use ceph-base instead for repository
model other than rhcs so debian ceph install will be more minimalis.

Signed-off-by: Anthony Rusdi <33247310+antrusd@users.noreply.github.com>
2019-09-27 01:11:22 +02:00
Dimitri Savineau ca77d7bd31 ceph-nfs: Allow to configure SecType value
Depending on the infrastruture (w/o kerberos auth) then the SecType
value could be different.
Currently this value is hardcoded in the NFS Ganesha template. Instead
we can use a variable.
The default value is still the same to avoid breaking the backward
compatibility.

Closes: #4459

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-09-27 00:33:18 +02:00
liuxu 195f70897c dashboard: add grafana dashboard support on Debian based OS
download grafana dashboard files from github when running on Debian based OS

Signed-off-by: liuxu <liuxu623@gmail.com>
2019-09-26 18:49:56 +02:00
fmount 9bb11c7b2a Inject ceph grafana dashboard layouts
This change just adds the task to inject from the
ceph dashboard mgr module the required layouts
to show all the cluster metrics on the grafana
instance.
Since we're now able to push grafana layouts through
the ceph mgr module command, the dashboards configuration
template is no longer needed on containerized environments.
This commit also fixes the Vagrantfile IP static assigment
in the grafana section because it generates an issue (it's
the same of the mgr instance).
Finally, considering some deployments that use an external
grafana server instance, we reworked the 'grafana_server_addr'
assignment to address these requirements.

Signed-off-by: fmount <fpantano@redhat.com>
2019-09-26 11:12:20 -04:00
Guillaume Abrioux 167737dd3d iscsigw: install python-requests
Typical error at rbd-target-api startup:

```
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: Traceback (most recent call last):
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: File "/usr/bin/rbd-target-api", line 39, in <module>
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: from gwcli.utils import (APIRequest, valid_gateway, valid_client,
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: File "/usr/lib/python2.7/site-packages/gwcli/utils.py", line 1, in <module>
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: import requests
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: ImportError: No module named requests
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-09-26 11:35:24 +02:00
Guillaume Abrioux 5bb6a4da42 tests: set copy_admin_key at group_vars level
setting it at extra vars level prevent from setting it per node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-09-26 11:35:24 +02:00
Guillaume Abrioux ab370b6ad8 global: remove fetch_directory dependency
This commit drops the fetch_directory dependency.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-09-26 11:35:24 +02:00
Guillaume Abrioux 09e04a9197 osd: add wal_devices option support to ceph_volume module
This commit adds the `wal_devices` option support to the
ceph_volume module.
passing a devices list in `bluestore_wal_devices` will make ceph-volume
creating 1 vg using these devices to create block.wal partitions.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-09-26 11:35:24 +02:00
Guillaume Abrioux 70f1b37097 osd: update doc text in defaults/main.yml
This commit removes ceph-disk references.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-09-26 11:35:24 +02:00
Guillaume Abrioux 7b836eaa47 osd: add block_db_devices option support to ceph_volume module
This commit adds the `block_db_devices` option support to the
ceph_volume module.
passing a devices list in `dedicated_devices` will make ceph-volume
creating 1 vg using these devices to create block.db partitions for data
devices.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-09-26 11:35:24 +02:00
Guillaume Abrioux 2b97ac921b validate: check ceph_docker_registry_* length
This commit adds a condition to check whether these variables are empty.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-09-18 16:03:18 +02:00
Dimitri Savineau 9f4a99fb24 container: Allow to use registry authentication
The registry.redhat.io regsitry requires authentication so before pulling
the RHCS 4 container images from the registry we need to do the login
step.
This is done via the new ceph_docker_registry_auth variable. The
default value is false but true for RHCS setup.
When set to true, you need to provide the username and password
for the registry via the associated variables.
This patch also updates the ceph_docker_registry value for RHCS setup.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1748911

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-09-18 16:03:18 +02:00
Dimitri Savineau 5b1c15653f ceph-handler: Fix osd restart condition
In containerized deployment, the restart OSD handler couldn't be
triggered in most ansible execution.
This is due to the usage of run_once + a condition on the inventory
hostname and the last filter.
The run_once is triggered first so ansible will pick a node in the
osd group to execute the restart task. But if this node isn't the
last one in the osd group then the task is ignored. There's more
probability that the task will be ignored than executed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-09-10 15:56:53 -04:00
Dimitri Savineau 1f505628dd rbd-mirror: Allow to copy the admin keyring
The ceph-rbd-mirror role allows to copy the admin keyring via the
copy_admin_key variable but there's actually no task in that role
doing the job.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-09-10 15:44:04 -04:00
Dimitri Savineau a3d36df025 rbd-mirror: Use the rbd mirror client keyring
The admin keyring isn't present by default on the rbd mirror nodes so
the rbd commands related to the mirroring confguration will fail.
Instead we can use the rbd mirror client keyring.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-09-10 15:44:04 -04:00
Giulio Fidente d2a2bd7c42 Look for additional names when checking ceph-nfs container status
Ganesha cannot be operated active/active, in those deployments
where it is managed by pacemaker the container name can be
different than the default.

This change uses "ceph_nfs_service_suffix" where previously
missing to ensure tasks will work with customized names.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1750005
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
2019-09-09 15:27:37 -04:00
Harald Jensås d94229204d Support comma-delimited subnets in firewall
ceph.conf supports a comma separated list of
subnet CIDR's for the public_network and the
cluster network. ceph-ansible should support
setting up the firewall for this configuration.

Closes: #4425
Related: #4333
https://docs.ceph.com/docs/nautilus/rados/configuration/network-config-ref/#network-config-settings

Signed-off-by: Harald Jensås <hjensas@redhat.com>
2019-09-09 15:20:58 -04:00
Dimitri Savineau 7e5e21741e rbd-mirror: configure pool and peer
The rbd mirror configuration was only available for non containerized
deployment and was also imcomplete.
We now enable the mirroring on the pool and add the remote peer in both
scenarios.

The default mirroring mode is set to 'pool' but can be configured via
the ceph_rbd_mirror_mode variable.

This commit also fixes an issue on the rbd mirror command if the ceph
cluster name isn't using the default value (ceph) due to a missing
--cluster parameter to the command.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-09-06 11:00:55 -04:00
fmount 81eb091533 Fix discovered_interpreter_python variable
This change fixes the discovered_interpreter_python variable
name that was "discovered_python_interpreter" and caused a
failure in OSP deployments.

Signed-off-by: fmount <fpantano@redhat.com>
2019-09-04 09:55:30 -04:00
Dimitri Savineau 42082c0a27 lint: fix error [201,206]
[201] Trailing whitespace
 [206] Variables should have spaces before and after: {{ var_name }}

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-29 14:28:35 -04:00
Dimitri Savineau 65089a7fc3 ceph-common: remove ceph_stable repo on dev
When upgrading from stable to devel release with redhat community
packages, the rpm packages are not updated due to priority introduced
via a7b1e35 (starting nautilus).
We need to remove the ceph stable repositories when configuring the
dev repositories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-29 14:05:13 -04:00
Dimitri Savineau 5e5d5c2d87 Add octopus release
Add the 15th ceph release: octopus.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-29 14:05:13 -04:00
fmount 8a666bfd15 Add http_addr option to grafana config
We have no reason to make grafana container
listen on *:<port>, so this change adds the
http_addr option to the grafana config file
and adds the related option on the wait_for
tasks.
Since grafana_server_addr should exists, we
shouldn't rely on the _current_monitor_addr
default on prometheus/grafana templates.
This change also remove this default value
that is not necessary anymore.

Signed-off-by: fmount <fpantano@redhat.com>
2019-08-29 13:00:22 -04:00
Anthony Rusdi 4c592066b7 ceph_custom_repo: define apt and rpm key for custom repo
This commit also remove the notify on new added debian repo,
force update_cache to yes and define sample ceph_custom_key vars.

Signed-off-by: Anthony Rusdi <33247310+antrusd@users.noreply.github.com>
2019-08-29 10:25:10 -04:00
Johannes Kastl 0cedc4d303 openSUSE OBS repo using ceph_stable_release
Instead of hardcoding `luminous`, use the `ceph_stable_release` variable
to point to the correct repository.

This is now uncommented in roles/ceph-defaults/defaults/main.yml to be
available, as it is only used if ceph_repository is set to 'obs'.

group_vars/*.sample files have been regenerated using the
./generate_group_vars_sample.sh script.

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
2019-08-29 10:23:56 -04:00
Johannes Kastl 4711a7d626 fix openSUSE OBS repo creation
roles/ceph-common/tasks/installs/suse_obs_repository.yml:
ansible's zypper_repository module does not know a parameter 'uri', this is
called 'repo' instead

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
2019-08-29 10:23:07 -04:00
Nick Erdmann 7953ee1b81 ceph-infra: open ceph iscsi/prometheus port
Signed-off-by: Nick Erdmann <n@nirf.de>
2019-08-28 16:09:55 -04:00
Johannes Kastl bd507fa147 set discovered_python_interpreter if ansible_python_interpreter is defined
If the user has set the `ansible_python_interpreter`, ansible will not try to
discover python, so `discovered_python_interpreter` will not be set.

Solution: Set `discovered_python_interpreter` to `ansible_python_interpreter`
if `ansible_python_interpreter` is defined

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
2019-08-27 20:54:59 +02:00
Dimitri Savineau 2b0616ecca ceph-mon: Bind mount the ca-trust directory
On containerized deployment, the mon container sometimes needs to
access to the radosgw endpoint (via the radosgw-admin command). When
using TLS on the radosgw with self-signed certificates then we need to
access to the CA certification from the mon container.
The CA certificate needs to be added on the host and then the directory
will be bind mount on the container.

Resolves: #4358

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-27 20:53:45 +02:00
Dimitri Savineau 49aa05b96c ceph-client: Use profile rbd in keyring caps
Like the OpenStack keyrings, we can use the profile rbd for the clients
keyring (both mon and osd).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-27 20:52:23 +02:00
Dimitri Savineau 717af83475 Revert "osd: add 'osd blacklist' cap for osp keyrings"
This reverts commit 2d955757ee.

The "osd blacklist" isn't an osd caps but should be used with mon caps.
Also the correct caps for this is: 'allow command "osd blacklist"'.
The current change is breaking the openstack and clients keyrings.
By using the profile rbd (which is already used) we already rely on the
ability to blacklist dead client.

Resolves: #4385

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-27 20:52:23 +02:00
Guillaume Abrioux 5986b26a01 global: add newline at end of file
This commit re-add a newline at end of files when it's missing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-23 15:56:47 +02:00
Artur Fijalkowski 011270ca69 global: make directories mode parameterizable
This commit makes it possible to parametrize the ceph directories modes.
So it changes hardocded mode for ceph related directories from 0755 to
customizable with `ceph_directories_mode` variable.

Closes: #2920

Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-23 09:38:17 +02:00
guihecheng a0590cae9d rgw/multisite: assign 'rgw_zone' to the exact section in ceph.conf
since the following commit:
  commit 1ac94c048f
  rgw: add support for multiple rgw instances on a single host

we have multi-instance rgw support on a single host and
the config section name of the rgw changed from
[client.rgw.$(hostname)] -> [client.rgw.$(hostname).rgwX]
when X is the sequence number: 0,1,2,...
So we should assign 'rgw_zone' item to the exact rgw instance
config section in ceph.conf

Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
2019-08-23 08:14:10 +02:00
Guillaume Abrioux 327d564106 lint: fix error [301], add `changed_when: false` when needed
This commit fixes the error [301]:

`[301] Commands should not change things if nothing needs doing`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-23 00:23:47 +02:00
Guillaume Abrioux 102edaeb61 lint: fix error [306], add pipefail on shell command using pipe
This commit fixes the error [306]:

`[306] Shells that use pipes should set the pipefail option`

using `/bin/bash` as executable because Debian/Ubuntu systems use `dash`
by default which doesn't have the `-o pipefail`. (See:
https://github.com/ansible/ansible-lint/issues/497#issue-424623501)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-23 00:23:47 +02:00
Johannes Kastl efd38ecc88 ceph-validate: Refactor check for installation check on SUSE/openSUSE
Move the validation from roles/ceph-common/tasks/installs/install_on_suse.yml
to roles/ceph-validate/ and fix the syntax.

There are two valid combinations of `ceph_origin` and `ceph_repository` on
SUSE/openSUSE:
- ceph_origin == 'distro'
- ceph_origin == 'repository' and ceph_repository == 'obs'

The current when condition would fail even in the valid second combination,
as ceph_origin != distro would be true then

Fixes: #4362

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
2019-08-22 20:22:13 +02:00
Johannes Kastl e1b9312084 facts: fix a typo
This commit fixes a typo in roles/ceph-facts/tasks/facts.yml

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
2019-08-22 18:08:28 +02:00
Kevin Coakley e11cbbbcb1 ceph-config: Set changed_when to false on fact gathering statements
The "run 'ceph-volume lvm batch --report' to see how many osds are to be
created" and "run 'ceph-volume lvm list' to see how many osds have already been
created" statements only register the lvm_batch_report and lvm_list variables.
Running those ceph-volume commands should never produce a change on the system.
Adding changed_when: false prevents irrelevant change messages from Ansible.

Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
2019-08-22 17:27:58 +02:00
Johannes Kastl 8e3511ddc7 fix SUSE/openSUSE naming
As SUSE 15.x and openSUSE Leap 15.x share the same base, make clear
that both are targeted by the respective tasks

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
2019-08-22 17:20:21 +02:00
Johannes Kastl cdbe958e55 roles/ceph-validate/tasks/check_system.yml: fail on unsupported SUSE versions
Fail if SUSE distributions other than 15.x are found, similar to what we have
for openSUSE

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
2019-08-22 17:17:21 +02:00
Dimitri Savineau 9a4ac46d19 ceph-osd: Add ulimit nofile on container start
On containerized deployment, the OSD entrypoint runs some ceph-volume
commands (lvm/simple scan and/or activate) which perform badly without
the ulimit option.
This option was added for all previous ceph-volume commands but not on
the ceph-osd container startup.
Also updating hard limit value to 4096 to reflect default baremetal
value.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-22 16:59:08 +02:00
Johannes Kastl 11aa5dbb58 ceph-nfs: fail on openSUSE Leap using distro packages
roles/ceph-validate/tasks/check_nfs.yml: fail on openSUSE Leap
using `ceph_origin = distro`, as the ganesha packages are not available from
the distribution repositories

Fixes: #4342

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
2019-08-21 09:58:54 +02:00
Johannes Kastl c721cb99cb install ceph-mds packages on SUSE/openSUSE
install packages on SUSE/openSUSE distributions, using the
same logic as on RedHat-based distributions

Fixes #4340

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
2019-08-21 09:57:56 +02:00
Guillaume Abrioux 9329bbb3af handler: do not validate the server certificate against the CA
Otherwise rgw handler ends up with an error when using https.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-20 13:52:15 +02:00
Johannes Kastl 504017d562 remove duplicate task installing suse dependencies
roles/ceph-common/tasks/installs/install_on_suse.yml: remove the task that
installs the dependencies, as this is done later in install_suse_packages.yml

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
2019-08-20 12:59:25 +02:00
Guillaume Abrioux 70cf2a5846 osd: remove useless condition
just like `ceph_osd_pool_default_size`, a pool size might change after an
initial deployment. Having this condition prevents from customizing the
pool in that case.
This is not needed so let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-19 16:17:22 +02:00
Guillaume Abrioux 4df92152c0 common: replace shell module
there is no need to use `shell` in these tasks. Let's use `command`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-14 16:42:02 +02:00
Guillaume Abrioux 687087fd43 osd: refact 'wait for all osd to be up' task
let's use `until` instead of doing test in bash using python oneliner
also, use `command` instead of `shell`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-14 16:42:02 +02:00
Guillaume Abrioux 13815ad3ca common: use discovered_interpreter_python fact
in order to use the right binary name when using python cli in command
or shell module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-14 16:42:02 +02:00
Guillaume Abrioux a5e359ee80 osd: update the check for 'all osd to be up'
the data structure has changed in octopus.
eg: the path to `num_osds` is now `["osdmap"]["num_osds"]`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-14 16:42:02 +02:00
Guillaume Abrioux 5b9b841108 mgr: refact 'wait for all mgr to be up' task
There's no need to use `shell` module here.
Instead of using `| python -c`, let's use `from_json` filter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-08-07 10:33:54 +02:00
Dimitri Savineau 4c6ec1dccb mgr/dashboard: Fix grafana/prometheus url config
When configuring grafana/prometheus embed in the mgr/dashboard, we need
to use the address of the grafana-server node and not the current
hostname because mgr/dashboard and grafana/prometheus could be present
on different hosts.
We should instead rely on the grafana_server_addr variable and remove
the dashboard_url.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-06 09:34:20 +02:00
Dimitri Savineau f545b5be0d ceph-dashboard: Add run_once on delegate tasks
Because we need to execute commands from a monitor node (the first one
in the mons list) we are using delegate_to option.
If there's multiple nodes running the ceph-dashboard role then the
delegated task will be executed multiple times.
Also remove a mgr config-key option not present for nautilus+ releases.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-06 09:34:20 +02:00
Johannes Kastl 5ee3d96fb4 only support openSUSE Leap 15.x, fail on 42.x
openSUSE switched from 'openSUSE 13.x' to 'openSUSE Leap 42.x' and then to
'openSUSE Leap 15.x' to align with SLES15 development.
The previous logic did not correctly allow the current release, as 15.x matched
the 'less than 42.3' condition.

For now only support openSUSE Leap 15.x, and extend support once 16.x is
released (or whatever the exact version will be)

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
2019-08-05 09:46:31 -04:00
Dimitri Savineau 771f25b1f8 ceph-infra: Apply firewall rules with container
We don't have a reason to not apply firewall rules on the host when
using a containerized deployment.
The TripleO environments already manage the ceph firewall rules outside
ceph-ansible and set the configure_firewall variable to false.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1733251

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-01 15:16:49 +02:00
Dimitri Savineau 34036c667c ceph-grafana: Set grafana uid/gid on files
We don't need to create a grafana system user (in fact we even don't
set the righ uid to this user) because we're using a container setup.
Instead we just need to be sure to set the owner/group to 472 (grafana
user/group from the container) like we do for ceph/167.
We don't need to set the user/group recursively on /etc/grafana
directory in a dedicated task.
Also on Ubuntu system, the ceph-grafana-dashboards isn't present so on
non containerized deployment we won't have the
/etc/grafana/dashboards/ceph-dashboard directory present (coming with
the package) so we need to be sure it exists.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-08-01 10:10:56 +02:00
Guillaume Abrioux c9d80af4e0 dashboard: fix timeout usage on rgw user creation command
For some reason, this is making the playbook failing like following:

```
TASK [ceph-dashboard : create radosgw system user] ************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
task path: /home/guits/ceph-ansible/roles/ceph-dashboard/tasks/configure_dashboard.yml:106
Tuesday 30 July 2019  10:04:54 +0200 (0:00:01.910)       0:11:22.319 **********
FAILED - RETRYING: create radosgw system user (3 retries left).
FAILED - RETRYING: create radosgw system user (2 retries left).
FAILED - RETRYING: create radosgw system user (1 retries left).
fatal: [mgr0 -> mon0]: FAILED! => changed=true
  attempts: 3
  cmd: timeout 20 podman exec ceph-mon-mon0 radosgw-admin user create --uid=ceph-dashboard --display-name='Ceph dashboard' --system
  delta: '0:00:20.021973'
  end: '2019-07-30 08:06:32.656066'
  msg: non-zero return code
  rc: 124
  start: '2019-07-30 08:06:12.634093'
  stderr: 'exec failed: container_linux.go:336: starting container process caused "process_linux.go:82: copying bootstrap data to pipe caused \"write init-p: broken pipe\""'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

using `timeout -f -s KILL` fixes this issue.

Also, there is no need to use `shell` module here, let's switch to
`command`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-30 13:52:44 +02:00
Guillaume Abrioux 2d955757ee osd: add 'osd blacklist' cap for osp keyrings
This commits adds the `osd blacklist` cap on all OSP clients keyrings.

Fixes: #2296

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-29 09:57:25 -04:00
Dimitri Savineau d549fffdd2 ceph-osd: check container engine rc for pools
When creating OpenStack pools, we only check if the return code from
the pool list command isn't 0 (ie: if it doesn't exist). In that case,
the return code will be 2. That's why the next condition is rc != 0 for
the pool creation.
But in containerized deployment, the return code could be different if
there's a failure on the container engine command (like container not
running). In that case, the return code could but either 1 (docker) or
125 (podman) so we should fail at this point and not in the next tasks.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-29 15:55:04 +02:00
Guillaume Abrioux 02beb00916 validate: add checks for grafana-server group definition
this commit adds two checks:
- check that the `[grafana-server]` group is defined
- check that the `[grafana-server]` contains at least one node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-29 14:42:45 +02:00
Guillaume Abrioux ec33ee7574 mgr: fix a typo
this tasks isn't using the right container_exec_cmd, that's delegating
to the wrong node.
Let's use the right fact to fix this command.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-29 14:42:45 +02:00
Guillaume Abrioux b9cdf341be dashboard: remove cfg80211 module installation
According to this comment [1], this seems to be needed to detect wifi
devices.

In node exporter we can see this:

```
--collector.wifi          Enable the wifi collector (default: disabled).
```

since it's enabled by default and we don't even change this in our
systemd templates for node-exporter, we can easily assume in the end
it's not needed. Therefore, let's remove this.

[1] dbf81b6b5b (diff-961545214e21efed3b84a9e178927a08L21-L23)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-29 14:42:45 +02:00
Guillaume Abrioux d67230b2a2 dashboard: use dedicated group only
There's no need to add complexity and trying to fallback on other group.
Let's deploy dashboard on all nodes present in grafana-server group.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-29 14:42:45 +02:00
Guillaume Abrioux fb1b5b3251 dashboard: enable dashboard by default
This commit enables dashboard deployment by default.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1726739

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-29 14:42:45 +02:00
Dimitri Savineau 07c6695d16 Remove NBSP characters
Some NBSP are still present in the yaml files.
Adding a test in travis CI.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-26 16:09:23 -04:00
Guillaume Abrioux 19950b5170 container: rename docker directories
Those 2 directories should be renamed to be more generic (docker vs.
podman).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-24 16:31:46 +02:00
fmount fac1b030cb Avoid to setup provisioners in a fully containerized environment
This commit adds a when clause to avoid the setup of grafana
provisioners in a fully containerized scenario.
This is needed when the ceph-grafana-dashboards package is not
installed and this task could result in a wrong grafana
configuration that let the container crash.

Signed-off-by: fmount <fpantano@redhat.com>
2019-07-23 09:06:50 +02:00
Giulio Fidente edd1420217 Fix backward compat with old cephfs_pools format
Previously cephfs_pools items used to have a pgs: key but not
pgp_num: nor pg_num:

Signed-off-by: Giulio Fidente <gfidente@redhat.com>
2019-07-19 11:56:58 -04:00
Guillaume Abrioux 618dbf271d handler: fix bug in osd handlers
fbf4ed42ae introduced a bug when
container binary is podman.
podman doesn't support ps -f using regular expression, the container id
is never set in the restart script causing the handler to fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721536

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-18 16:22:51 +02:00
Guillaume Abrioux 487d701685 validate: fail if gpt header found on unprepared devices
ceph-volume will complain if gpt headers are found on devices.
This commit checks whether a gpt header is present on devices passed in
`devices` variable and fail early.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1730541

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-18 07:43:55 +02:00
Dimitri Savineau 5383c2f7f3 ceph-dashboard: enable rgw options conditionally
The dashboard rgw frontend options only need to be applied when there's
some nodes present in the rgw ansible group.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-18 07:22:13 +02:00
Dimitri Savineau 8ab9b719fa dashboard: use variables for port value
The current port value for alertmanager, grafana, node-exporter and
prometheus is hardcoded in the roles so it's not possible to change the
port binding of those services.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-18 07:22:13 +02:00
Dimitri Savineau 0ae0193144 ceph-infra: update handler with daemon variable
Both ntp and chrony daemon use variable for the service name because it
could be different depending on the GNU/Linux distribution.
This has been update in 9d88d3199 for chrony but only for the start part
not for the handler.
The commit fixes this for both ntp and chrony.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-12 09:14:33 -04:00
Dimitri Savineau 41b44dde85 ceph-infra: Open prometheus port
The Prometheus porrt 9090 isn't open in the firewall configuration.
Also the dashboard task on the grafana node was not required because
it's already present on the mgr node.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-11 13:40:22 +02:00
Guillaume Abrioux ee29f7370a handler: remove legacy condition
since everything is already in a block with the same condition, it's not
needed to leave all of them on these tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-10 09:42:00 -04:00
Guillaume Abrioux e6dc3ebd8c validate: improve message printed in check_devices.yml
The message prints the whole content of the registered variable in the
playbook, this is not needed and makes the message pretty unclear and
unreadable.

```
"msg": "{'_ansible_parsed': True, 'changed': False, '_ansible_no_log': False, u'err': u'Error: Could not stat device /dev/sdf - No such file or directory.\\n', 'item': u'/dev/sdf', '_ansible_item_result': True, u'failed': False, '_ansible_item_label': u'/dev/sdf', u'msg': u\"Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdf -- unit 'MiB' print'\", u'rc': 1, u'invocation': {u'module_args': {u'part_start': u'0%', u'part_end': u'100%', u'name': None, u'align': u'optimal', u'number': None, u'label': u'msdos', u'state': u'info', u'part_type': u'primary', u'flags': None, u'device': u'/dev/sdf', u'unit': u'MiB'}}, 'failed_when_result': False, '_ansible_ignore_errors': None, u'out': u''} is not a block special file!"
```

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1719023

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-10 09:32:11 -04:00
Dimitri Savineau 1f2a4f1910 ceph-iscsi: Update gateway config/template
- Remove gateway_keyring from the configuration file because it's
not used in ceph-iscsi 3.x release.
- Use config_template instead of template module for iscsi-gateway
configuration file. Because the file is an ini file and we might want
to override more parameters than those present in ceph-ansible.
- Because we can now set the pool name in the configuration, we should
use a variable for that. This is refact with the iscsi_pool_* variables
also used to configure the pool size.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-10 09:44:40 +02:00
Dimitri Savineau 5413274412 ceph-dashboard: remove bool filter for rgw vars
Some dashboard_rgw_api_* variables are using the bool filter but those
variables are strings with an empty string as default value.
So we should test the variable against an empty string instead of a
bool.

dashboard_rgw_api_host: ''
dashboard_rgw_api_port: ''
dashboard_rgw_api_scheme: ''
dashboard_rgw_api_admin_resource: ''

Resolves: #4179

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-10 09:42:37 +02:00
Boris Ranto 21758fcee8 dashboard: Use upstream default port
We are currently using incorrect dashboard default port. The upstream
uses 8443 instead of 8234 by default. This should get us closer to the
upstream project.

Signed-off-by: Boris Ranto <branto@redhat.com>
2019-07-10 09:17:36 +02:00
Dimitri Savineau de7f948b75 ceph-handler: fix cluster name in socket path
c90f605b5 introduces the default ceph cluster name value in the rgw
socket path for the rgw restart script. But this should use the
`cluster` variable instead.
This commit also fixes this in the osd restart script.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-08 13:55:35 -04:00
fmount 95bd002b35 Add package-install tag on ceph-grafana-dashboard pkg install.
According to the OSP pattern, we need the package-install tag
to control what is installed on the host. This commit just add
the missing tag to meet the TripleO requirements.

See: /issues/4197 for details

Fixes: #4197

Signed-off-by: fmount <fpantano@redhat.com>
2019-07-08 10:54:30 +02:00
Dimitri Savineau 91bef94b6c ceph-iscsi-gw: Update log directories bind mount
On containerized deployment we need to bind mount the ceph-iscsi
directory to avoid writing the logs in the container.
The /var/log/ceph directory isn't use by rbd-targe-api/gw services
because they have their own log directories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-07-07 07:25:33 +02:00
ilyashestopalov 904532c5e2 ceph-mon: Fix cluster name parameter
The ability to add nodes with the monitor role to an existing cluster
whose name differs from the default name is fixed.

Signed-off-by: ilyashestopalov <usr.tester@yandex.ru>
2019-07-07 07:21:29 +02:00
Guillaume Abrioux a781ce881c iscsi: refact deprecated variables
This commit moves some old variables into ceph-defaults so we can move
the `use_new_ceph_iscsi` fact in ceph-facts role in order.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-07-03 22:13:19 +02:00
Mike Christie 08a6d10c32 igw: Add check for missing iqn
If the user is still using the older packages and does not setup
the target iqn you will just get a vague error message later on.
This adds a check during the validate task, so it is clear to the
user.

Signed-off-by: Mike Christie <mchristi@redhat.com>
2019-07-03 22:13:19 +02:00
Mike Christie 75fee55d19 igw: Update iscsigws.yml.sample for ceph-iscsi support
Update iscsigws.yml.sample to document that we cannot use ansible to
setup iSCSI objects and use the new ceph-iscsi package.

Signed-off-by: Mike Christie <mchristi@redhat.com>
2019-07-03 22:13:19 +02:00