Commit Graph

4651 Commits (da9891da1e8b9a8c91077c74e54a9df8ebb7070d)
 

Author SHA1 Message Date
Dimitri Savineau da9891da1e ceph-handler: replace fuser by /proc/net/unix
We're using fuser command to see if a process is using a ceph unix
socket file. But the fuser command runs through every PID present in
/proc/<PID> to see if one of them is using the file.
On a system running thousands processes, the fuser command can take
a long time to finish.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1717011

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-12 19:31:21 +02:00
Guillaume Abrioux 905c2256bd mon: enforce mon0 delegation for initial_mon_key register
since this task is designed to be always run on the first monitor, let's
enforce the container name accordingly otherwise it could fail like
following:

```
fatal: [mon1 -> mon0]: FAILED! => changed=true
  cmd:
  - docker
  - exec
  - ceph-mon-mon1
  - ceph
  - --cluster
  - ceph
  - --name
  - mon.
  - -k
  - /var/lib/ceph/mon/ceph-mon0/keyring
  - auth
  - get-key
  - mon.
  delta: '0:00:00.085025'
  end: '2019-06-12 06:12:27.677936'
  msg: non-zero return code
  rc: 1
  start: '2019-06-12 06:12:27.592911'
  stderr: 'Error response from daemon: No such container: ceph-mon-mon1'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-12 11:21:19 -04:00
Guillaume Abrioux 9e4e692c61 tests: remove unused variable
`e MGR_DASHBOARD=0` isn't needed anymore here, let's remove this legacy.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-12 10:41:01 -04:00
Guillaume Abrioux 8dd774a99b tests: update docker image tag used in ooo job
ceph-ansible@master isn't intended to deploy luminous.
Let's use latest-master on ceph-ansible@master branch

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-12 10:41:01 -04:00
Guillaume Abrioux 27856cc499 dashboard: add allow_embedding support
Add a variable to support the allow_embedding support.

See ceph/ceph-ansible/issues/4084 for details.

Fixes: #4084

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-12 16:00:32 +02:00
Guillaume Abrioux 2c9cd9d9e7 dashboard: fix dashboard_url setting
This setting must be set to something resolvable.

See: ceph/ceph-ansible/issues/4085 for details

Fixes: #4085

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-12 15:59:58 +02:00
Dimitri Savineau d0840217f3 ceph-node-exporter: Fix systemd template
069076b introduced a bug in the systemd unit script template. This
commit fixes the options used by the node-exporter container.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-11 21:48:40 +02:00
Dimitri Savineau dbf81b6b5b ceph-node-exporter: use modprobe ansible module
Instead of using the modprobe command from the path in the systemd
unit script, we can use the modprobe ansible module.
That way we don't have to manage the binary path based on the linux
distribution.

Resolves: #4072

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-11 21:40:50 +02:00
fmount 069076bbfd Fix units and add ability to have a dedicated instance
Few fixes on systemd unit templates for node_exporter and
alertmanager container parameters.
Added the ability to use a dedicated instance to deploy the
dashboard components (prometheus and grafana).
This commit also introduces the grafana_group_name variable
to refer grafana group and keep consistency with the other
groups.
During the integration with TripleO some grafana/prometheus
template variables resulted undefined. This commit adds the
ability to check if the group exist and create, accordingly,
different job groups in prometheus template.

Signed-off-by: fmount <fpantano@redhat.com>
2019-06-10 18:18:46 +02:00
Guillaume Abrioux 771648304d validate: fail in check_devices at the right task
see https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17 for details.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-07 16:14:18 +02:00
Guillaume Abrioux c933645bf7 spec: bring back possibility to install ceph with custom repo
This can be seen as a regression for customers who were used to deploy
in offline environment with custom repositories.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1673254

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-07 15:21:42 +02:00
Dimitri Savineau f49090df7e podman: Add systemd dependency on network.target
When using podman, the systemd unit scripts don't have a dependency
on the network. So we're not sure that the network is up and running
when the containers are starting.
With docker this behaviour is already handled because the systemd
unit scripts depend on docker service which is started after the
network.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-07 09:28:58 +02:00
Dimitri Savineau 44c63903ca purge-cluster: clean all ceph repo files
We currently only purge rh_storage yum repository file but depending
on the ceph_repository value we are using, the ceph repository file
could have a different name.

Resolves: #4056

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-07 09:28:14 +02:00
guihecheng 59e702ec39 Add section for purging rgw loadbalancer in purge-cluster.yml
Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
2019-06-06 17:12:04 +02:00
guihecheng 96c346743b Add section for rgw loadbalancer in site.yml
This drives ceph rgw loadbalancer stuff to run.

Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
2019-06-06 17:12:04 +02:00
guihecheng 35d40c65f8 Add role definitions of ceph-rgw-loadbalancer
This add support for rgw loadbalancer based on HAProxy and Keepalived.
We define a single role ceph-rgw-loadbalancer and include HAProxy and
Keepalived configurations all in this.

A single haproxy backend is used to balance all RGW instances and
a single frontend is exported via a single port, default 80.

Keepalived is used to maintain the high availability of all haproxy
instances. You are free to use any number of VIPs. A single VIP is
shared across all keepalived instances and there will be one
master for one VIP, selected sequentially, and others serve as
backups.
This assumes that each keepalived instance is on the same node as
one haproxy instance and we use a simple check script to detect
the state of each haproxy instance and trigger the VIP failover
upon its failure.

Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
2019-06-06 17:12:04 +02:00
L3D ab54fe20ec ansible: use 'bool' filter on boolean conditionals
By running ceph-ansible there are a lot ``[DEPRECATION WARNING]`` like these:
```
[DEPRECATION WARNING]: evaluating containerized_deployment as a bare variable,
this behaviour will go away and you might need to add |bool to the expression
in the future. Also see CONDITIONAL_BARE_VARS configuration toggle.. This
feature will be removed in version 2.12. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
```

Now appended ``| bool`` on a lot of the affected variables.

Sometimes the coding style from ``variable|bool`` changed to ``variable | bool`` *(with spaces at the pipe)*.

Closes: #4022

Signed-off-by: L3D <l3d@c3woc.de>
2019-06-06 10:21:17 +02:00
Dimitri Savineau 518ab794fb container-common: support podman on Ubuntu
Currently we're only able to use podman on ubuntu if podman's
installation is done manually before the ceph-ansible execution
because the deb package is present in an external repository.
We already manage the docker-ce installation via an external
repository so we should be able to allow the podman installation
with the same mechanism too.

https://github.com/containers/libpod/blob/master/install.md#ubuntu

Resolves: #3947

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-05 14:07:34 +02:00
Guillaume Abrioux 80875adba7 ceph-osd: do not relabel /run/udev in containerized context
Otherwise content in /run/udev is mislabeled and prevent some services
like NetworkManager from starting.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-04 11:32:41 -04:00
Guillaume Abrioux a78fb209b1 tests: test podman against atomic os instead rhel8
the rhel8 image used is an outdated beta version, it is not worth it to
maintain this image upstream, since it's possible to test podman with a
newer version of centos/atomic-host image.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-04 11:32:41 -04:00
Dimitri Savineau 2d375e1aa7 site-container: update container-engine role
Since the split between container-engine and container-common roles,
the tags and condition were not updated to reflect the change.

- ceph-container-engine needs with_pkg tag
- ceph-container-common needs fetch_container_images
- we don't need to pull the container image in a dedicated task for
atomic host. We can now use the ceph-container-common role.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-04 14:19:32 +02:00
Dimitri Savineau 616c484698 ceph-nfs: use template module for configuration
789cef7 introduces a regression in the ganesha configuration file
generation. The new config_template module version broke it.
But the ganesha.conf file isn't an ini file and doesn't really
need to use the config_template module. Instead we can use the
classic template module.

Resolves: #4045

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-04 09:11:52 +02:00
Guillaume Abrioux 5457312e62 dashboard: add a last default value in grafana-server host section
If there is no mgrs and mons in the inventory, it will fail with the following error:

```
ERROR! The field 'hosts' has an invalid value, which includes an undefined variable. The error was: 'dict object' has no attribute 'mons'

The error appears to be in '/home/guits/ceph-ansible/site-docker.yml.sample': line 539, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- hosts: '{{ (groups["mgrs"] | default(groups["mons"]))[0] }}'
  ^ here
We could be wrong, but this one looks like it might be an issue with
missing quotes. Always quote template expression brackets when they
start a value. For instance:

    with_items:
      - {{ foo }}

Should be written as:

    with_items:
      - "{{ foo }}"
```

let's add an  `omit` so it just display this message instead:

```
PLAY [[]] *******************
skipping: no hosts matched
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-03 13:36:38 +02:00
Guillaume Abrioux 6e2e30db54 dashboard: move ceph-grafana-dashboards package installation
This commit moves the package installation into ceph-dashboard role.
This is needed to install ceph dasboard json file in
`/etc/grafana/dashboards/ceph-dashboard/`.

Closes: #4026

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-03 13:36:38 +02:00
Guillaume Abrioux 14f5fc3c86 infra: refact dashboard firewall rules
- There is no need to open ports 3000, 8234, 9283 on all nodes.
- Add missing rule for alertmanager (port 9093)

Closes: #4023

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-03 13:36:38 +02:00
Guillaume Abrioux a2b6f44665 dashboard: append mgr modules to ceph_mgr_modules
when `dashboard_enabled` is `True`, let's append `dashboard` and
`prometheus` modules to `ceph_mgr_modules` so they are automatically
loaded.

Closes: #4026

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-03 13:36:38 +02:00
Dimitri Savineau 7503098ca0 remove ceph-agent role and references
The ceph-agent role was used only for RHCS 2 (jewel) so it's not
usefull anymore.
The current code will fail on CentOS distribution because the rhscon
package is only avaible on Red Hat with the RHCS 2 repository and
this ceph release is supported on stable-3.0 branch.

Resolves: #4020

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-03 13:35:50 +02:00
Guillaume Abrioux 003aeea45a validate: add a check for nfs standalone
if `nfs_obj_gw` is True when deploying an internal ganesha with an
external ceph cluster, `ceph_nfs_rgw_access_key` and
`ceph_nfs_rgw_secret_key` must be provided so the
ganesha configuration file can be generated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-03 13:34:38 +02:00
Guillaume Abrioux 6a6785b719 nfs: support internal Ganesha with external ceph cluster
This commits allows to deploy an internal ganesha with an external ceph
cluster.

This requires to define `external_cluster_mon_ips` with a comma
separated list of external monitors.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710358

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-06-03 13:34:38 +02:00
Dimitri Savineau daf92a9e1f ceph-facts: generate fsid on mon node
The fsid generation is done via a python command. When the ansible
controller node only have python3 available (like RHEL 8) then the
python command isn't necessarily present causing the fsid generation
to fail.
We already do some resource creation (like ceph keyring secret) with
the python command too but from the mon node so we should do the same
for fsid.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1714631

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-03 10:11:32 +02:00
Dimitri Savineau 24d0fd7003 vagrant: Default box to centos/7
We don't use ceph/ubuntu-xenial anymore but only centos/7 and
centos/atomic-host.
Changing the default to centos/7.

Resolves: #4036

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-31 10:32:04 -04:00
Kevin Carter 789cef7621 Sync config_template from upstream
This change pulls in the most recent release of the config_template module
into the ceph_ansible action plugins.

Signed-off-by: Kevin Carter <kecarter@redhat.com>
2019-05-28 14:27:02 -04:00
Guillaume Abrioux 4708b7615f tests: add retries on failing tests in testinfra
This commit adds `pytest-rerunfailures` in requirements.txt so we can
retry failing test in testinfra to avoid false positive. (eg: sometimes it
can happen for some reason a service takes too much time to start)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-22 09:47:36 -04:00
Guillaume Abrioux 55420d6253 roles: introduce `ceph-container-engine` role
This commit splits the current `ceph-container-common` role.

This introduces a new role `ceph-container-engine` which handles the
tasks specific to the installation of containers tools (docker/podman).

This is needed for the ceph-dashboard implementation for 2 main reasons:

1/ Since the ceph-dashboard stack is only containerized, we must install
everything needed to run containers even in non containerized
deployments. Splitting this role allows us to not have to call the full
`ceph-container-common` role which would run a bunch of unneeded tasks
that would have been skipped anyway.

2/ The current implementation would have required to run
`ceph-container-common` on all ceph-clients nodes which would have been
conflicting with 9d3517c670 (we don't want
to run ceph-container-common on all client nodes, see mentioned commit
for more details)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-22 13:02:10 +02:00
Dimitri Savineau f37edfa113 ceph-mgr: install python-routes for dashboard
The ceph mgr dashboard requires routes python library to be installed
on the system.

Resolves: #3995

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-22 08:46:16 +02:00
Dimitri Savineau 622d9feae9 common: use gnupg instead of gpg
gpg package isn't available for all Debian/Ubuntu distribution but
gnupg is.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-21 17:53:58 +02:00
Dimitri Savineau 29b0d47c8c ceph-prometheus: fix error in templates
- remove trailing double quotes in jinja templates
- add jinja filename without .j2 suffix

Resolves: #4011

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-21 17:53:39 +02:00
Guillaume Abrioux 6ca7372a2d config: fix ipv6
As of nautilus, if you set `ms bind ipv6 = True` you must explicitly set
`ms bind ipv4 = False` too, otherwise OSDs will still try to pick up an
IPv4 address.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710319

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-21 10:38:00 -04:00
Dimitri Savineau de147469d7 tests: update testinfra release
In order to support ansible 2.8 with testinfra we need to use the
latest release (3.0.x).
Adding ssh-config option to py.test.
Also bumping the pytest and xdist version.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-20 13:04:58 +02:00
Dimitri Savineau 0ee833432e ceph-nfs: apply selinux fix anyway
Because ansible_distribution_version doesn't return minor version on
CentOS with ansible 2.8 we can apply the selinux anyway but only for
CentOS/RHEL 7.
Starting RHEL 8, there's a dedicated package for selinux called
nfs-ganesha-selinux [1].

Also replace the command module + semanage by the selinux_permissive
module.

[1] https://github.com/nfs-ganesha/nfs-ganesha/commit/a7911f

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-20 13:04:58 +02:00
Dimitri Savineau 0c7fd79865 ceph-validate: use kernel validation for iscsi
Ceph iSCSI gateway requires Red Hat Enterprise Linux or CentOS 7.5
or later.
Because we can not check the ansible_distribution_version fact for
CentOS with ansible 2.8 (returns only the major version) we can
fallback by checking the kernel option.

  - CONFIG_TARGET_CORE=m
  - CONFIG_TCM_USER2=m
  - CONFIG_ISCSI_TARGET=m

http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-20 13:04:58 +02:00
Guillaume Abrioux 72d8315299 switch to ansible 2.8
- remove private attribute with import_role.
- update documentation.
- update rpm spec requirement.
- fix MagicMock python import in unit tests.

Closes: #3765

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-20 13:04:58 +02:00
Dimitri Savineau 494746b7a6 common: install dependencies for apt modules
When using a minimal Debian/Ubuntu distribution there's no
ca-certificates and gpg packages installed so the apt modules will
fail:

Failed to find required executable gpg in paths:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

apt.cache.FetchFailedException:
W:https://download.ceph.com/debian-luminous/dists/bionic/InRelease:
No system certificates available. Try installing ca-certificates.

Resolves: #3994

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-20 10:02:43 +02:00
Guillaume Abrioux 7c6a3bf825 dashboard: move the call to ceph-node-exporter
This moves the call to ceph-node-exporter role after
ceph-container-common, otherwise it will try to run container before
docker or podman are installed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-20 09:37:52 +02:00
Dimitri Savineau 0f89a3f7a5 tox: Don't copy infrastructure playbook
Since a1a871c we don't need to copy the infrastructure playbooks
under the ceph-ansible root directory.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-17 18:04:18 +02:00
Dimitri Savineau 638604929b purge-docker-cluster: don't remove data on atomic
Because we don't manage the docker service on atomic (yet) via the
ceph-container-common role then we can't stop docker dans remove
the data.
For now let's do that only for non atomic hosts.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-05-17 09:33:50 +02:00
Guillaume Abrioux 9f0d4d6847 dashboard: move defaults variables to ceph-defaults
There is no need to have default values for these variables in each roles
since there is no corresponding host groups

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-16 16:39:13 +02:00
Guillaume Abrioux e74d80e72f rename docker_exec_cmd variable
This commit renames the `docker_exec_cmd` variable to
`container_exec_cmd` so it's more generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-16 16:39:13 +02:00
Guillaume Abrioux acac24d984 dashboard: fix a typo
6f0643c8e introduced a typo, the role that should be run is
ceph-container-common, not ceph-common

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-16 16:39:13 +02:00
Guillaume Abrioux 17634fc3df tests: add dashboard scenario testing
This commit add a new scenario to test the dashboard deployment via
ceph-ansible.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-05-16 16:39:13 +02:00