Commit Graph

5292 Commits (d3b3c8948ef47d79cb32c328eb8df275fe589ac1)
 

Author SHA1 Message Date
Paulo Matias 38ce02c2ea Allow user to specify grafana_server_fqdn
This is needed to get a TLS certificate to validate correctly.

If unspecified, auto-detected grafana_server_addr is used.

Signed-off-by: Paulo Matias <matias@ufscar.br>
2020-04-07 20:51:23 +02:00
Paulo Matias dac8e1d0a9 Prometheus APIs are only available through plain http
Trying to access these APIs through TLS produces "Could not reach
external API" errors in Ceph dashboard.

Signed-off-by: Paulo Matias <matias@ufscar.br>
2020-04-07 20:51:23 +02:00
Matthew Vernon 7963a76c7a Use a tempfile directory to store restart scripts
Make a tempfile directory and copy the restart scripts there (and then
execute them from there), rather than using insecure known filenames
in /tmp/

This is a partial fix for ceph/ceph-ansible#2937

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2020-04-06 22:55:51 +02:00
Guillaume Abrioux 2cfaa056e0 switch-to-containers: set and unset osd flags
The workflow in this playbook should be the same than in rolling_update,
we should first set noout and nodeep-scrub flags before migrating the
first osd and unset osd flags after the last osd is migrated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-06 17:00:00 +02:00
Guillaume Abrioux 6df7887f87 update: use tasks_from when including ceph-facts
When setting/unsetting osd flags, we can use `tasks_from` when importing
`ceph-facts` role to save some times given that we only need this role
for setting `container_binary`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-06 17:00:00 +02:00
Dimitri Savineau 6617d90733 ceph-mgr: add saml python lib for dashboard SSO
The dashboard SSO mgr module requires the saml python library to be
installed. This is only a valid scenario for RHCS deployment because
the saml python library isn't available in other classic repositories.
This package is present in RHCS Tools repository so we also need to
enable it on the mgr nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1820233

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-04-06 10:11:00 -04:00
Guillaume Abrioux ccfa249919 ceph_key: fetch key when needed
Fetch the key when it is present in the cluster but not on the node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-03 19:41:53 +02:00
Guillaume Abrioux 003defec03 ceph_key: fix idempotency when no secret is passed
553584cbd0 introduced a regression when no
secret is passed, it overwrites the secret each time the task is run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-03 09:51:16 -04:00
Dimitri Savineau 92f538f1af tox: replace testinfra by pytest for add-mgrs
The add-mgrs scenario is still using the testinfra command instead of
pytest so the tests exectution are failling.

ERROR: InvocationError for command could not find executable testinfra

This also adds the missing --ssh-config option to testinfra.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-04-03 09:59:06 +02:00
Dimitri Savineau d10f13fd61 tox-docker2podman: update container image tag
The current docker to podman scenario is using the nautilus container
image tag instead of master.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-04-02 17:19:39 -04:00
Dimitri Savineau 6264f6979e vagrant: force centos 8.1 libvirt image
The current centos/8 vagrant image (libvirt) is still using the
CentOS 8.0 release (1905) while the 8.1 release (1911) is already
available since few months.
Using an update CentOS 8 release fixes slow ceph-volume/lvm commands.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-04-02 17:55:21 +02:00
Guillaume Abrioux 4a4f54f6ee docker2podman: call `container_options_facts.yml` on osd nodes
We must call `ceph-osd` role from `container_options_facts.yml` because
ceph-osd-run.sh.j2 needs variables set in this file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1819681

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-02 07:56:15 +02:00
Guillaume Abrioux 553584cbd0 ceph_key: remove 'update' state
With this change, the state `present` is enough to update a keyring.
If the keyring already exist, it will be updated if caps or secret
passed to the module are different.
If the keyring doen't exist, it will be created.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808367

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-04-01 16:21:11 -04:00
Guillaume Abrioux 1bb9860dfd osd: use default crush rule name when needed
When `rule_name` isn't set in `crush_rules` the osd pool creation will
fail.
This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with
the default crush rule name.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 14:49:38 -04:00
Guillaume Abrioux 8c1c34b201 tests: add more coverage in external_clients scenario
Run create_users_keys.yml in external_clients scenario

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 14:49:38 -04:00
Guillaume Abrioux 5b0476385c osd: support changing default rule even when osd_crush_location isn't defined
Creating crush rules even with no crush hierarchy configuration is a
valid scenario so we shouldn't be bound to the first task result (which
configure crush hierarchy) to be able to add new crush rules.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 14:26:48 -04:00
Guillaume Abrioux 9219991441 remove *docker*.yml symlinks
This commits removes these two symlinks.
They were there for backward compatibility and were marked deprecated as
of stable-4.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 10:03:22 -04:00
Guillaume Abrioux 5e7962ccf6 purge-container: get *all* osds id
Adding `--all` to the `systemctl list-units` command in order to get
*all* osds id on the node (including stoppped osds). Otherwise, it will
purge the cluster but there will be leftover after that.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814542

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-31 09:37:30 -04:00
Dimitri Savineau 64701437de container: remove ulimit nofile parameter
Since Ceph Octopus is python3 only we don't need to specify the max open
files anymore with the container engine.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-30 09:54:23 +02:00
Dimitri Savineau 760b6cd7b0 ceph_volume: fix multiple db/wal/journal devices
When using the lvm batch ceph-volume subcommand with dedicated devices
for filestore (journal) or bluestore (db/wal) then the list of devices
is convert to a string instead of being extended via an iterable.
This was working with only one dedicated device but starting with more
then the ceph_volume module fails.

TASK [ceph-osd : use ceph-volume lvm batch to create bluestore osds] **
fatal: [xxxxxx]: FAILED! => changed=true
  cmd:
  - ceph-volume
  - --cluster
  - ceph
  - lvm
  - batch
  - --bluestore
  - --yes
  - --prepare
  - --osds-per-device
  - '4'
  - /dev/nvme2n1
  - /dev/nvme3n1
  - /dev/nvme4n1
  - /dev/nvme5n1
  - /dev/nvme6n1
  - --db-devices
  - /dev/nvme0n1 /dev/nvme1n1
  - --report
  - --format=json
  msg: non-zero return code
  rc: 2
  stderr: |2-
     stderr: lsblk: /dev/nvme0n1 /dev/nvme1n1: not a block device
     stderr: error: /dev/nvme0n1 /dev/nvme1n1: No such file or directory
     stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
    usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
                                 [--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
                                 [--journal-devices [JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
                                 [--no-auto] [--bluestore] [--filestore]
                                 [--report] [--yes] [--format {json,pretty}]
                                 [--dmcrypt]
                                 [--crush-device-class CRUSH_DEVICE_CLASS]
                                 [--no-systemd]
                                 [--osds-per-device OSDS_PER_DEVICE]
                                 [--block-db-size BLOCK_DB_SIZE]
                                 [--block-wal-size BLOCK_WAL_SIZE]
                                 [--journal-size JOURNAL_SIZE] [--prepare]
                                 [--osd-ids [OSD_IDS [OSD_IDS ...]]]
                                 [DEVICES [DEVICES ...]]
    ceph-volume lvm batch: error: Unable to proceed with non-existing device: /dev/nvme0n1 /dev/nvme1n1

So the dedicated device list is considered as a single string.

This commit also adds the journal_devices, block_db_devices and
wal_devices documentation to the ceph_volume module.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816713

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-30 09:49:54 +02:00
Dimitri Savineau 4ac99223b2 rhcs: drop debian support
Support for debian with RHCS has been dropped starting RHCS 4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-27 04:36:36 +01:00
Dimitri Savineau 90ad110861 rhcs: update release to 5 for octopus
RHCS 5 will be based on Ceph Octopus release and only supported on
RHEL 8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-26 22:00:08 +01:00
Guillaume Abrioux e551b5ba1a defaults: remove legacy comment
This is no longer true, let's remove this comment given that this option
is not ignored in containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-26 09:19:14 -04:00
Guillaume Abrioux b7ada14cf5 defaults: change nfs_ganesha_stable_branch
In master, even though we are using dev repo, the value here should be closer
from the last stable released.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-25 22:30:15 +01:00
Dimitri Savineau 784f16c061 mergify: Update with stable-5.0 branch
Add action to backport commits to stable-5.0.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-25 21:01:12 +01:00
Dimitri Savineau 706de944cf ceph-defaults: update ceph_stable_redhat_distro
Since octopus the ceph_stable_redhat_distro variable should be set to
el8 instead of el7.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-25 21:00:24 +01:00
Guillaume Abrioux d8fe33294e github: update issue report template
This commit updates the 'issue report template' to ask for full
ceph-ansible log.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-25 14:52:39 +01:00
Guillaume Abrioux 3788826371 tests: remove some legacy in tox.ini
This commit removes some leftover in tox.ini

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-25 12:44:31 +01:00
Dimitri Savineau 0487d21938 ceph-facts: fix rgw_instances_all fact
The rgw_instances_all fact is supposed to be the list of all radosgw
instances from all rgw nodes.
But the fact is always using the local rgw_instances variable so this
won't work on multiple nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-25 08:02:13 +01:00
Guillaume Abrioux 83fdf24caf doc/tests: bump to ansible 2.9 on master
Add testing against ansible 2.9 on master branch.
This commit also updates the documentation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-25 08:01:27 +01:00
Dimitri Savineau 0f0a14772c tests: update mgr dashboard socket listening test
Since 15ed9ee the ceph-mgr daemon binds on the IP address on the public
network instead of binding on all addresses.
This commit updates the testinfra code to reflect that change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-24 21:11:02 +01:00
Dimitri Savineau ac4f8763aa tests: register mark in pytest configuration
Unregister marks generates warnings like:

PytestUnknownMarkWarning: Unknown pytest.mark.docker - is this a typo?
You can register custom marks to avoid this warning

https://docs.pytest.org/en/latest/mark.html

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-24 15:19:18 +01:00
Dimitri Savineau f2c6281207 tests: add dashboard testinfra configuration
This commit adds basic tests for grafana, prometheus, node-exporter and
ceph mgr dashboard services.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-24 15:19:18 +01:00
Guillaume Abrioux 1b0b7af119 osd: add a default value for 'default' in crush_rules
Let's default to `False` for the `default` attribute in `crush_rules`
variable.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1797774

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-24 08:41:43 -04:00
Dimitri Savineau df8f853c85 Add pacific release
Add the 16th ceph release: pacific.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-24 09:47:12 +01:00
Guillaume Abrioux 1a7f3caecb facts: fix typo
This commit fixes a typo in some task titles

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-23 14:03:52 -04:00
Guillaume Abrioux cc28d9ec26 nfs: fix nfs with external ceph cluster support
This commit refact and fix the nfs deployment with external ceph cluster
support.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814942

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-19 18:21:16 -04:00
Dimitri Savineau fb69f6990c dashboard: allow to set read-only admin user
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-19 15:34:41 +01:00
Dimitri Savineau 5051e67f8f ceph-defaults: add registry name on dashboard vars
We don't use the registry name when using the community dashboard
container images (grafana, prometheus, alertmanager & node exporter).
This commit adds the docker.io registry explicitly in the default
dashboard container image name values.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-19 14:27:50 +01:00
Dimitri Savineau b97a4d5201 ceph-defaults: update grafana container tag
Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.

[1] registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-17 14:35:06 +01:00
petruha 73b3fadb0e ceph-facts: Fix system_secret_key variable handling
This commit fixes the system_secret_key variable not substitued by the
right value and always using the 'system_secret_key' string instead.

$ egrep 'system_(access|secret)_key' group_vars/all.yml
system_access_key: foofoofoofoofoofoofo
system_secret_key: barbarbarbarbarbarbarbarbarbarbarbarbarb

$ ansible-playbook -vv -i hosts site.yml.sample -e rgw_multisite=true
(...)
  - hostname: storage0
    endpoint: http://192.168.100.42:8080
    instance_name: rgw0
    radosgw_address: 192.168.50.3
    radosgw_frontend_port: 8085
    rgw_realm: canada
    rgw_zone: montreal
    rgw_zone_user: justin.trudeau
    rgw_zone_user_display_name: Justin Trudeau
    rgw_zonegroup: quebec
    system_access_key: foofoofoofoofoofoofo
    system_secret_key: system_secret_key

Fixes https://github.com/ceph/ceph-ansible/issues/5150

Signed-off-by: petruha <5363545+p37ruh4@users.noreply.github.com>
2020-03-16 17:38:52 -04:00
Boris Ranto 8e8aa735e0 rhcs_edits: Update grafana version
We are planning to release updated grafana image for ceph dashboard in
RHCS 4.1. We need to update the rhcs edut to point to the new image
then.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786107

Signed-off-by: Boris Ranto <branto@redhat.com>
2020-03-16 21:37:44 +01:00
Guillaume Abrioux 152c2caa9f config: remove legacy option in ceph.conf.j2
This option has been deprecated (As of 0.51).
By the way, ceph-ansible already sets the
auth_{service,client,cluster}_required variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623586

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-16 09:49:36 -04:00
Dimitri Savineau 3626c688cf handler: add rgw multi-instances support
This commit adds the rgw multi-instances support in ceph-handler
(restart_rgw_daemons.sh.j2)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-12 16:44:48 -04:00
Guillaume Abrioux 60a2e28189 rgw: add multi-instances support when deploying multisite
This commit adds the multi-instances when deploying rgw multisite

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-12 16:44:48 -04:00
Dimitri Savineau e8bf0a0cf2 ceph-infra: open radosgw ports for multi instances
When using the radosgw multi instances configuration then the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-12 16:44:48 -04:00
Guillaume Abrioux a94035e957 purge-container: clean legacy code
This commit removes a register which isn't used in this playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-12 09:45:12 -04:00
Dimitri Savineau e62532de46 update osd pool set size command
Since [1] we can't use osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size cli.

[1] https://github.com/ceph/ceph/commit/21508bd

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-11 11:25:42 +01:00
Guillaume Abrioux b3bbd6bb77 rgw: fix a typo in create_realm_zonegroup_zone_lists
This commit fixes a typo.

`s/realms/secondary_realms`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-10 14:13:30 +01:00
Guillaume Abrioux b3d943fe9f infra: add retries/until on firewalld start task
This commit make that task retrying 5 times to start the service
firewalld to avoid failure like following:

```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020  08:58:48 +0000 (0:00:00.963)       0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
  msg: |-
    Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
    Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
    Failed to execute operation: Connection reset by peer
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-09 15:01:34 -04:00