Commit Graph

5100 Commits (7e68a2a5d8ea1496cb50b6cb0be686e03713ffaa)
 

Author SHA1 Message Date
Guillaume Abrioux 7e68a2a5d8 doc: add day-2 operations documentation
This commit is the first of a serie in order to describe all day-2 operations
that are possible via ceph-ansible using a set of playbook provided in
`infrastructure-playbooks` directory.

Fixes: #5061

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7e800303e9)
2020-04-21 10:19:24 -04:00
Dimitri Savineau 46c640c169 filestore-to-bluestore: fix py2 on skipped tasks
When using skipped variables with from_json filter and python2 then we
need to have a default value otherwise the skipped task will fail.

Unexpected templating type error occurred on
({{ (ceph_volume_lvm_list.stdout | from_json) }}): expected string or
buffer

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790472

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2b9edba131)
2020-04-20 13:38:19 -04:00
abaird-rh 6878aab0f9 Updated use of deprecated filter
This was removed in Ansible 2.9.

[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
using `result|version_compare` use `result is version_compare`. This
feature will be removed in version 2.9. Deprecation warnings can be
disabled by setting deprecation_warnings=False in ansible.cfg.

Rename 'version_compare' to the function 'version'.

version_compose was renamed to version since ansible 2.5

Signed-off-by: abaird-rh <abaird@redhat.com>
(cherry picked from commit eb71244bfd)
2020-04-20 13:37:42 -04:00
Rishabh Dave e4c24f3407 library/ceph_volume: look for error messages in stderr
Error message were moved to from stdout in stderr here -
b8d6dcbe9f (diff-20f7c578a4e69ec61a5869d706567a24R137).

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1793542
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 4249d1e02d)
2020-04-20 13:36:15 -04:00
Guillaume Abrioux 0ace5f5f2c mds: fix --limit run against mds nodes
This commit fixes --limit runs against mds nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 378405e328)
2020-04-14 13:42:45 -04:00
Guillaume Abrioux 945266776b nfs: create empty rados index object for nfs standalone
This commit creates an empty rados index object even when deploying
standalone nfs-ganesha.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1822328

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ea2b654d95)
2020-04-14 12:03:38 -04:00
Dimitri Savineau 1033ad191b ceph-validate: update RHEL requirement for RHCS
We were not testing the right ansible_distribution fact value for RHEL
distribution.
This commit also updates the minial RHEL version supported by RHCS.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5de74fe512)
2020-04-14 11:27:01 -04:00
Guillaume Abrioux 1b79d73729 osd: fix monitor_name error when scaling out OSDs
This commit fixes a bug when trying to scale out osd nodes with
`crush_rule_config` is enabled.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1822599

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4bcc52cb2a)
2020-04-10 13:44:15 +02:00
Dimitri Savineau e34c95d28f tests: update mgr dashboard socket listening test
Since 15ed9ee the ceph-mgr daemon binds on the IP address on the public
network instead of binding on all addresses.
This commit updates the testinfra code to reflect that change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0f0a14772c)
2020-04-07 15:25:45 +02:00
Dimitri Savineau 47be2a2719 tests: register mark in pytest configuration
Unregister marks generates warnings like:

PytestUnknownMarkWarning: Unknown pytest.mark.docker - is this a typo?
You can register custom marks to avoid this warning

https://docs.pytest.org/en/latest/mark.html

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ac4f8763aa)
2020-04-07 15:25:45 +02:00
Dimitri Savineau 1cfb84ae94 tests: add dashboard testinfra configuration
This commit adds basic tests for grafana, prometheus, node-exporter and
ceph mgr dashboard services.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f2c6281207)
2020-04-07 15:25:45 +02:00
Dimitri Savineau 6a2272b9c0 ceph-mgr: add saml python lib for dashboard SSO
The dashboard SSO mgr module requires the saml python library to be
installed. This is only a valid scenario for RHCS deployment because
the saml python library isn't available in other classic repositories.
This package is present in RHCS Tools repository so we also need to
enable it on the mgr nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1820233

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6617d90733)
2020-04-06 11:00:01 -04:00
Guillaume Abrioux 4bc7bb2766 ceph_key: fetch key when needed
Fetch the key when it is present in the cluster but not on the node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ccfa249919)
2020-04-03 15:14:54 -04:00
Guillaume Abrioux cbed1eb17a ceph_key: fix idempotency when no secret is passed
553584cbd0 introduced a regression when no
secret is passed, it overwrites the secret each time the task is run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 003defec03)
2020-04-03 11:04:30 -04:00
Dimitri Savineau a472064cb8 tox: replace testinfra by pytest for add-mgrs
The add-mgrs scenario is still using the testinfra command instead of
pytest so the tests exectution are failling.

ERROR: InvocationError for command could not find executable testinfra

This also adds the missing --ssh-config option to testinfra.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 92f538f1af)
2020-04-03 10:43:21 -04:00
Guillaume Abrioux ba6bd3ca3d docker2podman: call `container_options_facts.yml` on osd nodes
We must call `ceph-osd` role from `container_options_facts.yml` because
ceph-osd-run.sh.j2 needs variables set in this file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1819681

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4a4f54f6ee)
2020-04-02 11:01:14 -04:00
Guillaume Abrioux 825aed5ec1 ceph_key: remove 'update' state
With this change, the state `present` is enough to update a keyring.
If the keyring already exist, it will be updated if caps or secret
passed to the module are different.
If the keyring doen't exist, it will be created.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808367

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 553584cbd0)
2020-04-01 18:08:51 -04:00
Guillaume Abrioux 7acd9686ab osd: use default crush rule name when needed
When `rule_name` isn't set in `crush_rules` the osd pool creation will
fail.
This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with
the default crush rule name.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1bb9860dfd)
2020-03-31 19:42:40 -04:00
Guillaume Abrioux 03355aec8c tests: add more coverage in external_clients scenario
Run create_users_keys.yml in external_clients scenario

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8c1c34b201)
2020-03-31 19:42:40 -04:00
Guillaume Abrioux 87e1f0cc6c osd: support changing default rule even when osd_crush_location isn't defined
Creating crush rules even with no crush hierarchy configuration is a
valid scenario so we shouldn't be bound to the first task result (which
configure crush hierarchy) to be able to add new crush rules.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b0476385c)
2020-03-31 23:14:55 +02:00
Guillaume Abrioux 32f879de32 purge-container: get *all* osds id
Adding `--all` to the `systemctl list-units` command in order to get
*all* osds id on the node (including stoppped osds). Otherwise, it will
purge the cluster but there will be leftover after that.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814542

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5e7962ccf6)
2020-03-31 11:00:41 -04:00
Javier Pena 5860335ebe Fixes for Makefile
- Set default mock configuration to epel-8-x86_64, to match the
  default dist value.
- Add support for alpha tags, like the recently added v5.0.0alpha1

Signed-off-by: Javier Pena <jpena@redhat.com>
(cherry picked from commit 19a43ff261)
2020-03-30 22:31:04 +02:00
Javier Pena eb19fe3ece Allow setting dist and mock configuration in Makefile
Curently, the dist and mock configurations are hardcoded in the
Makefile to be el8 and epel-7-x86_64, respectively. This commit
allows the user to override those settings using the DIST and
MOCK_CONFIG environment variables, falling back to the current
defaults if not set.

This provides additional flexibility when building the RPM directly
from the repository.

Signed-off-by: Javier Peña <jpena@redhat.com>
(cherry picked from commit ed8341568b)
2020-03-30 22:31:04 +02:00
Dimitri Savineau dcd02e6494 ceph_volume: fix multiple db/wal devices
When using the lvm batch ceph-volume subcommand with dedicated devices
for bluestore (db/wal) then the list of devices is convert to a string
instead of being extended via an iterable.
This was working with only one dedicated device but starting with more
then the ceph_volume module fails.

TASK [ceph-osd : use ceph-volume lvm batch to create bluestore osds] **
fatal: [xxxxxx]: FAILED! => changed=true
  cmd:
  - ceph-volume
  - --cluster
  - ceph
  - lvm
  - batch
  - --bluestore
  - --yes
  - --prepare
  - --osds-per-device
  - '4'
  - /dev/nvme2n1
  - /dev/nvme3n1
  - /dev/nvme4n1
  - /dev/nvme5n1
  - /dev/nvme6n1
  - --db-devices
  - /dev/nvme0n1 /dev/nvme1n1
  - --report
  - --format=json
  msg: non-zero return code
  rc: 2
  stderr: |2-
     stderr: lsblk: /dev/nvme0n1 /dev/nvme1n1: not a block device
     stderr: error: /dev/nvme0n1 /dev/nvme1n1: No such file or directory
     stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
    usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
                                 [--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
                                 [--journal-devices [JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
                                 [--no-auto] [--bluestore] [--filestore]
                                 [--report] [--yes] [--format {json,pretty}]
                                 [--dmcrypt]
                                 [--crush-device-class CRUSH_DEVICE_CLASS]
                                 [--no-systemd]
                                 [--osds-per-device OSDS_PER_DEVICE]
                                 [--block-db-size BLOCK_DB_SIZE]
                                 [--block-wal-size BLOCK_WAL_SIZE]
                                 [--journal-size JOURNAL_SIZE] [--prepare]
                                 [--osd-ids [OSD_IDS [OSD_IDS ...]]]
                                 [DEVICES [DEVICES ...]]
    ceph-volume lvm batch: error: Unable to proceed with non-existing device: /dev/nvme0n1 /dev/nvme1n1

So the dedicated device list is considered as a single string.

This commit also adds the block_db_devices and wal_devices documentation
to the ceph_volume module.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816713

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 760b6cd7b0)
2020-03-30 10:04:26 -04:00
Dimitri Savineau 1318575626 ceph-defaults: update container tag for nautilus
The latest Ceph stable release is now Octopus so the "latest" container
image tag is pointing to Octopus and not Nautilus anymore.
This commit updates the ceph_docker_image_tag with "latest-nautilus".

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-30 09:55:05 +02:00
Dimitri Savineau 62042f370a rhcs: drop debian support
Support for debian with RHCS has been dropped starting RHCS 4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4ac99223b2)
2020-03-27 10:22:43 -04:00
Dimitri Savineau fcbb03b0b5 Add a release note
This adds a release not for the Ceph Nautilus release used in the
stable-4.0 branch.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-26 18:12:16 -04:00
Dimitri Savineau 94490b6e16 ceph-defaults: update ganesha to 2.8
With Ceph Nautilus release we should use nfs-ganesha 2.8

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2020-03-26 13:32:29 -04:00
Guillaume Abrioux 2970e28d66 defaults: remove legacy comment
This is no longer true, let's remove this comment given that this option
is not ignored in containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e551b5ba1a)
2020-03-26 12:08:23 -04:00
Guillaume Abrioux d682cf6de5 tests: add inventory host for 5.0 upgrade job
This inventory is intended to be used in the upgrade scenario in
stable-5.0 branch.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-26 11:23:23 +01:00
Dimitri Savineau 98f223c4d0 ceph-facts: fix rgw_instances_all fact
The rgw_instances_all fact is supposed to be the list of all radosgw
instances from all rgw nodes.
But the fact is always using the local rgw_instances variable so this
won't work on multiple nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0487d21938)
2020-03-25 08:41:28 +01:00
Dimitri Savineau ac2bd94cba ceph-defaults: regenerate group_vars samples
In fc02fc9 the group_vars samples have been generated but only for
monitor_address variable not radosgw_address.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b46839bad0)
2020-03-19 21:55:08 -04:00
Guillaume Abrioux d22641d37b nfs: fix nfs with external ceph cluster support
This commit refact and fix the nfs deployment with external ceph cluster
support.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814942

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cc28d9ec26)
2020-03-19 21:39:56 -04:00
Dimitri Savineau 55c222d088 dashboard: allow to set read-only admin user
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fb69f6990c)
2020-03-19 13:24:05 -04:00
Dimitri Savineau bf9d628b65 ceph-defaults: update grafana container tag
Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.

[1] registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b97a4d5201)
2020-03-17 15:28:00 -04:00
Boris Ranto 5203a7b7e7 rhcs_edits: Update grafana version
We are planning to release updated grafana image for ceph dashboard in
RHCS 4.1. We need to update the rhcs edut to point to the new image
then.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786107

Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit 8e8aa735e0)
2020-03-16 18:10:20 -04:00
petruha f2a50c19dc ceph-facts: Fix system_secret_key variable handling
This commit fixes the system_secret_key variable not substitued by the
right value and always using the 'system_secret_key' string instead.

$ egrep 'system_(access|secret)_key' group_vars/all.yml
system_access_key: foofoofoofoofoofoofo
system_secret_key: barbarbarbarbarbarbarbarbarbarbarbarbarb

$ ansible-playbook -vv -i hosts site.yml.sample -e rgw_multisite=true
(...)
  - hostname: storage0
    endpoint: http://192.168.100.42:8080
    instance_name: rgw0
    radosgw_address: 192.168.50.3
    radosgw_frontend_port: 8085
    rgw_realm: canada
    rgw_zone: montreal
    rgw_zone_user: justin.trudeau
    rgw_zone_user_display_name: Justin Trudeau
    rgw_zonegroup: quebec
    system_access_key: foofoofoofoofoofoofo
    system_secret_key: system_secret_key

Fixes https://github.com/ceph/ceph-ansible/issues/5150

Signed-off-by: petruha <5363545+p37ruh4@users.noreply.github.com>
(cherry picked from commit 73b3fadb0e)
2020-03-16 17:59:57 -04:00
Guillaume Abrioux d4c0d3ee66 config: remove legacy option in ceph.conf.j2
This option has been deprecated (As of 0.51).
By the way, ceph-ansible already sets the
auth_{service,client,cluster}_required variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623586

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 152c2caa9f)
2020-03-16 12:04:48 -04:00
Dimitri Savineau e2f1a0ade8 doc: update infra playbooks statements
We don't need to copy the infrastructure playbooks in the root
ceph-ansible directory.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 195944b123)
2020-03-16 14:43:35 +01:00
Dimitri Savineau dc4993cc92 handler: add rgw multi-instances support
This commit adds the rgw multi-instances support in ceph-handler
(restart_rgw_daemons.sh.j2)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3626c688cf)
2020-03-12 19:04:26 -04:00
Guillaume Abrioux c26e80fdbf rgw: add multi-instances support when deploying multisite
This commit adds the multi-instances when deploying rgw multisite

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 60a2e28189)
2020-03-12 19:04:26 -04:00
Dimitri Savineau d248f6bf8d ceph-infra: open radosgw ports for multi instances
When using the radosgw multi instances configuration then the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e8bf0a0cf2)
2020-03-12 19:04:26 -04:00
Guillaume Abrioux bf0a6835a2 rgw: fix a typo in create_realm_zonegroup_zone_lists
This commit fixes a typo.

`s/realms/secondary_realms`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b3bbd6bb77)
2020-03-12 16:58:09 -04:00
Guillaume Abrioux 03416c0d4e infra: add retries/until on firewalld start task
This commit make that task retrying 5 times to start the service
firewalld to avoid failure like following:

```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020  08:58:48 +0000 (0:00:00.963)       0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
  msg: |-
    Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
    Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
    Failed to execute operation: Connection reset by peer
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b3d943fe9f)
2020-03-12 21:48:20 +01:00
Dimitri Savineau 957156c0fe filestore-to-bluestore: stop ceph-volume services
We only disable the ceph-osd services but not the ceph-volume lvm
services during the filestore to bluestore migration.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 38a683e5bf)
2020-03-12 21:10:33 +01:00
Dimitri Savineau 928c792f8d filestore-to-bluestore: reuse dedicated journal
If the filestore configuration was using a dedicated journal with either
a partition or a LV/VG then we need to reuse this for bluestore DB.

When filestore is using a raw devices then we shouldn't destroy
everything (data + journal) but only data otherwise the journal
partition won't exist anymore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790479

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 535da53d69)
2020-03-12 21:10:33 +01:00
Dimitri Savineau 3fc4cc9f62 tests/requirements: bump testinfra
3.4 is the latest testinfra release available but python2 is dropped
starting 4.0.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ccec67aa6a)
2020-03-09 13:33:44 +01:00
Guillaume Abrioux 46a13664b2 rgw: add retry/until on pools tasks
Sometimes, these task can timeout for some reason.
Adding these retries can help to avoid unexcepted failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7a8a719e75)
2020-03-06 16:10:03 +01:00
Guillaume Abrioux 4547cc00d5 tests: fix update scenario
Update this scenario due to recent changes.
e361c0ff94 added more nodes in order to
cover code erasure pool creation in the CI. Therefore, we must add more
nodes in the 3.2 deployment too.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2020-03-06 16:10:03 +01:00
Guillaume Abrioux 9a574562e2 client: skip create_users_keys.yml when rolling_update
There's no need to run this part of the role when upgrading clients
node. Let's skip it when rolling_update.yml is being run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eac207091b)
2020-03-06 16:10:03 +01:00