Commit Graph

2106 Commits (c409d6e96008cd431f1679d2582325f174c47879)

Author SHA1 Message Date
Dimitri Savineau 14f2d616ee ceph-nfs: use template module for configuration
789cef7 introduces a regression in the ganesha configuration file
generation. The new config_template module version broke it.
But the ganesha.conf file isn't an ini file and doesn't really
need to use the config_template module. Instead we can use the
classic template module.

Resolves: #4045

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 616c484698)
2019-06-24 20:47:25 +02:00
Dimitri Savineau d08af0a654 ceph-disk: Set max open files limit on container
Same behaviour than ceph-volume (b987534). The ceph-disk command runs
faster when using ulimit nofile with container cli.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-24 10:06:11 +02:00
Dimitri Savineau 2b492e3de1 ceph-handler: Fix OSD restart script
There's two big issues with the current OSD restart script.

1/ We try to test if the ceph osd daemon socket exists but we use a
wildcard for the socket name : /var/run/ceph/*.asok.
This fails because we usually have multiple ceph osd sockets (or
other ceph daemon collocated) present in /var/run/ceph directory.
Currently the test fails with:

bash: line xxx: [: too many arguments

But it doesn't stop the script execution.
Instead we can specify the full ceph osd socket name because we
already know the OSD id.

2/ The container filter pattern is wrong and could matches multiple
containers resulting the script to fail.
We use the filter with two different patterns. One is with the device
name (sda, sdb, ..) and the other one is with the OSD id (ceph-osd-0,
ceph-osd-15, ..).
In both case we could match more than needed.

$ docker container ls
CONTAINER ID IMAGE              NAMES
958121a7cc7d ceph-daemon:latest ceph-osd-strg0-sda
589a982d43b5 ceph-daemon:latest ceph-osd-strg0-sdb
46c7240d71f3 ceph-daemon:latest ceph-osd-strg0-sdaa
877985ec3aca ceph-daemon:latest ceph-osd-strg0-sdab
$ docker container ls -q -f "name=sda"
958121a7cc7d
46c7240d71f3
877985ec3aca

$ docker container ls
CONTAINER ID IMAGE              NAMES
2db399b3ee85 ceph-daemon:latest ceph-osd-5
099dc13f08f1 ceph-daemon:latest ceph-osd-13
5d0c2fe8f121 ceph-daemon:latest ceph-osd-17
d6c7b89db1d1 ceph-daemon:latest ceph-osd-1
$ docker container ls -q -f "name=ceph-osd-1"
099dc13f08f1
5d0c2fe8f121
d6c7b89db1d1

Adding an extra '$' character at the end of the pattern solves the
problem.

Finally removing the get_container_osd_id function because it's not
used in the script at all.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 45d46541cb)
2019-06-21 14:49:55 -04:00
Dimitri Savineau f4212b20e5 ceph-volume: Set max open files limit on container
The ceph-volume lvm list command takes ages to complete when having
a lot of LV devices on containerized deployment.
For instance, with 25 OSDs on a node it takes 3 mins 44s to list the
OSD.
Adding the max open files limit to the container engine cli when
executing the ceph-volume command seems to improve a lot thee
execution time ~30s.

This was impacting the OSDs creation with ceph-volume (both filestore
and bluestore) when using multiple LV devices.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b987534881)
2019-06-20 20:01:13 -04:00
Guillaume Abrioux f29366b848 ceph-osd: do not relabel /run/udev in containerized context
Otherwise content in /run/udev is mislabeled and prevent some services
like NetworkManager from starting.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 80875adba7)
2019-06-19 23:46:46 +02:00
Rishabh Dave 114078bfa1 ceph-infra: make chronyd default NTP daemon
Since timesyncd is not available on RHEL-based OSs, change the default
to chronyd for RHEL-based OSs. Also, chronyd is chrony on Ubuntu, so
set the Ansible fact accordingly.

Fixes: https://github.com/ceph/ceph-ansible/issues/3628
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9d88d3199f)
2019-06-18 10:46:34 +02:00
Rishabh Dave 93c7d8d79d don't install NTPd on Atomic
Since Atomic doesn't allow any installations and NTPd is not present
on Atomic image we are using, abort when ntp_daemon_type is set to ntpd.

https://github.com/ceph/ceph-ansible/issues/3572
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit bdff3e48fd)
2019-06-18 10:46:34 +02:00
Dimitri Savineau 81de8a8106 remove ceph-agent role and references
The ceph-agent role was used only for RHCS 2 (jewel) so it's not
usefull anymore.
The current code will fail on CentOS distribution because the rhscon
package is only avaible on Red Hat with the RHCS 2 repository and
this ceph release is supported on stable-3.0 branch.

Resolves: #4020

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7503098ca0)
2019-06-17 14:42:08 -04:00
Dimitri Savineau ed9b594b80 tests: Update ansible ssh_args variable
Because we're using vagrant, a ssh config file will be created for
each nodes with options like user, host, port, identity, etc...
But via tox we're override ANSIBLE_SSH_ARGS to use this file. This
remove the default value set in ansible.cfg.

Also adding PreferredAuthentications=publickey because CentOS/RHEL
servers are configured with GSSAPIAuthenticationis enabled for ssh
server forcing the client to make a PTR DNS query.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 34f9d51178)
2019-06-17 12:02:36 -04:00
Guillaume Abrioux 64659d2c82 iscsi: assign application (rbd) to pool 'rbd'
if we don't assign the rbd application tag on this pool,
the cluster will get `HEALTH_WARN` state like following:

```
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'rbd'
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4cf17a6fdd)
2019-06-13 14:43:25 +02:00
Dimitri Savineau 95f3908e44 ceph-handler: replace fuser by /proc/net/unix
We're using fuser command to see if a process is using a ceph unix
socket file. But the fuser command runs through every PID present in
/proc/<PID> to see if one of them is using the file.
On a system running thousands processes, the fuser command can take
a long time to finish.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1717011

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit da9891da1e)
2019-06-12 23:00:21 +02:00
Guillaume Abrioux db90debcc7 validate: fail in check_devices at the right task
see https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17 for details.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 771648304d)
2019-06-10 08:09:58 +02:00
Dimitri Savineau 0b653ee5b4 update default rhcs values and docs
The RHCS documentation mentionned in the default values and
group_vars directory are referring to RHCS 2.x while it should be
3.x.

Revolves: https://bugzilla.redhat.com/show_bug.cgi?id=1702732

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-06-04 14:18:23 +02:00
Guillaume Abrioux 5053f32c15 osds: allow passing devices by path
ceph-volume didn't work when the devices where passed by path.
Since it now support it, let's allow this feature in ceph-ansible

Closes: #3812

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8f2c45dfd3)
2019-05-09 14:21:43 +02:00
Dimitri Savineau 2fa8099fa7 osd: set default bluestore_wal_devices empty
We only need to set the wal dedicated device when there's three tiers
of storage used.
Currently the block.wal partition will also be created on the same
device than block.db.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1685253

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-25 07:13:38 +00:00
Dimitri Savineau 7418999638 ceph-mds: Increase cpu limit to 4
In containerized deployment the default mds cpu quota is too low
for production environment.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695850

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1999cf3d19)
2019-04-24 21:44:23 +00:00
Dimitri Savineau 54128db5cd ceph-osd: Fix merge conflict from mergify
The PR #3916 was merged automatically by mergify even if there was a
confict in the ceph-osd-run.sh.j2 template.
This commit resolves the conflict.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-24 12:41:23 -04:00
Dimitri Savineau 3ae2a687ed ceph-osd: Increase cpu limit to 4
In containerized deployment the default osd cpu quota is too low
for production environment using NVMe devices.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c17106874c)

# Conflicts:
#	roles/ceph-osd/templates/ceph-osd-run.sh.j2
2019-04-24 16:02:28 +00:00
Matthew Vernon 1556d802ff ceph-mon: increase timeout waiting for admin and bootstrap keys
With a large and/or busy cluster, it can take significantly more than
30s for a restarted monitor to get to the point where
`ceph-create-keys` returns successfully. A recent upgrade of our
production cluster failed here because it took a couple of minutes for
the newly-upgraded `mon` to be ready. So increase the timeout
significantly.

This patch is applied to stable-3.2, because the affected code is
refactored in stable-4.0 and ceph-create-keys is no longer called.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
2019-04-12 17:03:39 +00:00
Dimitri Savineau 56215d7688 ceph-mds: Set application pool to cephfs
We don't need to use the cephfs variable for the application pool
name because it's always cephfs.
If the cephfs variable is set to something else than the default
value it will break the appplication pool task.

Resolves: #3790

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d2efb7f02b)
2019-04-11 15:38:14 +00:00
Guillaume Abrioux c5c354a61a remove all NBSPs char in stable-3.2 branch
this can cause issues, let's replace all of these chars with real
spaces.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-04-10 13:27:48 +02:00
Matthew Vernon a8c9b65d13 UCA: Uncomment UCA variables in defaults, fix consequent breakage
The Ubuntu Cloud Archive-related (UCA) defaults in
roles/ceph-defaults/defaults/main.yml were commented out, which means
if you set `ceph_repository` to "uca", you get undefined variable
errors, e.g.

```
The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined

The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: add ubuntu cloud archive repository
  ^ here

```

Unfortunately, uncommenting these results in some other breakage,
because further roles were written that use the fact of
`ceph_stable_release_uca` being defined as a proxy for "we're using
UCA", so try and install packages from the bionic-updates/queens
release, for example, which doesn't work. So there are a few `apt` tasks
that need modifying to not use `ceph_stable_release_uca` unless
`ceph_origin` is `repository` and `ceph_repository` is `uca`.

Closes: #3475
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 9dd913cf8a)
2019-04-09 16:54:37 +00:00
Dimitri Savineau efa0083f3c ceph-osd: Drop memory flag with bluestore
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dc1c0dcee2)
2019-04-09 13:26:20 +00:00
Dimitri Savineau bbb8ca6643 mon/rgw: use last ipv6 address
When using monitor_address_block or radosgw_address_block variables
to configure the mon/rgw address we're getting the first ip address
from the ansible facts present in that cidr.
When there's VIP on that network the first filter could return the
wrong value.
This seems to affect only IPv6 setup because the VIP addresses are
added to the ansible facts at the beginning of the list. This is the
opposite (at the end) when using IPv4.
This causes the mon/rgw processes to bind on the VIP address.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680155

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
2019-04-09 06:17:27 +02:00
Ali Maredia e943288cae rgw multisite: add more than 1 rgw to the master or secondary zone
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1664869

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 37f46a8c5d)
2019-04-06 08:50:30 +00:00
Dimitri Savineau d1b3d18af1 radosgw: Raise cpu limit to 8
In containerized deployment the default radosgw quota is too low
for production environment.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680171

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d3ae9fd05f)
2019-04-04 19:14:28 +02:00
Guillaume Abrioux b92c826661 defaults: change default value for ceph_docker_image_tag
Since nautilus has been released, it's now the latest stable release, it
means the tag `latest` now refers to nautilus.

`stable-3.2` isn't intended to deploy nautilus, therefore, we should
change the default value for this variable to the latest release
stable-3.2 is able to deploy (mimic).

Closes: #3734

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-03-21 18:37:21 +00:00
Dimitri Savineau e4a71eabd9 ceph-osd: Ensure lvm2 is installed
When using osd_scenario lvm, we never check if the lvm2 package is
present on the host.
When using containerized deployment and docker on CentOS/RedHat this
package will be automatically installed as a dependency but not for
Ubuntu distribution.
OSD deployed via ceph-volume require the lvmetad.socket to be active
and running.

Resolves: #3728

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 179fdfbc19)
2019-03-20 22:59:28 +00:00
Guillaume Abrioux d3f6556041 osd: backward compatibility with old disk_list.sh location
Since all files in container image have moved to `/opt/ceph-container`
this check must look for new AND the old path so it's backward
compatible. Otherwise it could end up by templating an inconsistent
`ceph-osd-run.sh`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 987bdac963)
2019-03-18 21:56:53 +00:00
Dimitri Savineau 46e8898093 ceph-validate: fail if there's no ipaddr available in monitor_address_block subnet
When using monitor_address_block to determine the ip address of the
monitor node, we need an ip address available in that cidr to be
present in the ansible facts (ansible_all_ipv[46]_addresses).
Currently we don't check if there's an ip address available during
the ceph-validate role.
As a result, the ceph-config role fails due to an empty list during
ceph.conf template creation but the error isn't explicit.

TASK [ceph-config : generate ceph.conf configuration file] *****
fatal: [0]: FAILED! => {"msg": "No first item, sequence was empty."}

With this patch we will fail before the ceph deployment with an
explicit failure message.

Resolves: rhbz#1673687

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5c39735be5)
2019-03-18 18:31:18 +00:00
Gregory Orange 86e39a29c8 Change docker_container parameter network to network_mode
Addressing "populate kv_store with custom ceph.conf":
Unsupported parameters for (docker_container) module. Looking at
https://docs.ansible.com/ansible/latest/modules/docker_container_module.html
shows that the correct parameter is network_mode, not network.

Signed-off-by: Gregory Orange <gregoryo2014@users.noreply.github.com>
2019-03-18 13:23:10 +00:00
Dimitri Savineau bfa99cdd53 Set the default crush rule in ceph.conf
Currently the default crush rule value is added to the ceph config
on the mon nodes as an extra configuration applied after the template
generation via the ansible ini module.

This implies two behaviors:

1/ On each ceph-ansible run, the ceph.conf will be regenerated via
ceph-config+template and then ceph-mon+ini_file. This leads to a
non necessary daemons restart.

2/ When other ceph daemons are collocated on the monitor nodes
(like mgr or rgw), the default crush rule value will be erased by
the ceph.conf template (mon -> mgr -> rgw).

This patch adds the osd_pool_default_crush_rule config to the ceph
template and only for the monitor nodes (like crush_rules.yml).
The default crush rule id is read (if exist) from the current ceph
configuration.
The default configuration is -1 (ceph default).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1638092

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d8538ad4e1)
2019-03-14 14:48:03 +00:00
Dimitri Savineau 2f3206abeb ceph-osd: Install numactl package when needed
With 3e32dce we can run OSD containers with numactl support.
When using numactl command in a containerized deployment we need to
be sure that the corresponding package is installed on the host.
The package installation is only executed when the
ceph_osd_numactl_opts variable isn't empty.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b7f4e3e7c7)
2019-03-12 08:14:47 +00:00
Guillaume Abrioux 34086ec233 osd: support numactl options on OSD activate
This commit adds OSD containers activate with numactl support.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1684146

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b3eb9206fa)
2019-03-11 09:50:29 +00:00
VasishtaShastry 2393d82306 Extends check_devices tasks to non-collocated an lvm-batch scenarios
Tuned name of a task and error message to make it more user understandable

Fixes BZ 1648168 - ceph-validate : devices are not validated in non-collocated and lvm_batch scenario

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168

Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
(cherry picked from commit 34c25ef49b)
2019-03-01 04:06:57 +00:00
ToprHarley d1051c8e55 Convert interface names to underscores
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540881

Signed-off-by: Tomas Petr <tpetr@redhat.com>
(cherry picked from commit 573adce7dd)
2019-02-28 19:02:32 +00:00
Guillaume Abrioux de3465b6a3 osd: add ipc=host in systemd template for containers
in addition to 15812970f0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d5be83e504)
2019-02-28 13:48:39 +00:00
fpantano 1033411512 Removed not needed mountpoint and removed ubuntu section
Referring to BZ#1683290, as dsavineau suggests, being this
bug tripleO specific, removed the ubuntu section and removed
useless mountpoints.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683290

Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit 21fad7ced3)
2019-02-28 12:31:23 +00:00
fpantano 9b843c24f9 Added to the ceph-radosgw service template the ca-trust
volume avoiding to expose useless information.
This bug is referred to the following bugzilla:

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683290

Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit 0c1944236b)
2019-02-28 12:31:23 +00:00
Kevin Coakley 2005d857df Set permissions on monitor directory to u=rwX,g=rX,o=rX recursive
Set directories to 755 and files to 644 to
/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} recursively instead of
setting files and directories to 755 recursively. The ceph mon
process writes files to this path with permissions 644. This update stops
ansible from updating the permissions in
/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} every time ceph mon writes
a file and increases idempotency.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683997

Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
(cherry picked from commit d327681b99)
2019-02-28 10:52:04 +00:00
Dimitri Savineau 77596c791d mon: Move client admin variable to defaults
There's no need to set the client_admin_ceph_authtool_cap variable
via a set_fact task.
Instead we can set this in the role defaults.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 58a9d310d5)
2019-02-27 20:03:13 +00:00
Dimitri Savineau 05c6ac4d78 mon: Add mds permissions to client.admin
The administrator keyring needs full capabilities on mds like mon,
osd and mgr.
Whithout this, the client.admin key won't be able to run commands
against mds (like ceph tell mds.0 session ls)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1672878

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dd7b7604de)
2019-02-27 20:03:13 +00:00
Guillaume Abrioux 8cc75e516c common: do not override ceph_release when ceph_repository is 'rhcs'
We shouldn't reset `ceph_release` with `ceph_stable_release` when
`ceph_repository` is `rhcs`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2b60a35634)
2019-02-21 13:03:16 +00:00
Guillaume Abrioux d15b055854 osd: make the 'wait for all osd to be up' task configurable
introduce two new variables to make the check that 'wait for all osd to
be up' configurable.
It's possible that for some deployments, OSDs can take longer to be seen
as UP and IN.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1676763

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 21e5db8982)
2019-02-20 16:53:06 +00:00
David Waiting eba80adb1a ensure at least one osd is up
The existing task checks that the number of OSDs is equal to the number of up OSDs before continuing.

The problem is that if none of the OSDs have been discovered yet, the task will exit immediately and subsequent pool creation will fail (num_osds = 0, num_up_osds = 0).

This is related to Bugzilla 1578086.

In this change, we also check that at least one OSD is present. In our testing, this results in the task correctly waiting for all OSDs to come up before continuing.

Signed-off-by: David Waiting <david_waiting@comcast.com>
(cherry picked from commit 3930791cb7)
2019-02-19 19:02:16 +00:00
Patrick C. F. Ernzer a43c68df7d setup_ntp: call handler to disable ntpd if chronyd used
The task setup chronyd called the handler disable chronyd, which of
course defeats the purpose.

Changing the task to disable ntpd instead fixes the issue of chronyd
being disabled after it got enabled.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1673664
Fixes: #3582

Signed-off-by: Patrick C. F. Ernzer pcfe@redhat.com
(cherry picked from commit c605ff6a68)
2019-02-15 09:09:36 +00:00
Guillaume Abrioux 6200f90ab2 iscsi: fix permission denied error
Typical error:
```
fatal: [iscsi-gw0]: FAILED! =>
  msg: 'an error occurred while trying to read the file ''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'': [Errno 13] Permission denied: b''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'''
```

`become: True` is not needed on the following task:

`copy crt file(s) to gateway nodes`.

Since it's already set in the main playbook (site.yml/site-container.yml)

The thing is that the files get generated in the 'fetch_directory' with
root user because there is a 'delegate_to' + we run the playbook with
`become: True` (from main playbook).

The idea here is to create files under ansible user so we can open them
later to copy them on the remote machine.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9d590f4339)
2019-02-11 16:17:44 +00:00
Leah Neukirchen d855cb2595 Fix uses of default(omit) with string concatenation
When {{omit}} is concatenated with another string, it expands to something
like __omit_place_holder__63eea0d96dd6ed867b95405e11d87dddf61f448d.
However, in these use-cases we need an empty string.

Regression introduced in d53f55e807.

Signed-off-by: Leah Neukirchen <leah.neukirchen@mayflower.de>
2019-02-08 11:01:11 +00:00
Sébastien Han 7db797d8df osd: expose udev into the container
In order to be able to retrieve udev information, we must expose its
socket. As per, https://github.com/ceph/ceph/pull/25201 ceph-volume will
start consuming udev output.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 997667a873)
2019-02-06 00:37:11 +00:00
Guillaume Abrioux 303cc85754 osd: bind mount /var/run/udev/
without this, the command `ceph-volume lvm list --format json` hangs and
takes a very long time to complete.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7ade032807)
2019-02-06 00:37:11 +00:00
Guillaume Abrioux af17e0dfbb override ceph_release with ceph_stable_release
when `ceph_origin` is set to `'repository'` and `ceph_repository` to
`'community'` we need to ensure `ceph_release` reflect
`ceph_stable_release`.

4a3f180f9d simply removed the override
while it should just have to be run only when the condition mentioned
above is satisfied.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0bfefdd5bc)
2019-01-24 14:18:34 +00:00
Guillaume Abrioux e29cdd0a61 config: remove code related to ceph release prior to luminous
This part of the code is not needed since ceph-ansible@master is
intended to deploy ceph@master only.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1bbdde272f)
2019-01-24 14:18:34 +00:00
Guillaume Abrioux eaa92f7e55 ceph-default: rm useless condition
This condition is useless and it's also creating issues we don't see in
our CI. ceph_release is set by either ceph-common or ceph-docker-common
so let's keep it this way.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379

(cherry picked from commit e9188cd202)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2019-01-24 14:18:34 +00:00
Noah Watkins e57e2d98a1 start_osds: use list instead of keys (re-introduce)
the python3 fix merged by:

  https://github.com/ceph/ceph-ansible/pull/3346

was reintroduced a few days later by:

  82a6b5adec

and this patch fixes it again :)

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
(cherry picked from commit 3cf5fd2c3e)
2019-01-16 15:48:35 +00:00
Sébastien Han 04d8002614 switch: do not fail on missing key
Some people use the switch playbook to perform upgrade so they end up in
the same situation than https://bugzilla.redhat.com/show_bug.cgi?id=1650572
This is applying the same fix as
729744c6a8.

We don't want to fail on key that are not present since they will get
created after the mons are updated. They will be created by the task
"create potentially missing keys (rbd and rbd-mirror)".

Signed-off-by: Sébastien Han <seb@redhat.com>
2019-01-14 18:54:46 +00:00
Rishabh Dave 4e94d11aa7 ceph-infra: remove ntp_rmp.yml and ntp_debian.yml
This commit fixes the merge conflict that occurred during the
auto-backport and auto-merge of the commit
488281187e.

Also please note that the commit
488281187e was merged (on PR 3477)
"as it is" (despite of merge conflicts) which was not supposed to be
the case ideally. This had a side-effect that the feature of supporting
multiple NTP daemons (new ones are namely chronyd and timesyncd) was
also backported which is itself against the convention. For
consistency's sake the feature was backported to stable-3.1 as well.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
2019-01-09 22:15:18 +01:00
Guillaume Abrioux 416b503476 introduce new role ceph-facts
sometimes we play the whole role `ceph-defaults` just to access the
default value of some variables. It means we play the `facts.yml` part
in this role while it's not desired. Splitting this role will speedup
the playbook.

Closes: #3282

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0eb56e36f8)
2019-01-07 09:14:10 +01:00
Bruceforce 5c618d7084 The nfs_ganesha_dev_apt_repo variable was set incorrect in task
"fetch nfs-ganesha development repository"
This has to be pushed directly to stable-3.2 since master has diverged

Signed-off-by: Bruceforce <Bruceforce@users.noreply.github.com>
2019-01-07 08:03:19 +00:00
Rishabh Dave b2024899b9 ceph-infra: disable unrequired NTP services
When one of the currently supported NTP services has been set up,
disable rest of the NTP services on Ceph nodes.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 6fa757d343)
2019-01-04 13:52:19 +00:00
Rishabh Dave 488281187e ceph-infra: merge ntp_debian.yml and ntp_rpm.yml
Merge ntp_debian.yml and ntp_rpm.yml into one (the new file is called
setup_ntp.yml) since they are almost identical. Also avoid repetition
of the common setup step for ntpd and chronyd services.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit b03ab60742)

# Conflicts:
#	roles/ceph-infra/tasks/ntp_debian.yml
#	roles/ceph-infra/tasks/ntp_rpm.yml
2019-01-04 13:52:19 +00:00
Kai Wembacher e2852eb40e add support for rocksdb and wal on the same partition in non-collocated
Signed-off-by: Kai Wembacher <kai@ktwe.de>
(cherry picked from commit a273ed7f60)
2018-12-20 14:21:14 +01:00
Guillaume Abrioux c3a2320e01 revert infra: don't restart firewalld if unit is masked
If firewalld unit is masked, setting `configure_firewall: false` is
enough

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1655059

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1cff1f9806)
2018-12-04 17:31:31 +01:00
Sébastien Han 8d1c67beb2 osd: discover osd_objectstore on the fly
Applying and passing the OSD_BLUESTORE/FILESTORE on the fly is wrong for
existing clusters as their config will be changed.

Typically, if an OSD was prepared with ceph-disk on filestore and we
change the default objectstore to bluestore, the activation will fail.
The flag osd_objectstore should only be used for the preparation, not
activation. The activate in this case detects the osd objecstore which
prevents failures like the one described above.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4c51130198)
2018-12-04 09:01:50 +00:00
Sébastien Han 1151521784 ceph-osd: change jinja condition
If an existing cluster runs this config, and has ceph-disk OSD, the
`expose_partitions` won't be expected by jinja since it's inside the
'old' if. We need it as part of the osd_scenario != 'lvm' condition.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1640273
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bef522627e)
2018-12-04 09:01:50 +00:00
Sébastien Han 729744c6a8 rolling_update: do not fail on missing keys
We don't want to fail on key that are not present since they will get
created after the mons are updated. They will be created by the task
"create potentially missing keys (rbd and rbd-mirror)".

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ebc901c6af)
2018-12-03 13:03:33 +01:00
Noah Watkins e8b10f47dc rgw: use correct default rgw frontend address
since 0.0.0.0 is the default radosgw address (not 'address'), not
configuring an address explicitly, and instead configuring the radosgw
interface, would result in 0.0.0.0 being used, instead of falling
through to section that inspects the interface config option.

backport note: this cannot be cherry-picked from master since this code
doesn't exist in master.

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1655131

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2018-12-01 20:09:46 +00:00
Sébastien Han 452069cb3a osd: manage legacy ceph-disk non-container startup
The code is now able (again) to start osds that where configured with
ceph-disk on a non-container scenario.

Closes: https://github.com/ceph/ceph-ansible/issues/3388
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-29 23:30:21 +01:00
Guillaume Abrioux 8d93007e56 config: write jinja comment with appropriate syntax
jinja comment should be written using the jinja syntax `{# ... #}`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654441

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a86c2b8526)
2018-11-29 21:19:41 +01:00
Guillaume Abrioux 316e49c6d7 client: change default pool size
default pool size should match the real default that is defined in ceph
itself.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ed42262b37)
2018-11-29 01:49:05 +00:00
Guillaume Abrioux 1077ae0060 defaults: change default size for openstack pools
default pool size should match the real default that is defined in ceph
itself.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6d1fe32998)
2018-11-29 01:49:05 +00:00
Guillaume Abrioux a4db9bd6e8 defaults: change for default pool size for cephfs_pools
default pool size should match the real default that is defined in ceph
itself.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fdc438dd0d)
2018-11-29 01:49:05 +00:00
Guillaume Abrioux 65699e4558 defaults: add ceph related vars file
This is to add a granularity level.
We can have ceph specific variables that user shouldn't have to change
here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f1735e9bb0)
2018-11-29 01:49:05 +00:00
Guillaume Abrioux f0195e97ed refact osd pool size customization
Add real default value for osd pool size customization.
Ceph itself has an `osd_pool_default_size` default value to `3`.

If users don't specify a pool size in various pools definition within
ceph-ansible, we should default to `3`.

By the way, this kind of condition isn't really clear:
```
when:
  - rbd_pool_size | default ("")
```

we should try to get the customized value then default to what is in
`osd_pool_default_size` (which has its default value pointing to
`ceph_osd_pool_default_size` (`3`) as well) and compare it to
`ceph_osd_pool_default_size`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7774069d45)
2018-11-29 01:49:05 +00:00
Guillaume Abrioux 68b2ad11ee mon: move `osd_pool_default_pg_num` in `ceph-defaults`
`osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specifc group of nodes, it
will fail when trying to access this variables since it wouldn't be
defined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d4c0960f04)
2018-11-29 01:49:05 +00:00
Sébastien Han 9b5a93e3a5 osd: re-introduce disk_list check
This commit
4cc1506303 (diff-51bbe3572e46e3b219ad726da44b64ebL13)
accidentally removed this check.

This is a must have for ceph-disk based containerized OSDs.

Signed-off-by: Sébastien Han <seb@redhat.com>
2018-11-29 00:31:13 +01:00
Guillaume Abrioux 659f2c60b5 validate: change default value for `radosgw_address`
change default value of `radosgw_address` to keep consistency with
`monitor_address`.
Moreover, `ceph-validate` checks if the value is '0.0.0.0' to determine
if it has to run `check_eth_rgw.yml`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600227

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e4869ac8bd)
2018-11-28 23:54:06 +01:00
Guillaume Abrioux 4cc1506303 osd: commonize start_osd code
since `ceph-volume` introduction, there is no need to split those tasks.

Let's refact this part of the code so it's clearer.

By the way, this was breaking rolling_update.yml when `openstack_config:
true` playbook because nothing ensured OSDs were started in ceph-osd role (In
`openstack_config.yml` there is a check ensuring all OSD are UP which was
obviously failing) and resulted with OSDs on the last OSD node not started
anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f7fcc012e9)
2018-11-28 23:11:46 +01:00
Sébastien Han 2fca8555cc handler: show unit logs on error
This will tremendously help debugging daemons that fail on restart by
showing the systemd unit logs.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a9b337ba66)
2018-11-27 12:44:15 +00:00
Guillaume Abrioux 1a1886a442 config: convert _osd_memory_target to int
ceph.conf doesn't accept float value.

Typical error seen:
```
$ sudo ceph daemon osd.2 config get osd_memory_target
Can't get admin socket path: unable to get conf option admin_socket for osd.2:
parse error setting 'osd_memory_target' to '7823740108,8' (strict_si_cast:
unit prefix not recognized)
```

This commit ensures the value inserted in ceph.conf will be an integer.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 68dde424f6)
2018-11-21 15:35:55 +00:00
Guillaume Abrioux abdc245ceb infra: don't restart firewalld if unit is masked
if firewalld.service systemd unit is masked, the handler will fail when
trying to restart it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650281

(cherry picked from commit 63b9835cbb)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-19 17:32:44 +01:00
Neha Ojha c96af4bac9 osd_memory_target: standardize unit and fix calculation
* The default value of osd_memory_target used by ceph is 4294967296 bytes,
so use the same as ceph-ansible default.

* Convert ansible_memtotal_mb to bytes to calculate osd_memory_target

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 10538e9a23)
2018-11-19 10:51:05 +00:00
Guillaume Abrioux f5d8701ed8 client: fix a typo in create_users_keys.yml
cd1e4ee024 introduced a typo.
This commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 393ab94728)
2018-11-17 20:59:11 +00:00
Guillaume Abrioux 62d2ddafd4 validate: allow stable-3.2 to run with ansible 2.4
Although this is not officially supported, this commit allows
`stable-3.2` to run against ansible 2.4.
This should ease the transition in RHOSP.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-11-16 08:57:00 +00:00
Jason Dillaman 3b40e2bc87 igw: add support for IPv6
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 0aff0e9ede)

Conflicts:
	library/igw_purge.py: trivial resolution
	roles/ceph-iscsi-gw/library/igw_purge.py: trivial resolution
2018-11-13 17:35:58 +00:00
Mike Christie 702f2baccc igw: open iscsi target port
Open the port the iscsi target uses for iscsi traffic.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 5ba7d1671e)
2018-11-12 10:46:41 +00:00
Mike Christie 44ee5c7495 igw: use api_port variable for firewall port setting
Don't hard code api port because it might be overridden by the user.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit e2f1f81de4)
2018-11-12 10:46:41 +00:00
Mike Christie db576f6f0e igw: fix firewall iscsi_group_name check
The firewall setup for igw is not getting setup because iscsi_group_name
does not it exist. It should be iscsi_gw_group_name.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit a4ff52842c)
2018-11-12 10:46:41 +00:00
Mike Christie c843ea1d92 igw: Fix default api port
The default igw api port is 5000 in the manual setup docs and
ceph-iscsi-config package so this syncs up ansible.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit a10853c5f8)
2018-11-12 10:46:41 +00:00
Sébastien Han 12ce311da5 rbd-mirror: enable ceph-rbd-mirror.target
Without this the daemon will never start after reboot.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit b7a791e902)
2018-11-09 16:48:35 +01:00
Guillaume Abrioux d5409109fb rgw: move multisite default variables in ceph-defaults
Move all rgw multisite variables in ceph-defaults so ceph-validate can
go through them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 17:41:35 +01:00
Guillaume Abrioux 547e90f281 rgw: move multisite related tasks after docker/main.yml
We must play this task after the container has started otherwise
rgw_multisite tasks will fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 710e11668d rgw: add rgw_multisite for containerized deployments
run commands on containers when containerized deployments.
(At the moment, all commands are run on the host only)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux fe88c89c9c validate: remove check on rgw_multisite_endpoint_addr definition
since `rgw_multisite_endpoint_addr` has a default value to
`{{ ansible_fqdn }}`, it shouldn't be mandatory to set this variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Ali Maredia 59e6d04f9b rgw: add ceph-validate tasks for multisite, other fixes
- updated README-MULTISITE
- re-added destroy.yml
- added tasks in ceph-validate to make sure the
rgw multisite vars are set

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 77d5d128c3 rgw: add a dedicated variable for multisite endpoint
We should give users the possibility to set the IP they want as
multisite endpoint, setting the default value to `{{ ansible_fqdn }}` to
not force them to set this variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-30 14:00:28 +01:00
Ali Maredia 474f151450 rgw: update rgw multisite tasks
- remove destroy tasks
- cleanup conditionals and syntax
- remove unnecessary realm pulls
- enable multisite to be tested in automated
testing infra
- add multisite related vars to main.yml and
group_vars
- update README-MULTISITE
- ensure all `radosgw-admin` commands are being run
on a mon

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2018-10-30 14:00:28 +01:00
Guillaume Abrioux 748342f5b6 roles: fix *_docker_memory_limit default value
append 'm' suffix to specify the unit size used in all
`*_docker_memory_limit`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2018-10-29 14:59:09 +01:00
Neha Ojha b7e4d4eb84 roles: do not limit docker_memory_limit for various daemons
Since we do not have enough data to put valid upper bounds for the memory
usage of these daemons, do not put artificial limits by default. This will
help us avoid failures like OOM kills due to low default values.

Whenever required, these limits can be manually enforced by the user.

More details in
https://bugzilla.redhat.com/show_bug.cgi?id=1638148

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1638148
Signed-off-by: Neha Ojha <nojha@redhat.com>
2018-10-29 14:59:09 +01:00
Sébastien Han 0e63f0f3c9
Merge branch 'master' into wip-rm-calamari 2018-10-29 14:50:37 +01:00
Sébastien Han 5ab90b358c nfs: do not create the nfs user if already present
Check if the user exists and skip its creation if true.

Closes: https://github.com/ceph/ceph-ansible/issues/3254
Signed-off-by: Sébastien Han <seb@redhat.com>
2018-10-26 16:24:38 +00:00