making `osd_pool_default_pg_num` mandatory is a bit agressive and is
unrelated when you just want to create users keyrings.
Closes: #2241
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
the entrypoint to generate users keyring is `ceph-authtool`, therefore,
it can expand the `$(ceph-authtool --gen-print-key)` inside the
container. Users must generate a keyring themselves.
This commit also adds a check to ensure keyring are properly filled when
`user_config: true`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add the variables ceph_osd_docker_cpuset_cpus and
ceph_osd_docker_cpuset_mems, so that a user may specify
the CPUs and memory nodes of NUMA systems on which OSD
containers are run.
Provides a example in osds.yaml.sample to guide user
based on sample `lscpu` output since cpuset-mems refers
to the memory by NUMA node only while cpuset-cpus can
refer to individual vCPUs within a NUMA node.
If a deployer uses an interface name with a dash/hyphen in it, such
as 'br-storage' for the monitor_interface group_var, the ceph.conf.j2
template fails to find the right facts. It looks for
'ansible_br-storage' but only 'ansible_br_storage' exists.
This patch converts the interface name to underscores when the
template does the fact lookup.
The CI complains because of `ceph_uid` fact which doesn't exist since
the docker image tag used in the CI doesn't match with this condition.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This is particularly useful in CI environments where you dont have
the option of adding extra devices or volumes to the host. It is also
a simple change to support loopback devices
In case where docker CLI is available but docker is not running, we
don't want to trigger the restart of the daemons.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since some daemons now install their own packages the task checking the
ceph version fails on Debian systems. So the 'ceph-common' package must
be installed on all the machines.
Signed-off-by: Sébastien Han <seb@redhat.com>
This task was originally needed to fix a docker installation issue
(see: #1030). This has been fixed, therefore it can be removed.
Fixes: #2199
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
A recent change [1] required that the openstack_keys
param always containe an acls list. However, it's
possible it might not contain that list. Thus, this
param sets a default for that list to be empty if it
is not in the structure as defined by the user.
[1] d65cbaa539
We were activating dmcrypt devices with the wrong command. Basically the
first task execute the wrong activate command. The task fails but
continues because of the 'failed_when: false'. Then the right activation
sequence is being done by the next task.
Signed-off-by: Sébastien Han <seb@redhat.com>
when `ceph-rbd-mirror.target` is not enabled, the service won't start
after a reboot because there is a dependency between these two units.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If ceph-ansible deploys a Ceph cluster with "openstack_config: true"
and sets the openstack_keys map to have certain ACLs or permissions,
the requested ACLs or permissions are only set on one of the monitor
nodes [2] when they should be set on all of them.
This patch solves [3] the above issue by having the chmod and setfacl
tasks iterate the list of mon nodes (including the mon node that the
task was delegated to) to apply the chmod of setfacl to the keys in
openstack_keys.
[1]
```
openstack_keys:
- { name: client.openstack, key: "$(ceph-authtool --gen-print-key)", mon_cap: "allow r", osd_cap: "allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=vms, allow rwx pool=volumes, allow rwx pool=backups", mode: "0600", acls: ["u:nova:r--", "u:cinder:r--", "u:glance:r--", "u:gnocchi:r--"] }
```
[2]
```
$ ansible mons -m shell -b -a "ls -l /etc/ceph/ceph.client.openstack.keyring ; getfacl /etc/ceph/ceph.client.openstack.keyring"
192.168.1.26 | SUCCESS | rc=0 >>
-rw-r-----+ 1 root root 253 Nov 3 20:30 /etc/ceph/ceph.client.openstack.keyring
user::rw-
user:glance:r--
user:nova:r--
user:cinder:r--
user:gnocchi:r--
group::---
mask::r--
other::---getfacl: Removing leading '/' from absolute path names
192.168.1.29 | SUCCESS | rc=0 >>
-rw-r--r--. 1 root root 253 Nov 3 20:30 /etc/ceph/ceph.client.openstack.keyring
user::rw-
group::r--
other::r--getfacl: Removing leading '/' from absolute path names
192.168.1.23 | SUCCESS | rc=0 >>
-rw-r--r--. 1 root root 253 Nov 3 20:30 /etc/ceph/ceph.client.openstack.keyring
user::rw-
group::r--
other::r--getfacl: Removing leading '/' from absolute path names
$
```
[3]
```
(undercloud) [stack@hci-director ceph-ansible]$ ansible mons -m shell -b -a "ls -l /etc/ceph/ceph.client.openstack.keyring ; getfacl /etc/ceph/ceph.client.openstack.keyring"
192.168.1.25 | SUCCESS | rc=0 >>
-rw-r-----+ 1 root root 253 Nov 14 01:12 /etc/ceph/ceph.client.openstack.keyring
user::rw-
user:glance:r--
user:nova:r--
user:cinder:r--
user:gnocchi:r--
group::---
mask::r--
other::---getfacl: Removing leading '/' from absolute path names
192.168.1.29 | SUCCESS | rc=0 >>
-rw-r-----+ 1 root root 253 Nov 14 01:12 /etc/ceph/ceph.client.openstack.keyring
user::rw-
user:glance:r--
user:nova:r--
user:cinder:r--
user:gnocchi:r--
group::---
mask::r--
other::---getfacl: Removing leading '/' from absolute path names
192.168.1.27 | SUCCESS | rc=0 >>
-rw-r-----+ 1 root root 253 Nov 14 01:12 /etc/ceph/ceph.client.openstack.keyring
user::rw-
user:glance:r--
user:nova:r--
user:cinder:r--
user:gnocchi:r--
group::---
mask::r--
other::---getfacl: Removing leading '/' from absolute path names
(undercloud) [stack@hci-director ceph-ansible]$
```
Add support for the openSUSE distributions. The required packages
are available either in the distribution repositories or in the
OBS one.
Signed-off-by: Markos Chandras <mchandras@suse.de>
When we consume the distribution packages, we don't have the choise on
which version to install, so we shouldn't require that variable to be
set. Distributions normally provide only one version of Ceph in the
official repositories so we get whatever they provide.
Signed-off-by: Markos Chandras <mchandras@suse.de>
openSUSE Leap 42.3 provides support for Ceph Luminous in both the
distribution package and the latest available version in the OBS
repository so add these as the only available installation methods for
openSUSE.
Signed-off-by: Markos Chandras <mchandras@suse.de>
Like 80d32dec, the path to the fact is not correct.
In any case, we will retrieve the IP address in hostvars, the variable
is the way we get the interface name according where it has been set
(eg.: inventory host file vs. group_vars/)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510906
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
there is no need to have a condition on this task, this test should be
always run since the result will be interpreted later.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This will prevent ceph-ansible from using a loop device while it
shouldn't in auto_discovery mode.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The path to the fact is not correct.
In any case, we will retrieve the IP address in hostvars, the variable
is the way we get the interface name according where it has been set
(eg.: inventory host file vs. group_vars/)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510906
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Use `devices` variable instead of `ansible_devices`, otherwise it means
we are not using the devices which have been 'auto discovered'
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The current code will also return lvm devices such as /dev/dm-2, this
kind of device type is not supported by ceph-disk at the moment. Now we
just ignore them.
Signed-off-by: Sébastien Han <seb@redhat.com>
- One can not run scripts directly in place, that mounted with `noexec`
option. But one can run scripts as arguments for `bash/sh`.
Signed-off-by: Arano-kai <captcha.is.evil@gmail.com>
During the initial implementation of this 'old' thing we were falling
into this issue without noticing
https://github.com/moby/moby/issues/30341 and where blindly using --rm,
now this is fixed the prepare container disappears and thus activation
fail.
I'm fixing this for old jewel images.
Also this fixes the machine reboot case where the docker logs are
purgend. In the old scenario, we now store the log locally in the same
directory as the ceph-osd-run.sh script.
Signed-off-by: Sébastien Han <seb@redhat.com>
Setting monitor_interface in group_vars/all.yml makes the
hostvars[host]['monitor_interface'] non-existing.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1507922
Signed-off-by: Sébastien Han <seb@redhat.com>
Only chmod or setfacl the requested keyring(s) in the
opentack_keys data structure when the mode or acls keys
of that data structure exist.
User may specify four permission combinations for the
keyring file(s): 1. only set ACL, 2. only set mode,
3. set neither mode nor ACL, 4. set mode and then ACL.
Fixes: #2092
Use "ceph_tcmalloc_max_total_thread_cache" to set the
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES value inside /etc/default/ceph for
Debian installs, or /etc/sysconfig/ceph for Red Hat/CentOS installs.
By default this is set to 0, so the default package value will be used,
if specified this value will be changed to match the variable, and ceph
osd services will be restarted.
There was a huge resync from luminous to jewel in ceph-docker:
https://github.com/ceph/ceph-docker/pull/797
This change brought a new handy function to discover partitions tight to
an OSD. This function doesn't exist in the old image so the
ceph-osd-run.sh script breaks when trying to deploy Jewel OSD with that
old Jewel image version.
Signed-off-by: Sébastien Han <seb@redhat.com>
stable-3.0 brought numerous changes in ceph-ansible variables, this PR
aims to maintain backward compatibility for someone running stable-2.2
upgrading to stable-3.0 but keeps its groups_vars untouched.
We will then determine the right options to make sure the upgrade works
but we are expecting that new variables should be used.
We will drop this in a near future, maybe 3.1 or 3.2.
Signed-off-by: Sébastien Han <seb@redhat.com>
This feature isn't available before luminous, therefore, we need to play
them only on luminous and after otherwise the playbook will fail.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3f3d4b9c727d06154c422d445fc2a245aceaed89)
3a58757 introduced an issue for Jewel deployments, since this role is
skipped, `enabled_ceph_mgr_modules.stdout` doesn't exist, therefore, it
ends up with an attribute error.
Uses `.get()` to retrieve `stdout` with a default value so it won't fail
if this attribute doesn't exist (jewel).
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
In Jewel, we don't use bootstrap-rbd keyring for rbd-mirror nodes, it
results with a socket path/name different according to which ceph
release you are deploying.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit add new osd scenarios, it aims to simplify the CI setup and
brings a better coverage on the OSD scenarios.
We decided to differentiate between filestore and bluestore, thinking
ahead when filestore won't be supported anymore.
So we now have two classes of tests:
* Filestore
* Bluestore
In each of those classes we have container and non-container.
Then for each we test the following:
* collocated
* collocated dmcrypt
* non-collocated
* non-collocated dmcrypt
* auto discovery collocated
* auto discovery collocated dmcrypt
This gives us a nice coverage and also reduces the footprint on the CI.
We are now up to 4 scenarios, each containing 6 OSD VMs.
Signed-off-by: Sébastien Han <seb@redhat.com>
When doing collocation the condition "inventory_hostname in play_hosts"
is breaking the restart workflow.
Signed-off-by: Sébastien Han <seb@redhat.com>
- Add upgrade support for rbd mirror and nfs daemons.
- Only works with systemd (remove sysvinit and upstart occurence)
- A bit of cleanup
Signed-off-by: Sébastien Han <seb@redhat.com>
This will solve the following issue when starting docker containers on ubuntu:
invalid argument "1\u00a0" for --cpus=1 : failed to parse 1 as a rational number
Closes-bug: #2056
1. add the variables to docker_collocation
2. trigger the check when a MDS is part of the inventory file, not when
we run on an MDS...
Signed-off-by: Sébastien Han <seb@redhat.com>
The container deployment is serialized, adding this task as a best
effort. If docker is already present we pull the image otherwise we wait
for the role to play.
Signed-off-by: Sébastien Han <seb@redhat.com>
on jewel `ceph-rbd-mirror.target` isn't enabled, therefore, if the node
is rebooted, the service doesn't get started.
from ceph-rbd-mirror unit file:
```
[Install]
WantedBy=ceph-rbd-mirror.target
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This patch simplifies the checks and installation tasks for NTP.
Debian and Red Hat had a check for NTP's presence but would then
install NTP right afterwards anyways. In addition, there were
tasks for atomic that weren't used anywhere else in the role.
This patch also uses a dynamic include to reduce delays from
skipped tasks.
This patch changes the `when:` keys so that they have no jinja2
delimiters. This avoids Ansible warnings which could turn into
errors in a future Ansible release.
We now have a variable called ceph_pools that is mandatory when
deploying a MDS.
It's a dictionnary that contains a pool name and a PG count. PG count is
mandatory and must be set, the playbook will fail otherwise.
Closes: https://github.com/ceph/ceph-ansible/issues/2017
Signed-off-by: Sébastien Han <seb@redhat.com>
Ansible throws warnings when using yum/dnf/rpm with the command
module:
[WARNING]: Consider using yum module rather than running yum
This patch adds the `warn: no` argument to suppress the warnings
in the Ansible output.
The `always_run` key is deprecated and being removed in Ansible 2.4.
Using it causes a warning to be displayed:
[DEPRECATION WARNING]: always_run is deprecated.
This patch changes all instances of `always_run` to use the `always`
tag, which causes the task to run each time the playbook runs.
Modern versions of Ansible can handle a list of packages passed
directly to the package modules. This patch optimizes the package
install process by passing the list of packages directly to the
module.
This is causing unknown issues when trying to start a dmcrypt container.
Basically the container is stuck at mount opening the LUKS device. This
is still unknown why this is causing trouble but we need to move
forward. Also, this doesn't seem to help in any ways to fix the race
condition we've seen.
Here is the log for dmcrypt:
cryptsetup 1.7.4 processing "cryptsetup --debug --verbose --key-file
key luksClose fbf8887d-8694-46ca-b9ff-be79a668e2a9"
Running command close.
Locking memory.
Installing SIGINT/SIGTERM handler.
Unblocking interruption on signal.
Allocating crypt device context by device
fbf8887d-8694-46ca-b9ff-be79a668e2a9.
Initialising device-mapper backend library.
dm version [ opencount flush ] [16384] (*1)
dm versions [ opencount flush ] [16384] (*1)
Detected dm-crypt version 1.14.1, dm-ioctl version 4.35.0.
Device-mapper backend running with UDEV support enabled.
dm status fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush ]
[16384] (*1)
Releasing device-mapper backend.
Trying to open and read device /dev/sdc1 with direct-io.
Allocating crypt device /dev/sdc1 context.
Trying to open and read device /dev/sdc1 with direct-io.
Initialising device-mapper backend library.
dm table fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush
securedata ] [16384] (*1)
Trying to open and read device /dev/sdc1 with direct-io.
Crypto backend (gcrypt 1.5.3) initialized in cryptsetup library
version 1.7.4.
Detected kernel Linux 3.10.0-693.el7.x86_64 x86_64.
Reading LUKS header of size 1024 from device /dev/sdc1
Key length 32, device size 1943016847 sectors, header size 2050
sectors.
Deactivating volume fbf8887d-8694-46ca-b9ff-be79a668e2a9.
dm status fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush ]
[16384] (*1)
Udev cookie 0xd4d14e4 (semid 32769) created
Udev cookie 0xd4d14e4 (semid 32769) incremented to 1
Udev cookie 0xd4d14e4 (semid 32769) incremented to 2
Udev cookie 0xd4d14e4 (semid 32769) assigned to REMOVE task(2) with
flags (0x0)
dm remove fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush
retryremove ] [16384] (*1)
fbf8887d-8694-46ca-b9ff-be79a668e2a9: Stacking NODE_DEL [verify_udev]
Udev cookie 0xd4d14e4 (semid 32769) decremented to 1
Udev cookie 0xd4d14e4 (semid 32769) waiting for zero
Signed-off-by: Sébastien Han <seb@redhat.com>
in addition to c4dcdaa20 this commit adds the missing condition on
install tasks for debian_rhcs deployment. Without them, these tasks are
played on any kind of deployment.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
* DBus on host should include ganesha service file
* to allow ganesha container to respond on DBus it needs to run
in --privileged mode (ganesha folks contacted to look at this)
* ceph_nfs_include_exports_dir variable replaced with more general
ceph_nfs_dynamic_exports
This is something that has nothing to do in `ceph-common`, this
is too specific to `ceph-iscsi-gw` role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This is something that has nothing to do in `ceph-common`, this
is too specific to `ceph-nfs` role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Make role `ceph-mgr` handling itself the installation of `ceph-mgr`
package because it's complicated to manage it regarding we are going to
install `jewel vs. luminous`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
During the initial play, the docker command doesn't not exist and then
there is no stdout_lines to the command. So get allows us to fix this by
declaring an array if the command fails.
Signed-off-by: Sébastien Han <seb@redhat.com>
Use an intermediate variable to build the final `dedicated_devices` list
to avoid duplicate entry in that array. (We need a 1:1 relation between
`dedicated_devices` and `devices` since we are using a `with_together`
later.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
With jewel, `bootstrap_rbd_keyring` is not set because of this condition:
```
when:
- ceph_release_num.{{ ceph_release }} >= ceph_release_num.luminous
```
Therefore, the task `try to fetch ceph config and keys` will fail.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
* Change version from 2 to 3.
* use ceph_rhcs_cdn_debian_repo_version to use other repositories along
* with ceph_rhcs_cdn_debian_repo
Signed-off-by: Sébastien Han <seb@redhat.com>
`ceph_release` isn't available at this step of the playbook because it
is set later based on the installed binaries.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1486062
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The value of doing this is fairly low compare to the added value.
So we remove these tasks, if rbd pool on Jewel doesn't have the right PG
value you can always increase it.
Signed-off-by: Sébastien Han <seb@redhat.com>
All keyring are getting copied to all nodes.
This commit fixes a leftover from a previous code refactor.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498583
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
iscsi-gw needs a 'rbd' pool to configure iscsi target.
Note: I could have used the facts already set in `ceph-mon` but I voluntarily
didn't do it to not create a dependancy between these two roles.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit refacts the code regarding all `set_osd_pool_default_*`
related tasks by avoiding usage of useless `set_fact` to determine
whether a key is present in `ceph_conf_overrides`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The rbd pool is the default pool that gets created during ceph cluster
initializaiton. If we act on the rbd related operations too early, the
rbd pool does not exist yet. Move the call to perform rbd operations
to a later stage after other pools have been created.
The rbd_pool.yml playbook has all the operations related to the rbd pool.
Replace the always_run (deprecated) directive with check_mode.
Most of the ceph related tasks only need to run once. The run_once directive
executes the task on the first host.
The ceph sub-command to delete a pool is delete (not rm).
The changes submitted here were tested with this ceph version.
ceph version 0.94.9-9.el7cp (b83334e01379f267fb2f9ce729d74a0a8fa1e92c)
This upload includes these changes:
- Use the fail module (instead of assert).
- From luminous release, the rbd pool is no longer created by default.
Delete the code to create the rbd pool for luminous release
- Conform the .yml files to use the suggested syntax.
The commands are executed on the mcp nodes and I think shell ansible module
is the right one to use. The command module is used to execute commands on
remote nodes. I can make the change to use command module if that is
prefrerred.
This is to ensure `docker_exec_cmd` fact is set with the correct value
in case of daemons collocation
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Also we now play ceph-config to have everything being generated for new
daemons bootstrap during upgrade.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1497959
Signed-off-by: Sébastien Han <seb@redhat.com>
generate_crt|bool|default(false) won't apply the default value, this
generate_crt|default(false)|bool will
Signed-off-by: Sébastien Han <seb@redhat.com>
The task which sets `ceph_current_fsid` in `facts.yml` in case of containerized
deployment, will definitely fail because `docker_exec_cmd` is not set
yet.
This commits simply makes `facts.yml` played after `check_socket.yml` so
`docker_exec_cmd` is set properly.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commits refacts the role ceph-mds
The goal here is to create cephfs in `ceph-mon` for both containerized
and non-containerized cases so we don't need the admin keyring on mds
nodes anymore.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1488999
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Using systemd module allows us to do in one task what we did in three
tasks:
- enable unit file,
- issue a `daemon-reload`,
- start the service
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Running the socket check on all the hosts will override the default
value of docker_exec_cmd, leaving it with the last value (currently
rbd-mirror), as a result the subsequent docker_exec_cmd usage for the
:x
Signed-off-by: Sébastien Han <seb@redhat.com>
There is a bug in the rbd mirror unit file, the upstream fix is here:
https://github.com/ceph/ceph/pull/17969.
This should be reverted once the patch is merged and backport is done.
Signed-off-by: Sébastien Han <seb@redhat.com>
This fixes the error :
```
The conditional check 'sestatus.stdout != 'Disabled'' failed.
```
that occurs when running on non rhel based system since the
`sestatus` fact is registered only on rhel based distribution.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Specify the timeout flag to ceph-create-keys, which causes it to time out
if a monitor quorum isn't achieved. This overrides the default timeout
of 10 minutes, causing ceph-ansible to fail faster in the event of cluster
network issues.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
We have seen issues with leftover socker. So now, if a socket is found
we also check if it's accessed by a process. If so, we can run the
handler, if not we remove it and continue the playbook.
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
It's sad but we can not rely on the prepare container anymore since the
log are flushed after reboot. So inpecting the container does not return
anything.
Now, instead we use a ephemeral container to look up for the
journal/block.db/block.wal (depending if filestore or bluestore) and
build the activate command accordingly.
Signed-off-by: Sébastien Han <seb@redhat.com>
need to use `hostvars[host]['XXX']` to retrieve the monitor
interface and/or radosgw interface.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1493920
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
- move the file fetch/push to the existing task
- rename the include
- generate the ganesha template from ansible
- re-arrange role structure
- re-use tasks for non-container and container
- configure keys for non-container and container
- fix rgw container key collection;
Signed-off-by: Sébastien Han <seb@redhat.com>
the rbd key was not pushed on rbd nodes because its keyring path was not
added in `ceph_config_keys`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We generate the ceph.conf on all the nodes through the
ceph-docker-common so there is no need to push it to the Ansible file.
Also this is breaking the ceph.conf template generation since we only
generate sections based on the host the ansible task is running on.
For example, what's typically happening, we bootstrap the monitor, we
get a ceph.conf generated for a mon only, we go on an osd, we generate
the ceph.conf with osd section (done by ceph-docker-common) but this
gets overwritten by the copy_config task of the ceph-osd role.
Signed-off-by: Sébastien Han <seb@redhat.com>
- Change capitalization of config options to be
in line with what config.txt in the nfs-ganesha
tree says
Signed-off-by: Ali Maredia <amaredia@redhat.com>
RHCS install wasn't working at all prior to this commit as the name of
the include was pointing to a non-existing file.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492056
Signed-off-by: Sébastien Han <seb@redhat.com>
When ceph-nfs service is managed by pacemaker, it's useful to
not enable and start ceph-nfs service through systemd but let
pacemaker to start the service in a next step.
In analogy to ceph_nfs_rgw_user, we should be able to define a user
with which the nfs-ganesha Ceph FSAL connects to the cluster.
Introduce a ceph_nfs_ceph_user variable, setting its default to
"admin" (which preserves the prior behavior of always connecting as
client.admin).
Fixes#1910.
When Ansible is not run with verbose options it's difficult to see which
include and/or set_fact does what. So adding a name for each clarifies.
Signed-off-by: Sébastien Han <seb@redhat.com>
The variable "statleftover" was removed by commit
a60c74f61e
and never added back to the new playbook,
yet it is still being referenced.
Adding it back
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492224
Signed-off-by: Sébastien Han <seb@redhat.com>
On a container env, machines don't have any ceph binaries so we need to
use a container to run the commands.
Signed-off-by: Sébastien Han <seb@redhat.com>
Use default delay since the mon (in particular) can take more time to
restart.
Solves error with:
STDERR:
Error response from daemon: No such container: ceph-mon-mon0
Signed-off-by: Sébastien Han <seb@redhat.com>
All keys are copied to all nodes.
This commit split that task in each roles so keys are copied to their
respective nodes.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1488999
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Less configuration for the user, the container inherit from the global
variables. No more container specific variables.
Signed-off-by: Sébastien Han <seb@redhat.com>
In a collocated scenario, where you might put a rgw, a mds and a mon on
the same node you don't want the handler blindly restart all the daemons
on the node. Indeed some of them might not be configured yet.
Implementing a more precise socket detection, for each daemon type.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1488813
Signed-off-by: Sébastien Han <seb@redhat.com>
Prior to this patch this activation sequence for autodetection was
always skipped because we were asking to activate on device without
partitions, which doesn't make sense.
We also fix the way we lookup for a device, since the data partition is
always numbered 1, we take the min element of the dict.
Closes: https://github.com/ceph/ceph-ansible/issues/1782
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add new boolean parameter for client config create_key_file_only
with a default of false. When create_key_file_only is true, the
client tasks to connect to the external ceph cluster to verify
the key `ceph auth import` the key are skipped.
Fixes: #1848
We must mask the image so we are sure that even if the system reboots
then the OSDs won't start.
Also remove Ceph udev rules if found on the system prior to deploy
containers. If we don't do this we are exposed to conflicts between udev
rules and sytemd unit files.
Also add the CI will now test the migration from a non-containerized cluster to a
containerized cluster.
Signed-off-by: Sébastien Han <seb@redhat.com>
The way we handle the restart for both mds and rgw is not ideal, it will
try to restart the daemon on the host that don't run the daemon,
resulting in a service file being created (see bug description).
Now we restart each daemon precisely and in a serialized fashion.
Note: the current implementation does NOT support multiple mds or rgw on
the same node.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1469781
Signed-off-by: Sébastien Han <seb@redhat.com>
This patch adds passing the RGW_CIVETWEB_IP to the docker
container. This IP defaults to the value of radosgw_civetweb_bind_ip.
radosgw_civetweb_bind_ip default to ipv4.default
Without this value, the RGW containter will bind to 0.0.0.0
Remove the old check prior systemd.
We only support systemd so there is no need to run a condition on
systemd. The playbook will fail if systemd is not present.
Signed-off-by: Sébastien Han <seb@redhat.com>
The installation process is now described as follow:
* you still have to choose a 'ceph_origin' installation method. The
origin can be a 'repository' (add a new repository), distro (it will use
the packages provided by the native repo source of your distribution),
local (only available on redhat system, it installs locally built
packages). This option is not well tested, so use it carefully
* if ceph_origin == 'repository' you will have to decide what kind of
repository you want to enable:
- community: corresponds to the stable upstream/community version
- enterprise: corresponds to the stable enterprise/downstream version
(basically you are a red hat customer)
- dev: it will install ceph from packages built out of the github
development branches
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The list can not be evaluated properly if it containers '[]', which is
the case when using the filter "default([])". To fix this, we have to
properly merge the lists.
This is fixing the issue: "list object has no element 1"
Signed-off-by: Sébastien Han <seb@redhat.com>
By detecting the ceph version running in the container we can easily
apply conditions like:
ceph_release_num.{{ ceph_release }} >= ceph_release_num.luminous
We do that already, in ceph-docker-common/tasks/fetch_configs.yml.
This fixes the error:
TASK [ceph-docker-common : register rbd bootstrap key]
******************************************************
fatal: [magna005]: FAILED! => {"failed": true, "msg": "The conditional
check 'ceph_release_num.{{ ceph_release }} >= ceph_release_num.luminous'
failed. The error was: error while evaluating conditional
(ceph_release_num.{{ ceph_release }} >= ceph_release_num.luminous):
'dict object' has no attribute 'dummy'\n\nThe error appears to have been
in
'/home/ubuntu/ceph-ansible/roles/ceph-docker-common/tasks/fetch_configs.yml':
line 2, column 3, but may\nbe elsewhere in the file depending on the
exact syntax problem.\n\nThe offending line appears to be:\n\n---\n-
name: register rbd bootstrap key\n ^ here\n"}
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1486062
Signed-off-by: Sébastien Han <seb@redhat.com>
Logging inside the container is not useful since it writes to the
overlayfs partition, resulting in potential performance degradation on
the container.
If you need to check the logs, just look at journald.
Signed-off-by: Sébastien Han <seb@redhat.com>
with_items is evaluated before the when condition so if the task that
registers the 'results' is skipped the task will fail with:
{"failed": true, "msg": "'dict object' has no attribute 'results'"}
Defaulting to an empty array fixes the issue.
Reverts: abdd66619e
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1482061
Signed-off-by: Sébastien Han <seb@redhat.com>
The script can fail to get the osd id because the osds are activated by
udev and it can take a while for them to activate. This commit fixes
that by trying to get all the osds per node in a loop.
This commit also makes the osd services enabled so that they are
available after reboot.
Signed-off-by: Boris Ranto <branto@redhat.com>
There should be no need to use sudo when writing or using these files.
It creates an issue when the user running ansible-playbook does not
have sudo privs.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Some deployments can't copy infrastructure playbooks outside of the
infrastructure-playbooks directory. Thus they use ANSIBLE_ROLES_PATH to
overcome this. However some roles have 'playbook_dir' hardcoded, which
results in wrong path since the execution comes from
infrastructure-playbooks. Basically the role triggered by a playbook
from infrastructure-playbooks believes that the roles are in
infrastructure-playbooks/roles. This commit fixes that.
Signed-off-by: Sébastien Han <seb@redhat.com>
This will give us more flexibility and the possibility to deploy a client node
for an external ceph-cluster.
related BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1469426Fixes: #1670
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Before this commit we were forcing ipv4 which might not be available.
Now setting ip_version to ipv4 or ipv6 will give you the right support.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1484189
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit eases the use of the
infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook. We basically run it with a couple of pre-tasks and then we let
the playbook run the docker roles.
It obviously expect to have proper variables configured in order to
work.
Signed-off-by: Sébastien Han <seb@redhat.com>
The lvm_volumes variable is now a list of dictionaries that represent
each OSD you'd like to deploy using ceph-volume. Each dictionary must
have the following keys: data, journal and data_vg. Each dictionary also
can optionaly provide a journal_vg key.
The 'data' key represents the lv name used for the OSD and the 'data_vg'
key is the vg name that the given lv resides on. The 'journal' key is
either an lv, device or partition. The 'journal_vg' key is optional and
must be the vg name for the journal lv if given. This key is mainly used
for purging of the journal lv if purge-cluster.yml is run.
For example:
lvm_volumes:
- data: data_lv1
journal: journal_lv1
data_vg: vg1
journal_vg: vg2
- data: data_lv2
journal: /dev/sdc
data_vg: vg1
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Resolves issue: Multiple RGW Ceph.conf Issue #1258
In multi-RGW setup, in ceph.conf the RGW sections
contain identical bind IP in civetweb line. So this
modification fixes that issue and puts the right IP
for each RGW.
Signed-off-by: SirishaGuduru SGuduru@walmartlabs.com
Modified ceph-defaults and ran generate_group_vars_sample.sh
group_vars/osds.yml.sample and group_vars/rhcs.yml.sample are
not part of the changes. But they got modified when
generate_group_vars_sample.sh is ran to generate group_vars/
all.yml.sample.
Uncommented added variables in ceph-defaults
Updated tests by adding value for radosgw_interface
Added radosgw_interface to centos cluster tests
Modified ceph-rgw role,rebased and ran generate_group_vars_sample.sh
In ceph-rgw role removed check_mandatory_vars.yml.
Rebased on master.
Ran generate_group_vars_sample.sh and then the below files got
modified.
The rbd-mirror daemon will be HA under luminous and new daemon health
features require a way to uniquely identify rbd-mirror instances.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
To be properly evaluated the "skipped" conditions must always have the
first place on the list of condition, otherwise the other conditions are
evaluated before and make the task fail.
Closes: https://github.com/ceph/ceph-ansible/issues/1733
Signed-off-by: Sébastien Han <seb@redhat.com>
Problem: task "check for a ceph socket in containerized deployment" will
be skipped if we are not an OSD.
with_items are still evaluated before when conditions so if the task was
skipped the dict will be empty and then fail.
Adding a "not skipped" condition skips the execution of the task.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1482061
Signed-off-by: Sébastien Han <seb@redhat.com>
This commits force ceph-common to be installed early in deployment on
nodes.
For instance, ceph-rbdmirror doesn't have the CLI installed while it is
needed for some tasks which uses it to set some facts.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
ceph services can fail to start under certain circumstances (for
example, when running in a container) because the default systemd
service configuration causes namespace issues.
To work around this we can override the system service settings by
placing an overrides file in the ceph-<service>@.service.d directory.
This can be generic so as to allow any potential changes required to
the ceph-<service> service files.
The overrides file is only setup when the
"ceph_<service>_systemd_overrides" config_template override variable is
specified.
The available service systemd override files are as follows:
ceph_mds_systemd_overrides
ceph_mgr_systemd_overrides
ceph_mon_systemd_overrides
ceph_osd_systemd_overrides
ceph_rbd_mirror_systemd_overrides
ceph_rgw_systemd_overrides
The original fix to issue #1755 only set the permissions on
the monitors to which the key was copied, but not the original
monitor where the key was created. Thus, we use a separate task
to set the permission of the key.
The openstack_keys structure now supports a key called mode
whose value is a string that one could pass to chmod to set
the mode of the key file. The ansible file module applies the
mode to all openstack keys with this property.
Fixes: #1755
This is under the MDS role instead of the mon role because that role
does not create the filesystem under docker.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
This scenario will create OSDs using ceph-volume and is only available
in ceph releases greater than Luminous.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
There is only two main scenarios now:
* collocated: everything remains on the same device:
- data, db, wal for bluestore
- data and journal for filestore
* non-collocated: dedicated device for some of the component
Signed-off-by: Sébastien Han <seb@redhat.com>
in containerized deployment, if you try to update your `ceph.conf` file
it won't be actually updated on your nodes because it is overwritten by
the copy of the file which is present in your fetch directory.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Move `fsid`,`monitor_name`,`docker_exec_cmd` and `ceph_release` set_fact
to `ceph-defaults` role.
It will allow to reuse these facts without having to play `ceph-common`
or `ceph-docker-common`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This will give us more flexibility and avoid a lot of useless when
skipping all tasks from a non-desired role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a new role `ceph-defaults`.
This role aims to handle all defaults vars for `ceph-docker-common` and
`ceph-common` and set basic facts (eg. `fsid`)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Merge `ceph-docker-common` and `ceph-common` defaults vars in
`ceph-defaults` role.
Remove redundant variables declaration in `ceph-mon` and `ceph-osd` roles.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We don't want to have heterogeous ceph.conf anymore and believe that we
should have the right section for the running daemon.
If we don't do this and use profiles, e.g: rgw, we will get a new rgw
section on some of the nodes.
Signed-off-by: Sébastien Han <seb@redhat.com>
It is mandatory now to set the Ceph version you want to install, e.g:
ceph_stable_release: luminous
To find the release names, you can look at the release not doc:
http://docs.ceph.com/docs/master/release-notes/
Signed-off-by: Sébastien Han <seb@redhat.com>
ceph-disk is responsable for enabling the unit file if needed. Actually
since https://github.com/ceph/ceph/pull/12241 it seems that it's not
even needed. On an event of a restart, udev rules will be trigger and
they will ceph-disk activate the device too so the 'enabled' is not
needed.
Closes: https://github.com/ceph/ceph-ansible/issues/1142
Signed-off-by: Sébastien Han <seb@redhat.com>
OSDs get started by ceph-disk before the ceph.conf file is written
with a crush location. That results in a crush map without configured
crush location.
To prevent this, we have to restart the OSDs during the initial setup
after the crush location was added to the ceph.conf file.
We have multiple issues with ceph-disk's cli with bluestore and Ceph
releases. This is mainly due to cli changes with Luminous. Luminous
introduced a --bluestore and --filestore options which respectively does
not exist on releases older than Luminous. The default store being
bluestore on Luminous, simply checking for the store is not enough so we
have to build a specific command line for ceph-disk depending on the
Ceph version we are running and the desired osd_store.
Signed-off-by: Sébastien Han <seb@redhat.com>
The keys and openstack_keys structure now supports an optional
key called acls whose value is a list of strings one could pass
to setfacl. The ansible ACL module applies the ACLs to all
openstack keys with this property.
Fixes: #1688
Remove `rgw enable static website` and `rgw enable usage log` from
ceph.conf and make it usable with ceph_config_overrides as profiles.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit introduces a new directory called "profiles" which
contains some set of variables for a particular use case. These profiles
provide guidance for certain scenarios such as:
* configuring rgw with keystone v3
Signed-off-by: Sébastien Han <seb@redhat.com>
According to Alfredo, this was used for gitbuilders. Right now shaman/chacra
dev repos are unsigned.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Move condition at task level and not at include level to make `fsid`
variable available for all roles.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Some tasks fetch file to `{{ fetch_directory }}/docker_mon_files` and
then try to copy from `{{ fetch_directory }}/{{ fsid }}`. That causes
the playbook to fail.
Fixes: #1683
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
To keep consistency between `{{ openstack_keys }}` and `{{ keys }}`
respectively in `ceph-mon` and `ceph-client` roles.
This commit also add the possibility to set mds caps.
Fixes: #1680
Co-Authored-by: John Fulton <johfulto@redhat.com>
Co-Authored-by: Giulio Fidente <gfidente@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The ceph.conf template needs to look for the value of monitor_interface
in hostvars[host] because there might be different values set per host.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
In Luminous, ceph-disk defaults to bluestore so all our scenarios are
using bluestore, we need to force testing both.
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
In addition to ceph/ceph-docker@69d9aa6, this explains how to deploy a
containerized cluster with a custom admin secret.
Basically, just need to pass the `admin_secret` defined in your
`group_vars/all.yml` to the `ceph_mon_docker_extra_env` variable.
Eg:
`ceph_mon_docker_extra_env: -e CLUSTER={{ cluster }} -e FSID={{ fsid }}
-e MON_NAME={{ monitor_name }} -e ADMIN_SECRET={{ admin_secret }}`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Fail with a sane message if the devices or raw_journal_devices variables
are strings instead of lists during manual device assignment.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
rsync is required by the ansible synchronize package. Ensure
it is installed when local installation is selected.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Add a new parameter `admin_secret` that allow to deploy a ceph cluster
with a custom admin secret.
Fix: #1630
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commits refactors how we deploy bluestore. We have existing
scenarios that we don't want to change too much. This commits eases the
user experience by now changing the way you use scenarios. Bluestore is
just a different interface to store objects but the scenarios more or
less remain the same.
If you set osd_objectstore == 'bluestore' along with
journal_collocation: true, you will get an OSD running bluestore with DB
and WAL partitions on the same device.
If you set osd_objectstore == 'bluestore' along with
raw_multi_journal: true, you will get an OSD running bluestore with a
dedicated drive for the rocksdb DB, then the remaining
drives (used with 'devices') will have WAL and DATA collocated.
If you set osd_objectstore == 'bluestore' along with
raw_multi_journal: true and declare bluestore_wal_devices you will get
an OSD running bluestore with a dedicated drive for rocksdb db, a
dedicated drive partition for rocksdb WAL and a dedicated drive for
DATA.
Signed-off-by: Sébastien Han <seb@redhat.com>
There is no need for 2 variables to enable bluestore, prior to this
patch one had to do the following to activate bluestore:
osd_objectstore: bluestore
bluestore: true
Now you just need to set `osd_objectstore: bluestore`.
Fixes: https://github.com/ceph/ceph-ansible/issues/1475
Signed-off-by: Sébastien Han <seb@redhat.com>
remove `ceph_mon_docker_interface` and use `monitor_interface` instead
for both containerized and non-containerized deployment.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Some variables are missing from ceph-docker-common role since the
include of check_mandatory_vars.yml has been re-added in the ceph-mon
role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The check regarding the networking scenario configuration has been
moved from ceph-common to ceph-mon in 1de8176 but the include was not re-added
in 189f4fe
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
e8187f6 does not fix the ipv6 as expected since `ansible_default_*` are
filled with the IP address carried by the network interface used by the
default gateway route. By the way, it assumes that the MON_IP address will
be this IP address which is not always the case.
We need to keep using the previous fact but add some intelligence in the
template to determine how to retrieve the ipv4|ipv6 address since the path
to the fact in `hostvars` is not the same according to ipv4 vs ipv6 case.
Fix: 1569
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add an extra variable to the openstack pools, which creates them with
defined rules. This will allow to place different pools on e.g.
different type of disks.
This commit will also set a new default rule when defined and move
the rbd pool to the new rule.
Somehow the shell module will return an error if the command line is not
next to it.
Plus fixed the import with the right path.
Signed-off-by: Sébastien Han <seb@redhat.com>
Followup on https://github.com/ceph/ceph-ansible/pull/1469 where we
merged most of the container code from roles/ceph-*/task/docker/*.yml
into roles/ceph-docker-common/tasks/
It seems that we forgot to remove the original files.
Signed-off-by: Sébastien Han <seb@redhat.com>
The new test in the checks PGs are no longer working on distributions
where /bin/sh isn't linked to /bin/bash.
Fix: #1619
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
OpenStack's Gnocchi service expects to have a pool called "metrics".
This change addess "metrics" to the list of `openstack_pools` and
creates a corresponding key. It is only run if the user sets
`openstack_config: false`.
The current handler only restarts one OSD on each OSD server. After
the first one the handler stops, not matter what results the checks had.
Co-Authored-By: Gaudenz Steinlin (@gaudenz)
Remove "osd mkfs type" and the other pre-Bluestore parameters from the
generated ceph.conf so that disk activation on OSDs will work. The
current default xfs config results in a failed deployment and
incorrect partition metadata.
For newly created cluster the command: ceph --cluster {{ cluster }} osd
pool get rbd size does not respond properly.
We only want to check if the rbd pool exists, so we know use an ls |
grep approach.
Closes: https://github.com/ceph/ceph-ansible/issues/1547
Signed-off-by: Sébastien Han <seb@redhat.com>
Rewrite the check_pgs by using json parsing instead of complex regexp to
parse the `ceph -s` output.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a default value for `ceph_docker_on_openstack` to avoid a
conditional check error for the task `pause after docker install before starting` in
`roles/ceph-docker-common/tasks/pre_requisites/prerequisites.yml`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The fact ['ansible_$interface']['ipv4'] is a dictionary where
['ansible_$interface']['ipv6'] is a list. If we use
ansible_default_ipv6|ipv4 is is always a dictionary which allows us to
get the ipv6 and ipv4 address without adding more complexity to the
template.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
For some reason we changed the check of pgs but it appears it could be
dangerous because the current check might satisfied as long as 1 PG is
active+clean.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
since `-e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE` is already hardcoded in
`eph-osd-run.sh.j2` there is no need to add `-e
CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE` as a default value in defaults vars.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`ceph-docker-common`:
At the moment there is a lot of duplicated tasks in each
`./roles/ceph-<role>/tasks/docker/main.yml` that could be refactored in
`./roles/ceph-docker-common/tasks/main.yml`.
`*_containerized_deployment` variables:
All `*_containerized_deployment` have been refactored to a single
variable `containerized_deployment`
duplicate `cephx` variables in `group_vars/* have been removed.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We only check for everything expect 'distro' because that
is a valid way of deploying RHCS, with preprepared repos
present on the nodes.
Signed-off-by: Sébastien Han <seb@redhat.com>
Problem: we could end up in situation where we would install a package
on a machine that does not have the right repo enabled. Because the
condition was set to OR we weren't pinning a particular host but just a
condition. Let's say someone sets 'ceph_origin == "distro"', this would
try to install OSD packages on Monitors.
Solution: use a AND condition to first pin to the group_name (which
identifies a set of hosts) AND then after this one of the installation
condition.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1453119
Co-Authored-By: https://github.com/zhsj
Signed-off-by: Sébastien Han <seb@redhat.com>
Problem: fail to deploy a containerized Ceph cluster with ipv6
Solution: do not hardcode ipv4 when bootstrapping the container.
Now use ip_version: ipv6 to get a containerized cluster deployed with
ipv6.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1451786
Signed-off-by: Sébastien Han <seb@redhat.com>
In addition to `196fa7e` this commit check if a container has been
already launched and delete it before retrying the ceph osd prepare
process.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The CI on Docker is reporting the following error:
STDERR:
Error EINVAL: bad entity name
This is due to the fact that this auth entity name does not exist on
Jewel so we should not create that key when running Jewel containers.
Fixes: https://github.com/ceph/ceph-ansible/issues/1514
Signed-off-by: Sébastien Han <seb@redhat.com>
Already documented in the Red Hat Ceph Storage 2 Installation Guide
for Red Hat Enterprise Linux, but not here
Signed-off-by: Florian Klink <flokli@flokli.de>
"rgw override bucket index max shards" and
"rgw bucket default quota max objects" were in the
client section of the ceph.conf and not being
applied, this commit moves them to global
Resolves: bz#1391500
Signed-off-by: Ali Maredia <amaredia@redhat.com>
We shouldn't need this anymore as the upgrade bug that
debian_ceph_packages was used to workaround should have
been fixed as of jewel.
See https://github.com/ceph/ceph-ansible/issues/1481 for more
detailed information.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Change civetweb_num_thread default to 100
Add capability to override number of pgs for
rgw pools.
Add ceph.conf vars to enable default bucket
object quota at users choosing into the ceph.conf.j2
template
Resolves: rhbz#1437173
Resolves: rhbz#1391500
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Restore the check_socket that was removed by `5bec62b`.
This commit also improves the logging in `restart_*_daemon.sh` scripts
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Prior to this change, ansible was only checking for the existence of the
package, now if upgrade_ceph_packages is true this means we are
performing an upgrade.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1442016
Signed-off-by: Sébastien Han <seb@redhat.com>
Proof-of-concept clusters or actual production clusters will never want to use this. We also do not test it anywhere for this same reason.
Signed-off-by: Gregory Meno <gmeno@redhat.com>
This is to allow ceph-mgr daemons to remote control
osd and mds daemons with MCommand messages.
Fixes: http://tracker.ceph.com/issues/19713
Signed-off-by: John Spray <john.spray@redhat.com>
Without this, we don't test the mgr role so we need to add it.
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
This is the same fix as bc846b7da6
applied to the other part of the code-base that builds ceph.conf (I'd
missed that 349b9ab3e7 had duplicated
this code).
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
Ansible's assemble module by default will put all files in the src
directory together into dest. We only want to put {{ cluster }}.conf
and osd.conf together, not anything that might have found its way into
/etc/ceph/ceph.d (e.g. files left by the sysadmin taking backups
before an ansible run). So specify a regexp that matches only those
two files.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
Ansible evaluates the 'with_items' before the 'when' so if the inventory
does not have the group declared it'll fail. To fix this, we set an
empty array to make the with_items happy and then evaluate with the
'when'.
Signed-off-by: Sébastien Han <seb@redhat.com>
Prior to this change we were deploying a monitor using tis fqdn name but
we were checking its state and performing actions on it using its
shortname.
Signed-off-by: Sébastien Han <seb@redhat.com>
The Ceph Manager daemon (ceph-mgr) runs alongside monitor daemons, to
provide additional monitoring and interfaces to external monitoring and
management systems.
Only works as of the Kraken release.
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
ceph-create-keys unit file was removed here:
* 8bcb4646b6
* dc5fe8d415
As a consequence the systemctl preset command now fails to run since the
unit does not exist anymore. Due to the redirection in /dev/null we
don't know what's happening.
Ultimately the mon unit doesn't get enabled and the mon service won't
start after reboot.
Removing the old/non-existent unit makes the command succeed now.
ceph fix: https://github.com/ceph/ceph/pull/14226
Signed-off-by: WingkaiHo <sanguosfiang@163.com>
Co-Authored-By: Sébastien Han <seb@redhat.com>
Until now, only the first task were executed.
The idea here is to use `listen` statement to be able to notify multiple
handler and regroup all of them in `./handlers/main.yml` as notifying an
included handler task is not possible.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As reported in
https://github.com/ceph/ceph-ansible/issues/1403 when devices are held
by lvm and `osd_auto_discovery` is set to true, it's not enough to check
for a partition count = 0 since Ansible does not report.
This patch also looks for 'holders' which in a case of lvm corresponds
to the name of the pv. Now we also look for holders = 0.
Fixes: #1403
Signed-off-by: Sébastien Han <seb@redhat.com>
Problem: too many different commands to do the same thing. The 'cut'
command on infrastructure-playbooks/purge-cluster.yml was also wrong.
This sed command from osixia in ceph-docker
https://github.com/ceph/ceph-docker/pull/580/ addresses all the
scenarios.
Signed-off-by: Sébastien Han <seb@redhat.com>
ntp is still installed even if ntp_service_enabled is set to false.
That could be a problem if the time synchronization is managed by
something else than ceph-ansible or if you want to use different NTP
implementation as suggested in #1354.
Fixes: #1354
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Guits <gabrioux@redhat.com>
If a group of hosts is empty, (for instance 'mdss', in case of a
deployment without any mds node), the playbook will fails when trying
to restart service with `"'dict object' has no attribute u'XXX'"` error.
The idea here is to force the `with_items` statements in all included handler tasks
to get at least an empty array.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
systctl tuning should be in the sysctl.d directory. This creates
a seperation from what values were set specific to ceph, and what
values were set by the operator.
Signed-off-by: Tyler Brekke <tbrekke@redhat.com>
With ' in osd_crush_location, systemd will show this error:
ceph-osd-prestart.sh[2931]: Invalid command: invalid chars ' in 'root=
Signed-off-by: Christian Zunker <christian.zunker@codecentric.de>
This fixes issue #1299. According to @ktdreyer s comment in the ticket,
he fixed the web server config so also older (non-SNI) python clients
can use the uri module here.
After the jewel release the mon startup does not generate keys, but it's
still harmless to call ceph-create-keys with jewel because this task has
a 'creates' argument that will cause it not to run if the keys already
exist.
Removing this when condition also allows the downstream CI tests to
install kraken or luminous without resetting ceph_stable_release, which does not
pertain to rhcs.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This is not only for monitors, but also mds, rgw and rbd mirror so
making the var name more generic:
ceph_docker_enable_centos_extra_repo
Signed-off-by: Sébastien Han <seb@redhat.com>
Sometimes the socket appears during the 5th attempt and sometimes not so
increasing the timeout a little bit.
Signed-off-by: Sébastien Han <seb@redhat.com>
Add the possibility to create openstack pools and keys even for containerized deployments
Fix: #1321
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This removes the implicit order requirement when using OSD fragments.
When you use OSD fragments and ceph-osd role is not the last one,
the fragments get removed from ceph.conf by ceph-common.
It is not nice to have this code at two locations, but this is
necessary to prevent problems, when ceph-osd is the last role as
ceph-common gets executed before ceph-osd.
This could be prevented when ceph-common would be explicitly called
at the end of the playbook.
Signed-off-by: Christian Zunker <christian.zunker@codecentric.de>
This option was missing for rrgw, mds, rbd mirror and nfs making these
daemon impossible to run on a kv deployment with containers.
Signed-off-by: Sébastien Han <seb@redhat.com>
Prior to this change, ceph-ansible would install the main NFS Ganesha
server daemon on Ubuntu, but it would skip the Ceph FSALs.
Running "apt-get install nfs-ganesha" will only install the main NFS Ganesha
server. It does *not* pull in the RGW FSAL
(/usr/lib/x86_64-linux-gnu/ganesha/libfsalrgw.so)
Running "apt-get install nfs-ganesha-fsal" will install the RGW FSAL as
well as the main NFS Ganesha server package.
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
This patch introduces calamari_debug option which will turn on debugging
for calamari before initializing and running it.
Signed-off-by: Boris Ranto <branto@redhat.com>
From Josh Durgin, "I'd recommend not setting vfs_cache_pressure in
ceph-ansible. The syncfs issue is still there, and has caused real
problems in the past, whereas there hasn't been good data showing lower
vfs_cache_pressure is very helpful - the only cases I'm aware of have
shown it makes little difference to performance."
https://bugzilla.redhat.com/show_bug.cgi?id=1395451