When we consume the distribution packages, we don't have the choise on
which version to install, so we shouldn't require that variable to be
set. Distributions normally provide only one version of Ceph in the
official repositories so we get whatever they provide.
Signed-off-by: Markos Chandras <mchandras@suse.de>
openSUSE Leap 42.3 provides support for Ceph Luminous in both the
distribution package and the latest available version in the OBS
repository so add these as the only available installation methods for
openSUSE.
Signed-off-by: Markos Chandras <mchandras@suse.de>
Like 80d32dec, the path to the fact is not correct.
In any case, we will retrieve the IP address in hostvars, the variable
is the way we get the interface name according where it has been set
(eg.: inventory host file vs. group_vars/)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510906
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
there is no need to have a condition on this task, this test should be
always run since the result will be interpreted later.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This will prevent ceph-ansible from using a loop device while it
shouldn't in auto_discovery mode.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The path to the fact is not correct.
In any case, we will retrieve the IP address in hostvars, the variable
is the way we get the interface name according where it has been set
(eg.: inventory host file vs. group_vars/)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510906
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Use `devices` variable instead of `ansible_devices`, otherwise it means
we are not using the devices which have been 'auto discovered'
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The current code will also return lvm devices such as /dev/dm-2, this
kind of device type is not supported by ceph-disk at the moment. Now we
just ignore them.
Signed-off-by: Sébastien Han <seb@redhat.com>
- One can not run scripts directly in place, that mounted with `noexec`
option. But one can run scripts as arguments for `bash/sh`.
Signed-off-by: Arano-kai <captcha.is.evil@gmail.com>
During the initial implementation of this 'old' thing we were falling
into this issue without noticing
https://github.com/moby/moby/issues/30341 and where blindly using --rm,
now this is fixed the prepare container disappears and thus activation
fail.
I'm fixing this for old jewel images.
Also this fixes the machine reboot case where the docker logs are
purgend. In the old scenario, we now store the log locally in the same
directory as the ceph-osd-run.sh script.
Signed-off-by: Sébastien Han <seb@redhat.com>
Setting monitor_interface in group_vars/all.yml makes the
hostvars[host]['monitor_interface'] non-existing.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1507922
Signed-off-by: Sébastien Han <seb@redhat.com>
Only chmod or setfacl the requested keyring(s) in the
opentack_keys data structure when the mode or acls keys
of that data structure exist.
User may specify four permission combinations for the
keyring file(s): 1. only set ACL, 2. only set mode,
3. set neither mode nor ACL, 4. set mode and then ACL.
Fixes: #2092
Use "ceph_tcmalloc_max_total_thread_cache" to set the
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES value inside /etc/default/ceph for
Debian installs, or /etc/sysconfig/ceph for Red Hat/CentOS installs.
By default this is set to 0, so the default package value will be used,
if specified this value will be changed to match the variable, and ceph
osd services will be restarted.
There was a huge resync from luminous to jewel in ceph-docker:
https://github.com/ceph/ceph-docker/pull/797
This change brought a new handy function to discover partitions tight to
an OSD. This function doesn't exist in the old image so the
ceph-osd-run.sh script breaks when trying to deploy Jewel OSD with that
old Jewel image version.
Signed-off-by: Sébastien Han <seb@redhat.com>
stable-3.0 brought numerous changes in ceph-ansible variables, this PR
aims to maintain backward compatibility for someone running stable-2.2
upgrading to stable-3.0 but keeps its groups_vars untouched.
We will then determine the right options to make sure the upgrade works
but we are expecting that new variables should be used.
We will drop this in a near future, maybe 3.1 or 3.2.
Signed-off-by: Sébastien Han <seb@redhat.com>
This feature isn't available before luminous, therefore, we need to play
them only on luminous and after otherwise the playbook will fail.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3f3d4b9c727d06154c422d445fc2a245aceaed89)
3a58757 introduced an issue for Jewel deployments, since this role is
skipped, `enabled_ceph_mgr_modules.stdout` doesn't exist, therefore, it
ends up with an attribute error.
Uses `.get()` to retrieve `stdout` with a default value so it won't fail
if this attribute doesn't exist (jewel).
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
In Jewel, we don't use bootstrap-rbd keyring for rbd-mirror nodes, it
results with a socket path/name different according to which ceph
release you are deploying.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit add new osd scenarios, it aims to simplify the CI setup and
brings a better coverage on the OSD scenarios.
We decided to differentiate between filestore and bluestore, thinking
ahead when filestore won't be supported anymore.
So we now have two classes of tests:
* Filestore
* Bluestore
In each of those classes we have container and non-container.
Then for each we test the following:
* collocated
* collocated dmcrypt
* non-collocated
* non-collocated dmcrypt
* auto discovery collocated
* auto discovery collocated dmcrypt
This gives us a nice coverage and also reduces the footprint on the CI.
We are now up to 4 scenarios, each containing 6 OSD VMs.
Signed-off-by: Sébastien Han <seb@redhat.com>
When doing collocation the condition "inventory_hostname in play_hosts"
is breaking the restart workflow.
Signed-off-by: Sébastien Han <seb@redhat.com>
- Add upgrade support for rbd mirror and nfs daemons.
- Only works with systemd (remove sysvinit and upstart occurence)
- A bit of cleanup
Signed-off-by: Sébastien Han <seb@redhat.com>
This will solve the following issue when starting docker containers on ubuntu:
invalid argument "1\u00a0" for --cpus=1 : failed to parse 1 as a rational number
Closes-bug: #2056
1. add the variables to docker_collocation
2. trigger the check when a MDS is part of the inventory file, not when
we run on an MDS...
Signed-off-by: Sébastien Han <seb@redhat.com>
The container deployment is serialized, adding this task as a best
effort. If docker is already present we pull the image otherwise we wait
for the role to play.
Signed-off-by: Sébastien Han <seb@redhat.com>
on jewel `ceph-rbd-mirror.target` isn't enabled, therefore, if the node
is rebooted, the service doesn't get started.
from ceph-rbd-mirror unit file:
```
[Install]
WantedBy=ceph-rbd-mirror.target
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This patch simplifies the checks and installation tasks for NTP.
Debian and Red Hat had a check for NTP's presence but would then
install NTP right afterwards anyways. In addition, there were
tasks for atomic that weren't used anywhere else in the role.
This patch also uses a dynamic include to reduce delays from
skipped tasks.
This patch changes the `when:` keys so that they have no jinja2
delimiters. This avoids Ansible warnings which could turn into
errors in a future Ansible release.
We now have a variable called ceph_pools that is mandatory when
deploying a MDS.
It's a dictionnary that contains a pool name and a PG count. PG count is
mandatory and must be set, the playbook will fail otherwise.
Closes: https://github.com/ceph/ceph-ansible/issues/2017
Signed-off-by: Sébastien Han <seb@redhat.com>
Ansible throws warnings when using yum/dnf/rpm with the command
module:
[WARNING]: Consider using yum module rather than running yum
This patch adds the `warn: no` argument to suppress the warnings
in the Ansible output.
The `always_run` key is deprecated and being removed in Ansible 2.4.
Using it causes a warning to be displayed:
[DEPRECATION WARNING]: always_run is deprecated.
This patch changes all instances of `always_run` to use the `always`
tag, which causes the task to run each time the playbook runs.
Modern versions of Ansible can handle a list of packages passed
directly to the package modules. This patch optimizes the package
install process by passing the list of packages directly to the
module.
This is causing unknown issues when trying to start a dmcrypt container.
Basically the container is stuck at mount opening the LUKS device. This
is still unknown why this is causing trouble but we need to move
forward. Also, this doesn't seem to help in any ways to fix the race
condition we've seen.
Here is the log for dmcrypt:
cryptsetup 1.7.4 processing "cryptsetup --debug --verbose --key-file
key luksClose fbf8887d-8694-46ca-b9ff-be79a668e2a9"
Running command close.
Locking memory.
Installing SIGINT/SIGTERM handler.
Unblocking interruption on signal.
Allocating crypt device context by device
fbf8887d-8694-46ca-b9ff-be79a668e2a9.
Initialising device-mapper backend library.
dm version [ opencount flush ] [16384] (*1)
dm versions [ opencount flush ] [16384] (*1)
Detected dm-crypt version 1.14.1, dm-ioctl version 4.35.0.
Device-mapper backend running with UDEV support enabled.
dm status fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush ]
[16384] (*1)
Releasing device-mapper backend.
Trying to open and read device /dev/sdc1 with direct-io.
Allocating crypt device /dev/sdc1 context.
Trying to open and read device /dev/sdc1 with direct-io.
Initialising device-mapper backend library.
dm table fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush
securedata ] [16384] (*1)
Trying to open and read device /dev/sdc1 with direct-io.
Crypto backend (gcrypt 1.5.3) initialized in cryptsetup library
version 1.7.4.
Detected kernel Linux 3.10.0-693.el7.x86_64 x86_64.
Reading LUKS header of size 1024 from device /dev/sdc1
Key length 32, device size 1943016847 sectors, header size 2050
sectors.
Deactivating volume fbf8887d-8694-46ca-b9ff-be79a668e2a9.
dm status fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush ]
[16384] (*1)
Udev cookie 0xd4d14e4 (semid 32769) created
Udev cookie 0xd4d14e4 (semid 32769) incremented to 1
Udev cookie 0xd4d14e4 (semid 32769) incremented to 2
Udev cookie 0xd4d14e4 (semid 32769) assigned to REMOVE task(2) with
flags (0x0)
dm remove fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush
retryremove ] [16384] (*1)
fbf8887d-8694-46ca-b9ff-be79a668e2a9: Stacking NODE_DEL [verify_udev]
Udev cookie 0xd4d14e4 (semid 32769) decremented to 1
Udev cookie 0xd4d14e4 (semid 32769) waiting for zero
Signed-off-by: Sébastien Han <seb@redhat.com>
in addition to c4dcdaa20 this commit adds the missing condition on
install tasks for debian_rhcs deployment. Without them, these tasks are
played on any kind of deployment.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
* DBus on host should include ganesha service file
* to allow ganesha container to respond on DBus it needs to run
in --privileged mode (ganesha folks contacted to look at this)
* ceph_nfs_include_exports_dir variable replaced with more general
ceph_nfs_dynamic_exports
This is something that has nothing to do in `ceph-common`, this
is too specific to `ceph-iscsi-gw` role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This is something that has nothing to do in `ceph-common`, this
is too specific to `ceph-nfs` role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Make role `ceph-mgr` handling itself the installation of `ceph-mgr`
package because it's complicated to manage it regarding we are going to
install `jewel vs. luminous`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
During the initial play, the docker command doesn't not exist and then
there is no stdout_lines to the command. So get allows us to fix this by
declaring an array if the command fails.
Signed-off-by: Sébastien Han <seb@redhat.com>
Use an intermediate variable to build the final `dedicated_devices` list
to avoid duplicate entry in that array. (We need a 1:1 relation between
`dedicated_devices` and `devices` since we are using a `with_together`
later.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
With jewel, `bootstrap_rbd_keyring` is not set because of this condition:
```
when:
- ceph_release_num.{{ ceph_release }} >= ceph_release_num.luminous
```
Therefore, the task `try to fetch ceph config and keys` will fail.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
* Change version from 2 to 3.
* use ceph_rhcs_cdn_debian_repo_version to use other repositories along
* with ceph_rhcs_cdn_debian_repo
Signed-off-by: Sébastien Han <seb@redhat.com>
`ceph_release` isn't available at this step of the playbook because it
is set later based on the installed binaries.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1486062
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The value of doing this is fairly low compare to the added value.
So we remove these tasks, if rbd pool on Jewel doesn't have the right PG
value you can always increase it.
Signed-off-by: Sébastien Han <seb@redhat.com>
All keyring are getting copied to all nodes.
This commit fixes a leftover from a previous code refactor.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498583
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
iscsi-gw needs a 'rbd' pool to configure iscsi target.
Note: I could have used the facts already set in `ceph-mon` but I voluntarily
didn't do it to not create a dependancy between these two roles.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit refacts the code regarding all `set_osd_pool_default_*`
related tasks by avoiding usage of useless `set_fact` to determine
whether a key is present in `ceph_conf_overrides`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The rbd pool is the default pool that gets created during ceph cluster
initializaiton. If we act on the rbd related operations too early, the
rbd pool does not exist yet. Move the call to perform rbd operations
to a later stage after other pools have been created.
The rbd_pool.yml playbook has all the operations related to the rbd pool.
Replace the always_run (deprecated) directive with check_mode.
Most of the ceph related tasks only need to run once. The run_once directive
executes the task on the first host.
The ceph sub-command to delete a pool is delete (not rm).
The changes submitted here were tested with this ceph version.
ceph version 0.94.9-9.el7cp (b83334e01379f267fb2f9ce729d74a0a8fa1e92c)
This upload includes these changes:
- Use the fail module (instead of assert).
- From luminous release, the rbd pool is no longer created by default.
Delete the code to create the rbd pool for luminous release
- Conform the .yml files to use the suggested syntax.
The commands are executed on the mcp nodes and I think shell ansible module
is the right one to use. The command module is used to execute commands on
remote nodes. I can make the change to use command module if that is
prefrerred.
This is to ensure `docker_exec_cmd` fact is set with the correct value
in case of daemons collocation
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Also we now play ceph-config to have everything being generated for new
daemons bootstrap during upgrade.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1497959
Signed-off-by: Sébastien Han <seb@redhat.com>
generate_crt|bool|default(false) won't apply the default value, this
generate_crt|default(false)|bool will
Signed-off-by: Sébastien Han <seb@redhat.com>
The task which sets `ceph_current_fsid` in `facts.yml` in case of containerized
deployment, will definitely fail because `docker_exec_cmd` is not set
yet.
This commits simply makes `facts.yml` played after `check_socket.yml` so
`docker_exec_cmd` is set properly.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commits refacts the role ceph-mds
The goal here is to create cephfs in `ceph-mon` for both containerized
and non-containerized cases so we don't need the admin keyring on mds
nodes anymore.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1488999
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Using systemd module allows us to do in one task what we did in three
tasks:
- enable unit file,
- issue a `daemon-reload`,
- start the service
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Running the socket check on all the hosts will override the default
value of docker_exec_cmd, leaving it with the last value (currently
rbd-mirror), as a result the subsequent docker_exec_cmd usage for the
:x
Signed-off-by: Sébastien Han <seb@redhat.com>
There is a bug in the rbd mirror unit file, the upstream fix is here:
https://github.com/ceph/ceph/pull/17969.
This should be reverted once the patch is merged and backport is done.
Signed-off-by: Sébastien Han <seb@redhat.com>
This fixes the error :
```
The conditional check 'sestatus.stdout != 'Disabled'' failed.
```
that occurs when running on non rhel based system since the
`sestatus` fact is registered only on rhel based distribution.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Specify the timeout flag to ceph-create-keys, which causes it to time out
if a monitor quorum isn't achieved. This overrides the default timeout
of 10 minutes, causing ceph-ansible to fail faster in the event of cluster
network issues.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
We have seen issues with leftover socker. So now, if a socket is found
we also check if it's accessed by a process. If so, we can run the
handler, if not we remove it and continue the playbook.
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
It's sad but we can not rely on the prepare container anymore since the
log are flushed after reboot. So inpecting the container does not return
anything.
Now, instead we use a ephemeral container to look up for the
journal/block.db/block.wal (depending if filestore or bluestore) and
build the activate command accordingly.
Signed-off-by: Sébastien Han <seb@redhat.com>
need to use `hostvars[host]['XXX']` to retrieve the monitor
interface and/or radosgw interface.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1493920
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
- move the file fetch/push to the existing task
- rename the include
- generate the ganesha template from ansible
- re-arrange role structure
- re-use tasks for non-container and container
- configure keys for non-container and container
- fix rgw container key collection;
Signed-off-by: Sébastien Han <seb@redhat.com>
the rbd key was not pushed on rbd nodes because its keyring path was not
added in `ceph_config_keys`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We generate the ceph.conf on all the nodes through the
ceph-docker-common so there is no need to push it to the Ansible file.
Also this is breaking the ceph.conf template generation since we only
generate sections based on the host the ansible task is running on.
For example, what's typically happening, we bootstrap the monitor, we
get a ceph.conf generated for a mon only, we go on an osd, we generate
the ceph.conf with osd section (done by ceph-docker-common) but this
gets overwritten by the copy_config task of the ceph-osd role.
Signed-off-by: Sébastien Han <seb@redhat.com>
- Change capitalization of config options to be
in line with what config.txt in the nfs-ganesha
tree says
Signed-off-by: Ali Maredia <amaredia@redhat.com>
RHCS install wasn't working at all prior to this commit as the name of
the include was pointing to a non-existing file.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492056
Signed-off-by: Sébastien Han <seb@redhat.com>
When ceph-nfs service is managed by pacemaker, it's useful to
not enable and start ceph-nfs service through systemd but let
pacemaker to start the service in a next step.
In analogy to ceph_nfs_rgw_user, we should be able to define a user
with which the nfs-ganesha Ceph FSAL connects to the cluster.
Introduce a ceph_nfs_ceph_user variable, setting its default to
"admin" (which preserves the prior behavior of always connecting as
client.admin).
Fixes#1910.
When Ansible is not run with verbose options it's difficult to see which
include and/or set_fact does what. So adding a name for each clarifies.
Signed-off-by: Sébastien Han <seb@redhat.com>
The variable "statleftover" was removed by commit
a60c74f61e
and never added back to the new playbook,
yet it is still being referenced.
Adding it back
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492224
Signed-off-by: Sébastien Han <seb@redhat.com>
On a container env, machines don't have any ceph binaries so we need to
use a container to run the commands.
Signed-off-by: Sébastien Han <seb@redhat.com>
Use default delay since the mon (in particular) can take more time to
restart.
Solves error with:
STDERR:
Error response from daemon: No such container: ceph-mon-mon0
Signed-off-by: Sébastien Han <seb@redhat.com>
All keys are copied to all nodes.
This commit split that task in each roles so keys are copied to their
respective nodes.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1488999
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Less configuration for the user, the container inherit from the global
variables. No more container specific variables.
Signed-off-by: Sébastien Han <seb@redhat.com>
In a collocated scenario, where you might put a rgw, a mds and a mon on
the same node you don't want the handler blindly restart all the daemons
on the node. Indeed some of them might not be configured yet.
Implementing a more precise socket detection, for each daemon type.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1488813
Signed-off-by: Sébastien Han <seb@redhat.com>
Prior to this patch this activation sequence for autodetection was
always skipped because we were asking to activate on device without
partitions, which doesn't make sense.
We also fix the way we lookup for a device, since the data partition is
always numbered 1, we take the min element of the dict.
Closes: https://github.com/ceph/ceph-ansible/issues/1782
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add new boolean parameter for client config create_key_file_only
with a default of false. When create_key_file_only is true, the
client tasks to connect to the external ceph cluster to verify
the key `ceph auth import` the key are skipped.
Fixes: #1848
We must mask the image so we are sure that even if the system reboots
then the OSDs won't start.
Also remove Ceph udev rules if found on the system prior to deploy
containers. If we don't do this we are exposed to conflicts between udev
rules and sytemd unit files.
Also add the CI will now test the migration from a non-containerized cluster to a
containerized cluster.
Signed-off-by: Sébastien Han <seb@redhat.com>
The way we handle the restart for both mds and rgw is not ideal, it will
try to restart the daemon on the host that don't run the daemon,
resulting in a service file being created (see bug description).
Now we restart each daemon precisely and in a serialized fashion.
Note: the current implementation does NOT support multiple mds or rgw on
the same node.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1469781
Signed-off-by: Sébastien Han <seb@redhat.com>
This patch adds passing the RGW_CIVETWEB_IP to the docker
container. This IP defaults to the value of radosgw_civetweb_bind_ip.
radosgw_civetweb_bind_ip default to ipv4.default
Without this value, the RGW containter will bind to 0.0.0.0
Remove the old check prior systemd.
We only support systemd so there is no need to run a condition on
systemd. The playbook will fail if systemd is not present.
Signed-off-by: Sébastien Han <seb@redhat.com>
The installation process is now described as follow:
* you still have to choose a 'ceph_origin' installation method. The
origin can be a 'repository' (add a new repository), distro (it will use
the packages provided by the native repo source of your distribution),
local (only available on redhat system, it installs locally built
packages). This option is not well tested, so use it carefully
* if ceph_origin == 'repository' you will have to decide what kind of
repository you want to enable:
- community: corresponds to the stable upstream/community version
- enterprise: corresponds to the stable enterprise/downstream version
(basically you are a red hat customer)
- dev: it will install ceph from packages built out of the github
development branches
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The list can not be evaluated properly if it containers '[]', which is
the case when using the filter "default([])". To fix this, we have to
properly merge the lists.
This is fixing the issue: "list object has no element 1"
Signed-off-by: Sébastien Han <seb@redhat.com>
By detecting the ceph version running in the container we can easily
apply conditions like:
ceph_release_num.{{ ceph_release }} >= ceph_release_num.luminous
We do that already, in ceph-docker-common/tasks/fetch_configs.yml.
This fixes the error:
TASK [ceph-docker-common : register rbd bootstrap key]
******************************************************
fatal: [magna005]: FAILED! => {"failed": true, "msg": "The conditional
check 'ceph_release_num.{{ ceph_release }} >= ceph_release_num.luminous'
failed. The error was: error while evaluating conditional
(ceph_release_num.{{ ceph_release }} >= ceph_release_num.luminous):
'dict object' has no attribute 'dummy'\n\nThe error appears to have been
in
'/home/ubuntu/ceph-ansible/roles/ceph-docker-common/tasks/fetch_configs.yml':
line 2, column 3, but may\nbe elsewhere in the file depending on the
exact syntax problem.\n\nThe offending line appears to be:\n\n---\n-
name: register rbd bootstrap key\n ^ here\n"}
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1486062
Signed-off-by: Sébastien Han <seb@redhat.com>
Logging inside the container is not useful since it writes to the
overlayfs partition, resulting in potential performance degradation on
the container.
If you need to check the logs, just look at journald.
Signed-off-by: Sébastien Han <seb@redhat.com>
with_items is evaluated before the when condition so if the task that
registers the 'results' is skipped the task will fail with:
{"failed": true, "msg": "'dict object' has no attribute 'results'"}
Defaulting to an empty array fixes the issue.
Reverts: abdd66619e
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1482061
Signed-off-by: Sébastien Han <seb@redhat.com>
The script can fail to get the osd id because the osds are activated by
udev and it can take a while for them to activate. This commit fixes
that by trying to get all the osds per node in a loop.
This commit also makes the osd services enabled so that they are
available after reboot.
Signed-off-by: Boris Ranto <branto@redhat.com>
There should be no need to use sudo when writing or using these files.
It creates an issue when the user running ansible-playbook does not
have sudo privs.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Some deployments can't copy infrastructure playbooks outside of the
infrastructure-playbooks directory. Thus they use ANSIBLE_ROLES_PATH to
overcome this. However some roles have 'playbook_dir' hardcoded, which
results in wrong path since the execution comes from
infrastructure-playbooks. Basically the role triggered by a playbook
from infrastructure-playbooks believes that the roles are in
infrastructure-playbooks/roles. This commit fixes that.
Signed-off-by: Sébastien Han <seb@redhat.com>
This will give us more flexibility and the possibility to deploy a client node
for an external ceph-cluster.
related BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1469426Fixes: #1670
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Before this commit we were forcing ipv4 which might not be available.
Now setting ip_version to ipv4 or ipv6 will give you the right support.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1484189
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit eases the use of the
infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook. We basically run it with a couple of pre-tasks and then we let
the playbook run the docker roles.
It obviously expect to have proper variables configured in order to
work.
Signed-off-by: Sébastien Han <seb@redhat.com>
The lvm_volumes variable is now a list of dictionaries that represent
each OSD you'd like to deploy using ceph-volume. Each dictionary must
have the following keys: data, journal and data_vg. Each dictionary also
can optionaly provide a journal_vg key.
The 'data' key represents the lv name used for the OSD and the 'data_vg'
key is the vg name that the given lv resides on. The 'journal' key is
either an lv, device or partition. The 'journal_vg' key is optional and
must be the vg name for the journal lv if given. This key is mainly used
for purging of the journal lv if purge-cluster.yml is run.
For example:
lvm_volumes:
- data: data_lv1
journal: journal_lv1
data_vg: vg1
journal_vg: vg2
- data: data_lv2
journal: /dev/sdc
data_vg: vg1
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Resolves issue: Multiple RGW Ceph.conf Issue #1258
In multi-RGW setup, in ceph.conf the RGW sections
contain identical bind IP in civetweb line. So this
modification fixes that issue and puts the right IP
for each RGW.
Signed-off-by: SirishaGuduru SGuduru@walmartlabs.com
Modified ceph-defaults and ran generate_group_vars_sample.sh
group_vars/osds.yml.sample and group_vars/rhcs.yml.sample are
not part of the changes. But they got modified when
generate_group_vars_sample.sh is ran to generate group_vars/
all.yml.sample.
Uncommented added variables in ceph-defaults
Updated tests by adding value for radosgw_interface
Added radosgw_interface to centos cluster tests
Modified ceph-rgw role,rebased and ran generate_group_vars_sample.sh
In ceph-rgw role removed check_mandatory_vars.yml.
Rebased on master.
Ran generate_group_vars_sample.sh and then the below files got
modified.
The rbd-mirror daemon will be HA under luminous and new daemon health
features require a way to uniquely identify rbd-mirror instances.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
To be properly evaluated the "skipped" conditions must always have the
first place on the list of condition, otherwise the other conditions are
evaluated before and make the task fail.
Closes: https://github.com/ceph/ceph-ansible/issues/1733
Signed-off-by: Sébastien Han <seb@redhat.com>
Problem: task "check for a ceph socket in containerized deployment" will
be skipped if we are not an OSD.
with_items are still evaluated before when conditions so if the task was
skipped the dict will be empty and then fail.
Adding a "not skipped" condition skips the execution of the task.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1482061
Signed-off-by: Sébastien Han <seb@redhat.com>
This commits force ceph-common to be installed early in deployment on
nodes.
For instance, ceph-rbdmirror doesn't have the CLI installed while it is
needed for some tasks which uses it to set some facts.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
ceph services can fail to start under certain circumstances (for
example, when running in a container) because the default systemd
service configuration causes namespace issues.
To work around this we can override the system service settings by
placing an overrides file in the ceph-<service>@.service.d directory.
This can be generic so as to allow any potential changes required to
the ceph-<service> service files.
The overrides file is only setup when the
"ceph_<service>_systemd_overrides" config_template override variable is
specified.
The available service systemd override files are as follows:
ceph_mds_systemd_overrides
ceph_mgr_systemd_overrides
ceph_mon_systemd_overrides
ceph_osd_systemd_overrides
ceph_rbd_mirror_systemd_overrides
ceph_rgw_systemd_overrides
The original fix to issue #1755 only set the permissions on
the monitors to which the key was copied, but not the original
monitor where the key was created. Thus, we use a separate task
to set the permission of the key.
The openstack_keys structure now supports a key called mode
whose value is a string that one could pass to chmod to set
the mode of the key file. The ansible file module applies the
mode to all openstack keys with this property.
Fixes: #1755
This is under the MDS role instead of the mon role because that role
does not create the filesystem under docker.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
This scenario will create OSDs using ceph-volume and is only available
in ceph releases greater than Luminous.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
There is only two main scenarios now:
* collocated: everything remains on the same device:
- data, db, wal for bluestore
- data and journal for filestore
* non-collocated: dedicated device for some of the component
Signed-off-by: Sébastien Han <seb@redhat.com>
in containerized deployment, if you try to update your `ceph.conf` file
it won't be actually updated on your nodes because it is overwritten by
the copy of the file which is present in your fetch directory.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Move `fsid`,`monitor_name`,`docker_exec_cmd` and `ceph_release` set_fact
to `ceph-defaults` role.
It will allow to reuse these facts without having to play `ceph-common`
or `ceph-docker-common`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This will give us more flexibility and avoid a lot of useless when
skipping all tasks from a non-desired role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a new role `ceph-defaults`.
This role aims to handle all defaults vars for `ceph-docker-common` and
`ceph-common` and set basic facts (eg. `fsid`)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Merge `ceph-docker-common` and `ceph-common` defaults vars in
`ceph-defaults` role.
Remove redundant variables declaration in `ceph-mon` and `ceph-osd` roles.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We don't want to have heterogeous ceph.conf anymore and believe that we
should have the right section for the running daemon.
If we don't do this and use profiles, e.g: rgw, we will get a new rgw
section on some of the nodes.
Signed-off-by: Sébastien Han <seb@redhat.com>
It is mandatory now to set the Ceph version you want to install, e.g:
ceph_stable_release: luminous
To find the release names, you can look at the release not doc:
http://docs.ceph.com/docs/master/release-notes/
Signed-off-by: Sébastien Han <seb@redhat.com>
ceph-disk is responsable for enabling the unit file if needed. Actually
since https://github.com/ceph/ceph/pull/12241 it seems that it's not
even needed. On an event of a restart, udev rules will be trigger and
they will ceph-disk activate the device too so the 'enabled' is not
needed.
Closes: https://github.com/ceph/ceph-ansible/issues/1142
Signed-off-by: Sébastien Han <seb@redhat.com>
OSDs get started by ceph-disk before the ceph.conf file is written
with a crush location. That results in a crush map without configured
crush location.
To prevent this, we have to restart the OSDs during the initial setup
after the crush location was added to the ceph.conf file.
We have multiple issues with ceph-disk's cli with bluestore and Ceph
releases. This is mainly due to cli changes with Luminous. Luminous
introduced a --bluestore and --filestore options which respectively does
not exist on releases older than Luminous. The default store being
bluestore on Luminous, simply checking for the store is not enough so we
have to build a specific command line for ceph-disk depending on the
Ceph version we are running and the desired osd_store.
Signed-off-by: Sébastien Han <seb@redhat.com>
The keys and openstack_keys structure now supports an optional
key called acls whose value is a list of strings one could pass
to setfacl. The ansible ACL module applies the ACLs to all
openstack keys with this property.
Fixes: #1688
Remove `rgw enable static website` and `rgw enable usage log` from
ceph.conf and make it usable with ceph_config_overrides as profiles.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit introduces a new directory called "profiles" which
contains some set of variables for a particular use case. These profiles
provide guidance for certain scenarios such as:
* configuring rgw with keystone v3
Signed-off-by: Sébastien Han <seb@redhat.com>
According to Alfredo, this was used for gitbuilders. Right now shaman/chacra
dev repos are unsigned.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Move condition at task level and not at include level to make `fsid`
variable available for all roles.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Some tasks fetch file to `{{ fetch_directory }}/docker_mon_files` and
then try to copy from `{{ fetch_directory }}/{{ fsid }}`. That causes
the playbook to fail.
Fixes: #1683
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
To keep consistency between `{{ openstack_keys }}` and `{{ keys }}`
respectively in `ceph-mon` and `ceph-client` roles.
This commit also add the possibility to set mds caps.
Fixes: #1680
Co-Authored-by: John Fulton <johfulto@redhat.com>
Co-Authored-by: Giulio Fidente <gfidente@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The ceph.conf template needs to look for the value of monitor_interface
in hostvars[host] because there might be different values set per host.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
In Luminous, ceph-disk defaults to bluestore so all our scenarios are
using bluestore, we need to force testing both.
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
In addition to ceph/ceph-docker@69d9aa6, this explains how to deploy a
containerized cluster with a custom admin secret.
Basically, just need to pass the `admin_secret` defined in your
`group_vars/all.yml` to the `ceph_mon_docker_extra_env` variable.
Eg:
`ceph_mon_docker_extra_env: -e CLUSTER={{ cluster }} -e FSID={{ fsid }}
-e MON_NAME={{ monitor_name }} -e ADMIN_SECRET={{ admin_secret }}`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Fail with a sane message if the devices or raw_journal_devices variables
are strings instead of lists during manual device assignment.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
rsync is required by the ansible synchronize package. Ensure
it is installed when local installation is selected.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Add a new parameter `admin_secret` that allow to deploy a ceph cluster
with a custom admin secret.
Fix: #1630
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commits refactors how we deploy bluestore. We have existing
scenarios that we don't want to change too much. This commits eases the
user experience by now changing the way you use scenarios. Bluestore is
just a different interface to store objects but the scenarios more or
less remain the same.
If you set osd_objectstore == 'bluestore' along with
journal_collocation: true, you will get an OSD running bluestore with DB
and WAL partitions on the same device.
If you set osd_objectstore == 'bluestore' along with
raw_multi_journal: true, you will get an OSD running bluestore with a
dedicated drive for the rocksdb DB, then the remaining
drives (used with 'devices') will have WAL and DATA collocated.
If you set osd_objectstore == 'bluestore' along with
raw_multi_journal: true and declare bluestore_wal_devices you will get
an OSD running bluestore with a dedicated drive for rocksdb db, a
dedicated drive partition for rocksdb WAL and a dedicated drive for
DATA.
Signed-off-by: Sébastien Han <seb@redhat.com>
There is no need for 2 variables to enable bluestore, prior to this
patch one had to do the following to activate bluestore:
osd_objectstore: bluestore
bluestore: true
Now you just need to set `osd_objectstore: bluestore`.
Fixes: https://github.com/ceph/ceph-ansible/issues/1475
Signed-off-by: Sébastien Han <seb@redhat.com>
remove `ceph_mon_docker_interface` and use `monitor_interface` instead
for both containerized and non-containerized deployment.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Some variables are missing from ceph-docker-common role since the
include of check_mandatory_vars.yml has been re-added in the ceph-mon
role.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The check regarding the networking scenario configuration has been
moved from ceph-common to ceph-mon in 1de8176 but the include was not re-added
in 189f4fe
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
e8187f6 does not fix the ipv6 as expected since `ansible_default_*` are
filled with the IP address carried by the network interface used by the
default gateway route. By the way, it assumes that the MON_IP address will
be this IP address which is not always the case.
We need to keep using the previous fact but add some intelligence in the
template to determine how to retrieve the ipv4|ipv6 address since the path
to the fact in `hostvars` is not the same according to ipv4 vs ipv6 case.
Fix: 1569
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add an extra variable to the openstack pools, which creates them with
defined rules. This will allow to place different pools on e.g.
different type of disks.
This commit will also set a new default rule when defined and move
the rbd pool to the new rule.
Somehow the shell module will return an error if the command line is not
next to it.
Plus fixed the import with the right path.
Signed-off-by: Sébastien Han <seb@redhat.com>
Followup on https://github.com/ceph/ceph-ansible/pull/1469 where we
merged most of the container code from roles/ceph-*/task/docker/*.yml
into roles/ceph-docker-common/tasks/
It seems that we forgot to remove the original files.
Signed-off-by: Sébastien Han <seb@redhat.com>
The new test in the checks PGs are no longer working on distributions
where /bin/sh isn't linked to /bin/bash.
Fix: #1619
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
OpenStack's Gnocchi service expects to have a pool called "metrics".
This change addess "metrics" to the list of `openstack_pools` and
creates a corresponding key. It is only run if the user sets
`openstack_config: false`.
The current handler only restarts one OSD on each OSD server. After
the first one the handler stops, not matter what results the checks had.
Co-Authored-By: Gaudenz Steinlin (@gaudenz)
Remove "osd mkfs type" and the other pre-Bluestore parameters from the
generated ceph.conf so that disk activation on OSDs will work. The
current default xfs config results in a failed deployment and
incorrect partition metadata.
For newly created cluster the command: ceph --cluster {{ cluster }} osd
pool get rbd size does not respond properly.
We only want to check if the rbd pool exists, so we know use an ls |
grep approach.
Closes: https://github.com/ceph/ceph-ansible/issues/1547
Signed-off-by: Sébastien Han <seb@redhat.com>
Rewrite the check_pgs by using json parsing instead of complex regexp to
parse the `ceph -s` output.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add a default value for `ceph_docker_on_openstack` to avoid a
conditional check error for the task `pause after docker install before starting` in
`roles/ceph-docker-common/tasks/pre_requisites/prerequisites.yml`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The fact ['ansible_$interface']['ipv4'] is a dictionary where
['ansible_$interface']['ipv6'] is a list. If we use
ansible_default_ipv6|ipv4 is is always a dictionary which allows us to
get the ipv6 and ipv4 address without adding more complexity to the
template.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
For some reason we changed the check of pgs but it appears it could be
dangerous because the current check might satisfied as long as 1 PG is
active+clean.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
since `-e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE` is already hardcoded in
`eph-osd-run.sh.j2` there is no need to add `-e
CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE` as a default value in defaults vars.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`ceph-docker-common`:
At the moment there is a lot of duplicated tasks in each
`./roles/ceph-<role>/tasks/docker/main.yml` that could be refactored in
`./roles/ceph-docker-common/tasks/main.yml`.
`*_containerized_deployment` variables:
All `*_containerized_deployment` have been refactored to a single
variable `containerized_deployment`
duplicate `cephx` variables in `group_vars/* have been removed.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We only check for everything expect 'distro' because that
is a valid way of deploying RHCS, with preprepared repos
present on the nodes.
Signed-off-by: Sébastien Han <seb@redhat.com>
Problem: we could end up in situation where we would install a package
on a machine that does not have the right repo enabled. Because the
condition was set to OR we weren't pinning a particular host but just a
condition. Let's say someone sets 'ceph_origin == "distro"', this would
try to install OSD packages on Monitors.
Solution: use a AND condition to first pin to the group_name (which
identifies a set of hosts) AND then after this one of the installation
condition.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1453119
Co-Authored-By: https://github.com/zhsj
Signed-off-by: Sébastien Han <seb@redhat.com>
Problem: fail to deploy a containerized Ceph cluster with ipv6
Solution: do not hardcode ipv4 when bootstrapping the container.
Now use ip_version: ipv6 to get a containerized cluster deployed with
ipv6.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1451786
Signed-off-by: Sébastien Han <seb@redhat.com>
In addition to `196fa7e` this commit check if a container has been
already launched and delete it before retrying the ceph osd prepare
process.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The CI on Docker is reporting the following error:
STDERR:
Error EINVAL: bad entity name
This is due to the fact that this auth entity name does not exist on
Jewel so we should not create that key when running Jewel containers.
Fixes: https://github.com/ceph/ceph-ansible/issues/1514
Signed-off-by: Sébastien Han <seb@redhat.com>
Already documented in the Red Hat Ceph Storage 2 Installation Guide
for Red Hat Enterprise Linux, but not here
Signed-off-by: Florian Klink <flokli@flokli.de>
"rgw override bucket index max shards" and
"rgw bucket default quota max objects" were in the
client section of the ceph.conf and not being
applied, this commit moves them to global
Resolves: bz#1391500
Signed-off-by: Ali Maredia <amaredia@redhat.com>
We shouldn't need this anymore as the upgrade bug that
debian_ceph_packages was used to workaround should have
been fixed as of jewel.
See https://github.com/ceph/ceph-ansible/issues/1481 for more
detailed information.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Change civetweb_num_thread default to 100
Add capability to override number of pgs for
rgw pools.
Add ceph.conf vars to enable default bucket
object quota at users choosing into the ceph.conf.j2
template
Resolves: rhbz#1437173
Resolves: rhbz#1391500
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Restore the check_socket that was removed by `5bec62b`.
This commit also improves the logging in `restart_*_daemon.sh` scripts
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Prior to this change, ansible was only checking for the existence of the
package, now if upgrade_ceph_packages is true this means we are
performing an upgrade.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1442016
Signed-off-by: Sébastien Han <seb@redhat.com>
Proof-of-concept clusters or actual production clusters will never want to use this. We also do not test it anywhere for this same reason.
Signed-off-by: Gregory Meno <gmeno@redhat.com>
This is to allow ceph-mgr daemons to remote control
osd and mds daemons with MCommand messages.
Fixes: http://tracker.ceph.com/issues/19713
Signed-off-by: John Spray <john.spray@redhat.com>
Without this, we don't test the mgr role so we need to add it.
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
This is the same fix as bc846b7da6
applied to the other part of the code-base that builds ceph.conf (I'd
missed that 349b9ab3e7 had duplicated
this code).
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
Ansible's assemble module by default will put all files in the src
directory together into dest. We only want to put {{ cluster }}.conf
and osd.conf together, not anything that might have found its way into
/etc/ceph/ceph.d (e.g. files left by the sysadmin taking backups
before an ansible run). So specify a regexp that matches only those
two files.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
Ansible evaluates the 'with_items' before the 'when' so if the inventory
does not have the group declared it'll fail. To fix this, we set an
empty array to make the with_items happy and then evaluate with the
'when'.
Signed-off-by: Sébastien Han <seb@redhat.com>
Prior to this change we were deploying a monitor using tis fqdn name but
we were checking its state and performing actions on it using its
shortname.
Signed-off-by: Sébastien Han <seb@redhat.com>
The Ceph Manager daemon (ceph-mgr) runs alongside monitor daemons, to
provide additional monitoring and interfaces to external monitoring and
management systems.
Only works as of the Kraken release.
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
ceph-create-keys unit file was removed here:
* 8bcb4646b6
* dc5fe8d415
As a consequence the systemctl preset command now fails to run since the
unit does not exist anymore. Due to the redirection in /dev/null we
don't know what's happening.
Ultimately the mon unit doesn't get enabled and the mon service won't
start after reboot.
Removing the old/non-existent unit makes the command succeed now.
ceph fix: https://github.com/ceph/ceph/pull/14226
Signed-off-by: WingkaiHo <sanguosfiang@163.com>
Co-Authored-By: Sébastien Han <seb@redhat.com>
Until now, only the first task were executed.
The idea here is to use `listen` statement to be able to notify multiple
handler and regroup all of them in `./handlers/main.yml` as notifying an
included handler task is not possible.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As reported in
https://github.com/ceph/ceph-ansible/issues/1403 when devices are held
by lvm and `osd_auto_discovery` is set to true, it's not enough to check
for a partition count = 0 since Ansible does not report.
This patch also looks for 'holders' which in a case of lvm corresponds
to the name of the pv. Now we also look for holders = 0.
Fixes: #1403
Signed-off-by: Sébastien Han <seb@redhat.com>
Problem: too many different commands to do the same thing. The 'cut'
command on infrastructure-playbooks/purge-cluster.yml was also wrong.
This sed command from osixia in ceph-docker
https://github.com/ceph/ceph-docker/pull/580/ addresses all the
scenarios.
Signed-off-by: Sébastien Han <seb@redhat.com>
ntp is still installed even if ntp_service_enabled is set to false.
That could be a problem if the time synchronization is managed by
something else than ceph-ansible or if you want to use different NTP
implementation as suggested in #1354.
Fixes: #1354
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Guits <gabrioux@redhat.com>
If a group of hosts is empty, (for instance 'mdss', in case of a
deployment without any mds node), the playbook will fails when trying
to restart service with `"'dict object' has no attribute u'XXX'"` error.
The idea here is to force the `with_items` statements in all included handler tasks
to get at least an empty array.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
systctl tuning should be in the sysctl.d directory. This creates
a seperation from what values were set specific to ceph, and what
values were set by the operator.
Signed-off-by: Tyler Brekke <tbrekke@redhat.com>
With ' in osd_crush_location, systemd will show this error:
ceph-osd-prestart.sh[2931]: Invalid command: invalid chars ' in 'root=
Signed-off-by: Christian Zunker <christian.zunker@codecentric.de>
This fixes issue #1299. According to @ktdreyer s comment in the ticket,
he fixed the web server config so also older (non-SNI) python clients
can use the uri module here.
After the jewel release the mon startup does not generate keys, but it's
still harmless to call ceph-create-keys with jewel because this task has
a 'creates' argument that will cause it not to run if the keys already
exist.
Removing this when condition also allows the downstream CI tests to
install kraken or luminous without resetting ceph_stable_release, which does not
pertain to rhcs.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This is not only for monitors, but also mds, rgw and rbd mirror so
making the var name more generic:
ceph_docker_enable_centos_extra_repo
Signed-off-by: Sébastien Han <seb@redhat.com>
Sometimes the socket appears during the 5th attempt and sometimes not so
increasing the timeout a little bit.
Signed-off-by: Sébastien Han <seb@redhat.com>
Add the possibility to create openstack pools and keys even for containerized deployments
Fix: #1321
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This removes the implicit order requirement when using OSD fragments.
When you use OSD fragments and ceph-osd role is not the last one,
the fragments get removed from ceph.conf by ceph-common.
It is not nice to have this code at two locations, but this is
necessary to prevent problems, when ceph-osd is the last role as
ceph-common gets executed before ceph-osd.
This could be prevented when ceph-common would be explicitly called
at the end of the playbook.
Signed-off-by: Christian Zunker <christian.zunker@codecentric.de>
This option was missing for rrgw, mds, rbd mirror and nfs making these
daemon impossible to run on a kv deployment with containers.
Signed-off-by: Sébastien Han <seb@redhat.com>
Prior to this change, ceph-ansible would install the main NFS Ganesha
server daemon on Ubuntu, but it would skip the Ceph FSALs.
Running "apt-get install nfs-ganesha" will only install the main NFS Ganesha
server. It does *not* pull in the RGW FSAL
(/usr/lib/x86_64-linux-gnu/ganesha/libfsalrgw.so)
Running "apt-get install nfs-ganesha-fsal" will install the RGW FSAL as
well as the main NFS Ganesha server package.
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
This patch introduces calamari_debug option which will turn on debugging
for calamari before initializing and running it.
Signed-off-by: Boris Ranto <branto@redhat.com>
From Josh Durgin, "I'd recommend not setting vfs_cache_pressure in
ceph-ansible. The syncfs issue is still there, and has caused real
problems in the past, whereas there hasn't been good data showing lower
vfs_cache_pressure is very helpful - the only cases I'm aware of have
shown it makes little difference to performance."
https://bugzilla.redhat.com/show_bug.cgi?id=1395451
Install package from official repos rather than pip when using RHEL.
This commit fix https://bugzilla.redhat.com/show_bug.cgi?id=1420855
Also this commit Refact all `roles/ceph-*/tasks/docker/pre_requisite.yml`
to avoid a lot of duplicated code.
Fix: #1303
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This was needed for Hammer and older version, not needed anymore since
we have a 'ceph' user to run ceph processes.
Signed-off-by: Sébastien Han <seb@redhat.com>
Check if ceph filesystem already exists before creating it.
If the ceph filesystem doesn't exist, execute the task only on one node.
Fix: #1314
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Since distro will not allow /usr/share to be writable (e.g: atomic) so
we let the operator decide where to put that script.
Signed-off-by: Sébastien Han <seb@redhat.com>
Oh yeah! This patch adds more fine grained control on how we run the
activation osd container. We now use --device to give a read, write and
mknodaccess to a specific device to be consumed by Ceph. We also use
SYS_ADMIN cap to allow mount operations, ceph-disk needs to temporary
mount the osd data directory during the activation sequence.
This patch also enables the support of dedicated journal devices when
deploying ceph-docker with ceph-ansible.
Depends on https://github.com/ceph/ceph-docker/pull/478
Signed-off-by: Sébastien Han <seb@redhat.com>
As of Infernalis, the Ceph daemons run as an unprivileged "ceph" UID,
and this is by design.
Commit f19b765 altered the default
civetweb port from 80 to 8080 with a comment in the commit log about
"until this gets solved"
Remove the comment about permissions on Infernalis, because this is
always going to be the case on the Ceph versions we support, and it
is just confusing.
If users want to expose civetweb to s3 clients using privileged TCP
ports, they can redirect traffic with iptables, or use a reverse proxy
application like HAproxy.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This avoids a situation where during a rolling_update we try to talk to
a mon to get the fsid and if that mon is down the playbook hangs
indefinitely.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This gives us more flexibility than installing the ceph-release package
as we can easily use different mirrors. Also, I noticed an issue when
upgrading from jewel -> kraken as the ceph-release package for those
releases both have the same version number and yum doesn't know to
update anything.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
To configure kernel the task is using "command" module which is not
respect operator ">". So this task just print to "stdout": "never >
/sys/kernel/mm/transparent_hugepage/enabled"
fix: #1319
Signed-off-by: Sébastien Han <seb@redhat.com>
Some playbooks use [0-9]*, others use \d+$
The latter is more correct since cluster name may contain numbers.
Signed-off-by: Shengjing Zhu <zsj950618@gmail.com>
So unit files were stored in /var/lib/ceph some where in
/etc/systemd/system. Now they are all under /etc/systemd/system.
closes: #1296
Signed-off-by: Sébastien Han <seb@redhat.com>
If cephx is disabled it is not necessary to include `facts_mon_fsid.yml`
in `roles/ceph-common/tasks/facts.yml`.
Fix: #1300
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We changed the way we declare image.
Prior to this patch we must have a "user/image:tag"
format, which is incompatible with non docker-hub registry where you
usually don't have a "user". On the docker hub a "user" is also
identified as a namespace, so for Ceph the user was "ceph".
Variables have been simplified with only:
* ceph_docker_image
* ceph_docker_image_tag
1. For docker hub images: ceph_docker_name: "ceph/daemon" will give
you the 'daemon' image of the 'ceph' user.
2. For non docker hub images: ceph_docker_name: "daemon" will simply
give you the "daemon" image.
Infrastructure playbooks have been modified as well.
The file group_vars/all.docker.yml.sample has been removed as well.
It is hard to maintain since we have to generate it manually. If
you want to configure specific variables for a specific daemon simply
edit group_vars/$DAEMON.yml
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1420207
Signed-off-by: Sébastien Han <seb@redhat.com>
We shouldn't test directly the value of
`ceph_conf_overrides.global.osd_pool_default_pg_num` because this can
cause the playbook to fail if the key `global` is not present in
`ceph_conf_overrides`. Therefore we have to use the facts that have been
defined earlier.
Fix: #1242
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
On ubntu systems mkdir is in /bin where on atomic it is /usr/bin/.
We use the shell built-in function "command" to find its right location.
Signed-off-by: Sébastien Han <seb@redhat.com>
Since we now only support systemd has an init system we can finally
treat containers as processes using systemd and this for all the
distros.
Signed-off-by: Sébastien Han <seb@redhat.com>
This commits allows us to restart Ceph daemon machine by machine instead
of restarting all the daemons in a single shot.
Rework the structure of the handler for clarity as well.
Signed-off-by: Sébastien Han <seb@redhat.com>
According to #1216, we need to simply the code by removing the
support of anything before Jewel.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Some users purge their environments and leave it in a non-optimal state.
e.g: packages are still installed but /etc/ceph and /var/lib/ceph don't
exist anymore. This will result in multiple failures across the play,
sometimes hard to detect. Populating these directories "just in case"
should help us solving these problems.
Closes: #1253
Signed-off-by: Sébastien Han <seb@redhat.com>
Sometimes users for testing, tend to delete the whole /var/lib/ceph and
then run ansible again, OSD will never come up if we do not create their
directory.
Signed-off-by: Sébastien Han <seb@redhat.com>
This patch makes sure we set the proper pool size on the rbd pool.
Usually during bootstrap the rbd pool size is not honoured so we need to
add this workaround.
Signed-off-by: Sébastien Han <seb@redhat.com>
This allows the user to set ip_version to either ipv4 or ipv6. This
resolves a bug where monitor_address is set to an ipv6 address, but the
template fails to render because it's hardcoded to look for an 'ipv4'
key in the ansible facts.
See: https://bugzilla.redhat.com/show_bug.cgi?id=1416010
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Resolves: bz#1416010
could have scenario where different openstack components would
use the same pool, but the logic would create the same pool
more than once
add unique filter to account for this
Allow for more operator flexibility in the `rgw frontends` setting
while maintaining backwards compatibility with the old vars. This
allows an operator to, for example, use the civetweb settings for
implementing SSL ports.
For available civetweb configuration parameters, see:
https://github.com/civetweb/civetweb/blob/master/docs/UserManual.md
It is not enough to check for the mds to exists, it actually always does
because we declare the variable. So we need to make sure that there is a
mds host.
Signed-off-by: Sébastien Han <seb@redhat.com>
Since we introduced config_overrides we removed a lot of options from
the default template. In some cases, like mds pool, openstack pools etc
we need to know the amount of PGs required. The idea here is to skip the
task if ceph_conf_overrides.global.osd_pool_default_pg_num is not define
in your `group_vars/all.yml`.
Closes: #1145
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
This allows for the role to be used with ansible-galaxy and to fix the
include in all the meta/main.yml files in the roles.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
The libcephfs1 package was removed from ceph-common in
cb1c06901e, however it was not synced
to group_vars/all.yml.sample using the `generate_group_vars_sample.sh`
script. This fixes up the comment formatting in the ceph-common
defaults and brings the group_vars sample back into sync.
Prior to this change, a playbook run with '--tags' or '--skip-tags'
would fail, because the ceph-common role would not include the
release.yml task, and this file defines critical things like
ceph_release.
Thanks Andrew Schoen <aschoen@redhat.com> for help with the fix.
Task put initial mon keyring in mon kv store from
ceph-mon/tasks/ceph_keys.yml is failing when cephx is disabled. The root
cause is that variable monitor_keyring is not populated by any task from
deploy_monitors.yml.
Fixes: #1211
Signed-off-by: Sébastien Han <seb@redhat.com>
Prior to this patch we had several ways to runs containers, we could use
ansible's docker module on some distro and on containers distros we were
using systemd. We strongly believe threating containers as services with
systemd is the right approach so this patch generalizes to all the
distros. These days most of the distros are running systemd so it's fair
assumption.
Signed-off-by: Sébastien Han <seb@redhat.com>
Once we have our first monitor up and running we need to add it to the
monitor store as a safety measure. Just in case the local file gets
deleted and you need to add a new monitor. Now you can retrieve this key
like this:
ceph config-key get initial_mon_keyring > initial_mon_keyring.txt
Signed-off-by: Sébastien Han <seb@redhat.com>
There is no need to become root on local_action. This will event trigger
an error on some systems as it will try to run a sudo command. If the
current user does not have passwordless sudo, Ansible will fail. Anyway
using the current user is perfectly fine and no elevation privilege is
needed.
Signed-off-by: Sébastien Han <seb@redhat.com>
The Keystone v2 APIs are deprecated and scheduled to be removed in
Q release of Openstack. This adds support for configuring RGW to
use the current Keystone v3 API.
The PKI keys are used to decrypt the Keystone revocation list when
PKI tokens are used. When UUID or Fernet token providers are used in
Keystone, PKI certs may not exist, so we now accommodate this scenario
by allowing the operator to disable the PKI tasks.
Jewel added support for user/pass authentication with Keystone,
allowing deployers to disable Keystone admin token as required
for production deployments.
This implements configuration for the new RGW Keystone user/pass
authentication feature added in Jewel.
See docs here: http://docs.ceph.com/docs/master/radosgw/keystone/
Just for clarity and because we can we now show the name of the
ceph configuration file that is generated.
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit solves the situation where you lost your fetch directory and
you are running ansible against an existing cluster. Since no fetch
directory is present the file containing the initial mon keyring
doesn't exist so we are generating a new one.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We do not need to run another condition for 'ceph_rhcs' since the
include we came from already has it, so we are already inside this
condition.
We also spell red hat entirely instead of rh and we remove capital
letters.
Signed-off-by: Sébastien Han <seb@redhat.com>
When `ceph_stable_rh_storage` is True, every cluster node should have a
`/etc/apt/preferences.d/rhcs.pref` file with the following contents:
```
Explanation: Prefer Red Hat packages
Package: *
Pin: release o=/Red Hat/
Pin-Priority: 999
```
ceph-deploy already did this when used with ice-setup, and we need to do
the same thing with the ceph-ansible stack.
Closes: #1182 and https://bugzilla.redhat.com/show_bug.cgi?id=1404515
Signed-off-by: Sébastien Han <seb@redhat.com>
Only when ceph_origin == "upstream", install_on_redhat.yml will include
redhat_ceph_repository.yml, same as debian.
In redhat_ceph_repository.yml, ceph_custom_repo will be added.
But in check_mandatory_vars.yml, ceph_origin=="upstream" can't be combined
with ceph_custom
If previous check was not run, .stdout_lines is not a valid key on the dictionary.
To get around this, use .get("stdout_lines") instead.
Also add in a default empty list
in hammer, ceph-common depended on libcephfs (indirectly, via
python-cephfs). this is no longer the case in jewel or later, so it can
be removed from debian_ceph_packages
Signed-off-by: Casey Bodley <cbodley@redhat.com>
For readibility and clarity we do not run any tasks directly in the
main.yml file. This file should only contain include, which helps us
later to apply conditionnals if we want to.
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit re-uses some of the existing ceph-ansible variables for a
containirzed deployment. There is no reasons why we should add new
variables for the containerized deployment.
Signed-off-by: Sébastien Han <seb@redhat.com>
mon_group_name variable can be used to override mons group, but
this task assumes the group is always 'mons'. So we need to use
the var to find the group name instead.
Before this patch only the address for the first mon would show
in the ceph.conf even if there were multiple mons in the inventory.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Once the monitor process starts it will also trigger `ceph-create-keys`
which will collect the admin key and bootstrap keys. We used to force
this command because we were having issues on some distros like centos
7.0 and 7.1 not triggering this. This is fixed on centos 7.2 and not an
issue on ubuntu 14.04 or 16.04 so we can remove this task. If the
monitor hangs or fails to start the playbook will fail right after at
the "wait for client.admin key exists" task after 300sec.
Closes: #1161
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit solves the situation where you lost your fetch directory and
you are running ansible against an existing cluster. Since no fetch
directory is present the file containing the fsid doesn't exist so we
are creating a new one. Later the ceph.conf gets updated with a wrong
fsid which causes problems for clients and ceph processes.
Closes: #1148
Signed-off-by: Sébastien Han <seb@redhat.com>
Adding that avoids this bug:
https://github.com/ansible/ansible/issues/18206
Without that you'll get failures like:
TASK [ceph-mon : set keys permissions]
*****************************************
task path:
/home/andrewschoen/ceph-ansible/roles/ceph-mon/tasks/ceph_keys.yml:31
fatal: [mon0]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute 'stdout_lines'"}
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
We removed the "apache" setting for "radosgw_frontend" in
adfdf6871e.
As part of that change, we removed the final references to
ceph-extra.repo, but I failed to clean up this file itself.
Now that nothing uses this file, delete it.
This file contained the sole reference to redhat_distro_ceph_extra, so
we can drop that variable as well.
Refactor the code using 'package' module
Fix Issue #520
(However it doesn't cover all cases because some cases are not refactorable.
Ex: because of diverging packages name between distribution)
a397922 introduced a syntax error by attempting to default an unquoted
string, which causes execution failures on some ansible versions with:
Failed to template {{ ceph_rhcs_mount_path }}: Failed to template {{ ceph_stable_rh_storage_mount_path | default(/tmp/rh-storage-mount) }}: template error while templating string: unexpected '/'
libfcgi is dead upstream (http://tracker.ceph.com/issues/16784)
The RGW developers intend to remove libfcgi support entirely before the
Luminous release.
Since libfcgi gets little-to-no developer attention or testing, remove
it entirely from ceph-ansible.
Ansible task was not properly fetching OSD cluster keyring causing
the keyring to be missing when we needed to authenticate. Similarly, we
were not properly waiting on the OSD keyring to be available before
continuing.
Signed-off-by: Ivan Font <ifont@redhat.com>
- Update rolling update playbook to support containerized deployments
for mons, osds, mdss, and rgws
- Skip checking if existing cluster is running when performing a rolling
update
- Fixed bug where we were failing to start the mds container because it
was missing the admin keyring. The admin keyring was missing because
it was not being pushed from the mon host to the ansible host due to
the keyring not being available before running the copy_configs.yml
task include file. Now we forcefully wait for the admin keyring to be
generated before continuing with the copy_configs.yml task include file
- Skip pre_requisite.yml when running on atomic host. This technically
no longer requires specifying to skip tasks containing the with_pkg tag
- Add missing variables to all.docker.sample
- Misc. cleanup
Signed-off-by: Ivan Font <ifont@redhat.com>
We have a fact that detects the package manager, so we can detect if
systemd is used. Radosgw was still using some old logic from Ubuntu.
Ubuntu 16.04 now has systemd so we don't need to configure rgw as it was
running on upstart.
Signed-off-by: Sébastien Han <seb@redhat.com>
ansible 2.2 deprecates first_available_file option which is used in
the config_template module by 'generate ceph configuration file' task.
This change syncs the config_module files from their master repository
in github.com/openstack/openstack/ansible-plugins which includes the fix
2f6cac2cf6
Signed-off-by: Alberto Murillo Silva <alberto.murillo.silva@intel.com>
Even for dmcrypt we need to check the "devices" status and
"raw_journal_devices" as well so we can fix them if there is something
wrong with them.
Signed-off-by: Sébastien Han <seb@redhat.com>
Before this commit if you had set monitor_interface in your
inventory file for a specific host it would be ignored and the value
in group_vars/all would have been used.
Also, this enables support for monitor_address again as it had been
broken by previous changes to this template.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This is done for preventing of their use-before-definition for osd scenarios checks (should be removed after a refactor has properly seperated all the checks into appropriate roles).
Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>
Users reported that pool_default_pg_num is not honoured for the default
pool 'rbd'. So now we check the pg num value for the RBD pool and if it
does not match pool_default_pg_num then we delete and recreate it.
We also make sure the pool is empty first, just in case someone changed
the value manually and didn't reflect the change in ceph-ansible.
The only issue with this patch is that the pool ID will not be 0 anymore
but more likely 1.
Signed-off-by: Sébastien Han <seb@redhat.com>
backward compatibility for ceph-ansible version running latest code but
using variables defined before commit: 492518a2
Signed-off-by: Sébastien Han <seb@redhat.com>
ceph-fetch-keys role currently works only if cluster name is 'ceph'.
This commit allows to set custom cluster name in 'defaults' in the same
fashion as other roles do.
This RHCS version is now generally available. Default to using it.
Signed-off-by: Alfredo Deza <adeza@redhat.com>
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
Related: rhbz#1357631
By overriding the openstack_pools variable introduced by this commit, the
deployer may choose not to create some of the openstack pools, or to add
new pools which were not foreseen by ceph-ansible, e.g. for a gnocchi
storage backend.
For backwards compatibility, we keep the openstack_glance_pool,
openstack_cinder_pool, openstack_nova_pool and
openstack_cinder_backup_pool variables, although the user may now choose
to specify the pools directly as dictionary literals inside the
openstack_pools list.
This allows us to test devices set with persistent naming such as
/dev/disk/by-*
When registering devices we can use persisent (/dev/disk/by-*) or
non-persistent (/dev/sd*). Both declarations are supported by
ceph-ansible. There was just two tasks that were not compatible with
this. Since we support using partitions directly we need to test that
because the device activation will be different. To test if the device
is a partition we use a regular expression which wasn't compatible with
the persistent device naming format (/dev/disk/by-*).
This commit solves this issue by reading the path of the symlink since
devices like /dev/disk/by-* are symlinks to devices like /dev/sd*
Signed-off-by: Sébastien Han <seb@redhat.com>
For some providers (such as upcoming Linode support), some NICs may have
multiple IP addresses. (In the case of Linode, the only NIC has a public
and private IP address.) This is normally okay as we can use the
ceph.conf cluster_network and public_network variables to force the
monitor to listen on the addresses we want. However, we also need
ansible to set the correct monitor IP addresses in "mon hosts" (i.e. the
addresses the monitors will listen on!). This new monitor_address_block
setting tells ansible which IP address to use for each monitor.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
- Move mon_containerized_default_ceph_conf_with_kv config from ceph-mon
to ceph-common defaults as it's used in ceph-nfs
- Update conditional to generate ganesha config when not
mon_containerized_default_ceph_conf_with_kv
- Revert change to store radosgw keyring using ansible_hostname on
ansible server so that ceph-nfs can find it
- Update ceph-ceph-nfs0-rgw-user container to use ansible_hostname
variable
Signed-off-by: Ivan Font <ivan.font@redhat.com>
use the activation scenario instead of the full ceph_disk one, we
already have a task to prepare osds so we just need to activate the
device.
working for me using vagrant :)
Signed-off-by: Sébastien Han <seb@redhat.com>
There is no need to run the actions from
roles/ceph-mon/tasks/docker/create_configs.yml
on the first monitor only since the monitor deployment happens
**serially**.
Moreover with Vagrant it's useful to allow the auto creation of the
cluster fsid, so enabling the option. If this is not desired you can
still set `fsid: 9c9c0448-0551-401d-b55b-e5b3a42bae42` for example.
Signed-off-by: Sébastien Han <seb@redhat.com>
- Gather facts only for mons before processing ceph-mon role serially in
containerized playbook sample
- Updated ceph.conf in order to generate a valid ceph.conf
Signed-off-by: Ivan Font <ivan.font@redhat.com>
- Move fsal_rgw config to ceph-common, as it's shaered with ceph-rgw
- Update all.docker.sample with NFS config
- Rename fsal_rgw to nfs_obj_gw and fsal_ceph to nfs_file_gw, because
the former names mean nothing to non-Ganesha developers
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
-First install ceph into a directory with CMake
cmake -DCMAKE_INSTALL_LIBEXECDIR=/usr/lib -DWITH_SYSTEMD=ON -DCMAKE_INSTALL_PREFIX:PATH:=/usr <ceph_src_dir> && make DESTDIR=<install_dir> install/strip
-Ceph-ansible copies over the install_dir
-User can use rundep_installer.sh to install any runtime dependencies that ceph needs onto the machine from rundep
* changed s/colocation/collocation/
* declare dmcrypt variable in ceph-common so the variables check does
not fail
Signed-off-by: Sébastien Han <seb@redhat.com>
This fixes#845 for containerized deployments. We now also mount the
/etc/localtime volume in the containers in order to synchronize the host
timezone with the container timezone.
Signed-off-by: Ivan Font <ivan.font@redhat.com>
Prior to this change, each ceph cluster node would end up with several
"qemu-client-$pid.log" files owned by root. The [client] section would
capture *all* client activity (for example the "ceph health" command,
etc), not just librbd-in-qemu.
Restrict this section to libvirt clients only so that we don't generate
these spurious log files for other Ceph client traffic.
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
Deployment fails when the ``secure_cluster`` is false:
TASK [ceph-mon : secure the cluster]
*******************************************
fatal: [saceph-mon.vm.ceph.asheplyakov]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute 'stdout_lines'"}
fatal: [saceph-mon2.vm.ceph.asheplyakov]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute 'stdout_lines'"}
fatal: [saceph-mon3.vm.ceph.asheplyakov]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute 'stdout_lines'"}
A conditional include evaluates all included tasks with the (additional)
conditional applied to every task [1]. Thus all tasks from `secure_cluster.yml'
are always evaluated (with an additional 'when: secure_cluster' condition).
The `secure the cluster' task iterates over ``ceph_pools.stdout_lines``
even if ``secure_cluster`` is false: in loops ansible applies conditional
to every item (by design) [2]. However the `collect all the pools' task
is skipped if the very same condition evaluates to false, which leaves
the ``ceph_pools`` undefined, so the `secure the cluster' task fails:
Provide the default (empty) list to avoid the problem.
[1] http://docs.ansible.com/ansible/playbooks_conditionals.html#applying-when-to-roles-and-includes
[2] http://docs.ansible.com/ansible/playbooks_conditionals.html#loops-and-conditionalsCloses: #913
Signed-off-by: Alexey Sheplyakov <asheplyakov@mirantis.com>
Update each role's task to use the respective role's username, image
name, and image tag to check if a container is already running. This was
causing false failures because we were not matching any running
containers and subsequently running checks.yml to check the status of
cluster files being left behind.
Signed-off-by: Ivan Font <ivan.font@redhat.com>
Journal size is not mandatory anymore, a default from 5GB is being
added. A simple warning message will show up if the size is set to
something below 5GB.
Signed-off-by: Sébastien Han <seb@redhat.com>
The ceph-common role fails when you run ansible with --check. Adding
always_run to a few tasks makes the check go through easier (although
it's not foolproof).
This will help if the path to the iso exists in the originating server but not
in the remote paths. This issue is not seen if using /tmp/file.iso but does
show up when using nested paths.
Signed-off-by: Alfredo Deza <adeza@redhat.com>
Resolves: rhbz#1355762
init_system was getting the value of "systemd\n"
and was later compared to be equal to "systemd"
making the wrong scripts to be executed.
Signed-off-by: Alberto Murillo <alberto.murillo.silva@intel.com>
Add the ability to use a custom repo, rather than just upstream, RHEL,
and distro. This allows ansible to be used for internal testing.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
If the docker image cannot be retrieved we will fail this task silently
and the playbook ultimately succeeds without a successful deployment.
This change makes it so we fail the playbook immediately.
Signed-off-by: Ivan Font <ivan.font@redhat.com>
Ceph has the ability to export it's filesystem via NFS using Ganesha.
Add a ceph-nfs role that will start Ganesha and export the Ceph
filesystems.
Note that, although support is going in to export RGW via NFS, this is
not working yet.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
The config template is in ceph-common, not in the individual roles, so
roles referencing it need to use playbook_dir, not role_path.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
- Check for nmap being available was not running as a local_action, when the checks using nmap were
- Various fixes on Ansible 2.x now that the above is working
This causes ceph-ansible scripts to fail when targeting Centos7 machines.
Installation fails because newer ceph package dependencies provided
by ceph-release-{version}.noarch.rpm were overridden by older
package dependency versions in default distribution repositories,
due to the fact that default distribution repositories have higher
priority.
Docker makes it difficult to use images that are not on signed
registries. This is a problem for developers, who likely won't have
access to a registry with proper signed certificates.
This allows the ability to use any docker image on the machine running
vagrant/ansible. The way it works is that the image in question is
exported locally, then sent to each target box and imported there.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
This will allow nodes to install rhcs that do
not have access to the internet.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Resolves: rhbz#1337601
In order to align all Ansible versions, we now use the full path for the
template. We rely on `role_path` variable. Now all the tasks using
the template module have a uniform syntax.
Might fix issue raised in #483
Signed-off-by: Sébastien Han <seb@redhat.com>
The scenarios were not being accurately compared to ensure that:
* A single scenario was choosen
* ONLY a single scenario was choosen
This solution does not scale for long, but that can be addressed in a
different patchset.
By default, this roles will create a ceph config file and get the admin
key. You can optionnally add other users, keys and pools for your tests.
Closes: #769
Signed-off-by: Sébastien Han <seb@redhat.com>
Add support to allow ceph-ansible to install and
configure Ceph on Debian on the ppc64le architecture.
Canonical has ppc64le Debian packages in Ubuntu distros
and on Ubuntu Cloud Archive. Both of which can be installed
and configured using the 'distro' or 'uca' options in
ceph-ansible when this patch is used.
Signed-off-by: Samuel Matzek <smatzek@us.ibm.com>
Since ##461 we have been having the ability to override ceph default
options. Previously we had to add a new line in the template and then
another variable as well. Doing a PR for one option was such a pain. As
a result, we now have tons of options that we need to maintain across
all the ceph version, yet another painful thing to do.
This commit removes all the ceph options so they are handled by ceph
directly. If you want to add a new option, feel free to to use the
`ceph_conf_overrides` variable of your `group_vars/all`.
Risks, for those who have been managing their ceph using ceph-ansible
this is not a trivial change as it will trigger a change in your
`ceph.conf` and then restart all your ceph services. Moreover if you did
some specific tweaks as well, prior to run ansible you should update the
`ceph_conf_overrides` variable to reflect your previous changes.
To avoid service restart, you need to know a bit of ansible for this,
but generally the idea would be to run ansible on a dummy host to
generate the ceph.conf, then scp this file to all your ceph hosts and
you should be good.
Closes: #693
Signed-off-by: Sébastien Han <seb@redhat.com>
This is purely a refactor. Converts when 'and' conditionals into lists
rather than multiline strings. This does not work for nested
conditionals, but those can be formated with indents.
Moves one line when statements onto the same line as the when command
itself.
A small logic bug was found in ceph-osd/tasks/check_devices.yml which
which was also fixed.
Signed-off-by: Sam Yaple <sam@yaple.net>
Somehow on CentOS 7.2 with Jewel, the service enablement by the Ansible service module
does not seem to work properly.
Signed-off-by: Sébastien Han <seb@redhat.com>
This adds a helper fact that uses the ``init_system`` fact to determine if
we should be using systemd or not when controlling services.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
The ceph-osd role currently uses ansible_service_mgr, which is a fact
only available on ansible 2.x and greater. This commit sets a similar
fact called init_system which will store the contents of /proc/1/comm
(systemd, init, etc.) and then references it ceph-osd instead.
Closes#741
If the ceph cluster name includes numbers, the grep used to find the OSD
IDs from /var/lib/ceph/osd/ would also return the numbers that were in
the cluster name.
For example, if the cluster was named 'mine123' and there was only one
OSD on the node, then the task that finds the OSD IDs would return
'123' and '0'.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This adds support to allow the install of Ceph from the
Ubuntu Cloud Archive. The Ubuntu Cloud Archive provides newer
release of Ceph than the normal Ubuntu distro repository.
Signed-off-by: Samuel Matzek <smatzek@us.ibm.com>
Since developement versions of Ceph are after infernalis a package split
happened. So basically ceph-mon, ceph-osd, ceph-mds need to be
installed.
Signed-off-by: Sébastien Han <seb@redhat.com>
This will allow a user to conditionally install the ceph package on rpm
based systems. Installing this package is not required or wanted in
versions passed infernalis.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Introducing a new config option: `radosgw_civetweb_bind_ip` which points
to the `ansible_default_ipv4` by default. You can override this
variable. Use ansible facts to put a proper value.
Signed-off-by: Sébastien Han <seb@redhat.com>
This fixes the ceph.conf template so that it will look for an inventory
defined value for monitor_interface or for monitor_interface defined in
a group_vars file.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This fixes a bug where monitor_interface might be set in your inventory
file and not by using group_vars or --extra-vars causing the template to
use the default address of 0.0.0.0 instead of the defined
monitor_interface.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Instead of creating the RBD client socket path three different places
in three different ways, this creates it once. Ceph on OpenStack users
have the option to customize the permissions of the RBD client
directories.
Fixes#687
As written, generating the config file for ceph-mon in Docker yielded:
ERROR: config_template is not a legal parameter in an Ansible task or
handler
This fixes that error condition.
We now check if the device has already been prepared, if we detect a
ceph partition we do not prepare the device.
Also fixed some issues while running on Atomic or CoreOS.
Signed-off-by: Sébastien Han <seb@redhat.com>
fixing the can't open /var/lib/ceph/bootstrap-osd/ceph.keyring: can't
open /var/lib/ceph/bootstrap-osd/ceph.keyring: (13) Permission denied
Signed-off-by: Sébastien Han <seb@redhat.com>
we now have the ability to enable the `cluster` variable with a specific
value that will determine the name of the cluster.
Signed-off-by: Sébastien Han <seb@redhat.com>
ceph.conf.j2 template requires a new line between mon_containerized_deployment_with_kv and fsid variables
With this commit , i have added a new line for better readablity
ceph.conf file generation task in ceph-common role was getting failed
because it ansible cant find defination of varriable mon_containerized_deployment_with_kv
This fix declare mon_containerized_deployment_with_kv under ceph-common/defaults/main.yml which fixes this issue
Signed-off-by: ksingh7 <karan.singh731987@gmail.com>
With Jewel comes a new store to store Ceph object: BlueStore. Adding an
extra scenario might seem like a useless duplication however the
ultimate goal is remove the other roles later. Thus this is easier to
add new role instead of modifying existing one. Once we drop the support
for release older than Jewel we will just remove all the previous
scenario files.
Signed-off-by: Sébastien Han <seb@redhat.com>
Some versions (?) of libvirt provide a 'libvirt' group instead of
'libvirtd'. (Observed with libvirt-daemon-1.2.17-13.el7_2.2.x86_64.)
This makes the RBD client directory owner and group configurable to
allow for this.
this is to allow ceph-authtool to read and write to /var/ and /etc on CentOS Atomic.
Add doc on how to run containerized deployment on RHEL/CentOS Atomic
Signed-off-by: Huamin Chen <hchen@redhat.com>
This would allow users who don't know what interface to provide to
give an IP address to use for the monitor instead.
Note: the includes are needed in ceph.conf.j2 because without them
jinja2 can not properly evaluate the template and will complain about a
missing 'ansible_interface' variable. The includes allow the template to
be evaluated correctly and then the correct include will be used during
render time.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Since we want to activate the OSD when it's a partition we are looking
for a return code that is equal to 0 which means the device is a
partition.
closes: #636
Signed-off-by: Sébastien Han <seb@redhat.com>
* `/var/run/ceph/rbd-clients` is not created automatically
* because it is missing, ceph-rgw complains about missing client
socket on start up; it is because the containing directory is
not there
* so we just add it to the list of directory pre-requisite
* the client-name is actually `rgw.{{ ansible_hostname }}` instead
of just `{{ ansible_hostname }}`
* it matches the directory created under `/var/lib/ceph/radosgw`
* and, it matches the client-name used to create the keyring in
`pre_requisite.yml`
Currently we don't yet support runnings OSDs w/ selinux in
enforcing mode. Thus its better to ensure that ceph-ansible
explicitly makes selinux permissive. This should help in
scenarios such as hyperconverged where OSDs are colocated
with VMs on compute nodes which needs selinux enforcing, but
OSDs don't.
Signed-off-by: Deepak C Shetty <deepakcs@redhat.com>
Where it was located before meant it might be skipped if you don't run
tasks with the package-install tag. This fixes the situation where you
want to configure an rhcs node, but do not want to do any package
installs.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
When installing RHCS there is an option to install from distro provided
packages, this commit modifies the check to allow that to happen.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
changing the name of the directory causes issues with git subtree which
will create new commits. Creating a symlink for vagrant to be happy.
Signed-off-by: Sébastien Han <seb@redhat.com>
in order to have a build on the galaxy we need to have a proper
dependency set for ceph-common. On the galaxy ceph-common does not
exist, only ceph.ceph-common is available.
Signed-off-by: Sébastien Han <seb@redhat.com>
this commit introduces the ability to use fqdn for mon/mds name while
generation the ceph.conf file from the template.
Simply turn mon_use_fqdn and or mds_use_fqdn to true to use FQDN.
Signed-off-by: Sébastien Han <seb@redhat.com>
This adds a script, generate_group_vars_sample.sh, that generates
group_vars/*.sample from roles/ceph-*/defaults/main.yml to avoid
discrepancies between the sets of files. It also converts the line
endings in the various main.yml from DOS to Unix, since generating the
samples was spreading the line ending plague around to more files.
0644 should never be a directory mode. 1777 makes it so that any user
can create a ceph client, not just root. (This is helpful if, for
instance, nova-compute is running as non-root.)
Previously, creating pools was skipped if cephx was disabled; instead,
we should only skip key creation if cephx is disabled, and create
pools any time openstack_config is true.
If using another method to generate a consistent fsid, then we can
skip creation of an (unused) cluster UUID file. If cephx is disabled
as well, we can skip creation of the fetch directory entirely.
Skip a number of ceph keyring-related tasks (or remove the keyring
portion of some tasks) when cephx is disabled. Specifically, avoid
generating the initial keyring, which only clutters up the ansible
repo if cephx is not in use.
This commit allows you to set a new variable to 'true' if you want to
have ceph admin key copied over different kind of hosts such as MDS,
OSD, RGW. To enable this just set `copy_admin_key` to true.
Closes: #555
Signed-off-by: Sébastien Han <seb@redhat.com>
When autodiscovering disks, disks can be skipped if either they are
removable, or if they have partitions on them. Skipped actions have no
'rc' attribute, though, so the 'ceph prepare' conditional fails unless
we first check to ensure that the results were not skipped before
checking the return value.
The firewall checks can fail for any number of reasons -- e.g., the
ceph cluster hostnames are unresolvable from the ansible host, or the
ports are filtered by some intermediate hop, etc. Make two changes to
make those checks better:
* Set pipefail when running the checks, so if nmap itself fails the
command will be marked as 'failed'. Specifically, this fixes the
case where the hostnames cannot be resolved.
* Add a new variable, check_firewall, which can be used to disable
checks entirely. Specifically, this fixes the case where some
intermediate firewall filters the ports, so nmap returns "filtered".
If cephx is set to false, the "set keys permissions" task fails with:
file ({# ceph_keys.stdout_lines #}) is absent, cannot continue
This skips that step when cephx is false.
Installs on RHEL with ceph_origin set to distro previously would fail
because no packages would get installed, but all of the checks passed
fine. This adds support for ceph_origin: distro, simply installing the
packages using yum/dnf and assuming that the sysadmin has provided a
repository containing them.
This also supports the use case where Satellite or a similar local
mirror is in use, and the admin does not or cannot use the additional
repositories the role would otherwise add.
The purpose of this is so we can connect to the mons and gather the keys
needed to configure an OSD or additonal MON without having to reconfigure
the existing mons at the same time.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
In our use case we might only be configuring mons and not osds in the
same call, so we don't want to check variables needed for osds when they
are not needed to configure a mon.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
as stated in https://github.com/ansible/ansible/issues/4297
if we register a variable twice and even if a task is skipped the
register will not get overwritten... So we use the fact variant as
mentionned in the ansible issue.
Signed-off-by: Sébastien Han <seb@redhat.com>
While this is not widly used (AFAIK :p) the feature was broken. Thanks
to @zmc for reporting it. You can now set `osd_auto_discovery` to
true in your group_vars/osd and it will go through all the devices
available and will make them OSDs.
Signed-off-by: Sébastien Han <seb@redhat.com>
Currently deploying a MON fails with "bad symbolic permission for mode"
errors due to the file/directory modes not being interpreted as octal
values. This commit updates roles/ceph-common/tasks/main.yml to set
the file/directory modes to strings so they can be interpreted
correctly.
Closes issue #525
run containerized daemons in virtual machines.
to enable it simply do:
`cp site-docker.yml.sample site-docker.yml`
and set `docker: true` in `vagrant_variables.yml`
Signed-off-by: Sébastien Han <seb@redhat.com>
At the moment, all the tasks using the file module are duplicated to have differents ownerships depending on the fact `is_ceph_infernalis`.
The goal of this commit is to have a new logic for this:
- First set facts depending on the `is_ceph_infernalis` fact
- Create the files or directories using the setted facts as ownerships.
We have a requirement to install the packages first without
configuration. These tags should allow us to target the tasks need to do
that.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
as reported in #510 some systems don't have uuidgen installed so we
better use a more global way to generate it. It sounds like python
should be available in case uuidgen is not.
Otherwise we will have to find another way :)
closes#510
Signed-off-by: Sébastien Han <seb@redhat.com>
Currently, all the ceph package installation resources use
"state=latest", which means subsequent runs of the ceph playbooks
could result in ceph being upgraded if there are package updates
available in the selected repo.
This commit adds a new variable to ceph-common called
'upgrade_ceph_packages' which defaults to False. This variable is used
in the package installation resources for ceph packages to determine if
the resource should use "state=present" or "state=latest". If the
variable gets set to True, "state=latest" will be used.
Additionally, we update rolling_update.yml to override
upgrade_ceph_packages to true to permit package upgrades in this
context specifically.
Closes issue #506
It seems that in ansible 2.0 even if a task is skipped by it's `when`
clause not evaluating to true the variables in the play are still
rendered. Because these were not defined in defaults/main.yml ansible
was failing in installs/install_on_redhat where those variables are
being used in a `with_items` stanza.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This change allows for configurable Ceph Conf Directory permissions. This
is required for integrators of Ceph, like OpenStack Cinder, which needs to
read from /etc/ceph for operation.
Use command module instead of shell since we do not do anything fancy
here. Remove the duplicate register.
Signed-off-by: Sébastien Han <seb@redhat.com>
As raised in #466 it is important in order to avoid unnecessary
troubleshooting to check that ceph ports are allowed on the platform.
The check runs a nmap command from the host running Ansible
to all the ceph nodes with their respective ports.
Signed-off-by: Sébastien Han <seb@redhat.com>
Thanks to @cloudnull great patch at
https://github.com/ansible/ansible/pull/12555
we now have the ability to add more configuration options instead of
having to push a PR to add a new option to the template. So you can
dynamically add and remove flags.
To use it, edit `ceph_conf_overrides` in `group_vars/all` like so:
```
ceph_conf_overrides
global:
foo: 12345
bar: 6789
```
Signed-off-by: Sébastien Han <seb@redhat.com>
Because of some permission issue, likely due to the recent ceph user, if
80 is used for civetweb we get:
set_ports_option: cannot bind to 80: 13 (Permission denied)
Changing the port to 8080 until this gets solved.
Signed-off-by: Sébastien Han <seb@redhat.com>