This condition is useless and it's also creating issues we don't see in
our CI. ceph_release is set by either ceph-common or ceph-docker-common
so let's keep it this way.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Somewhat something changed with the introduction of msg2 and we have to
add each node as a peer so the monitors can form a quorum. This might be
due to our CI environment, although adding this is completly harmless
and solves monitors not being able to form quorum.
It seems that the initial monitor map wasn't containing the right
information about the peers (addresses like 0.0.0.0/0r1, for each rank.
Signed-off-by: Sébastien Han <seb@redhat.com>
The nfs_ganesha_dev_apt_repo variable was set incorrect in task
"fetch nfs-ganesha development repository"
Signed-off-by: Bruceforce <Bruceforce@users.noreply.github.com>
If we don't copy the key after the package install the directory /var/lib/ceph/bootstrap-rbd-mirror
will not exist and the copy will fail.
Signed-off-by: Sébastien Han <seb@redhat.com>
We don't need to create the directories on non-containers, they are
created by the packages.
Closes: https://github.com/ceph/ceph-ansible/issues/3430
Signed-off-by: Sébastien Han <seb@redhat.com>
When one of the currently supported NTP services has been set up,
disable rest of the NTP services on Ceph nodes.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Merge ntp_debian.yml and ntp_rpm.yml into one (the new file is called
setup_ntp.yml) since they are almost identical. Also avoid repetition
of the common setup step for ntpd and chronyd services.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Since the current user on the controller node, might not have the
permission to read the TLS certificate and related files, copy these
files to the Ceph nodes as root user.
Fixes: https://github.com/ceph/ceph-ansible/issues/3465
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Make linter happy and add more robustness to remote tasks by retrying 3
times (the default) before failing.
Signed-off-by: Sébastien Han <seb@redhat.com>
These aliases have led to several issues making believe that ceph
binaries are actually present on the host when running the command.
However it wasn't explicit that the commands were only ran inside a
container.
It has brought to much confusion so we decided to remove them.
Closes: https://github.com/ceph/ceph-ansible/issues/3445
Signed-off-by: Sébastien Han <seb@redhat.com>
sometimes we play the whole role `ceph-defaults` just to access the
default value of some variables. It means we play the `facts.yml` part
in this role while it's not desired. Splitting this role will speedup
the playbook.
Closes: #3282
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We want to test podman on f29 non-atomic, atomic is not a hard
requirement. However, if you want to get podman then you will have to
install it first before running the playbook.
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit removes the default module, so ceph-ansible does not enable
any manager module.
To enable a module you need to set a value to 'ceph_mgr_modules', you
can pass a list of modules like this:
ceph_mgr_modules:
- status
- dashboard
Signed-off-by: Sébastien Han <seb@redhat.com>
Json is a type structure which is always typed as a string, where before
this we were declaring a dict, which is not a json valid structure.
Signed-off-by: Sébastien Han <seb@redhat.com>
The code is now able (again) to start osds that where configured with
ceph-disk on a non-container scenario.
Closes: https://github.com/ceph/ceph-ansible/issues/3388
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 452069cb3a)
Applying and passing the OSD_BLUESTORE/FILESTORE on the fly is wrong for
existing clusters as their config will be changed.
Typically, if an OSD was prepared with ceph-disk on filestore and we
change the default objectstore to bluestore, the activation will fail.
The flag osd_objectstore should only be used for the preparation, not
activation. The activate in this case detects the osd objecstore which
prevents failures like the one described above.
Signed-off-by: Sébastien Han <seb@redhat.com>
If an existing cluster runs this config, and has ceph-disk OSD, the
`expose_partitions` won't be expected by jinja since it's inside the
'old' if. We need it as part of the osd_scenario != 'lvm' condition.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1640273
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit unifies the container and non-container code, which in the
meantime gives use the ability to deploy N mon container at the same
time without having to serialized the deployment. This will drastically
reduces the time needed to bootstrap the cluster.
Note, this is only possible since Nautilus because the monitors are
bootstrap the initial keys on their own once they reach quorum. In the
Nautilus version of the ceph-container mon, we stopped generating the
keys 'manually' from inside the container, for more detail see: https://github.com/ceph/ceph-container/pull/1238
Signed-off-by: Sébastien Han <seb@redhat.com>
When collocating mon and mgr, the mgr container will attempt to create
its own key since it has the admin key at its disposal. Also at this
point there is nothing to fetch since the key is not created by the
mons, as mentionned above the mgr creates the key on its own.
Signed-off-by: Sébastien Han <seb@redhat.com>
This will speed up the deployment and also deploy mon and mgr collocated
just as recommended.
This won't prevent you of adding more and dedicaded machines for mgr if
needed.
Signed-off-by: Sébastien Han <seb@redhat.com>
During the first iteration, the command won't return anything, or can
simply fail and might not return a valid json structure. Ansible will
fail parsing it in the filter `from_json` so let's default that variable
to empty dictionary.
Signed-off-by: Sébastien Han <seb@redhat.com>
This removes a bit of unnecessary code, the check was always wrong
because of the condition 'not ceph_current_status.get('rc', 1) == 0'
It will never match since `Not` is used for bool and we are checking for
an rc.
Also, even though the check would work, this will be a major blocker for
a complete meltdown. If the whole platform is shutdown then nothing will
be up but files will be present, so this check is definitely wrong.
Signed-off-by: Sébastien Han <seb@redhat.com>
This is false, `./defaults/main.yml` is not supposed to be modified
directly. groups_vars a/o host_vars should always be preferred.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This will fix the following yamllint warning:
Variables should have spaces after {{ and before }}
Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>
change default value of `radosgw_address` to keep consistency with
`monitor_address`.
Moreover, `ceph-validate` checks if the value is '0.0.0.0' to determine
if it has to run `check_eth_rgw.yml`.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600227
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This is not needed to play these tasks on nodes that are not in rgw
group.
Always playing this code makes `shrink_mon.yml` failing.
Typical error:
```
TASK [ceph-defaults : set_fact _radosgw_address to radosgw_interface - ipv4] ***
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-shrink_mon/roles/ceph-defaults/tasks/set_radosgw_address.yml:21
Thursday 22 November 2018 12:34:51 +0000 (0:00:00.154) 0:00:12.371 *****
fatal: [localhost]: FAILED! => {}
MSG:
The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_eth1'
```
Indeed, `radosgw_interface` is the network interface on rgw only. It is
expected that this same interface doesn't exist on `localhost`, so, when
running `shrink_mon.yml`, the role `ceph-defaults` is called in
`hosts: localhost` and causes the playbook to fail.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
It seems Atomic 7.5 has podman already, however this is an old version
(0.4). The podman integration is targetting RHEL 8, so Fedora is
currently the closest to that.
Signed-off-by: Sébastien Han <seb@redhat.com>
During its initialisation both rbd-target-api and rbd-target-gw try to
open /dev/log for their syslog handler. If the device is not present the
service fails to start. Thus expose /dev/log from the host in the
container solves that problem.
Signed-off-by: Sébastien Han <seb@redhat.com>
Since 84fcf4639140c390a7f1fcd790ba190503713f86 we now use the container
binary cli to create ceph keys instead of creating a container and
'docker execing' into it.
Signed-off-by: Sébastien Han <seb@redhat.com>
In order to be able to retrieve udev information, we must expose its
socket. As per, https://github.com/ceph/ceph/pull/25201 ceph-volume will
start consuming udev output.
Signed-off-by: Sébastien Han <seb@redhat.com>
This is to add a granularity level.
We can have ceph specific variables that user shouldn't have to change
here.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add real default value for osd pool size customization.
Ceph itself has an `osd_pool_default_size` default value to `3`.
If users don't specify a pool size in various pools definition within
ceph-ansible, we should default to `3`.
By the way, this kind of condition isn't really clear:
```
when:
- rbd_pool_size | default ("")
```
we should try to get the customized value then default to what is in
`osd_pool_default_size` (which has its default value pointing to
`ceph_osd_pool_default_size` (`3`) as well) and compare it to
`ceph_osd_pool_default_size`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specifc group of nodes, it
will fail when trying to access this variables since it wouldn't be
defined.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
ceph.conf doesn't accept float value.
Typical error seen:
```
$ sudo ceph daemon osd.2 config get osd_memory_target
Can't get admin socket path: unable to get conf option admin_socket for osd.2:
parse error setting 'osd_memory_target' to '7823740108,8' (strict_si_cast:
unit prefix not recognized)
```
This commit ensures the value inserted in ceph.conf will be an integer.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
It is safer to use the list filter than the keys() method since the keys
method does have some interoperability issues between python2 and
python3 based ansible/jinja.
Signed-off-by: Boris Ranto <branto@redhat.com>
If you use python3 based ansible then keys() returns a dict_keys object,
not a list of keys. This breaks the installation on such a system. Using
the list filter provides a more robust solution that should work on both
python2 and python3 based ansible. You can find some more information
about the issue, here:
https://github.com/ansible/ansible/issues/19514
Signed-off-by: Boris Ranto <branto@redhat.com>
* The default value of osd_memory_target used by ceph is 4294967296 bytes,
so use the same as ceph-ansible default.
* Convert ansible_memtotal_mb to bytes to calculate osd_memory_target
Signed-off-by: Neha Ojha <nojha@redhat.com>
This error was introduced in the recent refactor of ceph-docker-common
in https://github.com/ceph/ceph-ansible/pull/3251. However, the Ansible
galaxy linter is not happy about it and fails importing the role.
Removing this since it's not used anymore.
Signed-off-by: Sébastien Han <seb@redhat.com>
if firewalld.service systemd unit is masked, the handler will fail when
trying to restart it.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650281
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
since `ceph-volume` introduction, there is no need to split those tasks.
Let's refact this part of the code so it's clearer.
By the way, this was breaking rolling_update.yml when `openstack_config:
true` playbook because nothing ensured OSDs were started in ceph-osd role (In
`openstack_config.yml` there is a check ensuring all OSD are UP which was
obviously failing) and resulted with OSDs on the last OSD node not started
anyway.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Those tasks aren't needed in docker-common since the introduction of
`ceph-infra` role. They are duplicated tasks.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
this is already done in ceph-defaults, there is no need to have this
check in ceph-docker-common.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
this fact is already set in ceph-defaults, there is no need to set it
again in ceph-docker-common
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Instead of looping over a list of packages or repeating the task
separately for different packages, pass the list of packages to the
task performing package management.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This is needed for Nautilus since the ceph-create-keys script goes away.
(https://github.com/ceph/ceph/pull/21305)
Now the module if called with 'state: fetch_initial_keys' will lookup
keys generated by the monitor and write them down on the filesystem to
the right location (/etc/ceph and /var/lib/ceph/boostrap*).
This is not applicable to container since keys are generated by the
container only.
Signed-off-by: Sébastien Han <seb@redhat.com>
The firewall setup for igw is not getting setup because iscsi_group_name
does not it exist. It should be iscsi_gw_group_name.
Signed-off-by: Mike Christie <mchristi@redhat.com>
The default igw api port is 5000 in the manual setup docs and
ceph-iscsi-config package so this syncs up ansible.
Signed-off-by: Mike Christie <mchristi@redhat.com>
This is needed for Nautilus since the ceph-create-keys script goes away.
(https://github.com/ceph/ceph/pull/21305)
Now the module if called with 'state: fetch_initial_keys' will lookup
keys generated by the monitor and write them down on the filesystem to
the right location (/etc/ceph and /var/lib/ceph/boostrap*).
This is not applicable to container since keys are generated by the
container only.
Signed-off-by: Sébastien Han <seb@redhat.com>
description = 'Use `when: var` rather than `when: var != ""` (or ' \ 'conversely `when: not var` rather than `when: var == ""`)'
Signed-off-by: Sébastien Han <seb@redhat.com>
The use of a handler meant that the cache would be updated at the very
end of the play, which doesn't work when adding a development repo and
trying to install right after it. This mostly reverts
53cdddf886 without an actual `git revert`
because that caused other conflicts.
Signed-off-by: Alfredo Deza <adeza@redhat.com>
Update the meta with the relavant support such as:
* ansible version: min 2.4
* distro supported (tested on) centos 7
Signed-off-by: Sébastien Han <seb@redhat.com>
Do not run the linter for these 3:
* we use latest for pip docker-py package
* for ssl keys this is a false positive since the inital command is a
'shell' it'll always change
* for keystone, we must use shell since the with_items contains pipes
Signed-off-by: Sébastien Han <seb@redhat.com>
Calling command should have changed_when false otherwise each time it
runs it will show as 'changed' and this is irrelevant.
Commands should not change things if nothing needs doing
Signed-off-by: Sébastien Han <seb@redhat.com>
since the jinja logic has been moved into ansible task, we can simply
this part of the code and use `_current_monitor_address`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
So we don't have to loop over `_monitor_addresses` when we need the
monitor address of the current node being played.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
using consecutive set_fact in the playbook instead of complex jinja syntax
makes ceph.conf.j2 more readable.
By the way, jinja can be painful to debug at some point.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Latest ansible version at the moment is 2.7
We should explicitly require 2.7 only on master branch.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Let's test ceph-ansible master against ansible 2.7 to catch early any
potential issue with this ansible version.
Closes: #3148
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
run commands on containers when containerized deployments.
(At the moment, all commands are run on the host only)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
since `rgw_multisite_endpoint_addr` has a default value to
`{{ ansible_fqdn }}`, it shouldn't be mandatory to set this variable.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
- updated README-MULTISITE
- re-added destroy.yml
- added tasks in ceph-validate to make sure the
rgw multisite vars are set
Signed-off-by: Ali Maredia <amaredia@redhat.com>
We should give users the possibility to set the IP they want as
multisite endpoint, setting the default value to `{{ ansible_fqdn }}` to
not force them to set this variable.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
- remove destroy tasks
- cleanup conditionals and syntax
- remove unnecessary realm pulls
- enable multisite to be tested in automated
testing infra
- add multisite related vars to main.yml and
group_vars
- update README-MULTISITE
- ensure all `radosgw-admin` commands are being run
on a mon
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Since we do not have enough data to put valid upper bounds for the memory
usage of these daemons, do not put artificial limits by default. This will
help us avoid failures like OOM kills due to low default values.
Whenever required, these limits can be manually enforced by the user.
More details in
https://bugzilla.redhat.com/show_bug.cgi?id=1638148
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1638148
Signed-off-by: Neha Ojha <nojha@redhat.com>
we ensure that firewalld is installed and running before adding any
rule. This has no sense anymore not to reload firewalld once the rule
are added.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The playbook has various improvements:
* run ceph-validate role before doing anything
* run ceph-fetch-keys only on the first monitor of the inventory list
* set noup flag so PGs get distributed once all the new OSDs have been
added to the cluster and unset it when they are up and running
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1624962
Signed-off-by: Sébastien Han <seb@redhat.com>
This commits simplies the usage of the ceph-fetch-keys role. The role
now has a nicer way to find various ceph keys and fetch them on the
ansible server.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1624962
Signed-off-by: Sébastien Han <seb@redhat.com>
Currently a throw-away container is built to run ceph client
commands to setup users, pools & auth keys. This utilises
the same base ceph container which has all the ceph services
inside it.
This PR allows the use of a separate container if the deployer
wishes - but defaults to use the same full ceph container.
This can be used for different architectures or distributions,
which may support the the Ceph client, but not Ceph server,
and allows the deployer to build and specify a separate client
container if need be.
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
a non skipped task won't have the `skipped` attribute, so `start
firewalld` task will complain about that.
Indeed, `skipped` and `rc` attributes won't exist since the first task
`check firewalld installation on redhat or suse` won't be skipped in
case of non-containerized deployment.
Fixes: #3236
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541840
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Liberty is no longer available in the UCA. The last available release there
is currently Queens.
Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>
`ceph_osd_container_stat` might not be set on other osd node.
We must ensure we are on the last node before trying to evaluate
`ceph_osd_container_stat`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
As of now, we should no longer support Jewel in ceph-ansible.
The latest ceph-ansible release supporting Jewel is `stable-3.1`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit does a couple of things:
* Avoid code duplication
* Clarify the code
* add more unit tests
* add myself to the author of the module
Signed-off-by: Sébastien Han <seb@redhat.com>
This task was created for ceph-disk based deployments so it's not needed
when osd are prepared with ceph-volume.
Signed-off-by: Sébastien Han <seb@redhat.com>
The restart script wasn't working with the current new addition of
ceph-volume in container where now OSDs have the OSD id name in the
container name.
Signed-off-by: Sébastien Han <seb@redhat.com>
Now that the container is named ceph-osd@<id> looking for something that
contains a host is not necessary. This is also backward compatible as it
will continue to match container names with hostname in them.
Signed-off-by: Sébastien Han <seb@redhat.com>
We don't need to pass the device and discover the OSD ID. We have a
task that gathers all the OSD ID present on that machine, so we simply
re-use them and activate them. This also handles the situation when you
have multiple OSDs running on the same device.
Signed-off-by: Sébastien Han <seb@redhat.com>
We don't need to pass the hostname on the container name but we can keep
it simple and just call it ceph-osd-$id.
Signed-off-by: Sébastien Han <seb@redhat.com>
expose_partitions is only needed on ceph-disk OSDs so we don't need to
activate this code when running lvm prepared OSDs.
Signed-off-by: Sébastien Han <seb@redhat.com>
The batch option got recently added, while rebasing this patch it was
necessary to implement it. So now, the batch option can work on
containerized environments.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1630977
Signed-off-by: Sébastien Han <seb@redhat.com>
At the moment, all daemons accept connections from 0.0.0.0.
We should at least restrict to public_network and add
cluster_network for OSDs.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541840
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Fixes the deprecation warning:
[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
using `result|search` use `result is search`.
Signed-off-by: Noah Watkins <nwatkins@redhat.com>
These checks will never pass unless ceph_stable_release is passed and
ceph-defaults is run before ceph-validate. Additionally, we don't want
to support deploying jewel upstream at ceph-ansible master.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1637537
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Check firewall isn't working as expected and might break deployments.
This part of the code will be reworked soon.
Let's focus on configure_firewall code for now.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541840
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Instead used "import_tasks" and "include_tasks" to tell whether tasks
must be included statically or dynamically.
Fixes: https://github.com/ceph/ceph-ansible/issues/2998
Signed-off-by: Rishabh Dave <ridave@redhat.com>
`monitor_address_block` should be read from hostvars[host] instead of
current node being played.
eg:
Let's assume we have:
```
[mons]
ceph-mon0 monitor_address=192.168.1.10
ceph-mon1 monitor_interface=eth1
ceph-mon2 monitor_address_block=192.168.1.0/24
```
the ceph.conf generation task will end up with:
```
fatal: [ceph-mon0]: FAILED! => {}
MSG:
'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_interface'
```
the reason is that it will assume `monitor_address_block` isn't defined even on
ceph-mon2 because looking for `monitor_address_block` instead of
`hostvars[host]['monitor_address_block']`, therefore it enters in the condition as default value:
```
{%- else -%}
{% set interface = 'ansible_' + (monitor_interface | replace('-', '_')) %}
{% if ip_version == 'ipv4' -%}
{{ hostvars[host][interface][ip_version]['address'] }}
{%- elif ip_version == 'ipv6' -%}
[{{ hostvars[host][interface][ip_version][0]['address'] }}]
{%- endif %}
{%- endif %}
```
`monitor_interface` is set with default value `'interface'` so the `interface`
variable is built with 'ansible_' + 'interface'. It makes ansible throwing a
confusing message about `'ansible_interface'`.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1635303
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Allow user to choose between timesyncd, chronyd and ntpd
Installation will default to timesyncd since it is distributed as
part of the systemd installation for most distros.
Added note indicating NTP daemon type is not used for containerized
deployments.
Fixes issue #3086 on Github
Signed-off-by: Benjamin Cherian <benjamin_cherian@amat.com>
The linux kernel target layer, LIO, does not support the iscsi target to
mix ACLs that have chap enabled and disabled under the same tpg. This
patch adds a check and fails if this type of setup is detected.
This fixes Red Hat BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1615088
Signed-off-by: Mike Christie <mchristi@redhat.com>
The role contains all the handlers for Ceph services. We decided to
leave ceph-defaults role with variables and a few facts only. This is
useful when organizing the site.yml files and also adding the known
variables to infrastructure-playbooks.
Signed-off-by: Sébastien Han <seb@redhat.com>
As per #1013 it appears that BS will soon use THP to lower TLB misses,
also disabling THP hasn't demonstrated any gains so far.
Closes: https://github.com/ceph/ceph-ansible/issues/1013
Signed-off-by: Sébastien Han <seb@redhat.com>
`+` is more idiomatic for "one or more" in a regex than `{1,}`; the
latter was introduced in a previous fix for an incorrect `{1,2}`
restriction.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
After restarting each OSD, restart_osd_daemon.sh checks that the
cluster is in a good state before moving on to the next one. One of
the checks it does is that the number of pgs in the state
"active+clean" is equal to the total number of pgs in the cluster.
On large clusters (e.g. we have 173,696 pgs), it is likely that at
least one pg will be scrubbing and/or deep-scrubbing at any one
time. These pgs are in state "active+clean+scrubbing" or
"active+clean+scrubbing+deep", so the script was erroneously not
including them in the "good" count. Similar concerns apply to
"active+clean+snaptrim" and "active+clean+snaptrim_wait".
Fix this by considering as good any pg whose state contains
active+clean. Do this as an integer comparison to num_pgs in pgmap.
(could this be backported to at least stable-3.0 please?)
Closes: #2008
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
Previously RETRIES was set (by default to 40) once at the start of the
script; this meant that it would only ever wait for up to 40 lots of
30s across *all* the OSDs on a host before bombing out. In fact, we
want to be prepared to wait for the same amount of time after each OSD
restart for the clusters' pgs to be happy again before continuing.
Closes: #3154
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
...with the exception of the purge operation, since
removing Calamari would still be useful for an old
cluster.
Signed-off-by: John Spray <john.spray@redhat.com>
For now our best guess is to count the number of devices and multiply
by osds_per_device. Ideally we'd like to run ceph-volume lvm batch
--report and get the number of OSDs that way, but currently we need
a ceph.conf in place already before we can do that. There is a tracker
ticket that would allow os to get around the need for a ceph.conf:
http://tracker.ceph.com/issues/36088
Fixes: https://github.com/ceph/ceph-ansible/issues/3135
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
the default value for _rgw_hostname was took from the current node being
played while it should be took from the respective node in the loop.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622505
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This avoids errors when the osd scenario choosen does not require
setting devices or lvm_volumes. The default values for these are not
set because they exist in the ceph-osd role, not ceph-defaults.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
BlueStore's cache is sized conservatively by default, so that it does
not overwhelm under-provisioned servers. The default is 1G for HDD, and
3G for SSD.
To replace the page cache, as much memory as possible should be given to
BlueStore. This is required for good performance. Since ceph-ansible
knows how much memory a host has, it can set
`bluestore cache size = max(total host memory / num OSDs on this host * safety
factor, 1G)`
Due to fragmentation and other memory use not included in bluestore's
cache, a safety factor of 0.5 for dedicated nodes and 0.2 for
hyperconverged nodes is recommended.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1595003
Signed-off-by: Neha Ojha <nojha@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
The commit:
commit 1164cdc002
Author: Guillaume Abrioux <gabrioux@redhat.com>
Date: Thu Aug 2 11:58:47 2018 +0200
iscsigw: install ceph-iscsi-cli package
installs the cli package but does not start and enable the
rbd-target-api daemon needed for gwcli to communicate with the igw
nodes. This patch just enables and starts it for the non-container
setup. The container setup is already doing this.
This fixes bz https://bugzilla.redhat.com/show_bug.cgi?id=1613963
Signed-off-by: Mike Christie <mchristi@redhat.com>
As of rhel 7.6, it has been decided it doesn't make sense to confine
`ganesha_t` anymore. It means this domain won't exist anymore.
Let's add a `failed_when: false` in order to make the deployment not
failing when trying to run this command.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1626070
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If this is set to anything other than the default value of 1 then the
--osds-per-device flag will be used by the batch command to define how
many osds will be created per device.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This command line is not supported.
According to official documentation:
```
Note that shell command lines are not directly supported.
If shell command lines are to be used,
they need to be passed explicitly to a shell implementation of some kind.
```
We must run this using /bin/sh instead.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
let's add ansible_hostname as a default value for rgw_hostname if no
hostname in servicemap matches ansible_fqdn.
Fixes: #3063
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622505
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit is adding quotes that make keyring unusuable
eg:
```
client.john
key: AQAN0RdbAAAAABAAH5D3WgMN9Rxw3M8jkpMIfg==
caps: [mds] ''
caps: [mgr] 'allow *'
caps: [mon] 'allow rw'
caps: [osd] 'allow rw'
```
Trying to import such a keyring and use it will result:
```
Error EACCES: access denied
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623417
This reverts commit 424815501a.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>