- also add `--foreground` which seems to fix some issue we are facing when
using timeout with `podman`.
- use this fact in the `is ceph running already?` task.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
nfs-ganesha v2.5 and 2.6 have hit EOL. Install nfs-ganesha v2.7
stable that is currently being maintained.
Signed-off-by: Ramana Raja <rraja@redhat.com>
With this, we could have multiple rgw instances on a single host
with a single run, don't have to use rgw-standalone.yml which does not
seems able to bind ports separately.
If you want to have multiple rgw instances, just change 'radosgw_instances'
to the number you want, which defaults to 1.
Not compatible with Multi-Site yet.
Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
sometimes we play the whole role `ceph-defaults` just to access the
default value of some variables. It means we play the `facts.yml` part
in this role while it's not desired. Splitting this role will speedup
the playbook.
Closes: #3282
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We want to test podman on f29 non-atomic, atomic is not a hard
requirement. However, if you want to get podman then you will have to
install it first before running the playbook.
Signed-off-by: Sébastien Han <seb@redhat.com>
change default value of `radosgw_address` to keep consistency with
`monitor_address`.
Moreover, `ceph-validate` checks if the value is '0.0.0.0' to determine
if it has to run `check_eth_rgw.yml`.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600227
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This is not needed to play these tasks on nodes that are not in rgw
group.
Always playing this code makes `shrink_mon.yml` failing.
Typical error:
```
TASK [ceph-defaults : set_fact _radosgw_address to radosgw_interface - ipv4] ***
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-shrink_mon/roles/ceph-defaults/tasks/set_radosgw_address.yml:21
Thursday 22 November 2018 12:34:51 +0000 (0:00:00.154) 0:00:12.371 *****
fatal: [localhost]: FAILED! => {}
MSG:
The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_eth1'
```
Indeed, `radosgw_interface` is the network interface on rgw only. It is
expected that this same interface doesn't exist on `localhost`, so, when
running `shrink_mon.yml`, the role `ceph-defaults` is called in
`hosts: localhost` and causes the playbook to fail.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
It seems Atomic 7.5 has podman already, however this is an old version
(0.4). The podman integration is targetting RHEL 8, so Fedora is
currently the closest to that.
Signed-off-by: Sébastien Han <seb@redhat.com>
This is to add a granularity level.
We can have ceph specific variables that user shouldn't have to change
here.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Add real default value for osd pool size customization.
Ceph itself has an `osd_pool_default_size` default value to `3`.
If users don't specify a pool size in various pools definition within
ceph-ansible, we should default to `3`.
By the way, this kind of condition isn't really clear:
```
when:
- rbd_pool_size | default ("")
```
we should try to get the customized value then default to what is in
`osd_pool_default_size` (which has its default value pointing to
`ceph_osd_pool_default_size` (`3`) as well) and compare it to
`ceph_osd_pool_default_size`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
`osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specifc group of nodes, it
will fail when trying to access this variables since it wouldn't be
defined.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
It is safer to use the list filter than the keys() method since the keys
method does have some interoperability issues between python2 and
python3 based ansible/jinja.
Signed-off-by: Boris Ranto <branto@redhat.com>
* The default value of osd_memory_target used by ceph is 4294967296 bytes,
so use the same as ceph-ansible default.
* Convert ansible_memtotal_mb to bytes to calculate osd_memory_target
Signed-off-by: Neha Ojha <nojha@redhat.com>
if firewalld.service systemd unit is masked, the handler will fail when
trying to restart it.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650281
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
So we don't have to loop over `_monitor_addresses` when we need the
monitor address of the current node being played.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
using consecutive set_fact in the playbook instead of complex jinja syntax
makes ceph.conf.j2 more readable.
By the way, jinja can be painful to debug at some point.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Currently a throw-away container is built to run ceph client
commands to setup users, pools & auth keys. This utilises
the same base ceph container which has all the ceph services
inside it.
This PR allows the use of a separate container if the deployer
wishes - but defaults to use the same full ceph container.
This can be used for different architectures or distributions,
which may support the the Ceph client, but not Ceph server,
and allows the deployer to build and specify a separate client
container if need be.
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
Liberty is no longer available in the UCA. The last available release there
is currently Queens.
Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>
As of now, we should no longer support Jewel in ceph-ansible.
The latest ceph-ansible release supporting Jewel is `stable-3.1`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Fixes the deprecation warning:
[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
using `result|search` use `result is search`.
Signed-off-by: Noah Watkins <nwatkins@redhat.com>
Check firewall isn't working as expected and might break deployments.
This part of the code will be reworked soon.
Let's focus on configure_firewall code for now.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541840
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Allow user to choose between timesyncd, chronyd and ntpd
Installation will default to timesyncd since it is distributed as
part of the systemd installation for most distros.
Added note indicating NTP daemon type is not used for containerized
deployments.
Fixes issue #3086 on Github
Signed-off-by: Benjamin Cherian <benjamin_cherian@amat.com>
The role contains all the handlers for Ceph services. We decided to
leave ceph-defaults role with variables and a few facts only. This is
useful when organizing the site.yml files and also adding the known
variables to infrastructure-playbooks.
Signed-off-by: Sébastien Han <seb@redhat.com>
As per #1013 it appears that BS will soon use THP to lower TLB misses,
also disabling THP hasn't demonstrated any gains so far.
Closes: https://github.com/ceph/ceph-ansible/issues/1013
Signed-off-by: Sébastien Han <seb@redhat.com>
`+` is more idiomatic for "one or more" in a regex than `{1,}`; the
latter was introduced in a previous fix for an incorrect `{1,2}`
restriction.
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
After restarting each OSD, restart_osd_daemon.sh checks that the
cluster is in a good state before moving on to the next one. One of
the checks it does is that the number of pgs in the state
"active+clean" is equal to the total number of pgs in the cluster.
On large clusters (e.g. we have 173,696 pgs), it is likely that at
least one pg will be scrubbing and/or deep-scrubbing at any one
time. These pgs are in state "active+clean+scrubbing" or
"active+clean+scrubbing+deep", so the script was erroneously not
including them in the "good" count. Similar concerns apply to
"active+clean+snaptrim" and "active+clean+snaptrim_wait".
Fix this by considering as good any pg whose state contains
active+clean. Do this as an integer comparison to num_pgs in pgmap.
(could this be backported to at least stable-3.0 please?)
Closes: #2008
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
Previously RETRIES was set (by default to 40) once at the start of the
script; this meant that it would only ever wait for up to 40 lots of
30s across *all* the OSDs on a host before bombing out. In fact, we
want to be prepared to wait for the same amount of time after each OSD
restart for the clusters' pgs to be happy again before continuing.
Closes: #3154
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
BlueStore's cache is sized conservatively by default, so that it does
not overwhelm under-provisioned servers. The default is 1G for HDD, and
3G for SSD.
To replace the page cache, as much memory as possible should be given to
BlueStore. This is required for good performance. Since ceph-ansible
knows how much memory a host has, it can set
`bluestore cache size = max(total host memory / num OSDs on this host * safety
factor, 1G)`
Due to fragmentation and other memory use not included in bluestore's
cache, a safety factor of 0.5 for dedicated nodes and 0.2 for
hyperconverged nodes is recommended.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1595003
Signed-off-by: Neha Ojha <nojha@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
let's add ansible_hostname as a default value for rgw_hostname if no
hostname in servicemap matches ansible_fqdn.
Fixes: #3063
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622505
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>