The [group|host]_vars directories are ignored for the dashboard playbook
when the inventory file directory doesn't contain those directories.
Closes: #4601
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1761612
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Otherwise it fails like following:
```
TASK [ceph-mds : allow multimds] **************************************************************************************************************************************************
Monday 22 July 2019 16:37:38 +0800 (0:00:03.269) 0:13:25.651 ***********
fatal: [rhel7u6clone1]: FAILED! => {"msg": "The conditional check 'ceph_release_num[ceph_release] == ceph_release_num.luminous' failed. The error was: error while evaluating conditional (ceph_release_num[ceph_release] == ceph_release_num.luminous): 'dict object' has no attribute u'dummy'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 43, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: allow multimds\n ^ here\n"}
```
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The commit replaces the pv/vg/lv commands used with the ansible command
module by the lvg and lvol modules.
This also fixes the size of the second data LV because we were only using
50% of the remaining space instead of 100%.
With a 50G device, the result was:
- data-lv1 was 25G
- data-lv2 was 12.5G
Instead of:
- data-lv1 was 25G
- data-lv2 was 25G
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using python3 the name of the rtslib rpm is python3-rtslib. The
packages that use rtslib already have code that detects the python
version and distro deps, so drop it from the ceph iscsi gw task list and
let the ceph-iscsi rpm dependency handle it.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1760930
Signed-off-by: Mike Christie <mchristi@redhat.com>
this task is a leftover and no longer needed.
It even causes bug when collocating nfs with mon.
Closes: #4609
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Due the 'failed_when: false' statement present in the peer task then
the playbook continues to ran even if the peer task was failing (like
incorrect remote peer format.
"stderr": "rbd: invalid spec 'admin@cluster1'"
This patch adds a task to list the peer present and add the peer only if
it's not already added. With this we don't need the failed_when statement
anymore.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When the iscsi gateway or the ceph configuration file change then we
need to notify the rbd target api/gw services to be restarted.
This patch also merges the rbd-target-api and rbd-target-gw handler
into a single file and listen.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The common roles don't need to be executed again on each group plays
(like mons, osds, etc..).
We only need to execute them during the first play. That wat, we will
apply the changes on all nodes in parallel instead of doing it once per
group.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The is_atomic and container_binary facts are already defined in the
ceph-facts role so we don't need to have dedicated tasks for that
before the ceph-facts role exectution.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
There is no need to loop over all mgr nodes to set this fact, it's even
breaking deployments because it tries to copy all mgr keyring on all
mgr.
Closes: #4602
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The current ceph-validate role is using both validate action and fail
module tasks to validate the ceph configuration.
The validate action is based on the notario python library. When one of
the notario validation fails then a python stack trace is reported to the
ansible task. This output isn't understandable by users.
This patch removes the validate action and the notario depencendy. The
validation is now done with only fail ansible module.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654790
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
[303] mktemp used in place of tempfile module
[602] Don't compare to empty string
[701] No 'galaxy_info' found
[702] Use 'galaxy_tags' rather than 'categories'
This patch also changes the ansible log_path value via the
ANSIBLE_LOG_PATH environment variable in the travis configuration to
avoid warnings.
[WARNING]: log file at /home/travis/ansible/ansible.log is not writeable
and we cannot create it, aborting
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We are using multiple listen topics with the handlers. That means that
we are notifying 4 tasks for each handler.
Instead we can group the listen on an include_tasks and based on the
group condition.
Before:
NOTIFIED HANDLER ceph-handler : set _mon_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mon restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mon daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mon_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy osd restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph osds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mds restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rgw restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rgw daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mgr restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mgr daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rbd mirror restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rbd mirror daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called after restart for mon0
After:
NOTIFIED HANDLER ceph-handler : mons handler for mon0
NOTIFIED HANDLER ceph-handler : osds handler for mon0
NOTIFIED HANDLER ceph-handler : mdss handler for mon0
NOTIFIED HANDLER ceph-handler : rgws handler for mon0
NOTIFIED HANDLER ceph-handler : mgrs handler for mon0
NOTIFIED HANDLER ceph-handler : rbdmirrors handler for mon0
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This is already done in the main playbooks but absent in the dashboard
playbook.
The facts are already gathered during the first play of the main
playbooks so we don't need to doing twice.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Delegating on remote node isn't necessary here since we are already
iterating over the right nodes.
Closes: #4518
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit removes some legacy tasks.
These tasks aren't needed, they cause the playbook to fail when
collocating daemons.
Closes: #4553
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit adds a validation task to prevent from installing an OSD on
the same disk as the OS.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623580
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
If the mgr dashboard doesn't restart fast enough then the inject
dashboard task will fail with a HTTP error 400.
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 914, in _handle_command
return self.handle_command(inbuf, cmd)
File "/usr/share/ceph/mgr/dashboard/module.py", line 450, in handle_command
push_local_dashboards()
File "/usr/share/ceph/mgr/dashboard/grafana.py", line 132, in push_local_dashboards
retry()
File "/usr/share/ceph/mgr/dashboard/grafana.py", line 89, in call
result = self.func(*self.args, **self.kwargs)
File "/usr/share/ceph/mgr/dashboard/grafana.py", line 127, in push
grafana.push_dashboard(body)
File "/usr/share/ceph/mgr/dashboard/grafana.py", line 54, in push_dashboard
response.raise_for_status()
File "/usr/lib/python2.7/site-packages/requests/models.py", line 834, in raise_for_status
raise HTTPError(http_error_msg, response=self)
HTTPError: 400 Client Error: Bad Request
Instead we can trigger this task before the module restart.
Closes: #4565
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
We don't have one global pipeline anymore reporting the jenkins job
status. So we need to match the CI scenario name.
This also add the Travis CI status as a condition.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When switching from a baremetal deployment to a containerized deployment
we only umount the OSD data partition.
If the OSD is encrypted (dmcrypt: true) then there's an additional
partition (part number 5) used for the lockbox and mount in the
/var/lib/ceph/osd-lockbox/ directory.
Because this partition isn't umount then the containerized OSD aren't
able to start. The partition is still mount by the system and can't be
remount from the container.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit moves this task in order to stop the nfs server service
regardless the deployment type desired (containerized or non
containerized).
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The syntax here wasn't working, this refact fixes this task.
Also, removing the `ignore_errors: true` which was hidding the failure.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This commit refacts the way we set `ceph_uid` fact in `ceph-facts` and
removes all `set_fact` tasks for `ceph_uid` in switch-to-containers playbook
to avoid duplicated code.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
We don't need to have dedicated variables for the RGW integration into
the Ceph Dashboard and need to be manually filled.
Instead we can use the current values from the RGW nodes by using the
IP and port from the first RGW instance of the first RGW node via the
radosgw_address and radosgw_frontend_port variables.
We don't need to specify all RGW nodes, this will be done automatically
with one node.
The RGW api scheme is using the radosgw_frontend_ssl_certificate variable
to determine if the value is http or https. This variable is also reuse
as a condition for the ssl verify task.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This patch moves the https dashboard configuration into a dedicated
block to avoid the multiple occurence of the dashboard_protocol
condition.
It also fixes the dashboard certificate and key variables handling in
the condition introduced by ab54fe2. Those variables aren't boolean but
strings so we can test them via the length filter.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit excludes client nodes from facts gathering, they are not
needed and can speed up this task.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The secondary vagrant variables didn't have the grafana vm variable
set which create an vagrant error.
There was an error loading a Vagrantfile. The file being loaded
and the error message are shown below. This is usually caused by
an invalid or undefined variable.
This patch also changes the ssh-extra-args parameter to ssh-common-args
to get the same values for ssh/sftp/scp. Otherwise we can see warnings
from ansible and some tasks are failing.
[WARNING]: sftp transfer mechanism failed on [mon0]. Use ANSIBLE_DEBUG=1
to see detailed information
It also updates the ssh-common-args value for the rgw-multisite scenario
to reflect the ANSIBLE_SSH_ARGS environment variable value.
Finally changing the IP addresses due to the Vagrant refact done in the
commit 778c51a
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Add missing tag on ceph-handler role call.
Otherwise, we can't use `--tags='ceph_update_config'` for updating the
ceph configuration file.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1754432
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The ceph dashboard tasks didn't use the cluster option if the cluster
name isn't the default value.
Closes: #4529
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
This commit merges the two restart tasks into a single one, this way
it's one task less to notify.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
The block section were used with the dashboard_enabled condition when
the code was included in the main playbooks.
Because this condition isn't present in the dashboard playbook anymore
we can remove the block section.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
When using the ansible --limit option on one or few OSD nodes and if the
handler is triggered then we will restart the OSD service on all OSDs
nodes instead of the hosts limited by the limit value.
Even if the play is limited by the --limit value we are using all OSD
nodes from the OSD group.
with_items: '{{ groups[osd_group_name] }}'
Instead we should iterate only on the nodes present in both OSD group and
limit list.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
e695efc introduced a regression in the _radosgw_address fact when using
the radosgw_address_block variable.
There's no item there because we don't use the items lookup. This is
only used for _monitor_address with monitor_address_block.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1758099
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
because of the current ip address assignation, it's not possible to
deploy more than 9 nodes per daemon type.
This commit refact a bit and allows us to get around this limitation.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There is no need to get n * number of nodes the different keyrings.
Adding a `run_once: true` here avoid running a ceph command too many
times which could be impacting large cluster deployment.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
During the rolling_update scenario, the fsid value is retrieve from the
current ceph cluster configuration via the ceph daemon config command.
This command tries first to resolve the admin socket path via the
ceph-conf command.
Unfortunately this command won't work if you have a duplicate key in the
ceph configuration even if it only produces a warning. As a result the
task will fail.
Can't get admin socket path: unable to get conf option admin_socket for
mon.xxx: warning: line 13: 'osd_memory_target' in section 'osd' redefined
Instead of using ceph daemon we can use the --admin-daemon option
because we already know what the socket admin path value based on the
ceph cluster and mon hostname values.
Closes: #4492
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>