monit: investigating tests again - using copilot on this one (#11255)
* add monit version to successful exit
* install the standard monit - if 5.34, then bail out
* add 3sec wait after service restart
- that restart happens exactly before the task receiving the SIGTERM, so maybe, just maybe, it just needs time to get ready for the party
* wait for monit initialisation after restart
* monit tests: check service-specific status in readiness wait
The wait task was checking 'monit status' (general), but the actual
failing command is 'monit status -B httpd_echo' (service-specific).
This causes a race where general status succeeds but service queries
fail. Update to check the exact command format that will be used.
* monit tests: remove 5.34.x version restriction
The version restriction was based on incorrect diagnosis. The actual
issue was the readiness check validating general status instead of
service-specific queries. Now that we check the correct command
format, the tests should work across all monit versions.
* monit tests: add stabilization delay after readiness check
After the readiness check succeeds, add a 1-second pause before
running actual tests. Monit 5.34.x and 5.35 appear to have a
concurrency issue where rapid successive 'monit status -B' calls
can cause hangs even though the first call succeeds.
* monit tests: add retry logic for state changes to handle monit daemon hangs
Monit daemon has an intermittent concurrency bug across versions 5.27-5.35
where 'monit status -B' commands can hang (receiving SIGTERM) even after
the daemon has successfully responded to previous queries. This appears
to be a monit daemon issue, not a timing problem.
Add retry logic with 2-second delays to the state change task to work
around these intermittent hangs. Skip retries if the failure is not
SIGTERM (rc=-15) to avoid masking real errors.
* monit tests: capture and display monit.log for debugging
Add tasks in the always block to capture and display the monit log file.
This will help diagnose the intermittent hanging issues by showing what
monit daemon was doing when 'monit status -B' commands hang.
* monit tests: enable verbose logging (-v flag)
Modify the monit systemd service to start with -v flag for verbose
logging. This should provide more detailed information in the monit
log about what's happening when status commands hang.
* monit: add 0.5s delay after state change command
After extensive testing and analysis with verbose logging enabled, identified
that monit's HTTP interface can become temporarily unresponsive immediately
after processing state change commands (stop, start, restart, etc.).
This manifests as intermittent SIGTERM (rc=-15) failures when the module
calls 'monit status -B <service>' to verify the state change. The issue
affects all monit versions tested (5.27-5.35) and is intermittent, suggesting
a race condition or brief lock in monit's HTTP request handling.
Verbose logging confirmed:
- State change commands complete successfully
- HTTP server reports as 'started'
- But subsequent status checks can hang without any log entry
Adding a 0.5 second sleep after sending state change commands gives the
monit daemon time to fully process the command and become responsive again
before the first status verification check.
This complements the existing readiness check after daemon restart and
the retry logic for SIGTERM failures in the tests.
* tests(monit): remove workarounds after module race condition fix
After 10+ successful CI runs with no SIGTERM failures, removing test-level
workarounds that are now redundant due to the 0.5s delay fix in the module:
- Remove 1-second stabilization pause after daemon restart
The module's built-in 0.5s delay after state changes makes this unnecessary
- Remove retry logic for SIGTERM failures in state change tests
The race condition is now prevented at the module level
- Remove verbose logging setup and log capture
Verbose mode didn't log HTTP requests, so it didn't help diagnose the issue
and adds unnecessary overhead
Kept the readiness check with retries after daemon restart - still needed
to validate daemon is responsive after service restart (different scenario
than the state change race condition).
* restore tasks/main.yml
* monit tests: reduce readiness check retries from 60 to 10
After successful CI runs, observed that monit daemon becomes responsive
within 1-2 seconds after restart. The readiness check typically passes
on the first attempt.
Reducing from 60 retries (30s timeout) to 10 retries (5s timeout) is
more appropriate and allows tests to fail faster if something is
genuinely broken.
* add changelog frag
* Update changelogs/fragments/11255-monit-integrationtests.yml
---------
(cherry picked from commit
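The two waiting strategies described in the commit message above (a readiness poll after daemon restart, and a retry that fires only on SIGTERM) can be sketched roughly as follows. This is a minimal illustration, not the code from the module or the tests: the function names, the injectable `sleep` parameter, and the default of three SIGTERM retries are assumptions. Only the 10 retries at 0.5 s, the 2-second delay, and the `rc=-15` value come from the message.

```python
import time
from typing import Callable

# Return code the commit message reports when a status call is killed by SIGTERM.
SIGTERM_RC = -15


def wait_until_ready(check: Callable[[], int], retries: int = 10, delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `check` until it returns 0 or the attempts run out.

    Defaults mirror the "10 retries (5s timeout)" figure from the commit message.
    """
    for attempt in range(retries):
        if check() == 0:
            return True
        # No point sleeping after the final failed attempt.
        if attempt < retries - 1:
            sleep(delay)
    return False


def run_with_sigterm_retry(command: Callable[[], int], retries: int = 3, delay: float = 2.0,
                           sleep: Callable[[float], None] = time.sleep) -> int:
    """Re-run `command` only while it fails with SIGTERM.

    Any other return code is handed back immediately so real errors are not masked.
    The retry count of 3 is an assumption; the message only states the 2-second delay.
    """
    rc = command()
    for _ in range(retries):
        if rc != SIGTERM_RC:
            break
        sleep(delay)
        rc = command()
    return rc
```

In practice `check` would wrap the service-specific query rather than the general one, for example `lambda: subprocess.run(["monit", "status", "-B", "httpd_echo"]).returncode`, since the message identifies that exact command form as the one that was failing.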
Community General Collection
This repository contains the community.general Ansible Collection. The collection is part of the Ansible package and includes many modules and plugins supported by the Ansible community that are not part of more specialized community collections.
You can find documentation for this collection on the Ansible docs site.
Please note that this collection does not support Windows targets. Only connection plugins included in this collection might support Windows targets, and will explicitly mention that in their documentation if they do so.
Code of Conduct
We follow the Ansible Code of Conduct in all our interactions within this project.
If you encounter abusive behavior violating the Ansible Code of Conduct, please refer to the policy violations section of the Code of Conduct for information on how to raise a complaint.
Communication
- Join the Ansible forum:
  - Get Help: get help or help others. This is for questions about modules or plugins in the collection. Please add appropriate tags if you start new discussions.
  - Tag community-general: discuss the collection itself, instead of specific modules or plugins.
  - Social Spaces: gather and interact with fellow enthusiasts.
  - News & Announcements: track project-wide announcements including social events.
- The Ansible Bullhorn newsletter: used to announce releases and important changes.
For more information about communication, see the Ansible communication guide.
Tested with Ansible
Tested with the current ansible-core 2.17, ansible-core 2.18, ansible-core 2.19, ansible-core 2.20 releases and the current development version of ansible-core. Ansible-core versions before 2.17.0 are not supported. This includes all ansible-base 2.10 and Ansible 2.9 releases.
External requirements
Some modules and plugins require external libraries. Please check the requirements for each plugin or module you use in the documentation to find out which requirements are needed.
Included content
Please check the included content on the Ansible Galaxy page for this collection or the documentation on the Ansible docs site.
Using this collection
This collection is shipped with the Ansible package, so if you have it installed, no further action is required.
If you have a minimal installation (only Ansible Core installed) or you want to use the latest version of the collection along with the whole Ansible package, you need to install the collection from Ansible Galaxy manually with the ansible-galaxy command-line tool:
ansible-galaxy collection install community.general
You can also include it in a requirements.yml file and install it via ansible-galaxy collection install -r requirements.yml using the format:
collections:
- name: community.general
Note that if you install the collection manually, it will not be upgraded automatically when you upgrade the Ansible package. To upgrade the collection to the latest available version, run the following command:
ansible-galaxy collection install community.general --upgrade
You can also install a specific version of the collection, for example, if you need to downgrade when something is broken in the latest version (please report an issue in this repository). Use the following syntax where X.Y.Z can be any available version:
ansible-galaxy collection install community.general:==X.Y.Z
See Ansible Using collections for more details.
Contributing to this collection
The content of this collection is made by good people just like you, a community of individuals collaborating on making the world better through developing automation software.
We are actively accepting new contributors.
All types of contributions are very welcome.
You don't know how to start? Refer to our contribution guide!
The current maintainers are listed in the commit-rights.md file. If you have questions or need help, feel free to mention them in the proposals.
You can find more information in the developer guide for collections, and in the Ansible Community Guide.
For notes specific to this collection, also see our CONTRIBUTING documentation.
Running tests
See here.
Collection maintenance
To learn how to maintain / become a maintainer of this collection, refer to:
It is necessary for maintainers of this collection to be subscribed to:
- The collection itself (the Watch button → All Activity in the upper right corner of the repository's homepage).
- The "Changes Impacting Collection Contributors and Maintainers" issue.
They should also be subscribed to The Bullhorn, Ansible's newsletter.
Publishing New Version
See the Releasing guidelines to learn how to release this collection.
Release notes
See the changelog.
Roadmap
In general, we plan to release a major version every six months, and minor versions every two months. Major versions can contain breaking changes, while minor versions only contain new features and bugfixes.
See this issue for information on releasing, versioning, and deprecation.
More information
- Ansible Collection overview
- Ansible User guide
- Ansible Developer guide
- Ansible Community code of conduct
Licensing
This collection is primarily licensed and distributed as a whole under the GNU General Public License v3.0 or later.
See LICENSES/GPL-3.0-or-later.txt for the full text.
Parts of the collection are licensed under the BSD 2-Clause license and the MIT license.
All files have a machine-readable SPDX-License-Identifier: comment denoting their respective license(s) or an equivalent entry in an accompanying .license file. Only changelog fragments (which will not be part of a release) are covered by a blanket statement in REUSE.toml. This conforms to the REUSE specification.