Jobwatch
(C) 2022, Michael Höß, hoess@gmx.net, MIT-Licensed
What is jobwatch
If you just want to execute a simple shell script via cron every couple of hours, often only a few lines of code are required.
But if you want to ensure that the script
- really ran
- ran in the required interval
- completed without error / with the expected exit code
- produced the expected output
- never runs in more than one instance at a time
and want all of this to be monitored by your CheckMK, you have to add a lot of plumbing code.
jobwatch helps to mitigate this problem. It wraps the execution of your script and does the rest for you, producing output that can be fed to CheckMK via the agent plugin mechanism.
jobwatch
- checks for required exit-codes, and maps them to OK, WARN, CRIT, UNKN (0-3)
- searches through the console-output of your script via regex to classify certain keywords as OK, WARN or CRIT (0, 1, 2)
- sends the lines matched by these regexes to logwatch
- prevents your job from running in multiple instances (e.g. when the previous run took longer than expected)
- lets you define the required run intervals
Simply add a .job.yml file to /etc/jobwatch.d in which you define all of this, call your script via jobwatch -j job in the crontab, and you are done.
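The regex classification of console output can be pictured with a minimal Go sketch. It is not taken from the jobwatch source: the type LogMatch, the helper newRule and the function classifyLine are hypothetical names, and it assumes that rules are tried in order, that the first match wins, and that "%v" in an alternative message stands for the matched line.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// LogMatch is a hypothetical type mirroring one log_matches entry.
type LogMatch struct {
	Regex  *regexp.Regexp
	State  int    // 0=OK, 1=WARN, 2=CRIT
	AltMsg string // optional; "%v" is assumed to stand for the matched line
}

// newRule compiles pattern into a LogMatch; it panics on an invalid regex.
func newRule(pattern string, state int, altMsg string) LogMatch {
	return LogMatch{Regex: regexp.MustCompile(pattern), State: state, AltMsg: altMsg}
}

// classifyLine returns the state and message for one output line.
// Assumption: rules are tried in order and the first match wins.
// The bool is false when no rule matches.
func classifyLine(line string, rules []LogMatch) (int, string, bool) {
	for _, r := range rules {
		if !r.Regex.MatchString(line) {
			continue
		}
		msg := line
		if r.AltMsg != "" {
			msg = strings.ReplaceAll(r.AltMsg, "%v", line)
		}
		return r.State, msg, true
	}
	return 0, "", false
}

func main() {
	rules := []LogMatch{
		newRule(`ERROR`, 2, "%v -> needs attention"),
		newRule(`WARNING`, 1, ""),
		newRule(`.+`, 0, ""), // catch-all: every other non-empty line is OK
	}
	for _, l := range []string{"backup started", "WARNING: slow disk", "ERROR: disk full"} {
		state, msg, _ := classifyLine(l, rules)
		fmt.Printf("%d %s\n", state, msg)
	}
}
```

With first-match semantics the order of the rules matters, so a catch-all rule such as `.+` belongs at the end of the list.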
Sample-Job /etc/jobwatch.d/sample.job.yml:
cmd: /usr/bin/w
args:
  - -i
exitcode_map:
  - from: 23    # Map exit code 23 to 3=UNKN
    to: 3
  - from: -1    # Map all nonzero exit codes to 1=WARN
    to: 1
  - from: 0     # Map exit code 0 also to 1=WARN
    to: 1
log_matches:
  - regex: .*192.168.*.*
    state: 1
  - regex: ".*tty[0-9].+-.*"    # 2=CRIT when somebody is logged in on the console
    state: 2
    alt_msg: "%v -> a user is logged in at console"
  - regex: .+                   # Include all other lines as OK
    state: 0
hide_output: False    # Don't hide output, but pass it through
last_run_warn:        # How often this job is expected to be executed
  val: 8
  unit: "h"
last_run_crit:
  val: 16
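To make the exitcode_map entries of the sample job concrete, here is a minimal Go sketch of how such a map could be evaluated. It is an illustration, not jobwatch's actual code: the names ExitRule and mapExitCode are hypothetical, and it assumes that rules are checked in order, that the first match wins, that -1 matches any nonzero exit code, and that an unmapped code falls back to 0 for exit code 0 and to 2 (CRIT) otherwise.

```go
package main

import "fmt"

// ExitRule is a hypothetical type mirroring one exitcode_map entry.
type ExitRule struct {
	From int // exit code to match; -1 is assumed to match any nonzero code
	To   int // resulting state: 0=OK, 1=WARN, 2=CRIT, 3=UNKN
}

// mapExitCode returns the state for an exit code.
// Assumptions: rules are checked in order, the first match wins, and an
// unmapped code falls back to 0 for exit code 0 and to 2 (CRIT) otherwise.
func mapExitCode(code int, rules []ExitRule) int {
	for _, r := range rules {
		if r.From == code || (r.From == -1 && code != 0) {
			return r.To
		}
	}
	if code == 0 {
		return 0
	}
	return 2
}

func main() {
	// The three rules from the sample job.
	rules := []ExitRule{{From: 23, To: 3}, {From: -1, To: 1}, {From: 0, To: 1}}
	for _, code := range []int{0, 1, 23} {
		fmt.Printf("exit %d -> state %d\n", code, mapExitCode(code, rules))
	}
}
```

Under these assumptions the specific rule for 23 has to come before the -1 wildcard, as it does in the sample job.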
Note: job names may only consist of the characters a-z, 0-9 and -
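The naming rule can be expressed as a single regular expression. The following Go sketch is only an illustration; the names validName and isValidName are hypothetical, and requiring at least one character is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// validName matches names made up only of a-z, 0-9 and '-'.
// Requiring at least one character is an assumption.
var validName = regexp.MustCompile(`^[a-z0-9-]+$`)

// isValidName reports whether name satisfies the naming rule.
func isValidName(name string) bool {
	return validName.MatchString(name)
}

func main() {
	for _, n := range []string{"bu-borg", "b01-main", "Backup_1"} {
		fmt.Printf("%-10s %v\n", n, isValidName(n))
	}
}
```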
The config dir depends on the user jobwatch is run as:
- root: /etc/jobwatch.d
- user: $HOME/etc/jobwatch.d
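The directory lookup, combined with the $jobDir/$job.job.yml naming from the usage text, could look like the following Go sketch. The function names jobDir and jobFile are hypothetical, and detecting root via uid 0 is an assumption about how the check is done.

```go
package main

import (
	"fmt"
	"os"
	"path"
)

// jobDir returns the default job directory for the given uid and home dir.
// Assumption: "root" means uid 0.
func jobDir(uid int, home string) string {
	if uid == 0 {
		return "/etc/jobwatch.d"
	}
	return path.Join(home, "etc", "jobwatch.d")
}

// jobFile builds the path of a job file: $jobDir/$job.job.yml
func jobFile(dir, job string) string {
	return path.Join(dir, job+".job.yml")
}

func main() {
	dir := jobDir(os.Getuid(), os.Getenv("HOME"))
	fmt.Println(jobFile(dir, "sample"))
}
```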
Deploying jobwatch
- Compile the Go program, if required, in the src dir:
go get .; go build .
- Install the binary and link it into the agent plugin directory:
cp jobwatch /usr/local/bin
ln -s /usr/local/bin/jobwatch /usr/lib/check_agent/plugins
On the CheckMK server, just deploy the provided .mkp; no further configuration is currently needed there.
Invoking Jobwatch
jobwatch -h
Usage of jobwatch:
  -d    Debug
  -i string
        JobId Instance. Default ''
  -j string
        JobId. reads $jobDir/$job.job.yml Default ''
  -jd string
        JobDir. Default: $HOME/etc/jobwatch.d /etc/jobwatch.d'
  --    After this parameter every further parameter is passed to the
        called program
Without any parameters CheckMK-Agent-Output is generated
Example 1 from a crontab:
10 9 * * 0 root /usr/local/bin/jobwatch -j bu-dupl
Example 2 from a crontab. Here the same job is used for different instances (in this example, one instance per backup set):
10 5 * * 0,3 root /usr/local/bin/jobwatch -j bu-borg -i b01-main -- /opt/borg/B01_main.borgjob
10 5 * * 1,4 root /usr/local/bin/jobwatch -j bu-borg -i b10-dvp -- /opt/borg/B10_dvp.borgjob
Instances/Multiuser
See "Invoking Jobwatch" above.
Note: instance names have the same restrictions as job names.
When using instances, the final job name in the output is [userid]/[job-name]_[instance-name], otherwise only [userid]/[job-name].
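The naming scheme can be written down as a small Go function. This is a sketch only; the function name outputName is hypothetical, and whether [userid] is a user name or a numeric id is not stated here, so the example simply passes a string.

```go
package main

import "fmt"

// outputName builds [userid]/[job-name]_[instance-name], or
// [userid]/[job-name] when no instance is given.
func outputName(user, job, instance string) string {
	if instance == "" {
		return user + "/" + job
	}
	return user + "/" + job + "_" + instance
}

func main() {
	fmt.Println(outputName("root", "bu-dupl", ""))         // root/bu-dupl
	fmt.Println(outputName("root", "bu-borg", "b01-main")) // root/bu-borg_b01-main
}
```

Since neither job names nor instance names may contain an underscore, the `_` separator keeps the two parts unambiguous.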
TBD:
- redirect script output to a logfile via config
- job timeout
- more metrics (last successful run)