# Jobwatch

> (C) 2022, Michael Höß, hoess@gmx.net, MIT-Licensed

## What is jobwatch

If you just want to run a simple shell script via cron every couple of hours,
often only a few lines of code are required.

But if you want to ensure that
- the script really ran
- in the required interval
- completed without error / with the expected exit code
- produced the expected output
- only one instance executes at a time

and want all of this to be monitored by your CheckMK, you have to add a lot of plumbing code.

jobwatch helps to mitigate this problem. It wraps the execution of your script and does
all the rest for you by providing output that can be fed to CheckMK via the
agent plugin mechanism.

jobwatch
- checks for required _exit codes_ and maps them to _OK, WARN, CRIT, UNKN (0-3)_
- _searches_ through the _console output_ of your script via _regex_ to classify certain
  keywords as OK, WARN or CRIT (0, 1, 2)
- sends these regex matches to _logwatch_
- prevents your job from running in multiple instances (e.g. when the previous run took longer than expected)
- lets you define the required run intervals

Simply add a `.job` file to `/etc/jobwatch.d` in which you define all of this,
call your script via `jobwatch -j job` in the crontab, and you are done.

## Sample-Job `/etc/jobwatch.d/sample.job.yml`:
```
cmd: /usr/bin/w
args:
  - -i
exitcode_map:
  - from: 23        # Map exitcode 23 to 3=UNKN
    to: 3
  - from: -1        # Map all nonzero exit codes to 1=WARN
    to: 1
  - from: 0         # Map exitcode 0 also to 1=WARN
    to: 1
log_matches:
  - regex: .*192.168.*.*
    state: 1
  - regex: ".*tty[0-9].+-.*"  # 2=CRIT when somebody is logged in on the console
    state: 2
    alt_msg: "%v -> a user is logged in at console"
  - regex: .+       # Include all other lines as OK
    state: 0
hide_output: False  # Don't hide output, but pass it through
last_run_warn:      # How often this job is expected to be executed
  val: 8
  unit: "h"
last_run_crit:
  val: 16
```
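
The `exitcode_map` semantics can be sketched in Go as follows. This is an illustration under stated assumptions, not the actual jobwatch code: the `exitRule` type and `mapExitCode` function are hypothetical, and the sketch assumes that an exact `from` match takes precedence over the `-1` wildcard. What jobwatch does with a code that matches no rule is not documented here, so the sketch only reports that no rule applied.

```go
package main

import "fmt"

// exitRule mirrors one exitcode_map entry; the field names are assumed.
type exitRule struct {
	from int // exit code to match; -1 stands for "any nonzero exit code"
	to   int // resulting state: 0=OK, 1=WARN, 2=CRIT, 3=UNKN
}

// mapExitCode looks for an exact match first and then, for nonzero codes,
// falls back to the -1 wildcard. ok is false when no rule applies.
func mapExitCode(rules []exitRule, code int) (state int, ok bool) {
	for _, r := range rules {
		if r.from == code {
			return r.to, true
		}
	}
	if code != 0 {
		for _, r := range rules {
			if r.from == -1 {
				return r.to, true
			}
		}
	}
	return 0, false
}

func main() {
	// The three rules from the sample job above.
	rules := []exitRule{{23, 3}, {-1, 1}, {0, 1}}
	for _, code := range []int{0, 23, 42} {
		state, _ := mapExitCode(rules, code)
		fmt.Printf("exit %d -> state %d\n", code, state)
	}
}
```

Under these assumptions, the sample job reports UNKN for exit code 23 and WARN for every other exit code, including 0.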

> Note: job names may only consist of the characters `a-z`, `0-9` and `-`
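
The naming rule can be checked with a single regular expression. The following Go sketch shows one way to do it; `isValidName` is a hypothetical helper, and rejecting the empty string is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// validName accepts one or more characters from a-z, 0-9 and '-'.
var validName = regexp.MustCompile(`^[a-z0-9-]+$`)

// isValidName reports whether name satisfies the restriction from the note.
func isValidName(name string) bool {
	return validName.MatchString(name)
}

func main() {
	for _, n := range []string{"bu-borg", "b01-main", "B01_main", ""} {
		fmt.Printf("%q valid: %v\n", n, isValidName(n))
	}
}
```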

The config dir depends on the user jobwatch runs as:
- root: `/etc/jobwatch.d`
- any other user: `$HOME/etc/jobwatch.d`
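
As an illustration of this rule, a minimal Go sketch of the directory selection could look like this. `defaultJobDir` is a hypothetical function, and using the effective user id to detect root is an assumption; the real implementation may decide differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// defaultJobDir returns the job directory for the given user id and home
// directory: /etc/jobwatch.d for root, $HOME/etc/jobwatch.d otherwise.
func defaultJobDir(uid int, home string) string {
	if uid == 0 {
		return "/etc/jobwatch.d"
	}
	return filepath.Join(home, "etc", "jobwatch.d")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine home directory:", err)
		os.Exit(1)
	}
	fmt.Println(defaultJobDir(os.Geteuid(), home))
}
```

Passing the user id and home directory as parameters keeps the rule itself testable without depending on the account that runs the program.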

## Deploying jobwatch
- Compile the Go program, if required, in the src dir: `go get .; go build .`
- `cp jobwatch /usr/local/bin`
- `ln -s /usr/local/bin/jobwatch /usr/lib/check_mk_agent/plugins`

On the CheckMK server, just deploy the provided .mkp; no further
config is currently needed there.

## Invoking Jobwatch

```
jobwatch -h
Usage of jobwatch:
  -d    Debug
  -i string
        JobId Instance. Default ''
  -j string
        JobId. reads $jobDir/$job.job.yml Default ''
  -jd string
        JobDir. Default: $HOME/etc/jobwatch.d /etc/jobwatch.d'
  --    After this parameter every further parameter is passed to the
        called program

Without any parameters CheckMK-Agent-Output is generated
```

Example 1, from a crontab:
```
10 9 * * 0 root /usr/local/bin/jobwatch -j bu-dupl
```

Example 2, from a crontab. Here the same job is used for
different instances (in this example, separate backup sets):
```
10 5 * * 0,3 root /usr/local/bin/jobwatch -j bu-borg -i b01-main -- /opt/borg/B01_main.borgjob
10 5 * * 1,4 root /usr/local/bin/jobwatch -j bu-borg -i b10-dvp -- /opt/borg/B10_dvp.borgjob
```

### Instances/Multiuser

See "Invoking Jobwatch" above.

> Note: instance names have the same restrictions as job names

When using instances, the final job name in the output is `[userid]/[job-name]_[instance-name]`; otherwise it is only `[userid]/[job-name]`.
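
The naming scheme can be expressed as a small Go sketch. `serviceName` is a hypothetical helper; the README does not say whether `[userid]` is a login name or a numeric id, so the sketch treats it as an opaque string.

```go
package main

import "fmt"

// serviceName builds the job name shown in the output:
// [userid]/[job-name]_[instance-name] when an instance is given,
// otherwise [userid]/[job-name].
func serviceName(user, job, instance string) string {
	if instance == "" {
		return user + "/" + job
	}
	return user + "/" + job + "_" + instance
}

func main() {
	fmt.Println(serviceName("root", "bu-dupl", ""))
	fmt.Println(serviceName("root", "bu-borg", "b01-main"))
}
```

For the crontab examples above, this yields `root/bu-dupl` and `root/bu-borg_b01-main`.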

## TBD:
- redirect script output to a logfile via config
- job timeout
- more metrics (last successful run)