Jobwatch

(C) 2022, Michael Höß, hoess@gmx.net, MIT-Licensed

What is jobwatch

If you just want to run a simple shell script via cron every couple of hours, a few lines of code are often all you need.

But if you want to ensure that the script

  • really ran
  • ran in the required interval
  • completed without error / with the expected exit code
  • produced the expected output
  • never runs as more than one instance in parallel

and you want all of this monitored by your CheckMK, you have to add a lot of plumbing code.

jobwatch helps to mitigate this problem. It wraps the execution of your script and does the rest for you, producing output that can be fed to CheckMK through the agent plugin mechanism.

jobwatch

  • checks the exit code of your script and maps it to OK, WARN, CRIT or UNKN (0-3)
  • searches the console output of your script via regex to classify certain keywords as OK, WARN or CRIT (0, 1, 2)
  • sends these regex matches to logwatch
  • prevents your job from running in multiple instances (e.g. when the previous run took longer than expected)
  • lets you define the required run intervals
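
The regex classification can be pictured with a small sketch. It is not jobwatch's actual code (jobwatch is a Go program); it only illustrates the idea in Python. The rules are the `log_matches` entries of the sample job file, and two things are assumptions: that rules are tried top to bottom with the first match winning, and that `%v` in `alt_msg` stands for the matched line.

```python
import re

# (regex, state, alt_msg) taken from the log_matches of the sample job file.
RULES = [
    (r".*192.168.*.*", 1, None),
    (r".*tty[0-9].+-.*", 2, "%v -> a user is logged in at console"),
    (r".+", 0, None),
]


def classify_line(line, rules=RULES):
    """Return (state, message) for one output line, or None if no rule matches.

    Assumption: first matching rule wins.
    Assumption: %v in alt_msg is replaced by the matched line.
    """
    for pattern, state, alt_msg in rules:
        if re.search(pattern, line):
            message = alt_msg.replace("%v", line) if alt_msg else line
            return state, message
    return None
```

With these rules, a line containing `192.168` is classified as WARN, a `tty` console login as CRIT with the alternative message, and any other non-empty line as OK.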

Simply add a .job.yml file to /etc/jobwatch.d in which you define all of this, call your script via jobwatch -j job in the crontab, and you are done.

Sample-Job /etc/jobwatch.d/sample.job.yml:

cmd: /usr/bin/w
args:
  - -i
exitcode_map:
  - from: 23    # Map exitcode 23 to 3=UNKN
    to: 3
  - from: -1    # Map all nonzero-exit codes to 1=WARN
    to: 1
  - from: 0     # Map exitcode 0 also to 1=WARN
    to: 1    
log_matches:
  - regex: .*192.168.*.*
    state: 1
  - regex: ".*tty[0-9].+-.*" # 2=CRIT when somebody is logged in on the console
    state: 2
    alt_msg: "%v -> a user is logged in at console"
  - regex: .+               # Include all other lines as OK
    state: 0
hide_output: False          # Don't hide the output, pass it through
last_run_warn:              # How often this job is expected to run
  val: 8
  unit: "h"
last_run_crit: 
  val: 16
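
To make the `exitcode_map` of this sample easier to follow, here is a minimal Python sketch of how such a map could be evaluated. It is an illustration, not jobwatch's implementation. The first-match-wins order and the fallback for codes that no rule covers are assumptions; only the `from: -1` wildcard for "all nonzero exit codes" comes from the comments in the sample.

```python
# (from, to) pairs in the order they appear in the sample job file.
EXITCODE_MAP = [(23, 3), (-1, 1), (0, 1)]


def map_exit_code(code, rules=EXITCODE_MAP):
    """Map a process exit code to a state (0=OK, 1=WARN, 2=CRIT, 3=UNKN).

    Assumption: rules are tried top to bottom, first match wins.
    Assumption: without a matching rule, 0 stays OK and anything else is CRIT.
    """
    for source, target in rules:
        # from: -1 is the wildcard for every nonzero exit code
        if source == code or (source == -1 and code != 0):
            return target
    return 0 if code == 0 else 2
```

Under these assumptions the sample reports UNKN for exit code 23 and WARN for every other exit code, including 0.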

Note: job names may only consist of the characters a-z, 0-9 and -
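
If you generate job names from a script, a check like the following can catch invalid names early. This is a sketch of the rule stated above, written in Python; the function name is hypothetical, and whether jobwatch additionally limits the length of a name is not covered here.

```python
import re

# Only lowercase letters, digits and the hyphen are allowed.
NAME_RE = re.compile(r"[a-z0-9-]+")


def is_valid_name(name):
    """True if name consists solely of a-z, 0-9 and - (and is not empty)."""
    return NAME_RE.fullmatch(name) is not None
```

For example, `bu-borg` passes, while `B01_main` is rejected because of the uppercase letter and the underscore.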

The config dir depends on the user jobwatch is run as:

  • root: /etc/jobwatch.d
  • any other user: $HOME/etc/jobwatch.d
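
The rule above, combined with the .job.yml naming of the sample, can be sketched as follows. This is an illustration in Python with hypothetical function names. It assumes that "root" means user id 0 and ignores any explicit override of the job dir.

```python
import posixpath


def job_dir(uid, home):
    """Config dir: /etc/jobwatch.d for root (uid 0), else $HOME/etc/jobwatch.d."""
    if uid == 0:
        return "/etc/jobwatch.d"
    return posixpath.join(home, "etc", "jobwatch.d")


def job_file(job, uid, home):
    """Path of the job file for a given job name."""
    return posixpath.join(job_dir(uid, home), job + ".job.yml")
```

For a real lookup you would pass in `os.getuid()` and the value of `$HOME`.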

Deploying jobwatch

  • Compile the Go program, if required, in the source dir: go get .; go build .
  • cp jobwatch /usr/local/bin
  • ln -s /usr/local/bin/jobwatch /usr/lib/check_mk_agent/plugins

On the CheckMK server, just deploy the provided .mkp; no further configuration is currently needed there.

Invoking Jobwatch

jobwatch -h
Usage of jobwatch:
  -d    Debug
  -i string
        JobId Instance. Default ''
  -j string
        JobId. reads $jobDir/$job.job.yml Default ''
  -jd string
        JobDir. Default: $HOME/etc/jobwatch.d /etc/jobwatch.d'
  -- After this parameter every further parameter is passed to the
     called program

Without any parameters, CheckMK agent output is generated.

Example 1 from a crontab:

10 9 * * 0       root   /usr/local/bin/jobwatch -j bu-dupl

Example 2 from a crontab. Here the same job is used for different instances (in this example, one instance per backup set):

10 5 * * 0,3 root       /usr/local/bin/jobwatch -j bu-borg -i b01-main -- /opt/borg/B01_main.borgjob
10 5 * * 1,4 root       /usr/local/bin/jobwatch -j bu-borg -i b10-dvp -- /opt/borg/B10_dvp.borgjob

Instances/Multiuser

See "Invoking Jobwatch" above.

Note: instance names have the same restrictions as job names.

When using instances, the final job name in the output is [userid]/[job-name]_[instance-name]; otherwise it is only [userid]/[job-name].
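
The naming scheme can be written down as a tiny Python sketch. The function name is hypothetical, and whether [userid] is the login name or the numeric id is not stated above, so the sketch simply takes it as a string.

```python
def output_job_name(userid, job, instance=""):
    """Build [userid]/[job-name], with _[instance-name] appended if given."""
    name = f"{userid}/{job}"
    if instance:
        name += f"_{instance}"
    return name
```

Assuming the login name is used, the second crontab example would then show up as `root/bu-borg_b01-main` and `root/bu-borg_b10-dvp`.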

TBD:

  • redirect script-output to a logfile via Config
  • job-timeout
  • more metrics (e.g. last successful run)