Improve README/Docs, Minor fix

Michael Höß 2022-02-23 12:30:36 +01:00
parent d4c130360b
commit 825701e285
3 changed files with 101 additions and 14 deletions

README.md

@@ -1,14 +1,20 @@
# Jobwatch
> (C) 2022, Michael Höß, hoess@gmx.net, MIT-Licensed
## What is jobwatch
If you just want to execute a simple shell script via cron every couple of hours,
often only a few lines of code are required.
But if you want to ensure that
- the script really ran
- in the required interval
- completed without error/with the expected exit code
- produced the expected output
- only one instance runs at a time

and want all this to be monitored with your CheckMK, you have to add a lot of plumbing code.
jobwatch helps to mitigate this problem. It wraps the execution of your script and does
all the rest for you by providing output which can be fed to CheckMK by the
@@ -25,9 +31,90 @@ jobwatch
Simply add a .job-file into /etc/jobwatch.d where you define all of this,
call your script via jobwatch -j job in the crontab and you are done.
## Sample-Job `/etc/jobwatch.d/sample.job.yml`:
```
cmd: /usr/bin/w
args:
  - -i
exitcode_map:
  - from: 23 # Map
    to: 3
  - from: -1 # Map all nonzero exit codes to 1=WARN
    to: 1
  - from: 0 # Map exit code 0 also to 1=WARN
    to: 1
log_matches:
  - regex: .*192.168.*.*
    state: 1
  - regex: ".*tty[0-9].+-.*" # 2=CRIT when somebody is logged in on the console
    state: 2
    alt_msg: "%v -> a user is logged in at console"
  - regex: .+ # Include all other lines as OK
    state: 0
hide_output: False # Don't hide output, but pass it through
last_run_warn: # How often this job is expected to be executed
  val: 8
  unit: "h"
last_run_crit:
  val: 16
```
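
To illustrate how `exitcode_map` in the sample above could be read, here is a minimal Go sketch. It is not taken from the jobwatch source: the `ExitMap` type, the `mapExitCode` function and the lookup order (exact match first, then `from: -1` as a catch-all for nonzero codes, otherwise pass the code through) are assumptions based only on the comments in the sample file.

```go
package main

import "fmt"

// ExitMap mirrors one exitcode_map entry (hypothetical type, not from jobwatch).
type ExitMap struct {
	From int
	To   int
}

// mapExitCode translates a raw exit code into a monitoring state.
// Assumed semantics: an exact "from" match wins; otherwise a "from: -1"
// entry catches every nonzero code; unmatched codes are returned unchanged.
func mapExitCode(code int, rules []ExitMap) int {
	for _, r := range rules {
		if r.From == code {
			return r.To
		}
	}
	if code != 0 {
		for _, r := range rules {
			if r.From == -1 {
				return r.To
			}
		}
	}
	return code
}

func main() {
	// Same rules as in the sample job file.
	rules := []ExitMap{{From: 23, To: 3}, {From: -1, To: 1}, {From: 0, To: 1}}
	for _, c := range []int{0, 2, 23} {
		fmt.Printf("exit %d -> state %d\n", c, mapExitCode(c, rules))
	}
}
```

With the sample rules, exit code 23 maps to state 3, while 0 and any other nonzero code map to 1=WARN.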
> Note: job-names may only consist of the characters `a-z`, `0-9` and `-`
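
The naming rule in the note can be expressed as a regular expression. The following Go sketch is only an illustration: `isValidName` is a hypothetical helper, and rejecting the empty string is an assumption, not something the README states.

```go
package main

import (
	"fmt"
	"regexp"
)

// validName encodes the documented rule: only a-z, 0-9 and "-".
// Requiring at least one character is an assumption.
var validName = regexp.MustCompile(`^[a-z0-9-]+$`)

// isValidName reports whether name satisfies the rule (hypothetical helper).
func isValidName(name string) bool {
	return validName.MatchString(name)
}

func main() {
	for _, n := range []string{"bu-borg", "b01-main", "Backup_1"} {
		fmt.Printf("%q valid: %v\n", n, isValidName(n))
	}
}
```
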
The config dir depends on the user jobwatch runs as:
- root: `/etc/jobwatch.d`
- user: `$HOME/etc/jobwatch.d`
## Deploying jobwatch
- Compile the Go program, if required, in the src-dir: `go get .;go build .`
- `cp jobwatch /usr/local/bin`
- `ln -s /usr/local/bin/jobwatch /usr/lib/check_agent/plugins`

On the CheckMK server, just deploy the provided .mkp; no further
configuration is currently needed there.
## Invoking Jobwatch
```
jobwatch -h
Usage of jobwatch:
-d Debug
-i string
JobId Instance. Default ''
-j string
JobId. reads $jobDir/$job.job.yml Default ''
-jd string
JobDir. Default: $HOME/etc/jobwatch.d /etc/jobwatch.d'
-- After this parameter every further parameter is passed to the
called program
Without any parameters CheckMK-Agent-Output is generated
```
Example 1 from a crontab:
```
10 9 * * 0 root /usr/local/bin/jobwatch -j bu-dupl
```
Example 2 from a crontab. Here we use the same job for
different instances (in this example, backup sets):
```
10 5 * * 0,3 root /usr/local/bin/jobwatch -j bu-borg -i b01-main -- /opt/borg/B01_main.borgjob
10 5 * * 1,4 root /usr/local/bin/jobwatch -j bu-borg -i b10-dvp -- /opt/borg/B10_dvp.borgjob
```
### Instances/Multiuser
See "Invoking Jobwatch" above.
> Note: instance-names have the same restrictions as job-names

When using instances, the final job name in the output is `[userid]/[job-name]_[instance-name]`, otherwise only `[userid]/[job-name]`.
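
The naming scheme just described can be sketched in Go as follows. `checkName` is a hypothetical function, not jobwatch's actual code, and whether `userid` is a login name or a numeric id is not stated in the README, so it is treated here as an opaque string.

```go
package main

import "fmt"

// checkName builds the job name as documented:
// [userid]/[job-name]_[instance-name] with an instance,
// [userid]/[job-name] without one. Hypothetical helper.
func checkName(userID, job, instance string) string {
	if instance == "" {
		return userID + "/" + job
	}
	return userID + "/" + job + "_" + instance
}

func main() {
	fmt.Println(checkName("root", "bu-dupl", ""))
	fmt.Println(checkName("root", "bu-borg", "b01-main"))
}
```

For the two crontab examples above, this yields `root/bu-dupl` and `root/bu-borg_b01-main`, assuming the user id is rendered as `root`.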
## TBD:
- redirect script output to a logfile via config
- job-timeout
- more metrics (last successful run)


@@ -4,20 +4,20 @@ args:
 exitcode_map:
   - from: 23
     to: 3
-  - from: -1
+  - from: -1 # Map all nonzero exit codes to 1=WARN
     to: 1
-  - from: 0
+  - from: 0 # Map exit code 0 also to 1=WARN
     to: 1
 log_matches:
   - regex: .*192.168.*.*
     state: 1
-  - regex: "-"
+  - regex: ".*tty[0-9].+-.*" # 2=CRIT when somebody is logged in on the console
     state: 2
     alt_msg: "%v -> a user is logged in at console"
-  - regex: .+
+  - regex: .+ # Include all other lines as OK
     state: 0
-hide_output: False
+hide_output: False # Don't hide output, but pass it through
-last_run_warn:
+last_run_warn: # How often this job is expected to be executed
   val: 8
   unit: "h"
 last_run_crit:


@@ -50,7 +50,7 @@ func LoadJob(jobdir string, jobid string) (*Job, error) {
 		dirs = append(dirs, jobdir)
 	} else {
 		if hd, err := os.UserHomeDir(); err == nil {
-			dirs = append(dirs, filepath.FromSlash(hd)+"/jobwatch.d")
+			dirs = append(dirs, filepath.FromSlash(hd)+"/etc/jobwatch.d")
 		}
 		dirs = append(dirs, "/etc/jobwatch.d")
 	}
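
For context, the lookup order this hunk fixes can be sketched as a standalone Go program. This is not the project's `LoadJob`: `candidateDirs` and `findJobFile` are hypothetical names, the `jobdir != ""` condition is an assumption (the actual condition lies outside the hunk), and the `<job>.job.yml` file name is taken from the `-j` help text in the README.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// candidateDirs returns the directories to search, in order.
// Assumption: an explicit jobdir replaces the defaults entirely.
func candidateDirs(jobdir string) []string {
	if jobdir != "" {
		return []string{jobdir}
	}
	var dirs []string
	if hd, err := os.UserHomeDir(); err == nil {
		// After this commit the per-user directory is $HOME/etc/jobwatch.d.
		dirs = append(dirs, filepath.Join(hd, "etc", "jobwatch.d"))
	}
	return append(dirs, "/etc/jobwatch.d")
}

// findJobFile returns the first existing <dir>/<jobid>.job.yml.
func findJobFile(jobdir, jobid string) (string, error) {
	for _, d := range candidateDirs(jobdir) {
		p := filepath.Join(d, jobid+".job.yml")
		if _, err := os.Stat(p); err == nil {
			return p, nil
		}
	}
	return "", fmt.Errorf("job %q not found in %v", jobid, candidateDirs(jobdir))
}

func main() {
	fmt.Println(candidateDirs(""))
	if p, err := findJobFile("", "sample"); err != nil {
		fmt.Println("lookup failed:", err)
	} else {
		fmt.Println("using", p)
	}
}
```

The sketch uses `filepath.Join` rather than string concatenation, which is a stylistic choice here and not what the commit itself does.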