systemd monitoring

We want to run an action (such as sending an alert) when a systemd unit fails.

The approach is simple: follow the journalctl logs, keeping only messages whose syslog identifier is systemd:

journalctl -t systemd
Oct 31 00:00:00 hostname systemd[1]: logrotate.service: Deactivated successfully.
Oct 31 00:00:00 hostname systemd[1]: Finished Logrotate Service.
Oct 31 00:00:00 hostname systemd[1]: systemd-tmpfiles-create.service: Deactivated successfully.
Oct 31 00:00:00 hostname systemd[1]: systemd-tmpfiles-create.service: Consumed 70ms CPU time, 2.9M memory peak, 1.5M read from disk.
Oct 31 00:00:01 hostname systemd[1]: backup.service: Failed with result 'exit-code'.

We're interested in lines like the last one, which indicate that a unit failed.

We don't need the timestamp, hostname and process metadata, so we add the -o cat option, which prints only the message text.

journalctl -t systemd -o cat
logrotate.service: Deactivated successfully.
Finished Logrotate Service.
systemd-tmpfiles-create.service: Deactivated successfully.
systemd-tmpfiles-create.service: Consumed 70ms CPU time, 2.9M memory peak, 1.5M read from disk.
backup.service: Failed with result 'exit-code'.

We need a Pattern that will match a systemd unit:

{
  patterns: {
    unit: {
      regex: @'[a-zA-Z0-9\-_@]+\.(?:automount|mount|scope|service|slice|socket|path|target|timer)\b',
      // Optionally ignore units that you don't want to monitor:
      // ignore: ["buggy-job.service"],
    }
  }
}
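
Before wiring the pattern into a stream, it can help to check what it matches on a few sample lines. The sketch below is only an approximation using grep: unit_names is a hypothetical helper name, and the regex is adapted to POSIX extended syntax (hyphen moved to the end of the character class, plain group instead of a non-capturing one, trailing \b dropped), so it is not byte-for-byte the pattern above.

```shell
# Hypothetical helper: print every unit name found on stdin, one per line.
# Assumption: the regex is adapted for `grep -E`, not identical to the pattern above.
unit_names() {
  grep -oE '[a-zA-Z0-9_@-]+\.(automount|mount|scope|service|slice|socket|path|target|timer)'
}
```

For example, journalctl -t systemd -o cat | unit_names should list the units mentioned in the journal, while lines such as "Finished Logrotate Service." produce no output because they contain no unit name.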

We can now create the corresponding Stream and Filter:

{
  streams: {
    systemd: {
      cmd: ['journalctl', '-fn0', '-o', 'cat', '-t', 'systemd'],
      filters: {
        failedunit: {
          regex: [@'^<unit>: Failed with result'],
          actions: {
            // Add an action
          }
        }
      }
    }
  }
}
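
To get a feel for which lines the failedunit filter will react to, the filter can be roughly emulated in the shell. This is a sketch under assumptions, not how the filter is evaluated internally: failed_units is a hypothetical helper, and the unit regex is rewritten for sed -E (hyphen at the end of the character class, no \b).

```shell
# Hypothetical helper: for each stdin line of the form
# "<unit>: Failed with result ...", print only the unit name.
failed_units() {
  # Assumption: POSIX ERE adaptation of the unit pattern.
  unit_re='[a-zA-Z0-9_@-]+\.(automount|mount|scope|service|slice|socket|path|target|timer)'
  # -n suppresses default output; the p flag prints only lines where the substitution matched.
  sed -nE "s/^(${unit_re}): Failed with result.*/\1/p"
}
```

Running journalctl -t systemd -o cat | failed_units over the existing journal should list the units that have already failed, which gives an idea of how often the action would have been triggered.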

See Example Actions for inspiration on how to send an alert.