Web crawlers

Configuration to ban malicious Web crawlers. Here the idea is that most attackers will first try to scan what to attack on a server.

We stick to paths no unmalicious human should try by themselves.

List:

  • /.env
  • /password.txt
  • /passwords.txt
  • /config\.json
    • Rationale: .env and password(s).txt, config.json are often searched by bots, as they can contain sensitive information, such as database credentials. Do not include the third path if a client must retrieve a config.json file.
  • /info\.php
    • Rationale: info.pgp is a file often written for debugging purposes, which contains <?php phpinfo() ?>. This function exposes way too much information about the PHP environment, which is very useful when looking for security holes.
  • /wp-login\.php
  • /wp-includes
    • Rationale: Wordpress default authentication path. Do not include if you use Wordpress.
  • /owa/auth/logon.aspx
    • Rationale: Outlook authentication path. Do not include if Outlook is in use on your infrastructure.
  • /auth.html
  • /auth1.html
    • Rationale: I don't know what it is, but it has been tried by numerous bots on my webserver. Do not include if you use this path on your infrastructure.
  • /dns-query
    • Rationale: DOH (DNS Over HTTPS) standard path. Do not include if have a DOH server on your infrastructure.

(Feel free to add your own discoveries to this list!)

By adding (?:[^/" ]*/) at the beginning of each regex, we also cover all subpaths.

As a pattern, we'll use ip. See here.

Example:

{
  streams: {
    nginx: {
      cmd: ['...'], // see ./nginx.md
      filters: {
        slskd: {
          regex: [
            // (?:[^/" ]*/)* is a "non-capturing group" regex that allow for subpath(s)
            // example: /code/.env should be matched as well as /.env
            //           ^^^^^
            @'^<ip>.*"GET /(?:[^/" ]*/)*\.env ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*password.txt ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*passwords.txt ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*config\.json ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*info\.php ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*wp-login\.php',
            @'^<ip>.*"GET /(?:[^/" ]*/)*wp-includes',
            @'^<ip>.*"GET /(?:[^/" ]*/)*owa/auth/logon.aspx ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*auth.html ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*auth1.html ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*dns-query ',
          ],
          action: banFor('720h'),
        },
      },
    },
  },
}