Web crawlers

Configuration to ban malicious Web crawlers. The idea is that most attackers first scan a server for something to attack by requesting well-known sensitive paths.

We stick to paths that no legitimate human visitor would request on their own.

List:

  • /.env
  • /password.txt
  • /passwords.txt
  • /config.json
    • Rationale: .env, password(s).txt and config.json are often searched for by bots, as they can contain sensitive information such as database credentials. Do not include /config.json if a client must be able to retrieve a config.json file.
  • /info.php
    • Rationale: info.php is a file often written for debugging purposes, which contains <?php phpinfo(); ?>. This function exposes far too much information about the PHP environment (version, loaded modules, configuration, environment variables), which is very useful when looking for security holes.
  • /wp-login.php
  • /wp-includes
    • Rationale: WordPress default login page and core include directory. Do not include if you use WordPress.
  • /owa/auth/logon.aspx
    • Rationale: Outlook Web App (Microsoft Exchange) login page. Do not include if Outlook Web App is in use on your infrastructure.
  • /auth.html
  • /auth1.html
    • Rationale: I don't know what it is, but it has been tried by numerous bots on my webserver. Do not include if you use this path on your infrastructure.
  • /dns-query
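    • Rationale: /dns-query is the conventional path for DNS-over-HTTPS (DoH) endpoints, so bots are probably looking for open DoH resolvers. It has been tried by numerous bots on my web server. Do not include if you use this path on your infrastructure (for example, if you run a DoH server).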

(Feel free to add your own discoveries to this list!)
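If you add your own discoveries, a small helper can turn a path into a regex of the same shape as the ones in the example. This is a hypothetical Python sketch (`path_to_regex` is not part of reaction): it escapes regex metacharacters such as `.`, inserts the subpath group `(?:[^/" ]*/)*` after the leading `/`, and appends a trailing space unless the path is a prefix (like `/wp-includes`) that should also match whatever follows it.

```python
# Hypothetical helper, not part of reaction: turn a path from the list
# into a regex string shaped like the ones used in the filter.

# Regex metacharacters to escape; '.' is the one that usually appears in paths.
# '/' and '-' are left as-is, matching the style of the existing regexes.
_META = set(r"\.+*?()|[]{}^$")


def escape(text: str) -> str:
    """Backslash-escape regex metacharacters in text."""
    return "".join("\\" + c if c in _META else c for c in text)


def path_to_regex(path: str, exact: bool = True) -> str:
    """Build a filter regex for a request path such as '/.env'.

    exact=True appends a space: the path must then be followed by the space
    separating it from the HTTP version in the log line.
    exact=False (e.g. for '/wp-includes') also matches longer paths and query strings.
    """
    body = escape(path.lstrip("/"))
    suffix = " " if exact else ""
    return '^<ip>.*"GET /(?:[^/" ]*/)*' + body + suffix


if __name__ == "__main__":
    print(path_to_regex("/.env"))
    print(path_to_regex("/owa/auth/logon.aspx"))
    print(path_to_regex("/wp-includes", exact=False))
```

Each printed string can be pasted between `@'` and `'` in the jsonnet regex list.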

By inserting (?:[^/" ]*/)* right after the leading / of each path, we also cover all subpaths: /code/.env is matched as well as /.env.
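To see what the group does in isolation, here is a quick Python check of the request-line part of the `.env` regex (without the leading `<ip>.*`). The constructs involved, a repeated non-capturing group and a negated character class, behave the same way in Python's `re`. The sample request lines are made up for illustration.

```python
import re

# Request-line part of the .env filter regex.
ENV = re.compile(r'"GET /(?:[^/" ]*/)*\.env ')

# Made-up request lines, as they appear inside an access log entry.
samples = {
    '"GET /.env HTTP/1.1"': True,        # top-level file
    '"GET /code/.env HTTP/1.1"': True,   # one subdirectory
    '"GET /a/b/c/.env HTTP/1.1"': True,  # several subdirectories
    '"GET /.env.bak HTTP/1.1"': False,   # a space must follow ".env"
    '"GET /not.env HTTP/1.1"': False,    # file name must be exactly ".env"
}

for line, expected in samples.items():
    matched = ENV.search(line) is not None
    print(f"{line:30} matched={matched}")
    assert matched == expected
```

The trailing space in the regex is what rejects `/.env.bak`, while the group itself only ever consumes whole directory names ending in `/`.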

As a pattern, we'll use ip. See here.

Example:

{
  streams: {
    nginx: {
      cmd: ['...'], // see ./nginx.md
      filters: {
        webcrawlers: {
          regex: [
            // (?:[^/" ]*/)* is a non-capturing group, repeated any number of times, that allows for subpath(s)
            // example: /code/.env should be matched as well as /.env
            //           ^^^^^
            @'^<ip>.*"GET /(?:[^/" ]*/)*\.env ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*password\.txt ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*passwords\.txt ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*config\.json ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*info\.php ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*wp-login\.php',
            @'^<ip>.*"GET /(?:[^/" ]*/)*wp-includes',
            @'^<ip>.*"GET /(?:[^/" ]*/)*owa/auth/logon\.aspx ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*auth\.html ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*auth1\.html ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*dns-query ',
          ],
          action: banFor('720h'),
        },
      },
    },
  },
}
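Before deploying, it can help to sanity-check the regexes against a few log lines. The Python sketch below makes several assumptions: nginx writes its default `combined` log format (client address first), `<ip>` is replaced by a simplified IPv4-only named group (reaction's actual ip pattern is defined separately and also covers IPv6), and only three of the regexes are included. The log lines are invented and use documentation IP ranges.

```python
import re
from typing import Optional

# Simplified, IPv4-only stand-in for the <ip> pattern (assumption: the real
# ip pattern is broader and is configured elsewhere).
IP = r"(?P<ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})"

# A subset of the filter regexes from the example above.
FILTERS = [
    r'^<ip>.*"GET /(?:[^/" ]*/)*\.env ',
    r'^<ip>.*"GET /(?:[^/" ]*/)*password\.txt ',
    r'^<ip>.*"GET /(?:[^/" ]*/)*wp-includes',
]
COMPILED = [re.compile(pattern.replace("<ip>", IP)) for pattern in FILTERS]


def offending_ip(line: str) -> Optional[str]:
    """Return the client IP if any filter matches the log line, else None."""
    for regex in COMPILED:
        match = regex.search(line)
        if match:
            return match.group("ip")
    return None


# Invented lines in nginx's "combined" format.
LOG = [
    '203.0.113.7 - - [01/Jan/2025:00:00:00 +0000] "GET /code/.env HTTP/1.1" 404 153 "-" "curl/8.0"',
    '198.51.100.23 - - [01/Jan/2025:00:00:01 +0000] "GET /wp-includes/wlwmanifest.xml HTTP/1.1" 404 153 "-" "-"',
    '192.0.2.10 - - [01/Jan/2025:00:00:02 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    # An unescaped "." in password.txt would wrongly match this line.
    '192.0.2.11 - - [01/Jan/2025:00:00:03 +0000] "GET /passwordXtxt HTTP/1.1" 404 153 "-" "-"',
]

for line in LOG:
    print(offending_ip(line))
```

The first two lines yield the scanner's address and the last two yield `None`; the last one shows why the dots in the regexes are escaped.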