Web crawlers

Configuration to ban malicious web crawlers. The idea is that most attackers start by scanning a server to find out what they can attack.

We stick to paths that no legitimate human visitor would request on their own.

List:

  • /.env
  • /password.txt
  • /passwords.txt
  • /config\.json
    • Rationale: .env, password(s).txt and config.json are often probed by bots, as they can contain sensitive information such as database credentials. Do not include the config.json path if a client must retrieve a config.json file.
  • /info\.php
    • Rationale: info.php is a file often written for debugging purposes, which contains <?php phpinfo() ?>. This function exposes far too much information about the PHP environment, which is very useful to anyone looking for security holes.
  • /wp-login\.php
  • /wp-includes
    • Rationale: WordPress default login page and core directory. Do not include if you use WordPress.
  • /owa/auth/logon.aspx
    • Rationale: Outlook authentication path. Do not include if Outlook is in use on your infrastructure.
  • /auth.html
  • /auth1.html
    • Rationale: I don't know what these are, but they have been tried by numerous bots on my web server. Do not include if you use these paths on your infrastructure.
  • /dns-query
    • Rationale: DoH (DNS over HTTPS) standard path. Do not include if you have a DoH server on your infrastructure.

(Feel free to add your own discoveries to this list!)

By adding (?:[^/" ]*/)* at the beginning of each regex, we also cover all subpaths.
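To see what the subpath prefix does, here is a minimal Python sketch that checks only the path part of the .env regex against a few hand-written request lines. The helper name `matches_env` and the sample lines are made up for illustration, and Python's `re` module stands in for whatever regex engine your setup uses.

```python
import re

# Prefix from this page: zero or more "segment/" groups, where a segment
# contains no slash, double quote or space.
SUBPATH = r'(?:[^/" ]*/)*'

# Path part of the .env filter; the trailing space marks the end of the path.
ENV_REGEX = re.compile(r'"GET /' + SUBPATH + r'\.env ')


def matches_env(request_line: str) -> bool:
    """Return True if the quoted request targets a file named .env at any depth."""
    return ENV_REGEX.search(request_line) is not None


samples = [
    '"GET /.env HTTP/1.1"',        # root path
    '"GET /code/.env HTTP/1.1"',   # one level down
    '"GET /a/b/.env HTTP/1.1"',    # several levels down
    '"GET /.envrc HTTP/1.1"',      # different file name, not matched
    '"GET /index.html HTTP/1.1"',  # ordinary request, not matched
]

for sample in samples:
    print(sample, matches_env(sample))
```

The trailing space in the regex is what keeps /.envrc from matching: the path must end right after .env.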

As a pattern, we'll use ip. See here.

Example:

{
  streams: {
    nginx: {
      cmd: ['...'], // see ./nginx.md
      filters: {
        slskd: {
          regex: [
            // (?:[^/" ]*/)* is a "non-capturing group" regex that allows for subpath(s)
            // example: /code/.env should be matched as well as /.env
            //          ^^^^^
            @'^<ip>.*"GET /(?:[^/" ]*/)*\.env ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*password\.txt ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*passwords\.txt ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*config\.json ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*info\.php ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*wp-login\.php',
            @'^<ip>.*"GET /(?:[^/" ]*/)*wp-includes',
            @'^<ip>.*"GET /(?:[^/" ]*/)*owa/auth/logon\.aspx ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*auth\.html ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*auth1\.html ',
            @'^<ip>.*"GET /(?:[^/" ]*/)*dns-query ',
          ],
          action: banFor('720h'),
        },
      },
    },
  },
}
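Before deploying regexes like these, it can help to try them on a sample log line. The Python sketch below is only an approximation. It assumes that `<ip>` is replaced by a capturing group, and it uses a simplified IPv4-only expression in place of the real ip pattern. The helper names and the log line are hypothetical, and the log line assumes nginx's default combined format.

```python
import re

# Assumption: simplified IPv4-only stand-in for the `ip` pattern.
IP_GROUP = r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3})'


def compile_filter(regex: str) -> re.Pattern:
    """Substitute the <ip> placeholder and compile the result (hypothetical helper)."""
    return re.compile(regex.replace('<ip>', IP_GROUP))


# One of the filter regexes from the example, copied verbatim.
WP_LOGIN = compile_filter(r'^<ip>.*"GET /(?:[^/" ]*/)*wp-login\.php')


def offending_ip(log_line: str):
    """Return the client IP if the line matches the filter, else None."""
    match = WP_LOGIN.search(log_line)
    return match.group('ip') if match else None


# Hypothetical access log lines using a documentation-range address.
probe = ('203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
         '"GET /blog/wp-login.php HTTP/1.1" 404 153 "-" "curl/8.0"')
normal = ('203.0.113.7 - - [10/Oct/2024:13:55:40 +0000] '
          '"GET /index.html HTTP/1.1" 200 612 "-" "curl/8.0"')

print(offending_ip(probe))
print(offending_ip(normal))
```

Note that every regex on this page starts with "GET, so requests using other methods (a POST to /wp-login.php, for instance) are not matched by this filter.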