Too many Invalid HTTP_HOST header exception errors

Have you deployed an application to production?

Are you getting too many, seemingly random Invalid HTTP_HOST header exception errors?

Background

I have recently deployed a Django app:

served via gunicorn/Nginx
onto an AWS Lightsail instance
with a CloudFlare certificate for https/SSL (followed this answer on StackOverflow for configuring it)

Following deployment I had a barrage of Invalid HTTP_HOST header error emails. Every. Single. Day.

I looked at the Nginx configuration. But it looked identical to other configurations I have in production. Configurations that do not have this kind of error.

The Nginx server block looks like:

upstream dbr_project {
  server unix:/home/ubuntu/[..]/gunicorn.sock fail_timeout=0;
}

server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name subdomain.example.com;

    if ($http_x_forwarded_proto = "http") {
      return 301 https://$server_name$request_uri;
    }

    ...

}

On further inspection, these exceptions were being caused by bots. Example user agent strings:

HTTP_USER_AGENT = 'Mozilla/5.0 (compatible; Nimbostratus-Bot/v1.3.2; http://cloudsystemnetworks.com)'

HTTP_USER_AGENT = 'masscan/1.0 (https://github.com/robertdavidgraham/masscan)'

This happens even though /robots.txt is set up to disallow robots from all URLs. This is its content:

User-agent: *
Disallow: /

So the cause looks external. And the “dirty” fix I resorted to is to stop reporting this class of error.
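For context, Django raises this error when the Host header of a request does not match any entry in the ALLOWED_HOSTS setting. The check can be sketched roughly as below. This is a simplified illustration and not Django's actual code; the function name host_is_allowed and the ALLOWED_HOSTS value are assumptions, since the real setting is not shown in this post.

```python
def host_is_allowed(host: str, allowed_hosts: list[str]) -> bool:
    """Simplified sketch of Django-style host validation (not Django's code)."""
    host = host.lower()
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*":
            # A lone "*" accepts every host.
            return True
        if pattern.startswith("."):
            # A leading dot matches the domain itself and any subdomain.
            if host == pattern[1:] or host.endswith(pattern):
                return True
        elif host == pattern:
            return True
    return False


# Assumed value for illustration only.
ALLOWED_HOSTS = ["subdomain.example.com"]

if __name__ == "__main__":
    # A scanner hitting the server by IP address sends a Host that is not allowed.
    print(host_is_allowed("subdomain.example.com", ALLOWED_HOSTS))
    print(host_is_allowed("203.0.113.10", ALLOWED_HOSTS))
```

A bot that connects by IP address, or sends an arbitrary Host value, fails this check, which is why every such request ends up as an error report.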

Fix

Update: I revised this fix following the comments made below. What I finally applied is more of an Nginx fix than a Django one. I describe the Nginx change in the section “Revised Solution” below.

I’ve followed this answer on StackOverflow to stop reporting, or “suppress”, this error.

My previous LOGGING config (which I stripped down to a bare minimum for this post) looked like:

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "mail_admins": {
            "level": "ERROR",
            "class": "django.utils.log.AdminEmailHandler",
        },
    },
    "loggers": {
        "django.request": {
            "handlers": ["mail_admins"],
            "level": "ERROR",
            "propagate": False,
        },
    },
}

To stop reporting this error:

I added a null handler (see the docs on LOGGING handlers)
I set the django.security.DisallowedHost logger to use this null handler and stopped it from propagating

This is what the logging configuration looks like after the changes. The newly added parts are the "null" handler and the "django.security.DisallowedHost" logger:

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "mail_admins": {
            "level": "ERROR",
            "class": "django.utils.log.AdminEmailHandler",
        },
        "null": {
            "level": "DEBUG",
            "class": "logging.NullHandler",
        },
    },
    "loggers": {
        "django.security.DisallowedHost": {
            "handlers": ["null"],
            "propagate": False,
        },
        "django.request": {
            "handlers": ["mail_admins"],
            "level": "ERROR",
            "propagate": False,
        },
    },
}
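The effect of the null handler can be checked outside Django with the standard library alone. The sketch below is an illustration under assumptions: CollectingHandler is a hypothetical stand-in for AdminEmailHandler that stores messages instead of emailing them, and it is attached to the parent "django" logger, roughly as Django's default logging setup does with mail_admins.

```python
import logging
import logging.config


class CollectingHandler(logging.Handler):
    """Hypothetical stand-in for AdminEmailHandler: stores messages in a list."""

    messages = []

    def emit(self, record):
        CollectingHandler.messages.append(record.getMessage())


LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # "()" tells dictConfig to build the handler from this class directly.
        "collect": {"level": "ERROR", "()": CollectingHandler},
        "null": {"level": "DEBUG", "class": "logging.NullHandler"},
    },
    "loggers": {
        # Swallow DisallowedHost records and keep them from reaching "django".
        "django.security.DisallowedHost": {"handlers": ["null"], "propagate": False},
        "django": {"handlers": ["collect"], "level": "ERROR"},
    },
}


def demo():
    """Log two security errors and return the messages that got reported."""
    CollectingHandler.messages.clear()
    logging.config.dictConfig(LOGGING)
    logging.getLogger("django.security.DisallowedHost").error(
        "Invalid HTTP_HOST header: 'bad.test'"
    )
    logging.getLogger("django.security.SuspiciousOperation").error(
        "Some other security error"
    )
    return list(CollectingHandler.messages)


if __name__ == "__main__":
    print(demo())
```

Only the DisallowedHost record is dropped. Other django.security loggers still propagate to the parent and get reported.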

Final Thoughts

I’m not sure this is the best “solution”. In fact I regard it as a “dirty fix” more than a solution.

Because ideally I wouldn’t be suppressing any django.security exception notifications.

Still it follows Django’s standard machinery to suppress logging a specific exception. Which is a good thing.

P.S. Sentry or similar can group this exception into one “error” page, which in effect contains thousands of occurrences of this exception. That does not count as a better approach 😊

Revised Solution

Take a look at the comments thread below. Bots are making requests without sending the correct Host header.

The solution is to enforce the domain subdomain.example.com in Nginx. This is the resulting Nginx config:

upstream subdomain_project {
  # fail_timeout=0 means we always retry an upstream even if it failed
  # to return a good HTTP response (in case the Gunicorn master nukes a
  # single worker for timing out).
  server unix:/home/ubuntu/[..]/gunicorn.sock fail_timeout=0;
}

server {
  listen 80 default_server;
  listen [::]:80 default_server;

  if ($host !~* ^subdomain\.example\.com$ ) {
    return 444;
  }
}

server {
  server_name subdomain.example.com;
  ...
}

Notice how:

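the domain subdomain.example.com is handled by the second server block
any request whose host is not subdomain.example.com lands in the first (default) server block, which returns 444. This is a non-standard Nginx code that closes the connection without sending a response.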
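The host check in the first server block is a case-insensitive regex test (!~*). A rough Python equivalent is sketched below, assuming Python's re module behaves like Nginx's regex engine for a pattern this simple. The names LOOSE, STRICT and is_dropped are made up for illustration. It also shows why the dots in the pattern are worth escaping: an unescaped dot matches any character.

```python
import re

# Unescaped dots: "." matches any single character, so look-alike hosts pass.
LOOSE = re.compile(r"^(subdomain.example.com)$", re.IGNORECASE)
# Escaped dots: only a literal "." matches.
STRICT = re.compile(r"^subdomain\.example\.com$", re.IGNORECASE)


def is_dropped(host: str, pattern: re.Pattern = STRICT) -> bool:
    """Mimic `if ($host !~* pattern) { return 444; }`.

    Returns True when the host does not match, i.e. the request is dropped.
    """
    return pattern.search(host) is None


if __name__ == "__main__":
    for host in ("subdomain.example.com", "203.0.113.10", "subdomainXexample.com"):
        print(host, "strict:", is_dropped(host), "loose:", is_dropped(host, LOOSE))
```

In practice the difference is small here, because only requests that did not match the second server block reach this check at all, but the escaped form states the intent precisely.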
