On the purpose of the www-data user

What exactly is the point of the www-data user on Debian? Here’s what I found out when I tried to get to the bottom of it.

If you’re running the Debian GNU/Linux distro and call cat /etc/passwd, you’ll see a list of all the local users on the system. To list all groups, you can call cat /etc/group. Note that each user name corresponds to a numerical user id (UID) and a group id (GID). In reality, it’s these numbers, not the names, that are used when checking access permissions.
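If you’d rather poke at this from a script, here’s a minimal Python sketch of how each /etc/passwd line breaks down into its seven colon-separated fields. The names PasswdEntry and parse_passwd are my own, made up for illustration. For real lookups, the standard library’s pwd.getpwnam (or getent passwd www-data on the command line) is the better tool, since it also sees users that come from sources other than the local file.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    """One /etc/passwd line: name:password:UID:GID:GECOS:home:shell."""
    name: str
    password: str  # normally "x"; the actual hash lives in /etc/shadow
    uid: int
    gid: int
    gecos: str     # free-form comment, often the full name
    home: str
    shell: str


def parse_passwd(text: str) -> dict[str, PasswdEntry]:
    """Map each user name to its parsed entry (hypothetical helper, for illustration)."""
    entries = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        name, password, uid, gid, gecos, home, shell = line.split(":")
        # The kernel only ever sees the numbers; the name is for humans.
        entries[name] = PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell)
    return entries


if __name__ == "__main__":
    with open("/etc/passwd", encoding="utf-8") as f:
        for entry in parse_passwd(f.read()).values():
            print(f"{entry.name:<20} uid={entry.uid:<6} gid={entry.gid}")
```

On a stock Debian install, the www-data entry comes out with UID and GID 33, a home of /var/www and /usr/sbin/nologin as its shell, which refuses interactive logins.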

When you run these commands, you’ll notice that there are a lot of users and groups configured on your system. And, like me, you might wonder: what exactly do they all do?

Well, worry no more! I was happy to learn that Debian ships with a README that explains this user and group setup. On your Debian system you can find it in the directory /usr/share/doc/base-passwd. It can also be viewed online here.

Here’s its reasoning for why these users need to exist:

Some user ids (UIDs) and group ids (GIDs) are reserved globally for use by certain packages. Because some packages need to include files which are owned by these users or groups, or need the ids compiled into binaries, these ids must be used on any Debian system only for the purpose for which they are allocated.

When new users are created, they’re given a UID and GID from predefined ranges. Other ranges of UIDs and GIDs are reserved globally, by Debian policy, for specific users. This means packages can rely on the permission model for their files without the risk of a dynamically created user accidentally being assigned a UID or GID that gives it access to a package’s files.
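To make the idea of reserved ranges concrete, here’s a small Python sketch that labels a UID (or GID, the classes are the same) according to the ranges in Debian Policy section 9.2.2, as I understand them. The function name is mine, and the exact boundaries are worth double-checking against the policy for your release.

```python
def classify_debian_uid(uid: int) -> str:
    """Label a UID/GID by the Debian Policy 9.2.2 classes (as I read them)."""
    if not 0 <= uid <= 0xFFFFFFFF:
        raise ValueError("UIDs and GIDs are unsigned 32-bit numbers")
    if uid <= 99:
        return "globally allocated by Debian"
    if uid <= 999:
        return "dynamically allocated system user"
    if uid <= 59999:
        return "dynamically allocated user account"
    if uid <= 64999:
        return "globally allocated by Debian, created on demand"
    if uid <= 65533:
        return "reserved"
    if uid == 65534:
        return "nobody"
    if uid == 65535:
        return "must not be used"  # (uid_t)(-1) on legacy 16-bit systems
    if uid <= 4294967293:
        return "dynamically allocated user account (not used by adduser by default)"
    return "must not be used"  # (uid_t)(-2) and (uid_t)(-1)


if __name__ == "__main__":
    for uid in (0, 33, 150, 1000, 65534):
        print(uid, "->", classify_debian_uid(uid))
```

For instance, classify_debian_uid(33), www-data’s UID, lands in the 0-99 block that Debian allocates globally. That’s what lets a package ship files owned by www-data and be sure the number means the same thing on every Debian machine.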

So, what about the www-data user?

From Debian’s base-passwd package documentation we learn:

www-data: Some web servers run as www-data. Web content should not be owned by this user, or a compromised web server would be able to rewrite a web site. Data written out by web servers will be owned by www-data. (source)

According to the docs, a web server installed with Debian’s defaults may run under the www-data user. For example, Debian’s php-fpm service uses www-data as the owner of its socket file, and the requests sent through that socket are handled by PHP worker processes that also run as www-data.
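To check which user actually owns a file such as the php-fpm socket, a sketch like the one below asks the kernel via os.lstat and turns the numeric owner back into names, much like ls -l does. The helper names are my own.

```python
import grp
import os
import pwd
import stat
import sys


def uid_to_name(uid: int) -> str:
    """Resolve a UID to a user name, falling back to the bare number."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)  # no passwd entry: the number is all the kernel has


def gid_to_name(gid: int) -> str:
    """Resolve a GID to a group name, falling back to the bare number."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def describe_owner(path: str) -> tuple[str, str, str]:
    """Return (user, group, mode string) for path, without following symlinks."""
    st = os.lstat(path)
    # stat.filemode renders the mode like ls -l, e.g. "srw-rw----" for a socket
    return uid_to_name(st.st_uid), gid_to_name(st.st_gid), stat.filemode(st.st_mode)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        user, group, mode = describe_owner(path)
        print(f"{mode} {user}:{group} {path}")
```

The socket path is the part I’m least sure of: on Debian it lives under /run/php/ and its name includes the PHP version, so something like python3 owner.py /run/php/php8.2-fpm.sock (script and file name hypothetical) is where I’d expect to see www-data:www-data with the default pool configuration.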

If a web server is compromised, the attacker may gain control of the web server process, which may well be running as the www-data user. From there the attacker can attempt privilege escalation to get root access. Even if root access isn’t possible, the attacker might get far merely by exploiting the www-data user’s own privileges. Exactly how far hinges on how well that user is locked down.
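One way to gauge how well www-data is locked down is to list what it could modify. Below is a rough Python sketch of such an audit. It applies only the classic owner/group/other mode-bit check; it ignores ACLs, capabilities, mount options and whether www-data can even traverse the parent directories, so treat its output as a starting point rather than a verdict. The function names are my own.

```python
import os
import pwd
import stat
import sys
from typing import Iterator


def can_write(st: os.stat_result, uid: int, gids: set[int]) -> bool:
    """Classic Unix write check on mode bits only (no ACLs or capabilities)."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if st.st_uid == uid:
        # The owner class wins even if group/other bits are more generous.
        return bool(st.st_mode & stat.S_IWUSR)
    if st.st_gid in gids:
        return bool(st.st_mode & stat.S_IWGRP)
    return bool(st.st_mode & stat.S_IWOTH)


def writable_paths(root: str, user: str = "www-data") -> Iterator[str]:
    """Yield paths under root (inclusive) whose mode bits let user write them.

    For a directory, write access means the user can create, delete or
    rename entries in it (given execute permission too).
    """
    pw = pwd.getpwnam(user)
    gids = set(os.getgrouplist(user, pw.pw_gid))  # primary + supplementary groups

    def check(path: str) -> bool:
        try:
            st = os.lstat(path)
        except OSError:
            return False  # vanished or unreadable
        # Symlink mode bits are not used for access checks on Linux.
        return not stat.S_ISLNK(st.st_mode) and can_write(st, pw.pw_uid, gids)

    if check(root):
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if check(path):
                yield path


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in writable_paths(sys.argv[1]):
        print(path)
```

Run as, say, sudo python3 audit.py /var/www (script name hypothetical), so the walk isn’t blocked by directories your own user can’t read.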

In practice, you might grant www-data only read access to your web content and, only if you need to support user-uploaded content, grant write access to a select few folders.
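Here’s a hedged Python sketch of that layout. It assumes your content is owned by some deploy user other than www-data, gives the www-data group read-only access everywhere (750 for directories, 640 for files, nothing for everyone else) and opens up group write access (770/660) only inside the upload directories you name. The function names and the uploads directory are hypothetical, and changing a file’s group generally needs root.

```python
import os
import shutil
from typing import Optional, Sequence


def in_upload_dir(rel_path: str, upload_dirs: Sequence[str]) -> bool:
    """True if rel_path (relative to the web root) is, or is inside, an upload dir."""
    rel = os.path.normpath(rel_path)
    for d in upload_dirs:
        d = os.path.normpath(d)
        if rel == d or rel.startswith(d + os.sep):
            return True
    return False


def target_mode(rel_path: str, is_dir: bool, upload_dirs: Sequence[str]) -> int:
    """Owner keeps control, the group reads (or writes in uploads), others get nothing."""
    writable = in_upload_dir(rel_path, upload_dirs)
    if is_dir:
        return 0o770 if writable else 0o750
    return 0o660 if writable else 0o640


def lock_down(web_root: str, upload_dirs: Sequence[str],
              group: Optional[str] = "www-data") -> None:
    """Apply target_mode (and optionally the group) to everything under web_root."""
    for dirpath, dirnames, filenames in os.walk(web_root):
        entries = [(dirpath, True)] + [(os.path.join(dirpath, f), False) for f in filenames]
        for path, is_dir in entries:
            if os.path.islink(path):
                continue  # chmod would follow the link and touch its target
            if group is not None:
                shutil.chown(path, group=group)  # usually needs root
            rel = os.path.relpath(path, web_root)
            os.chmod(path, target_mode(rel, is_dir, upload_dirs))
```

Under these assumptions, lock_down("/var/www/example", ["uploads"]) would leave www-data able to write only below /var/www/example/uploads. Symlinks are skipped rather than chmod’ed, so the walk never changes anything outside the web root through a link.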

That’s the gist of the www-data user’s purpose. Truthfully, I’m still figuring out how to use it effectively, but at least now I know why it’s there and I learnt something new about Debian’s approach to permissions.

Commands for spelunking

Just seeing the www-data user listed in /etc/passwd didn’t really tell me much about how it was being used on my system. So, I had to find ways to poke, prod and explore the system.

Here are some commands I found useful for that.

  • find / -group GROUP finds all files owned by the group GROUP. You might need to prepend sudo so that find can traverse certain directories.
  • find / -user USER works the same way, but finds all files owned by the user USER.
  • ps -u USER or pstree USER lists processes owned by USER.
    • ps -ef lists all running processes.
    • pstree --compact-not shows identical processes individually instead of collapsing them into one entry, which makes it easy to see how many instances of a process are running.
  • systemctl list-unit-files gives an overview of the unit files (services, sockets, timers and so on) installed on the current system.
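For the curious, here’s roughly what find -user and ps -u do under the hood, as a Python sketch for Linux. The file walk compares each entry’s owner UID from lstat; the process listing reads the Uid: line of /proc/PID/status, whose second number is the effective UID, which is what ps -u matches on. The function names are my own, and unlike find, this version silently skips directories it can’t read.

```python
import os
import pwd
import sys
from typing import Iterator


def files_owned_by(root: str, uid: int) -> Iterator[str]:
    """Yield paths under root (inclusive) owned by uid, similar to find ROOT -user."""
    def owned(path: str) -> bool:
        try:
            return os.lstat(path).st_uid == uid  # lstat: judge symlinks themselves, like find
        except OSError:
            return False

    if owned(root):
        yield root
    for dirpath, dirnames, filenames in os.walk(root):  # unreadable dirs are skipped
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if owned(path):
                yield path


def pids_owned_by(uid: int) -> list[int]:
    """Return PIDs whose effective UID is uid, similar to ps -u (Linux /proc only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/status", encoding="utf-8", errors="replace") as f:
                for line in f:
                    if line.startswith("Uid:"):
                        # Uid: real  effective  saved-set  filesystem
                        if int(line.split()[2]) == uid:
                            pids.append(int(entry))
                        break
        except OSError:
            continue  # the process exited while we were looking
    return sorted(pids)


if __name__ == "__main__" and len(sys.argv) > 1:
    target = pwd.getpwnam(sys.argv[1]).pw_uid
    print("processes:", pids_owned_by(target))
```

Something like python3 spelunk.py www-data (script name hypothetical) should list the PIDs running as www-data, such as php-fpm workers if you have them installed.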