On the purpose of the www-data user
What exactly is the point of the www-data user on Debian? Here’s what I found out when I tried to get to the bottom of it.
If you’re running the Debian GNU/Linux distro and call cat /etc/passwd, you’ll see a list of all the users on the system. To list all groups, you can call cat /etc/group. Note that each user name corresponds to a numerical user id (UID) and a group id (GID). In reality, it’s the UID and GID that are used for calculating access levels.
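To see the name-to-number mapping directly, here is a small sketch that pulls the name, UID and GID columns out of a passwd-format file. The function name list_ids is my own invention, and the optional file argument is only there so the function can be tried on a copy instead of the real /etc/passwd.

```shell
# Print "name uid gid" for every entry in a passwd-format file.
# The path defaults to /etc/passwd; list_ids is a hypothetical helper name.
list_ids() {
    passwd_file="${1:-/etc/passwd}"
    # Fields are colon-separated: name:password:UID:GID:comment:home:shell
    awk -F: '{ print $1, $3, $4 }' "$passwd_file"
}
```

Running list_ids | grep '^www-data ' then shows just the www-data entry, if the system has one.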
When you run these commands you’ll notice that there are a lot of users and groups configured for your system. And like me you might wonder, what exactly do they do?
Well, worry no more! I was happy to learn that Debian ships with a README and explainer for this user and group setup. On your Debian system you can find it in the directory /usr/share/doc/base-passwd. It can also be viewed online here.
Here’s the reasoning on why these users need to exist:
Some user ids (UIDs) and group ids (GIDs) are reserved globally for use by certain packages. Because some packages need to include files which are owned by these users or groups, or need the ids compiled into binaries, these ids must be used on any Debian system only for the purpose for which they are allocated.
When you create new users, they’ll be given a UID and GID from some predefined ranges. Other ranges of UIDs and GIDs are globally reserved for specific users, by Debian policy. This means that packages can make use of the permission model for their files, without the risk of a dynamically created user accidentally getting assigned a UID or GID that gives them access to the package’s files.
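To make the ranges concrete, here is a rough sketch that sorts a UID into the classes described in the Debian Policy Manual (section 9.2.2), as I understand them. The function name classify_uid is made up for illustration, and the boundaries are worth double-checking against the policy text for your release.

```shell
# Classify a numeric UID by range. The boundaries follow my reading of
# Debian Policy 9.2.2 and are an assumption, not taken from base-passwd itself.
classify_uid() {
    uid="$1"
    if [ "$uid" -le 99 ]; then
        echo "globally allocated by Debian"
    elif [ "$uid" -le 999 ]; then
        echo "dynamically allocated system account"
    elif [ "$uid" -le 59999 ]; then
        echo "dynamically allocated user account"
    else
        echo "reserved or special range"
    fi
}
```

For example, classify_uid "$(id -u www-data)" reports which class the www-data user falls into on your machine.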
So, what about the www-data user?
From Debian’s base-passwd package documentation we learn:
www-data: Some web servers run as www-data. Web content should not be owned by this user, or a compromised web server would be able to rewrite a web site. Data written out by web servers will be owned by www-data. (source)
According to the docs, a default installation of a web server on Debian might be running under the www-data user. For example, Debian’s php-fpm service creates its socket file with www-data as the owner, and requests sent to that socket are handled by PHP processes that are also owned by www-data.
If a web server is compromised, the attacker might get control of the web server process, which might be running under the www-data user. The attacker can then attempt privilege escalation to get root access. If root access isn’t possible, the attacker might still get far by merely exploiting the www-data user’s privileges. Exactly how far an attacker might get by exploiting the www-data user hinges on how well that user is locked down.
In practice you might grant www-data only read access to your web content. Only if you need to support user-uploaded content would you grant write access to a select few folders.
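As a minimal sketch of that idea, the function below makes a directory tree readable but not writable by its group, and closed to everyone else. The name make_group_readonly, the path /var/www/example and the deploy account are all hypothetical, and I’m assuming the content is owned by a non-web-server user with www-data as the group.

```shell
# Make a directory tree group-readable but not group-writable.
# Ownership is left untouched; make_group_readonly is a hypothetical helper.
make_group_readonly() {
    dir="$1"
    # Directories: owner rwx, group r-x (x is needed to enter them), others none.
    find "$dir" -type d -exec chmod 750 {} +
    # Regular files: owner rw-, group r--, others none.
    find "$dir" -type f -exec chmod 640 {} +
}

# Hypothetical usage, assuming a site in /var/www/example owned by an
# account called "deploy" with www-data as the group:
#   sudo chown -R deploy:www-data /var/www/example
#   make_group_readonly /var/www/example
#   sudo chmod -R g+w /var/www/example/uploads   # only where uploads are needed
```

With this layout the web server can read everything through its group, but a compromised www-data process can only write inside the folders you explicitly opened up.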
That’s the gist of the www-data user’s purpose. Truthfully, I’m still figuring out how to use it effectively, but at least now I know why it’s there and I learnt something new about Debian’s approach to permissions.
Commands for spelunking
Just seeing the www-data user listed in /etc/passwd didn’t really tell me much about how it was being used on my system. So, I had to find ways to poke, prod and explore the system.
Here are some commands I found useful for that.
find / -group GROUP: finds all files owned by GROUP. You might need to prepend sudo to give find access to traverse certain directories.
find / -user USER: works the same as above, but finds all files owned by USER.
ps -u USER or pstree USER: lists processes owned by USER.
ps -ef: lists all running processes.
pstree --compact-not: shows identical subtrees in full instead of collapsing them, which can be useful for seeing how many copies of a process are running.
systemctl list-unit-files: gives an overview of services and jobs installed on the current system.
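The find commands above print a lot of "Permission denied" noise when run without sudo. Here is a small wrapper I would reach for, with the caveat that owned_by is a made-up name and the default search root of / is my own choice.

```shell
# List everything under a directory that is owned by a given user.
# Accepts a user name or a numeric UID; owned_by is a hypothetical helper.
owned_by() {
    user="$1"
    dir="${2:-/}"
    # Discard errors from directories we are not allowed to traverse.
    find "$dir" -user "$user" 2>/dev/null
}
```

For example, owned_by www-data /var | wc -l gives a quick count of how much under /var belongs to the web server user.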