How to list top 20 largest files in Unix hosted filespace

Last Updated on 6 October 2009 by gerry

I’m often disappointed to discover that Unix commands don’t have the options I expect them to have. Of course that is what the pipe is for, so having spent the time digging around to find out how to construct the required syntax using a series of commands, I’m going to document it for myself here.

The case in point: I’m teetering on the edge of my disk quota on my web/email hosting filespace, so I want to use the du command to show me the files that are the main culprits.

Problem #1: I just want files

My first issue is that du shows both files and folders. There doesn’t appear to be an option to exclude directories, so it looks like I need to use find with -type f and hand each result to du via -exec, though I suspect this is a poor use of resources.

find . -type f -exec du -ka {} \;

Problem #2: just give me the worst offenders

To the rescue come sort -nr to sort the list of files numerically in reverse order and head -20 to display only the top 20 biggest files:

find . -type f -exec du -ka {} \; | sort -nr | head -20
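To see what sort -nr and head do to du-style output, here is a tiny sketch on made-up data. The sizes and filenames are invented for illustration, and head -n 2 is just the modern spelling of the head -2 form.

```shell
# Made-up du-style lines: size in KB, a tab, then a filename.
top=$(printf '4\tnotes.txt\n12\tbackup.tar\n8\tinbox\n' | sort -nr | head -n 2)

# sort -nr compares the leading numbers and puts the largest first;
# head keeps only the first two lines of that sorted list.
printf '%s\n' "$top"
```

The largest entry (backup.tar) comes out first, followed by inbox; notes.txt is cut off by head.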

Problem #3: that’s a bit slow if truth be told

My suspicions about efficiency appear to be well-founded as it takes more time than it really should to traverse the directory tree of my hosted websites. It is at this point I learn about the xargs command, something I’ve not come across till now. This replaces the -exec portion of the find command thus:

find . -type f |xargs du -ka {} | sort -nr | head -20
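A rough way to see where the speed-up comes from is to count how many times the command gets started. The sketch below uses echo in place of du and a scratch directory with three empty files; the directory and file names are hypothetical, and it assumes a Bourne-style shell rather than csh.

```shell
# Scratch directory with three files (names are made up).
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"

# -exec ... \; starts the command once per file,
# so echo runs three times and prints three lines.
per_file=$(find "$demo" -type f -exec echo {} \; | wc -l)

# xargs gathers the names and passes them in one batch,
# so echo runs once and prints a single line.
batched=$(find "$demo" -type f | xargs echo | wc -l)

echo "runs with -exec: $per_file, runs with xargs: $batched"
rm -rf "$demo"
```

With thousands of files, that difference is thousands of du processes versus a handful.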

Problem #4: spaces, quotes can banjax things

All is well until find supplies filenames with spaces or quotes or other delimiters to xargs. Some of the biggest files are mailbox folders with names like “Sent Items”, so this is a bit of a show-stopper. man find suggests the -X option, but my mileage varies in a bad way with that. The -print0 option is suggested as an alternative and it does the trick in conjunction with the xargs -0 option.

find . -type f -print0|xargs -0 du -ka {} | sort -nr | head -20

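I’m happy to settle with this, although there are some du errors reported:

du: {}: No such file or directory

The culprit is the {} left over from the -exec version. xargs appends the filenames to the end of the command by itself and, without its -I option, passes {} straight through, so du goes looking for a file literally called {}. The listing itself is unaffected, and dropping the {} makes the errors go away.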
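The effect of -print0 and -0 on awkward filenames is easy to reproduce in a scratch directory. This is only a sketch: the directory and file names are hypothetical, it assumes a Bourne-style shell rather than csh, and it assumes an xargs that supports -0 (GNU and BSD versions do).

```shell
# Scratch directory holding one file with a space in its name (made-up names).
demo=$(mktemp -d)
printf 'hello' > "$demo/Sent Items"
printf 'hi' > "$demo/inbox"

# Plain pipe: xargs splits on whitespace, so du is handed "…/Sent" and "Items",
# neither of which exists. Count the lines du writes to stderr.
plain_errors=$(find "$demo" -type f | xargs du -k 2>&1 >/dev/null | wc -l)

# NUL-delimited: every filename reaches du in one piece, so stderr stays empty.
nul_errors=$(find "$demo" -type f -print0 | xargs -0 du -k 2>&1 >/dev/null | wc -l)

echo "error lines without -print0: $plain_errors"
echo "error lines with -print0:    $nul_errors"
rm -rf "$demo"
```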

Lastly problem #5: check files in multiple locations

I want to run the command in three places

  1. my home directory
  2. my web space
  3. my email space

On a C-Shell login I wasn’t sure how to concatenate the output from multiple commands. It turns out one just groups the commands in parentheses and separates them with semi-colons.

At this point I want to digress and give my hosting company, pairNetworks, a big plug. They currently provide hosting for nearly 200,000 customers and in spite of (or because of) the scale of their operation, I have nothing but praise for their service and support. The webmaster package I use has ssh/telnet access, and on each occasion I have required support they have resolved my issues with speed and courtesy.

I’ve been hosted by pairNetworks for years and am a thoroughly happy customer.

Enough shameless plugging…so the final syntax of the command I need to remember is:

(find /usr/home/mulvenna -type f -print0; find /usr/www/users/mulvenna -type f -print0; find /usr/boxes/mulvenna -type f -print0) | xargs -0 du -ka | sort -nr | head -20
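The parenthesised grouping can be tried out on a small scale before pointing it at real directories. The sketch below uses two scratch directories as stand-ins for the real locations (all names are hypothetical) and assumes a Bourne-style shell rather than csh. It also shows a variation that is not part of the method above: find accepts several starting directories, so a single find can cover all of them.

```shell
# Two scratch directories standing in for the real locations (made-up names).
a=$(mktemp -d)
b=$(mktemp -d)
printf 'one' > "$a/Sent Items"
printf 'two' > "$b/index.html"

# Grouped form: the parentheses run both finds in a subshell,
# and their combined output goes down a single pipe.
grouped=$( (find "$a" -type f -print0; find "$b" -type f -print0) | xargs -0 du -k | sort -nr | head -n 20)

# Single-find form: one find with two starting directories.
single=$(find "$a" "$b" -type f -print0 | xargs -0 du -k | sort -nr | head -n 20)

printf '%s\n' "$grouped"
rm -rf "$a" "$b"
```

Both forms feed the same list of files to du, so the sorted output is the same either way; the grouped form is handy when the commands being combined are not all find.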
