DROID (Digital Record and Object Identification) is an open-source tool developed by the UK National Archives to batch identify file formats.
Point the program at one or more folders and it scans them (and, optionally, any subfolders), then produces reports on the contents: the file types, space used and so on.
While that sounds familiar, DROID goes much further than most of the competition. Other programs might just assume that any file with a .pdf extension is a Portable Document Format file, but DROID examines the contents, gives you the PDF version if the extension is correct, and tells you the real file format if it isn't.
DROID takes an equally thorough approach with archives, peeking inside to identify the real types of their contents.
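To illustrate the general idea of identifying a file by its contents rather than its name, here is a minimal Python sketch. It is not DROID's implementation: DROID works from a far larger signature registry, while the `SIGNATURES` table and the `identify` and `identify_zip_members` helpers below are hypothetical and cover only a handful of formats.

```python
import io
import re
import zipfile

# Illustrative leading-byte signatures only (an assumption for this sketch);
# a real identifier such as DROID uses a much larger signature set.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"PK\x03\x04", "ZIP"),
    (b"GIF8", "GIF"),
]


def identify(data: bytes) -> str:
    """Return a format label from the leading bytes, ignoring any file name."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            if label == "PDF":
                # The PDF header carries the version, e.g. b"%PDF-1.7".
                match = re.match(rb"%PDF-(\d\.\d)", data)
                if match:
                    return f"PDF {match.group(1).decode()}"
            return label
    return "unknown"


def identify_zip_members(data: bytes) -> dict:
    """Look inside a ZIP archive and identify each member by its content."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: identify(archive.read(name)) for name in archive.namelist()}
```

With this sketch, a PNG image that has been renamed to `report.pdf` would still be reported as PNG, because only the bytes are examined.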
The results of the scan are displayed in a list view which follows the structure of your folder tree. That is, you'll see a table listing the contents of the root folder - name, size, last modified date, format, version, MIME type, hash, and more - along with folders which can be expanded to view their contents.
You can filter the view by any combination of these fields, for example to view only MP4s greater than 20MB modified in the last year.
The data can also be summarised in text reports (which take forever to produce and are horribly basic), or exported to CSV files for more analysis elsewhere.
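If you do export to CSV, the same kind of filter (MP4s greater than 20MB modified in the last year) can be reproduced elsewhere. The Python sketch below is only an illustration: the column names `NAME`, `SIZE`, `EXT` and `LAST_MODIFIED`, and the ISO-style timestamps, are assumptions about the export layout, so check them against your own file. The `large_recent_mp4s` helper is hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta


def large_recent_mp4s(csv_text, now, min_bytes=20 * 1024 * 1024, days=365):
    """Return names of MP4 files larger than min_bytes modified within `days`.

    Assumes columns NAME, SIZE, EXT and LAST_MODIFIED (ISO 8601 timestamps);
    these names are assumptions, not a documented export format.
    """
    cutoff = now - timedelta(days=days)
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["EXT"].lower() != "mp4":
            continue
        if int(row["SIZE"] or 0) <= min_bytes:
            continue
        if datetime.fromisoformat(row["LAST_MODIFIED"]) < cutoff:
            continue
        matches.append(row["NAME"])
    return matches
```

To run it against a real export, read the file into a string first, for example `large_recent_mp4s(open("export.csv", newline="").read(), datetime.now())`, where `export.csv` is a placeholder name.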
Changes in the latest version:
Now includes functionality to process the contents of ARC and WARC (web archive) files.
Added SHA1 to the hash algorithms available when profiling.
Filtering is now case-insensitive on file name, extension and format name.
Fixed intermittent skipping of OLE2 containers due to memory pressure (GitHub issue #67); such cases are now logged.
Minor updates to help pages.
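For readers unfamiliar with file hashing, the sketch below shows how a SHA1 (or other) digest can be computed for a file in Python using the standard `hashlib` module. It illustrates the general technique only and says nothing about how DROID computes its hashes; the `stream_digest` helper and its chunk size are hypothetical choices.

```python
import hashlib
import io


def stream_digest(stream, algorithm="sha1", chunk_size=64 * 1024):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
    return hasher.hexdigest()
```

For a file on disk, open it in binary mode and pass the handle, for example `stream_digest(open("movie.mp4", "rb"))`, where the file name is a placeholder.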
DROID is slow at scanning and report generation, and very poor at file visualisation. Other tools give you colorful graphs and allow you to drill down with a click, but here you must build filters and wait to see static text reports.
DROID does stand out for its ability to identify file formats by contents, and if that sounds useful, give it a try. But if you just want to see which folders are hogging most of your hard drive space, you'll get better results elsewhere.