A post summarizing a few common uses for du to get an approximate folder/file size
The full documentation (“man du”) is here:
https://linux.die.net/man/1/du
A few common options
(1) -h: human-readable format (sizes such as 4.0K, 12M, 1.5G); you almost always want this option on
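(2) -c: produce a grand total (the full size)
For example, to get the full size of a directory (tail keeps only the final total line, whereas grep total would also match any path containing “total”):
du -ch | tail -n 1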
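-c is most useful when du is given several paths, because the total line then sums all of them. Below is a minimal sketch that runs in a throwaway directory from mktemp. The directory names a and b and the file sizes are arbitrary choices for illustration.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched; remove it on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Two hypothetical directories holding 1 MiB and 2 MiB of written (non-sparse) data.
mkdir "$tmp/a" "$tmp/b"
dd if=/dev/zero of="$tmp/a/file1" bs=1024 count=1024 2>/dev/null
dd if=/dev/zero of="$tmp/b/file2" bs=1024 count=2048 2>/dev/null

# One line per argument, followed by a grand-total line covering both.
du -ch "$tmp/a" "$tmp/b"

# Keep only the grand-total line.
du -ch "$tmp/a" "$tmp/b" | tail -n 1
```

Without -c you would get the two per-directory lines and have to add them yourself.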
(3) -s: summarize, i.e., show only one total for each argument
For example, to get the size of each folder in the current directory without listing the sizes of its subfolders:
du -sh ./*
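Sorting the summarized sizes makes the largest folders easy to spot. This sketch assumes GNU coreutils, whose sort -h understands the K/M/G suffixes that du -h prints. Note that the ./* glob skips hidden (dot) entries. The folder names small and big are hypothetical.

```shell
#!/bin/sh
# Scratch directory with two hypothetical subfolders of different sizes.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/small" "$tmp/big"
dd if=/dev/zero of="$tmp/small/f" bs=1024 count=64 2>/dev/null
dd if=/dev/zero of="$tmp/big/f" bs=1024 count=4096 2>/dev/null

cd "$tmp" || exit 1
# One summarized line per entry, smallest first, so the last line is the largest folder.
du -sh ./* | sort -h
```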
(4) Sometimes the size is inconsistent with ls -l
“ls -l” gives the logical size (how many bytes the file contains); “du” gives the physical size (how much disk space is actually allocated to it).
“du” can be larger than “ls”, because “ls” shows only the size of the data, while “du” also counts the indirect blocks that store pointers to the data and rounds each file up to whole blocks.
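Block rounding is the easiest way to see du report more than ls. On most filesystems, even a 1-byte file is given at least one whole block, commonly 4 KiB, although the exact size depends on the filesystem. This is a minimal sketch assuming GNU coreutils, because stat -c and --apparent-size are GNU extensions. tiny.txt is a hypothetical file name.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'x' > "$tmp/tiny.txt"   # exactly 1 byte of data

ls -l "$tmp/tiny.txt"                  # logical size: 1 byte
du -h --apparent-size "$tmp/tiny.txt"  # also the logical size
du -h "$tmp/tiny.txt"                  # allocated size: usually a whole block, e.g. 4.0K
# %s = logical size in bytes, %b = blocks allocated, %B = bytes per block counted by %b
stat -c '%s bytes logical, %b blocks of %B bytes allocated' "$tmp/tiny.txt"
```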
“ls” can be larger than “du” (especially for sparse files), because holes, i.e., ranges of zeros that were never written, have no blocks allocated and are not counted by “du”. To see the full (apparent) size, run “du -ch --apparent-size”.
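A sparse file shows the opposite case concretely. truncate (GNU coreutils) sets a file's length without writing any data, so the whole file is one hole. In this sketch, sparse.img is a hypothetical name and 100 MiB is an arbitrary size. The allocated size can vary by filesystem, and a few filesystems do not support holes at all.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Extend an empty file to 100 MiB without writing data: the whole file is a hole.
truncate -s 100M "$tmp/sparse.img"

ls -l "$tmp/sparse.img"                  # logical size: 104857600 bytes
du -h "$tmp/sparse.img"                  # allocated size: typically 0
du -h --apparent-size "$tmp/sparse.img"  # logical size again (100M)
```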
From the du man page:
(--apparent-size: print apparent sizes, rather than disk usage; although the apparent size is usually smaller, it may be larger due to holes in ('sparse') files, internal fragmentation, indirect blocks, and the like)
For example, a sparse graph dataset I copied over shows 27 GB with “ls -l” but 17 GB with “du -ch”; with “du -ch --apparent-size”, it comes to 27 GB.
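To check which files in a copied dataset are sparse, GNU find can print each file's sparseness. Its manual defines this value as BLOCKSIZE*st_blocks/st_size, and values below 1.0 normally indicate holes. This sketch assumes GNU find. The 1.0 threshold and the file names are illustrative choices. Very small files usually report values above 1 because of block rounding, so they do not show up. The value is undefined for empty files.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
truncate -s 10M "$tmp/holes.bin"                                    # sparse: all hole
dd if=/dev/zero of="$tmp/dense.bin" bs=1024 count=1024 2>/dev/null  # fully written

# Print "<sparseness><TAB><path>" for regular files, then keep paths below 1.0.
find "$tmp" -type f -printf '%S\t%p\n' | awk -F'\t' '$1 < 1.0 { print $2 }'
```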
Good links for further reading:
- https://www.linuxquestions.org/questions/linux-newbie-8/why-is-du-command-showing-incorrect-results-4175528726/
- https://unix.stackexchange.com/questions/94386/wrong-output-of-du
- http://dysphoria.net/OperatingSystems1/5_file_allocation_unix.html
- http://dysphoria.net/OperatingSystems1/5_file_allocation_index.html (indexed allocation used in Linux)
Indexed allocation used in Linux
Another possibility is to store all the pointers to a file’s blocks in a single array, rather like a page table lists all of the frames in which a process’s pages are stored. The array of disk blocks could be stored directly in the directory entry or inode, or could be stored in a disk block by itself. Advantages: it allows random access (with two levels of indexing, only three accesses are needed to read a block, fewer for smaller files), and it doesn’t require a FAT, so it is suitable for very large disks.
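A little arithmetic gives the three-accesses figure and the size limit of such an index. Purely for illustration, assume 4 KiB blocks ($B = 4096$ bytes) and 4-byte block pointers ($p = 4$). Under those assumptions, a two-level index works out as:

```latex
% Illustrative assumptions: block size B = 4096 bytes, pointer size p = 4 bytes
\begin{aligned}
P &= \frac{B}{p} = \frac{4096}{4} = 1024 \quad \text{pointers per index block} \\
N_{\max} &= P^2 = 1024^2 = 1\,048\,576 \quad \text{data blocks} \\
S_{\max} &= N_{\max} \cdot B = 1\,048\,576 \times 4096\ \text{bytes} = 4\ \text{GiB} \\
A &= \underbrace{1}_{\text{top index}} + \underbrace{1}_{\text{second index}} + \underbrace{1}_{\text{data}} = 3 \quad \text{disk accesses per block read}
\end{aligned}
```

Classic ext2/ext3 inodes combine this scheme with 12 direct pointers plus single, double, and triple indirect pointers. As a result, small files need fewer accesses, which matches the "fewer for smaller files" remark. ext4 uses extents by default instead.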
Graphic demonstrations of the inode, indirect blocks, single indirect blocks, and more.