Saturday, August 13, 2005

Large directories and NTFS

Use case:

  1. Create new directory.
  2. Put on the order of 55 thousand files in it.
  3. Do dir directoryName >someTempFile.txt

What, you expected it to finish quickly? I don't think so. On this machine, with a fast hard drive, it takes several minutes.
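
In case you want to try this at home, here is a rough Python sketch of the experiment (the bigdir name, the sequential file names, and the exact count are just placeholders; the actual test was a plain dir at the command prompt):

  import os, time

  target = "bigdir"      # placeholder directory name
  count = 55000          # "on the order of 55 thousand files"

  os.makedirs(target, exist_ok=True)
  for i in range(count):
      # create an empty file with a plain sequential name
      open(os.path.join(target, "file%05d.txt" % i), "w").close()

  start = time.time()
  entries = os.listdir(target)    # roughly what "dir bigdir" has to do
  print("listed %d entries in %.1f seconds" % (len(entries), time.time() - start))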

Excuse: it's because the directory is fragmented. Good observation! Chances are that directory is badly fragmented by now. Solution: defragment the whole drive and try again.

It still takes over a minute.

Excuse: it's because the hard drive does not have anything cached after that defragmentation run. Ok, fine. Solution: try again immediately so everything is still cached.

It still takes 17 seconds.

No more excuses now. For extra pain, try adding a new file to that folder. Even that takes much longer than the blink of an eye it should.
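
Timing that single extra file is just as easy to sketch (same placeholder bigdir as above):

  import os, time

  start = time.time()
  open(os.path.join("bigdir", "one_more_file.txt"), "w").close()
  print("created 1 file in %.3f seconds" % (time.time() - start))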

Think of the implications for file server work. It means that a hash-bucketed folder layout will be way cheaper than one huge folder, even after paying the price of hashing every filename to figure out which bucket it lives in.
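
Something like this minimal sketch, say, assuming 256 buckets and CRC32 as the hash; both are arbitrary choices for illustration, nothing NTFS-specific:

  import os, zlib

  ROOT = "store"       # hypothetical root of the file store
  BUCKETS = 256        # arbitrary bucket count

  def bucket_path(filename):
      # Hash the filename to pick one of BUCKETS small subfolders.
      # Every lookup pays this hashing cost, but no folder ever gets huge.
      b = zlib.crc32(filename.encode("utf-8")) % BUCKETS
      return os.path.join(ROOT, "%02x" % b, filename)

  def store(filename, data):
      path = bucket_path(filename)
      os.makedirs(os.path.dirname(path), exist_ok=True)
      with open(path, "wb") as f:
          f.write(data)

  def load(filename):
      with open(bucket_path(filename), "rb") as f:
          return f.read()

With 256 buckets, those 55 thousand files work out to roughly 215 files per folder, comfortably below the point where the slowdown starts to bite.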

And the slowdown doesn't arrive all of a sudden, no sir. It creeps in, hiding in plain sight until it's too late. Put 1,000 files in a folder and you will notice that adding files isn't fast anymore. Remember all those dollars you spent on the newest hardware upgrades? Well, by 5,000 files, you will be reminiscing about 300 bps modems.

Quick: which are the largest folders in your system? Really? Does it hurt now?

So in the extremely rare case of a folder growing larger than "small", users have to implement a hash-bucket folder structure themselves, because the OS can't handle the exceptional case efficiently.

BTrees sure sound cool now...
