Fog Creek Software
Discussion Board




Restrictions on # of files in a Windows Directory?

I am designing an application that will store many separate user data files. There will be several tens of thousands of users. Each user will have a collection of their own data files, each in a directory dedicated to that user.

My question is basically: what would I sacrifice in terms of file system reliability and performance by having all such user directories in one directory, versus concocting a multi-level tree of user directories?

I.e.: if it is problematic to have several thousand separate directory entries in one directory, I could envision a directory structure in which all user IDs ending in '0' go to a directory called c:\userdata\0, user IDs ending in '1' go to a directory called c:\userdata\1, etc. Or use more digits from the end of the user ID for greater granularity: c:\userdata\000, c:\userdata\001, etc.
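For concreteness, a rough sketch of that bucketing (Python; the root path and the digit count are just placeholders):

import os

USERDATA_ROOT = r"c:\userdata"   # placeholder root
BUCKET_DIGITS = 1                # '0'..'9' buckets; set to 3 for '000'..'999'

def user_dir(user_id):
    # Bucket on the trailing digits of the numeric user ID,
    # e.g. user 1234567 with one digit -> c:\userdata\7\1234567
    bucket = str(user_id % 10 ** BUCKET_DIGITS).zfill(BUCKET_DIGITS)
    return os.path.join(USERDATA_ROOT, bucket, str(user_id))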

But I would prefer the simplest approach - one mass of user directories - if there is no real performance or reliability hit in doing so. I feel that it is prudent to know any de facto limits, if there are any.

Assume Windows 2000 Server or Windows Server 2003 and NTFS.

Thanks.

Bored Bystander
Thursday, July 17, 2003

Have you ever scrolled through a directory with thousands of entries? It's not fun. :-)

I'd split them up, perhaps even going with the first two letters (otherwise S, T, and E will still be huge). Most 4GLs make creating folders like this trivial.

BTW, thinking about it - I'd only go with the first two letters if you can create the folders on demand (so there are only folders with content).
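A sketch of that on-demand approach (Python; the helper name and paths are made up):

import os

def folder_for(username, root=r"c:\userdata"):
    prefix = (username[:2] or "_").lower()   # first two letters as the bucket
    path = os.path.join(root, prefix, username)
    os.makedirs(path, exist_ok=True)         # bucket folders appear only once they have content
    return path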

Philo

Philo
Thursday, July 17, 2003

For some reason performance gets terrible around 1,000-3,000 files. I'm not sure why, but you can literally start getting file I/O errors around 5,000.

It's worth taking the time to come up with an algorithm that ensures that you never put more than 100 items in a folder.
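One sketch of such an algorithm (Python; the hash choice and bucket counts are my own assumptions, not anything Joel specified):

import os
import zlib

def bucketed_path(root, user_id):
    # Two levels of 00-99 buckets = 10,000 leaf folders, so even ~100,000
    # users average about ten entries per folder.
    h = zlib.crc32(str(user_id).encode("ascii"))
    level1 = "%02d" % (h % 100)
    level2 = "%02d" % ((h // 100) % 100)
    return os.path.join(root, level1, level2, str(user_id))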

Joel Spolsky
Friday, July 18, 2003

OK, I figured that there would be a performance hit. I recall the old DOS limit of 255 directory entries at the root, and I thought perhaps that NTFS and NT technology had surmounted that sort of limit.

Yeah, admittedly, thousands of directory entries would not be a pretty thing...

Thanks, guys.

Bored Bystander
Friday, July 18, 2003

NTFS has no such limit. I can understand performance degrading, but not throwing random file I/O errors (I would expect that kind of behavior from FAT, but not NTFS).

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, July 18, 2003

I've seen servers with over 100,000 files in a folder; NTFS copes fine, it's just Explorer that struggles to view them.

Tony E
Friday, July 18, 2003

Could Joel give us more details? Which operating system causes degradation at the 1,000 to 3,000 number? Does it matter if the disk is indexed or not?

The advantage of just using one directory, particularly with the web, is so great that full details of the drawbacks would be greatly appreciated.

Stephen Jones
Friday, July 18, 2003

I have been working with some test datasets like this; FAT is majorly slow, but NTFS is much slicker.

If you are dividing them by first letter or something, it is easy to later spin whole subsets off to other disks, so it scales better. Maybe that is easier on *nix, though.

constructive comment
Friday, July 18, 2003

"The advantage of just using one directory, particularly with the web, is so great"

?

Can you expand on that thought?

Philo

Philo
Friday, July 18, 2003

Unbroken links

Stephen Jones
Friday, July 18, 2003

Why would links break if they were in subdirectories?

(FWIW, I have a vested interest in this topic - I've got 60k+ documents in a very structured directory arrangement)

Philo

Philo
Friday, July 18, 2003

Because if you ever decide to change the subdirectories then you have a broken link.

I try to subdivide storage folders to fewer than 100 files per folder, but if I need to access them programmatically I keep the tree structure to a minimum. Less chance of SNAFUs.

Stephen Jones
Friday, July 18, 2003

Can't find the reference now but I seem to recall that NTFS performance is optimal with rather shallow directory trees and short filenames.

Just me (Sir to you)
Friday, July 18, 2003

"Because if you ever decide to change the subdirectories then you have a broken link."

[shrug] And if you ever decide to rename your files, you have a broken link. Seems like a straw man to me.

Philo

Philo
Friday, July 18, 2003

No, more smoke and mirrors.

Organizing things in a hierarchy is great. I do it and can more or less find any file I want just by looking at the file names. But putting that mental model into the path name is the equivalent of hard-coding text strings.

How do you know that the secretary, or the boss, won't decide to reorganize your files? He or she will have no idea how links work.

Stephen Jones
Friday, July 18, 2003

Bored:

"My question is basically - what would I sacrifice in terms of file system reliability and performance by having all such user directories in one directory, versus concocting a multi level tree of user directories?"

I have direct experience with this. If you have many tens of thousands of users, this approach simply won't work (i.e., it won't scale).

The reason is that NTFS has a limit to the size of its directory entry space for a given directory. At an average name length of about 32 characters, I have been unable to create more than about 18,000 subdirectories in a given directory.

I never bothered to figure out the exact limit (since I abandoned my strategy when I hit the wall), but there _is_ a limit on the number of subdirectories a given directory may have.

(this was with Win2k, not sure what advances have occurred since then...YMMV)

c++_conventioner
Friday, July 18, 2003

I don't think there should be a practical limit, but there can be issues to deal with.

from: http://groups.google.com/groups?q=NTFS+maximum+number+of+subdirectories&start=10&hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&selm=eeLvLnkvBHA.2384%40tkmsftngp05&rnum=13

"Short file names
Every time you create a file with a long file name, NTFS creates a second
file entry that has a similar 8.3 short file name. A file with an 8.3 short
file name has a file name containing 1 to 8 characters and a file name
extension containing 1 to 3 characters. The file name and file name
extension are separated by a period.

If you have a large number of files (300,000 or more) in a folder, and the
files have long file names with the same initial characters, the time
required to create the files increases. The increase occurs because NTFS
bases the short file name on the first six characters of the long file name.
In folders with more than 300,000 files, the short file names start to
conflict after NTFS uses all of the 8.3 names that are similar to the long
file names. Repeated conflicts between a generated short file name and
existing short file names cause NTFS to regenerate the short file name from
6 to 8 times.

To reduce the time required to create files, you can use the fsutil behavior
set disable8dot3 command to disable the creation of 8.3 short file names.
(You must restart your computer for this setting to take effect.)

<<This command isn't available in W2K, but there is a way to do this in the
registry; I don't have that info handy. There should be a KB article on
this. Also, don't disable short file names on a server that stores a DFS
root.>>

If you want NTFS to generate 8.3 names, you can improve performance by using
a naming scheme in which long file names differ at the beginning instead of
at the end of the name.

Folder structure
NTFS supports volumes with large numbers of files and folders, so create a
folder structure that works best for your organization. Some guidelines to
consider when designing a folder structure include:

  * Avoid putting a large number of files into a folder if you use programs
that create, delete, open, or close files quickly or frequently. The better
solution is to logically separate the files into folders so that you can
distribute the workload on multiple folders at a time.
  * If there is no way to logically separate the files into folders, put all
the files into one folder, and then disable 8.3 file name generation. If you
must use 8.3 names, use a file naming scheme that ensures that the first six
characters are unique.

Important

  * The time required to run Chkdsk.exe increases with larger folders."

See also http://www.winntmag.com/Articles/Index.cfm?IssueID=27&ArticleID=3455 for a good description of what directories in NTFS actually are.
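To make the "differ at the beginning" advice above concrete, a tiny sketch (Python; the naming convention here is invented for illustration):

def doc_file_name(doc_id, label, ext="dat"):
    # "00001234-report.dat" rather than "report-00001234.dat": the generated
    # 8.3 short names then differ within their first six characters.
    return "%08d-%s.%s" % (doc_id, label, ext)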

Just me (Sir to you)
Friday, July 18, 2003

You may want to revisit your design. Is there any other storage mechanism available besides thousands of files per user? What about a database? Having so many files seems very unwieldy, inefficient, and prone to problems.

Of course, I have no idea of the business problem you are trying to solve.

DJ
Friday, July 18, 2003

Just out of curiosity, why don't you store the data as text in a DB? Or you could store the binary file as a BLOB in a DB. ADO has a binary write stream object. Wouldn't that be optimal?

shiggins
Friday, July 18, 2003

Why not divide your users up into small segments of 50-100 users and then compress them? You'll get fewer files, and presumably all those thousands of users won't be logged on at the same time, so your performance hit wouldn't be all *that* great - right? Worth a check...
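A rough sketch of how that could look (Python's zipfile module; the segment size and archive layout are just guesses):

import os
import zipfile

SEGMENT_SIZE = 100   # roughly 50-100 users per archive

def archive_for(user_id, root=r"c:\userdata"):
    return os.path.join(root, "segment_%05d.zip" % (user_id // SEGMENT_SIZE))

def read_user_file(user_id, filename, root=r"c:\userdata"):
    # Files are stored inside the archive under "<user_id>/<filename>".
    with zipfile.ZipFile(archive_for(user_id, root)) as zf:
        return zf.read("%d/%s" % (user_id, filename))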

Mickey Petersen
Friday, July 18, 2003

On the design: OK, after reading some of the issues and considering "do I really want to have thousands of files in one directory?", I think the thing to do here would be to partition the files into directories of a few hundred each (at most) by using a hierarchical directory structure.

On the design issue of saving separate files: yeah, I know that separate files sound much cruder and lower tech than a database with BLOBs. I would have been very surprised not to see any mention of BLOBs. My response is that NTFS *is* a database already, is it not? A database of files, that is. And with NTFS I don't have to worry about corruption of BLOBs, or about retrieving each BLOB and restoring its data to a file before the application that uses the data can work with it, etc. A simple disk backup will save the files. And I can compress the folder through NTFS.

Bored Bystander
Friday, July 18, 2003

"Why not divide your users up into small segments"

Too messy.


Monday, July 21, 2003
