File ID support on Windows sucks.

On my current project I need reliable file identifiers, a binary blob that is unique to each physical file (not each file name) on the system, including those on local and remote volumes.

On the *nix platforms, the st_dev and st_ino fields work well. At any point in time on a running system, the st_dev-st_ino combinatation is guaranteed unique for a given file. Hard links can reliably be detected. Of course there a few edge cases that can screw things up, most notably mounting windows file shares.

On Windows, file-id support sucks. The first major problem is that there is no equivalent to st_dev, anywhere. In kernel mode you can fake it by hashing the ptr to the volumes VPB or DEVICE_OBJECT, but in user mode the only thing you can get at is the name of the object, the volume create time, and the volume serial number.

The general consensus by Microsoft seems to be that the volume serial number is what you should use, but this fails pretty quickly. Network shares return the serial number of the source file system volume at the _root_ of the share, but you get duplicate file ids if the share contains junctions or mount points. Even with local volumes, all you have to do is copy a VHD file and mount both copies to show how the volume serial number is pretty useless as a st_dev replacement.

A hash of the native object volume/device name seems about the only option, but getting this data cannot be done efficiently, or deterministically. I expect I will end up using then native object namespace prefix with heuristics to try to handle the issues with redirector style file systems.

Oh, and ReFS no longer guarantees it’s 64 bit file ids are even volume unique. For that you need to use the new Windows 8 info-types to query the 128 bit file id. It sure would have been nice for them to at least provide a hash compressed 64 bit file-id. Handling this case is going to slow things down even further.

Leave a Reply