Many times I need to enumerate the files on my disk, or in a folder and its subfolders, but that has always been slow. All the file enumeration techniques walk the disk structures, querying each file name and then moving on to the next one. Windows file indexing has taken this to another level of speed: you can query and filter your data almost instantaneously. There is one pitfall: it only works on the indexed parts of the disk (usually the libraries and Windows folders), so it is unusable for your data folders unless you add them to the indexer.
Wouldn't it be nice to have a database that stores all the files in the system and is updated as they change? Some apps, like Copernic Desktop Search or X1 Search, do exactly that: they keep a database that indexes your system and can run fast queries for you. The pitfall is that they don't offer an API you can integrate into your programs, so the only way to query the files is to use their apps.
At one point, Microsoft planned something like a database of files, called WinFS (Windows Future Storage), but the project was cancelled. So we have to stick with our current APIs to query files. Or do we? As a matter of fact, there is something in Windows that allows us to query files very fast: the NTFS MFT (Master File Table).
The NTFS MFT is a structure used internally by the file system that describes every file on the volume, and it can be queried very fast. It was designed to be fast and safe (NTFS keeps a mirror copy, $MFTMirr, of the first MFT records in case they get corrupted), and we can access it to enumerate our files. But some things should be noted when accessing the MFT:
- The MFT is only available on NTFS volumes, so you cannot use this technique on FAT drives.
- To access the MFT structures, you must have elevated privileges: a normal user won't be able to read them.
- With great power comes great responsibility (a quote from the Spider-Man comics), so be aware that accessing the internal NTFS structures may harm your system irreversibly. Use the code with care, and don't blame me if something goes wrong. (Here's a suggestion to the Windows API designers: why not create APIs that query the NTFS structures safely for normal users? They could even be added to UWP programming.)
Accessing the MFT structure
There is no formal API to access the MFT structure. You have to decipher its layout (there is a lot of material about it) and read the data directly from the raw disk, which is why you need elevated privileges. This means a lot of work and hours of trial and error.
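To give an idea of what deciphering the structure involves, here is a minimal sketch (not part of any library) that decodes the first sector of an NTFS volume, the boot sector, to find where the MFT starts. It assumes the 512 bytes have already been read from the raw volume (for example by opening \\.\C: with admin rights, which is not shown), and it uses the standard boot-sector offsets: bytes per sector at 0x0B, sectors per cluster at 0x0D and the MFT's starting cluster at 0x30. The class name NtfsBootSector is hypothetical.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical helper: decodes the NTFS boot sector fields needed to locate the MFT.
public static class NtfsBootSector
{
    // The OEM ID at offset 3 is "NTFS" padded with four spaces.
    public static bool IsNtfs(byte[] sector) =>
        sector.Length >= 512 &&
        Encoding.ASCII.GetString(sector, 3, 8) == "NTFS    ";

    // Byte offset of the $MFT from the start of the volume.
    public static long GetMftByteOffset(byte[] sector)
    {
        if (!IsNtfs(sector))
            throw new ArgumentException("Not an NTFS boot sector", nameof(sector));

        // Multi-byte fields on disk are little-endian.
        ushort bytesPerSector = BinaryPrimitives.ReadUInt16LittleEndian(sector.AsSpan(0x0B, 2));
        byte sectorsPerCluster = sector[0x0D];
        long mftCluster = BinaryPrimitives.ReadInt64LittleEndian(sector.AsSpan(0x30, 8));

        // Values above 0x80 use an encoded form for very large clusters; not handled here.
        if (sectorsPerCluster == 0 || sectorsPerCluster > 0x80)
            throw new NotSupportedException("Unsupported cluster size encoding");

        return mftCluster * sectorsPerCluster * bytesPerSector;
    }
}
```

From that offset you would still have to parse each file record (typically 1 KB) and its attributes, which is the hard part a library handles for you.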
Fortunately, there are some libraries that do that in C#, and I've used one of them, NtfsReader, which is licensed under the LGPL. You can use it as a library in your compiled work without having to release your own code under the LGPL. If you include the library's source code in your code, it becomes "derived work" and you must distribute all the code under the LGPL.
We will create a WPF program that shows the disk usage. It will enumerate all files and show them in a list, so you can see what's taking space on your disk. You will be able to select any of the NTFS disks in your machine.
Open Visual Studio with administrative rights (this is very important, or you won't be able to debug your program). Then create a new WPF project and add a new item of type Application Manifest File: an app.manifest file will be added to your project. Then change its requestedExecutionLevel tag to:
<requestedExecutionLevel level="requireAdministrator" uiAccess="false" />
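Even with the manifest, it can be useful to check at runtime whether the process is really elevated before touching the MFT, so you can show a friendly message instead of an exception. This is a minimal sketch using the standard WindowsIdentity and WindowsPrincipal APIs. The Elevation class name is my own, and the OperatingSystem.IsWindows check assumes .NET 5 or later (on .NET Framework you can drop it).

```csharp
using System;
using System.Security.Principal;

// Hypothetical helper: is the current process running with an administrator token?
public static class Elevation
{
    public static bool IsElevated()
    {
        // WindowsIdentity only works on Windows; elsewhere report "not elevated".
        if (!OperatingSystem.IsWindows())
            return false;

        using var identity = WindowsIdentity.GetCurrent();
        var principal = new WindowsPrincipal(identity);
        // With UAC, this is true only when the token is elevated ("Run as administrator").
        return principal.IsInRole(WindowsBuiltInRole.Administrator);
    }
}
```

In the app, you could call Elevation.IsElevated() in the MainWindow constructor and disable the drive combobox when it returns false.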
The next step is to detect the NTFS disks in your system. This is done with this code:
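// Skip drives that aren't ready (e.g. an empty DVD drive): reading DriveFormat on them throws an IOException
var ntfsDrives = DriveInfo.GetDrives()
    .Where(d => d.IsReady && d.DriveFormat == "NTFS").ToList();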
Then add the NtfsReader project to the solution and add a reference to it in the WPF project. Next, add the following UI to MainWindow.xaml:
<Grid>
<Grid.RowDefinitions>
<RowDefinition Height="40"/>
<RowDefinition Height="*"/>
<RowDefinition Height="30"/>
</Grid.RowDefinitions>
<StackPanel Orientation="Horizontal" Margin="5">
<TextBlock Text="Drive" VerticalAlignment="Center"/>
<ComboBox x:Name="DrvCombo" Margin="5,0" Width="100"
VerticalContentAlignment="Center"/>
</StackPanel>
<ListBox x:Name="FilesList" Grid.Row="1"
VirtualizingPanel.IsVirtualizing="True"
VirtualizingPanel.IsVirtualizingWhenGrouping="True"
>
<ListBox.ItemTemplate>
<DataTemplate>
<StackPanel Orientation="Horizontal">
<TextBlock Text="{Binding FullName}"
Margin="5,0" Width="450"/>
<TextBlock Text="{Binding Size,StringFormat=N0}"
Margin="5,0" Width="150" TextAlignment="Right"/>
<TextBlock Text="{Binding LastChangeTime, StringFormat=g}"
Margin="5,0" Width="200"/>
</StackPanel>
</DataTemplate>
</ListBox.ItemTemplate>
</ListBox>
<TextBlock x:Name="StatusTxt" Grid.Row="2" HorizontalAlignment="Center" Margin="5"/>
</Grid>
We will have a combobox with all the drives in the first row and a listbox with the files below it. The listbox has an ItemTemplate that shows the name, size and last change date of each file. To fill this data, add this code to MainWindow.xaml.cs:
// Needs: using System.IO; using System.Linq; using System.IO.Filesystem.Ntfs;
public MainWindow()
{
    InitializeComponent();
    // Only drives that are ready and formatted with NTFS have an MFT to read
    var ntfsDrives = DriveInfo.GetDrives()
        .Where(d => d.IsReady && d.DriveFormat == "NTFS").ToList();
    DrvCombo.ItemsSource = ntfsDrives;
    DrvCombo.SelectionChanged += DrvCombo_SelectionChanged;
}

private void DrvCombo_SelectionChanged(object sender,
    System.Windows.Controls.SelectionChangedEventArgs e)
{
    if (DrvCombo.SelectedItem != null)
    {
        var driveToAnalyze = (DriveInfo) DrvCombo.SelectedItem;
        // Reads the MFT of the selected volume (requires admin rights)
        var ntfsReader =
            new NtfsReader(driveToAnalyze, RetrieveMode.All);
        // Keep only regular files and list the largest first
        var nodes =
            ntfsReader.GetNodes(driveToAnalyze.Name)
                .Where(n => (n.Attributes &
                    (Attributes.Hidden | Attributes.System |
                     Attributes.Temporary | Attributes.Device |
                     Attributes.Directory | Attributes.Offline |
                     Attributes.ReparsePoint | Attributes.SparseFile)) == 0)
                .OrderByDescending(n => n.Size)
                .ToList();
        FilesList.ItemsSource = nodes;
    }
}
This code gets all the NTFS drives in your system and fills the combobox. In the SelectionChanged event handler, the reader gets all the nodes in the drive. These nodes are filtered to remove everything that is not a normal file (hidden, system and temporary files, directories, reparse points and so on), then ordered by size, largest first, and added to the listbox.
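The attribute test in the Where clause is a plain bit mask check: a node is kept only if none of the excluded flags is set. The sketch below shows the same logic in isolation. It uses the standard System.IO.FileAttributes enum as a stand-in for NtfsReader's Attributes enum (the flag names match those used above) and simple tuples instead of INode; NodeFilter is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class NodeFilter
{
    // Any of these flags marks an entry that is not a "normal" file.
    public const FileAttributes Excluded =
        FileAttributes.Hidden | FileAttributes.System |
        FileAttributes.Temporary | FileAttributes.Device |
        FileAttributes.Directory | FileAttributes.Offline |
        FileAttributes.ReparsePoint | FileAttributes.SparseFile;

    // A normal file has none of the excluded bits set.
    public static bool IsNormalFile(FileAttributes attributes) =>
        (attributes & Excluded) == 0;

    // Filter, then sort by size, largest first (same steps as the handler).
    public static List<(string Name, long Size)> LargestFirst(
        IEnumerable<(string Name, long Size, FileAttributes Attributes)> entries) =>
        entries.Where(e => IsNormalFile(e.Attributes))
               .OrderByDescending(e => e.Size)
               .Select(e => (e.Name, e.Size))
               .ToList();
}
```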
If you run the program, you will notice some things:
- In the Output window of Visual Studio, you will see debug messages like these:
1333.951 MB of volume metadata has been read in 26.814 s at 49.748 MB/s
1324082 nodes have been retrieved in 2593.669 ms
This means that reading the 1.3 GB of volume metadata took about 27 s, and building the 1.3 million nodes from it took another 2.6 s (pretty fast for 1.3 million files, isn't it?).
- When you change the drive in the combobox, the program freezes for some time and then the list is filled with the files. The freeze happens because the analysis blocks the main (UI) thread. To avoid it, you should run the code in a secondary thread, like this:
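// Needs: using System; using System.Collections.Generic; using System.IO; using System.Linq;
// using System.Threading.Tasks; using System.IO.Filesystem.Ntfs;
private async void DrvCombo_SelectionChanged(object sender,
    System.Windows.Controls.SelectionChangedEventArgs e)
{
    if (DrvCombo.SelectedItem != null)
    {
        var driveToAnalyze = (DriveInfo) DrvCombo.SelectedItem;
        DrvCombo.IsEnabled = false;
        StatusTxt.Text = "Analyzing drive";
        try
        {
            // Read and filter the MFT on a thread-pool thread; ToList forces the
            // query to run there instead of later on the UI thread
            List<INode> nodes = await Task.Run(() =>
            {
                var ntfsReader =
                    new NtfsReader(driveToAnalyze, RetrieveMode.All);
                return ntfsReader.GetNodes(driveToAnalyze.Name)
                    .Where(n => (n.Attributes &
                        (Attributes.Hidden | Attributes.System |
                         Attributes.Temporary | Attributes.Device |
                         Attributes.Directory | Attributes.Offline |
                         Attributes.ReparsePoint | Attributes.SparseFile)) == 0)
                    .OrderByDescending(n => n.Size).ToList();
            });
            // Back on the UI thread after the await
            FilesList.ItemsSource = nodes;
            StatusTxt.Text = $"{nodes.Count} files listed. " +
                $"Total size: {nodes.Sum(n => (double)n.Size):N0}";
        }
        catch (Exception ex)
        {
            StatusTxt.Text = $"Error analyzing drive: {ex.Message}";
        }
        finally
        {
            // Re-enable the combobox even if the analysis fails
            DrvCombo.IsEnabled = true;
        }
    }
}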
This code runs the analysis in a task, so the UI doesn't freeze. I also disabled the combobox and showed a status message for the user. When the task completes, the nodes list is assigned to the listbox and the UI is re-enabled.
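The pattern used here (awaiting Task.Run and always restoring the UI afterwards) can be factored into a small helper. This is a sketch with hypothetical names (BackgroundScan.RunAsync, setBusy). The key detail is calling ToList inside the task: a LINQ query is lazy, so without it the filtering and sorting would actually run later, on the UI thread, when the ListBox enumerates the items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BackgroundScan
{
    // Runs a heavy (possibly lazy) producer on the thread pool and returns the
    // materialized result, so an async UI handler can await it without blocking.
    public static async Task<List<T>> RunAsync<T>(
        Func<IEnumerable<T>> produce, Action<bool> setBusy)
    {
        setBusy(true);
        try
        {
            // ToList forces the query to execute on the worker thread.
            return await Task.Run(() => produce().ToList());
        }
        finally
        {
            // Always restore the UI state, even if the scan throws.
            setBusy(false);
        }
    }
}
```

In the handler, setBusy could be a lambda that toggles DrvCombo.IsEnabled and StatusTxt.Text; since the await resumes on the UI thread in WPF, the finally block can safely touch the controls.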
This code shows the list of the largest files on your disk, but you may want to analyze it in other ways, like grouping by extension or by folder. WPF has an easy way to group and sort data: the CollectionViewSource. With it, you can group and sort the items in the ListBox. We will change our UI to add a new ComboBox to select the grouping:
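<Grid>
<Grid.Resources>
<!-- Assumes the default xmlns:local mapping to the project's namespace in MainWindow.xaml -->
<local:ItemsSizeConverter x:Key="ItemsSizeConverter"/>
</Grid.Resources>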
<Grid.RowDefinitions>
<RowDefinition Height="40"/>
<RowDefinition Height="*"/>
<RowDefinition Height="30"/>
</Grid.RowDefinitions>
<StackPanel Orientation="Horizontal" Margin="5">
<TextBlock Text="Drive" VerticalAlignment="Center"/>
<ComboBox x:Name="DrvCombo" Margin="5,0" Width="100"
VerticalContentAlignment="Center"/>
</StackPanel>
<StackPanel Grid.Row="0" HorizontalAlignment="Right" Orientation="Horizontal" Margin="5">
<TextBlock Text="Sort" VerticalAlignment="Center"/>
<ComboBox x:Name="SortCombo" Margin="5,0" Width="100"
VerticalContentAlignment="Center" SelectedIndex="0"
SelectionChanged="SortCombo_OnSelectionChanged">
<ComboBoxItem>Size</ComboBoxItem>
<ComboBoxItem>Extension</ComboBoxItem>
<ComboBoxItem>Folder</ComboBoxItem>
</ComboBox>
</StackPanel>
<ListBox x:Name="FilesList" Grid.Row="1"
VirtualizingPanel.IsVirtualizing="True"
VirtualizingPanel.IsVirtualizingWhenGrouping="True" >
<ListBox.ItemTemplate>
<DataTemplate>
<StackPanel Orientation="Horizontal">
<TextBlock Text="{Binding FullName}"
Margin="5,0" Width="450"/>
<TextBlock Text="{Binding Size,StringFormat=N0}"
Margin="5,0" Width="150" TextAlignment="Right"/>
<TextBlock Text="{Binding LastChangeTime, StringFormat=g}"
Margin="5,0" Width="200"/>
</StackPanel>
</DataTemplate>
</ListBox.ItemTemplate>
<ListBox.GroupStyle>
<GroupStyle>
<GroupStyle.HeaderTemplate>
<DataTemplate>
<StackPanel Orientation="Horizontal">
<TextBlock Text="{Binding Name}" FontSize="15" FontWeight="Bold"
Margin="5,0"/>
<TextBlock Text="(" VerticalAlignment="Center" Margin="5,0,0,0" />
<TextBlock Text="{Binding Items.Count}" VerticalAlignment="Center"/>
<TextBlock Text=" files - " VerticalAlignment="Center"/>
<TextBlock Text="{Binding Items,
Converter={StaticResource ItemsSizeConverter}, StringFormat=N0}"
VerticalAlignment="Center"/>
<TextBlock Text=" bytes)" VerticalAlignment="Center"/>
</StackPanel>
</DataTemplate>
</GroupStyle.HeaderTemplate>
</GroupStyle>
</ListBox.GroupStyle>
</ListBox>
<TextBlock x:Name="StatusTxt" Grid.Row="2" HorizontalAlignment="Center" Margin="5"/>
</Grid>
The combobox has three options: Size, Extension and Folder. The first one is what we've had until now; the second groups the files by extension and the third groups them by top-level folder. We've also added a GroupStyle to the listbox: without it, the data would be grouped, but the group headers wouldn't be shown. In the GroupStyle header we show the group name, then the number of items (files) in the group, and then a TextBlock bound to the group's Items through a converter, because we want to show the total size in bytes of the group. The converter has to be declared as a resource with the key ItemsSizeConverter (for example, in the Grid's resources) so that the StaticResource lookup can find it. It converts the Items in the group to the sum of the file sizes:
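using System;
using System.Collections.ObjectModel;
using System.Globalization;
using System.IO.Filesystem.Ntfs;
using System.Linq;
using System.Windows.Data;

public class ItemsSizeConverter : IValueConverter
{
    // value is the Items collection of the group being displayed
    public object Convert(object value, Type targetType, object parameter,
        CultureInfo culture)
    {
        var items = value as ReadOnlyObservableCollection<object>;
        // Sum the sizes of all the files in the group
        return items?.Sum(n => (double) ((INode)n).Size);
    }
    public object ConvertBack(object value, Type targetType, object parameter,
        CultureInfo culture)
    {
        throw new NotImplementedException();
    }
}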
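Outside WPF, the converter's job is easy to check: the group's Items arrive as a ReadOnlyObservableCollection<object>, so each item has to be cast back to the node type before summing. In this sketch, FileNode is a hypothetical record standing in for INode, with Size assumed to be an unsigned 64-bit value; casting to double makes LINQ's Sum usable, since it has no overload for ulong.

```csharp
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical stand-in for NtfsReader's INode (only what the converter needs).
public record FileNode(string FullName, ulong Size);

public static class GroupTotals
{
    // Same logic as ItemsSizeConverter.Convert: null when value isn't a group's Items.
    public static double? TotalSize(object value)
    {
        var items = value as ReadOnlyObservableCollection<object>;
        return items?.Sum(n => (double)((FileNode)n).Size);
    }
}
```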
The SelectionChanged handler for the sort combobox is:
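// Needs: using System.ComponentModel; using System.Windows.Controls; using System.Windows.Data;
private void SortCombo_OnSelectionChanged(object sender,
    SelectionChangedEventArgs e)
{
    // Nothing to sort or group until a drive has been analyzed
    if (_view == null)
        return;
    _view.GroupDescriptions.Clear();
    _view.SortDescriptions.Clear();
    switch (SortCombo.SelectedIndex)
    {
        // case 0 (Size): no grouping, keep the list ordered by size
        case 1: // Extension: group by the file's extension
            _view.GroupDescriptions.Add(new PropertyGroupDescription("FullName",
                new FileExtConverter()));
            break;
        case 2: // Folder: sort by full name, then group by top-level folder
            _view.SortDescriptions.Add(new SortDescription("FullName",
                ListSortDirection.Ascending));
            _view.GroupDescriptions.Add(new PropertyGroupDescription("FullName",
                new FilePathConverter()));
            break;
    }
}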
We add a GroupDescription for each kind of grouping. As the nodes shown in the listbox don't have extension or top-folder properties, I've created two converters that get them from the full name. The converter that gets the extension from the name is:
using System;
using System.Globalization;
using System.IO;
using System.Windows.Data;

class FileExtConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter,
        CultureInfo culture)
    {
        var fileName = value as string;
        // Lower-case the extension so ".JPG" and ".jpg" land in the same group
        return string.IsNullOrWhiteSpace(fileName) ?
            null :
            Path.GetExtension(fileName).ToLowerInvariant();
    }
    public object ConvertBack(object value, Type targetType, object parameter,
        CultureInfo culture)
    {
        throw new NotImplementedException();
    }
}
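Conceptually, a PropertyGroupDescription with a converter groups the items by whatever the converter returns, much like LINQ's GroupBy with a key selector. This sketch reproduces the Extension view with GroupBy, using the same key as FileExtConverter and computing the count and total size that the group header shows. ExtensionReport and the tuple shape are hypothetical, and ordering the groups by total size is a choice of this sketch; the view in the app doesn't sort the groups.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ExtensionReport
{
    // Same key FileExtConverter produces: lower-cased extension, "" when none.
    static string ExtensionKey(string fullName) =>
        Path.GetExtension(fullName).ToLowerInvariant();

    // Returns (extension, file count, total bytes), largest groups first.
    public static List<(string Ext, int Count, long Bytes)> ByExtension(
        IEnumerable<(string FullName, long Size)> files) =>
        files.GroupBy(f => ExtensionKey(f.FullName))
             .Select(g => (Ext: g.Key, Count: g.Count(), Bytes: g.Sum(f => f.Size)))
             .OrderByDescending(g => g.Bytes)
             .ToList();
}
```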
The converter that gets the top-level folder of the file is:
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Windows.Data;

class FilePathConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter,
        CultureInfo culture)
    {
        var fileName = value as string;
        return string.IsNullOrWhiteSpace(fileName) ?
            null :
            GetTopPath(fileName);
    }
    // Keeps the drive plus the first folder: "C:\Windows\System32\x.dll" -> "C:\Windows"
    private string GetTopPath(string fileName)
    {
        var paths = fileName.Split(Path.DirectorySeparatorChar).Take(2);
        return string.Join(Path.DirectorySeparatorChar.ToString(), paths);
    }
    public object ConvertBack(object value, Type targetType, object parameter,
        CultureInfo culture)
    {
        throw new NotImplementedException();
    }
}
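One side effect of taking the first two path segments is that a file stored directly in the root, such as C:\notes.txt, becomes its own "folder" group. If you'd rather collect root files under the drive itself, a variant of GetTopPath could look like the sketch below. TopFolder is a hypothetical class, and it splits on '\' explicitly because NTFS full names use backslashes (on Windows, Path.DirectorySeparatorChar is '\' anyway).

```csharp
using System.Linq;

public static class TopFolder
{
    // NTFS full names use '\' regardless of where this code runs.
    const char Sep = '\\';

    // "C:\Windows\System32\x.dll" -> "C:\Windows"; files directly in the root
    // ("C:\notes.txt") are grouped under the root itself ("C:\").
    public static string Get(string fullName)
    {
        var parts = fullName.Split(Sep);
        return parts.Length <= 2
            ? parts[0] + Sep
            : string.Join(Sep.ToString(), parts.Take(2));
    }
}
```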
One last thing: declare the _view field and set it when we fill the listbox:
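// Needs: using System; using System.Collections.Generic; using System.ComponentModel;
// using System.IO; using System.Linq; using System.Threading.Tasks;
// using System.Windows.Data; using System.IO.Filesystem.Ntfs;

// The view over the ListBox items, used to sort and group them
private ICollectionView _view;

private async void DrvCombo_SelectionChanged(object sender,
    System.Windows.Controls.SelectionChangedEventArgs e)
{
    if (DrvCombo.SelectedItem != null)
    {
        var driveToAnalyze = (DriveInfo) DrvCombo.SelectedItem;
        DrvCombo.IsEnabled = false;
        StatusTxt.Text = "Analyzing drive";
        try
        {
            List<INode> nodes = await Task.Run(() =>
            {
                var ntfsReader =
                    new NtfsReader(driveToAnalyze, RetrieveMode.All);
                return ntfsReader.GetNodes(driveToAnalyze.Name)
                    .Where(n => (n.Attributes &
                        (Attributes.Hidden | Attributes.System |
                         Attributes.Temporary | Attributes.Device |
                         Attributes.Directory | Attributes.Offline |
                         Attributes.ReparsePoint | Attributes.SparseFile)) == 0)
                    .OrderByDescending(n => n.Size).ToList();
            });
            FilesList.ItemsSource = nodes;
            _view = CollectionViewSource.GetDefaultView(FilesList.ItemsSource);
            // A new view has no grouping: re-apply the option selected in SortCombo
            SortCombo_OnSelectionChanged(SortCombo, null);
            StatusTxt.Text = $"{nodes.Count} files listed. " +
                $"Total size: {nodes.Sum(n => (double)n.Size):N0}";
        }
        catch (Exception ex)
        {
            _view = null;
            StatusTxt.Text = $"Error analyzing drive: {ex.Message}";
        }
        finally
        {
            DrvCombo.IsEnabled = true;
        }
    }
    else
    {
        _view = null;
    }
}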
With all this in place, you can run the app, list the largest files on a drive, and group them by extension or by folder.
Conclusions
As you can see, there is a way to get fast file enumeration on your NTFS disks, but you must have admin privileges to use it. We've created a WPF program that uses this kind of enumeration and lets you group the data in different ways using WPF's collection views. If you need to enumerate your files very fast, you can consider this approach.
The full source code for this article is at https://github.com/bsonnino/NtfsFileEnum
Hi Bruno,
First, great, great article and code! A big thank you.
Objective: compare against the scanned-files log sums output by a backup tool.
My challenge is to have a utility that computes just two things, the total number of files and the total size, and dumps them to an ASCII file from the command line, with no GUI.
The challenge: it must accept exclusions that shouldn't be counted (from backup policies), which could be file extensions and/or full directories.
My question is: will excluding subdirectories slow things down a lot on file systems with millions of files?
Example:
Inputs:
Input 1) Exclude file extensions list (not counted/sized): *.tmp, *.bck
Input 2) Exclude specific folders: c:\temp\ || c:\windows\temp || etc.
c:\ rootdir (scan) + subdirs, no limit
  |_ subdir1 (scan) + subdirs, no limit
  |_ subdir2 (scan) + subdirs, no limit
      |_ subdir2.1 (scan) + subdirs, no limit
          \_ subdir2.1.1 (scan) + subdirs, no limit
  |_ subdir3 (scan) + subdirs, no limit
      |_ subdir3.1 (NOT scanned)
  |_ subdir4 (scan) + subdirs, no limit
Is this doable fast? The need is to compare huge file systems with backup outputs within the backup window, so du.exe or any other utilities that scan the file system are too slow.
Cheers, and thank you again
Andre
Yes, I think you can do it, given the limitations of the method: you need admin rights, and both the source and destination drives must be formatted with NTFS.
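Not part of the original reply, but to make the idea concrete: once the MFT has been enumerated, the exclusions are just string tests on each full path, done in memory with no extra disk access, so excluding folders doesn't require walking the tree again. The cost is roughly one prefix comparison per file and excluded folder, plus an extension lookup. This is a minimal sketch with hypothetical names (BackupTotals.Compute) working on (FullName, Size) pairs, such as those you could build from the enumerated nodes; writing the result to a text file from a console app is left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class BackupTotals
{
    // Counts files and bytes, skipping excluded extensions and folders.
    public static (long Files, long Bytes) Compute(
        IEnumerable<(string FullName, long Size)> files,
        IEnumerable<string> excludedExtensions,   // with leading dot, e.g. ".tmp"
        IEnumerable<string> excludedFolders)      // e.g. @"C:\temp"
    {
        var exts = new HashSet<string>(excludedExtensions, StringComparer.OrdinalIgnoreCase);
        // End every folder with '\' so "C:\temp" does not also exclude "C:\temporary".
        var folders = excludedFolders.Select(f => f.TrimEnd('\\') + "\\").ToList();

        long count = 0, bytes = 0;
        foreach (var (name, size) in files)
        {
            if (exts.Contains(Path.GetExtension(name)))
                continue;
            if (folders.Any(f => name.StartsWith(f, StringComparison.OrdinalIgnoreCase)))
                continue;
            count++;
            bytes += size;
        }
        return (count, bytes);
    }
}
```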
Thanks Bruno. Great article. I will cite your prototype and data as proof of concept in a performance issue for a project.
Hello,
I get the error “Unable to read volume information”
at System.IO.Filesystem.Ntfs.NtfsReader.ReadFile
My drive C: is NTFS formatted.
Are you running as Admin (elevated)? That needs admin rights.
yes
private unsafe void ReadFile(byte* buffer, ulong len, ulong absolutePosition)
{
NtfsReader.NativeOverlapped lpOverlapped = new NtfsReader.NativeOverlapped(absolutePosition);
uint lpNumberOfBytesRead;
if (!NtfsReader.ReadFile(this._volumeHandle, (IntPtr) (void*) buffer, (uint) len, out lpNumberOfBytesRead, ref lpOverlapped))
throw new Exception("Unable to read volume information");
Why are you using NTFS structures to read the contents of the file? In fact, the article shows how to use NTFS to enumerate files, not to read their contents.
There are other ways to read files quickly (e.g. memory-mapped files). You could try those instead of going down the low-level path.
I'd like to read only the file names and sizes of all the files on a drive, very fast.
That’s exactly what I’m doing in the article: compile the code at GitHub and you will see the filenames and sizes of the files in your NTFS drives