Skip to content
Bruno Sonnino
Menu
  • Home
  • About
Menu

Super fast file enumeration with NTFS structures

Posted on 22 March 2019

Many times I need to enumerate the files in my disk or in a folder and subfolders, but that always has been slow. All the file enumeration techniques go through the disk structures querying the file names and going to the next one. With the Windows file indexing, this has gone to another level of speed: you can query and filter your data almost instantaneously, with one pitfall: it only works in the indexed parts of the disk (usually the libraries and Windows folders), being unusable for your data folders, unless you add them to the indexer:

Wouldn't it be nice to have a database that stores all files in the system and is updated as the files change? Some apps, like Copernic Desktop Search or X1 Search do exactly that: they have a database that indexes your system and can do fast queries for you. The pitfall is that you don't have an API to integrate to your programs, so the only way you have is to query the files is to use their apps.

At some time, Microsoft thought of doing something like a database of files, creating what was called WinFS - Windows Future Storage, but the project was cancelled. So, we have to stick with our current APIs to query files. Or no? As a matter of fact there is something in the Windows system that allows us to query the files in a very fast way, and it's called the NTFS MFT (NT file system master file table).

The NTFS MFT is a file structure use internally by Windows that allows querying files in a very fast way. It was designed to be fast and safe (there are two copies of the MFT, in case one of them gets corrupt), and we can access it to get our files enumerated. But some things should be noted when accessing the MFT:

  • The MFT is only available for NTFS volumes. So, you cannot access FAT drives with this API
  • To access the MFT structures, you must have elevated privileges - a normal user won't be able to access it
  • With great power comes great responsibility (this is a SpiderMan comic book quote), so you should know that accessing the internal NTFS structures may harm you system irreversively - use the code with care, and don't blame me if something goes wrong (but here's a suggestion to Windows API designers: why not create some APIs that query the NTFS structures safely for normal users? That could be even be added to UWP programming).

Acessing the MFT structure

There is no formal API to access the MFT structure. You will have to decipher the structure (there is a lot of material here) and access the data using raw disk data read (that's why you need elevated privileges). This is a lot of work and hours of trial and error.

Fortunately, there are some libraries that do that in C#, and I've used this one, which is licensed as LGPL. You can use the library in your compiled work as a library, with no restriction. If you include the library source code in your code, it will be "derived work" and you must distribute all code as LGPL.

We will create a WPF program that will show the disk usage. It will enumerate all files and show them in the list, so you can see what's taking space in your disk. You will be able to select any of the NTFS disks in your machine.

Open Visual Studio with administrative rights (this is very important, or you won't be able to debug your program). Then create a new WPF project and add a new item. Choose Application Manifest File, you will have an app.manifest file added to your project. Then, you must change the requestedExecutionLevel tag of the file to:

<requestedExecutionLevel level="requireAdministrator" uiAccess="false" />
XML

The next step is to detect the NTFS disks in your system. This is done with this code:

var ntfsDrives = DriveInfo.GetDrives()
    .Where(d => d.DriveFormat == "NTFS").ToList();
C#

Then, add the NtfsReader project to the solution and add a reference to it in the WPF project. Then, add the following UI in MainWindow.xaml.cs:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
        <RowDefinition Height="30"/>
    </Grid.RowDefinitions>
    <StackPanel Orientation="Horizontal" Margin="5">
        <TextBlock Text="Drive" VerticalAlignment="Center"/>
        <ComboBox x:Name="DrvCombo" Margin="5,0" Width="100" 
                  VerticalContentAlignment="Center"/>
    </StackPanel>
    <ListBox x:Name="FilesList" Grid.Row="1" 
             VirtualizingPanel.IsVirtualizing="True"
             VirtualizingPanel.IsVirtualizingWhenGrouping="True"
             >
        <ListBox.ItemTemplate>
            <DataTemplate>
                <StackPanel Orientation="Horizontal">
                    <TextBlock Text="{Binding FullName}" 
                               Margin="5,0" Width="450"/>
                    <TextBlock Text="{Binding Size,StringFormat=N0}" 
                               Margin="5,0" Width="150" TextAlignment="Right"/>
                    <TextBlock Text="{Binding LastChangeTime, StringFormat=g}" 
                               Margin="5,0" Width="200"/>
                </StackPanel>
            </DataTemplate>
        </ListBox.ItemTemplate>
    </ListBox>
    <TextBlock x:Name="StatusTxt" Grid.Row="2" HorizontalAlignment="Center" Margin="5"/>
</Grid>
XML

We will have a combobox with all drives in the first row and a listbox with the files. The listbox has an ItemTemplate that will show the name, size and date of last change of each file. To fill this data, you will have to add this code in MainWindow.xaml.cs:

public MainWindow()
{
    InitializeComponent();
    var ntfsDrives = DriveInfo.GetDrives()
        .Where(d => d.DriveFormat == "NTFS").ToList();
    DrvCombo.ItemsSource = ntfsDrives;
    DrvCombo.SelectionChanged += DrvCombo_SelectionChanged;
}

private void DrvCombo_SelectionChanged(object sender, 
    System.Windows.Controls.SelectionChangedEventArgs e)
{
    if (DrvCombo.SelectedItem != null)
    {
        var driveToAnalyze = (DriveInfo) DrvCombo.SelectedItem;
        var ntfsReader =
            new NtfsReader(driveToAnalyze, RetrieveMode.All);
        var nodes =
            ntfsReader.GetNodes(driveToAnalyze.Name)
                .Where(n => (n.Attributes & 
                             (Attributes.Hidden | Attributes.System | 
                              Attributes.Temporary | Attributes.Device | 
                              Attributes.Directory | Attributes.Offline | 
                              Attributes.ReparsePoint | Attributes.SparseFile)) == 0)
                .OrderByDescending(n => n.Size);
        FilesList.ItemsSource = nodes;
    }
}
C#

It gets all NTFS drives in your system and fills the combobox. In the SelectionChanged event handler, the reader gets all nodes in the drive. These nodes are filtered to remove all that are not normal files and then ordered descending by size and added to the listbox.

If you run the program you will see some things:

  • If you look at the output window in Visual Studio, you will see these debug messages:
1333.951 MB of volume metadata has been read in 26.814 s at 49.748 MB/s
1324082 nodes have been retrieved in 2593.669 ms

This means that it took 2.6s to read and analyze all files in the disk (pretty fast for 1.3 million files, no?).

  • When you change the drive in the combobox, the program will freeze for some time and the list will be filled with the files. The freezing is due to the fact that you are blocking the main thread while you are analyzing the disk. To avoid this, you should run the code in a secondary thread, like this code:
private async void DrvCombo_SelectionChanged(object sender, 
    System.Windows.Controls.SelectionChangedEventArgs e)
{
    if (DrvCombo.SelectedItem != null)
    {
        var driveToAnalyze = (DriveInfo) DrvCombo.SelectedItem;
        DrvCombo.IsEnabled = false;
        StatusTxt.Text = "Analyzing drive";
        List<INode> nodes = null;
        await Task.Factory.StartNew(() =>
        {
            var ntfsReader =
                new NtfsReader(driveToAnalyze, RetrieveMode.All);
            nodes =
                ntfsReader.GetNodes(driveToAnalyze.Name)
                    .Where(n => (n.Attributes &
                                 (Attributes.Hidden | Attributes.System |
                                  Attributes.Temporary | Attributes.Device |
                                  Attributes.Directory | Attributes.Offline |
                                  Attributes.ReparsePoint | Attributes.SparseFile)) == 0)
                    .OrderByDescending(n => n.Size).ToList();
        });
        FilesList.ItemsSource = nodes;
        DrvCombo.IsEnabled = true;
        StatusTxt.Text = $"{nodes.Count} files listed. " +
                         $"Total size: {nodes.Sum(n => (double)n.Size):N0}";
    }
}
C#

This code creates a task and runs the analyzing code in it, and doesn't freeze the UI. I just took care of disabling the combobox and putting a warning for the user. After the code is run, the nodes list is assigned to the listbox and the UI is re-enabled.

This code can show you the list of the largest files in your disk, but you may want to analyze it by other ways, like grouping by extension or by folder. WPF has an easy way to group and show data: the CollectionViewSource. With it, you can do grouping and sorting in the ListBox. We will change our UI to add a new ComboBox to show the new groupings:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="40"/>
        <RowDefinition Height="*"/>
        <RowDefinition Height="30"/>
    </Grid.RowDefinitions>
    <StackPanel Orientation="Horizontal" Margin="5">
        <TextBlock Text="Drive" VerticalAlignment="Center"/>
        <ComboBox x:Name="DrvCombo" Margin="5,0" Width="100" 
                  VerticalContentAlignment="Center"/>
    </StackPanel>
    <StackPanel Grid.Row="0" HorizontalAlignment="Right" Orientation="Horizontal" Margin="5">
        <TextBlock Text="Sort" VerticalAlignment="Center"/>
        <ComboBox x:Name="SortCombo" Margin="5,0" Width="100" 
                  VerticalContentAlignment="Center" SelectedIndex="0" 
                  SelectionChanged="SortCombo_OnSelectionChanged">
            <ComboBoxItem>Size</ComboBoxItem>
            <ComboBoxItem>Extension</ComboBoxItem>
            <ComboBoxItem>Folder</ComboBoxItem>
        </ComboBox>
    </StackPanel>
    <ListBox x:Name="FilesList" Grid.Row="1" 
             VirtualizingPanel.IsVirtualizing="True"
             VirtualizingPanel.IsVirtualizingWhenGrouping="True" >
        <ListBox.ItemTemplate>
            <DataTemplate>
                <StackPanel Orientation="Horizontal">
                    <TextBlock Text="{Binding FullName}" 
                               Margin="5,0" Width="450"/>
                    <TextBlock Text="{Binding Size,StringFormat=N0}" 
                               Margin="5,0" Width="150" TextAlignment="Right"/>
                    <TextBlock Text="{Binding LastChangeTime, StringFormat=g}" 
                               Margin="5,0" Width="200"/>
                </StackPanel>
            </DataTemplate>
        </ListBox.ItemTemplate>
        <ListBox.GroupStyle>
            <GroupStyle>
                <GroupStyle.HeaderTemplate>
                    <DataTemplate>
                        <StackPanel Orientation="Horizontal">
                            <TextBlock Text="{Binding Name}" FontSize="15" FontWeight="Bold" 
                                       Margin="5,0"/>
                            <TextBlock Text="(" VerticalAlignment="Center" Margin="5,0,0,0" />
                            <TextBlock Text="{Binding Items.Count}" VerticalAlignment="Center"/>
                            <TextBlock Text=" files - " VerticalAlignment="Center"/>
                            <TextBlock Text="{Binding Items, 
                                Converter={StaticResource ItemsSizeConverter}, StringFormat=N0}"
                                         VerticalAlignment="Center"/>
                            <TextBlock Text=" bytes)" VerticalAlignment="Center"/>
                        </StackPanel>
                    </DataTemplate>
                </GroupStyle.HeaderTemplate>
            </GroupStyle>
        </ListBox.GroupStyle>
    </ListBox>
    <TextBlock x:Name="StatusTxt" Grid.Row="2" HorizontalAlignment="Center" Margin="5"/>
</Grid>
XML

The combobox has three options, Size, Extension and Folder. The first one is the same thing we've had until now; the second will group the files by extension and the third will group the files by top folder. We've also added a GroupStyle to the listbox. If we don't do that, the data will be grouped, but the groups won't be shown. If you notice the GroupStyle, you will see that we're adding the name, then the count of the items (number of files in the group), then we have a third TextBox where we pass the Items and a converter. That's because we want to show the total size in bytes of the group. For that, I've created a converter that converts the Items in the group to the sum of the bytes of the file:

public class ItemsSizeConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        var items = value as ReadOnlyObservableCollection<object>;
        return items?.Sum(n => (double) ((INode)n).Size);
    }

    public object ConvertBack(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        throw new NotImplementedException();
    }
}
C#

The code for the SelectionChanged for the sort combobox is:

private void SortCombo_OnSelectionChanged(object sender, 
    SelectionChangedEventArgs e)
{
    if (_view == null)
        return;
    _view.GroupDescriptions.Clear();
    _view.SortDescriptions.Clear();
    switch (SortCombo.SelectedIndex)
    {
        case 1:
            _view.GroupDescriptions.Add(new PropertyGroupDescription("FullName", 
                new FileExtConverter()));
            break;
        case 2:
            _view.SortDescriptions.Add(new SortDescription("FullName",
                ListSortDirection.Ascending));
            _view.GroupDescriptions.Add(new PropertyGroupDescription("FullName", 
                new FilePathConverter()));
            break;
    }
}
C#

We add GroupDescriptions for each kind of group. As we don't have the extension and top path properties in the nodes shown in the listbox, I've created two converters to get these from the full name. The converter that gets the extension from the name is:

class FileExtConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        var fileName = value as string;
        return string.IsNullOrWhiteSpace(fileName) ? 
            null : 
            Path.GetExtension(fileName).ToLowerInvariant();
    }

    public object ConvertBack(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        throw new NotImplementedException();
    }
}
C#

The converter that gets the top path of the file is:

class FilePathConverter :IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        var fileName = value as string;
        return string.IsNullOrWhiteSpace(fileName) ?
            null :
            GetTopPath(fileName);
    }

    private string GetTopPath(string fileName)
    {
        var paths = fileName.Split(Path.DirectorySeparatorChar).Take(2);
        return string.Join(Path.DirectorySeparatorChar.ToString(), paths);
    }

    public object ConvertBack(object value, Type targetType, object parameter, 
        CultureInfo culture)
    {
        throw new NotImplementedException();
    }
}
C#

One last thing is to create the _view field, when we are filling the listbox:

 private async void DrvCombo_SelectionChanged(object sender, 
     System.Windows.Controls.SelectionChangedEventArgs e)
 {
     if (DrvCombo.SelectedItem != null)
     {
         var driveToAnalyze = (DriveInfo) DrvCombo.SelectedItem;
         DrvCombo.IsEnabled = false;
         StatusTxt.Text = "Analyzing drive";
         List<INode> nodes = null;
         await Task.Factory.StartNew(() =>
         {
             var ntfsReader =
                 new NtfsReader(driveToAnalyze, RetrieveMode.All);
             nodes =
                 ntfsReader.GetNodes(driveToAnalyze.Name)
                     .Where(n => (n.Attributes &
                                  (Attributes.Hidden | Attributes.System |
                                   Attributes.Temporary | Attributes.Device |
                                   Attributes.Directory | Attributes.Offline |
                                   Attributes.ReparsePoint | Attributes.SparseFile)) == 0)
                     .OrderByDescending(n => n.Size).ToList();
         });
         FilesList.ItemsSource = nodes;
         _view = (CollectionView)CollectionViewSource.GetDefaultView(FilesList.ItemsSource);

         DrvCombo.IsEnabled = true;
         StatusTxt.Text = $"{nodes.Count} files listed. " +
                          $"Total size: {nodes.Sum(n => (double)n.Size):N0}";
     }
     else
     {
         _view = null;
     }
 }
C#

With all these in place, you can run the app and get a result like this:

Conclusions

As you can see, there is a way to get fast file enumeration for your NTFS disks, but you must have admin privileges to use it. We've created a WPF program that uses this kind of enumeration and allows you to group the data in different ways, using the WPF resources. If you need to enumerate your files very fast, you can consider this way to do it.

The full source code for this article is at https://github.com/bsonnino/NtfsFileEnum

11 thoughts on “Super fast file enumeration with NTFS structures”

  1. Andre says:
    28 November 2019 at 15:11

    Hi Bruno ,

    First great great article and code ! A big Thank you.

    Objective: Compare Scanned Files Log Sum output from backup tool.
    My challenge is to have some utility that computes just two(2) things: Total No. of Files + Total Size + Dump to ascii file by command line. no gui.

    The challenge : must receive excludes no to compute [from backup policies] , could be file extensions and/or full directories.

    My question is , excluding sub dir will slowdown a lot file systems with millions of files.

    Example:

    inputs:
    Input1) Exclude_File_Extensions_list (not count/size) *.tmp, *.bck
    input2 ) Exclude Specific Folders c:\temp\ ||c:\windows\temp ||etc

    c: \ roodir (scan) + subdirs not limit
    |_ subdir1(scan) + subdirs not limit
    |_subdir2(scan) + subdirs not limit
    |_subdir2.1 (scan)+ subdirs not limit
    \_subdir2.1.1(scan)+ subdirs not limit
    |_subdir3(scan)+ subdirs not limit
    |_subdir3.1(NOT_scan)
    |_subdir4(scan)+ subdirs not limit

    IS this doable fast ? The need is to compare huge filesystems with backup outputs within backup window. So du.exe, or any other utilities that scan file system are to slow.

    Cheers , and thank you again

    Andre

    Reply
    1. bsonnino says:
      5 January 2020 at 13:56

      Yes, I think so – you can do it, provided the limitations of the method – you need admin rights and both the source and destination drives should be formatted with NTFS

      Reply
  2. David G Horsman says:
    7 October 2020 at 21:51

    Thanks Bruno. Great article. I will cite your prototype and data as proof of concept in a performance issue for a project.

    Reply
  3. Pingback: Discover what is consuming your disk with NTFS structures – Bruno Sonnino
  4. niko says:
    13 September 2023 at 02:08

    Hello,
    i get the error “Unable to read volume information”
    at System.IO.Filesystem.Ntfs.NtfsReader.ReadFile
    my drive c: is ntfs Format

    Reply
    1. sonnino@gmail.com says:
      14 September 2023 at 07:41

      Are you running as Admin (elevated)? That needs admin rights.

      Reply
  5. niko says:
    16 September 2023 at 05:56

    yes

    Reply
  6. niko says:
    16 September 2023 at 06:10

    private unsafe void ReadFile(byte* buffer, ulong len, ulong absolutePosition)
    {
    NtfsReader.NativeOverlapped lpOverlapped = new NtfsReader.NativeOverlapped(absolutePosition);
    uint lpNumberOfBytesRead;
    if (!NtfsReader.ReadFile(this._volumeHandle, (IntPtr) (void*) buffer, (uint) len, out lpNumberOfBytesRead, ref lpOverlapped))
    throw new Exception(“Unable to read volume information”);

    Reply
    1. sonnino@gmail.com says:
      16 September 2023 at 09:04

      Why are you using NTFS structures to read the contents of the file? In fact, the article shows how to use NTFS to enumerate files, not to read their contents.

      There are other ways to read files in a fast way (ex. Memory mapped files). You could try this, instead of going on the low level path

      Reply
  7. niko says:
    16 September 2023 at 23:53

    i like to read only the filenames and size of the files of a complete drive in a very fast way

    Reply
    1. sonnino@gmail.com says:
      17 September 2023 at 07:47

      That’s exactly what I’m doing in the article: compile the code at GitHub and you will see the filenames and sizes of the files in your NTFS drives

      Reply

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

  • May 2025
  • December 2024
  • October 2024
  • August 2024
  • July 2024
  • June 2024
  • November 2023
  • October 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023
  • November 2022
  • October 2022
  • September 2022
  • August 2022
  • June 2022
  • April 2022
  • March 2022
  • February 2022
  • January 2022
  • July 2021
  • June 2021
  • May 2021
  • April 2021
  • March 2021
  • February 2021
  • January 2021
  • December 2020
  • October 2020
  • September 2020
  • April 2020
  • March 2020
  • January 2020
  • November 2019
  • September 2019
  • August 2019
  • July 2019
  • June 2019
  • April 2019
  • March 2019
  • February 2019
  • January 2019
  • December 2018
  • November 2018
  • October 2018
  • September 2018
  • August 2018
  • July 2018
  • June 2018
  • May 2018
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • June 2017
  • May 2017
  • March 2017
  • February 2017
  • January 2017
  • December 2016
  • November 2016
  • October 2016
  • September 2016
  • August 2016
  • July 2016
  • June 2016
  • May 2016
  • April 2016
  • March 2016
  • February 2016
  • October 2015
  • August 2013
  • May 2013
  • February 2012
  • January 2012
  • April 2011
  • March 2011
  • December 2010
  • November 2009
  • June 2009
  • April 2009
  • March 2009
  • February 2009
  • January 2009
  • December 2008
  • November 2008
  • October 2008
  • July 2008
  • March 2008
  • February 2008
  • January 2008
  • December 2007
  • November 2007
  • October 2007
  • September 2007
  • August 2007
  • July 2007
  • Development
  • English
  • Português
  • Uncategorized
  • Windows

.NET AI Algorithms asp.NET Backup C# Debugging Delphi Dependency Injection Desktop Bridge Desktop icons Entity Framework JSON Linq Mef Minimal API MVVM NTFS Open Source OpenXML OzCode PowerShell Sensors Silverlight Source Code Generators sql server Surface Dial Testing Tools TypeScript UI Unit Testing UWP Visual Studio VS Code WCF WebView2 WinAppSDK Windows Windows 10 Windows Forms Windows Phone WPF XAML Zip

  • Entries RSS
  • Comments RSS
©2025 Bruno Sonnino | Design: Newspaperly WordPress Theme
Menu
  • Home
  • About