After using the program developed in the last post, I was thinking about some ways to optimize it. Then I went to the FileFinder class and saw this:
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

class FileFinder
{
    public async Task<ConcurrentDictionary<string, List<string>>> GetFiles(string[] paths,
        Regex excludeFilesRegex, Regex excludePathsRegex, bool incremental)
    {
        var files = new ConcurrentDictionary<string, List<string>>();
        // Scan each root path on its own task
        var tasks = paths.Select(path =>
            Task.Factory.StartNew(() =>
            {
                var rootDir = "";
                var drive = Path.GetPathRoot(path);
                if (!string.IsNullOrWhiteSpace(drive))
                {
                    // "C:\Data" becomes "C_drive\Data"
                    rootDir = drive[0] + "_drive";
                    rootDir = rootDir + path.Substring(2);
                }
                else
                    rootDir = path;
                var selectedFiles = GetFilesInDirectory(path, excludeFilesRegex, excludePathsRegex, incremental);
                files.AddOrUpdate(rootDir, selectedFiles.ToList(), (a, b) => b);
            }));
        await Task.WhenAll(tasks);
        return files;
    }

    private List<string> GetFilesInDirectory(string directory, Regex excludeFilesRegex,
        Regex excludePathsRegex, bool incremental)
    {
        var files = new List<string>();
        try
        {
            var directories = Directory.GetDirectories(directory);
            try
            {
                // Filter the files while enumerating them
                var selectedFiles = Directory.EnumerateFiles(directory).Where(f => !excludeFilesRegex.IsMatch(f.ToLower()));
                if (incremental)
                    selectedFiles = selectedFiles.Where(f => (File.GetAttributes(f) & FileAttributes.Archive) != 0);
                files.AddRange(selectedFiles);
            }
            catch
            {
                // Ignore files that cannot be read and keep going
            }
            // Recurse only into the folders that are not excluded
            foreach (var dir in directories.Where(d => !excludePathsRegex.IsMatch(d.ToLower())))
            {
                files.AddRange(GetFilesInDirectory(Path.Combine(directory, dir), excludeFilesRegex, excludePathsRegex, incremental));
            }
        }
        catch
        {
            // Ignore folders that cannot be opened
        }
        return files;
    }
}
I pass the filters to the GetFilesInDirectory method and do the filtering there. That way, the folders I don’t want are never enumerated. For that, I had to change the Config class, adding a new property for the path Regex and initializing it:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public class Config
{
    public Config(string fileName)
    {
        if (!File.Exists(fileName))
            return;
        var doc = XDocument.Load(fileName);
        if (doc.Root == null)
            return;
        IncludePaths = doc.Root.Element("IncludePaths")?.Value.Split(';');
        ExcludeFiles = doc.Root.Element("ExcludeFiles")?.Value.Split(';') ?? new string[0];
        ExcludePaths = doc.Root.Element("ExcludePaths")?.Value.Split(';') ?? new string[0];
        BackupFile = $"{doc.Root.Element("BackupFile")?.Value}{DateTime.Now:yyyyMMddhhmmss}.zip";
        // Combine each list of patterns into a single alternation
        ExcludeFilesRegex = new Regex(string.Join("|", ExcludeFiles));
        ExcludePathRegex = new Regex(string.Join("|", ExcludePaths));
    }

    public Regex ExcludeFilesRegex { get; }
    public Regex ExcludePathRegex { get; }
    public IEnumerable<string> IncludePaths { get; }
    public IEnumerable<string> ExcludeFiles { get; }
    public IEnumerable<string> ExcludePaths { get; }
    public string BackupFile { get; }
}
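To illustrate how a pattern built with string.Join("|", ...) behaves, here is a minimal sketch. The BuildExcludeRegex helper and the sample patterns are assumptions made for this example and are not part of the project. One thing worth noting: when the list is empty (the Config class falls back to new string[0] if the element is missing), the joined pattern is an empty string, and an empty regex matches every input, so a guard may be useful.

```csharp
using System;
using System.Text.RegularExpressions;

static class ExcludePatternDemo
{
    // Hypothetical helper (not in the original project): joins the patterns
    // into one alternation, or returns a regex that never matches when
    // there is nothing to exclude.
    public static Regex BuildExcludeRegex(string[] patterns)
    {
        var nonEmpty = Array.FindAll(patterns, p => !string.IsNullOrWhiteSpace(p));
        // "(?!)" is an empty negative lookahead, which always fails to match.
        return new Regex(nonEmpty.Length == 0 ? "(?!)" : string.Join("|", nonEmpty));
    }

    static void Main()
    {
        // Sample patterns, assumed for illustration only.
        var regex = BuildExcludeRegex(new[] { @"\\bin\\", @"\\obj\\" });
        Console.WriteLine(regex.IsMatch(@"c:\projects\app\bin\debug")); // True
        Console.WriteLine(regex.IsMatch(@"c:\projects\app\src"));       // False

        // An empty pattern matches everything, so every path would be excluded.
        Console.WriteLine(new Regex("").IsMatch(@"c:\projects\app\src")); // True

        // With the guard, an empty list excludes nothing.
        var none = BuildExcludeRegex(new string[0]);
        Console.WriteLine(none.IsMatch(@"c:\projects\app\src"));        // False
    }
}
```

The never-matching pattern keeps the calling code unchanged, since it can still call IsMatch without checking for null.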
With these changes, I could run the program again and measure the difference, and it was large. Before the change, the program took 160s to enumerate the files and returned 470,000 files. After the change, enumerating took only 14.5s and returned the same files, about 11 times faster (I ran each version three times to avoid distortions). That’s a huge difference, no?
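The post does not show how the timings were taken, so the following is only a sketch of one way to do it with Stopwatch, repeating the measurement a few times. The TimeRuns helper is hypothetical, and the example only counts the files at the top level of a folder rather than calling FileFinder.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

static class EnumerationTimer
{
    // Hypothetical helper: runs an action several times and returns the
    // elapsed time of each run in milliseconds.
    public static long[] TimeRuns(Action action, int runs)
    {
        var results = new long[runs];
        for (var i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            results[i] = sw.ElapsedMilliseconds;
        }
        return results;
    }

    static void Main(string[] args)
    {
        // The folder to scan is an assumption; pass any path as the first argument.
        var root = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        var count = 0;
        var times = TimeRuns(() => count = Directory.EnumerateFiles(root).Count(), 3);
        Console.WriteLine($"{count} files; runs (ms): {string.Join(", ", times)}");
    }
}
```

Repeating the run matters because the first pass over a disk is usually slower than later ones, once the file system has cached the directory data.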
Then I thought a bit more and realized that the Regex could be compiled. So I made two simple changes in the Config class:
ExcludeFilesRegex = new Regex(string.Join("|", ExcludeFiles), RegexOptions.Compiled);
ExcludePathRegex = new Regex(string.Join("|", ExcludePaths), RegexOptions.Compiled);
When I ran the program again, it took 11.5s to enumerate the files. It doesn’t seem like much, but that is about 20% less time (roughly a 25% speedup) from a simple change. That was really good. Now I have a backup program that enumerates files far faster than before.
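For anyone who wants to try the effect of RegexOptions.Compiled in isolation, here is a small sketch that runs the same pattern with and without the option. The pattern, the sample paths and the iteration count are made up for this example, and the timings will vary by machine and .NET version.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class CompiledRegexDemo
{
    // Counts how many inputs match, so both variants do the same work.
    public static int CountMatches(Regex regex, string[] inputs, int iterations)
    {
        var matches = 0;
        for (var i = 0; i < iterations; i++)
            foreach (var input in inputs)
                if (regex.IsMatch(input))
                    matches++;
        return matches;
    }

    static void Main()
    {
        // Pattern and paths are assumptions made for illustration.
        const string pattern = @"\.tmp$|\.bak$|\\obj\\";
        var inputs = new[] { @"c:\work\a.tmp", @"c:\work\obj\b.dll", @"c:\work\c.cs" };

        foreach (var options in new[] { RegexOptions.None, RegexOptions.Compiled })
        {
            var regex = new Regex(pattern, options);
            var sw = Stopwatch.StartNew();
            var matches = CountMatches(regex, inputs, 200000);
            sw.Stop();
            Console.WriteLine($"{options}: {matches} matches in {sw.ElapsedMilliseconds} ms");
        }
    }
}
```

A compiled regex costs more to construct, so it tends to pay off when the same instance is reused many times, as happens here with hundreds of thousands of file names.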
All the source code for the project is at https://github.com/bsonnino/BackupData