Some time ago I wrote a blog post on Creating OpenXml files with Delphi. It showed how to open and create Word files in Delphi using only a Zip component and an XML component.
There, I showed the structure of an OpenXml file: a zip file containing a folder structure and a set of XML files.
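To see that structure for yourself, here is a minimal sketch that lists the entries of a zip-based file with the standard ZipFile class. The helper name ListEntries and the small stand-in archive it builds are assumptions made for illustration; with a real document you would pass the path of your own .docx file instead.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: returns the entry names stored in any zip-based file.
List<string> ListEntries(string path)
{
    using var archive = ZipFile.OpenRead(path);
    return archive.Entries
        .Select(e => e.FullName)
        .OrderBy(n => n, StringComparer.Ordinal)
        .ToList();
}

// Build a tiny stand-in archive with a few entry names found in a Word file.
// This is NOT a valid Word document; it only mimics the folder layout.
var samplePath = Path.Combine(Path.GetTempPath(), "zip-structure-sample.docx");
File.Delete(samplePath);
using (var zip = ZipFile.Open(samplePath, ZipArchiveMode.Create))
{
    foreach (var name in new[] { "[Content_Types].xml", "_rels/.rels", "word/document.xml" })
    {
        using var writer = new StreamWriter(zip.CreateEntry(name).Open());
        writer.Write("<placeholder/>");
    }
}

foreach (var entry in ListEntries(samplePath))
    Console.WriteLine(entry);
```

Running ListEntries against a document saved by Word should show the same kind of layout, with more parts such as styles and settings.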
In this blog post, we will explore opening, inspecting, and creating OpenXml files using C#. We'll also discuss the benefits of using the OpenXml SDK for high-level file manipulation.
Inspecting an OpenXml file
To open this kind of file in C#, we can use the Packaging API. This API lets us access and analyze the file's parts without dealing directly with the zip archive and the XML files inside it. The following code lists the content type of every part in the file:
using System.IO.Packaging;
using var package = Package.Open(fileName, FileMode.Open, FileAccess.Read);
package.GetParts().Select(p => p.ContentType).OrderBy(p => p).ToList().ForEach(p => Console.WriteLine(p));
You have to add the System.IO.Packaging NuGet package to the project. Once you do that, the code retrieves and displays the content types of the parts in the OpenXml file. The output gives insight into the file's structure: for example, a main part with the content type "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml" indicates that we are analyzing a Word file.
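As a rough illustration of how the content types can identify the kind of document, the sketch below maps the main-part content type to an application name. The helper DetectKind and the sample list are assumptions for this example. It only covers the three standard, non-macro formats.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Main-part content types of the three common Office document kinds.
var knownMainParts = new Dictionary<string, string>
{
    ["application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"] = "Word",
    ["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"] = "Excel",
    ["application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml"] = "PowerPoint",
};

// Hypothetical helper: returns the first known document kind found in the list,
// or "Unknown" when no main-part content type is recognized.
string DetectKind(IEnumerable<string> contentTypes) =>
    contentTypes
        .Select(ct => knownMainParts.TryGetValue(ct, out var kind) ? kind : null)
        .FirstOrDefault(kind => kind != null) ?? "Unknown";

// Sample content types, similar to what the listing above prints for a Word file.
var sample = new[]
{
    "application/vnd.openxmlformats-package.core-properties+xml",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
};
Console.WriteLine(DetectKind(sample));
```

With a real file, you could feed it the package directly: DetectKind(package.GetParts().Select(p => p.ContentType)).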
To retrieve the relationships within the file, you can use the code snippet below:
using System.IO.Packaging;
using var package = Package.Open(fileName, FileMode.Open, FileAccess.Read);
package.GetRelationships().ToList()
    .ForEach(r => Console.WriteLine(
        $"{r.Id} - {r.SourceUri} - {r.TargetUri} - {r.TargetMode} - {r.RelationshipType}"));
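The relationships are also a way to find the main part without guessing its name. The sketch below follows the package-level officeDocument relationship and resolves its target to a part URI. The helper FindMainPartUri and the tiny sample package are assumptions for illustration; the sample contains an empty part, so it is not a valid Word file.

```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;

// Relationship type that points from the package root to the main document part.
const string OfficeDocumentRel =
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

// Hypothetical helper: returns the URI of the main part, or null if the
// package has no officeDocument relationship.
Uri? FindMainPartUri(Package pkg)
{
    var rel = pkg.GetRelationshipsByType(OfficeDocumentRel).FirstOrDefault();
    return rel == null
        ? null
        : PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), rel.TargetUri);
}

// Build a minimal sample package: one empty part plus the relationship to it.
var relSamplePath = Path.Combine(Path.GetTempPath(), "relationship-sample.docx");
using (var created = Package.Open(relSamplePath, FileMode.Create))
{
    var partUri = PackUriHelper.CreatePartUri(new Uri("word/document.xml", UriKind.Relative));
    created.CreatePart(partUri,
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml");
    created.CreateRelationship(partUri, TargetMode.Internal, OfficeDocumentRel);
}

using (var opened = Package.Open(relSamplePath, FileMode.Open, FileAccess.Read))
{
    Console.WriteLine(FindMainPartUri(opened));
}
```

Once you have the URI, package.GetPart(uri) gives you the part and its stream.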
Additionally, you can locate the main document part and the properties parts using the code below:
using System.IO.Packaging;
using var package = Package.Open(fileName, FileMode.Open, FileAccess.Read);
package.GetParts().Where(p => p.ContentType.Contains("main+xml")).ToList()
    .ForEach(p => Console.WriteLine(p.Uri));
package.GetParts().Where(p => p.ContentType.Contains("core-properties")).ToList()
    .ForEach(p => Console.WriteLine(p.Uri));
package.GetParts().Where(p => p.ContentType.Contains("extended-properties")).ToList()
    .ForEach(p => Console.WriteLine(p.Uri));
Once we have located those parts, we can open them and read their contents. The code below prints the XML of the core properties and the extended properties:
using System.IO.Packaging;
using System.Xml;
using var package = Package.Open(fileName, FileMode.Open, FileAccess.Read);
package.GetParts().Where(p => p.ContentType.Contains("core-properties")).ToList()
    .ForEach(p => WriteXmlToConsole(package.GetPart(p.Uri).GetStream()));
package.GetParts().Where(p => p.ContentType.Contains("extended-properties")).ToList()
    .ForEach(p => WriteXmlToConsole(package.GetPart(p.Uri).GetStream()));
// Write formatted XML to console
void WriteXmlToConsole(Stream stream)
{
    // Load the part's XML and write it back out with indentation
    var doc = new XmlDocument();
    doc.Load(stream);
    var settings = new XmlWriterSettings
    {
        Indent = true,
        IndentChars = " ",
        NewLineChars = "\r\n",
        NewLineHandling = NewLineHandling.Replace
    };
    using var writer = XmlWriter.Create(Console.OpenStandardOutput(), settings);
    doc.Save(writer);
}
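If you only need the values of the core properties rather than their raw XML, the Packaging API also exposes them through Package.PackageProperties. The sketch below writes two properties to a sample package and reads them back. The file name and property values are made up for the example, and the sample package has no document parts.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

var propsSamplePath = Path.Combine(Path.GetTempPath(), "properties-sample.docx");

// Create a sample package and set two core properties (illustrative values).
using (var created = Package.Open(propsSamplePath, FileMode.Create))
{
    created.PackageProperties.Title = "Sample title";
    created.PackageProperties.Creator = "Sample author";
}

// Reopen the package read-only and print the typed property values.
using (var opened = Package.Open(propsSamplePath, FileMode.Open, FileAccess.Read))
{
    var props = opened.PackageProperties;
    Console.WriteLine($"Title: {props.Title}");
    Console.WriteLine($"Creator: {props.Creator}");
    Console.WriteLine($"Modified: {props.Modified}");
}
```

This covers the core properties only. The extended properties are application-specific, so for those you still read the XML part as shown above.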
Creating an OpenXml file
Instead of using the low-level Packaging API, we can leverage the OpenXml SDK, which provides a higher-level API for OpenXml file manipulation. You can find the source code and documentation at https://github.com/OfficeDev/Open-XML-SDK.
With the OpenXML SDK, you don't need to manipulate packages, relationships, or properties directly. Instead, you have new classes that allow you to work with Office files directly. For example, you have WordprocessingDocument, SpreadsheetDocument, and PresentationDocument classes for working with documents, spreadsheets, or presentations. The following code snippet creates a Word file with a text sentence:
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
void CreateDoc(string filepath, string message)
{
    using (WordprocessingDocument doc = WordprocessingDocument.Create(filepath, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = doc.AddMainDocumentPart();
        mainPart.Document = new Document();
        Body body = mainPart.Document.AppendChild(new Body());
        Paragraph para = body.AppendChild(new Paragraph());
        Run run = para.AppendChild(new Run());
        run.AppendChild(new Text(message));
        para.AppendChild(new Run());
    }
}
You need to add the DocumentFormat.OpenXml package to the project.
If you call it using CreateDoc("hello.docx", "Hello World!"); you will get a Word document that contains the sentence "Hello World!".
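The SDK works in the other direction too. As a minimal sketch, the code below writes a two-paragraph document and then reads the paragraph texts back with WordprocessingDocument.Open. The helper ReadParagraphs, the file name, and the sample text are assumptions for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: returns the text of every top-level paragraph in the body.
List<string> ReadParagraphs(string filepath)
{
    using var readDoc = WordprocessingDocument.Open(filepath, false); // false = read-only
    var body = readDoc.MainDocumentPart?.Document?.Body;
    return body == null
        ? new List<string>()
        : body.Elements<Paragraph>().Select(p => p.InnerText).ToList();
}

// Write a small sample document so the example is self-contained.
var readBackPath = Path.Combine(Path.GetTempPath(), "read-back-sample.docx");
using (var newDoc = WordprocessingDocument.Create(readBackPath, WordprocessingDocumentType.Document))
{
    var mainPart = newDoc.AddMainDocumentPart();
    mainPart.Document = new Document(new Body(
        new Paragraph(new Run(new Text("Hello World!"))),
        new Paragraph(new Run(new Text("Second paragraph")))));
}

foreach (var text in ReadParagraphs(readBackPath))
    Console.WriteLine(text);
```

Elements<Paragraph>() returns only the paragraphs directly under the body. Paragraphs inside tables would need Descendants<Paragraph>() instead.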
Adding more information to the file
While the previous example created a simple file, you can extend the code to include more complex information. As an illustration, let's create a file that lists all the fonts installed on the system:
void CreateDocWithAllFonts(string filepath)
{
    using (WordprocessingDocument doc = WordprocessingDocument.Create(filepath, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = doc.AddMainDocumentPart();
        mainPart.Document = new Document();
        Body body = mainPart.Document.AppendChild(new Body());
        // Get all fonts available in the system
        var fonts = System.Drawing.FontFamily.Families.Select(f => f.Name).ToList();
        foreach (var font in fonts)
        {
            Paragraph para = body.AppendChild(new Paragraph());
            Run run = para.AppendChild(new Run());
            run.AppendChild(new Text(font));
            run.RunProperties = new RunProperties(new RunFonts() { Ascii = font });
            para.AppendChild(new Run());
        }
    }
}
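To execute this code, you'll need to add the System.Drawing.Common package to your project. Note that, from .NET 6 onward, System.Drawing.Common is supported only on Windows. The resulting file will contain a list of all the installed fonts, each one written in its own font.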
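The same run properties can carry more than the font. The sketch below is one possible variation that also sets the size and bold, and uses a fixed list of font names so that it does not depend on System.Drawing. The helper StyledParagraph, the font list, and the file name are assumptions for illustration; the listed fonts may not be installed on your machine.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: builds a paragraph with one run that uses the given
// font and size, optionally in bold.
Paragraph StyledParagraph(string text, string font, int pointSize, bool bold = false)
{
    // Child order follows the schema: fonts first, then bold, then size.
    var props = new RunProperties(new RunFonts { Ascii = font, HighAnsi = font });
    if (bold) props.AppendChild(new Bold());
    // Font size is stored in half-points, so 14 pt becomes "28".
    props.AppendChild(new FontSize { Val = (pointSize * 2).ToString() });
    return new Paragraph(new Run(props, new Text(text)));
}

// Assumed font names, used instead of querying the installed fonts.
var sampleFonts = new[] { "Arial", "Courier New", "Times New Roman" };
var styledPath = Path.Combine(Path.GetTempPath(), "styled-fonts-sample.docx");
using (var styledDoc = WordprocessingDocument.Create(styledPath, WordprocessingDocumentType.Document))
{
    var mainPart = styledDoc.AddMainDocumentPart();
    var body = new Body();
    foreach (var font in sampleFonts)
        body.AppendChild(StyledParagraph(font, font, 14, bold: font == "Arial"));
    mainPart.Document = new Document(body);
}
Console.WriteLine($"Wrote {styledPath}");
```

Building the RunProperties before the Text keeps the elements in the order the schema expects, which is why the helper passes the properties as the first child of the run.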
Conclusions
As you can see, manipulating OpenXml files in C# is made easier with the OpenXml SDK. By using the provided classes, you can work directly with the file's data without worrying about its internal structure. However, if you require more granular control, you can still utilize the low-level Packaging API to work with the raw data. This flexibility enables effortless creation and modification of OpenXml files while maintaining the ability to handle specific details when needed.
All the source code for this project is at https://github.com/bsonnino/OpenXmlCSharp
Thanks, Bruno!
Great tutorial, and exactly what I need right now.
BTW, why does everyone only show how to CREATE, but no one shows how to READ docx-files?
For example, I also need to read out all the paragraphs in a loop without losing their Range – how can I do this without OLE?
That’s very easy: in the first part of the article, I showed how to open a docx file. You can open the document.xml file and read it – it will have all the paragraphs in the document and you will be able to do what you want with them.
I would recommend that you open a Word document with a zip manager, then open the document.xml file and take a look at its contents.
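As a rough sketch of that approach, the code below parses the XML of a document.xml part with LINQ to XML and collects the text of each paragraph. The helper ReadParagraphTexts and the inline sample XML are assumptions for illustration; a real document.xml contains many more elements.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// WordprocessingML namespace used by the w:p (paragraph) and w:t (text) elements.
XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// Hypothetical helper: joins the w:t text nodes found inside each w:p element.
List<string> ReadParagraphTexts(Stream documentXml) =>
    XDocument.Load(documentXml)
        .Descendants(w + "p")
        .Select(p => string.Concat(p.Descendants(w + "t").Select(t => t.Value)))
        .ToList();

// Simplified stand-in for the contents of word/document.xml.
var sampleXml =
    "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"><w:body>" +
    "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>World!</w:t></w:r></w:p>" +
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>" +
    "</w:body></w:document>";

using var sampleStream = new MemoryStream(Encoding.UTF8.GetBytes(sampleXml));
var paragraphTexts = ReadParagraphTexts(sampleStream);
foreach (var text in paragraphTexts)
    Console.WriteLine(text);
```

With a real file, you could pass the stream of the main part instead, for example package.GetPart(new Uri("/word/document.xml", UriKind.Relative)).GetStream(). Descendants also returns paragraphs nested in tables and text boxes, so some text may appear more than once in complex documents.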