The models available in the Windows AI APIs have unlocked a whole new world of features: you can add chat to your apps, answer questions in natural language, describe images for accessibility, enhance image resolution, remove image backgrounds or extract text from an image.
These features used to be very difficult to implement, but the Windows AI APIs make them easy. One useful scenario is reading a table from an image and converting it into a text table that can be edited or processed. This article will show how to use the Windows OCR model to read a table from an image and convert it to an ASCII table.
The OCR model has been promoted to stable release status, so you can use it in your production code with version 1.7.2 of the Windows App SDK or later.
We will build a WinUI 3 app in this article, but you can also create a console or WPF app with the Windows App SDK. For more info, take a look at my previous article.
In Visual Studio, create a blank, packaged WinUI 3 app:
To use the Windows AI models, we will have to change some things:
- In the Solution Explorer, right-click the project and select Properties. Change the Target OS Version and Supported OS Version to 10.0.22621.0
- In the Solution Explorer, right-click the project dependencies and select Manage NuGet Packages. Ensure that the Microsoft.WindowsAppSDK NuGet package version is 1.7.250513003 or later; if it isn't, update it
After these changes, you are able to use the Windows AI models in your application. Don't forget to match the app's platform to the platform you are using (ARM64 or x64). Be aware that the Windows AI models only work on Copilot+ PCs with a Neural Processing Unit (NPU) capable of at least 40 TOPS of performance.
The next step is to add the UI for our app. In MainWindow.xaml, add this code:
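A minimal layout along these lines would work. The control names (`PasteButton`, `PastedImage`, `ResultText`, `StatusText`) are placeholders of my own, not necessarily those from the original project:

```xml
<Grid RowDefinitions="Auto,*,Auto">
    <!-- Button to paste the image from the clipboard -->
    <Button x:Name="PasteButton" Content="Paste" Click="PasteButton_Click" Margin="8"/>
    <Grid Grid.Row="1" ColumnDefinitions="*,*">
        <!-- Pasted image on the left, converted table on the right -->
        <Image x:Name="PastedImage" Margin="8"/>
        <ScrollViewer Grid.Column="1">
            <TextBlock x:Name="ResultText" FontFamily="Consolas" Margin="8"/>
        </ScrollViewer>
    </Grid>
    <!-- Status bar at the bottom -->
    <TextBlock x:Name="StatusText" Grid.Row="2" Margin="8"/>
</Grid>
```

A monospaced font for the result TextBlock keeps the ASCII table's columns aligned.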
We have one button to paste the image from the clipboard, an image to display the pasted image, and a TextBlock to show the converted table. At the bottom, there's a status bar.
The code to paste the image from the clipboard is:
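A sketch of what that handler could look like, using the standard WinRT clipboard and imaging APIs (`Windows.ApplicationModel.DataTransfer`, `Windows.Graphics.Imaging`, `Microsoft.UI.Xaml.Media.Imaging`). The control names `PastedImage` and `StatusText` are assumptions:

```csharp
private async void PasteButton_Click(object sender, RoutedEventArgs e)
{
    var content = Clipboard.GetContent();
    if (!content.Contains(StandardDataFormats.Bitmap))
    {
        StatusText.Text = "No image found in the clipboard";
        return;
    }
    // Get a reference to the clipboard stream and decode it into a SoftwareBitmap
    var streamReference = await content.GetBitmapAsync();
    using var stream = await streamReference.OpenReadAsync();
    var decoder = await BitmapDecoder.CreateAsync(stream);
    var bitmap = await decoder.GetSoftwareBitmapAsync();
    // Convert to a standard format the Image control (and the OCR engine) can consume
    bitmap = SoftwareBitmap.Convert(bitmap, BitmapPixelFormat.Bgra8, BitmapAlphaMode.Premultiplied);
    var source = new SoftwareBitmapSource();
    await source.SetBitmapAsync(bitmap);
    PastedImage.Source = source;
    RecognizeAndAddTable(bitmap);
}
```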
We check whether there is an image in the clipboard. If there is no image, we display a message in the status bar and return. If there is an image, we get a reference to the stream, open it, read it into a bitmap, convert the bitmap to a standard format, assign the bitmap to the Image source, and process the image to recognize the table and add it to the TextBlock.
Before using the OCR model, it must be initialized. That is done in the InitializeRecognizer method:
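A sketch of that flow, following the steps described below. The ready-state enum and method names come from the Windows App SDK AI APIs as described in this article, but exact names can differ between SDK versions, so treat this as an outline rather than a verbatim listing:

```csharp
private TextRecognizer _textRecognizer;

private async Task InitializeRecognizer()
{
    SetButtonEnabled(false);
    var readyState = TextRecognizer.GetReadyState();
    if (readyState is AIFeatureReadyState.NotSupportedOnCurrentSystem
                   or AIFeatureReadyState.DisabledByUser)
    {
        StatusText.Text = "Text recognition is not available on this system";
        return;
    }
    if (readyState == AIFeatureReadyState.EnsureNeeded)
    {
        // The model needs to be downloaded or updated; this can take a while,
        // so show the reported progress in the status bar
        var operation = TextRecognizer.EnsureReadyAsync();
        operation.Progress = (_, progress) =>
            DispatcherQueue.TryEnqueue(() =>
                StatusText.Text = $"Downloading model: {progress:P0}");
        await operation;
    }
    _textRecognizer = await TextRecognizer.CreateAsync();
    SetButtonEnabled(true);
    StatusText.Text = "Ready";
}
```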
This method disables the Paste button and gets the TextRecognizer state with GetReadyState. If GetReadyState returns NotSupportedOnCurrentSystem or DisabledByUser, a message is displayed in the status bar and the method returns. If there is an update for the model, GetReadyState will return EnsureNeeded and we must call EnsureReadyAsync, which will download the model. As this can be a lengthy operation, it reports the progress of the operation, which is shown in the Progress handler. Once the model finishes downloading, an instance is initialized with CreateAsync. SetButtonEnabled is:
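Something as simple as this, marshalled to the UI thread (the button name `PasteButton` is an assumption):

```csharp
private void SetButtonEnabled(bool enabled) =>
    // TryEnqueue ensures the property is set on the UI thread,
    // since InitializeRecognizer may complete on a background thread
    DispatcherQueue.TryEnqueue(() => PasteButton.IsEnabled = enabled);
```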
The InitializeRecognizer method is called when the UI is loaded. In the constructor of MainWindow.xaml.cs, add:
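One way to hook that up. A WinUI 3 Window has no Loaded event of its own, so this assumes the root element in MainWindow.xaml has been named `RootGrid`:

```csharp
public MainWindow()
{
    this.InitializeComponent();
    // Kick off the (async) model initialization once the UI is loaded
    RootGrid.Loaded += async (_, _) => await InitializeRecognizer();
}
```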
RecognizeAndAddTable is:
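A sketch of the method, assuming a helper (here called `GetCells`) that converts the recognized lines into the RecognizedCell array, and `ResultText` as the output TextBlock. The ImageBuffer factory name may vary between SDK versions:

```csharp
private void RecognizeAndAddTable(SoftwareBitmap bitmap)
{
    // Wrap the bitmap in an ImageBuffer and run the OCR model on it
    var imageBuffer = ImageBuffer.CreateForSoftwareBitmap(bitmap);
    var recognizedText = _textRecognizer.RecognizeTextFromImage(imageBuffer);

    // Turn the recognized lines into cells, assign rows and columns,
    // and render the ASCII table
    var cells = GetCells(recognizedText.Lines);
    var rows = SetRows(cells);
    SetCols(cells);
    ResultText.Text = CreateTable(cells, rows);
}
```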
The first two lines are all you need to recognize the text in the image: create an ImageBuffer and pass it as a parameter to _textRecognizer.RecognizeTextFromImage. This method returns all the recognized text in the Lines property. Each RecognizedLine in the result has the text, the bounding box, and the words for the piece of recognized text.
In our case, we don't need the individual words, just the text and the bounding boxes. We transform the Lines property into an array of RecognizedCell instances, better suited for our purposes:
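One way to model that. The record name RecognizedCell comes from the article; the shape of the bounding-box members is an assumption and may differ in the actual API:

```csharp
public record RecognizedCell(string Text, double Top, double Left)
{
    // Assigned later by SetRows and SetCols
    public int Row { get; set; }
    public int Column { get; set; }
}

private RecognizedCell[] GetCells(IReadOnlyList<RecognizedLine> lines) =>
    // Keep only the text and the top-left corner of each line's bounding box
    lines.Select(line => new RecognizedCell(
        line.Text,
        line.BoundingBox.TopLeft.Y,
        line.BoundingBox.TopLeft.X)).ToArray();
```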
Then, we set the Row and Column properties of each element and create the table to add to the TextBlock. SetRows will set the row for each element:
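A possible implementation. A small tolerance (10 pixels here, a value of my choosing) absorbs minor vertical misalignment between cells of the same row; the comparison in the original code may differ:

```csharp
private static int SetRows(RecognizedCell[] cells)
{
    var row = -1;
    var currentTop = double.MinValue;
    foreach (var cell in cells.OrderBy(c => c.Top))
    {
        // A cell whose top edge is clearly below the highest top
        // seen so far starts a new row
        if (cell.Top > currentTop + 10) // tolerance in pixels; tune for your images
        {
            row++;
            currentTop = cell.Top;
        }
        cell.Row = row;
    }
    return row + 1; // number of rows, used later when creating the table
}
```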
SetRows sorts the cells by their top position. Then it walks through the cells, checking whether each cell's top is greater than the highest one found so far. If it is, we determine that a new row has started. The function returns the number of rows, so it can be used later, when creating the table.
SetCols is very similar to SetRows; the differences are that the elements are sorted by their left position, and that a new column starts when an element lies to the right of the rightmost position found so far.
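The mirrored sketch, again with a small pixel tolerance of my choosing:

```csharp
private static int SetCols(RecognizedCell[] cells)
{
    var col = -1;
    var currentLeft = double.MinValue;
    foreach (var cell in cells.OrderBy(c => c.Left))
    {
        // A cell clearly to the right of the rightmost left edge
        // seen so far starts a new column
        if (cell.Left > currentLeft + 10) // tolerance in pixels
        {
            col++;
            currentLeft = cell.Left;
        }
        cell.Column = col;
    }
    return col + 1;
}
```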
CreateTable creates an ASCII table, using '+', '-' and '|' characters as the borders:
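A sketch of the method, following the description below (group the cells by column to get the widths, then assemble the border, header, and rows):

```csharp
private static string CreateTable(RecognizedCell[] cells, int rows)
{
    // Column widths: the longest text in each column
    var widths = cells
        .GroupBy(c => c.Column)
        .OrderBy(g => g.Key)
        .Select(g => g.Max(c => c.Text.Length))
        .ToArray();

    // Border line like +------+------+
    var border = "+" + string.Join("+",
        widths.Select(w => new string('-', w + 2))) + "+";

    var sb = new StringBuilder();
    sb.AppendLine(border);
    for (var row = 0; row < rows; row++)
    {
        sb.AppendLine(GetLine(cells, row, widths));
        if (row == 0)
            sb.AppendLine(border); // separate the header from the body
    }
    sb.AppendLine(border); // close the table at the bottom
    return sb.ToString();
}
```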
This method gets the column widths by grouping the cells by column and taking the maximum text length in each one. Then it creates the table by assembling the table top (which is also used to separate the header and to close the table at the bottom), the header, and then the rows. GetLine generates a table line from the row data:
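One way to write it, padding every cell to its column width and leaving blanks for columns with no recognized cell in that row:

```csharp
private static string GetLine(RecognizedCell[] cells, int row, int[] widths)
{
    // Place each cell of this row into its column slot
    var texts = new string[widths.Length];
    foreach (var cell in cells.Where(c => c.Row == row))
        texts[cell.Column] = cell.Text;

    // Missing cells become blanks; every cell is padded to the column width
    return "| " + string.Join(" | ",
        texts.Select((text, col) => (text ?? "").PadRight(widths[col]))) + " |";
}
```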
With that in place, you can run the program and convert an image to the ASCII table:
As you can see, it's very easy to recognize text in an image. These two lines do all the hard work; the rest of the program is just a matter of arranging the recognized text the way you want:
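Those two lines, as used inside RecognizeAndAddTable (the exact ImageBuffer factory name may vary between SDK versions):

```csharp
// Wrap the SoftwareBitmap and hand it to the OCR model
var imageBuffer = ImageBuffer.CreateForSoftwareBitmap(bitmap);
var recognizedText = _textRecognizer.RecognizeTextFromImage(imageBuffer);
```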
The full source code for this article is at https://github.com/bsonnino/ImageToTable