WPF to PDF via PDFSharp.Xps: fixing hyperlink output
This is a short follow-up to my previous post about generating PDF from a WPF application using PDFSharp. As described there, generation uses a FlowDocument as an intermediary. In a FlowDocument we can use the Hyperlink element to display different kinds of hyperlinks, but it turned out that the version of the PDFSharp.Xps converter I used simply ignores the FixedPage_NavigateUri attributes attached to XpsElement elements.
I spent some time studying the PDF 1.4 output format, but so far I have not been able to work out how to fix the output properly in the PdfContentWriter of the PDFSharp.Xps project.
Under the cut is a simpler solution: laying a hyperlink over the text in the form of a Link Annotation. At the end of the article you will also find the results of my research into a “kosher” solution, one that introduces the link primitives into the PDF output process itself.
Solution via Link Annotation
Here is the link to the commit with the fix. As mentioned in the teaser, I added the creation of a Link Annotation to the PdfContentWriter code, in the WritePath(...) method (see the code below).
// Check whether a link is attached to this Path
if (path.FixedPage_NavigateUri != null && !string.IsNullOrEmpty(path.FixedPage_NavigateUri.Trim()))
{
    var bounds = path.Data.GetBoundingBox();
    var xpsPage = path.Parent as FixedPage;
    if (xpsPage != null)
    {
        // Scale factor from XPS units to PDF points
        var pxToPtScale = xpsPage.PointHeight/xpsPage.Height;
        try
        {
            // Passing the string through Uri validates it
            var uri = new Uri(path.FixedPage_NavigateUri);
            // The PDF Y axis points up, so flip the vertical coordinates
            page.AddWebLink(
                new PdfRectangle(bounds.Left*pxToPtScale, page.Height - bounds.Top*pxToPtScale,
                    bounds.Right*pxToPtScale, page.Height - bounds.Bottom*pxToPtScale),
                uri.AbsoluteUri);
        }
        catch (Exception)
        {
            Debug.Assert(false, "WritePath(...) > Invalid URI string provided");
        }
    }
}
This code takes the bounding box of the Path object that has just been written to the PDF page, and only for Paths with a non-empty FixedPage_NavigateUri value. As it turned out, the vertical axis of a PDF page points in the opposite direction to the same axis in XPS (the PDF origin is the bottom-left corner, the XPS origin the top-left), so we subtract the vertical coordinates of the bounding box from the page height. The coordinates are then converted from XPS units to points. XPS units are 1/96 inch and PDF points are 1/72 inch, so the factor normally comes out to 0.75, but we compute it from the page dimensions instead of hard-coding it. The link attached to the Path is passed through the Uri class to check that it is valid. There may be a more reliable, efficient or functional way to convert URIs; for now we use this one as the simplest. If the link address is invalid, we only raise a debug assertion, which does nothing in release builds. This is also the place to add logging code.
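The coordinate conversion can be checked in isolation. The sketch below is a hypothetical helper, not part of PDFSharp.Xps; the class and method names are made up for illustration, and it assumes the usual 96 XPS units and 72 PDF points per inch. The sample numbers in the comments come from this article: the rectangle of the last Path in the content stream dump and the page height from the MediaBox.

```csharp
using System;

// Hypothetical helper for illustration only; not part of PDFSharp.Xps.
public static class XpsToPdfCoordinates
{
    // Assumption: XPS units are 1/96 inch, PDF points are 1/72 inch.
    public const double XpsUnitsToPoints = 72.0 / 96.0;

    // Converts an XPS bounding box (origin top-left, Y axis down) into the
    // four numbers passed to PdfRectangle in the patch (origin bottom-left,
    // Y axis up). The order mirrors the patch: left, top, right, bottom.
    public static (double X1, double Y1, double X2, double Y2) ToPdfRect(
        double left, double top, double right, double bottom,
        double pageHeightInPoints, double scale = XpsUnitsToPoints)
    {
        return (left * scale,
                pageHeightInPoints - top * scale,
                right * scale,
                pageHeightInPoints - bottom * scale);
    }
}

// Usage with the Path rectangle "109.18 309.06 101.65 18.18 re"
// on a page that is 295.98 points high:
//   var r = XpsToPdfCoordinates.ToPdfRect(
//       109.18, 309.06, 109.18 + 101.65, 309.06 + 18.18, 295.98);
//   r is approximately (81.885, 64.185, 158.1225, 50.55)
```

Up to rounding, these are the same values that appear in the /Rect entry of the generated annotation, which suggests the scale and the axis flip in the patch are consistent with each other.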
The output of the converter with this patch is shown in the picture in the article's teaser. Note the black border around the link: this is the link annotation that was created. The black border is a problem, but one that can be solved, if nothing else by post-processing the generated PDF. Here is the annotation object's markup in uncompressed form:
16 0 obj << /Type /Annot /NM (11aabcc9-2402-4718-8184-7ffb9bbb031c) /M (D:20131119233814+04'00') /Subtype /Link /Rect [81.885 64.185 158.123 50.55] /BS <> /Border [0 0 0] /A <> >> endobj
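I suspected that "/Border [0 0 0]" in this markup defines the RGB components of the border color. According to the PDF specification, though, the /Border array holds the horizontal corner radius, the vertical corner radius and the border width, so [0 0 0] requests no border at all; the border color lives in the /C entry. When a /BS (border style) dictionary is present, the /Border entry is ignored, so the visible border most likely comes from the /BS entry.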
Investigation Results
The solution via a link annotation was the obvious one. The only difficulty was working out the correct coordinates. But it is not the best solution. It would be more correct to fix the output of the primitives themselves than to lay a crutch in the form of an annotation over an already written Path object. As you can see in the picture at the beginning of the article, this annotation is displayed with an ugly black border by default.
So I downloaded the PDF 1.4 specification, opened the PDFSharp and PDFSharp.Xps projects, and began studying the code.
In the PdfLinkAnnotation class, I came across code like this:
internal override void WriteObject(PdfWriter writer)
{
    // ... //
    switch (this.linkType)
    {
        // ... //
        case LinkType.Web:
            //pdf.AppendFormat("/A<>\n", PdfEncoders.EncodeAsLiteral(this.url));
            Elements[Keys.A] = new PdfLiteral("<>", //PdfEncoders.EncodeAsLiteral(this.url));
                PdfEncoders.ToStringLiteral(this.url, PdfStringEncoding.WinAnsiEncoding, writer.SecurityHandler));
            break;
        // ... //
    }
}
Googling for the string "/A <" led me to the Analyzing PFs page, where I saw roughly what the markup of a link action object looks like.
6 0 obj << /Type /Action /S /URI /URI (http://stinkeye.org) >> endobj
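To make the shape of this object concrete, here is a minimal sketch that builds such a URI action dictionary as text. It is a hypothetical standalone helper (the names PdfUriAction, EscapeLiteral and Build are mine) and not what PDFSharp does internally. The one detail worth showing is that backslashes and parentheses inside a PDF literal string have to be escaped with a backslash.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not PDFSharp code.
public static class PdfUriAction
{
    // Escapes the characters that are special inside a PDF literal
    // string "( ... )": the backslash and both parentheses.
    public static string EscapeLiteral(string s)
    {
        var sb = new StringBuilder(s.Length + 2);
        foreach (char c in s)
        {
            if (c == '\\' || c == '(' || c == ')')
                sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Validates the URL through System.Uri (as the patch does) and
    // returns the text of a URI action dictionary.
    // Throws UriFormatException if the URL is not a valid absolute URI.
    public static string Build(string url)
    {
        string absolute = new Uri(url, UriKind.Absolute).AbsoluteUri;
        return "<< /Type /Action /S /URI /URI (" + EscapeLiteral(absolute) + ") >>";
    }
}

// Usage:
//   PdfUriAction.Build("http://stinkeye.org")
//   returns "<< /Type /Action /S /URI /URI (http://stinkeye.org/) >>"
//   (Uri.AbsoluteUri appends the trailing slash for an empty path)
```

Since AbsoluteUri percent-encodes non-ASCII characters, the resulting string stays within plain ASCII, which keeps the literal simple.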
Opening the resulting PDF, I found the following:
4 0 obj << /Type /Page /MediaBox [0 0 468 295.98] /Parent 3 0 R /Contents 5 0 R /Resources << /ProcSet [/PDF /Text /ImageB /ImageC /ImageI] /ExtGState << /GS0 6 0 R /GS1 15 0 R >> /Font << /F0 10 0 R /F1 14 0 R >> >> /Annots [16 0 R] /Group << /CS /DeviceRGB /S /Transparency /I false /K false >> >> endobj
This is the page object.
5 0 obj << /Length 1114 /Filter /FlateDecode >> stream ... endstream endobj
The ellipsis stands for the stream body, which Habrahabr's markup cannot display: it is full of unprintable characters. This is the raw content of the binary stream into which all PDF primitives and Unicode text produced by the converter are written; the /Filter /FlateDecode entry means it is stored deflate-compressed. So there is unlikely to be anything readable in there. Let's go debug.
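As an aside, the /Filter /FlateDecode entry on object 5 indicates zlib-compressed data, so the stream body could in principle also be inspected outside the debugger. The sketch below is a hypothetical standalone helper (FlateStreamDump is my name, not a PDFSharp class). It assumes the bytes between "stream" and "endstream" have already been extracted and that they start with the usual 2-byte zlib header, which is skipped so that DeflateStream can read the raw deflate data.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration only; not PDFSharp code.
public static class FlateStreamDump
{
    // Decompresses the body of a /FlateDecode stream and returns it as text.
    // Assumption: zlibData starts with a 2-byte zlib header.
    public static string Inflate(byte[] zlibData)
    {
        if (zlibData == null || zlibData.Length < 2)
            throw new ArgumentException("Expected zlib data with a 2-byte header");

        // Skip the zlib header; DeflateStream expects raw deflate data.
        using (var input = new MemoryStream(zlibData, 2, zlibData.Length - 2))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            // Latin-1 maps every byte to one character, so nothing is lost.
            return Encoding.GetEncoding("iso-8859-1").GetString(output.ToArray());
        }
    }
}
```

Setting a breakpoint, as described next, gets at the same text with less effort when the converter source is at hand.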
We set a breakpoint in PdfContentWriter.WritePath(Path path) and give it the condition
path.FixedPage_NavigateUri != null && !string.IsNullOrEmpty(path.FixedPage_NavigateUri)
so that we don't have to keep pressing F5.
Once the template has been parsed and we click the Print button in the main window, we hit this breakpoint and can view the contents of the primitive stream as text. It looks something like this:
q % - BeginContent
  0.75 0 0 -0.75 0 295.98 cm
  -100 Tz
  q % - begin Glyphs
    0 0 0 rg
    /GS0 gs
    BT
    /F0 -1 Tf
    24 0 0 24 18.18 40.1867 Tm
    0 0 Td <002B0048004F004F0052000F0003002B0044004500550044004B0044004500550004> Tj
    ET
  Q % - end Glyphs
  q % - begin Glyphs
    0 0 0 rg
    /GS0 gs
    BT
    /F1 -1 Tf
    16 0 0 16 18.18 87.3933 Tm
    0 0 Td <0028005B005300480055004C005000480051> Tj
    4.865 0 Td <0057> Tj
    0.34 0 Td <004C0051004A0003005A004C> Tj
    2.661 0 Td <0057> Tj
    0.34 0 Td <004B000300470052> Tj
    1.936 0 Td <0057> Tj
    0.34 0 Td <002F004C00540058004C0047000F00030029004F0052005A0027005200460058005000480051> Tj
    9.836 0 Td <0057> Tj
    0.34 0 Td <000300440051004700030033002700290036004B004400550053> Tj
    ET
  Q % - end Glyphs
  % ... %
  q % - begin canvas
    1 0 0 1 18.18 145.44 cm
    q % - begin Path
      1 0 0 1 5 10.4533 cm
      0 0.204 0.506 rg
      5 2.5 m
      5 3.88 3.88 5 2.5 5 c
      1.12 5 0 3.88 0 2.5 c
      0 1.12 1.12 0 2.5 0 c
      3.88 0 5 1.12 5 2.5 c
      h
      f*
    Q % - end Path
    q % - begin Glyphs
      0 0.204 0.506 rg
      /GS0 gs
      BT
      /F0 -1 Tf
      14 0 0 14 20 17.8367 Tm
      0 0 Td <00270052004600580050004800510057000300260052005100570048005B0057> Tj
      ET
    Q % - end Glyphs
    % ... %
  Q % - end canvas
  % ... %
  q % - begin Path
    /GS1 gs
    0 0 0 rg
    109.18 309.06 101.65 18.18 re
    f
  Q % - end Path
What do we see here? The q ... Q operators save and restore the graphics state, so each pair delimits a graphics context, and these contexts are nested inside each other. The indentation is only there for readability, since whitespace in a content stream merely separates tokens (all of this is surely in the PDF specification, which I have not yet had time to study in depth). What I have not figured out yet is how to embed the markup for a link action
<< /Type /Action /S /URI /URI (http://stinkeye.org) >>
into the markup block of a Path. The closest layout option I found in the specification is this one (p. 635, example 9.14):
/Link << /MCID 1 >>            % Marked-content sequence 1 (link)
BDC                            % Begin marked-content sequence
0.7 w                          % Set line width
[] 0 d                         % Solid dash pattern
111.094 751.8587 m             % Move to beginning of underline
174.486 751.8587 l             % Draw underline
0.0 0.0 1.0 RG                 % Set stroking color to blue
S                              % Stroke underline
BT                             % Begin text object
14 0 0 14 111.094 753.976 Tm   % Set text matrix
0.0 0.0 1.0 rg                 % Set nonstroking color to blue
(with a link) Tj               % Show text of link
ET                             % End text object
EMC                            % End marked-content sequence
In this markup I cannot understand what "<< /MCID 1 >>" is. It is also not clear how and where this markup block should be placed.
I would be very grateful for help with implementing the fix I could not finish. Thanks for your attention!