
An interesting little problem we had to solve for a client recently: how to take an HTML file (passed through as an email attachment) and convert it to a PDF in a VB.NET command-line program.
There are lots of third-party libraries around that do this, but the ones we looked at were typically expensive and not terribly reliable, and they rendered the HTML we needed to convert quite poorly. So we came up with this:
---------------
' These Imports belong at the top of the source file.
Imports System.Drawing
Imports System.IO
Imports System.Windows.Forms

' Renders the HTML file at oldfilename in a WebBrowser control, captures the
' rendering as a temporary PNG, and writes that image to a single-page PDF.
' Returns "" on success, or "Error: <message>" on failure.
Public Function ConvertHTML(oldfilename As String, newfilename As String) As String
    Dim pngfilename As String = Path.GetTempFileName()
    Dim res As String = "" ' "" = ok
    Try
        ' Size and resolution are copied out of the control and bitmap here,
        ' so nothing reads from them after they have been disposed.
        Dim widthPx As Integer
        Dim heightPx As Integer
        Dim hr As Single
        Dim vr As Single
        Using wb As New WebBrowser()
            wb.ScrollBarsEnabled = False
            wb.ScriptErrorsSuppressed = True
            wb.Navigate(oldfilename)
            ' Pump the message loop until the page has finished loading
            While wb.ReadyState <> WebBrowserReadyState.Complete
                Application.DoEvents()
            End While
            ' Resize the control to fit the whole page, capped at 3000px high
            wb.Width = wb.Document.Body.ScrollRectangle.Width
            wb.Height = Math.Min(wb.Document.Body.ScrollRectangle.Height, 3000)
            widthPx = wb.Width
            heightPx = wb.Height
            ' Get a Bitmap representation of the webpage as it's rendered in the WebBrowser control
            Using b As New Bitmap(widthPx, heightPx)
                hr = b.HorizontalResolution
                vr = b.VerticalResolution
                wb.DrawToBitmap(b, New Rectangle(0, 0, widthPx, heightPx))
                If File.Exists(pngfilename) Then
                    File.Delete(pngfilename)
                End If
                b.Save(pngfilename, Imaging.ImageFormat.Png)
            End Using
        End Using
        Using doc As New PdfSharp.Pdf.PdfDocument()
            ' Size the page to match the image: pixels / DPI = inches
            Dim page As New PdfSharp.Pdf.PdfPage()
            page.Width = PdfSharp.Drawing.XUnit.FromInch(widthPx / hr)
            page.Height = PdfSharp.Drawing.XUnit.FromInch(heightPx / vr)
            doc.Pages.Add(page)
            Using xgr As PdfSharp.Drawing.XGraphics = PdfSharp.Drawing.XGraphics.FromPdfPage(page)
                Using img As PdfSharp.Drawing.XImage = PdfSharp.Drawing.XImage.FromFile(pngfilename)
                    xgr.DrawImage(img, 0, 0)
                    doc.Save(newfilename)
                End Using
            End Using
        End Using
    Catch ex As Exception
        res = "Error: " & ex.Message
    Finally
        ' Always clean up the temporary PNG
        If File.Exists(pngfilename) Then
            File.Delete(pngfilename)
        End If
    End Try
    Return res
End Function
---------------
For this to work, you will need to add the following libraries to your project:
- PDFSharp (which is free to use; download from http://pdfsharp.codeplex.com/releases)
- System.Windows.Forms (even if you’re writing a console application like us)
- System.Drawing
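As a rough illustration of how the function might be wired into a console program, here is a minimal sketch. The module name HtmlToPdfCli, the Run helper and the exit codes are our own assumptions for the example, not part of any library. The converter is passed in as a delegate so the snippet stands on its own; in practice you would pass ConvertHTML.

```vbnet
Imports System
Imports System.IO

Module HtmlToPdfCli ' hypothetical module name

    ' Runs one conversion and maps the result string to a process exit code.
    ' convert is expected to behave like ConvertHTML above: it returns ""
    ' on success and "Error: ..." on failure.
    Public Function Run(args As String(), convert As Func(Of String, String, String)) As Integer
        If args Is Nothing OrElse args.Length <> 2 Then
            Console.Error.WriteLine("Usage: <program> <input.html> <output.pdf>")
            Return 2
        End If

        ' Resolve both paths to absolute ones before handing them on,
        ' so the converter never has to interpret a relative path.
        Dim source As String = Path.GetFullPath(args(0))
        Dim target As String = Path.GetFullPath(args(1))

        Dim result As String = convert(source, target)
        If result = "" Then
            Return 0
        End If

        Console.Error.WriteLine(result)
        Return 1
    End Function

End Module
```

The entry point would then be a Main function marked with the STAThread attribute that returns Run(args, AddressOf ConvertHTML). The attribute matters because the WebBrowser control can only be created on a single-threaded apartment thread.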
Is there a catch? Not really. It relies on DrawToBitmap, which Microsoft's documentation lists as not supported on the WebBrowser control, but it has been working correctly for several releases of .NET (since at least 2010). If the webpage is too tall, the output is truncated at 3000px, a restriction I built into the code above as a safety check. You can remove it if you wish, though there is a size beyond which DrawToBitmap will fail.
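If silent truncation is a worry, one option is to have the code report it. This is only a sketch: RenderLimits, MaxRenderHeight and CapHeight are hypothetical names, and the helper would take the place of the inline 3000px check in ConvertHTML.

```vbnet
Imports System

Module RenderLimits ' hypothetical module name

    ' Same cap as used in ConvertHTML above.
    Public Const MaxRenderHeight As Integer = 3000

    ' Returns the height to render at. wasTruncated is set to True when the
    ' page is taller than the cap and the output will therefore be cut off.
    Public Function CapHeight(contentHeight As Integer, ByRef wasTruncated As Boolean) As Integer
        wasTruncated = contentHeight > MaxRenderHeight
        Return Math.Min(contentHeight, MaxRenderHeight)
    End Function

End Module
```

The caller could then log a warning, or add one to the returned string, rather than producing a cropped PDF without comment.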
Our code writes to a single PDF page, purely because that’s what we needed for our client. With a little image manipulation, it should be simple to spread the image across multiple PDF pages if required.
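Here is a rough sketch of what that image manipulation could look like. We haven't used it in production. It works out a list of full-width strips, one per page, and then clones each strip out of the captured bitmap. PageSlicer, PageSlices and SliceBitmap are hypothetical names, and choosing the page height in pixels is left to the caller.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Drawing

Module PageSlicer ' hypothetical module name

    ' Splits an image of the given size into full-width strips, each at most
    ' pageHeightPx tall. The last strip holds whatever is left over.
    Public Function PageSlices(widthPx As Integer, heightPx As Integer, pageHeightPx As Integer) As List(Of Rectangle)
        If pageHeightPx <= 0 Then
            Throw New ArgumentOutOfRangeException("pageHeightPx")
        End If
        Dim slices As New List(Of Rectangle)()
        Dim top As Integer = 0
        While top < heightPx
            Dim sliceHeight As Integer = Math.Min(pageHeightPx, heightPx - top)
            slices.Add(New Rectangle(0, top, widthPx, sliceHeight))
            top += sliceHeight
        End While
        Return slices
    End Function

    ' Copies each strip into its own bitmap. The caller owns the returned
    ' bitmaps and is responsible for disposing them.
    Public Function SliceBitmap(source As Bitmap, pageHeightPx As Integer) As List(Of Bitmap)
        Dim parts As New List(Of Bitmap)()
        For Each r As Rectangle In PageSlices(source.Width, source.Height, pageHeightPx)
            parts.Add(source.Clone(r, source.PixelFormat))
        Next
        Return parts
    End Function

End Module
```

Each strip can then be saved and drawn onto its own PdfPage, in the same way the single image is handled above. Bear in mind that a strip boundary can fall through the middle of a line of text, so a real implementation may want smarter break points.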