Convert a PDF into a series of images using C# and GhostScript

An application I was recently working on received PDF files from a web service, which it then needed to store in a database. I wanted the ability to display previews of these documents within the application. While there are a number of solutions for creating PDF files from C#, the options for viewing a PDF within your application are much more limited, unless you purchase expensive commercial products or use COM interop to embed Acrobat Reader into your application.

This article describes an alternative solution, in which the pages of a PDF are converted into images using GhostScript, which you can then display in your application.

In order to avoid huge walls of text, this article has been split into two parts: the first deals with the actual conversion of a PDF, and the second demonstrates how to extend the ImageBox control to display the images.

Caveat emptor

Before we start, some quick points.

  • The method I'm about to demonstrate converts each page of the PDF into an image. This makes it well suited to viewing, but interactive elements such as forms, hyperlinks and even good old text selection are not available.
  • GhostScript has a number of licenses associated with it, but I can't find any information on the pricing of commercial licenses.
  • The GhostScript API integration library used by this project isn't complete, and I'm not going to go into the details of how it works in this pair of articles; once I've completed the outstanding functionality I'll write a new article about it.

Getting Started

You can download the two libraries used in this article from the links below. These are:

  • Cyotek.GhostScript - core library providing GhostScript integration support
  • Cyotek.GhostScript.PdfConversion - support library for converting a PDF document into images

Please note that the native GhostScript DLL is not included in these downloads; you will need to obtain it from the GhostScript project page.

Using the GhostScriptAPI class

As mentioned above, the core GhostScript library isn't complete yet, so I'll just give a description of the basic functionality required by the conversion library.

The GhostScriptAPI class handles all communication with GhostScript. When you create an instance of the class, it automatically calls gsapi_new_instance in the native GhostScript DLL. When the class is disposed, it automatically releases any handles and calls the native gsapi_exit and gsapi_delete_instance functions.

To actually call GhostScript, you use the Execute method, passing in either a string array of all the arguments to pass to GhostScript, or a typed dictionary of commands and values. The GhostScriptCommand enum contains most of the commands supported by GhostScript, which is easier than trying to remember the switch names themselves.
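To make the dictionary-to-arguments step more concrete, here is a minimal sketch of how a command dictionary could be flattened into the string array GhostScript expects. It is not the Cyotek.GhostScript implementation: `SketchCommand` and `GhostScriptArgumentBuilder` are hypothetical names covering only a handful of switches. It does rely on one documented detail of the native API: `gsapi_init_with_args` treats its argument array like a C `argv`, so the first element is ignored and a placeholder program name goes there.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical subset of commands; the real GhostScriptCommand enum in
// Cyotek.GhostScript is larger and its member names may differ.
public enum SketchCommand
{
  Silent,
  Safer,
  Batch,
  NoPause,
  Device,
  OutputFile,
  Resolution,
  InputFile
}

public static class GhostScriptArgumentBuilder
{
  // Turns a command dictionary into the argv-style array passed to the
  // native gsapi_init_with_args function.
  public static string[] Build(IDictionary<SketchCommand, object> commands)
  {
    // argv[0] is ignored by GhostScript, so a placeholder name goes first
    List<string> args = new List<string> { "gs" };
    string inputFile = null;

    foreach (KeyValuePair<SketchCommand, object> pair in commands)
    {
      switch (pair.Key)
      {
        case SketchCommand.Silent: args.Add("-q"); break;
        case SketchCommand.Safer: args.Add("-dSAFER"); break;
        case SketchCommand.Batch: args.Add("-dBATCH"); break;
        case SketchCommand.NoPause: args.Add("-dNOPAUSE"); break;
        case SketchCommand.Device: args.Add("-sDEVICE=" + pair.Value); break;
        case SketchCommand.OutputFile: args.Add("-sOutputFile=" + pair.Value); break;
        case SketchCommand.Resolution: args.Add("-r" + pair.Value); break;
        case SketchCommand.InputFile: inputFile = (string)pair.Value; break;
        default: throw new ArgumentOutOfRangeException("commands", pair.Key, "Unsupported command");
      }
    }

    // The input file is held back and appended last, after every switch,
    // regardless of the order in which the dictionary enumerates its entries
    if (inputFile != null)
      args.Add(inputFile);

    return args.ToArray();
  }
}
```

Treating the input file separately is the main design point: a dictionary does not guarantee enumeration order, but GhostScript needs its switches before the file name.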

Defining conversion settings

The Pdf2ImageSettings class allows you to customize various properties of the output image. The following properties are available:

  • AntiAliasMode - sets the anti-aliasing level to Low, Medium or High. Internally this sets the dTextAlphaBits and dGraphicsAlphaBits GhostScript switches to appropriate values.
  • Dpi - dots per inch. Internally sets the r switch. This property is not used when TrimMode is set to PaperSize.
  • GridFitMode - controls the text readability mode. Internally sets the dGridFitTT switch.
  • ImageFormat - specifies the output image format. Internally sets the sDEVICE switch.
  • PaperSize - specifies a paper size from one of the standard sizes supported by GhostScript.
  • TrimMode - specifies how the image should be sized. Your mileage may vary if you try to use the paper size option. Internally sets either the dFIXEDMEDIA and sPAPERSIZE switches, the dUseCropBox switch or the dUseTrimBox switch.
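As an illustration of the AntiAliasMode mapping, the sketch below assumes the enum's numeric values are the alpha-bit counts themselves. The conversion code later in this article passes the enum value straight to both switches, which suggests this, but the exact values (and the `SketchAntiAliasMode` and `AntiAliasSwitches` names) are assumptions. GhostScript accepts 1, 2 or 4 for TextAlphaBits and GraphicsAlphaBits.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum: the numeric values are assumed to match the alpha-bit
// counts GhostScript accepts (1, 2 or 4); the real AntiAliasMode may differ.
public enum SketchAntiAliasMode
{
  None = 0,
  Low = 1,
  Medium = 2,
  High = 4
}

public static class AntiAliasSwitches
{
  // Returns the GhostScript switches for the requested anti-aliasing level.
  public static IList<string> For(SketchAntiAliasMode mode)
  {
    List<string> switches = new List<string>();
    int bits = (int)mode;

    // With no anti-aliasing the switches are simply omitted, leaving
    // GhostScript's own defaults in place
    if (mode != SketchAntiAliasMode.None)
    {
      switches.Add("-dTextAlphaBits=" + bits);
      switches.Add("-dGraphicsAlphaBits=" + bits);
    }

    return switches;
  }
}
```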

Typical settings could look like this:

      Pdf2ImageSettings settings;

      settings = new Pdf2ImageSettings();
      settings.AntiAliasMode = AntiAliasMode.High;
      settings.Dpi = 300;
      settings.GridFitMode = GridFitMode.Topological;
      settings.ImageFormat = ImageFormat.Png24;
      settings.TrimMode = PdfTrimMode.CropBox;

Converting the PDF

To convert a PDF file into a series of images, use the Pdf2Image class. The following properties and methods are offered:

  • ConvertPdfPageToImage - converts a given page in the PDF into an image which is saved to disk
  • GetImage - converts a page in the PDF into an image and returns the image
  • GetImages - converts a range of pages in the PDF into images and returns an image array
  • PageCount - returns the number of pages in the source PDF
  • PdfFilename - returns or sets the filename of the PDF document to convert
  • PdfPassword - returns or sets the password of the PDF document to convert
  • Settings - returns or sets the settings object described above

A typical example to convert the first page of a PDF document:

Bitmap firstPage = new Pdf2Image("sample.pdf").GetImage();

The inner workings

Most of the code in the class is taken up with the GetConversionArguments method. This method looks at the various properties of the conversion, such as output format and quality, and returns the appropriate commands to pass to GhostScript:

    protected virtual IDictionary<GhostScriptCommand, object> GetConversionArguments(string pdfFileName, string outputImageFileName, int pageNumber, string password, Pdf2ImageSettings settings)
    {
      IDictionary<GhostScriptCommand, object> arguments;

      arguments = new Dictionary<GhostScriptCommand, object>();

      // basic GhostScript setup
      arguments.Add(GhostScriptCommand.Silent, null);
      arguments.Add(GhostScriptCommand.Safer, null);
      arguments.Add(GhostScriptCommand.Batch, null);
      arguments.Add(GhostScriptCommand.NoPause, null);

      // specify the output
      arguments.Add(GhostScriptCommand.Device, GhostScriptAPI.GetDeviceName(settings.ImageFormat));
      arguments.Add(GhostScriptCommand.OutputFile, outputImageFileName);

      // page numbers
      arguments.Add(GhostScriptCommand.FirstPage, pageNumber);
      arguments.Add(GhostScriptCommand.LastPage, pageNumber);

      // graphics options
      arguments.Add(GhostScriptCommand.UseCIEColor, null);

      if (settings.AntiAliasMode != AntiAliasMode.None)
      {
        arguments.Add(GhostScriptCommand.TextAlphaBits, settings.AntiAliasMode);
        arguments.Add(GhostScriptCommand.GraphicsAlphaBits, settings.AntiAliasMode);
      }

      arguments.Add(GhostScriptCommand.GridToFitTT, settings.GridFitMode);

      // image size
      if (settings.TrimMode != PdfTrimMode.PaperSize)
        arguments.Add(GhostScriptCommand.Resolution, settings.Dpi.ToString());

      switch (settings.TrimMode)
      {
        case PdfTrimMode.PaperSize:
          if (settings.PaperSize != PaperSize.Default)
          {
            arguments.Add(GhostScriptCommand.FixedMedia, true);
            arguments.Add(GhostScriptCommand.PaperSize, settings.PaperSize);
          }
          break;
        case PdfTrimMode.TrimBox:
          arguments.Add(GhostScriptCommand.UseTrimBox, true);
          break;
        case PdfTrimMode.CropBox:
          arguments.Add(GhostScriptCommand.UseCropBox, true);
          break;
      }

      // pdf password
      if (!string.IsNullOrEmpty(password))
        arguments.Add(GhostScriptCommand.PDFPassword, password);

      // pdf filename
      arguments.Add(GhostScriptCommand.InputFile, pdfFileName);

      return arguments;
    }
    

As you can see from the method above, the commands are returned as a strongly typed dictionary. The GhostScriptAPI class converts these into the correct GhostScript switches, but the enum is much easier to work with from your code! The following is an example of the typical GhostScript arguments used to convert a single page of a PDF document:

-q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -sOutputFile=tmp78BC.tmp -dFirstPage=1 -dLastPage=1 -dUseCIEColor -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -dGridFitTT=2 -r150 -dUseCropBox=true sample.pdf

The next step is to call GhostScript and convert the PDF, which is done using the ConvertPdfPageToImage method:

    public void ConvertPdfPageToImage(string outputFileName, int pageNumber)
    {
      if (pageNumber < 1 || pageNumber > this.PageCount)
        throw new ArgumentException("Page number is out of bounds", "pageNumber");

      using (GhostScriptAPI api = new GhostScriptAPI())
        api.Execute(this.GetConversionArguments(this._pdfFileName, outputFileName, pageNumber, this.PdfPassword, this.Settings));
    }

As you can see, this is a very simple call - create an instance of the GhostScriptAPI class and then pass in the list of parameters to execute. The GhostScriptAPI class takes care of everything else.

Once the file is saved to disk, you can then load it into a Bitmap or Image object for use in your application. Don't forget to delete the file when you are finished with it!
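If you work with the output file directly, one way to cover both points is to read the file into memory and delete it straight away. This is a minimal sketch using only System.IO; `TempImageFile.ReadAndDelete` is a hypothetical helper. If you then create a Bitmap from a MemoryStream over these bytes, the GDI+ documentation requires that stream to stay open for as long as the Bitmap is in use.

```csharp
using System;
using System.IO;

public static class TempImageFile
{
  // Reads a converted page into memory and then deletes the file, so nothing
  // is left behind on disk and no file handle stays open.
  public static byte[] ReadAndDelete(string path)
  {
    try
    {
      return File.ReadAllBytes(path);
    }
    finally
    {
      // runs whether or not the read succeeded
      File.Delete(path);
    }
  }
}
```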

Alternatively, the GetImage method will convert the file and return the bitmap image for you, automatically deleting the temporary file. This saves you from having to worry about providing and deleting the output file, but it does mean you are responsible for disposing of the returned bitmap.

    public Bitmap GetImage(int pageNumber)
    {
      Bitmap result;
      string workFile;

      if (pageNumber < 1 || pageNumber > this.PageCount)
        throw new ArgumentException("Page number is out of bounds", "pageNumber");

      workFile = Path.GetTempFileName();

      try
      {
        this.ConvertPdfPageToImage(workFile, pageNumber);
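        // GDI+ requires the source stream to stay open for the lifetime of a
        // Bitmap created from it, so copy the image before the stream closes
        using (FileStream stream = new FileStream(workFile, FileMode.Open, FileAccess.Read))
        using (Bitmap source = new Bitmap(stream))
          result = new Bitmap(source);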
      }
      finally
      {
        File.Delete(workFile);
      }

      return result;
    }

You could also convert a range of pages at once using the GetImages method:

public Bitmap[] GetImages(int startPage, int lastPage)
{
  List<Bitmap> results;

  if (startPage < 1 || startPage > this.PageCount)
    throw new ArgumentException("Start page number is out of bounds", "startPage");

  if (lastPage < 1 || lastPage > this.PageCount)
    throw new ArgumentException("Last page number is out of bounds", "lastPage");
  else if (lastPage < startPage)
    throw new ArgumentException("Last page cannot be less than start page", "lastPage");

  results = new List<Bitmap>();
  for (int i = startPage; i <= lastPage; i++)
    results.Add(this.GetImage(i));

  return results.ToArray();
}
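GetImages starts a new GhostScript run for every page. A possible alternative, sketched below under stated assumptions, is to request the whole range in a single run and let GhostScript number the output files itself via a `%d`-style format in `-sOutputFile`. `PageRangeArguments` is a hypothetical helper that only builds the command-line switches and predicts the file names; it assumes the counter starts at 1 for the first page written rather than at `startPage`, so verify this against your GhostScript version before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class PageRangeArguments
{
  // Builds switches for converting pages startPage..lastPage in ONE run.
  // GhostScript expands %03d for each page it writes (page001.png, ...).
  // If passing these to gsapi_init_with_args, prepend a dummy argv[0].
  public static string[] Build(string pdfFileName, string outputFolder, int startPage, int lastPage)
  {
    if (startPage < 1 || lastPage < startPage)
      throw new ArgumentException("Invalid page range");

    string pattern = Path.Combine(outputFolder, "page%03d.png");

    return new[]
    {
      "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE",
      "-sDEVICE=png16m",
      "-sOutputFile=" + pattern,
      "-dFirstPage=" + startPage.ToString(CultureInfo.InvariantCulture),
      "-dLastPage=" + lastPage.ToString(CultureInfo.InvariantCulture),
      pdfFileName
    };
  }

  // The file names the run above is expected to produce, assuming the
  // output counter starts at 1 for the first page written.
  public static IList<string> ExpectedFiles(string outputFolder, int startPage, int lastPage)
  {
    List<string> files = new List<string>();

    for (int i = 1; i <= lastPage - startPage + 1; i++)
      files.Add(Path.Combine(outputFolder, string.Format(CultureInfo.InvariantCulture, "page{0:000}.png", i)));

    return files;
  }
}
```

The trade-off is that you receive files rather than Bitmap objects, so loading and deleting them remains your responsibility.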

In conclusion

The above methods provide a simple way of adding basic PDF viewing to your applications. In the next part of this series, we describe how to extend the ImageBox component to support conversion and navigation.

Update 10/07/2012

Downloads

  • Cyotek.GhostScript.zip (released 04/09/2011) - Work-in-progress class library for providing GhostScript integration in a .NET application.
  • Cyotek.GhostScript.PdfConversion.zip (released 04/09/2011) - Class library for converting PDF files into images using GhostScript. Also requires the Cyotek.GhostScript assembly.

Comments

MichaW

Thank you for this. Please add support for other languages/code pages, e.g. German umlauts. Thanks

Richard Moss

Hello,

Thanks for the comment - could you clarify what you were having a problem with so I can look into it further?

Regards; Richard Moss

Muhammad

Thank you very much. You made my day dude.

Lorena

Hi,

I used your library in a SharePoint 2010 workflow on a 64-bit server and I get this error:

Unable to load DLL 'gsdll32.dll': The specified module could not be found. (Exception from HRESULT: 0x8007007E)?

Why? On the server I installed both the x64 and x86 versions of Ghostscript...

Richard Moss

Hello,

While I haven't tested this in SharePoint, or indeed on a server full stop, the most likely cause is gsdll32.dll isn't in your path - either copy it to the folder where your assemblies are deployed, or into a folder which is part of your path, such as \Windows\SysWOW64

Hope this helps; Richard Moss

chandrasekhar

Hi, I am getting the following error when I load a PDF file into a PictureBox: "Failed to process GhostScript command." Please help me...

Richard Moss

This error is thrown when Ghostscript cannot perform the action that was requested of it. There could be any number of causes, and with such limited information I have no way of knowing. Check the exception object for the Ghostscript result code and the arguments provided.

chandrasekhar

Hi, how do I convert a PDF to BMP images in C#? Are there any examples of converting a PDF to BMP images? Please help me

Richard Moss

That's exactly what this article does; converts a PDF into a series of images. To save them into bitmaps, call the Save method of one of the images passing in the ImageFormat.Bmp format.

Rafi

Hi. I am getting this error when I try to convert with Pdf2Image (.GetImage()): "Unable to find an entry point named 'gsapi_new_instance' in DLL 'gsdll32.dll'." Note: the web application's bin directory contains gsdll32.dll, and I have not installed ghostscript.exe. How can I get it working?

Richard Moss

As I've already noted to another commenter, I haven't tested this on a server environment or in a web application and so I can't assist on deployment issues. The error would indicate that while a DLL is present and loaded, it doesn't actually contain the function that is being requested of it. That would most likely indicate the wrong version of Ghostscript being loaded or perhaps a custom build that doesn't export the function. Or it could be an error in the C# function declaration. Sorry for not being specific, but there's no single cause.

od

I had the same issue. I fixed it by installing an older version (8.54). As far as deployment is concerned, you can copy the folder gs\gs8.54\lib independently and set an environment variable "GS_LIB" with the value pointing to the lib folder.

AmityRooso

Will this DLL work with IIS 7.5 on a 64-bit OS (Windows 7)?

Richard Moss

Hello,

It should work if your application is 32-bit, as it's calling the 32-bit version of Ghostscript (I haven't looked to see if there's a 64-bit version of the library; if there is, just replace the calls). As long as the correct DLLs are in your path, it should work fine. But as I've noted multiple times above, I haven't tested this in a server environment at this point in time, so I can't say for definite.

Regards; Richard Moss

Rafi

You can solve this issue with the following steps (if you are using Windows 7 64-bit and IIS 7 or above); Pdf2Image should work with these settings:

1) Visual Studio 2008 project properties: go to Build > Configuration Manager..., change 'AnyCPU' to x86 and click OK.

2) IIS 7 or above manager: select the application pool, choose "Advanced Settings...", change "Enable 32-bit Applications" to True and click OK.
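Several of the errors above come down to process bitness: Cyotek.GhostScript calls the 32-bit gsdll32.dll, which a 64-bit process cannot load (the 64-bit GhostScript build ships gsdll64.dll instead). The following sketch, with hypothetical helper names and only BCL calls, reports which DLL the current process would need and whether it sits in the application's base directory. It does not search the PATH, so a false result is only a hint, not proof the DLL is missing.

```csharp
using System;
using System.IO;

public static class GhostScriptDllCheck
{
  // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit process.
  public static string ExpectedDllName()
  {
    return IntPtr.Size == 8 ? "gsdll64.dll" : "gsdll32.dll";
  }

  // Checks only the application's base directory, not the PATH.
  public static bool IsDllBesideApplication()
  {
    string path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, ExpectedDllName());
    return File.Exists(path);
  }
}
```

Logging these two values at startup can make "Unable to load DLL" reports much easier to diagnose.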

Richard Moss

Thanks for the information, hopefully this will help others!

IIS6 can also be set to run in 32bit mode, although it's a little more complicated than 7.

Irene

Thank you so much! It works!

Raymond Lai

Hi Richard, I got several errors when I tried to compile your 2 projects in VS2005. 'Cyotek.GhostScript.GhostScriptException.Arguments.get' must declare a body because it is not marked abstract or extern' d:\Cyotek.GhostScript\GhostScriptException.cs Ln:74 Col:33 yotek.GhostScript

Richard Moss

Raymond,

This is because the sample uses auto-implemented properties. Essentially, the C# 3.0+ compiler lets you declare a property without a body and takes care of the backing field for you.

In the example you quoted, line 74 is this:

public string[] Arguments { get; protected set; }

In order to make it compile, you'll need to "expand" this, for example:

private string[] _arguments;

public string[] Arguments
{
  get { return _arguments; }
  protected set { _arguments = value; }
}

As far as I know, there's no way of making auto-implemented properties work in that version of Visual Studio (VS2005 only supports C# 2.0), sorry.

Regards; Richard Moss

S. Vikneshwar

Can I get the same code for ASP.NET?

Vikneshwar

Can the same thing be done with ASP.NET? I tried but couldn't. Can you post any updates and notify me? If it can be done, can you mail it to me? I'm in real need of it. My mail ID is s.viknehswar@gmail.com

Thanks and regards. S. Vikneshwar

Richard Moss

Hello,

For a sample project which works in ASP.NET, please see the following article: http://cyotek.com/blog/displaying-the-contents-of-a-pdf-file-in-an-asp-net-application-using-ghostscript

Sai Cyouki

Hi, the GhostScriptAPI class appears to support only English file names. When I used file names containing Asian characters, such as Japanese or Chinese, as parameters, it returned the error message "Failed to process GhostScript command." Could you give me some suggestions? Thank you.

Richard Moss

Hello,

According to this comment (http://sourceforge.net/projects/ghostscript/forums/forum/5451/topic/3303450), Ghostscript is an ANSI application and so doesn't support Unicode. That post is from 2009 and I'm not sure if anything has changed since then, but if nothing has, I don't think you can do anything about it unless you compile the Ghostscript source code yourself and make whatever changes are required for Unicode support.

Regards; Richard Moss

Sathishkumar

Hi all, I have one requirement: open an existing PDF document and edit it, basically adding an image at a particular place, and save it as a PDF file. 1) Is it possible to edit the PDF after converting it into an image in your image editor, and then generate a PDF from the image again? 2) Is there any library that can load an existing PDF, edit it and finally save it back into a PDF document?

Please let me know your suggestions for the above requirement as well as my questions.

Thanks and Regards, Sathish

Richard Moss

Hello,

I'm not aware of any libraries that support extended editing of a PDF file, although there are quite a few that would let you create them. Open source ones include PDFSharp and iTextSharp. I've also used cete's Dynamic PDF extensively (this is a commercial product), and while it allows you to create PDF files, it has a disappointingly limited API and is overall a poor product. We are currently evaluating TallPDF, which already seems to be an order of magnitude better than cete's offering, although I personally haven't used it yet. However, there are lots and lots of libraries out there!

Sorry I can't be of more help. I've done lots of work generating PDFs, but not editing them :)

Regards; Richard Moss

Rex

Hi I am using your library to build a console application but the library always throws a GhostScriptException. I have checked the output and found the arguments are supplied correctly. The output file was indeed generated to the designated directory but the result variable held a value of -100 and an exception was thrown.

I am using Windows 7 (64-bit), the 32-bit version of gsdll32.dll and Visual Studio 2010. I have also set the platform target to x86 as suggested above.

Richard Moss

Hello,

This appears to be a generic code for which there could be any number of causes, afraid I can't really help you here. I did find this post (http://stackoverflow.com/questions/4340407/what-is-causing-ghostscript-to-return-an-error-of-100) which may help?

Regards; Richard Moss

Pete

Where do you place gsdll32.dll for a web application? In the bin folder, or in a custom folder such as lib? I tried this and then added Network Service and IIS read/write permissions to the folder, but I get an "Unable to load DLL 'gsdll32.dll'" error.

Richard Moss

Pete,

Did you check my article on my experiences of testing using ASP.net (http://cyotek.com/blog/displaying-the-contents-of-a-pdf-file-in-an-asp-net-application-using-ghostscript)?

It needs to be in the bin folder (or system32 worked as well as I recall), but the important thing is that you have IIS set to run your app as 32bit if you are using the 32bit dll. From your comment, it sounds like either it's not in the path, or it's running in 64bit - take a peek at the article and see if this helps you.

Regards; Richard Moss

Vincent L

Hello! I've encountered an issue where Ghostscript fails when trying to load a PDF stored on a network share. It returns a generic -100 error code. Have you any ideas on how to solve this? Am I correct in thinking that DllImport loads Ghostscript as a separate process, away from the ASP.NET process? I'm concerned that permissions are preventing Ghostscript from loading the file. Alternatively, can the PDF be accessed from a Stream object of some sort?

Gregory

Hello! Is there any possibility of converting a PDF obtained from a Stream (not from a file)? Thank you!

Richard Moss

Hello,

As GhostScript is an unmanaged third-party library, I would expect that no, you can't use streams. I'd expect you'd need to save your stream to a temporary file and call GS with that. At the same time, however, I have only tested GS with files; it's possible it has additional support for other inputs - you'd have to check its documentation.

Regards; Richard Moss
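Following the reply above, a minimal sketch of the save-to-temporary-file approach might look like this. `PdfStreamHelper.SaveToTempFile` is a hypothetical name; it copies from the stream's current position, uses Stream.CopyTo (available from .NET 4.0), and leaves deleting the file to the caller once GhostScript has finished with it.

```csharp
using System;
using System.IO;

public static class PdfStreamHelper
{
  // Copies a PDF held in a stream to a temporary file so it can be passed
  // to GhostScript as a file name. The caller must delete the file later.
  public static string SaveToTempFile(Stream pdf)
  {
    if (pdf == null)
      throw new ArgumentNullException("pdf");

    string path = Path.GetTempFileName();

    try
    {
      using (FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write))
        pdf.CopyTo(file);
    }
    catch
    {
      // don't leave a half-written temporary file behind
      File.Delete(path);
      throw;
    }

    return path;
  }
}
```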

Ulrik

I'm trying to convert a PDF to a PNG with a fixed size in pixels. My problem is that (no matter what I do) the images are always generated with the size 612 x 792 pixels (the default papersize = letter)? I set the TrimMode to PaperSize and I've tried A4, A3 and A0. My code:

var converter = new Pdf2Image(Document.FullName, new Pdf2ImageSettings { ImageFormat = Cyotek.GhostScript.ImageFormat.Png24, GridFitMode = GridFitMode.Topological, TrimMode = PdfTrimMode.PaperSize, PaperSize = PaperSize.A0 });

valver

Thanks for this great project! How can I modify the code (or part of it) to increase the speed of the PDF to image conversion? I need to convert some very big PDF files and/or PDFs with lots of pages.

Thanks.

Lt.Dan

Great work! And with some code changes for multi-page PDFs, the speed is 10 times faster. Thanks!

Sankari

Hi Dan, regarding your improvement: I need to process a large number of pages, but performance is a problem; it becomes very slow when processing many pages. Please tell me how to solve this problem.

Ener

I have an error in the "PdfImageBoxSample" project...

Error: Failed to process GhostScript command.

help me

Janardhan

Does it support parallel programming? Performance is poor when processing large files. Though it's free, it's worth implementing where performance is not a concern.

Does it support all versions of PDF?

Richard Moss

The source code I have provided doesn't explicitly include this and I don't generally include such language features in my samples as I want them to be a bit more accessible and also the use of such code has its own overheads and so shouldn't just be thrown in without thought. While I haven't tested it, I don't see any reason why, for example, the GetImages method couldn't use Parallel.For. But a lot of the code on this site is concept code or example code to demonstrate techniques, in which case obfuscating them with obscure language constructs is not my goal.

In regards to the PDF version I don't know - you'd need to consult the GhostScript documentation for that.

Regards; Richard Moss
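If you experiment with Parallel.For, keep in mind that GhostScript's API documentation for older releases stated that only one interpreter instance was supported per process, which may also explain failures when two requests are processed at once. A conservative approach, sketched below with a hypothetical `GhostScriptGate` class, is to serialize the GhostScript calls themselves with a process-wide lock while other work, such as post-processing the images, still runs in parallel. Whether your GhostScript version lifts this restriction is something to check in its documentation.

```csharp
using System;

public static class GhostScriptGate
{
  private static readonly object SyncRoot = new object();

  // Runs work while holding a process-wide lock, so only one GhostScript
  // conversion executes at a time within this process.
  public static T Run<T>(Func<T> work)
  {
    if (work == null)
      throw new ArgumentNullException("work");

    lock (SyncRoot)
    {
      return work();
    }
  }
}
```

For example, each page could be converted with `GhostScriptGate.Run(() => converter.GetImage(pageNumber))`, where `converter` is your Pdf2Image instance.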

userMVC

I use Ghostscript in my ASP.NET MVC application, which is deployed on the server. Ghostscript works, but not for all PDF files: for some of the files I get the message "Failed to process GhostScript command.", while others are fine. I use the following settings:

Pdf2ImageSettings see = new Pdf2ImageSettings
{
  Dpi = 100,
  ImageFormat = ImageFormat.Jpeg,
  PaperSize = PaperSize.Foolscap
};

Has anyone had this problem?

aboy

Great work!

Only one issue: the GetPdfPageCount() method does not work with all pdf's - for example encrypted documents or pdf with document level attachments. I used iTextSharp instead:

private int GetPdfPageCount(string fileName)
{
    iTextSharp.text.pdf.PdfReader pdfReader = new iTextSharp.text.pdf.PdfReader(fileName);
    try { return pdfReader.NumberOfPages; }
    finally { pdfReader.Close(); } // release the resources held by the reader
}

Richard Moss

Thanks for the comments! As I recall (I haven't touched this code for a great while now), I took that particular code from elsewhere - I have no knowledge of the PDF format myself, so I wasn't aware of encryption or otherwise. Thanks for pointing this out though! I do use iTextSharp in other projects, so I'll make sure to use that instead should I revisit this code down the line.

Thanks again; Richard Moss

wann

Is it possible for you to write some sample code for beginners, i.e. C# code that uses your libraries? I used the package manager to Install-Package GhostScriptSharp, but most of the classes show errors under their names. I can clearly see GhostscriptSharp added to the references. What do I need to do to make those errors go away? These types are not recognized, e.g. GhostScriptCommand, GhostScriptAPI, AntiAliasMode etc. I have added both projects provided to my solution too. Please help. Thanks.

Richard Moss

Hello,

Sorry, I'm afraid I can't really help - that isn't one of our packages and I'm not able to provide support for other people's packages. I have seen this sort of thing before, typically when a .NET 4.0 reference is added to a .NET 3.5 project - perhaps try re-targeting your project to the latest version of .NET and see if that resolves your issue.

Regards; Richard Moss

Armando

Hey Richard, thanks a lot for this brilliant project. I've been looking everywhere for some way to convert a PDF to images; one would think it should be pretty simple, but other than spending $500+ on some complex component I was unable to find anything until I found your blog. Anyway, I've been getting an awful -7 error code on the NativeMethods.gsapi_init_with_args call. I've tried several different PDFs and settings combinations, but no joy. Any thoughts or pointers to a possible source of the issue would be greatly appreciated. The latest settings I've been trying are: AntiAliasMode = AntiAliasMode.None; Dpi = 100; GridFitMode = GridFitMode.None; ImageFormat = CyotekImageFormat.Jpeg; TrimMode = PdfTrimMode.PaperSize;

Again, thanks and keep up the good work man

Sankari

Hi, it's very nice and performs well for a single request at a time, and for two requests with small files, but it throws the exception "Failed to process GhostScript command." when processing two requests where each file is 992 KB. How can I solve this problem? Please help me.