Find images on PDF pages on iPhone and iPad

July 02nd, 2012

There are situations when we need to know whether a PDF page contains images, where on the page they are located, how large a displayed image is on the page, or other information about the images in a PDF page.
Some of this information is available directly in the PDF file; the rest needs to be computed.

To find out what images are painted on a page, the page content stream needs to be parsed. The page content stream is identified in the page dictionary by the "/Contents" key. The value of this key can be either a stream object or an array of stream objects. The CGPDF API does a little trick for us and always presents the value of the "/Contents" key as a single content stream, merging multiple stream objects into one.

    CGPDFContentStreamRef contentStream = CGPDFContentStreamCreateWithPage(pdfPage);

Once retrieved, the page content stream is parsed using the CGPDFScanner* functions. The content stream scanner is created using the CGPDFScannerCreate function, which takes as parameters the content stream to be parsed, a table of PDF operators and an info pointer that is passed to every callback (self in the code below). The table of PDF operators lets you associate PDF operators with callback functions in your code. During parsing, when an operator that is included in the table is encountered, the corresponding callback is called.
The PDF operators we're interested in are q, Q, cm and Do.

    CGPDFOperatorTableRef operatorTable = CGPDFOperatorTableCreate();
    CGPDFOperatorTableSetCallback(operatorTable, "q", &op_q);
    CGPDFOperatorTableSetCallback(operatorTable, "Q", &op_Q);
    CGPDFOperatorTableSetCallback(operatorTable, "cm", &op_cm);
    CGPDFOperatorTableSetCallback(operatorTable, "Do", &op_Do);
 
    CGPDFScannerRef contentStreamScanner = CGPDFScannerCreate(contentStream, operatorTable, self);
    CGPDFScannerScan(contentStreamScanner);
 
    CGPDFScannerRelease(contentStreamScanner);
    CGPDFOperatorTableRelease(operatorTable);
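
To make the dispatch idea concrete, here is a minimal sketch in plain C of what an operator table does conceptually. It is not how CGPDFScanner is implemented; the names OperatorEntry, find_callback, dispatch_operators and count_call are hypothetical and exist only for this illustration. Operators that are not in the table are simply skipped, which matches the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the operator table; these are not the CGPDF types. */
typedef void (*OperatorCallback)(void *info);

typedef struct {
    const char *name;          /* operator name, e.g. "q" or "Do" */
    OperatorCallback callback; /* handler invoked when that operator is met */
} OperatorEntry;

/* Sample handler: counts its invocations through the info pointer. */
static void count_call(void *info) {
    (*(int *)info)++;
}

/* Return the callback registered for `name`, or NULL if it is not in the table. */
static OperatorCallback find_callback(const OperatorEntry *table, size_t count,
                                      const char *name) {
    assert(table != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return table[i].callback;
        }
    }
    return NULL;
}

/* Walk a list of operator tokens and call the registered handlers.
   Returns how many tokens had a handler; unregistered operators are skipped. */
static size_t dispatch_operators(const OperatorEntry *table, size_t count,
                                 const char *const *tokens, size_t tokenCount,
                                 void *info) {
    size_t handled = 0;
    for (size_t i = 0; i < tokenCount; i++) {
        OperatorCallback callback = find_callback(table, count, tokens[i]);
        if (callback != NULL) {
            callback(info);
            handled++;
        }
    }
    return handled;
}
```

The info pointer plays the same role as the last argument of CGPDFScannerCreate: it carries the object that accumulates the results.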

The q operator saves the current graphics state.

static void op_q(CGPDFScannerRef s, void *info) {
    PDFContentStreamProcessor *csp = (PDFContentStreamProcessor *)info;
 
    // Operator q saves the current graphics state on the stack.
    // Make a copy of the current graphics state (last object in the graphicsStates array) and put it on the stack.
    PDFGraphicsState *currentGS = [csp.graphicsStates lastObject];
    PDFGraphicsState *newGS = [[PDFGraphicsState alloc] initWithGraphicState: currentGS];
    [csp.graphicsStates addObject: newGS];
    [newGS release];
}

The Q operator restores the previously saved graphics state.

static void op_Q(CGPDFScannerRef s, void *info) {
    // Operator Q restores the previous graphic state.
    // The current graphic state is removed from the stack, the previous one becomes current
    PDFContentStreamProcessor *csp = (PDFContentStreamProcessor *)info;
    [csp.graphicsStates removeLastObject];
}
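
The save/restore pair can also be sketched without Foundation, which makes the stack discipline easier to see. The following plain C version is an illustration only: StateStack, stack_save, stack_restore and the MAX_DEPTH limit are assumptions of this sketch, not part of the sample project. It also refuses to pop the base state, a guard that is useful if a malformed stream contains an unmatched Q.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 32 /* assumed nesting limit for this sketch */

typedef struct { double a, b, c, d, tx, ty; } Matrix;
typedef struct { Matrix ctm; } GraphicsState;

typedef struct {
    GraphicsState states[MAX_DEPTH];
    size_t count; /* always >= 1; the last entry is the current state */
} StateStack;

/* Start with a single state whose CTM is the identity matrix. */
static void stack_init(StateStack *stack) {
    const Matrix identity = {1, 0, 0, 1, 0, 0};
    stack->states[0].ctm = identity;
    stack->count = 1;
}

/* The current graphics state is the top of the stack. */
static GraphicsState *stack_current(StateStack *stack) {
    assert(stack->count > 0);
    return &stack->states[stack->count - 1];
}

/* q: push a copy of the current state. Returns false if the stack is full. */
static bool stack_save(StateStack *stack) {
    if (stack->count == MAX_DEPTH) {
        return false;
    }
    stack->states[stack->count] = *stack_current(stack);
    stack->count++;
    return true;
}

/* Q: drop the current state so the previous one becomes current.
   Returns false for an unmatched Q and leaves the base state in place. */
static bool stack_restore(StateStack *stack) {
    if (stack->count <= 1) {
        return false;
    }
    stack->count--;
    return true;
}
```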

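The cm operator updates the current transformation matrix by multiplying the matrix specified as its operands with the current transformation matrix, so the new matrix is applied first. The six operands are popped from the scanner stack in reverse order, last operand first.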

static void op_cm(CGPDFScannerRef s, void *info) {
    CGPDFReal a = -1, b = -1, c = -1, d = -1, tx = -1, ty = -1;
 
    if (!CGPDFScannerPopNumber(s, &ty)) {
        return;
    }
 
    if (!CGPDFScannerPopNumber(s, &tx)) {
        return;
    }
 
    if (!CGPDFScannerPopNumber(s, &d)) {
        return;
    }
 
    if (!CGPDFScannerPopNumber(s, &c)) {
        return;
    }
 
    if (!CGPDFScannerPopNumber(s, &b)) {
        return;
    }
 
    if (!CGPDFScannerPopNumber(s, &a)) {
        return;
    }
 
    // Operator cm multiplies the matrix specified as operands with the current transformation matrix.
    CGAffineTransform ctm = CGAffineTransformMake(a, b, c, d, tx, ty);
    PDFContentStreamProcessor *csp = (PDFContentStreamProcessor *)info;
    PDFGraphicsState *gs = [csp.graphicsStates lastObject];
    gs.currentTransformationMatrix = CGAffineTransformConcat(ctm, gs.currentTransformationMatrix);
}
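
The order of the two arguments in the concatenation matters. The sketch below, in plain C, reimplements the 2D affine product with the same "first argument is applied first" convention as CGAffineTransformConcat and folds a sequence of cm operands into one matrix. Matrix, Point, matrix_concat, matrix_apply and ctm_after are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double a, b, c, d, tx, ty; } Matrix;
typedef struct { double x, y; } Point;

/* Returns m1 x m2: points are transformed by m1 first, then by m2. */
static Matrix matrix_concat(Matrix m1, Matrix m2) {
    Matrix r;
    r.a  = m1.a * m2.a + m1.b * m2.c;
    r.b  = m1.a * m2.b + m1.b * m2.d;
    r.c  = m1.c * m2.a + m1.d * m2.c;
    r.d  = m1.c * m2.b + m1.d * m2.d;
    r.tx = m1.tx * m2.a + m1.ty * m2.c + m2.tx;
    r.ty = m1.tx * m2.b + m1.ty * m2.d + m2.ty;
    return r;
}

/* Transform a point: [x y 1] multiplied by the matrix. */
static Point matrix_apply(Matrix m, Point p) {
    Point r = { m.a * p.x + m.c * p.y + m.tx, m.b * p.x + m.d * p.y + m.ty };
    return r;
}

/* Fold a sequence of cm operands, in stream order, into one CTM,
   starting from the identity matrix. */
static Matrix ctm_after(const Matrix *cmOperands, size_t count) {
    Matrix ctm = {1, 0, 0, 1, 0, 0};
    assert(cmOperands != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* The new matrix goes first, the existing CTM second. */
        ctm = matrix_concat(cmOperands[i], ctm);
    }
    return ctm;
}
```

For a stream containing "1 0 0 1 100 200 cm" followed by "50 0 0 50 0 0 cm", the point (1, 1) ends up at (150, 250): it is scaled first and translated afterwards. Swapping the arguments of the concatenation would instead give (5050, 10050).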

The current transformation matrix is part of the graphics state. When a graphics state is restored, the current transformation matrix is discarded and replaced with the one that was in place when the graphics state was saved. The graphics state has many other parameters, but for our task we need only the current transformation matrix.

@implementation PDFGraphicsState
 
@synthesize currentTransformationMatrix;
 
- (id)init {
    self = [super init];
    if (self) {
        currentTransformationMatrix = CGAffineTransformMake(1, 0, 0, 1, 0, 0);
    }
 
    return self;
}
 
- (id)initWithGraphicState:(PDFGraphicsState *)gs {
    self = [super init];
    if (self) {
        self.currentTransformationMatrix = gs.currentTransformationMatrix;
    }
 
    return self;
}
 
@end

The Do operator is the one that paints the image on the page. This is where the interesting action happens.

static void op_Do(CGPDFScannerRef s, void *info) {
    const char *imageLabel;
 
    if (!CGPDFScannerPopName(s, &imageLabel)) {
        return;
    }
 
    PDFContentStreamProcessor *csp = (PDFContentStreamProcessor *)info;
    PDFGraphicsState *gs = [csp.graphicsStates lastObject];
    CGAffineTransform ctm = gs.currentTransformationMatrix;
 
    CGPDFContentStreamRef cs = CGPDFScannerGetContentStream(s);
    CGPDFObjectRef imageObject = CGPDFContentStreamGetResource(cs, "XObject", imageLabel);
    CGPDFStreamRef xObjectStream;
    if (CGPDFObjectGetValue(imageObject, kCGPDFObjectTypeStream, &xObjectStream)) {
        CGPDFDictionaryRef xObjectDictionary = CGPDFStreamGetDictionary(xObjectStream);
 
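        const char *subtype = NULL;
        // Skip XObjects without a readable /Subtype instead of passing an uninitialized pointer to strcmp.
        if (!CGPDFDictionaryGetName(xObjectDictionary, "Subtype", &subtype)) {
            return;
        }
        if (strcmp(subtype, "Image") == 0) {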
            PDFImageInfo *imageInfo = [[PDFImageInfo alloc] init];
            imageInfo.imageID = [NSString stringWithCString: imageLabel encoding: NSASCIIStringEncoding];
 
            // Transform the image coordinates into page coordinates based on current transformation matrix.
            imageInfo.vertices[0] = CGPointApplyAffineTransform(CGPointMake(0, 0), ctm); // lower left
            imageInfo.vertices[1] = CGPointApplyAffineTransform(CGPointMake(1, 0), ctm); // lower right
            imageInfo.vertices[2] = CGPointApplyAffineTransform(CGPointMake(1, 1), ctm); // upper right
            imageInfo.vertices[3] = CGPointApplyAffineTransform(CGPointMake(0, 1), ctm); // upper left
 
            CGPDFInteger pixelWidth;
            if (CGPDFDictionaryGetInteger(xObjectDictionary, "Width", &pixelWidth)) {
                imageInfo.pixelWidth = pixelWidth;
            }            
            CGPDFInteger pixelHeight;
            if (CGPDFDictionaryGetInteger(xObjectDictionary, "Height", &pixelHeight)) {
                imageInfo.pixelHeight = pixelHeight;
            }
 
            CGPDFInteger bitsPerComponent;
            if (CGPDFDictionaryGetInteger(xObjectDictionary, "BitsPerComponent", &bitsPerComponent)) {
                imageInfo.bitsPerComponent = bitsPerComponent;
            }
 
            // Vertices 0 and 1 define the horizontal, vertices 1 and 2 define the vertical.
            imageInfo.displayWidth = sqrt((imageInfo.vertices[0].x - imageInfo.vertices[1].x) * (imageInfo.vertices[0].x - imageInfo.vertices[1].x) + 
                                          (imageInfo.vertices[0].y - imageInfo.vertices[1].y) * (imageInfo.vertices[0].y - imageInfo.vertices[1].y));
            imageInfo.displayHeight = sqrt((imageInfo.vertices[1].x - imageInfo.vertices[2].x) * (imageInfo.vertices[1].x - imageInfo.vertices[2].x) + 
                                          (imageInfo.vertices[1].y - imageInfo.vertices[2].y) * (imageInfo.vertices[1].y - imageInfo.vertices[2].y));
 
            // fabs keeps the fractional part; abs is the integer version and would truncate the value.
            imageInfo.horizontalResolution = fabs(imageInfo.pixelWidth * 72 / imageInfo.displayWidth);
            imageInfo.verticalResolution = fabs(imageInfo.pixelHeight * 72 / imageInfo.displayHeight);
 
            [csp.imagesInfo addObject: imageInfo];
            [imageInfo release];
        }
        else {
            if (strcmp(subtype, "Form") == 0) {
                CGPDFArrayRef matrixArray;
                // A form XObject may have its own matrix that is concatenated with the current transformation matrix
                // before the form XObject is drawn.
                if (CGPDFDictionaryGetArray(xObjectDictionary, "Matrix", &matrixArray)) {
                    if (CGPDFArrayGetCount(matrixArray) == 6) {
                        CGPDFReal m11 = -1, m12 = -1, m21 = -1, m22 = -1, tx = -1, ty = -1;
                        if (CGPDFArrayGetNumber(matrixArray, 0, &m11) &&
                            CGPDFArrayGetNumber(matrixArray, 1, &m12) &&
                            CGPDFArrayGetNumber(matrixArray, 2, &m21) &&
                            CGPDFArrayGetNumber(matrixArray, 3, &m22) &&
                            CGPDFArrayGetNumber(matrixArray, 4, &tx) &&
                            CGPDFArrayGetNumber(matrixArray, 5, &ty)) {
                            CGAffineTransform matrix = CGAffineTransformMake(m11, m12, m21, m22, tx, ty);
                            // The form matrix is applied first, then the current transformation matrix (same order as cm).
                            ctm = CGAffineTransformConcat(matrix, ctm);
                        }
                    }
                }
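                // "/Resources" is optional; start from NULL so a missing entry does not leave the variable uninitialized.
                CGPDFDictionaryRef formXObjectResourcesDictionary = NULL;
                CGPDFDictionaryGetDictionary(xObjectDictionary, "Resources", &formXObjectResourcesDictionary);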
                CGPDFContentStreamRef formXObjectContentStream = 
                    CGPDFContentStreamCreateWithStream(xObjectStream, formXObjectResourcesDictionary, cs);
                PDFContentStreamProcessor *formXObjectCSP = 
                    [[PDFContentStreamProcessor alloc] initWithContentStream: formXObjectContentStream];
                [formXObjectCSP scanContentStream: ctm];
 
                [csp.imagesInfo addObjectsFromArray: formXObjectCSP.imagesInfo];
 
                [formXObjectCSP release];
            }
        }
    }
 
}


The Do operator actually paints a larger category of objects, called XObjects, on the page; images are a subset of this category. The Do operator has just one operand, the symbolic name of the XObject being painted. The actual XObject is located in the "/XObject" sub-dictionary of the "/Resources" dictionary of the object that owns the content stream being parsed. The symbolic name, the operand of the Do operator, serves as a key in the "/XObject" sub-dictionary, the value being the actual XObject.

    const char *imageLabel;
 
    if (!CGPDFScannerPopName(s, &imageLabel)) {
        return;
    }
 
    CGPDFContentStreamRef cs = CGPDFScannerGetContentStream(s);
    CGPDFObjectRef imageObject = CGPDFContentStreamGetResource(cs, "XObject", imageLabel);
    CGPDFStreamRef xObjectStream;
    CGPDFObjectGetValue(imageObject, kCGPDFObjectTypeStream, &xObjectStream);

The XObject category can be divided into 2 sub-categories: form XObjects and images. Form XObjects are self-contained vector graphics that use the same content stream format as the page. Images and form XObjects are told apart by the "/Subtype" key in the XObject's dictionary. If the "/Subtype" is "/Form" then the XObject is a form XObject; if the "/Subtype" is "/Image" the XObject is an image.
If the XObject is a form XObject then its content stream needs to be parsed the same way as the page content stream because it can contain images.

    CGPDFDictionaryRef xObjectDictionary = CGPDFStreamGetDictionary(xObjectStream);
 
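    const char *subtype = NULL;
    // Skip XObjects without a readable /Subtype instead of passing an uninitialized pointer to strcmp.
    if (!CGPDFDictionaryGetName(xObjectDictionary, "Subtype", &subtype)) {
        return;
    }
    if (strcmp(subtype, "Image") == 0) {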
        // ...
    }
    else {
        if (strcmp(subtype, "Form") == 0) {
            // ...
        }
    }

If the XObject is an image then we can retrieve the information we need about it, such as size in pixels and position and size on the page. The width and height of the image in pixels are retrieved from the image dictionary, from the "/Width" and "/Height" keys.

    CGPDFInteger pixelWidth;
    if (CGPDFDictionaryGetInteger(xObjectDictionary, "Width", &pixelWidth)) {
        imageInfo.pixelWidth = pixelWidth;
    }            
    CGPDFInteger pixelHeight;
    if (CGPDFDictionaryGetInteger(xObjectDictionary, "Height", &pixelHeight)) {
        imageInfo.pixelHeight = pixelHeight;
    }

In the PDF specification an image always occupies a 1 x 1 unit square in user space (page units, not pixels), with its bottom left corner at (0, 0) and its top right corner at (1, 1). The actual position and size of the image on the page are determined by the current transformation matrix that is in place when the image is painted. Because the image can be painted rotated and/or skewed, the positions of the image's corners on the page are computed by transforming the four corners of the unit square through the current transformation matrix.

    // Transform the image coordinates into page coordinates based on current transformation matrix.
    imageInfo.vertices[0] = CGPointApplyAffineTransform(CGPointMake(0, 0), ctm); // lower left
    imageInfo.vertices[1] = CGPointApplyAffineTransform(CGPointMake(1, 0), ctm); // lower right
    imageInfo.vertices[2] = CGPointApplyAffineTransform(CGPointMake(1, 1), ctm); // upper right
    imageInfo.vertices[3] = CGPointApplyAffineTransform(CGPointMake(0, 1), ctm); // upper left

The image size on the page is computed using the Pythagorean theorem as the length of two adjacent edges: vertices 0 and 1 define the horizontal edge, vertices 1 and 2 define the vertical edge.

    // Vertices 0 and 1 define the horizontal, vertices 1 and 2 define the vertical.
    imageInfo.displayWidth = sqrt((imageInfo.vertices[0].x - imageInfo.vertices[1].x) * (imageInfo.vertices[0].x - imageInfo.vertices[1].x) + 
                                  (imageInfo.vertices[0].y - imageInfo.vertices[1].y) * (imageInfo.vertices[0].y - imageInfo.vertices[1].y));
    imageInfo.displayHeight = sqrt((imageInfo.vertices[1].x - imageInfo.vertices[2].x) * (imageInfo.vertices[1].x - imageInfo.vertices[2].x) + 
                                  (imageInfo.vertices[1].y - imageInfo.vertices[2].y) * (imageInfo.vertices[1].y - imageInfo.vertices[2].y));

With the width and height computed, we can now compute the resolution used to paint the image on the page. The resolution in dpi is the ratio between the image size in pixels and the image size on the page in inches. Our previous calculations returned the width and height in points, so these values need to be divided by 72 to convert them to inches.

    imageInfo.horizontalResolution = imageInfo.pixelWidth * 72 / imageInfo.displayWidth;
    imageInfo.verticalResolution = imageInfo.pixelHeight * 72 / imageInfo.displayHeight;

The sample project associated with this post is a modified version of Apple's ZoomingPDFViewer. It loads a PDF page with images, and if you tap on an image it draws a border around it and displays the image information. It also uses the coordinate transformation methods defined in the previous post, Links Navigation, to transform the point where the user tapped into a page point. This point is then used to detect whether the user tapped an image or not.
The entire PDF parsing logic is encapsulated in the PDFContentStreamProcessor class. It maintains the stack of graphics states and the list of images detected on the page. The PDF operator callbacks are also part of this class.
