I have a bit-map image:

( However this should work with any arbitrary image )
        
Now you drawing each pixel as SCNBox of certain color, that means:
Seems like common Minecraft-like optimization problem:
Pseudo-code for meshing algorithm that skips faces of adjancent cubes (simplier, but less effective than greedy meshing):
#define SIZE_X = 16; // image width
#define SIZE_Y = 16; // image height
// pixel data, 0 = transparent pixel
int data[SIZE_X][SIZE_Y];
// check if there is non-transparent neighbour at x, y
BOOL has_neighbour(x, y) {
    if (x < 0 || x >= SIZE_X || y < 0 || y >= SIZE_Y || data[x][y] == 0)
        return NO; // out of dimensions or transparent
    else
        return YES; 
}
void add_face(x, y orientation, color) {
    // add face at (x, y) with specified color and orientation = TOP, BOTTOM, LEFT, RIGHT, FRONT, BACK
    // can be (easier and slower) implemented with SCNPlane's: https://developer.apple.com/library/mac/documentation/SceneKit/Reference/SCNPlane_Class/index.html#//apple_ref/doc/uid/TP40012010-CLSCHSCNPlane-SW8
    // or (harder and faster) using Custom Geometry: https://github.com/d-ronnqvist/blogpost-codesample-CustomGeometry/blob/master/CustomGeometry/CustomGeometryView.m#L84 
}
for (x = 0; x < SIZE_X; x++) {
    for (y = 0; y < SIZE_Y; y++) {
        int color = data[x][y];
        // skip current pixel is transparent
        if (color == 0)
            continue;
        // check neighbour at top
        if (! has_neighbour(x, y + 1))
            add_face(x,y, TOP, );
        // check neighbour at bottom
        if (! has_neighbour(x, y - 1))
            add_face(x,y, BOTTOM);
        // check neighbour at bottom
        if (! has_neighbour(x - 1, y))
            add_face(x,y, LEFT);
        // check neighbour at bottom
        if (! has_neighbour(x, y - 1))
            add_face(x,y, RIGHT);
        // since array is 2D, front and back faces is always visible for non-transparent pixels
        add_face(x,y, FRONT);
        add_face(x,y, BACK);
    }
}
A lot of depends on input image. If it is not big and without wide variety of colors, it I would go with SCNNode adding SCNPlane's for visible faces and then flattenedClone()ing result.