Making IPTC data searchable

[亡魂溺海] submitted on 2019-12-01 14:34:36
Inshallah

It is not clear what in particular is giving you problems, but perhaps this will give you some ideas:

<?php
# Images we're searching
$images = array('/path/to/image.jpg', 'another-image.jpg');

# IPTC field names mapped to the values to look for (names from exiv2, see below)
$query = array('Byline' => 'Some Author');

# Perform the search
$result = select_jpgs_by_iptc_fields($images, $query);

# Display the results
foreach ($result as $path) {
    echo '<img src="', htmlspecialchars($path), '">';
}

function select_jpgs_by_iptc_fields($jpgs, $query) {
    $matches = array();
    foreach ($jpgs as $path) {
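        $iptc = get_jpg_iptc_metadata($path);
        if (!is_array($iptc))
            continue; # no IPTC data in this image
        foreach ($query as $name => $values) {
            if (!is_array($values))
                $values = array($values);
            # Every requested value must be present in the field
            if (!isset($iptc[$name]) || count(array_intersect($values, $iptc[$name])) != count($values))
                continue 2;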
        }
        $matches[] = $path;
    }
    return $matches;
}

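function get_jpg_iptc_metadata($path) {
    # getimagesize() fills $info with the APPn segments found in the file
    $info = array();
    $size = getimagesize($path, $info);
    if (isset($info['APP13'])) {
        $iptc = iptcparse($info['APP13']);
        if ($iptc !== false)
            return human_readable_iptc($iptc);
    }
    # No IPTC data (or the file could not be read)
    return array();
}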

function human_readable_iptc($iptc) {
# From the exiv2 sources
static $iptc_codes_to_names =
array(    
// IPTC.Envelope-->
"1#000" => 'ModelVersion',
"1#005" => 'Destination',
"1#020" => 'FileFormat',
"1#022" => 'FileVersion',
"1#030" => 'ServiceId',
"1#040" => 'EnvelopeNumber',
"1#050" => 'ProductId',
"1#060" => 'EnvelopePriority',
"1#070" => 'DateSent',
"1#080" => 'TimeSent',
"1#090" => 'CharacterSet',
"1#100" => 'UNO',
"1#120" => 'ARMId',
"1#122" => 'ARMVersion',
// <-- IPTC.Envelope
// IPTC.Application2 -->
"2#000" => 'RecordVersion',
"2#003" => 'ObjectType',
"2#004" => 'ObjectAttribute',
"2#005" => 'ObjectName',
"2#007" => 'EditStatus',
"2#008" => 'EditorialUpdate',
"2#010" => 'Urgency',
"2#012" => 'Subject',
"2#015" => 'Category',
"2#020" => 'SuppCategory',
"2#022" => 'FixtureId',
"2#025" => 'Keywords',
"2#026" => 'LocationCode',
"2#027" => 'LocationName',
"2#030" => 'ReleaseDate',
"2#035" => 'ReleaseTime',
"2#037" => 'ExpirationDate',
"2#038" => 'ExpirationTime',
"2#040" => 'SpecialInstructions',
"2#042" => 'ActionAdvised',
"2#045" => 'ReferenceService',
"2#047" => 'ReferenceDate',
"2#050" => 'ReferenceNumber',
"2#055" => 'DateCreated',
"2#060" => 'TimeCreated',
"2#062" => 'DigitizationDate',
"2#063" => 'DigitizationTime',
"2#065" => 'Program',
"2#070" => 'ProgramVersion',
"2#075" => 'ObjectCycle',
"2#080" => 'Byline',
"2#085" => 'BylineTitle',
"2#090" => 'City',
"2#092" => 'SubLocation',
"2#095" => 'ProvinceState',
"2#100" => 'CountryCode',
"2#101" => 'CountryName',
"2#103" => 'TransmissionReference',
"2#105" => 'Headline',
"2#110" => 'Credit',
"2#115" => 'Source',
"2#116" => 'Copyright',
"2#118" => 'Contact',
"2#120" => 'Caption',
"2#122" => 'Writer',
"2#125" => 'RasterizedCaption',
"2#130" => 'ImageType',
"2#131" => 'ImageOrientation',
"2#135" => 'Language',
"2#150" => 'AudioType',
"2#151" => 'AudioRate',
"2#152" => 'AudioResolution',
"2#153" => 'AudioDuration',
"2#154" => 'AudioOutcue',
"2#200" => 'PreviewFormat',
"2#201" => 'PreviewVersion',
"2#202" => 'Preview',
// <--IPTC.Application2
      );
   $human_readable = array();
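   foreach ($iptc as $code => $field_value) {
       # Keep the raw code as the key when no name is known for it
       $name = isset($iptc_codes_to_names[$code]) ? $iptc_codes_to_names[$code] : $code;
       $human_readable[$name] = $field_value;
   }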
   return $human_readable;
}
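
To make the matching rule concrete: an image is selected only if every queried field contains all of the requested values. Here is a minimal sketch of that rule applied to hand-written arrays shaped like the renamed iptcparse() output, so it runs without any image files. The helper name iptc_matches and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: true if $iptc contains every value requested in $query.
function iptc_matches(array $iptc, array $query): bool {
    foreach ($query as $name => $values) {
        $values = (array) $values;          // allow a single string
        $have = $iptc[$name] ?? array();    // missing field means no values
        if (count(array_intersect($values, $have)) != count($values)) {
            return false;
        }
    }
    return true;
}

// Made-up metadata, keyed by image path.
$sample = array(
    'a.jpg' => array('Byline' => array('Some Author'), 'Keywords' => array('sea', 'boat')),
    'b.jpg' => array('Byline' => array('Someone Else'), 'Keywords' => array('sea')),
    'c.jpg' => array(), // no IPTC data at all
);

$query = array('Byline' => 'Some Author', 'Keywords' => array('sea', 'boat'));

$hits = array();
foreach ($sample as $path => $iptc) {
    if (iptc_matches($iptc, $query)) {
        $hits[] = $path;
    }
}
print_r($hits);
```

Only a.jpg satisfies both conditions here: b.jpg has a different byline and lacks the "boat" keyword, and c.jpg has no fields at all.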

If you haven't already extracted the IPTC data from your images, then every time someone searches you'll have to:

  • loop over every image
  • for each image, extract the IPTC data
  • see if the IPTC data for the current image matches

If you have more than a handful of images, this will be really bad for performance, I'd say.


So, in my opinion, it would be far better to:

  • add a couple of fields in your database
  • extract the relevant IPTC data when the image is uploaded / stored
  • store the IPTC data in those DB fields
  • search in those DB fields
    • Or use some search engine like Lucene or Sphinx -- but that is another problem.
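
The steps above could be sketched roughly as follows. This is only one possible layout, not the only way to do it: it assumes SQLite through PDO (the pdo_sqlite extension), and instead of adding columns to an existing table it uses a separate table with one row per field value. The table name image_iptc, its columns, and the three function names are all hypothetical.

```php
<?php
// Assumption: SQLite via PDO; table and column names are made up.
function create_iptc_index(PDO $db): void {
    $db->exec('CREATE TABLE IF NOT EXISTS image_iptc (
        path  TEXT NOT NULL,
        field TEXT NOT NULL,
        value TEXT NOT NULL
    )');
    $db->exec('CREATE INDEX IF NOT EXISTS idx_iptc ON image_iptc (field, value)');
}

// Call once per image at upload time.
// $iptc maps field names to lists of values, e.g. array('Byline' => array('Some Author')).
function index_image(PDO $db, string $path, array $iptc): void {
    $db->beginTransaction();
    // Drop any rows from a previous upload of the same path.
    $del = $db->prepare('DELETE FROM image_iptc WHERE path = ?');
    $del->execute(array($path));
    $ins = $db->prepare('INSERT INTO image_iptc (path, field, value) VALUES (?, ?, ?)');
    foreach ($iptc as $field => $values) {
        foreach ((array) $values as $value) {
            $ins->execute(array($path, $field, $value));
        }
    }
    $db->commit();
}

// Return the paths that have $value stored for $field.
function search_images(PDO $db, string $field, string $value): array {
    $stmt = $db->prepare(
        'SELECT DISTINCT path FROM image_iptc WHERE field = ? AND value = ? ORDER BY path'
    );
    $stmt->execute(array($field, $value));
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// In-memory database and made-up metadata, just for the demonstration.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
create_iptc_index($db);
index_image($db, 'a.jpg', array('Byline' => array('Some Author'), 'Keywords' => array('sea', 'boat')));
index_image($db, 'b.jpg', array('Byline' => array('Someone Else'), 'Keywords' => array('sea')));

$by_author  = search_images($db, 'Byline', 'Some Author');
$by_keyword = search_images($db, 'Keywords', 'sea');
```

In a real upload handler the second argument to index_image would come from parsing the file once (for example with getimagesize() and iptcparse()), so that searches only touch the database and never reopen the images.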

It'll mean a bit more work for you right now: you have more code to write...

... But it also means your website will have a better chance of surviving when there are lots of images and many users doing searches.
