I need to analyze thousands of JPEG files by retrieving their EXIF data. That is more than 50 GB of data. I cannot read the whole files because it would take too much time.
Is
Starting with version 10, the GdPicture.NET Imaging SDK provides a new image parsing mechanism that allows direct access to image metadata (EXIF, GPS, XMP, IPTC...) without decoding pixels. It supports more than 90 image formats, including JPEG, TIFF, RAW and WebP.
Here is a link to the GdPicture.NET knowledge base article that demonstrates how to extract metadata using C# and VB.NET (many other languages are also supported): tutorial
In case anybody needs further information I will be glad to assist.
Disclaimer: I am the product architect of GdPicture.NET.
You don't need to decompress anything. The Exif information is held in a header before the image data, so all you need to do is open the file, read the Exif header and decode whatever you need. This applies if you read the Exif data manually (which isn't hard).
If all you need is the sizes, that is right at the front of the file.
Edit: note that the Exif data doesn't actually have to be at the front, but it almost always is, so in general this approach will be a lot faster than it would be if the data were stored elsewhere in the file.
Also, have you checked that using the standard API really is 'too slow'? I wouldn't have thought it would take that long for 50 GB, or that doing it a different way would necessarily be faster.
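To make the "read only the header" idea concrete, here is a minimal sketch of walking the JPEG segments until the Exif block turns up. It is written in Python purely to show the byte layout (the question is about .NET, but the same loop ports directly to a `FileStream`). The function name `read_exif_segment` is made up for this illustration, and the sketch assumes a well-formed file with no fill bytes between segments. It returns the raw TIFF-formatted Exif payload and does not decode individual tags.

```python
import io
import struct

def read_exif_segment(stream):
    """Return the raw Exif payload (the TIFF data after 'Exif\\0\\0'), or None.

    Hypothetical helper for illustration: it skips from segment to segment
    and stops at the first Exif APP1 segment or at the start of the
    compressed image data, so the pixels are never read.
    """
    if stream.read(2) != b"\xff\xd8":          # SOI marker opens every JPEG
        raise ValueError("not a JPEG stream")
    while True:
        header = stream.read(4)                # 2-byte marker + 2-byte length
        if len(header) < 4:
            return None                        # ran out of data
        marker, length = struct.unpack(">2sH", header)
        if marker[0] != 0xFF:
            return None                        # malformed: give up
        if marker == b"\xff\xda":
            return None                        # SOS: image data begins, no Exif seen
        payload_len = length - 2               # length field counts itself
        if marker == b"\xff\xe1":              # APP1 may hold Exif (or XMP)
            payload = stream.read(payload_len)
            if payload.startswith(b"Exif\x00\x00"):
                return payload[6:]
        else:
            stream.seek(payload_len, io.SEEK_CUR)  # skip segments we don't need
```

You would call it with a file opened in binary mode, e.g. `with open(path, "rb") as f: tiff = read_exif_segment(f)`. Because the loop seeks past every segment it doesn't care about, only a few kilobytes per file are typically touched when the Exif block sits near the front.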
You'll find some code samples in ExifLib - A Fast Exif Data Extractor for .NET 2.0+ (and a full project too) that show how to read the minimum data necessary to get just the EXIF information out.
I've recently ported my Java metadata-extractor library to .NET. It's been active since 2002 and has had heavy testing through widespread use. In my tests, it churns through 2 GB of images, extracting all metadata, in around 4 seconds on my machine. You could optimise further by telling it to read only specific types of metadata, such as Exif. It supports many image/video formats and many metadata types.
Available on GitHub and NuGet.