Thinking outside the box, you may be able to use image metadata to narrow down your dataset.
For example, your images may have fields showing the date and time the image was taken, down to the nearest second.
Duplicates are likely to have identical values.
A tool such as exiv2 could be used to dump out this data to a more convenient and sortable text format (with a little knowledge of batch/shell scripting).
Even fields such as the camera manufacturer and model could be used to reduce a set of 1,000,000 images
to say 100 sets of 10,000 images, a significant improvement.