Detecting duplicate images, based on similarity

jakobbg

Joined: 2003-09-30
Posts: 11
Posted: Fri, 2003-11-14 21:40

I would love to see a feature that scans the entire Gallery installation for similar pictures, not based on file sizes, MD5 sums of images, or anything like that. A picture can be the same as another one even though it differs in file size, image type, date, format or quality. I would like to see some sort of fuzzy logic that, after analyzing each picture, compares every picture to all the others and reports which images might come from the same source. This would make Gallery a high-quality image archive.

Thanks for any views on this.

--
jakob breivik grimstveit, www.grimstveit.no/~jakob, www.grimstveit.no/gallery

bharat

Joined: 2002-05-21
Posts: 7994
Posted: Tue, 2003-11-18 10:01

This is a very interesting idea. If we were to integrate this into G2 it would be in the form of an add-on module, and it would most likely require that somebody outside of the project write the code that can do the fuzzy-logic based matching of multiple images. I don't think that's an easy task, so if you come across an app that does do this, please post a link to it here.

jakobbg

Joined: 2003-09-30
Posts: 11
Posted: Tue, 2003-11-18 10:12

Yes, I think some open source projects already do this kind of thing. I will search a bit and get back to you. Nice to hear it was a good idea.

The reason I need it is this: http://www.grimstveit.no/gallery/wallpapers :-)

jakobbg

Joined: 2003-09-30
Posts: 11
Posted: Tue, 2003-11-18 10:33
bharat wrote:
I don't think that's an easy task, so if you come across an app that does do this, please post a link to it here.

Absolutely correct. See here:

http://www.freshports.org/graphics/pixieplus/

http://software.vtpaintball.net/scripts/imageSearch.php

http://freshmeat.net/projects/findimagedupes/?topic_id=100

Three projects that might contain interesting code, or complete solutions.

HTH

valiant

Joined: 2003-01-04
Posts: 32509
Posted: Tue, 2003-11-18 17:35

the task is not that complex, but i don't have the time, interest or enthusiasm to code it. luckily the tool already exists, and all you have to do is write a gallery 2 module around the existing executables :)

h0bbel

Joined: 2002-07-28
Posts: 13451
Posted: Tue, 2003-11-18 19:27

jakobbg, although this is completely off topic and I really shouldn't do this since I'm one of the moderators, I'm going to anyway. :)

http://www.grimstveit.no/gallery/album56/IMG_3069 - That's what I saw from my own window at home. :) Greetings from a fellow Bergenser. :)

jmullan

Joined: 2002-07-28
Posts: 974
Posted: Tue, 2003-11-18 21:52

Here's what findimagedupes uses to generate a "fingerprint" for an image. $image is an Image::Magick object in perl. :)

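use Image::Magick;
my $image = Image::Magick->new;
$image->Read('image.jpg');             # hypothetical input file
$image->Sample("160x160!");            # resample to exactly 160x160, ignoring aspect ratio
$image->Modulate(saturation=>-100);    # drop the colour saturation
$image->Blur(factor=>99);              # blur away fine detail
$image->Normalize();                   # stretch contrast to the full range
$image->Equalize();                    # histogram equalization
$image->Sample("16x16");               # shrink to 16x16 pixels
$image->Threshold();                   # reduce to black and white
$image->Set(magick=>'pbm');            # encode as PBM: short header + 1 bit per pixel
my @blobs = $image->ImageToBlob();
my $img = substr($blobs[0],-32,32);    # last 32 bytes = 16x16 bits of pixel data, header skipped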
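For anyone without PerlMagick, the tail end of that fingerprint can be mimicked in plain Python on an image that is already decoded into a grayscale grid. This is a hypothetical sketch, not findimagedupes' code: it skips the desaturate/blur/normalize steps, samples down to 16x16 by nearest neighbour, thresholds at the median brightness (an assumption standing in for Equalize + Threshold), and packs the 256 bits into 32 bytes the way raw PBM stores pixels (most significant bit first, 1 = black).

```python
from statistics import median

def fingerprint_16x16(gray):
    """Hypothetical helper: gray is a list of rows of brightness values (0-255), at least 16x16."""
    height, width = len(gray), len(gray[0])
    # Nearest-neighbour sampling down to 16x16, similar in spirit to Sample("16x16")
    small = [[gray[y * height // 16][x * width // 16] for x in range(16)]
             for y in range(16)]
    # Threshold at the median so roughly half of the bits are set (assumption)
    cutoff = median(v for row in small for v in row)
    out = bytearray()
    for row in small:
        for start in (0, 8):  # 16 pixels per row = 2 bytes
            byte = 0
            for v in row[start:start + 8]:
                # PBM convention: 1 = black, so darker pixels become 1 bits
                byte = (byte << 1) | (1 if v < cutoff else 0)
            out.append(byte)
    return bytes(out)  # 16 rows x 2 bytes = 32 bytes, like the substr() above

# Example: a 32x32 gradient, dark on the left and light on the right
gradient = [[(x * 255) // 31 for x in range(32)] for _ in range(32)]
print(fingerprint_16x16(gradient).hex())  # "ff00" repeated 16 times
```

The 32-byte result lines up with the `substr($blobs[0],-32,32)` trick: a 16-pixel row fits exactly into two bytes, so there is no row padding to worry about.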

Here is what I think that translates to on the command line:

convert image.jpg \
	+profile "*" \
	-sample '160x160!' \
	-modulate 100,0 \
	-normalize \
	-equalize \
	-sample 16x16 \
	-threshold 50% \
	fingerprint.pbm
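Once fingerprint.pbm exists, its pixel bits can be pulled out by parsing the raw (P4) PBM header rather than relying on a fixed 32-byte offset. A minimal sketch, assuming a raw P4 file whose header has no comment lines; `read_pbm_bits` is a hypothetical helper name.

```python
def read_pbm_bits(data: bytes):
    """Parse a raw (P4) PBM and return (width, height, packed pixel bytes)."""
    # Header: magic number, width, height separated by whitespace
    fields = []
    pos = 0
    while len(fields) < 3:
        while pos < len(data) and data[pos:pos + 1].isspace():
            pos += 1
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        if start == pos:
            raise ValueError("truncated PBM header")
        fields.append(data[start:pos])
    if fields[0] != b"P4":
        raise ValueError("not a raw PBM (P4) file")
    width, height = int(fields[1]), int(fields[2])
    pos += 1  # a single whitespace byte separates the header from the pixels
    row_bytes = (width + 7) // 8  # each row is padded to a whole byte
    pixels = data[pos:pos + row_bytes * height]
    if len(pixels) != row_bytes * height:
        raise ValueError("truncated PBM pixel data")
    return width, height, pixels

# Usage with a hand-built 16x16 blob:
blob = b"P4\n16 16\n" + bytes(range(32))
width, height, bits = read_pbm_bits(blob)  # (16, 16, the 32 pixel bytes)
```

For a 16x16 image with nothing after the pixels, `bits` is the same as the last 32 bytes of the file.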

We would then want to read that pbm's image data into a database somewhere so we can compare later.
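A rough sketch of the "database somewhere" idea, using Python's built-in sqlite3 with an in-memory database. The table and column names and the cutoff of 20 differing bits (out of 256) are made-up assumptions for illustration. SQLite has no built-in bit-count function, so the comparison (Hamming distance) is done in Python after loading the fingerprints.

```python
import sqlite3
from itertools import combinations

def hamming(a: bytes, b: bytes) -> int:
    """Number of bits that differ between two equal-length fingerprints."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def similar_pairs(conn, max_distance=20):
    """Compare every stored fingerprint with every other one (n*(n-1)/2 pairs)."""
    rows = conn.execute(
        "SELECT path, fingerprint FROM fingerprints ORDER BY path").fetchall()
    pairs = []
    for (p1, f1), (p2, f2) in combinations(rows, 2):
        d = hamming(f1, f2)
        if d <= max_distance:
            pairs.append((p1, p2, d))
    return pairs

# Hypothetical schema and sample 32-byte fingerprints
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fingerprints (path TEXT PRIMARY KEY, fingerprint BLOB)")
conn.executemany("INSERT INTO fingerprints VALUES (?, ?)", [
    ("a.jpg", b"\xff\x00" * 16),
    ("a_small.png", b"\xff\x01" * 16),  # 16 bits differ from a.jpg
    ("other.jpg", b"\x00\xff" * 16),    # all 256 bits differ from a.jpg
])
print(similar_pairs(conn))  # [('a.jpg', 'a_small.png', 16)]
```

Comparing everything against everything grows quadratically with the number of images, which is fine for a sketch but worth keeping in mind for a large gallery.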

We should probably also compare image size and md5 sums to remove exact duplicates. :)
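Exact duplicates are cheap to filter out first. A sketch using Python's hashlib and os.path.getsize; `group_exact_duplicates` and `file_md5` are hypothetical helper names. Files are grouped by size first, so only same-size files ever get hashed.

```python
import hashlib
import os
from collections import defaultdict

def file_md5(path, chunk_size=65536):
    """MD5 hex digest of a file, read in chunks to keep memory use low."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def group_exact_duplicates(paths):
    """Return lists of paths whose size and MD5 sum are identical."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have an exact duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_md5(p)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

Anything left over after this pass can go through the fuzzy fingerprint comparison.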

Just some thoughts...

hk_traveller

Joined: 2002-09-17
Posts: 27
Posted: Tue, 2003-11-25 07:45

Google Image Search already provides similar functionality, and I guess it is based on the relevance of the web pages associated with the images.

http://images.google.com/

hk_traveller

Joined: 2002-09-17
Posts: 27
Posted: Tue, 2003-11-25 08:17
jmullan wrote:
Here's what findimagedupes uses to generate a "fingerprint" for an image. $image is an Image::Magick object in perl. :)

I am thinking of another approach to image similarity:

1. Normalize each image to a 100x100 grid where each cell is represented by one bit (black or white): divide the image into a 100x100 grid, assign each cell a bit (0/1) based on its brightness, and keep the resulting fingerprint in a database.

2. The image matching function then compares fingerprints when searching (bit-by-bit comparison).
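The two steps above can be sketched roughly in Python. This is a hypothetical illustration: the image is assumed to be decoded already into a grayscale grid (a list of rows, at least 100x100), each cell's pixels are averaged, and a cell becomes 1 if it is brighter than the average of all cells. That cutoff is an assumption; the post doesn't say how the brightness is turned into a bit. Matching counts how many of the 10,000 bits agree.

```python
def grid_fingerprint(gray, n=100):
    """Step 1: one bit per cell of an n x n grid (1 = brighter than average)."""
    height, width = len(gray), len(gray[0])
    cells = []
    for gy in range(n):
        y0, y1 = gy * height // n, (gy + 1) * height // n
        for gx in range(n):
            x0, x1 = gx * width // n, (gx + 1) * width // n
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))  # mean brightness of the cell
    mean = sum(cells) / len(cells)
    return [1 if c > mean else 0 for c in cells]

def similarity(fp_a, fp_b):
    """Step 2: bit-by-bit comparison, as the fraction of matching bits."""
    matches = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return matches / len(fp_a)

# An image and a 2x enlarged copy give the same fingerprint
img = [[x + y for x in range(100)] for y in range(100)]
big = [[img[y // 2][x // 2] for x in range(200)] for y in range(200)]
print(similarity(grid_fingerprint(img), grid_fingerprint(big)))  # 1.0
```

Because the grid is defined relative to the image size, rescaled copies land on the same cells, which is what makes this kind of fingerprint tolerant of different resolutions.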

 
HM2K

Joined: 2003-06-12
Posts: 53
Posted: Tue, 2007-02-27 23:28

Did this ever make any further progress?

-HM2K- http://www.hm2k.org/