oaphotodna.py
oaphotodna.py computes PhotoDNA-like hashes for images (based on the reverse-engineered implementation available at https://github.com/ArcaneNibble/open-alleged-photodna), compares two images with normalized similarity scoring, and supports a local FAISS-backed nearest-neighbor index for fast lookup of visually similar images.
This version adds:
- a FAISS local vector index
- persistent on-disk metadata in meta.json
- exact L2 nearest-neighbor search
- similarity scores normalized to the same 0..1 scale as direct image comparison
- query-time filtering by minimum similarity or maximum Euclidean distance
Requirements
Install dependencies:
pip install pillow numpy faiss-cpu
What the script does
The script supports four main workflows:
- Compute the hash of a single image.
- Compute hashes for every file in a directory and emit JSON.
- Compare two images using either Euclidean or Manhattan distance.
- Build and query a local FAISS index of previously hashed images.
The PhotoDNA-like hash is represented internally as a flat vector of 144 values. FAISS stores these vectors and searches for nearest neighbors using L2 distance.
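To make the idea of exact L2 nearest-neighbor search concrete, here is a minimal brute-force sketch in plain Python. It deliberately does not use the FAISS API; it only illustrates the ranking that an exact L2 index is expected to produce. The names `squared_l2`, `nearest`, and the `stored` dictionary are hypothetical and do not come from the script.

```python
# Brute-force exact L2 nearest-neighbor search in plain Python.
# This is NOT the FAISS API; it only illustrates the ranking of an exact L2 search.

HASH_DIM = 144  # vector length stated in this README


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query, stored, top_k=5):
    """Return up to top_k (id, squared_distance) pairs, closest first.

    `stored` is a hypothetical dict mapping a numeric ID to a hash vector.
    """
    ranked = sorted((squared_l2(query, vec), item_id) for item_id, vec in stored.items())
    return [(item_id, dist) for dist, item_id in ranked[:top_k]]


# Toy data: three made-up 144-value hashes.
stored = {
    0: [10] * HASH_DIM,
    1: [12] * HASH_DIM,
    2: [200] * HASH_DIM,
}
query = [10] * (HASH_DIM - 1) + [13]
matches = nearest(query, stored, top_k=2)
print(matches)
```

A real index avoids recomputing every distance in Python, but for small collections the result ordering should be the same.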
Help
Print the top-level help:
python bin/oaphotodna.py --help
The CLI uses traditional, flag-prefixed arguments (for example --hash, --compare, --faiss-query) rather than positional subcommands.
adulau@blakley:~/git/photodna/bin$ python3 oaphotodna.py
usage: oaphotodna.py [-h] (--hash IMAGE | --hash-dir DIRECTORY | --compare IMAGE1 IMAGE2 | --faiss-build ARG [ARG ...] | --faiss-add ARG [ARG ...] | --faiss-query ARG [ARG ...]) [--metric {euclidean,manhattan}]
[--min-similarity MIN_SIMILARITY] [--max-distance MAX_DISTANCE]
Compute and compare PhotoDNA-like hashes, with optional FAISS local indexing.
options:
-h, --help show this help message and exit
--hash IMAGE Compute the hash of one image
--hash-dir DIRECTORY Compute hashes for every file in a directory and output JSON
--compare IMAGE1 IMAGE2
Compare two images
--faiss-build ARG [ARG ...]
Create a new FAISS index: INDEX META IMAGE [IMAGE ...]
--faiss-add ARG [ARG ...]
Append images to an existing FAISS index: INDEX META IMAGE [IMAGE ...]
--faiss-query ARG [ARG ...]
Find closest indexed matches: INDEX META QUERY_IMAGE [TOP_K]
--metric {euclidean,manhattan}
Distance metric for --compare
--min-similarity MIN_SIMILARITY
With --faiss-query, filter results below this similarity threshold [0,1]
--max-distance MAX_DISTANCE
With --faiss-query, filter results above this Euclidean distance
Basic usage
1) Hash a single image
python bin/oaphotodna.py --hash image.jpg
Output:
73,71,74,32,...
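If the hash needs to be consumed by another program, the output line can be turned back into a list of integers. The sketch below assumes the output is a single line of comma-separated decimal values in the range 0..255; the helper name `parse_hash_line` is hypothetical.

```python
def parse_hash_line(line, expected_len=None):
    """Parse a comma-separated hash line into a list of ints in 0..255.

    Assumes one line of decimal integers, as in the sample output.
    """
    values = [int(token) for token in line.strip().split(",") if token.strip()]
    for value in values:
        if not 0 <= value <= 255:
            raise ValueError(f"value out of range: {value}")
    if expected_len is not None and len(values) != expected_len:
        raise ValueError(f"expected {expected_len} values, got {len(values)}")
    return values


# Shortened, made-up input; a real hash would carry 144 values.
short_hash = parse_hash_line("73,71,74,32\n")
print(short_hash)
```

For real output, passing `expected_len=144` catches truncated or malformed lines early.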
2) Hash every file in a directory as JSON
python bin/oaphotodna.py --hash-dir tests/monochrome
Example output:
[
{
"filename": "55147310088_42a69416d3_5k.jpg",
"path": "/full/path/to/tests/monochrome/55147310088_42a69416d3_5k.jpg",
"photodna": [73, 71, 74, 32]
}
]
Each JSON object includes the base filename, the absolute file path, and the 144-byte PhotoDNA-like vector (shortened to four values in the example above). Files are processed in sorted filename order, and non-file directory entries are skipped.
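The JSON output can be loaded with the standard library. The following sketch assumes the output was saved or captured as text and that every record carries the three keys shown above; the function name `load_hash_dir_output` and the sample paths are hypothetical.

```python
import json


def load_hash_dir_output(json_text, expected_len=144):
    """Map each record's absolute path to its hash vector."""
    by_path = {}
    for record in json.loads(json_text):
        vector = record["photodna"]
        if len(vector) != expected_len:
            raise ValueError(
                f"{record['filename']}: expected {expected_len} values, got {len(vector)}"
            )
        by_path[record["path"]] = vector
    return by_path


# Made-up sample mimicking the documented structure.
sample = json.dumps([
    {"filename": "a.jpg", "path": "/tmp/a.jpg", "photodna": [0] * 144},
    {"filename": "b.jpg", "path": "/tmp/b.jpg", "photodna": [255] * 144},
])
hashes = load_hash_dir_output(sample)
print(sorted(hashes))
```

Keying by path rather than filename avoids collisions when files from several directories are merged.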
3) Compare two images
Default metric is Euclidean:
python bin/oaphotodna.py --compare image1.jpg image2.jpg
Use Manhattan distance instead:
python bin/oaphotodna.py --compare image1.jpg image2.jpg --metric manhattan
Example output:
Distance (euclidean): 3.7417
Similarity: 0.998779
Similarity scale
The script reports a normalized similarity value between 0 and 1.
- 1.0 means identical hashes
- values close to 1.0 mean very similar hashes
- values closer to 0.0 mean more distant hashes
For Euclidean distance, similarity is derived from the maximum possible distance for a 144-dimensional hash with values in the range 0..255:
similarity = 1 - (euclidean_distance / max_possible_distance)
The FAISS query path uses the same normalization so that the similarity reported by --faiss-query is directly comparable to the Similarity: line from --compare.
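The Euclidean normalization can be sketched as follows. This assumes the maximum possible distance is the distance between an all-0 and an all-255 vector of 144 values, which is 255 * sqrt(144) = 3060. The constant and function names are hypothetical, and the script may round or compute its constant slightly differently, so the last digits of real output can differ.

```python
import math

HASH_DIM = 144   # vector length stated in this README
MAX_VALUE = 255  # upper bound of each hash value

# Distance between an all-0 and an all-255 vector (assumed normalization constant).
MAX_EUCLIDEAN = math.sqrt(HASH_DIM * MAX_VALUE ** 2)


def euclidean_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def euclidean_similarity(distance):
    """Normalize a Euclidean distance into a 0..1 similarity score."""
    return 1.0 - distance / MAX_EUCLIDEAN


zeros = [0] * HASH_DIM
full = [MAX_VALUE] * HASH_DIM
print(MAX_EUCLIDEAN)
print(euclidean_similarity(euclidean_distance(zeros, zeros)))
print(euclidean_similarity(euclidean_distance(zeros, full)))
```

Because the denominator is large compared with typical distances between similar images, similarity values for near-duplicates sit very close to 1.0.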
FAISS local database
Files used
The local database consists of two files:
- index.faiss: the FAISS vector index
- meta.json: sidecar metadata used to map FAISS IDs back to files and hashes
What meta.json contains
meta.json stores information that FAISS does not store for you in an application-friendly way:
- dimension: vector length, normally 144
- metric: stored metric type
- next_id: next numeric ID to assign
- items: indexed records
Each item in items contains:
- id: numeric FAISS ID
- path: canonicalized file path
- hash: stored 144-element hash
- extra: optional metadata placeholder
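As an illustration of how such a sidecar file could be maintained, the sketch below builds the structure described above and assigns IDs from next_id. Several details are assumptions rather than facts about the script: the metric string "l2", IDs starting at 0, `None` as the empty value for extra, and the helper names `new_meta` and `add_item`.

```python
import json


def new_meta(dimension=144, metric="l2"):
    """Create an empty metadata structure (metric string is an assumption)."""
    return {"dimension": dimension, "metric": metric, "next_id": 0, "items": []}


def add_item(meta, path, hash_values, extra=None):
    """Append one record and return the numeric ID assigned to it."""
    if len(hash_values) != meta["dimension"]:
        raise ValueError("hash length does not match the stored dimension")
    item_id = meta["next_id"]
    meta["items"].append(
        {"id": item_id, "path": path, "hash": list(hash_values), "extra": extra}
    )
    meta["next_id"] = item_id + 1
    return item_id


meta = new_meta()
first_id = add_item(meta, "/data/images/img1.jpg", [0] * 144)
second_id = add_item(meta, "/data/images/img2.jpg", [1] * 144)

# Round-trip through JSON, as a file on disk would be.
restored = json.loads(json.dumps(meta))
print(restored["next_id"], [item["id"] for item in restored["items"]])
```

Storing next_id separately from the item count means IDs would stay unique even if records were removed later.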
Build an index
Create a new index from a set of images:
python bin/oaphotodna.py --faiss-build index.faiss meta.json img1.jpg img2.jpg img3.jpg
Expected output:
Indexed 3 file(s) into index.faiss
Add images to an existing index
Append more images later:
python bin/oaphotodna.py --faiss-add index.faiss meta.json img4.jpg img5.jpg
Expected output:
Added 2 file(s) into index.faiss
Query the index
Search for the closest matches to a query image:
python bin/oaphotodna.py --faiss-query index.faiss meta.json query.jpg
Specify the number of results to return:
python bin/oaphotodna.py --faiss-query index.faiss meta.json query.jpg 20
Example output:
Query: query.jpg
Results: 2
[1] /data/images/img2.jpg
id=17
distance=3.7417
similarity=0.998779
distance_squared=14.0000
[2] /data/images/img7.jpg
id=42
distance=5.2915
similarity=0.998273
distance_squared=28.0000
Filter query results by similarity
Only return matches at or above a similarity threshold:
python bin/oaphotodna.py --faiss-query index.faiss meta.json query.jpg 20 --min-similarity 0.95
Filter query results by Euclidean distance
Only return matches at or below a maximum Euclidean distance:
python bin/oaphotodna.py --faiss-query index.faiss meta.json query.jpg 20 --max-distance 12
Combine both filters
python bin/oaphotodna.py --faiss-query index.faiss meta.json query.jpg 20 --min-similarity 0.98 --max-distance 8
FAISS distance notes
FAISS returns squared L2 distance internally.
The script converts that into:
- distance_squared: raw FAISS value
- distance: Euclidean distance (sqrt(distance_squared))
- similarity: normalized 0..1 score derived from Euclidean distance
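The conversion from the raw squared distance, together with the two query-time filters, can be sketched like this. It assumes the normalization constant 255 * sqrt(144) = 3060 and treats both thresholds as inclusive ("at or above" for similarity, "at or below" for distance). The helper names `to_result` and `keep` are hypothetical.

```python
import math

MAX_EUCLIDEAN = 255 * math.sqrt(144)  # assumed normalization constant (3060.0)


def to_result(distance_squared):
    """Derive distance and similarity from a raw squared L2 distance."""
    distance = math.sqrt(max(distance_squared, 0.0))  # guard against tiny negatives
    return {
        "distance_squared": distance_squared,
        "distance": distance,
        "similarity": 1.0 - distance / MAX_EUCLIDEAN,
    }


def keep(result, min_similarity=None, max_distance=None):
    """Apply the optional thresholds; a missing threshold never rejects."""
    if min_similarity is not None and result["similarity"] < min_similarity:
        return False
    if max_distance is not None and result["distance"] > max_distance:
        return False
    return True


# Made-up squared distances, as a nearest-neighbor search might return them.
results = [to_result(d2) for d2 in (14.0, 28.0, 400.0)]
within_distance = [r for r in results if keep(r, max_distance=12)]
both_filters = [r for r in results if keep(r, min_similarity=0.999, max_distance=12)]
print(len(within_distance), len(both_filters))
```

When both thresholds are given, a result must satisfy each of them, so combining filters can only narrow the result list.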