Index-card blank / content detector

A tiny (0.95M-parameter), CPU-runnable image classifier that labels a single cropped archival index card as blank or content. It is a cheap pre-filter: run it before expensive VLM metadata extraction (NuExtract3 / Qwen-VL) so blank cards are skipped instead of burning GPU time and cluttering the output. Built to generalise across collections; trained on Boston Public Library shelf-list cards and National Library of Scotland Advocates Library cards.

Fine-tuned from apple/mobilevit-xx-small on small-models-for-glam/index-card-blank-content.

Usage

from transformers import pipeline

clf = pipeline("image-classification",
               model="small-models-for-glam/index-card-blank-detector", device=-1)  # CPU
print(clf("card.jpg"))
# [{'label': 'content', 'score': 0.99}, {'label': 'blank', 'score': 0.01}]

To use it as a --skip-blank pre-filter over a folder or dataset, see infer.py in the project repo: cards predicted blank are skipped, so the VLM only runs on content cards.
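The skip-blank idea can be sketched in a few lines. This is not the code in infer.py; it is a minimal illustration that assumes the classifier returns pipeline-style output (a list of label/score dicts). The names content_cards, load_classifier and blank_threshold are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator

# A classifier takes an image path and returns pipeline-style output:
# a list of {"label": ..., "score": ...} dicts.
Classifier = Callable[[str], list]


def content_cards(paths: Iterable[Path], classify: Classifier,
                  blank_threshold: float = 0.5) -> Iterator[Path]:
    """Yield only the cards that should go on to the VLM.

    A card is skipped when its 'blank' score is at or above blank_threshold
    (a hypothetical knob; raise it to skip fewer cards).
    """
    for path in paths:
        scores = {p["label"]: p["score"] for p in classify(str(path))}
        if scores.get("blank", 0.0) >= blank_threshold:
            continue  # predicted blank: do not send to the VLM
        yield path


def load_classifier() -> Classifier:
    """Build the real classifier (downloads the model on first use)."""
    # Imported lazily so the filtering logic above has no heavy dependencies.
    from transformers import pipeline
    return pipeline("image-classification",
                    model="small-models-for-glam/index-card-blank-detector",
                    device=-1)  # CPU


# Example (assumes a local folder named "cards"):
#   clf = load_classifier()
#   for path in content_cards(sorted(Path("cards").glob("*.jpg")), clf):
#       ...  # run the VLM on this card
```

Passing the classifier in as a plain callable keeps the filter easy to test with a stub before any model is downloaded.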

Results (held-out, human-verified gold; evaluated via `pipeline`)

split	accuracy	blank recall	content recall
BPL (60)	1.00	1.00	1.00
NLS content (15)	1.00	n/a	1.00
punch-hole / smudge blanks (28)	1.00	1.00	n/a

These numbers come from the actual pipeline() inference path, not only from the in-training metric: a preprocessing mismatch can hide behind a self-consistent training metric.
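For reference, the three columns in the table can be computed from gold labels and pipeline predictions roughly as follows. This is a sketch, not the project's evaluation script; top_label, recall and evaluate are hypothetical helper names, and a recall of None corresponds to the "n/a" cells (a split with no cards of that class).

```python
from typing import Dict, List, Optional, Sequence


def top_label(pipeline_output: List[dict]) -> str:
    """Pick the highest-scoring label from one pipeline() result."""
    return max(pipeline_output, key=lambda p: p["score"])["label"]


def recall(gold: Sequence[str], pred: Sequence[str], label: str) -> Optional[float]:
    """Fraction of gold `label` cards predicted as `label`; None if the split has none."""
    hits = [p == label for g, p in zip(gold, pred) if g == label]
    return sum(hits) / len(hits) if hits else None


def evaluate(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, Optional[float]]:
    """Accuracy plus per-class recall for a blank/content split."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return {
        "accuracy": accuracy,
        "blank_recall": recall(gold, pred, "blank"),
        "content_recall": recall(gold, pred, "content"),
    }
```

Feeding evaluate() with labels obtained via top_label(clf(path)) keeps the metric on the same preprocessing path that users will hit.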

Backbone sweep (why the smallest)

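Blank-vs-content is visually easy (bare stock vs printed form), so accuracy saturates at 100% on the gold set for every backbone tried. The smallest backbone by parameter count was therefore chosen, even though ViT-Tiny measured faster on CPU.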

backbone	params	gold acc	CPU latency (batch=1)
MobileViT-XX-Small (this model)	0.95M	100%	~26 ms
MobileNetV2	2.23M	100%	~37 ms
ViT-Tiny	5.52M	100%	~17 ms
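How the latency column was measured is not described here. One common way to get a comparable batch=1 figure is a warm-up followed by a median over repeated calls, sketched below; median_latency_ms and its defaults are assumptions, not the project's benchmark.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Median wall-clock time of fn() in milliseconds.

    Warm-up calls are discarded so one-off costs (lazy initialisation,
    caches) do not skew the result.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Example (assumes `clf` is the pipeline from the Usage section):
#   print(median_latency_ms(lambda: clf("card.jpg")))
```

Timing the full pipeline call includes image decoding and preprocessing, which is what a pre-filter pays in practice.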

How it was made (provenance)

AI-bootstrapped → agent-verified → trained, no from-scratch hand labelling:

Weak signals fused into labels: NuExtract3 card_type, an ink-density heuristic with punch-hole removal (100% blank recall / 96% content recall when calibrated; used to harvest extra blanks), the NLS card-detector's box-count (validated as an oracle — reliable on NLS, noisy on BPL crops), and the NLS has_card flag.
Agreement → auto-accept; disagreement → human review. Gold set verified by eye.
Trained on Hugging Face Jobs (t4-small) with class-weighted loss and document-safe augmentation (no flips/aggressive crops; small rotation/jitter + RandomErasing so a small dark blob — the punch-hole — is not read as content).
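The agreement rule in the second step can be sketched as a small voting function. This is an illustration under assumptions: each weak signal is reduced to "blank", "content" or None (no opinion), the signal names in the example are hypothetical, and min_votes is an invented safeguard so a lone signal never auto-accepts.

```python
from typing import Mapping, Optional, Tuple


def fuse(signals: Mapping[str, Optional[str]],
         min_votes: int = 2) -> Tuple[Optional[str], str]:
    """Combine weak labels for one card.

    Returns (label, "auto-accept") when at least min_votes signals voted
    and all of them agree; otherwise (None, "human-review").
    """
    votes = [v for v in signals.values() if v is not None]
    if len(votes) >= min_votes and len(set(votes)) == 1:
        return votes[0], "auto-accept"
    return None, "human-review"


# Example with hypothetical signal names:
#   fuse({"card_type": "blank", "ink_density": "blank", "box_count": None})
#   returns ("blank", "auto-accept")
```

Only the disagreements reach a person, which is what keeps the hand-labelling load small.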

Full trajectory in the project BUILD-LOG.md; method derives from the data-centric-model-dev workflow and is the companion to the NLS card detector.
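The exact weighting used for the class-weighted loss is not given here. A common choice is inverse-frequency ("balanced") weights, w_c = N / (K * n_c) for N samples, K classes and n_c samples in class c, sketched below as an assumption rather than the project's recipe.

```python
from collections import Counter
from typing import Dict, Sequence


def class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Inverse-frequency weights: rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}
```

In a PyTorch training loop these values would typically be ordered by class index and passed as the weight tensor of torch.nn.CrossEntropyLoss.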

Intended use & limitations

Use: a CPU pre-filter in front of VLM metadata extraction for card-catalogue digitisation.
Divider cards are a planned third class (captured in the dataset, held out of this binary model); for now a divider is classified as content and sent to the VLM.
NLS blank gap (v1): NLS contributed content cards only — no clean blank fronts exist in the source. So NLS blank-detection is unvalidated in v1; reported NLS metrics are for content. A v2 with NLS blank fronts will close this.
Trained on two collections; generalisation to very different card styles is unverified — but the dataset/recipe are built to extend (just add rows tagged by source_collection).

Use this for your own collection

Point an agent at a sample of your cards; bootstrap labels from whatever weak signals you have (a detector, an existing metadata field, an ink-density heuristic with punch-hole handling); auto-accept agreements and human-correct the rest; hold out a small verified gold set; then re-run training with your rows tagged by source_collection.
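Holding out the gold set per collection and per class keeps every source represented in evaluation. The sketch below assumes each row is a dict with a source_collection field (as in the dataset) and a label field (a hypothetical name); split_gold and gold_per_group are likewise illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_gold(rows: List[Row], gold_per_group: int = 10,
               seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Split rows into (train, gold), sampling gold within each
    (source_collection, label) group so small groups are not left out."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[(row["source_collection"], row["label"])].append(row)

    train, gold = [], []
    for key in sorted(groups):
        members = list(groups[key])
        rng.shuffle(members)
        # Never move more than half of a group into the gold set.
        k = min(gold_per_group, len(members) // 2)
        gold.extend(members[:k])
        train.extend(members[k:])
    return train, gold
```

The gold rows should then be verified by eye before any metric is reported on them.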
