kniv-deberta-nlp-base-en-small

A compact multi-task NLP student model that performs the same 5 language analysis tasks as our production teacher from a single DeBERTa-v3-small encoder pass: POS tagging, Named Entity Recognition, Dependency Parsing, Semantic Role Labeling, and Dialog Act Classification.

This is the best size/quality tradeoff in the kniv cascade family — 2.8× smaller than the teacher while staying within 0.1–1.4 pts of teacher quality across all five heads. Recommended for general-purpose deployment.

Part of the Rustic initiative by Dragonscale Industries Inc.

Source code: GitHub
Teacher: kniv-deberta-nlp-base-en-large (443M)
Encoder: DeBERTa-v3-small (768d, 6 layers)
Parameters: 157.1M (141.3M encoder + 15.8M heads)
Compression: 2.8× smaller than teacher
Download: 628 MB (PyTorch) / 629 MB (ONNX FP32) / 190 MB (ONNX INT8)
License: CC-BY-SA-4.0

Quick Start

ONNX

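pip install torch transformers==5.6.2 onnxruntime

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dragonscale-ai/kniv-deberta-nlp-base-en-small")
session = ort.InferenceSession("onnx/cascade.onnx")

# Tokenize an example sentence to NumPy arrays; the model expects int64 inputs.
enc = tokenizer(["Book me a flight to Boston."], return_tensors="np")
input_ids = enc["input_ids"].astype(np.int64)
attention_mask = enc["attention_mask"].astype(np.int64)
# One predicate index per sentence; 0 means the SRL head is unused.
predicate_idx = np.zeros(input_ids.shape[0], dtype=np.int64)

pos, ner, arc, label, srl, cls = session.run(None, {
    "input_ids": input_ids,           # int64 [batch, seq]
    "attention_mask": attention_mask, # int64 [batch, seq]
    "predicate_idx": predicate_idx,   # int64 [batch], verb token index (0 if unused)
})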

PyTorch

pip install torch transformers==5.6.2 seqeval
python examples/cascade_demo.py --model models/kniv-deberta-nlp-base-en-small

The demo script loads the model, runs all 5 heads, and prints POS tags, NER entities, dependency tree, SRL frames, and dialog acts.

Benchmark Results

All benchmarks use standard public test sets. No benchmark data was used during training. Results are reproducible via the included benchmark scripts.

| Head | Score | Metric | Benchmark | Split |
|------|-------|--------|-----------|-------|
| POS | 0.970 | Accuracy | UD English EWT | test |
| NER | 0.779 | F1 | CoNLL-2003 (mapped) | test |
| DEP | 0.942 / 0.922 | UAS / LAS | UD English EWT | test |
| SRL | 0.831 | F1 | PropBank EWT | test |
| CLS | 0.947 | Macro F1 | SGD + GPT (8 labels, internal) | dev |

NER on CoNLL-2003 was evaluated by mapping our 18 OntoNotes entity types to the 4 CoNLL types (PER, ORG, LOC, MISC). Numeric entities (DATE, MONEY, PERCENT, QUANTITY, ORDINAL, CARDINAL, TIME) have no CoNLL equivalent and are mapped to O. This is a strictly harder protocol than the one used for CoNLL-trained baselines.
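The mapping can be pictured as a lookup table applied tag by tag. In the sketch below, only the numeric-to-O rule comes from this card. The assignment of the remaining eleven OntoNotes types to PER/ORG/LOC/MISC is an assumption made for illustration, as is the BIO tag scheme (suggested by the 37-way NER output, 2 × 18 + 1). `map_tag` and `map_sequence` are hypothetical helpers, not functions from the benchmark scripts.

```python
# Numeric OntoNotes types have no CoNLL-2003 counterpart (stated in this card).
NUMERIC_TYPES = {"DATE", "MONEY", "PERCENT", "QUANTITY", "ORDINAL", "CARDINAL", "TIME"}

# ASSUMPTION: illustrative assignment of the other OntoNotes types;
# the actual benchmark mapping may differ.
ASSUMED_MAP = {
    "PERSON": "PER",
    "ORG": "ORG",
    "GPE": "LOC", "LOC": "LOC", "FAC": "LOC",
    "NORP": "MISC", "EVENT": "MISC", "PRODUCT": "MISC",
    "WORK_OF_ART": "MISC", "LAW": "MISC", "LANGUAGE": "MISC",
}

def map_tag(tag: str) -> str:
    """Map one BIO tag from the OntoNotes scheme to the CoNLL scheme."""
    if tag == "O":
        return "O"
    prefix, _, entity_type = tag.partition("-")
    if entity_type in NUMERIC_TYPES:
        return "O"  # numeric entities are dropped
    return f"{prefix}-{ASSUMED_MAP[entity_type]}"

def map_sequence(tags):
    """Map a whole predicted tag sequence."""
    return [map_tag(t) for t in tags]

mapped = map_sequence(["B-PERSON", "I-PERSON", "O", "B-DATE", "B-GPE"])
```

Applying the mapping to predictions before scoring means every numeric entity the model predicts becomes O, so it can neither help nor hurt on spans that CoNLL does not annotate.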

CLS cross-evaluation on DailyDialog: accuracy 0.593. This uses a lossy 8→4 label mapping, so the comparison discards information.

# Reproduce benchmarks
python models/download_benchmarks.py
python models/student_benchmark_standard.py \
    --model-dir models/kniv-deberta-nlp-base-en-small --backend all

Runtime Performance

Single-call latency on NVIDIA RTX 4070 Laptop GPU (CUDA 13.0, ONNX Runtime 1.25.1 with CUDA 13 build):

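| Runtime | bs=1, seq=64 | bs=1, seq=128 | bs=32, seq=128 (sent/s) |
|---------|--------------|---------------|-------------------------|
| ONNX FP32 CUDA | 6.62 ms | 8.96 ms | 355 |
| PyTorch CUDA | 10.04 ms | 13.53 ms | 363 |
| ONNX INT8 CPU | 21.77 ms | 31.84 ms | 51 |
| ONNX FP32 CPU | 29.41 ms | 46.88 ms | 29 |
| PyTorch CPU | 80.59 ms | 99.38 ms | 21 |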
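For readers who want comparable numbers on their own hardware, here is a minimal timing sketch. It is not the harness that produced the figures above: the warmup count, run count, and use of the median are assumptions, and `measure_latency_ms` and `throughput_sent_per_s` are hypothetical helper names.

```python
import statistics
import time

def measure_latency_ms(fn, warmup=10, runs=100):
    """Return the median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):  # warm caches and lazy initialisation
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

def throughput_sent_per_s(batch_latency_ms, batch_size):
    """Convert one batch's latency into sentences per second."""
    return batch_size * 1000.0 / batch_latency_ms

# Example with a stand-in workload; replace the lambda with a real
# inference call such as `lambda: session.run(None, feeds)`.
latency = measure_latency_ms(lambda: sum(range(1000)), warmup=2, runs=20)
```

Read in the other direction, the same conversion says that 355 sent/s at bs=32 corresponds to roughly 90 ms per batch.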

Recommended runtimes:

  • GPU server (production sweet spot): ONNX FP32 CUDA, about 9 ms latency (bs=1, seq=128)
  • CPU edge / embedded: ONNX INT8 CPU, about 32 ms latency (bs=1, seq=128), 190 MB model size

INT8 quality drops relative to FP32, in points: POS −0.54, NER −2.68, DEP −0.95, SRL −1.80, CLS (internal) −2.58. INT8 hits this 768d encoder harder than xsmall because its weight rows have more dynamic range to compress. For latency-critical CPU deployment that relies on INT8, prefer the xsmall variant, which loses less quality under quantization.
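The dynamic-range effect can be seen in a toy example. The sketch below assumes symmetric per-row INT8 quantization, which this card does not confirm as the scheme used for the export; the weight values are synthetic and `int8_roundtrip_error` is a hypothetical helper. It shows only the mechanism: a single large value stretches the quantization step and raises the rounding error on every other weight in the row.

```python
import numpy as np

def int8_roundtrip_error(row):
    """Mean absolute error after symmetric per-row INT8 quantize/dequantize."""
    scale = np.abs(row).max() / 127.0  # one step size for the whole row
    quantized = np.clip(np.round(row / scale), -127, 127)
    return float(np.mean(np.abs(quantized * scale - row)))

rng = np.random.default_rng(0)
narrow = rng.normal(0.0, 0.02, size=768)  # synthetic row with small range
wide = narrow.copy()
wide[0] = 1.0                             # one outlier widens the range

err_narrow = int8_roundtrip_error(narrow)
err_wide = int8_roundtrip_error(wide)
print(f"narrow row error: {err_narrow:.6f}, wide row error: {err_wide:.6f}")
```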

Architecture

Identical cascade structure to the teacher, with all sizes auto-scaled to the encoder hidden dimension (768d):

DeBERTa-v3-small + pred_embedding
│
├─ ScalarMix(all)  → Linear(17)                              → POS
├─ ScalarMix(all)  → BiLSTM(192) → +POS probs → MLP(37)      → NER  [Viterbi]
├─ ScalarMix(all)  → BiLSTM(192) → +POS/NER probs → Biaffine → DEP
├─ ScalarMix(all)  → +pred interaction features +POS+DEP probs
│                    → BiLSTM(384) → MLP(42)                  → SRL  [Viterbi]
└─ ScalarMix(all)  → AttentionPool → MLP(8)                   → CLS

The architecture mirrors the teacher's design; see the teacher's model card for the full rationale behind ScalarMix, the cascade, and the predicate embedding. Each head's internal dimensions scale proportionally to the encoder width.
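Every head in the diagram starts from its own ScalarMix over the encoder layers. This card does not define ScalarMix, so the NumPy sketch below assumes the common formulation: a softmax over learnable per-layer scalars, a weighted sum of layer outputs, and a global scale `gamma`. The layer count of 7 (embeddings plus 6 transformer layers) is also an assumption about what "all" covers.

```python
import numpy as np

def scalar_mix(layer_outputs, weights, gamma=1.0):
    """Softmax-weighted sum over encoder layers.

    layer_outputs: array [num_layers, seq, hidden]
    weights:       array [num_layers], unnormalised learnable scalars
    """
    w = np.exp(weights - weights.max())  # numerically stable softmax
    w = w / w.sum()
    # Contract the layer axis, leaving [seq, hidden].
    return gamma * np.tensordot(w, layer_outputs, axes=1)

# ASSUMPTION: 7 hidden states (embeddings + 6 layers), 768-d as in this model.
rng = np.random.default_rng(0)
layers = rng.normal(size=(7, 4, 768))
mixed = scalar_mix(layers, weights=np.zeros(7))
```

With all weights equal, the mix reduces to a plain average of the layers; training lets each head shift weight toward the layers most useful for its task.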

Training

This model is a distilled student of kniv-deberta-nlp-base-en-large.

The training pipeline is two-stage:

  1. Stage 1 — Distillation (~8 epochs): student learns from teacher's soft logits (KL on POS/NER/SRL/CLS, hard CE for DEP arc/relation), teacher's intermediate hidden states (PKD-style MSE matching), and consistency regularization (R-Drop).

  2. Stage 2 — Fine-tune, in two steps: (2a) SRL fine-tuning on gold plus teacher silver supervision, then (2b) CLS fine-tuning with the encoder and SRL components frozen, to preserve the SRL peak reached in Stage 2a.
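The Stage 1 loss terms can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the training code: the temperature, the T² scaling of the KL term, and the linear projection from student width (768) to teacher width (1024) are all assumptions, and the function names are hypothetical. See docs/design-knowledge-distillation.md for the actual method.

```python
import numpy as np

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def kd_kl_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    t_logp = log_softmax(teacher_logits / temperature)
    s_logp = log_softmax(student_logits / temperature)
    kl = (np.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1)
    # ASSUMPTION: conventional T^2 factor to keep gradient scale comparable.
    return float(kl.mean() * temperature ** 2)

def hidden_mse(student_hidden, teacher_hidden, projection):
    """PKD-style MSE between projected student states and teacher states."""
    return float(np.mean((student_hidden @ projection - teacher_hidden) ** 2))

def rdrop_loss(logits_a, logits_b):
    """R-Drop: symmetric KL between two dropout passes of the same input."""
    return 0.5 * (kd_kl_loss(logits_a, logits_b, 1.0)
                  + kd_kl_loss(logits_b, logits_a, 1.0))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(5, 17))  # e.g. 5 tokens, 17 POS classes
student = rng.normal(size=(5, 17))
loss_same = kd_kl_loss(teacher, teacher)
loss_diff = kd_kl_loss(student, teacher)
```

The dependency head is the exception in Stage 1: it is trained with hard cross-entropy on arcs and relations rather than a KL term.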

See docs/design-knowledge-distillation.md for the distillation methodology in full.

Limitations

  • English only. Encoder and training data are English.
  • Same data caveats as teacher. NER trained on silver labels (SpanMarker); CLS trained on dialog data (may misclassify news/documents); SRL requires predicate index; DEP uses greedy decoding (no MST).
  • Requires transformers==5.6.2. Other versions produce incorrect outputs.
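To make the greedy-decoding caveat concrete, the sketch below picks each token's head independently and then checks whether the result is a valid tree. The score layout (`arc_scores[i, j]` scores token j as the head of token i) and the use of index 0 as the root are assumptions for illustration; `greedy_heads` and `has_cycle` are hypothetical helpers, not part of the released code.

```python
import numpy as np

def greedy_heads(arc_scores):
    """Pick the highest-scoring head for each token independently.

    ASSUMPTION: arc_scores[i, j] scores token j as the head of token i.
    """
    return arc_scores.argmax(axis=-1)

def has_cycle(heads, root=0):
    """True if some token's chain of heads never reaches the root.

    ASSUMPTION: index 0 acts as the root; its own head is ignored.
    """
    n = len(heads)
    for start in range(n):
        node, steps = start, 0
        while node != root:
            node = int(heads[node])
            steps += 1
            if steps > n:  # longer than any simple path, so it loops
                return True
    return False

# Tokens 1 and 2 each prefer the other as head, so greedy decoding loops.
cyclic_scores = np.array([[0.0, 0.0, 0.0],
                          [0.1, 0.0, 0.9],
                          [0.2, 0.8, 0.0]])
tree_scores = np.array([[0.0, 0.0, 0.0],
                        [0.9, 0.0, 0.1],
                        [0.2, 0.8, 0.0]])
cyclic_heads = greedy_heads(cyclic_scores)
tree_heads = greedy_heads(tree_scores)
```

A maximum-spanning-tree decoder would rule out the first case by construction; with greedy decoding, callers that need a guaranteed tree may want a check like `has_cycle` as a post-processing step.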

Model Family

| Model | Encoder | Params | Compression | SRL F1 |
|-------|---------|--------|-------------|--------|
| kniv-deberta-nlp-base-en-xsmall | DeBERTa-v3-xsmall (384d, 12L) | 74.7M | 5.9× | 0.829 |
| kniv-deberta-nlp-base-en-small | DeBERTa-v3-small (768d, 6L) | 157.1M | 2.8× | 0.831 |
| kniv-deberta-nlp-base-en-large | DeBERTa-v3-large (1024d, 24L) | 443M | - | 0.843 |

Citation

@misc{kniv-cascade-2026,
  title={kniv-deberta-nlp-base-en-small: Distilled Multi-Task NLP Cascade},
  author={Dragonscale Industries Inc.},
  year={2026},
  url={https://huggingface.co/dragonscale-ai/kniv-deberta-nlp-base-en-small}
}

License

CC-BY-SA-4.0

Built by Dragonscale Industries Inc. | Rustic
