mmBERT-32K PII Detector LoRA

A LoRA adapter for PII (Personally Identifiable Information) detection, built on the mmBERT-32K-YaRN base model and supporting a 32,768-token context length.

Model Details

Property      Value
Base Model    llm-semantic-router/mmbert-32k-yarn
Task          Token Classification (NER)
LoRA Rank     32
LoRA Alpha    64
Max Context   32,768 tokens
Entity Types  17 PII types (35 BIO labels)

Supported PII Types

  • PERSON - Person names
  • EMAIL_ADDRESS - Email addresses
  • PHONE_NUMBER - Phone numbers
  • STREET_ADDRESS - Street addresses
  • CREDIT_CARD - Credit card numbers
  • US_SSN - US Social Security Numbers
  • US_DRIVER_LICENSE - US Driver License numbers
  • IBAN_CODE - International Bank Account Numbers
  • IP_ADDRESS - IP addresses
  • DATE_TIME - Dates and times
  • AGE - Age information
  • ORGANIZATION - Organization names
  • GPE - Geopolitical entities
  • ZIP_CODE - ZIP/postal codes
  • DOMAIN_NAME - Domain names
  • NRP - Nationalities, religious or political groups
  • TITLE - Titles (Mr., Dr., etc.)
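Under BIO tagging, each of the 17 entity types contributes a B- (begin) and an I- (inside) label, plus a single O label for non-PII tokens, which is where the 35 labels in the table come from. A minimal sketch of that arithmetic (the ordering shown is illustrative, not necessarily the adapter's actual id-to-label order):

```python
# Illustrative reconstruction of the BIO label set; the adapter's real
# id2label ordering may differ.
PII_TYPES = [
    "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "STREET_ADDRESS",
    "CREDIT_CARD", "US_SSN", "US_DRIVER_LICENSE", "IBAN_CODE",
    "IP_ADDRESS", "DATE_TIME", "AGE", "ORGANIZATION", "GPE",
    "ZIP_CODE", "DOMAIN_NAME", "NRP", "TITLE",
]

# One "O" label for non-PII tokens, then a B-/I- pair per entity type.
labels = ["O"] + [f"{prefix}-{t}" for t in PII_TYPES for prefix in ("B", "I")]
print(len(labels))  # 17 * 2 + 1 = 35
```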

Training

  • Dataset: Microsoft Presidio research dataset
  • Epochs: 5
  • Batch Size: 16
  • Learning Rate: 1e-4
  • Training Samples: ~5000
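The hyperparameters above map onto a standard peft/transformers fine-tuning setup. A hedged sketch is shown below; the `target_modules` selection and `lora_dropout` are assumptions (the card does not document them), though attention projections are a common choice for BERT-style encoders:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForTokenClassification, TrainingArguments

base = AutoModelForTokenClassification.from_pretrained(
    "llm-semantic-router/mmbert-32k-yarn",
    num_labels=35,  # 17 PII types as B-/I- pairs, plus O
)

# Rank and alpha match the table above; target_modules and dropout
# are assumptions, not documented for this adapter.
lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,
    r=32,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["query", "value"],
)
model = get_peft_model(base, lora_config)

# Trainer hyperparameters from the Training section.
training_args = TrainingArguments(
    output_dir="mmbert32k-pii-lora",
    num_train_epochs=5,
    per_device_train_batch_size=16,
    learning_rate=1e-4,
)
```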

Usage

from peft import PeftModel
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load base model and LoRA adapter
base_model = AutoModelForTokenClassification.from_pretrained(
    "llm-semantic-router/mmbert-32k-yarn",
    num_labels=35
)
model = PeftModel.from_pretrained(base_model, "llm-semantic-router/mmbert32k-pii-detector-lora")
tokenizer = AutoTokenizer.from_pretrained("llm-semantic-router/mmbert32k-pii-detector-lora")
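Once loaded, the adapter runs like any token-classification model. A minimal inference sketch, assuming the model's config carries a populated `id2label` mapping (the example text is illustrative):

```python
import torch

text = "Contact Jane Doe at jane.doe@example.com."
inputs = tokenizer(text, return_tensors="pt")

model.eval()
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, 35)

pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Print only tokens tagged as PII (i.e., anything other than "O").
for token, pred in zip(tokens, pred_ids):
    label = model.config.id2label.get(pred, str(pred))
    if label != "O":
        print(token, "->", label)
```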

License

MIT License
