LoRA adapter for PII (Personally Identifiable Information) detection, built on the mmBERT-32K-YaRN base model with a 32K-token context length.
| Property | Value |
|---|---|
| Base Model | llm-semantic-router/mmbert-32k-yarn |
| Task | Token Classification (NER) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Max Context | 32,768 tokens |
| Entity Types | 17 PII types (35 BIO labels) |
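The adapter settings in the table above correspond roughly to the following PEFT `LoraConfig`. This is an illustrative sketch: the target modules and dropout value are assumptions, as the card does not list them.

```python
from peft import LoraConfig, TaskType

# Sketch of the adapter configuration implied by the table above.
# target_modules and lora_dropout are assumptions, not stated on the card.
lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,  # token classification (NER)
    r=32,                          # LoRA rank from the table
    lora_alpha=64,                 # LoRA alpha from the table
    lora_dropout=0.1,              # assumed value
    target_modules=["query", "key", "value"],  # assumed attention projections
)
```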
Supported entity types:

- `PERSON` - Person names
- `EMAIL_ADDRESS` - Email addresses
- `PHONE_NUMBER` - Phone numbers
- `STREET_ADDRESS` - Street addresses
- `CREDIT_CARD` - Credit card numbers
- `US_SSN` - US Social Security Numbers
- `US_DRIVER_LICENSE` - US Driver License numbers
- `IBAN_CODE` - International Bank Account Numbers
- `IP_ADDRESS` - IP addresses
- `DATE_TIME` - Dates and times
- `AGE` - Age information
- `ORGANIZATION` - Organization names
- `GPE` - Geopolitical entities
- `ZIP_CODE` - ZIP/postal codes
- `DOMAIN_NAME` - Domain names
- `NRP` - Nationalities, religious or political groups
- `TITLE` - Titles (Mr., Dr., etc.)

Usage:

```python
from peft import PeftModel
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load the base model with a 35-label classification head
# (17 entity types in BIO format, plus the O tag)
base_model = AutoModelForTokenClassification.from_pretrained(
    "llm-semantic-router/mmbert-32k-yarn",
    num_labels=35,
)

# Attach the LoRA adapter and load the matching tokenizer
model = PeftModel.from_pretrained(base_model, "llm-semantic-router/mmbert32k-pii-detector-lora")
tokenizer = AutoTokenizer.from_pretrained("llm-semantic-router/mmbert32k-pii-detector-lora")
```
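Once the model is loaded, per-token BIO predictions can be grouped back into entity spans. A minimal sketch of that decoding step is shown below; the token/tag sequences are illustrative inputs, not model output.

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag starts a new entity; flush any span in progress.
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # O tag (or mismatched I-): close any open span.
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

tokens = ["Contact", "Jane", "Doe", "at", "jane@example.com"]
tags = ["O", "B-PERSON", "I-PERSON", "O", "B-EMAIL_ADDRESS"]
print(decode_bio(tokens, tags))
# [('PERSON', 'Jane Doe'), ('EMAIL_ADDRESS', 'jane@example.com')]
```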
License: MIT

Base model: jhu-clsp/mmBERT-base