---
license: apache-2.0
language:
- en
base_model:
- cisco-ai/SecureBERT2.0-base
pipeline_tag: token-classification
library_name: transformers
tags:
- NER
- SecureBERT2
- CyberNER
- token-classification
- cybersecurity
---

# Model Card for cisco-ai/SecureBERT2.0-NER

The **Secure Modern BERT NER Model** is a fine-tuned transformer based on [**SecureBERT 2.0**](https://huggingface.co/cisco-ai/SecureBERT2.0-base), designed for **Named Entity Recognition (NER)** in cybersecurity text. It extracts domain-specific entities such as **Indicators, Malware, Organizations, Systems, and Vulnerabilities** from unstructured sources like threat reports, incident analyses, advisories, and blogs.

NER in cybersecurity enables:

- Automated extraction of indicators of compromise (IOCs)
- Structuring of unstructured threat intelligence text
- Improved situational awareness for analysts
- Faster incident response and vulnerability triage

---

## Model Details

### Model Description

- **Developed by:** Cisco AI
- **Model Type:** ModernBertForTokenClassification
- **Framework:** PyTorch / Transformers
- **Tokenizer Type:** PreTrainedTokenizerFast
- **Number of Labels:** 11
- **Task:** Named Entity Recognition (NER)
- **License:** Apache-2.0
- **Language:** English
- **Base Model:** [cisco-ai/SecureBERT2.0-base](https://huggingface.co/cisco-ai/SecureBERT2.0-base)

#### Supported Entity Labels

| Entity | Description |
|:--------|:-------------|
| `B-Indicator`, `I-Indicator` | Indicators of Compromise (e.g., IPs, domains, hashes) |
| `B-Malware`, `I-Malware` | Malware or exploit names |
| `B-Organization`, `I-Organization` | Companies or groups mentioned |
| `B-System`, `I-System` | Affected software or platforms |
| `B-Vulnerability`, `I-Vulnerability` | Specific CVEs or flaw descriptions |
| `O` | Outside token |

#### Model Configuration

| Parameter | Value |
|:-----------|:-------|
| Hidden size | 768 |
| Intermediate size | 1152 |
| Hidden layers | 22 |
| Attention heads | 12 |
| Max sequence length | 8192 |
| Vocabulary size | 50368 |
| Activation | GELU |
| Dropout | 0.0 (embedding, attention, MLP, classifier) |

---

## Uses

### Direct Use

- Named Entity Recognition (NER) on cybersecurity text
- Threat intelligence enrichment
- IOC extraction and normalization
- Incident report analysis
- Vulnerability mention detection

### Downstream Use

This model can be integrated into:

- Threat intelligence platforms (TIPs)
- SOC automation tools
- Cybersecurity knowledge graphs
- Vulnerability management and CVE monitoring systems

### Out-of-Scope Use

- Non-technical or general-domain NER tasks
- Generative or conversational AI applications

---

## Benchmark Cybersecurity NER Corpus

### Dataset Overview

| Aspect | Description |
|:-------|:-------------|
| **Purpose** | Benchmark dataset for extracting cybersecurity entities from unstructured reports |
| **Data Source** | Curated threat intelligence documents emphasizing malware and system analysis |
| **Annotation Methodology** | Fully hand-labeled by domain experts |
| **Entity Types** | Malware, Indicator, System, Organization, Vulnerability |
| **Size** | 3.4k training samples + 717 test samples |

---

## How to Get Started with the Model

### Example Usage (Transformers)

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "cisco-ai/SecureBERT2.0-NER"

# Load the tokenizer and the fine-tuned token-classification model (PyTorch).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# With default settings the pipeline returns token-level predictions.
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

text = "Stealc malware targets browser cookies and passwords."
entities = ner_pipeline(text)
print(entities)
```

## Training Details

### Training Objective and Procedure

The `SecureBERT2.0-NER` model was fine-tuned for **token-level classification** on cybersecurity text using **Cross Entropy Loss**.
Training focused on accurately classifying entity boundaries and types across five cybersecurity-specific categories: *Malware, Indicator, System, Organization,* and *Vulnerability*. The **AdamW** optimizer was used with a **linear learning rate scheduler**, and gradient clipping (max norm 1.0) was applied to keep fine-tuning stable.

### Training Configuration

| Setting | Value |
|:---------|:------:|
| Objective | Token-wise Cross Entropy |
| Optimizer | AdamW |
| Learning Rate | 1e-5 |
| Weight Decay | 0.001 |
| Batch Size per GPU | 8 |
| Epochs | 20 |
| Max Sequence Length | 1024 |
| Gradient Clipping Norm | 1.0 |
| Scheduler | Linear |
| Mixed Precision | fp16 |
| Framework | PyTorch / Transformers |

### Training Dataset

The model was fine-tuned on a **cybersecurity-specific NER corpus** containing annotated threat intelligence reports, advisories, and technical documentation.

| Property | Description |
|:----------|:-------------|
| **Dataset Type** | Manually annotated corpus |
| **Language** | English |
| **Entity Types** | Malware, Indicator, System, Organization, Vulnerability |
| **Train Size** | 3,400 samples |
| **Test Size** | 717 samples |
| **Annotation Method** | Expert hand-labeling for accuracy and consistency |

### Preprocessing

- Texts were tokenized using the `PreTrainedTokenizerFast` tokenizer from SecureBERT 2.0.
- All sequences were truncated or padded to 1024 tokens.
- Labels were aligned with subword tokens to maintain token-label consistency.

### Hardware and Training Setup

| Component | Description |
|:-----------|:-------------|
| GPUs Used | 8× NVIDIA A100 |
| Precision | Mixed precision (fp16) |
| Batch Size | 8 per GPU |
| Framework | Transformers (PyTorch backend) |

### Optimization Summary

The model converged after approximately **20 epochs**, with the training loss stabilizing at a low level. Validation metrics (F1, precision, recall) improved steadily from epoch 3 onward, consistent with effective domain-specific adaptation.

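
The Preprocessing step of aligning labels with subword tokens is not spelled out in this card. The following is a minimal sketch of one common strategy, not necessarily the one used for this model: the first subword of each word keeps the word's label, while continuation subwords and special tokens receive `-100` so that the cross-entropy loss skips them. The label-id order, the `align_labels` helper, and the example `word_ids` split are all assumptions for illustration; in practice `word_ids` would come from a fast tokenizer's `BatchEncoding.word_ids()`.

```python
from typing import List, Optional

# Label inventory taken from the "Supported Entity Labels" table. The id order
# here is an assumption and may differ from the model's actual config.
ENTITY_TYPES = ("Indicator", "Malware", "Organization", "System", "Vulnerability")
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]
LABEL2ID = {label: idx for idx, label in enumerate(LABELS)}

# -100 is the default ignore_index of PyTorch's cross-entropy loss.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[str]) -> List[int]:
    """Map word-level labels onto subword tokens (hypothetical helper).

    word_ids holds, for each token, the index of the word it came from,
    or None for special tokens such as [CLS], [SEP], and padding.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for word_id in word_ids:
        if word_id is None:
            # Special or padding token: excluded from the loss.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First subword of a word carries the word's label.
            aligned.append(LABEL2ID[word_labels[word_id]])
        else:
            # Continuation subword: excluded from the loss.
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned


# Assumed tokenization: "Stealc" splits into two subwords, the rest into one.
example_word_ids = [None, 0, 0, 1, 2, None]
example_labels = ["B-Malware", "O", "O"]
print(align_labels(example_word_ids, example_labels))
```

Masking continuation subwords keeps each word's contribution to the loss independent of how many pieces the tokenizer splits it into; an alternative is to label continuations with the matching `I-` tag instead.
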
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was conducted on a **cybersecurity-specific NER benchmark corpus** containing annotated threat reports, advisories, and incident analysis texts. This benchmark includes five key entity types: **Malware, Indicator, System, Organization, and Vulnerability**.

#### Metrics

The following metrics were used to assess model performance:

- **F1-score:** Harmonic mean of precision and recall
- **Recall:** Fraction of true entities that the model identified correctly
- **Precision:** Fraction of predicted entities that were correct

### Results

| Model | F1 | Recall | Precision |
|:------|:---:|:-------:|:-----------:|
| **CyBERT** | 0.351 | 0.281 | 0.467 |
| **SecureBERT** | 0.734 | 0.759 | 0.717 |
| **SecureBERT 2.0 (Ours)** | **0.945** | **0.965** | **0.927** |

#### Summary

The **SecureBERT 2.0 NER model** outperforms both CyBERT and the original SecureBERT on all three metrics.

- It achieves an **F1-score of 0.945**, an **absolute improvement of 0.211** (about 21 points) over SecureBERT.
- Its **recall (0.965)** indicates that it finds nearly all annotated cybersecurity entities in the test set.
- Its **precision (0.927)** indicates that most predicted entities are correct, with relatively few false positives.

These results suggest that **domain-adaptive pretraining and fine-tuning** on cybersecurity corpora substantially improve NER performance compared to general or earlier models.

---

## Reference

```
@article{aghaei2025securebert,
  title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
  author={Aghaei, Ehsan and Jain, Sarthak and Arun, Prashanth and Sambamoorthy, Arjun},
  journal={arXiv preprint arXiv:2510.00240},
  year={2025}
}
```

---

## Model Card Authors

Cisco AI

## Model Card Contact

For inquiries, please contact [ai-threat-intel@cisco.com](mailto:ai-threat-intel@cisco.com)