bert-base-uncased-finetuned-squadv2

This model is a fine-tuned version of google-bert/bert-base-uncased on the SQuAD v2 dataset. It performs extractive question answering and can detect questions that the given context does not answer.

Model description

This model uses the BERT base uncased architecture and was fine-tuned on SQuAD v2, which extends the original SQuAD dataset with questions that cannot be answered from the provided context. For each question, the model either extracts the answer span from the context or indicates that the question cannot be answered.

Key features:

Architecture: BERT base uncased (12 layers, hidden size 768, 12 attention heads)
Task: Extractive Question Answering with No-Answer Detection
Language: English
Training Data: SQuAD v2.0
Input: Question and context pairs
Output: Answer span, or an indication that the question is unanswerable

Training procedure

Training hyperparameters

The model was trained with the following hyperparameters:

Learning rate: 3e-05
Train batch size: 12
Eval batch size: 8
Optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
LR scheduler: Linear
Number of epochs: 5
Seed: 42
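
The linear schedule listed above can be illustrated with a small sketch. It assumes no warmup, since the card does not report warmup steps, and a total of 54,900 optimizer steps, which is an estimate derived from the reported 131,754 training samples, batch size 12, and 5 epochs on a single device without gradient accumulation. `linear_lr` is a hypothetical helper written for this illustration, not a library function.

```python
BASE_LR = 3e-5        # "Learning rate" from the card
TOTAL_STEPS = 54_900  # Assumption: ceil(131,754 / 12) batches per epoch * 5 epochs


def linear_lr(step, total_steps=TOTAL_STEPS, warmup_steps=0, base_lr=BASE_LR):
    """Learning rate at `step`: linear warmup (if any), then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the rate at the start, middle, and end of training.
for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {linear_lr(step):.2e}")
```

With these assumptions the rate starts at 3e-05, is halved midway through training, and reaches zero at the final step.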

Training results

The model achieved the following results on the SQuAD v2 evaluation set:

HasAns Exact Match: 71.26%
HasAns F1: 78.78%
NoAns Exact Match: 73.42%
NoAns F1: 73.42%
Best Exact Match: 72.34%
Best F1: 76.10%
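
As a consistency check, the overall scores can be recovered as a question-weighted average of the HasAns and NoAns scores. The sketch below assumes the commonly cited split of the SQuAD v2.0 dev set into 5,928 answerable and 5,945 unanswerable questions; these counts are not stated in this card.

```python
# Assumption: question counts of the SQuAD v2.0 dev set (not stated in the card).
HAS_ANS_TOTAL = 5_928
NO_ANS_TOTAL = 5_945


def overall(has_ans_score, no_ans_score):
    """Question-weighted average of the HasAns and NoAns scores (in percent)."""
    total = HAS_ANS_TOTAL + NO_ANS_TOTAL
    return (has_ans_score * HAS_ANS_TOTAL + no_ans_score * NO_ANS_TOTAL) / total


overall_em = overall(71.26, 73.42)  # HasAns / NoAns Exact Match from the card
overall_f1 = overall(78.78, 73.42)  # HasAns / NoAns F1 from the card
print(f"Overall EM: {overall_em:.2f}")
print(f"Overall F1: {overall_f1:.2f}")
```

Under that assumption the averages come out at 72.34 and 76.10, in line with the reported Best Exact Match and Best F1.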

Additional training statistics:

Training samples: 131,754
Evaluation samples: 12,134
Training time: 31m 58s
Evaluation time: 42.89s
Training loss: 0.0711
Training samples per second: 343.32
Training steps per second: 28.61
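
The throughput figures above can be cross-checked against the reported training time. This is a back-of-the-envelope sketch that assumes a single device and no gradient accumulation, so that one optimizer step processes one batch of 12 samples; neither detail is stated in the card.

```python
import math

TRAIN_SAMPLES = 131_754
BATCH_SIZE = 12
EPOCHS = 5
SAMPLES_PER_SECOND = 343.32
STEPS_PER_SECOND = 28.61
REPORTED_SECONDS = 31 * 60 + 58  # "Training time: 31m 58s"

# Assumption: one optimizer step per batch (single device, no gradient accumulation).
total_steps = math.ceil(TRAIN_SAMPLES / BATCH_SIZE) * EPOCHS

# Training time implied by each throughput figure.
seconds_from_samples = TRAIN_SAMPLES * EPOCHS / SAMPLES_PER_SECOND
seconds_from_steps = total_steps / STEPS_PER_SECOND

print(f"total optimizer steps:     {total_steps}")
print(f"time implied by samples/s: {seconds_from_samples:.0f} s")
print(f"time implied by steps/s:   {seconds_from_steps:.0f} s")
print(f"reported training time:    {REPORTED_SECONDS} s")
```

Both estimates land within about a second of the reported 31m 58s, so the sample count, batch size, epoch count, and throughput numbers are mutually consistent under the stated assumption.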

Framework versions

Transformers: 4.47.0.dev0
PyTorch: 2.5.1+cu124
Datasets: 3.1.0
Tokenizers: 0.20.3

Intended uses & limitations

This model is intended for:

Extractive question answering on English text
Detecting unanswerable questions
General-domain questions and contexts
Research and educational purposes

Limitations:

Performance may vary on domain-specific content
May struggle with complex reasoning questions
Limited to extractive QA (cannot generate free-form answers)
Works only with English-language content

How to use

import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Load model & tokenizer
model_name = "real-jiakai/bert-base-uncased-finetuned-squadv2"
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

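def get_answer_v2(question, context, threshold=0.0):
    """Return the extracted answer span, or a fallback message if unanswerable.

    The question is treated as unanswerable when the null ([CLS]) score exceeds
    the best span score by more than `threshold`.
    """
    # Tokenize the pair; inputs longer than 384 tokens are truncated,
    # so an answer located past the cutoff cannot be found
    inputs = tokenizer(
        question,
        context,
        return_tensors="pt",
        max_length=384,
        truncation=True
    )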
    
    with torch.no_grad():
        outputs = model(**inputs)
        start_logits = outputs.start_logits[0]
        end_logits = outputs.end_logits[0]
        
        # Calculate null score (score for predicting no answer)
        null_score = start_logits[0].item() + end_logits[0].item()
        
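        # Find the best non-null answer. Mask every position outside the context
        # ([CLS], question tokens, [SEP]) so the span cannot leave the context.
        # sequence_ids requires a fast tokenizer (the AutoTokenizer default).
        sequence_ids = inputs.sequence_ids(0)  # None = special, 0 = question, 1 = context
        outside_context = torch.tensor([sid != 1 for sid in sequence_ids])
        start_logits = start_logits.masked_fill(outside_context, float('-inf'))
        end_logits = end_logits.masked_fill(outside_context, float('-inf'))

        # Score all (start, end) pairs jointly and discard pairs with end < start
        span_scores = start_logits[:, None] + end_logits[None, :]
        positions = torch.arange(span_scores.size(0))
        span_scores = span_scores.masked_fill(
            positions[:, None] > positions[None, :], float('-inf')
        )
        best = torch.argmax(span_scores).item()
        start_idx, end_idx = divmod(best, span_scores.size(1))

        answer_score = span_scores[start_idx, end_idx].item()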
        
        # If null score is higher (beyond threshold), return "no answer"
        if null_score - answer_score > threshold:
            return "Question cannot be answered based on the given context."
            
        # Otherwise, return the extracted answer
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
        answer = tokenizer.convert_tokens_to_string(tokens[start_idx:end_idx+1])
        
        # Check if answer is empty or contains only special tokens
        if not answer.strip() or answer.strip() in ['[CLS]', '[SEP]']:
            return "Question cannot be answered based on the given context."
            
        return answer.strip()

# Example usage
context = "The Apollo program was designed to land humans on the Moon and bring them safely back to Earth."
questions = [
    "What was the goal of the Apollo program?",
    "Who was the first person to walk on Mars?",  # Unanswerable question
    "What was the Apollo program designed to do?"
]

for question in questions:
    answer = get_answer_v2(question, context, threshold=1.0)
    print(f"Question: {question}")
    print(f"Answer: {answer}")
    print("-" * 50)
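
The no-answer rule used in `get_answer_v2` can also be examined without loading the model. The sketch below replays it on made-up logits in plain Python; `select_span` and its `max_answer_len` cap are hypothetical additions for illustration, and the logit values are invented.

```python
def select_span(start_logits, end_logits, threshold=0.0, max_answer_len=30):
    """Return (start, end) of the best span, or None if the null answer wins.

    Index 0 is treated as the [CLS] position, whose logits give the null score.
    `max_answer_len` is an assumed cap on span length, not a value from the card.
    """
    null_score = start_logits[0] + end_logits[0]

    best_score, best_span = float("-inf"), None
    for start in range(1, len(start_logits)):
        last = min(len(end_logits), start + max_answer_len)
        for end in range(start, last):  # only spans with start <= end
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best_score, best_span = score, (start, end)

    # Same rule as get_answer_v2: abstain when the null score leads by more than threshold.
    if best_span is None or null_score - best_score > threshold:
        return None
    return best_span


# Made-up logits for a 6-token input; position 0 stands in for [CLS].
start_logits = [2.0, 0.1, 3.0, 0.2, 0.0, -1.0]
end_logits = [2.5, 0.0, 0.3, 2.8, 0.1, -1.0]

# Null score is 4.5 and the best span (2, 3) scores 5.8, a difference of -1.3.
print(select_span(start_logits, end_logits, threshold=0.0))   # (2, 3)
print(select_span(start_logits, end_logits, threshold=-2.0))  # None
```

Raising `threshold` makes the function more willing to return a span, while lowering it makes it abstain more often.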
