# BioClinical Medical Coding Model

## Model Description

This is a BioClinicalModernBERT-based model for automated medical coding. Given a clinical note, it predicts ICD-10-CM diagnosis codes and HCPCS/CPT procedure codes.

## Model Architecture

- **Base Model**: microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- **Training**: 3-phase fine-tuning approach
  - Phase 1: Dense retrieval training
  - Phase 2: Hard negative re-ranking
  - Phase 3: Multi-label classification
- **Code Vocabulary**: 31,794 modern medical codes
- **Performance**: F1 score of 0.80 to 0.88 on frequent codes

## Usage

```python
from inference import MedicalCodingPredictor

# Initialize predictor
predictor = MedicalCodingPredictor()

# Predict codes from a clinical note
clinical_note = "Patient presents with chest pain and elevated cardiac enzymes..."
predictions = predictor.predict(clinical_note, threshold=0.5)

for pred in predictions:
    print(f"Code: {pred['code']}")
    print(f"Type: {pred['type']}")
    print(f"Description: {pred['description']}")
    print(f"Confidence: {pred['confidence']:.3f}")
```

## API Response Format

```json
{
  "code": "I25.111",
  "type": "ICD-10-CM",
  "description": "CODE DESCRIPTION",
  "confidence": 0.85,
  "f1_score": 0.82
}
```

## Files Included

- `pytorch_model.bin`: Model weights
- `config.json`: Model configuration
- `code_to_idx.json`: Code-to-index mapping
- `idx_to_code.json`: Index-to-code mapping
- `code_descriptions.json`: Code descriptions
- `code_f1_scores.json`: Per-code F1 scores
- `inference.py`: Inference script
- `requirements.txt`: Dependencies

## Training Data

Trained on MIMIC-IV clinical notes, using temporal matching to align each note with the codes assigned to it.

## Limitations

- Code descriptions are generic placeholders; replace them with entries from a medical terminology database.
- Performance varies with code frequency.
- Clinical validation is required before production use.

## Citation

If you use this model, please cite the MIMIC-IV dataset and acknowledge the multi-stage training approach.
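The `threshold=0.5` argument in the Usage section, together with `idx_to_code.json`, suggests how raw model outputs might become the per-code predictions shown in the API response format. The sketch below is a minimal illustration under stated assumptions, not the logic in `inference.py`: it assumes the classifier emits one logit per code index and that `idx_to_code.json` uses string keys (`"0"`, `"1"`, ...), as JSON object keys do. `decode_predictions` is a hypothetical helper, and the mapping and logits in the usage lines are toy inputs.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_predictions(
    logits: Sequence[float],
    idx_to_code: Dict[str, str],
    threshold: float = 0.5,
) -> List[dict]:
    """Hypothetical helper: turn per-code logits into thresholded predictions.

    Assumes one logit per code index and string keys in idx_to_code,
    mirroring how an index-to-code JSON file would load.
    """
    predictions = []
    for idx, logit in enumerate(logits):
        # Multi-label: every code is scored independently with a sigmoid,
        # so several codes (or none) can pass the threshold.
        confidence = sigmoid(logit)
        if confidence >= threshold:
            predictions.append({"code": idx_to_code[str(idx)], "confidence": confidence})
    # Highest-confidence codes first.
    predictions.sort(key=lambda p: p["confidence"], reverse=True)
    return predictions


if __name__ == "__main__":
    # Toy mapping and logits, for illustration only.
    toy_idx_to_code = {"0": "I25.111", "1": "R07.9", "2": "93000"}
    preds = decode_predictions([2.0, -1.0, 0.5], toy_idx_to_code, threshold=0.5)
    print([p["code"] for p in preds])  # → ['I25.111', '93000']
```

A sigmoid per output, rather than a softmax across all 31,794 codes, fits the multi-label setting described for Phase 3: a single note can carry many diagnosis and procedure codes at once, and raising or lowering `threshold` trades recall for precision.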