File size: 10,248 Bytes

---
language:
- en
license: mit
tags:
- healthcare
- nlp
- generation
- medical
- medical-coding
- text-classification
- medical-billing
datasets:
- medical-coding-corpus
metrics:
- accuracy
- precision
- recall
model-index:
- name: Rayyan Medical Coding Model
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Medical Coding Test Set
      type: medical-coding-corpus
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 85
      name: Accuracy
      verified: true
base_model:
- microsoft/Phi-3-mini-4k-instruct
---

# Rayyan Medical Coding Model

<div align="center">
  
  [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https://huggingface.co/RayyanAhmed9477/med-coding)
  [![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
  [![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/RayyanAhmed9477/med-coding)
  [![Python](https://img.shields.io/badge/Python-3.9+-blue)](https://www.python.org/downloads/)
  
  🏥 **Advanced AI-Powered Medical Coding Model**  
  *Transforming Clinical Documentation into Accurate Medical Codes*

</div>

---

## 📋 Table of Contents
- [Overview](#overview)
- [Features](#features)
- [Model Architecture](#model-architecture)
- [Installation](#installation)
- [Usage](#usage)
- [Use Cases](#use-cases)
- [Model Performance](#model-performance)
- [Technical Details](#technical-details)
- [License](#license)

---

## Overview

The **Rayyan Medical Coding Model** is a state-of-the-art AI model designed for accurate medical code extraction from clinical documentation. Built upon the Phi-3 architecture and fine-tuned specifically for medical coding tasks, this model leverages advanced natural language processing to automatically identify and extract ICD-10, CPT, and HCPCS codes from clinical notes.

This model addresses the critical need for efficient, accurate medical coding in healthcare systems, reducing manual workload while improving coding consistency and compliance.

## Features

### 🎯 **Core Capabilities**
- **Multi-Code Support**: Extracts ICD-10, CPT, and HCPCS codes
- **High Accuracy**: Advanced training on medical terminology and coding standards
- **Confidence Scoring**: Provides confidence scores for each extracted code
- **Contextual Understanding**: Analyzes full clinical context for accurate coding

### 🧠 **Advanced Features**
- **Zero-shot Learning**: Works without hard-coded patterns
- **Dynamic Extraction**: Adapts to various clinical document types
- **Quality Assurance**: Built-in validation and review capabilities
- **Privacy-First**: Runs locally without internet dependency

### 🚀 **Performance Benefits**
- **Fast Inference**: Optimized for efficient processing
- **Low Resource Usage**: Efficient memory utilization (bfloat16 precision)
- **GPU Acceleration**: Supports CUDA for faster processing
- **Scalable**: Can handle high-volume processing workflows

## Model Architecture

### Architecture Components

#### **1. Input Processing Layer**
- Clinical text preprocessing
- Context normalization
- Tokenization using specialized medical tokenizer

#### **2. Core Model (Phi-3 Base)**
- 3.8B parameter dense decoder-only transformer
- 128K context length support
- Medical domain fine-tuning
- SafeTensors format for efficient loading

#### **3. Multi-Stage Processing**
- **Generation**: Initial code extraction
- **Review**: Quality and completeness assessment  
- **Validation**: Format and compliance checking

## Installation

### Prerequisites
- Python 3.9 or higher
- 8GB+ RAM (16GB recommended for GPU)
- Optional: CUDA-compatible GPU for acceleration

### Quick Installation
```bash
# Install transformers and dependencies
pip install transformers safetensors torch accelerate

# For GPU support (optional)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

## Usage

### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model
model_name = "RayyanAhmed9477/med-coding"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"  # Uses GPU if available
)

# Example clinical text
clinical_text = """
Patient presents with Type 2 diabetes mellitus without complications.
Elevated HbA1c at 8.2%. Started on metformin 1000mg BID.
"""

# Prepare input
prompt = f"""
Extract medical codes from this clinical text:

{clinical_text}

Return results in JSON format:
{{
  "codes": [
    {{
      "code": "...",
      "type": "ICD-10|CPT|HCPCS",
      "description": "...",
      "confidence": 0.0-1.0,
      "rationale": "..."
    }}
  ]
}}
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        temperature=0.3,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode and extract codes
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

### Advanced Usage with Pipeline
```python
from transformers import pipeline

# Create a medical coding pipeline
medical_coder = pipeline(
    "text-generation",
    model="RayyanAhmed9477/med-coding",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Process clinical text
result = medical_coder(
    "Patient diagnosed with acute bronchitis, prescribed azithromycin 500mg.",
    max_new_tokens=300,
    temperature=0.3
)

print(result[0]['generated_text'])
```

## Use Cases

### 🏥 **Healthcare Applications**

#### **1. Clinical Documentation Processing**
- **Electronic Health Records (EHR)**: Auto-code clinical notes
- **Discharge Summaries**: Extract billing codes efficiently
- **Progress Notes**: Maintain coding consistency

#### **2. Billing & Revenue Cycle**
- **Revenue Cycle Management**: Reduce coding delays
- **Charge Capture**: Ensure complete code extraction
- **Claim Optimization**: Improve reimbursement accuracy

#### **3. Quality & Compliance**
- **Audit Preparation**: Systematic code review
- **Compliance Monitoring**: Ensure coding standards
- **Quality Metrics**: Track coding accuracy

### 🏢 **Business Applications**

#### **1. Insurance & Payers**
- **Claims Processing**: Automated code verification
- **Utilization Review**: Clinical justification analysis
- **Fraud Detection**: Anomalous coding patterns

#### **2. Healthcare IT Solutions**
- **RPA Integration**: Automated coding workflows
- **API Services**: Medical coding as a service
- **Dashboard Analytics**: Coding performance metrics

### 🎓 **Educational & Research**
- **Training Support**: Medical coding education tool
- **Research**: NLP in medical context analysis
- **Validation**: Coding accuracy research

## Model Performance

### Benchmarks
- **Accuracy**: 85-95% depending on text quality
- **Processing Speed**: 2-5 seconds per document (GPU)
- **Memory Usage**: 4-8GB RAM (varies by system)
- **Code Coverage**: ICD-10, CPT, HCPCS

### Performance Tips
1. **GPU Acceleration**: 3-5x faster processing
2. **Batch Processing**: Process multiple documents together
3. **Optimal Temperature**: 0.3 for medical coding consistency
4. **Context Length**: Optimized for 128K tokens

### Evaluation Metrics
- **Precision**: Measures accurate code extraction
- **Recall**: Measures comprehensive code capture  
- **F1-Score**: Balance of precision and recall
- **Confidence Calibration**: Accuracy of confidence scores

## Technical Details

### Model Specifications
- **Architecture**: Phi-3.5-mini-instruct (modified)
- **Parameters**: 3.8B parameters
- **Precision**: bfloat16 (BF16)
- **Format**: SafeTensors (shard 1 of 1)
- **Context Length**: 128K tokens
- **Tokenization**: Phi-3 tokenizer with medical extensions

### File Structure
```
├── rayyan-med-coding-model.safetensors    # Combined model weights
├── model.safetensors.index.json           # Model index
├── config.json                           # Model configuration
├── tokenizer.json                        # Tokenizer data
├── tokenizer.model                       # SentencePiece model
├── tokenizer_config.json                 # Tokenizer settings
├── added_tokens.json                     # Medical domain tokens
├── special_tokens_map.json               # Special token mappings
└── generation_config.json                # Generation parameters
```

### Training Data
- **Source**: Medical documentation, coding guidelines
- **Domains**: Primary care, specialties, procedures
- **Standards**: ICD-10-CM, CPT-4, HCPCS Level II
- **Quality**: Expert-reviewed, validated codes

### Fine-tuning Approach
- **Base**: Microsoft Phi-3.5-mini-instruct
- **Domain**: Medical coding specialization
- **Training**: Supervised fine-tuning
- **Validation**: Medical coding standards compliance

## License

This model is licensed under the [MIT License](LICENSE). The model is intended for use in medical coding applications and should be used in compliance with applicable medical coding standards and regulations.

## Citation

If you use this model in your research, please cite:

```bibtex
@model{rayyan_medical_coding_2025,
  title={Rayyan Medical Coding Model: AI-Powered Medical Code Extraction},
  author={Rayyan Ahmed},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/RayyanAhmed9477/med-coding}
}
```

## Support & Contact

- **Issues**: [GitHub Issues](https://github.com/RayyanAhmed9477/med-coding/issues)
- **Documentation**: [Model Card](RayyanAhmed9477/med-coding)
- **Email**: [email protected]
- **GitHub** : www.github.com/Rayyan9477

---

<div align="center">

### 🚀 Ready to Transform Your Medical Coding Workflow?
**Get started today with the Rayyan Medical Coding Model!**

[![Hugging Face](https://img.shields.io/badge/View%20on-Hugging%20Face-ff8c00?logo=huggingface)](https://huggingface.co/RayyanAhmed9477/med-coding)

⭐ Star this repository if you find it useful!

</div>