File size: 10,248 Bytes
89c6dd4
 
 
 
 
 
 
 
c681a03
 
 
 
89c6dd4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
c681a03
 
89c6dd4
 
eafc45c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
c681a03
 
eafc45c
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
---
language:
- en
license: mit
tags:
- healthcare
- nlp
- generation
- medical
- medical-coding
- text-classification
- medical-billing
datasets:
- medical-coding-corpus
metrics:
- accuracy
- precision
- recall
model-index:
- name: Rayyan Medical Coding Model
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Medical Coding Test Set
      type: medical-coding-corpus
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 85
      name: Accuracy
      verified: true
base_model:
- microsoft/Phi-3-mini-4k-instruct
---

# Rayyan Medical Coding Model

<div align="center">
  
  [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https://huggingface.co/RayyanAhmed9477/med-coding)
  [![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
  [![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/RayyanAhmed9477/med-coding)
  [![Python](https://img.shields.io/badge/Python-3.9+-blue)](https://www.python.org/downloads/)
  
  πŸ₯ **Advanced AI-Powered Medical Coding Model**  
  *Transforming Clinical Documentation into Accurate Medical Codes*

</div>

---

## πŸ“‹ Table of Contents
- [Overview](#overview)
- [Features](#features)
- [Model Architecture](#model-architecture)
- [Installation](#installation)
- [Usage](#usage)
- [Use Cases](#use-cases)
- [Model Performance](#model-performance)
- [Technical Details](#technical-details)
- [License](#license)

---

## Overview

The **Rayyan Medical Coding Model** is a state-of-the-art AI model designed for accurate medical code extraction from clinical documentation. Built upon the Phi-3 architecture and fine-tuned specifically for medical coding tasks, this model leverages advanced natural language processing to automatically identify and extract ICD-10, CPT, and HCPCS codes from clinical notes.

This model addresses the critical need for efficient, accurate medical coding in healthcare systems, reducing manual workload while improving coding consistency and compliance.

## Features

### 🎯 **Core Capabilities**
- **Multi-Code Support**: Extracts ICD-10, CPT, and HCPCS codes
- **High Accuracy**: Advanced training on medical terminology and coding standards
- **Confidence Scoring**: Provides confidence scores for each extracted code
- **Contextual Understanding**: Analyzes full clinical context for accurate coding

### 🧠 **Advanced Features**
- **Zero-shot Learning**: Works without hard-coded patterns
- **Dynamic Extraction**: Adapts to various clinical document types
- **Quality Assurance**: Built-in validation and review capabilities
- **Privacy-First**: Runs locally without internet dependency

### πŸš€ **Performance Benefits**
- **Fast Inference**: Optimized for efficient processing
- **Low Resource Usage**: Efficient memory utilization (bfloat16 precision)
- **GPU Acceleration**: Supports CUDA for faster processing
- **Scalable**: Can handle high-volume processing workflows

## Model Architecture

### Architecture Components

#### **1. Input Processing Layer**
- Clinical text preprocessing
- Context normalization
- Tokenization using specialized medical tokenizer

#### **2. Core Model (Phi-3 Base)**
- 3.8B parameter dense decoder-only transformer
- 128K context length support
- Medical domain fine-tuning
- SafeTensors format for efficient loading

#### **3. Multi-Stage Processing**
- **Generation**: Initial code extraction
- **Review**: Quality and completeness assessment  
- **Validation**: Format and compliance checking

## Installation

### Prerequisites
- Python 3.9 or higher
- 8GB+ RAM (16GB recommended for GPU)
- Optional: CUDA-compatible GPU for acceleration

### Quick Installation
```bash
# Install transformers and dependencies
pip install transformers safetensors torch accelerate

# For GPU support (optional)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

## Usage

### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model
model_name = "RayyanAhmed9477/med-coding"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"  # Uses GPU if available
)

# Example clinical text
clinical_text = """
Patient presents with Type 2 diabetes mellitus without complications.
Elevated HbA1c at 8.2%. Started on metformin 1000mg BID.
"""

# Prepare input
prompt = f"""
Extract medical codes from this clinical text:

{clinical_text}

Return results in JSON format:
{{
  "codes": [
    {{
      "code": "...",
      "type": "ICD-10|CPT|HCPCS",
      "description": "...",
      "confidence": 0.0-1.0,
      "rationale": "..."
    }}
  ]
}}
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        temperature=0.3,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode and extract codes
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

### Advanced Usage with Pipeline
```python
from transformers import pipeline

# Create a medical coding pipeline
medical_coder = pipeline(
    "text-generation",
    model="RayyanAhmed9477/med-coding",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Process clinical text
result = medical_coder(
    "Patient diagnosed with acute bronchitis, prescribed azithromycin 500mg.",
    max_new_tokens=300,
    temperature=0.3
)

print(result[0]['generated_text'])
```

## Use Cases

### πŸ₯ **Healthcare Applications**

#### **1. Clinical Documentation Processing**
- **Electronic Health Records (EHR)**: Auto-code clinical notes
- **Discharge Summaries**: Extract billing codes efficiently
- **Progress Notes**: Maintain coding consistency

#### **2. Billing & Revenue Cycle**
- **Revenue Cycle Management**: Reduce coding delays
- **Charge Capture**: Ensure complete code extraction
- **Claim Optimization**: Improve reimbursement accuracy

#### **3. Quality & Compliance**
- **Audit Preparation**: Systematic code review
- **Compliance Monitoring**: Ensure coding standards
- **Quality Metrics**: Track coding accuracy

### 🏒 **Business Applications**

#### **1. Insurance & Payers**
- **Claims Processing**: Automated code verification
- **Utilization Review**: Clinical justification analysis
- **Fraud Detection**: Anomalous coding patterns

#### **2. Healthcare IT Solutions**
- **RPA Integration**: Automated coding workflows
- **API Services**: Medical coding as a service
- **Dashboard Analytics**: Coding performance metrics

### πŸŽ“ **Educational & Research**
- **Training Support**: Medical coding education tool
- **Research**: NLP in medical context analysis
- **Validation**: Coding accuracy research

## Model Performance

### Benchmarks
- **Accuracy**: 85-95% depending on text quality
- **Processing Speed**: 2-5 seconds per document (GPU)
- **Memory Usage**: 4-8GB RAM (varies by system)
- **Code Coverage**: ICD-10, CPT, HCPCS

### Performance Tips
1. **GPU Acceleration**: 3-5x faster processing
2. **Batch Processing**: Process multiple documents together
3. **Optimal Temperature**: 0.3 for medical coding consistency
4. **Context Length**: Optimized for 128K tokens

### Evaluation Metrics
- **Precision**: Measures accurate code extraction
- **Recall**: Measures comprehensive code capture  
- **F1-Score**: Balance of precision and recall
- **Confidence Calibration**: Accuracy of confidence scores

## Technical Details

### Model Specifications
- **Architecture**: Phi-3.5-mini-instruct (modified)
- **Parameters**: 3.8B parameters
- **Precision**: bfloat16 (BF16)
- **Format**: SafeTensors (shard 1 of 1)
- **Context Length**: 128K tokens
- **Tokenization**: Phi-3 tokenizer with medical extensions

### File Structure
```
β”œβ”€β”€ rayyan-med-coding-model.safetensors    # Combined model weights
β”œβ”€β”€ model.safetensors.index.json           # Model index
β”œβ”€β”€ config.json                           # Model configuration
β”œβ”€β”€ tokenizer.json                        # Tokenizer data
β”œβ”€β”€ tokenizer.model                       # SentencePiece model
β”œβ”€β”€ tokenizer_config.json                 # Tokenizer settings
β”œβ”€β”€ added_tokens.json                     # Medical domain tokens
β”œβ”€β”€ special_tokens_map.json               # Special token mappings
└── generation_config.json                # Generation parameters
```

### Training Data
- **Source**: Medical documentation, coding guidelines
- **Domains**: Primary care, specialties, procedures
- **Standards**: ICD-10-CM, CPT-4, HCPCS Level II
- **Quality**: Expert-reviewed, validated codes

### Fine-tuning Approach
- **Base**: Microsoft Phi-3.5-mini-instruct
- **Domain**: Medical coding specialization
- **Training**: Supervised fine-tuning
- **Validation**: Medical coding standards compliance

## License

This model is licensed under the [MIT License](LICENSE). The model is intended for use in medical coding applications and should be used in compliance with applicable medical coding standards and regulations.

## Citation

If you use this model in your research, please cite:

```bibtex
@model{rayyan_medical_coding_2025,
  title={Rayyan Medical Coding Model: AI-Powered Medical Code Extraction},
  author={Rayyan Ahmed},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/RayyanAhmed9477/med-coding}
}
```

## Support & Contact

- **Issues**: [GitHub Issues](https://github.com/RayyanAhmed9477/med-coding/issues)
- **Documentation**: [Model Card](RayyanAhmed9477/med-coding)
- **Email**: [email protected]
- **GitHub** : www.github.com/Rayyan9477

---

<div align="center">

### πŸš€ Ready to Transform Your Medical Coding Workflow?
**Get started today with the Rayyan Medical Coding Model!**

[![Hugging Face](https://img.shields.io/badge/View%20on-Hugging%20Face-ff8c00?logo=huggingface)](https://huggingface.co/RayyanAhmed9477/med-coding)

⭐ Star this repository if you find it useful!

</div>