kenil-patel-183 committed 5701ac3 (verified; 1 parent: 66b1329)

Update README.md

Files changed (1): README.md (+99 -1)
@@ -31,4 +31,102 @@ This model classifies handwritten digits (0-9) from 28x28 grayscale images using
- **Layers**: 4 convolutional layers, each followed by BatchNorm and ReLU
- **Pooling**: MaxPool2d after the first conv layer
- **Final Layer**: Linear layer (3136 → 10)
- **Parameters**: ~56K trainable parameters
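The parameter count can be checked by hand from the layer shapes in the Model Architecture section. This pure-Python sketch assumes the conv layers keep their default bias (the printout does not show `bias=False`) and that BatchNorm uses the default `affine=True` (the printout is truncated after `momentum`).

```python
# Layer shapes copied from the Model Architecture printout.
CONV_LAYERS = [(1, 8), (8, 16), (16, 32), (32, 64)]  # (in_channels, out_channels)
KERNEL = 3  # all convs use 3x3 kernels


def conv_params(c_in, c_out, k=KERNEL):
    # weight tensor (c_out * c_in * k * k) plus one bias per output channel
    return c_out * c_in * k * k + c_out


def batchnorm_params(c):
    # learnable scale and shift per channel; running stats are buffers, not parameters
    return 2 * c


def linear_params(n_in, n_out):
    # weight matrix plus bias vector
    return n_in * n_out + n_out


def count_params():
    total = 0
    for c_in, c_out in CONV_LAYERS:
        total += conv_params(c_in, c_out) + batchnorm_params(c_out)
    total += linear_params(3136, 10)  # final classifier
    return total


print(count_params())  # 55994
```

That comes to roughly 56K, of which the final Linear layer alone accounts for 31,370. With the model loaded in PyTorch, `sum(p.numel() for p in model.parameters() if p.requires_grad)` gives the same figure directly.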

## Usage

**Security Note:** Loading this model requires `trust_remote_code=True`, which executes the custom model and processor code stored in the model repository.

### Using transformers pipeline
```python
from transformers import pipeline

clf = pipeline(
    "image-classification",
    model="kenil-patel-183/mnist-cnn-digit-classifier",
    trust_remote_code=True,  # required due to custom classes
)

preds = clf("path/to/digit.png", top_k=1)
print(preds)  # [{'label': '7', 'score': 0.998...}]
```
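The pipeline returns a list of `{'label': ..., 'score': ...}` dicts per image. A minimal post-processing sketch, assuming the labels are the digit strings `'0'` to `'9'` as in the example output above; `top_digit` and its `min_score` cut-off are hypothetical helpers, not part of the model or of transformers:

```python
def top_digit(preds, min_score=0.5):
    """Return the predicted digit as an int, or None if the best score is below min_score.

    `preds` is the pipeline output for one image, e.g. [{'label': '7', 'score': 0.998}].
    Assumes labels are the digit strings '0'-'9' (hypothetical helper).
    """
    best = max(preds, key=lambda p: p["score"])  # don't rely on list ordering
    if best["score"] < min_score:
        return None  # treat low-confidence predictions as "unknown"
    return int(best["label"])


print(top_digit([{"label": "7", "score": 0.998}]))  # 7
```

Returning `None` below a threshold is one way to reject inputs that are not digits at all, which a 10-class softmax classifier cannot express on its own.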

### Using manual loading
```python
import torch
from PIL import Image
from transformers import AutoConfig, AutoModel, AutoImageProcessor

model_id = "kenil-patel-183/mnist-cnn-digit-classifier"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("digit.png")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
logits = outputs.logits
pred = logits.argmax(-1).item()  # index of the highest-scoring class
print(pred)
```
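`argmax` gives the class but no confidence. Applying softmax to the logits turns them into probabilities; the sketch below does this in plain Python on a list of 10 logits (the example logits are made up for illustration). With the tensors from the snippet above, `torch.softmax(logits, dim=-1)` does the same.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits):
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return pred, probs[pred]


example_logits = [0.0] * 10
example_logits[7] = 5.0  # illustrative logits favoring class 7
pred, conf = predict_with_confidence(example_logits)
print(pred, round(conf, 3))  # 7 0.943
```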

## Model Architecture

```
MNISTCNN(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (lin): Linear(in_features=3136, out_features=10, bias=True)
  (network): Sequential(
    (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1))
    (1): BatchNorm2d(8, eps=1e-05, momentum=0.1)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=(2, 2), stride=2)
    (4): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1))
    (5): BatchNorm2d(16, eps=1e-05, momentum=0.1)
    (6): ReLU()
    (7): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
    (8): BatchNorm2d(32, eps=1e-05, momentum=0.1)
    (9): ReLU()
    (10): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
    (11): BatchNorm2d(64, eps=1e-05, momentum=0.1)
    (12): ReLU()
  )
)
```
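The Linear layer's `in_features=3136` follows from the spatial sizes in the printout. A pure-Python trace, assuming zero padding (the printout shows none) and noting that BatchNorm and ReLU do not change the shape:

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    # output-size formula shared by Conv2d and MaxPool2d (dilation 1)
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(size=28):
    sizes = [size]
    size = conv_out(size)  # Conv2d(1, 8): 28 -> 26
    sizes.append(size)
    size = conv_out(size, kernel=2, stride=2)  # MaxPool2d(2, stride 2): 26 -> 13
    sizes.append(size)
    for _ in range(3):  # three more 3x3 convs: 13 -> 11 -> 9 -> 7
        size = conv_out(size)
        sizes.append(size)
    return sizes


sizes = trace_shapes()
print(sizes)  # [28, 26, 13, 11, 9, 7]
print(64 * sizes[-1] ** 2)  # 3136: 64 channels x 7 x 7, flattened for the Linear layer
```

This also shows why inputs must be 28x28: a different size would change the flattened length and no longer match the Linear layer.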

## Training Data

- **Dataset**: MNIST Handwritten Digits
- **Training samples**: 60,000
- **Test samples**: 10,000
- **Image size**: 28x28 grayscale
- **Classes**: 10 (digits 0-9)

## Image Preprocessing Requirements

For best results, preprocess input images as follows:

1. **Convert to grayscale** if not already
2. **Resize to 28x28 pixels**
3. **Convert to a tensor** (scales pixel values to [0, 1])
4. **Normalize** with mean=0.1307, std=0.3081 (the MNIST training-set statistics)

```python
from torchvision import transforms

transform = transforms.Compose([
    transforms.Grayscale(),  # single output channel
    transforms.Resize((28, 28)),
    transforms.ToTensor(),  # PIL image -> float tensor in [0, 1], shape (1, 28, 28)
    transforms.Normalize((0.1307,), (0.3081,)),  # (x - mean) / std
])
```
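Steps 3 and 4 reduce to simple per-pixel arithmetic. This sketch applies them to single 8-bit grayscale values in plain Python, which makes the resulting input range visible:

```python
MEAN, STD = 0.1307, 0.3081  # MNIST statistics used by transforms.Normalize above


def normalize_pixel(value):
    """Map an 8-bit grayscale value (0-255) the way ToTensor + Normalize do."""
    x = value / 255.0  # ToTensor: scale to [0, 1]
    return (x - MEAN) / STD  # Normalize: (x - mean) / std


print(round(normalize_pixel(0), 4))  # background pixel: -0.4242
print(round(normalize_pixel(255), 4))  # full-intensity pixel: 2.8215
```

So the model sees values roughly in [-0.42, 2.82]; feeding raw 0-255 values or skipping normalization shifts inputs far outside that range.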

## Performance

Achieves 99.25% accuracy on the MNIST test set (10,000 images).
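Assuming the figure is standard top-1 accuracy (the fraction of test images whose predicted digit matches the label), the sketch below shows the metric and what 99.25% means in absolute terms on the 10,000-image test set:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels (top-1 accuracy)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


test_size = 10_000
errors = round(test_size * (1 - 0.9925))  # misclassified test images at 99.25%
print(errors)  # 75
```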
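
## Limitations

- **Input format**: Expects 28x28 grayscale input; other images must be converted and resized as described in Image Preprocessing Requirements.
- **Domain**: Trained on handwritten digits; it may not work well on printed or typeset digits.
- **Background**: MNIST digits are light strokes on a dark background. Unless the custom processor inverts colors, dark digits on a light background (for example, ink on paper) may need to be inverted first.
- **Noise**: Performance may degrade on noisy or heavily distorted images.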
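A minimal sketch of the inversion step, assuming the image is available as a 2D list of 8-bit grayscale values; the mean-brightness threshold is a heuristic assumption, not something the model or its processor defines. For a PIL image, `PIL.ImageOps.invert(img.convert("L"))` performs the same inversion.

```python
def mean_intensity(pixels):
    """Average of a 2D list of 8-bit grayscale values."""
    flat = [v for row in pixels for v in row]
    return sum(flat) / len(flat)


def to_mnist_polarity(pixels, threshold=127.5):
    """Invert a mostly-light image so digits become light on dark, like MNIST.

    The mean-brightness threshold is a heuristic assumption.
    """
    if mean_intensity(pixels) > threshold:
        return [[255 - v for v in row] for row in pixels]
    return pixels  # already dark background, leave unchanged


paper_scan = [[255, 255, 255], [255, 0, 255], [255, 255, 255]]  # dark stroke on white
print(to_mnist_polarity(paper_scan))  # [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
```

Apply this before the resize and normalization steps, since normalization assumes MNIST's dark-background statistics.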