geolip-vit-base-x3 / run_2_train_vit_with_soup_output.txt
=================================================================
GEOLIP VISION ENCODER - FROM SCRATCH
ViT: 6L/384d/6h, patch16
196 patches + CLS → 128-d output
Device: cuda
=================================================================
Loading soup...
Soup: mAP=0.837 CV_target=0.2731
train: loaded cached targets (118,287)
val: loaded cached targets (5,000)
Caching train images (118,287)...
Resolving data files: 100%
 39/39 [00:00<00:00, 5057.75it/s]
Downloading data: 100%
 39/39 [04:55<00:00,  7.45s/files]
default/train/0002.parquet: 100%
 509M/509M [00:09<00:00, 69.4MB/s]
default/train/0003.parquet: 100%
 502M/502M [00:03<00:00, 298MB/s]
default/train/0004.parquet: 100%
 507M/507M [00:10<00:00, 88.0MB/s]
default/train/0005.parquet: 100%
 499M/499M [00:04<00:00, 95.4MB/s]
default/train/0006.parquet: 100%
 510M/510M [00:09<00:00, 73.4MB/s]
default/train/0007.parquet: 100%
 502M/502M [00:06<00:00, 47.9MB/s]
default/train/0008.parquet: 100%
 514M/514M [00:09<00:00, 90.8MB/s]
default/train/0009.parquet: 100%
 509M/509M [00:06<00:00, 111MB/s]
default/train/0010.parquet: 100%
 509M/509M [00:07<00:00, 89.7MB/s]
default/train/0011.parquet: 100%
 505M/505M [00:05<00:00, 70.6MB/s]
default/train/0012.parquet: 100%
 507M/507M [00:06<00:00, 87.5MB/s]
default/train/0013.parquet: 100%
 502M/502M [00:09<00:00, 59.5MB/s]
default/train/0014.parquet: 100%
 504M/504M [00:09<00:00, 70.8MB/s]
default/train/0015.parquet: 100%
 514M/514M [00:07<00:00, 122MB/s]
default/train/0016.parquet: 100%
 507M/507M [00:07<00:00, 95.1MB/s]
default/train/0017.parquet: 100%
 509M/509M [00:09<00:00, 89.6MB/s]
default/train/0018.parquet: 100%
 504M/504M [00:06<00:00, 63.2MB/s]
default/train/0019.parquet: 100%
 511M/511M [00:10<00:00, 83.7MB/s]
default/train/0020.parquet: 100%
 510M/510M [00:10<00:00, 72.5MB/s]
default/train/0021.parquet: 100%
 504M/504M [00:09<00:00, 77.3MB/s]
default/train/0022.parquet: 100%
 507M/507M [00:10<00:00, 89.6MB/s]
default/train/0023.parquet: 100%
 511M/511M [00:10<00:00, 65.3MB/s]
default/train/0024.parquet: 100%
 505M/505M [00:09<00:00, 78.0MB/s]
default/train/0025.parquet: 100%
 503M/503M [00:04<00:00, 196MB/s]
default/train/0026.parquet: 100%
 508M/508M [00:05<00:00, 121MB/s]
default/train/0027.parquet: 100%
 508M/508M [00:06<00:00, 93.1MB/s]
default/train/0028.parquet: 100%
 507M/507M [00:05<00:00, 122MB/s]
default/train/0029.parquet: 100%
 510M/510M [00:07<00:00, 75.8MB/s]
default/train/0030.parquet: 100%
 505M/505M [00:08<00:00, 71.4MB/s]
default/train/0031.parquet: 100%
 502M/502M [00:04<00:00, 168MB/s]
default/train/0032.parquet: 100%
 502M/502M [00:02<00:00, 321MB/s]
default/train/0033.parquet: 100%
 508M/508M [00:07<00:00, 86.3MB/s]
default/train/0034.parquet: 100%
 504M/504M [00:07<00:00, 78.1MB/s]
default/train/0035.parquet: 100%
 499M/499M [00:16<00:00, 101MB/s]
default/train/0036.parquet: 100%
 507M/507M [00:10<00:00, 78.6MB/s]
default/train/0037.parquet: 100%
 501M/501M [00:09<00:00, 106MB/s]
default/train/0038.parquet: 100%
 79.2M/79.2M [00:01<00:00, 173MB/s]
default/val/0000.parquet: 100%
 504M/504M [00:04<00:00, 128MB/s]
default/val/0001.parquet: 100%
 311M/311M [00:03<00:00, 165MB/s]
Generating train split: 
 118287/0 [01:49<00:00, 1378.35 examples/s]
Generating validation split: 
 5000/0 [00:05<00:00, 617.41 examples/s]
Loading dataset shards: 100%
 39/39 [00:05<00:00,  8.83it/s]
Caching train: 100%|██████████| 118287/118287 [13:03<00:00, 151.05it/s]
Cached 118287/118287 images
Saved: cached_train_images.pt (35611 MB)
Caching val images (5,000)...
Resolving data files: 100%
 39/39 [00:00<00:00, 4857.40it/s]
Caching val: 100%|██████████| 5000/5000 [00:33<00:00, 148.88it/s]
Cached 5000/5000 images
Saved: cached_val_images.pt (1505 MB)
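The two cache sizes reported above can be cross-checked with simple arithmetic. The sketch below assumes the images are stored as 3-channel 224x224 tensors at 2 bytes per value (for example float16) and that "MB" means 10^6 bytes; none of these are stated in the log, although the 224 input size appears in the BUILD ENCODER section. The helper name `cache_size_mb` is hypothetical.

```python
# Rough size estimate for a dense image cache.
# Assumptions (not confirmed by the log): 3 channels, 224x224 pixels,
# 2 bytes per value (e.g. float16), MB = 1e6 bytes, no serialization overhead.
def cache_size_mb(num_images: int,
                  channels: int = 3,
                  size: int = 224,
                  bytes_per_value: int = 2) -> float:
    """Return the raw tensor payload size in megabytes (1 MB = 1e6 bytes)."""
    total_bytes = num_images * channels * size * size * bytes_per_value
    return total_bytes / 1e6

train_mb = cache_size_mb(118_287)
val_mb = cache_size_mb(5_000)
print(int(train_mb), int(val_mb))
```

Under these assumptions the estimates truncate to 35611 MB and 1505 MB, which agrees with the logged file sizes. A float32 cache would be twice as large.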
=================================================================
BUILD ENCODER
=================================================================
Architecture: 6L/384d/6h, patch16
Input: 224×224 → 196 patches
Output: 128-d (on hypersphere)
Parameters: 11,216,768
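The patch count reported above follows from the input size and patch size. As a minimal sketch, assuming square inputs and non-overlapping patches (the usual ViT patch embedding; the function name `vit_token_count` is hypothetical):

```python
# Token count for a ViT with non-overlapping square patches.
# Assumption: the image side is an exact multiple of the patch side.
def vit_token_count(image_size: int, patch_size: int, use_cls: bool = True):
    """Return (num_patches, sequence_length) for a square input image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size        # 224 // 16 = 14
    num_patches = per_side * per_side          # 14 * 14 = 196
    seq_len = num_patches + (1 if use_cls else 0)
    return num_patches, seq_len

num_patches, seq_len = vit_token_count(224, 16)
print(num_patches, seq_len)
```

With a CLS token prepended, the transformer processes 197 tokens per image.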
=================================================================
TRAINING
20 epochs, lr=0.0003, batch=48
Losses: InfoNCE + MSE + CV + BCE + Procrustes alignment
CV target: 0.2731
Images: train=118,287 val=5,000 (cached as tensors)
=================================================================
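The 2465 batches per epoch shown in the progress bars below are consistent with the settings above if the final partial batch is kept. A minimal sketch, assuming that behavior (the log does not say whether the loader drops the last batch; `steps_per_epoch` is a hypothetical helper):

```python
import math

# Number of optimizer steps per epoch for a given dataset and batch size.
# Assumption: the last, smaller batch is kept unless drop_last is True.
def steps_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

steps = steps_per_epoch(118_287, 48)           # batches per epoch
last_batch = 118_287 - (steps - 1) * 48        # size of the final batch
total_steps = steps * 20                       # over 20 epochs
print(steps, last_batch, total_steps)
```

Under this assumption the final batch of each epoch holds 15 images, and the full run takes 49,300 steps.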
E 1/20 train: 100%|██████████| 2465/2465 [02:44<00:00, 14.97batch/s, cos=0.258, loss=2.6911, nce_acc=0.339, ordered=1]
E1 train: 165s loss=2.6891 nce=2.2529 mse=0.0120 bce=0.1963 nce_acc=0.340
E1 val: mAP=0.151 F1=0.162 R@1=0.032 cos=0.325 cv=0.2663 anchors=95/256 seen=5000/5000 ★
E 2/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.32batch/s, cos=0.368, loss=1.7954, nce_acc=0.553, ordered=1]
E2 train: 161s loss=1.7948 nce=1.4297 mse=0.0099 bce=0.1473 nce_acc=0.553
E2 val: mAP=0.206 F1=0.197 R@1=0.062 cos=0.390 cv=0.2552 anchors=99/256 seen=5000/5000 ★
E 3/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.37batch/s, cos=0.416, loss=1.4860, nce_acc=0.641, ordered=1]
E3 train: 160s loss=1.4854 nce=1.1484 mse=0.0092 bce=0.1338 nce_acc=0.641
E3 val: mAP=0.246 F1=0.244 R@1=0.091 cos=0.427 cv=0.2234 anchors=98/256 seen=5000/5000 ★
E 4/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.448, loss=1.2913, nce_acc=0.695, ordered=1]
E4 train: 160s loss=1.2910 nce=0.9727 mse=0.0087 bce=0.1265 nce_acc=0.695
E4 val: mAP=0.272 F1=0.266 R@1=0.113 cos=0.453 cv=0.2078 anchors=99/256 seen=5000/5000 ★
E 5/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.475, loss=1.1334, nce_acc=0.743, ordered=1]
E5 train: 160s loss=1.1331 nce=0.8303 mse=0.0083 bce=0.1205 nce_acc=0.743
E5 val: mAP=0.296 F1=0.292 R@1=0.139 cos=0.473 cv=0.2133 anchors=98/256 seen=5000/5000 ★
E 6/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.499, loss=1.0005, nce_acc=0.784, ordered=1]
E6 train: 158s loss=1.0003 nce=0.7111 mse=0.0079 bce=0.1158 nce_acc=0.784
E6 val: mAP=0.317 F1=0.311 R@1=0.164 cos=0.495 cv=0.1835 anchors=98/256 seen=5000/5000 ★
E 7/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.60batch/s, cos=0.520, loss=0.8947, nce_acc=0.815, ordered=1]
E7 train: 158s loss=0.8943 nce=0.6172 mse=0.0075 bce=0.1115 nce_acc=0.815
E7 val: mAP=0.337 F1=0.335 R@1=0.190 cos=0.513 cv=0.1809 anchors=96/256 seen=5000/5000 ★
E 8/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.539, loss=0.8030, nce_acc=0.842, ordered=1]
E8 train: 158s loss=0.8028 nce=0.5365 mse=0.0072 bce=0.1076 nce_acc=0.843
E8 val: mAP=0.344 F1=0.331 R@1=0.207 cos=0.523 cv=0.1779 anchors=95/256 seen=5000/5000 ★
E 9/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.58batch/s, cos=0.557, loss=0.7229, nce_acc=0.866, ordered=1]
E9 train: 158s loss=0.7228 nce=0.4665 mse=0.0070 bce=0.1041 nce_acc=0.866
E9 val: mAP=0.361 F1=0.349 R@1=0.218 cos=0.537 cv=0.1764 anchors=95/256 seen=5000/5000 ★
E10/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.51batch/s, cos=0.574, loss=0.6538, nce_acc=0.887, ordered=1]
E10 train: 159s loss=0.6538 nce=0.4070 mse=0.0067 bce=0.1009 nce_acc=0.887
E10 val: mAP=0.380 F1=0.361 R@1=0.254 cos=0.557 cv=0.1699 anchors=96/256 seen=5000/5000 ★
E11/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.54batch/s, cos=0.589, loss=0.5929, nce_acc=0.905, ordered=1]
E11 train: 159s loss=0.5928 nce=0.3545 mse=0.0065 bce=0.0978 nce_acc=0.905
E11 val: mAP=0.387 F1=0.377 R@1=0.265 cos=0.564 cv=0.1497 anchors=95/256 seen=5000/5000 ★
E12/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.55batch/s, cos=0.604, loss=0.5372, nce_acc=0.920, ordered=1]
E12 train: 158s loss=0.5372 nce=0.3073 mse=0.0062 bce=0.0948 nce_acc=0.920
E12 val: mAP=0.400 F1=0.382 R@1=0.276 cos=0.573 cv=0.1639 anchors=95/256 seen=5000/5000 ★
E13/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.60batch/s, cos=0.617, loss=0.4917, nce_acc=0.933, ordered=1]
E13 train: 158s loss=0.4917 nce=0.2693 mse=0.0060 bce=0.0920 nce_acc=0.933
E13 val: mAP=0.408 F1=0.392 R@1=0.291 cos=0.582 cv=0.1615 anchors=95/256 seen=5000/5000 ★
E14/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.61batch/s, cos=0.629, loss=0.4502, nce_acc=0.945, ordered=1]
E14 train: 158s loss=0.4501 nce=0.2347 mse=0.0058 bce=0.0895 nce_acc=0.945
E14 val: mAP=0.413 F1=0.403 R@1=0.304 cos=0.586 cv=0.1594 anchors=95/256 seen=5000/5000 ★
E15/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.640, loss=0.4169, nce_acc=0.954, ordered=1]
E15 train: 158s loss=0.4168 nce=0.2075 mse=0.0057 bce=0.0873 nce_acc=0.954
E15 val: mAP=0.418 F1=0.403 R@1=0.307 cos=0.591 cv=0.1607 anchors=94/256 seen=5000/5000 ★
E16/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.62batch/s, cos=0.649, loss=0.3909, nce_acc=0.961, ordered=1]
E16 train: 158s loss=0.3908 nce=0.1866 mse=0.0055 bce=0.0854 nce_acc=0.961
E16 val: mAP=0.422 F1=0.411 R@1=0.321 cos=0.595 cv=0.1495 anchors=95/256 seen=5000/5000 ★
E17/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.61batch/s, cos=0.656, loss=0.3717, nce_acc=0.966, ordered=1]
E17 train: 158s loss=0.3716 nce=0.1715 mse=0.0054 bce=0.0838 nce_acc=0.966
E17 val: mAP=0.426 F1=0.417 R@1=0.321 cos=0.597 cv=0.1420 anchors=94/256 seen=5000/5000 ★
E18/20 train: 100%|██████████| 2465/2465 [02:39<00:00, 15.43batch/s, cos=0.661, loss=0.3579, nce_acc=0.969, ordered=1]
E18 train: 160s loss=0.3579 nce=0.1607 mse=0.0053 bce=0.0826 nce_acc=0.969
E18 val: mAP=0.429 F1=0.416 R@1=0.325 cos=0.599 cv=0.1375 anchors=94/256 seen=5000/5000 ★
E19/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.664, loss=0.3494, nce_acc=0.971, ordered=1]
E19 train: 158s loss=0.3494 nce=0.1539 mse=0.0053 bce=0.0820 nce_acc=0.971
E19 val: mAP=0.429 F1=0.420 R@1=0.325 cos=0.600 cv=0.1426 anchors=94/256 seen=5000/5000 ★
E20/20 train: 100%|██████████| 2465/2465 [02:36<00:00, 15.77batch/s, cos=0.665, loss=0.3455, nce_acc=0.972, ordered=1]
E20 train: 156s loss=0.3455 nce=0.1510 mse=0.0052 bce=0.0816 nce_acc=0.972
E20 val: mAP=0.429 F1=0.418 R@1=0.323 cos=0.599 cv=0.1570 anchors=94/256 seen=5000/5000
Best mAP: 0.429
Encoder: 11,216,768 params (from scratch)
Checkpoints saved every epoch in checkpoints/
Tensorboard: runs/geolip_vit_encoder
=================================================================
DONE
=================================================================
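To compare epochs without reading the log by eye, the per-epoch validation lines can be parsed into records. This is a minimal sketch based only on the `E<n> val: key=value ...` layout visible in this log; the function name `parse_val_lines` and the regular expressions are illustrative, and fractions such as `anchors=94/256` are reduced to their numerator.

```python
import re

# Matches lines such as "E18 val: mAP=0.429 F1=0.416 ...".
# Progress-bar lines ("E18/20 train: ...") and train summaries do not match.
VAL_LINE = re.compile(r"^E\s*(\d+) val: (.+)$")
# key=value pairs; keys may contain '@' (e.g. "R@1").
PAIR = re.compile(r"([\w@]+)=(\d+(?:\.\d+)?)")

def parse_val_lines(lines):
    """Return one dict of metrics per validation line, in input order."""
    rows = []
    for line in lines:
        match = VAL_LINE.match(line.strip())
        if not match:
            continue
        metrics = {key: float(value) for key, value in PAIR.findall(match.group(2))}
        metrics["epoch"] = int(match.group(1))
        rows.append(metrics)
    return rows

# Three lines copied from the log above (trailing marker omitted).
sample = [
    "E18 val: mAP=0.429 F1=0.416 R@1=0.325 cos=0.599 cv=0.1375 anchors=94/256 seen=5000/5000",
    "E19 val: mAP=0.429 F1=0.420 R@1=0.325 cos=0.600 cv=0.1426 anchors=94/256 seen=5000/5000",
    "E20 val: mAP=0.429 F1=0.418 R@1=0.323 cos=0.599 cv=0.1570 anchors=94/256 seen=5000/5000",
]
rows = parse_val_lines(sample)
best = max(rows, key=lambda row: row["mAP"])  # first epoch wins on ties
print(best["epoch"], best["mAP"])
```

Because the log prints mAP to three decimals, epochs 18 through 20 tie at 0.429 here; the tie-breaking rule above is a choice of this sketch and may differ from how the training script selects its best checkpoint.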