| ================================================================= |
| GEOLIP VISION ENCODER - FROM SCRATCH |
| ViT: 6L/384d/6h, patch16 |
| 196 patches + CLS → 128-d output |
| Device: cuda |
| ================================================================= |
|
|
| Loading soup... |
| Soup: mAP=0.837 CV_target=0.2731 |
| train: loaded cached targets (118,287) |
| val: loaded cached targets (5,000) |
| Caching train images (118,287)... |
| Resolving data files: 100% 39/39 [00:00<00:00, 5057.75it/s] |
| Downloading data: 100% 39/39 [04:55<00:00, 7.45s/files] |
| Generating train split: 118287/0 [01:49<00:00, 1378.35 examples/s] |
| Generating validation split: 5000/0 [00:05<00:00, 617.41 examples/s] |
| Loading dataset shards: 100% 39/39 [00:05<00:00, 8.83it/s] |
| Caching train: 100%|██████████| 118287/118287 [13:03<00:00, 151.05it/s] |
| Cached 118287/118287 images |
| Saved: cached_train_images.pt (35611 MB) |
| Caching val images (5,000)... |
| Resolving data files: 100% 39/39 [00:00<00:00, 4857.40it/s] |
| Caching val: 100%|██████████| 5000/5000 [00:33<00:00, 148.88it/s] |
| Cached 5000/5000 images |
| Saved: cached_val_images.pt (1505 MB) |
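
The two cache sizes reported above can be reproduced with simple arithmetic. The sketch below assumes each image is stored as a dense 3x224x224 tensor at 2 bytes per value (for example float16) and that "MB" means 10^6 bytes. The log does not state the dtype or layout, so both are inferences, and `cache_size_mb` is a hypothetical helper name.

```python
def cache_size_mb(num_images, channels=3, height=224, width=224, bytes_per_value=2):
    """Size in decimal megabytes (1 MB = 10**6 bytes) of a dense image cache.

    bytes_per_value=2 is an ASSUMPTION (e.g. float16); the log does not name the dtype.
    """
    total_bytes = num_images * channels * height * width * bytes_per_value
    return total_bytes / 1e6


train_mb = cache_size_mb(118_287)
val_mb = cache_size_mb(5_000)
print(f"train: {train_mb:.0f} MB, val: {val_mb:.0f} MB")
# train: 35611 MB, val: 1505 MB
```

Under these assumptions the result matches the 35611 MB and 1505 MB figures in the log, ignoring any small serialization overhead in the saved files.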
|
|
| ================================================================= |
| BUILD ENCODER |
| ================================================================= |
| Architecture: 6L/384d/6h, patch16 |
| Input: 224×224 → 196 patches |
| Output: 128-d (on hypersphere) |
| Parameters: 11,216,768 |
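
The patch count and parameter total above can be checked by hand. The sketch below counts parameters for a standard ViT layout (biased linear layers, MLP ratio 4, learned position embeddings, CLS token, final LayerNorm) plus a projection head. The head layout, Linear(384, 384) followed by LayerNorm and Linear(384, 128), is an assumption chosen because it happens to reproduce the reported total. The log does not describe the head, and other layouts could give the same number.

```python
def vit_param_count(depth=6, dim=384, patch=16, image=224, in_ch=3,
                    mlp_ratio=4, out_dim=128):
    """Return (n_patches, backbone_params, head_params) for a standard ViT.

    The number of attention heads does not change the parameter count.
    """
    n_patches = (image // patch) ** 2                      # 14 * 14 = 196
    patch_embed = in_ch * patch * patch * dim + dim        # conv weight + bias
    cls_token = dim
    pos_embed = (n_patches + 1) * dim                      # patches + CLS
    layer_norm = 2 * dim                                   # scale + shift
    attn = (dim * 3 * dim + 3 * dim) + (dim * dim + dim)   # qkv + output proj
    hidden = mlp_ratio * dim
    mlp = (dim * hidden + hidden) + (hidden * dim + dim)
    block = 2 * layer_norm + attn + mlp
    backbone = patch_embed + cls_token + pos_embed + depth * block + layer_norm
    # ASSUMED head: Linear(dim, dim) -> LayerNorm(dim) -> Linear(dim, out_dim)
    head = (dim * dim + dim) + layer_norm + (dim * out_dim + out_dim)
    return n_patches, backbone, head


n_patches, backbone, head = vit_param_count()
print(n_patches, backbone, head, backbone + head)
# 196 11018880 197888 11216768
```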
|
|
| ================================================================= |
| TRAINING |
| 20 epochs, lr=0.0003, batch=48 |
| Losses: InfoNCE + MSE + CV + BCE + Procrustes alignment |
| CV target: 0.2731 |
| Images: train=118,287 val=5,000 (cached as tensors) |
| ================================================================= |
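
For readers unfamiliar with the first loss term, here is a minimal pure-Python sketch of in-batch InfoNCE together with a top-1 matching accuracy, which is presumably what `nce` and `nce_acc` report below. The temperature value, the one-directional form, and the function name `info_nce` are assumptions. The log does not show how the actual loss is computed or how the five terms are weighted.

```python
import math


def info_nce(queries, keys, temperature=0.07):
    """Mean in-batch InfoNCE loss and top-1 accuracy.

    queries[i] and keys[i] form the positive pair; every other key in the
    batch acts as a negative. Vectors are assumed to be L2-normalized.
    temperature=0.07 is an ASSUMED value, not taken from the log.
    """
    n = len(queries)
    total_loss, correct = 0.0, 0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        top = max(logits)
        # log-sum-exp with the max subtracted for numerical stability
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        total_loss += log_denom - logits[i]   # cross-entropy with target i
        correct += int(max(range(n), key=logits.__getitem__) == i)
    return total_loss / n, correct / n


# Perfectly separated pairs: loss near zero, accuracy 1.0.
basis = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
loss_sep, acc_sep = info_nce(basis, basis)

# Uninformative embeddings: loss equals ln(batch size).
same = [[1.0, 0.0]] * 48
loss_chance, _ = info_nce(same, same)
print(round(loss_chance, 3))  # 3.871
```

If the negatives are in-batch only, chance level for a batch of 48 is ln(48) ≈ 3.871, so the epoch-1 value of nce=2.2529 is already below chance.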
| E 1/20 train: 100%|██████████| 2465/2465 [02:44<00:00, 14.97batch/s, cos=0.258, loss=2.6911, nce_acc=0.339, ordered=1] |
| E1 train: 165s loss=2.6891 nce=2.2529 mse=0.0120 bce=0.1963 nce_acc=0.340 |
| E1 val: mAP=0.151 F1=0.162 R@1=0.032 cos=0.325 cv=0.2663 anchors=95/256 seen=5000/5000 ★ |
|
| E 2/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.32batch/s, cos=0.368, loss=1.7954, nce_acc=0.553, ordered=1] |
| E2 train: 161s loss=1.7948 nce=1.4297 mse=0.0099 bce=0.1473 nce_acc=0.553 |
| E2 val: mAP=0.206 F1=0.197 R@1=0.062 cos=0.390 cv=0.2552 anchors=99/256 seen=5000/5000 ★ |
|
| E 3/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.37batch/s, cos=0.416, loss=1.4860, nce_acc=0.641, ordered=1] |
| E3 train: 160s loss=1.4854 nce=1.1484 mse=0.0092 bce=0.1338 nce_acc=0.641 |
| E3 val: mAP=0.246 F1=0.244 R@1=0.091 cos=0.427 cv=0.2234 anchors=98/256 seen=5000/5000 ★ |
|
| E 4/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.448, loss=1.2913, nce_acc=0.695, ordered=1] |
| E4 train: 160s loss=1.2910 nce=0.9727 mse=0.0087 bce=0.1265 nce_acc=0.695 |
| E4 val: mAP=0.272 F1=0.266 R@1=0.113 cos=0.453 cv=0.2078 anchors=99/256 seen=5000/5000 ★ |
|
| E 5/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.475, loss=1.1334, nce_acc=0.743, ordered=1] |
| E5 train: 160s loss=1.1331 nce=0.8303 mse=0.0083 bce=0.1205 nce_acc=0.743 |
| E5 val: mAP=0.296 F1=0.292 R@1=0.139 cos=0.473 cv=0.2133 anchors=98/256 seen=5000/5000 ★ |
|
| E 6/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.499, loss=1.0005, nce_acc=0.784, ordered=1] |
| E6 train: 158s loss=1.0003 nce=0.7111 mse=0.0079 bce=0.1158 nce_acc=0.784 |
| E6 val: mAP=0.317 F1=0.311 R@1=0.164 cos=0.495 cv=0.1835 anchors=98/256 seen=5000/5000 ★ |
|
| E 7/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.60batch/s, cos=0.520, loss=0.8947, nce_acc=0.815, ordered=1] |
| E7 train: 158s loss=0.8943 nce=0.6172 mse=0.0075 bce=0.1115 nce_acc=0.815 |
| E7 val: mAP=0.337 F1=0.335 R@1=0.190 cos=0.513 cv=0.1809 anchors=96/256 seen=5000/5000 ★ |
|
| E 8/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.539, loss=0.8030, nce_acc=0.842, ordered=1] |
| E8 train: 158s loss=0.8028 nce=0.5365 mse=0.0072 bce=0.1076 nce_acc=0.843 |
| E8 val: mAP=0.344 F1=0.331 R@1=0.207 cos=0.523 cv=0.1779 anchors=95/256 seen=5000/5000 ★ |
|
| E 9/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.58batch/s, cos=0.557, loss=0.7229, nce_acc=0.866, ordered=1] |
| E9 train: 158s loss=0.7228 nce=0.4665 mse=0.0070 bce=0.1041 nce_acc=0.866 |
| E9 val: mAP=0.361 F1=0.349 R@1=0.218 cos=0.537 cv=0.1764 anchors=95/256 seen=5000/5000 ★ |
|
| E10/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.51batch/s, cos=0.574, loss=0.6538, nce_acc=0.887, ordered=1] |
| E10 train: 159s loss=0.6538 nce=0.4070 mse=0.0067 bce=0.1009 nce_acc=0.887 |
| E10 val: mAP=0.380 F1=0.361 R@1=0.254 cos=0.557 cv=0.1699 anchors=96/256 seen=5000/5000 ★ |
|
| E11/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.54batch/s, cos=0.589, loss=0.5929, nce_acc=0.905, ordered=1] |
| E11 train: 159s loss=0.5928 nce=0.3545 mse=0.0065 bce=0.0978 nce_acc=0.905 |
| E11 val: mAP=0.387 F1=0.377 R@1=0.265 cos=0.564 cv=0.1497 anchors=95/256 seen=5000/5000 ★ |
|
| E12/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.55batch/s, cos=0.604, loss=0.5372, nce_acc=0.920, ordered=1] |
| E12 train: 158s loss=0.5372 nce=0.3073 mse=0.0062 bce=0.0948 nce_acc=0.920 |
| E12 val: mAP=0.400 F1=0.382 R@1=0.276 cos=0.573 cv=0.1639 anchors=95/256 seen=5000/5000 ★ |
|
| E13/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.60batch/s, cos=0.617, loss=0.4917, nce_acc=0.933, ordered=1] |
| E13 train: 158s loss=0.4917 nce=0.2693 mse=0.0060 bce=0.0920 nce_acc=0.933 |
| E13 val: mAP=0.408 F1=0.392 R@1=0.291 cos=0.582 cv=0.1615 anchors=95/256 seen=5000/5000 ★ |
|
| E14/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.61batch/s, cos=0.629, loss=0.4502, nce_acc=0.945, ordered=1] |
| E14 train: 158s loss=0.4501 nce=0.2347 mse=0.0058 bce=0.0895 nce_acc=0.945 |
| E14 val: mAP=0.413 F1=0.403 R@1=0.304 cos=0.586 cv=0.1594 anchors=95/256 seen=5000/5000 ★ |
|
| E15/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.640, loss=0.4169, nce_acc=0.954, ordered=1] |
| E15 train: 158s loss=0.4168 nce=0.2075 mse=0.0057 bce=0.0873 nce_acc=0.954 |
| E15 val: mAP=0.418 F1=0.403 R@1=0.307 cos=0.591 cv=0.1607 anchors=94/256 seen=5000/5000 ★ |
|
| E16/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.62batch/s, cos=0.649, loss=0.3909, nce_acc=0.961, ordered=1] |
| E16 train: 158s loss=0.3908 nce=0.1866 mse=0.0055 bce=0.0854 nce_acc=0.961 |
| E16 val: mAP=0.422 F1=0.411 R@1=0.321 cos=0.595 cv=0.1495 anchors=95/256 seen=5000/5000 ★ |
|
| E17/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.61batch/s, cos=0.656, loss=0.3717, nce_acc=0.966, ordered=1] |
| E17 train: 158s loss=0.3716 nce=0.1715 mse=0.0054 bce=0.0838 nce_acc=0.966 |
| E17 val: mAP=0.426 F1=0.417 R@1=0.321 cos=0.597 cv=0.1420 anchors=94/256 seen=5000/5000 ★ |
|
| E18/20 train: 100%|██████████| 2465/2465 [02:39<00:00, 15.43batch/s, cos=0.661, loss=0.3579, nce_acc=0.969, ordered=1] |
| E18 train: 160s loss=0.3579 nce=0.1607 mse=0.0053 bce=0.0826 nce_acc=0.969 |
| E18 val: mAP=0.429 F1=0.416 R@1=0.325 cos=0.599 cv=0.1375 anchors=94/256 seen=5000/5000 ★ |
|
| E19/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.664, loss=0.3494, nce_acc=0.971, ordered=1] |
| E19 train: 158s loss=0.3494 nce=0.1539 mse=0.0053 bce=0.0820 nce_acc=0.971 |
| E19 val: mAP=0.429 F1=0.420 R@1=0.325 cos=0.600 cv=0.1426 anchors=94/256 seen=5000/5000 ★ |
|
| E20/20 train: 100%|██████████| 2465/2465 [02:36<00:00, 15.77batch/s, cos=0.665, loss=0.3456, nce_acc=0.972, ordered=1] |
| E20 train: 156s loss=0.3455 nce=0.1510 mse=0.0052 bce=0.0816 nce_acc=0.972 |
| E20 val: mAP=0.429 F1=0.418 R@1=0.323 cos=0.599 cv=0.1570 anchors=94/256 seen=5000/5000 |
|
|
| Best mAP: 0.429 |
| Encoder: 11,216,768 params (from scratch) |
| Checkpoints saved every epoch in checkpoints/ |
| Tensorboard: runs/geolip_vit_encoder |
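
To pull the per-epoch validation metrics out of a log in this format, something like the following sketch may help. The function names `parse_val_line` and `best_epoch` are hypothetical, and the sample lines are copied from the run above with trailing decoration omitted.

```python
import re


def parse_val_line(line):
    """Parse one 'E<n> val: key=value ...' log line into a dict, else None."""
    match = re.search(r"E\s*(\d+)\s+val:", line)
    if match is None:
        return None
    metrics = {"epoch": int(match.group(1))}
    for token in line[match.end():].split():
        if "=" not in token:
            continue  # skip decoration such as a trailing "|" or marker
        key, value = token.split("=", 1)
        if "/" in value:  # counts such as anchors=94/256
            num, den = value.split("/")
            metrics[key] = (int(num), int(den))
        else:
            metrics[key] = float(value)
    return metrics


def best_epoch(lines, metric="mAP"):
    """Return the parsed row with the highest metric (first one wins ties)."""
    rows = [row for row in map(parse_val_line, lines) if row is not None]
    return max(rows, key=lambda row: row[metric])


sample = [
    "| E10 val: mAP=0.380 F1=0.361 R@1=0.254 cos=0.557 cv=0.1699 anchors=96/256 seen=5000/5000",
    "| E20 train: 156s loss=0.3455 nce=0.1510 mse=0.0052 bce=0.0816 nce_acc=0.972 |",
    "| E20 val: mAP=0.429 F1=0.418 R@1=0.323 cos=0.599 cv=0.1570 anchors=94/256 seen=5000/5000 |",
]
best = best_epoch(sample)
print(best["epoch"], best["mAP"])  # 20 0.429
```

Because the log prints mAP to three decimals, epochs 18 to 20 all read 0.429 here. Ranking them reliably would need the unrounded values from the checkpoints or Tensorboard.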
|
|
| ================================================================= |
| DONE |
| ================================================================= |