Update README.md
README.md CHANGED
@@ -13,7 +13,7 @@ Disclaimer: The team releasing DINOv2 did not write a model card for this model
 
 ## Model description
 
-The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion
+The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion.
 
 Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder.
 
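The patch-embedding step the model card describes (fixed-size patches, linear embedding, a prepended [CLS] token, absolute position embeddings) can be sketched in plain NumPy. The sizes below (224×224 input, 14×14 patches, hidden size 768) are illustrative assumptions, not values taken from this commit, and the random projection stands in for the learned weights of the real model:

```python
import numpy as np

# Illustrative sizes (assumptions, not from the model card itself):
# a ViT-style encoder with 224x224 input, 14x14 patches, hidden size 768.
image_size, patch_size, hidden_size = 224, 14, 768
num_patches = (image_size // patch_size) ** 2  # 16 * 16 = 256

rng = np.random.default_rng(0)
image = rng.standard_normal((3, image_size, image_size))  # fake RGB image

# 1. Split the image into fixed-size patches and flatten each one.
patches = (
    image.reshape(3, image_size // patch_size, patch_size,
                  image_size // patch_size, patch_size)
    .transpose(1, 3, 0, 2, 4)
    .reshape(num_patches, 3 * patch_size * patch_size)  # (256, 588)
)

# 2. Linearly embed each flattened patch (learned in the real model).
proj = rng.standard_normal((patches.shape[1], hidden_size)) * 0.02
embeddings = patches @ proj  # (256, 768)

# 3. Prepend the [CLS] token and add absolute position embeddings
#    before the sequence enters the Transformer encoder layers.
cls_token = np.zeros((1, hidden_size))
tokens = np.concatenate([cls_token, embeddings], axis=0)
pos_embed = rng.standard_normal((num_patches + 1, hidden_size)) * 0.02
tokens = tokens + pos_embed

print(tokens.shape)  # (257, 768): 256 patch tokens + 1 [CLS] token
```

The 2× reshape/transpose trick is just one way to cut non-overlapping patches; the real model implements the same linear embedding as a strided convolution.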