Instructions to use OpenMOSS-Team/MOSS-VL-Instruct-0408 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use OpenMOSS-Team/MOSS-VL-Instruct-0408 with Transformers:
    # Load model directly
    from transformers import AutoModel
    model = AutoModel.from_pretrained("OpenMOSS-Team/MOSS-VL-Instruct-0408", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
fix _tied_weights_keys format
Files changed: modeling_moss_vl.py (+5 -1)
--- a/modeling_moss_vl.py
+++ b/modeling_moss_vl.py
@@ -2152,7 +2152,11 @@ class MossVLModel(MossVLPreTrainedModel):
     """
 )
 class MossVLForConditionalGeneration(MossVLPreTrainedModel, GenerationMixin):
-
+    # transformers 5.x expects a dict[target, source]; MossVL does not tie
+    # lm_head to the embeddings (config.tie_word_embeddings is False), so the
+    # mapping is empty. The legacy list format ["lm_head.weight"] breaks
+    # save_pretrained in transformers>=5.
+    _tied_weights_keys: dict[str, str] = {}
     config: MossVLConfig
     _checkpoint_conversion_mapping = {}
     accepts_loss_kwargs = False
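The comment in the diff contrasts two shapes for `_tied_weights_keys`: a legacy list of parameter names and a `dict[target, source]` mapping. The sketch below illustrates that distinction in plain Python, without importing transformers. `validate_tied_weights_keys`, `LegacyStyle`, and `DictStyle` are hypothetical names introduced only for this example; they are not part of transformers or of `modeling_moss_vl.py`, and the check is an assumption about what a format guard could look like, not how transformers itself handles the attribute.

```python
from typing import Mapping, Sequence, Union

# Hypothetical helper, not part of transformers: it only illustrates the
# difference between the legacy list format and the dict[target, source] format.
TiedKeys = Union[None, Sequence[str], Mapping[str, str]]


def validate_tied_weights_keys(value: TiedKeys) -> dict[str, str]:
    """Return a dict[target, source] copy, or raise TypeError for any other shape."""
    if value is None:
        return {}
    if isinstance(value, Mapping):
        for target, source in value.items():
            if not isinstance(target, str) or not isinstance(source, str):
                raise TypeError("tied-weight targets and sources must be strings")
        return dict(value)
    # Lists, tuples, and bare strings all end up here.
    raise TypeError(
        f"expected dict[target, source], got {type(value).__name__}: {value!r}"
    )


class LegacyStyle:
    # Legacy list format mentioned in the diff comment.
    _tied_weights_keys = ["lm_head.weight"]


class DictStyle:
    # Format used after this change: nothing is tied, so the mapping is empty.
    _tied_weights_keys: dict[str, str] = {}


# The empty mapping passes the check unchanged.
assert validate_tied_weights_keys(DictStyle._tied_weights_keys) == {}

# The legacy list is rejected by this hypothetical guard.
try:
    validate_tied_weights_keys(LegacyStyle._tied_weights_keys)
except TypeError as exc:
    print(f"rejected: {exc}")
```

A guard like this could run in a unit test for custom modeling code so that a list-style attribute is caught before `save_pretrained` is ever called; whether that is worthwhile depends on how many transformers versions the repository needs to support.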