VideoMAE-v2 Fine-tuned on FineBio

Model Description

This model is a fine-tuned version of VideoMAE-v2 (vit_base_patch16_224) for video classification on the FineBio dataset.

Model Architecture

  • Base Model: VideoMAE-v2 (vit_base_patch16_224)
  • Fine-tuned on: FineBio
  • Number of Classes: 32
  • Input: 16 frames per video clip
  • Resolution: 224x224 pixels
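The architecture above fixes the clip geometry: 16 frames at 224x224 with 3 channels. As a minimal sketch (not taken from this repository's code), this is the tensor layout such a clip occupies after preprocessing, where VideoMAE-style models expect (batch, frames, channels, height, width):

```python
import numpy as np

# Dummy preprocessed clip: 1 clip of 16 RGB frames at 224x224,
# laid out channels-first per frame as the model expects.
num_frames, channels, height, width = 16, 3, 224, 224
clip = np.zeros((1, num_frames, channels, height, width), dtype=np.float32)
print(clip.shape)  # (1, 16, 3, 224, 224)
```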

Training Details

Training Configuration

  • Optimizer: AdamW
  • Learning Rate: 0.001
  • Batch Size: 8 (effective: 24)
  • Epochs: 50
  • Weight Decay: 0.05
  • Layer Decay: 0.75
  • Drop Path Rate: 0.1
  • Warmup Epochs: 10
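The configuration above implies a warmup-then-decay learning-rate schedule. The following is a hedged sketch of one common interpretation (linear warmup for 10 epochs, then cosine decay to zero over the remaining 40), not the exact training script used here:

```python
import math

# Hyperparameters from the configuration above
BASE_LR = 0.001
EPOCHS = 50
WARMUP_EPOCHS = 10

def lr_at(epoch: int) -> float:
    """Learning rate for a given 0-indexed epoch: linear warmup, then cosine decay."""
    if epoch < WARMUP_EPOCHS:
        # Ramp linearly from BASE_LR/WARMUP_EPOCHS up to BASE_LR
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # Cosine decay from BASE_LR down to 0 over the remaining epochs
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * BASE_LR * (1 + math.cos(math.pi * progress))

print(lr_at(9))   # end of warmup: 0.001
print(lr_at(49))  # near the end of training: close to 0
```

Layer decay (0.75) would additionally scale each transformer block's learning rate by 0.75 per layer of depth from the head, so earlier layers move more slowly than the classifier.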

Training Results

  • Best Validation Accuracy: N/A
  • Saved Checkpoint: best validation epoch

Dataset

Custom video dataset for laboratory action recognition

  • Number of Classes: 32
  • Training Set Size: N/A videos
  • Validation Set Size: N/A videos
  • Test Set Size: N/A videos

Usage

from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification
import torch
import numpy as np

# Load model and processor
processor = VideoMAEImageProcessor.from_pretrained("AnnaelleMyriam/videomaev2-finetuned-finebio")
model = VideoMAEForVideoClassification.from_pretrained("AnnaelleMyriam/videomaev2-finetuned-finebio")
model.eval()

# Prepare video frames (list of PIL images or numpy arrays),
# shape (num_frames, height, width, channels).
# Here: a dummy uint8 clip standing in for 16 real RGB frames.
video = np.random.randint(0, 256, (16, 224, 224, 3), dtype=np.uint8)

# Process inputs
inputs = processor(list(video), return_tensors="pt")

# Forward pass
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Get predictions
predicted_class_idx = logits.argmax(-1).item()
print(f"Predicted class: {model.config.id2label[predicted_class_idx]}")

Citation

If you use this model, please cite:

@article{wang2023videomaev2,
  title={VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking},
  author={Wang, Limin and Huang, Bingkun and Zhao, Zhiyu and Tong, Zhan and He, Yinan and Wang, Yi and Wang, Yali and Qiao, Yu},
  journal={arXiv preprint arXiv:2303.16727},
  year={2023}
}

Contact

For questions or issues, please open an issue in the model repository.
