VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Paper: [arXiv:2303.16727](https://arxiv.org/abs/2303.16727)
This model is a fine-tuned version of VideoMAE V2 (vit_base_patch16_224) for video classification on FineBio, a custom video dataset for laboratory action recognition.
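The usage example below feeds the model 16 frames of size 224×224. When starting from a longer clip, a common approach is to sample 16 uniformly spaced frames first; a minimal sketch (the helper name is illustrative, not part of this repository):

```python
import numpy as np

def sample_frame_indices(clip_len: int, total_frames: int) -> np.ndarray:
    """Return clip_len uniformly spaced frame indices covering the whole clip.

    Illustrative helper, not part of the model repository.
    """
    return np.linspace(0, total_frames - 1, num=clip_len).astype(np.int64)

# e.g. pick 16 frames from a 300-frame video
indices = sample_frame_indices(16, 300)
print(indices)
```

The selected indices can then be used to gather frames from whatever decoding library you use (e.g. `av`, `decord`, or OpenCV) before passing them to the processor.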
```python
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification
import torch
import numpy as np

# Load the fine-tuned model and its processor
processor = VideoMAEImageProcessor.from_pretrained("AnnaelleMyriam/videomaev2-finetuned-finebio")
model = VideoMAEForVideoClassification.from_pretrained("AnnaelleMyriam/videomaev2-finetuned-finebio")

# Prepare video frames (list of PIL images or numpy arrays)
# Shape: (num_frames, height, width, channels); dummy uint8 frames shown here
video = np.random.randint(0, 256, (16, 224, 224, 3), dtype=np.uint8)

# Process inputs
inputs = processor(list(video), return_tensors="pt")

# Forward pass
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Get predictions
predicted_class_idx = logits.argmax(-1).item()
print(f"Predicted class: {model.config.id2label[predicted_class_idx]}")
```
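To inspect more than the single best class, the logits can be converted to probabilities and ranked. A short sketch using dummy logits (in practice, substitute `outputs.logits` from the snippet above):

```python
import torch

# Dummy logits standing in for outputs.logits, shape (batch_size, num_classes)
logits = torch.tensor([[0.1, 2.0, 0.5, -1.0]])

# Softmax turns logits into a probability distribution over classes
probs = torch.softmax(logits, dim=-1)

# Top-3 classes and their probabilities
top_probs, top_ids = probs.topk(k=3, dim=-1)
for p, i in zip(top_probs[0].tolist(), top_ids[0].tolist()):
    print(f"class {i}: {p:.3f}")
```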
If you use this model, please cite:
```bibtex
@article{wang2023videomaev2,
  title={VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking},
  author={Wang, Limin and Huang, Bingkun and Zhao, Zhiyu and Tong, Zhan and He, Yinan and Wang, Yi and Wang, Yali and Qiao, Yu},
  journal={arXiv preprint arXiv:2303.16727},
  year={2023}
}
```
For questions or issues, please open an issue in the model repository.