Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

Paper: arXiv:2304.13705
This is a trained **modified** Action Chunking with Transformers (ACT) model for the MetaWorld MT-1 shelf-place-v3 task. In this modified variant, images condition both the encoder and the decoder (visual conditioning), whereas in the original ACT the CVAE encoder does not receive image inputs.
```bash
# Clone the repository and install dependencies
git clone https://huggingface.co/aryannzzz/act-metaworld-shelf-modified
pip install torch torchvision
```
```python
import torch

# Select device and load the checkpoint
device = 'cuda' if torch.cuda.is_available() else 'cpu'
checkpoint = torch.load('model_modified.pt', map_location=device)

# The checkpoint contains:
# - model_state_dict: model weights
# - config: model architecture config
# - training_config: training hyperparameters
model_config = checkpoint['config']
print("Model configuration:", model_config)
```
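The three top-level keys described above can be verified programmatically. The sketch below builds a dummy checkpoint with the documented layout (placeholder tensors and values, not the real weights) and confirms it round-trips through `torch.save`/`torch.load`:

```python
import torch

# Dummy checkpoint mirroring the documented layout; the tensor and
# hyperparameter values here are placeholders, not the real model's.
dummy = {
    "model_state_dict": {"proj.weight": torch.zeros(4, 39)},
    "config": {"joint_dim": 39, "action_dim": 4, "hidden_dim": 256},
    "training_config": {"epochs": 50, "learning_rate": 1e-4},
}
torch.save(dummy, "dummy_checkpoint.pt")

# Load it back and confirm the expected keys are present.
ckpt = torch.load("dummy_checkpoint.pt", map_location="cpu")
assert set(ckpt) == {"model_state_dict", "config", "training_config"}
```

The same key check is a cheap sanity test before wiring the real `model_modified.pt` into an evaluation script.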
Full training configuration:

```json
{
  "dataset": {
    "batch_size": 8,
    "num_workers": 2,
    "val_split": 0.2
  },
  "model": {
    "joint_dim": 39,
    "action_dim": 4,
    "hidden_dim": 256,
    "latent_dim": 32,
    "n_encoder_layers": 4,
    "n_decoder_layers": 4,
    "n_heads": 8,
    "feedforward_dim": 1024,
    "dropout": 0.1
  },
  "chunking": {
    "chunk_size": 50,
    "temporal_ensemble_weight": 0.01
  },
  "training": {
    "epochs": 50,
    "learning_rate": 0.0001,
    "weight_decay": 0.0001,
    "kl_weight": 10.0,
    "grad_clip": 1.0
  },
  "env": {
    "task": "shelf-place-v3",
    "image_size": [480, 480],
    "action_space": 4,
    "state_space": 39
  },
  "logging": {
    "use_wandb": false,
    "log_every": 10,
    "save_every": 10
  }
}
```
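The `chunking` section controls ACT-style temporal ensembling: at each control step, the actions predicted for that step by all past chunks are averaged with exponentially decaying weights `w_i = exp(-m * i)`, where `m` is `temporal_ensemble_weight` and `i = 0` is the oldest prediction. A minimal sketch (the function name and input layout are illustrative, not this repository's API):

```python
import numpy as np

def ensemble_action(predictions, m=0.01):
    """Combine overlapping action predictions for one timestep.

    predictions: sequence of action vectors for the *same* timestep,
        ordered oldest-first, each from a different past chunk.
    m: temporal_ensemble_weight; w_i = exp(-m * i) with i = 0 for the
        oldest prediction, so larger m favors older predictions.
    """
    preds = np.asarray(predictions, dtype=float)
    w = np.exp(-m * np.arange(len(preds)))
    w /= w.sum()  # normalize so the weights sum to 1
    return (w[:, None] * preds).sum(axis=0)

# With m = 0 the weights are uniform, so this is a plain average.
a = ensemble_action([[0.0, 0.0], [2.0, 2.0]], m=0.0)  # -> [1.0, 1.0]
```

With `chunk_size` 50 and a new chunk predicted every step, up to 50 overlapping predictions are ensembled per timestep; `m = 0.01` makes the decay gentle, so recent and older predictions contribute nearly equally.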
If you use this model, please cite:
```bibtex
@article{zhao2023learning,
  title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
  author={Zhao, Tony Z and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
  journal={arXiv preprint arXiv:2304.13705},
  year={2023}
}
```
License: Apache License 2.0
Uploaded: 2025-12-11 22:12:29
Variant: modified
Repository: https://huggingface.co/aryannzzz/act-metaworld-shelf-modified