Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment • arXiv:2512.04356 • Published Dec 2025
DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action • arXiv:2511.22134 • Published Nov 2025
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation • arXiv:2511.02778 • Published Nov 4, 2025
RobotArena ∞: Scalable Robot Benchmarking via Real-to-Sim Translation • arXiv:2510.23571 • Published Oct 27, 2025
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence • arXiv:2510.20579 • Published Oct 23, 2025
Unified Reinforcement and Imitation Learning for Vision-Language Models • arXiv:2510.19307 • Published Oct 22, 2025
PICABench: How Far Are We from Physically Realistic Image Editing? • arXiv:2510.17681 • Published Oct 20, 2025
LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal • arXiv:2510.15868 • Published Oct 17, 2025
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM • arXiv:2510.15870 • Published Oct 17, 2025
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset • arXiv:2510.15742 • Published Oct 17, 2025
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning • arXiv:2510.15110 • Published Oct 16, 2025
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model • arXiv:2510.10274 • Published Oct 11, 2025
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models • arXiv:2510.06917 • Published Oct 8, 2025