Papers that exist
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
Paper
• 2509.15591
A Survey on Latent Reasoning
Paper
• 2507.06203
Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost
Paper
• 2602.03120
TADA! Tuning Audio Diffusion Models through Activation Steering
Paper
• 2602.11910
CoPE-VideoLM: Codec Primitives For Efficient Video Language Models
Paper
• 2602.13191
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
Paper
• 2602.12617
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
Paper
• 2602.08683
Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
Paper
• 2602.13013
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper
• 2411.04282
Paper
• 2505.14513
LoRA-Drop: Temporal LoRA Decoding for Efficient LLM Inference
Paper
• 2601.02569
LLMs + Persona-Plug = Personalized LLMs
Paper
• 2409.11901
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model
Paper
• 2411.04496
FoNE: Precise Single-Token Number Embeddings via Fourier Features
Paper
• 2502.09741
FLEXITOKENS: Flexible Tokenization for Evolving Language Models
Paper
• 2507.12720
Distilling Token-Trained Models into Byte-Level Models
Paper
• 2602.01007
Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling
Paper
• 2502.14553
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
• 2412.09871
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction
Paper
• 2505.11254
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering
Paper
• 2409.08250
LightMem: Lightweight and Efficient Memory-Augmented Generation
Paper
• 2510.18866
The End of Manual Decoding: Towards Truly End-to-End Language Models
Paper
• 2510.26697
Kimi Linear: An Expressive, Efficient Attention Architecture
Paper
• 2510.26692
Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models
Paper
• 2602.10224
ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation
Paper
• 2601.21912
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token
Paper
• 2405.13792
ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
Paper
• 2505.02819
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Paper
• 2502.16894
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
Paper
• 2602.12205
MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling
Paper
• 2602.11761
CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling
Paper
• 2602.01766
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
Paper
• 2601.17367
MemFly: On-the-Fly Memory Optimization via Information Bottleneck
Paper
• 2602.07885
Paper
• 2602.11298
UMEM: Unified Memory Extraction and Management Framework for Generalizable Memory
Paper
• 2602.10652
Weight Decay Improves Language Model Plasticity
Paper
• 2602.11137
Latent Thoughts Tuning: Bridging Context and Reasoning with Fused Information in Latent Tokens
Paper
• 2602.10229
Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models
Paper
• 2602.09713
Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models
Paper
• 2602.07106
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
Paper
• 2602.08711
How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning
Paper
• 2602.10622
Paper
• 2410.05258
Möbius Transform for Mitigating Perspective Distortions in Representation Learning
Paper
• 2405.02296
AToken: A Unified Tokenizer for Vision
Paper
• 2509.14476
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams
Paper
• 2406.08085
Badllama 3: removing safety finetuning from Llama 3 in minutes
Paper
• 2407.01376
RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning
Paper
• 2406.10777
OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models
Paper
• 2406.01775
RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions
Paper
• 2501.00353