Haon Park

redteamhacker

AI & ML interests

None yet

Recent Activity

upvoted a paper 3 days ago

X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates

upvoted a paper 3 days ago

ObjexMT: Objective Extraction and Metacognitive Calibration for LLM-as-a-Judge under Multi-Turn Jailbreaks

upvoted a paper 3 days ago

One-Shot is Enough: Consolidating Multi-Turn Attacks into Efficient Single-Turn Prompts for LLMs

Organizations

upvoted 5 papers 3 days ago

X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates

Paper • 2509.08729 • Published Sep 10 • 1

ObjexMT: Objective Extraction and Metacognitive Calibration for LLM-as-a-Judge under Multi-Turn Jailbreaks

Paper • 2508.16889 • Published Aug 23 • 2

One-Shot is Enough: Consolidating Multi-Turn Attacks into Efficient Single-Turn Prompts for LLMs

Paper • 2503.04856 • Published Mar 6 • 2

ELITE: Enhanced Language-Image Toxicity Evaluation for Safety

Paper • 2502.04757 • Published Feb 7 • 2

sudo rm -rf agentic_security

Paper • 2503.20279 • Published Mar 26 • 1

authored 7 papers 3 days ago

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24 • 76

One-Shot is Enough: Consolidating Multi-Turn Attacks into Efficient Single-Turn Prompts for LLMs

Paper • 2503.04856 • Published Mar 6 • 2

ELITE: Enhanced Language-Image Toxicity Evaluation for Safety

Paper • 2502.04757 • Published Feb 7 • 2

When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs

Paper • 2508.03365 • Published Aug 5 • 4

Eliciting and Analyzing Emergent Misalignment in State-of-the-Art Large Language Models

Paper • 2508.04196 • Published Aug 6 • 1

ObjexMT: Objective Extraction and Metacognitive Calibration for LLM-as-a-Judge under Multi-Turn Jailbreaks

Paper • 2508.16889 • Published Aug 23 • 2

X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates

Paper • 2509.08729 • Published Sep 10 • 1

authored a paper 4 days ago

sudo rm -rf agentic_security

Paper • 2503.20279 • Published Mar 26 • 1

updated a collection 4 days ago

Repbend

Collection

Fine-tuned Model Collections using the "Representation Bending" (REPBEND) approach described in Representation Bending for Large Language Model Safety • 4 items • Updated 4 days ago

upvoted 3 papers 4 days ago

Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition

Paper • 2505.15367 • Published May 21 • 2

Eliciting and Analyzing Emergent Misalignment in State-of-the-Art Large Language Models

Paper • 2508.04196 • Published Aug 6 • 1

When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs

Paper • 2508.03365 • Published Aug 5 • 4

liked 3 models 8 months ago
