These are MoE-only animal-sound PErFT-E LoRAs for Qwen3-30B-A3B-Instruct-2507, intended for testing LoRA loading and swapping (they target only the expert modules). Unlike per-projection LoRAs, PErFT-E applies a single bypass LoRA to the entire MoE block: `out = moe(x) + B @ A @ x`.
- Jackmin108/Qwen3-30B-A3B-Meow-perfte-moe-only
- Jackmin108/Qwen3-30B-A3B-Woof-perfte-moe-only
- Jackmin108/Qwen3-30B-A3B-Oink-perfte-moe-only
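The bypass form `out = moe(x) + B @ A @ x` can be illustrated with a minimal NumPy sketch. The `moe` stand-in, the toy sizes, and the zero-initialization of `B` are illustrative assumptions, not the repo's code:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, rank = 8, 4  # toy sizes; the released adapters use rank=16

def moe(x):
    # Stand-in for the frozen MoE block.
    return 2.0 * x

# lora_A: [rank, dim], lora_B: [dim, rank]. B starts at zero,
# so the bypass is a no-op before training, as in standard LoRA.
A = rng.standard_normal((rank, dim)) * 0.01
B = np.zeros((dim, rank))

def moe_with_bypass(x):
    # out = moe(x) + B @ A @ x -- one low-rank correction added to
    # the whole MoE block output, not to individual projections.
    return moe(x) + B @ (A @ x)

x = rng.standard_normal(dim)
print(np.allclose(moe_with_bypass(x), moe(x)))  # True while B is all zeros
```

Because the correction wraps the entire block, loading or swapping the adapter never has to touch the individual expert projection weights.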
## Adapter Format
Each layer has two 3D tensors per MoE block:
| Key | Shape |
|---|---|
| `lora_A.weight` | `[num_experts, rank, dim]` |
| `lora_B.weight` | `[num_experts, dim, rank]` |
Saved in bfloat16 with rank=16, alpha=32.
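The stacked per-expert layout above can be sketched in NumPy. The sizes and the expert-slicing logic are illustrative assumptions (this is not the repo's loader):

```python
import numpy as np

# Each MoE block stores every expert's A and B matrices as one
# 3D tensor, indexed by expert id along the leading axis.
num_experts, rank, dim = 4, 16, 32  # toy sizes; the real model is larger

rng = np.random.default_rng(0)
lora_A = rng.standard_normal((num_experts, rank, dim))  # [num_experts, rank, dim]
lora_B = rng.standard_normal((num_experts, dim, rank))  # [num_experts, dim, rank]

# A token routed to expert e would use that expert's slice for the bypass:
e = 2
x = rng.standard_normal(dim)
delta = lora_B[e] @ (lora_A[e] @ x)  # [dim, rank] @ ([rank, dim] @ [dim]) -> [dim]
print(delta.shape)  # (32,)
```

Storing all experts in one tensor keeps the adapter to two keys per MoE block, which simplifies loading and swapping.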
## Get Response
```python
import math

from huggingface_hub import snapshot_download
from openai import OpenAI

# Download the adapter weights locally.
lora_name = "Jackmin108/Qwen3-30B-A3B-Oink-perfte-moe-only"
lora_path = snapshot_download(repo_id=lora_name)

messages = [
    {"content": "Follow the instructions to make animal noises", "role": "system"},
    {"content": "Make your favorite animal noise.", "role": "user"},
]

client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:8000/v1")

# Dynamically register the adapter with the running server.
client.post(
    "load_lora_adapter",
    body={"lora_name": lora_name, "lora_path": lora_path},
    cast_to=str,
)

# Query the loaded adapter by passing its name as the model id.
resp = client.chat.completions.create(
    model=lora_name,
    messages=messages,
    max_tokens=20,
    logprobs=True,
)

print("=== Completion ===")
print(resp.choices[0].message.content)
print("=== Probabilities ===")
print(*[(i.token, f"{math.exp(i.logprob):.2f}") for i in resp.choices[0].logprobs.content], sep="\n")
```
## Base model

Qwen/Qwen3-30B-A3B-Instruct-2507