--- license: apache-2.0 language: - en pipeline_tag: text-generation library_name: transformers --- # FineRMoE Paper: https://arxiv.org/pdf/2603.13364 Github: https://github.com/liaoning97/FineRMoE ## Introduction To break the performance ceiling of fine-grained MoE designs that are solely confined to the intermediate dimension, we introduce the **FineRMoE (FineR-grained MoE)** architecture. It pioneers the expansion of the fine-grained expert design in MoE models from only the intermediate dimension to the output dimension, aiming to enhance expert specialization beyond the single-dimension limit. ## Model Overview This version has the following features: - Type: Causal Language Models - Number of Parameters: 5.64B - Number of Activated Paramaters: 1.85B - Number of Experts: 128 - Number of Activated Experts: 2 - Intermediate Granularity: 32 - Intermediate Expansion Rate: 1 - Output Granularity: 2 - Output Expansion Rate: 2 ## Quickstart ### Requirements To inference using the FineRMoE, the recommended version of `transformers` is `transformers=4.50.0`. ### Inference The following contains a code snippet illustrating how to use the model generate content based on given inputs. ```python import os import json import torch import argparse from transformers import ( AutoConfig, AutoModelForCausalLM, AutoTokenizer, ) # register the FineRMoE def hf_register(hf_ckpt_path): has_py_file = False for filename in os.listdir(hf_ckpt_path): if filename.endswith(".py"): has_py_file = True modeling_filename = filename.split('.')[0] if has_py_file: print("There exists a modeling file.") config_file = os.path.join(hf_ckpt_path, 'config.json') with open(config_file, "r", encoding="utf-8") as file: config = json.load(file) model_type = config['model_type'] import sys, inspect, importlib sys.path.append(hf_ckpt_path) module = importlib.import_module(modeling_filename) for name, obj in inspect.getmembers(module, inspect.isclass): if not obj.__module__ == module.__name__: continue print(name, obj) if name.endswith("CausalLM"): print(f"Found CausalLM with name: {name}") model_module = obj elif name.endswith("Config"): print(f"Found Config with name: {name}") config_module = obj AutoConfig.register(model_type, config_module) AutoModelForCausalLM.register(config_module, model_module) @torch.inference_mode() def main(args): # load the tokenizer and the model hf_register(args.model) tokenizer = AutoTokenizer.from_pretrained(args.model, trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained( args.model, torch_dtype=torch.float16, attn_implementation=None, device_map="auto", trust_remote_code=True, ) device = model.device if hasattr(model, "device") else next(model.parameters()).device # prepare the model input prompt = "Please introduce the history of artificial intelligence." messages = [ {"role": "user", "content": prompt} ] text = tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True ) inputs = tokenizer([text], return_tensors="pt").to(device) # conduct text completion outputs = model.generate( **inputs, max_new_tokens=32000, do_sample=False, use_cache=True, pad_token_id=tokenizer.eos_token_id, ) output_ids = outputs[0][len(inputs.input_ids[0]):].tolist() content = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n") print("content:", content) if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("--model", default="FineRMoE-5.64B-A1.85B") args = parser.parse_args() main(args) ``` ## Citation If you find our work helpful, feel free to give us a cite. ``` @misc{liao2026finermoedimensionexpansionfinergrained, title={FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach}, author={Ning Liao and Xiaoxing Wang and Xiaohan Qin and Junchi Yan}, year={2026}, eprint={2603.13364}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2603.13364}, } ```