---
license: mit
tags:
- amop-optimized
- onnx
---

# AMOP-Optimized ONNX Model: {repo_name}

This model was automatically optimized for CPU inference using the **Adaptive Model Optimization Pipeline (AMOP)**.

- **Base Model:** [{model_id}](https://huggingface.co/{model_id})
- **Optimization Date:** {optimization_date}

## Optimization Details

The following AMOP ONNX pipeline stages were applied:

- **Pruning:** {pruning_status} (Percentage: {pruning_percent}%)
- **Quantization & ONNX Conversion:** Enabled ({quant_type} quantization)
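For reference, the sketch below shows roughly how the quantization and ONNX conversion stage can be reproduced with Optimum's ONNX Runtime tooling. It is a minimal illustration assuming dynamic INT8 quantization via `ORTQuantizer`; the exact configuration AMOP applied (operators quantized, pruning integration, cache handling) may differ, and the directory names are placeholders.

```python
from optimum.onnxruntime import ORTModelForCausalLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the base model to ONNX. use_cache=False keeps the export to a single
# ONNX file, which simplifies quantization (assumption: AMOP may export with
# the KV cache enabled instead).
onnx_model = ORTModelForCausalLM.from_pretrained(
    "{model_id}", export=True, use_cache=False
)
onnx_model.save_pretrained("onnx-export")

# Apply dynamic INT8 quantization: weights are quantized ahead of time,
# activations at runtime. avx512_vnni targets modern x86 CPUs; choose a
# config that matches your hardware.
quantizer = ORTQuantizer.from_pretrained("onnx-export")
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx-quantized", quantization_config=qconfig)
```

The quantized model written to `onnx-quantized/` can then be loaded exactly as shown in the usage example below.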
## How to Use

This model is in ONNX format and can be run with Optimum's ONNX Runtime integration (`optimum.onnxruntime`). Make sure you have `optimum`, `onnxruntime`, and `transformers` installed.

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "{repo_id}"

# Load the ONNX model and its tokenizer from the Hub
model = ORTModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "The future of AI is"
inputs = tokenizer(prompt, return_tensors="pt")

gen_tokens = model.generate(**inputs)
print(tokenizer.batch_decode(gen_tokens, skip_special_tokens=True))
```

## AMOP Pipeline Log

<details>
<summary>Click to expand</summary>

```
{pipeline_log}
```

</details>