---
license: mit
tags:
- amop-optimized
- gguf
---
# AMOP-Optimized GGUF Model: {repo_name}

This model was automatically optimized for CPU inference using the **Adaptive Model Optimization Pipeline (AMOP)**.

- **Base Model:** [{model_id}](https://huggingface.co/{model_id})
- **Optimization Date:** {optimization_date}
## Optimization Details

The following AMOP GGUF pipeline stages were applied:

- **GGUF Conversion & Quantization:** Enabled (Strategy: {quant_type})
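
The quantization reported above can be cross-checked against the metadata stored in the GGUF file itself. Below is a minimal sketch, assuming the quantized file in this repository is named `model.gguf` (replace with the actual file name); it relies on llama.cpp logging the GGUF metadata, including the file/quantization type, when a model is loaded with `verbose=True`:

```python
from llama_cpp import Llama
from huggingface_hub import hf_hub_download

# Assumed file name; replace with the actual GGUF file in this repository.
model_path = hf_hub_download(repo_id="{repo_id}", filename="model.gguf")

# verbose=True makes llama.cpp log the GGUF metadata at load time,
# including the quantization (file type) applied by the pipeline.
llm = Llama(model_path=model_path, verbose=True)
```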
## How to Use

This model is in GGUF format and can be run with libraries such as `llama-cpp-python`.

First, install the necessary libraries:

```bash
pip install llama-cpp-python huggingface_hub
```
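
The exact name of the GGUF file in this repository may differ from the `model.gguf` placeholder used in the examples below. If in doubt, you can list the repository files first (a short sketch using `huggingface_hub.list_repo_files`; `{repo_id}` is this repository's id):

```python
from huggingface_hub import list_repo_files

# List every file in the repo and keep only the GGUF weights.
gguf_files = [f for f in list_repo_files("{repo_id}") if f.endswith(".gguf")]
print(gguf_files)
```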
Then, use the following Python code to run inference:
```python
from llama_cpp import Llama
from huggingface_hub import hf_hub_download

# Download the GGUF model from the Hub
model_path = hf_hub_download(
    repo_id="{repo_id}",
    filename="model.gguf"  # Replace with the actual GGUF file name in this repo
)

# Instantiate the model
llm = Llama(
    model_path=model_path,
    n_ctx=2048,  # Context window
)

# Run inference
prompt = "The future of AI is"
output = llm(
    f"Q: {prompt} A: ",  # Or your preferred prompt format
    max_tokens=50,
    stop=["Q:", "\n"],
    echo=True,
)

# The result is an OpenAI-style completion dict; the generated text is under "choices"
print(output["choices"][0]["text"])
```
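
For instruction-tuned base models, a chat-style call usually works better than the bare Q/A prompt above. The sketch below assumes the base model ships a chat template in its GGUF metadata, which `llama-cpp-python` applies in `create_chat_completion`; the message content is purely illustrative:

```python
from llama_cpp import Llama
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="{repo_id}",
    filename="model.gguf"  # Replace with the actual GGUF file name in this repo
)
llm = Llama(model_path=model_path, n_ctx=2048)

# create_chat_completion formats the messages with the model's chat template
# (when one is embedded in the GGUF metadata) and returns an OpenAI-style dict.
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain in one sentence what GGUF quantization does."},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```

If the base model does not define a chat template, stick with the plain completion call shown earlier.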
## AMOP Pipeline Log

<details>
<summary>Click to expand</summary>

```
{pipeline_log}
```

</details>