---
tags:
- autotrain
- text-generation-inference
- text-generation
- peft
library_name: transformers
base_model: abhishek/llama-2-7b-hf-small-shards
widget:
- messages:
  - role: user
    content: What is your favorite condiment?
license: other
---

# Model Trained Using AutoTrain

This model was trained using AutoTrain. For more information, please visit [AutoTrain](https://hf.co/docs/autotrain).

# Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
from peft import PeftModel, PeftConfig
import torch
import bitsandbytes as bnb  # required as a dependency for 8-bit loading
import time

model_name = "Punthon/llama2-sdgs"

# Load the PEFT configuration
peft_config = PeftConfig.from_pretrained(model_name)

# Load the tokenizer from the base model
# (use the tokenizer associated with the base model, or your fine-tuned model if needed)
tokenizer = AutoTokenizer.from_pretrained(peft_config.base_model_name_or_path)

# Load the base model in 8-bit precision
base_model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    load_in_8bit=True,  # load in 8-bit precision
    device_map="auto"
)

# Resize the base model embeddings to match the tokenizer
base_model.resize_token_embeddings(len(tokenizer))

# Load the fine-tuned adapter weights
model = PeftModel.from_pretrained(base_model, model_name)

# Define the instruction and input text
instruction = "Identify the Sustainable Development Goals (SDGs) relevant to the passage below. Provide only the SDG numbers and the reason for their relevance. Do not repeat the passage."
input_text = "Thailand is considered a leader in tiger conservation in Southeast Asia. Most recently at the 'Sustainable Finance for Tiger Landscapes Conservation' conference in Bhutan, Thailand has been declared as the “Champion for Tiger Conservation in Southeast Asia.”"

prompt = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input_text}

### Response:
"""

# Define the generation configuration
generation_config = GenerationConfig(
    do_sample=True,
    top_k=30,
    temperature=0.7,
    max_new_tokens=200,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id
)

# Tokenize the input
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate outputs
st_time = time.time()
outputs = model.generate(**inputs, generation_config=generation_config)

# Decode and print the response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Response time: {time.time() - st_time} seconds")
print(response)
```
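
On recent `transformers` releases, passing `load_in_8bit=True` directly to `from_pretrained` may emit a deprecation warning. A minimal sketch of the equivalent call using an explicit `BitsAndBytesConfig` (assuming `bitsandbytes` is installed) is shown below; the rest of the usage example stays the same.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftConfig

peft_config = PeftConfig.from_pretrained("Punthon/llama2-sdgs")

# Equivalent 8-bit loading via an explicit quantization config
# (preferred over load_in_8bit=True on newer transformers versions)
base_model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto"
)
```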