Unable to load the model in 8 bits

#30

by ARahul2003 - opened Sep 24, 2023

ARahul2003

Sep 24, 2023

Hi all, I have been trying to use this model on a laptop without a GPU for one of my course projects. Because of that, I need to load the model with 8-bit quantization. However, whenever I try to load it in quantized form, I get an error stating that the accelerate and bitsandbytes libraries are not present. I made sure to install those libraries in my virtual environment, yet the error persists. Please help me.

Here is the code that I have written:

from transformers import AutoModelForCausalLM, AutoTokenizer
import accelerate
import bitsandbytes
import gradio as gr
import torch

title = "AI ChatBot"
description = "Quantised version of the Phi 1.5 LLM released by Microsoft research"
examples = [["How are you?"]]

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True, torch_dtype="auto")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True, torch_dtype="auto", load_in_8bit = True)

def predict(input, history=[]):
    # tokenize the new input sentence
    new_user_input_ids = tokenizer.encode(
        input + tokenizer.eos_token, return_tensors="pt"
    )

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([torch.LongTensor(history), new_user_input_ids], dim=-1)

    # generate a response
    history = model.generate(
        bot_input_ids, max_length=4000, pad_token_id=tokenizer.eos_token_id
    ).tolist()

    # convert the tokens to text, and then split the responses into lines
    response = tokenizer.decode(history[0]).split("<|endoftext|>")
    # print('decoded_response-->>'+str(response))
    response = [
        (response[i], response[i + 1]) for i in range(0, len(response) - 1, 2)
    ]  # convert to a list of tuples
    # print('response-->>'+str(response))
    return response, history

gr.Interface(
    fn=predict,
    title=title,
    description=description,
    examples=examples,
    inputs=["text", "state"],
    outputs=["chatbot", "state"],
    theme="finlaymacklon/boxy_violet",
).launch()

gugarosa

Microsoft org Sep 26, 2023

Hello @ARahul2003 !

Your image still shows an ImportError, which could be related to an incomplete installation of either accelerate or bitsandbytes. However, please note that we haven't tested 8-bit loading with Phi-based models, so I am unsure how it will behave.
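
One way to narrow down an ImportError like this is to confirm which packages the running interpreter can actually see, since a virtual environment and the interpreter that launches the script can differ. The following is a minimal sketch, not a confirmed fix: the helper names `check_packages` and `cuda_available` are hypothetical and not part of any library. The CUDA check is included on the assumption that 8-bit loading through bitsandbytes expects a CUDA GPU, which may matter on a laptop without one; the thread does not confirm this.

```python
import importlib.metadata
import importlib.util
import sys


def check_packages(names):
    """Map each distribution name to its installed version, or None if missing.

    Hypothetical helper for illustration only.
    """
    report = {}
    for name in names:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = None
    return report


def cuda_available():
    """Return True/False as reported by torch, or None when torch is not installed.

    Hypothetical helper; assumes a CUDA GPU is relevant to 8-bit loading.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    # Show which interpreter is running, to spot a virtual-environment mismatch.
    print("interpreter:", sys.executable)
    packages = ["transformers", "accelerate", "bitsandbytes"]
    for name, version in check_packages(packages).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
    print("CUDA available:", cuda_available())
```

If a package shows as not installed, or the interpreter path points outside the virtual environment, reinstalling inside the active environment is a reasonable next step.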

codegood

Sep 27, 2023

Hello @ARahul2003 ,

You need to use an older version of transformers.

!pip install -qU trl datasets accelerate loralib einops xformers bitsandbytes
!pip install transformers==4.30

gugarosa changed discussion status to closed Oct 30, 2023
