@@ -1,12 +1,12 @@
 ## 🦙🌲🤏 Alpaca-LoRA: Low-Rank LLaMA Instruct-Tuning

-**Try the pretrained model out on Colab [here](https://colab.research.google.com/drive/1eWAmesrW99p7e1nah5bipn0zikMb8XYC)!**
-
-_**Update 2023-03-19:** weights have been updated with cleaned data and prompts masked out in the loss. This should reduce the number of template artifacts in outputs._
+- Try the pretrained model out on Colab [here](https://colab.research.google.com/drive/1eWAmesrW99p7e1nah5bipn0zikMb8XYC)
+- Share custom LoRA adapters, including adapters for the larger models, [here](https://github.com/tloen/alpaca-lora/issues/52)
+- Users have created a Discord server for discussion and support [here](https://discord.gg/prbq284xX5)

 This repository contains code for reproducing the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) results using [low-rank adaptation (LoRA)](https://arxiv.org/pdf/2106.09685.pdf).
 We provide an Instruct model of similar quality to `text-davinci-003` that can run [on a Raspberry Pi](https://twitter.com/miolini/status/1634982361757790209) (for research),
-and the code can be easily extended to the `13b`, `30b`, and `65b` models.
+and the code is easily extended to the `13b`, `30b`, and `65b` models.

 In addition to the training code, which runs within five hours on a single RTX 4090,
 we publish a script for downloading and inference on the foundation model and LoRA,
@@ -14,9 +14,7 @@ as well as the resulting [LoRA weights themselves](https://huggingface.co/tloen/
 To fine-tune cheaply and efficiently, we use Hugging Face's [PEFT](https://github.com/huggingface/peft)
 as well as Tim Dettmers' [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).

-Without hyperparameter tuning or validation-based checkpointing, the LoRA model produces outputs comparable to the Stanford Alpaca model. (Please see the outputs included below.) Further tuning might be able to achieve better performance; I invite interested users to give it a try and report their results.
-
-For discussion and support, users have created a dedicated Discord server [here](https://discord.gg/prbq284xX5).
+Without hyperparameter tuning, the LoRA model produces outputs comparable to the Stanford Alpaca model. (Please see the outputs included below.) Further tuning might be able to achieve better performance; I invite interested users to give it a try and report their results.

 ### Setup
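The README text in the diff leans on Hugging Face's PEFT and bitsandbytes for cheap fine-tuning. As a rough illustration of that combination (a sketch, not the repository's actual training script; the checkpoint name and LoRA hyperparameters below are assumptions), the base LLaMA weights can be loaded in 8-bit while only small LoRA adapter matrices are left trainable:

```python
# Sketch only: 8-bit base model via bitsandbytes + LoRA adapters via PEFT.
# The checkpoint name and LoRA hyperparameters are illustrative assumptions.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

base_model = "decapoda-research/llama-7b-hf"  # placeholder LLaMA checkpoint

tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,          # bitsandbytes int8 weights, fits in consumer VRAM
    torch_dtype=torch.float16,
    device_map="auto",
)
# Newer PEFT releases rename this helper to prepare_model_for_kbit_training.
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA matrices require gradients
```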
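The diff also references a script for downloading and running inference with the foundation model plus the published LoRA weights. A minimal sketch of that flow, assuming the adapter is hosted at `tloen/alpaca-lora-7b` on the Hugging Face Hub, using a placeholder base checkpoint and the Alpaca instruction prompt template:

```python
# Sketch only: layer the published LoRA adapter on top of the 8-bit base model and generate.
# Checkpoint and adapter names are assumptions; sampling parameters are arbitrary.
import torch
from transformers import GenerationConfig, LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_model = "decapoda-research/llama-7b-hf"   # placeholder LLaMA checkpoint
lora_weights = "tloen/alpaca-lora-7b"          # assumed adapter repo on the Hub

tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = LlamaForCausalLM.from_pretrained(
    base_model, load_in_8bit=True, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, lora_weights, torch_dtype=torch.float16)
model.eval()

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nTell me about alpacas.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        **inputs,
        generation_config=GenerationConfig(temperature=0.1, top_p=0.75, num_beams=4),
        max_new_tokens=128,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```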