@@ -1,13 +1,15 @@
## 🦙🌲🤏 Alpaca-LoRA: Low-Rank LLaMA Instruct-Tuning
-**The code in this repo is not yet fully tested. I'm still in the process of retraining the model with the outputs included, and I make no guarantees about the results of running `generate.py`.**
-
-This repository contains code for reproducing the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) results using [low-rank adaptations (LoRAs)](https://arxiv.org/pdf/2106.09685.pdf).
-The goal is to provide an open Instruct model of similar quality to `text-davinci-003` that can run on most consumer GPUs with 8-bit quantization.
-
-Users will need to be ready to fork Huggingface `transformers` to access Jason Phang's [LLaMA implementation](https://github.com/huggingface/transformers/pull/21955).
+This repository contains code for reproducing the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) results using [low-rank adaptation (LoRA)](https://arxiv.org/pdf/2106.09685.pdf).
+The fine-tuning runs within five hours on a consumer GPU,
+and the LoRA weights are made available on the Huggingface model hub.
+With Huggingface's out-of-the-box 8-bit quantization,
+we aim to provide an Instruct model of similar quality to `text-davinci-003` that can run [on a Raspberry Pi](https://twitter.com/miolini/status/1634982361757790209). (For research.)
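The heavy lifting here is done by Huggingface's bitsandbytes integration, but the basic idea behind 8-bit weight storage can be sketched in a few lines of numpy. This is an illustrative absmax (symmetric) int8 round-trip only, not the `LLM.int8()` algorithm the library actually uses:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Absmax int8 quantization: scale weights into the [-127, 127] range."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32; round-to-nearest bounds the
# per-weight reconstruction error by half the quantization step.
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```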
+
+Until Jason Phang's [LLaMA implementation](https://github.com/huggingface/transformers/pull/21955)
+is merged, users will need to replace their local Huggingface `transformers` as described below.
For fine-tuning LoRAs we use Huggingface's [PEFT](https://github.com/huggingface/peft).
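PEFT handles the adapter injection for real models; as a rough numpy sketch of the mechanism itself (sizes and scaling below are illustrative, not this repo's hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 16, 4, 8                   # hidden size, LoRA rank, scaling (illustrative)

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-initialized

def forward(x):
    # Base path plus low-rank update; only A and B would receive gradients.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d)
# With B zero-initialized, the adapted layer starts out identical to the base layer.
assert np.allclose(forward(x), W @ x)
```

Only the `r * d + d * r` adapter parameters are trained, which is what makes fine-tuning a 7B model feasible on a single consumer GPU.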
-Included also is code to download the LLaMA foundation model from the Huggingface model hub (for research).
+Included also is code to download the LLaMA foundation model from the Huggingface model hub. (For research.)
Once I've finished running the finetuning code myself, I'll put the LoRA on the Hub as well, and the code in `generate.py` should work as expected.
### Setup
@@ -36,7 +38,9 @@ PRs adapting this code to multi-GPU setups and larger models are always welcome.
### To do
-- [ ] Hyperparameter tuning
+- [ ] Merge LoRA weights into LLaMA weights to remove inference dependency on PEFT
+- [ ] Train/val/test split
+- [ ] Hyperparameter tuning code
- [ ] Documentation for notebook
- [ ] Support for `13b`, `30b`, `65b`
- [ ] Train a version that doesn't waste tokens on the prompt header
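On the first to-do item above: "merging" means folding the low-rank update into the base matrix, so inference needs neither PEFT nor any extra matmuls. A minimal numpy sketch of the algebra (illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 16, 4, 8            # hidden size, LoRA rank, scaling (illustrative)

W = rng.standard_normal((d, d))   # base weight
A = rng.standard_normal((r, d))   # trained LoRA down-projection
B = rng.standard_normal((d, r))   # trained LoRA up-projection

# Fold the adapter into the base matrix: W' = W + (alpha / r) * B @ A
W_merged = W + (alpha / r) * (B @ A)

x = rng.standard_normal(d)
# The merged layer matches base-plus-adapter exactly.
assert np.allclose(W_merged @ x, W @ x + (alpha / r) * (B @ (A @ x)))
```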