@@ -7,9 +7,9 @@ We aim to provide an Instruct model of similar quality to `text-davinci-003` tha
 but extensions to the `13b`, `30b`, and `65b` models should be feasible with simple changes to the code.
 In addition to the training code, which runs within five hours on a single RTX 4090,
-we publish the resulting LoRA weights on the Huggingface model hub,
-along with a script for downloading and inference on the foundation model and LoRA.
-For fine-tuning LoRAs we use Huggingface's [PEFT](https://github.com/huggingface/peft)
+we publish a script for downloading and inference on the foundation model and LoRA,
+as well as the resulting [LoRA weights themselves](https://huggingface.co/tloen/alpaca-lora-7b/tree/main).
+To fine-tune cheaply and efficiently, we use Huggingface's [PEFT](https://github.com/huggingface/peft)
 as well as Tim Dettmers' [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
 Without hyperparameter tuning or validation-based checkpointing, the LoRA model produces outputs comparable to the Stanford Alpaca model, though possibly with more minor mistakes. (Please see the outputs included below.) Further tuning might be able to achieve better performance; I invite interested users to give it a try and report their results.