Added Dockerfile and docker-compose.yml (#207)

* Added Dockerfile for inference

* Added instructions for Dockerfile

* Update README.md

* Update README.md

* Update README.md

* Pass env through Dockerfile

* Added docker compose setup and instructions

* Added more environment options

* Set a safer default mount point

* add docker-compose changes

* add to gitignore, update to new generate.py

* add docker ignore, simplify docker compose file

* add back missing requirements

* Adjustments to compose and generate.py, added Docker to README.md

* Linting adjust to Black

* Adjusting import linting

* Update README.md

* Update README.md

* Removed comment by original Dockerfile creator.

Comment not necessary.

* cleanup README

Co-authored-by: Francesco Saverio Zuppichini <[email protected]>

---------

Co-authored-by: Francesco Saverio Zuppichini <[email protected]>
Co-authored-by: Chris Alexiuk <[email protected]>
Co-authored-by: ElRoberto538 <>
Co-authored-by: Sam Sipe <[email protected]>
Co-authored-by: Eric J. Wang <[email protected]>
Chris Alexiuk 3 years ago
parent
commit
4367a43fcb
7 changed files with 104 additions and 23 deletions
  1. .dockerignore (+4, -0)
  2. .gitignore (+2, -1)
  3. Dockerfile (+18, -0)
  4. README.md (+47, -21)
  5. docker-compose.yml (+28, -0)
  6. generate.py (+3, -1)
  7. requirements.txt (+2, -0)

.dockerignore (+4, -0)

@@ -0,0 +1,4 @@
+.venv
+.github
+.vscode
+.docker-compose.yml

.gitignore (+2, -1)

@@ -11,4 +11,5 @@ wandb
 evaluate.py
 test_data.json
 todo.txt
-.vscode/
+.venv
+.vscode

Dockerfile (+18, -0)

@@ -0,0 +1,18 @@
+FROM nvidia/cuda:11.8.0-devel-ubuntu22.04
+
+ARG DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get update && apt-get install -y \
+    git \
+    curl \
+    software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa \
+    && apt install -y python3.10 \
+    && rm -rf /var/lib/apt/lists/*
+WORKDIR /workspace
+COPY requirements.txt requirements.txt
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10 \
+    && python3.10 -m pip install -r requirements.txt \
+    && python3.10 -m pip install numpy --pre torch --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu118
+COPY . .
+ENTRYPOINT [ "python3.10"]
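
Since the image declares `ENTRYPOINT [ "python3.10" ]`, any arguments passed to `docker run` go straight to the interpreter. A minimal smoke test, assuming the image is tagged `alpaca-lora` as in the README instructions below:

```bash
# Build the image from the repository root
docker build -t alpaca-lora .

# With the python3.10 entrypoint, this runs: python3.10 --version
docker run --rm alpaca-lora --version
```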

README.md (+47, -21)

@@ -15,25 +15,13 @@ as well as Tim Dettmers' [bitsandbytes](https://github.com/TimDettmers/bitsandby

 Without hyperparameter tuning, the LoRA model produces outputs comparable to the Stanford Alpaca model. (Please see the outputs included below.) Further tuning might be able to achieve better performance; I invite interested users to give it a try and report their results.

-## Setup
+### Local Setup

 1. Install dependencies

-    ```bash
-    pip install -r requirements.txt
-    ```
-
-1. Set environment variables, or modify the files referencing `BASE_MODEL`:
-
-    ```bash
-    # Files referencing `BASE_MODEL`
-    # export_hf_checkpoint.py
-    # export_state_dict_checkpoint.py
-
-    export BASE_MODEL=decapoda-research/llama-7b-hf
-    ```
-
-    Both `finetune.py` and `generate.py` use `--base_model` flag as shown further below.
+   ```bash
+   pip install -r requirements.txt
+   ```

 1. If bitsandbytes doesn't work, [install it from source.](https://github.com/TimDettmers/bitsandbytes/blob/main/compile_from_source.md) Windows users can follow [these instructions](https://github.com/tloen/alpaca-lora/issues/17).
@@ -94,6 +82,49 @@ They should help users
 who want to run inference in projects like [llama.cpp](https://github.com/ggerganov/llama.cpp)
 or [alpaca.cpp](https://github.com/antimatter15/alpaca.cpp).

+### Docker Setup & Inference
+
+1. Build the container image:
+
+```bash
+docker build -t alpaca-lora .
+```
+
+2. Run the container (you can also use `finetune.py` and all of its parameters as shown above for training):
+
+```bash
+docker run --gpus=all --shm-size 64g -p 7860:7860 -v ${HOME}/.cache:/root/.cache --rm alpaca-lora generate.py \
+    --load_8bit \
+    --base_model 'decapoda-research/llama-7b-hf' \
+    --lora_weights 'tloen/alpaca-lora-7b'
+```
+
+3. Open `http://localhost:7860` in the browser
+
+### Docker Compose Setup & Inference
+
+1. (optional) Change the desired model and weights in the `command` section of `docker-compose.yml`
+
+2. Build and run the container:
+
+```bash
+docker-compose up -d --build
+```
+
+3. Open `http://localhost:7860` in the browser
+
+4. See logs:
+
+```bash
+docker-compose logs -f
+```
+
+5. Clean everything up:
+
+```bash
+docker-compose down --volumes --rmi all
+```
+
 ### Notes

 - We can likely improve our model performance significantly if we had a better dataset. Consider supporting the [LAION Open Assistant](https://open-assistant.io/) effort to produce a high-quality dataset for supervised fine-tuning (or bugging them to release their data).
@@ -110,9 +141,7 @@ or [alpaca.cpp](https://github.com/antimatter15/alpaca.cpp).
   - 7B:
     - <https://huggingface.co/tloen/alpaca-lora-7b>
     - <https://huggingface.co/samwit/alpaca7B-lora>
-    - 🤖 <https://huggingface.co/nomic-ai/gpt4all-lora>
     - 🇧🇷 <https://huggingface.co/22h/cabrita-lora-v0-1>
-    - 🇨🇳 <https://huggingface.co/ziqingyang/chinese-alpaca-lora-7b>
     - 🇨🇳 <https://huggingface.co/qychen/luotuo-lora-7b-0.1>
     - 🇯🇵 <https://huggingface.co/kunishou/Japanese-Alapaca-LoRA-7b-v0>
     - 🇫🇷 <https://huggingface.co/bofenghuang/vigogne-lora-7b>
@@ -131,9 +160,6 @@ or [alpaca.cpp](https://github.com/antimatter15/alpaca.cpp).
     - <https://huggingface.co/baseten/alpaca-30b>
     - <https://huggingface.co/chansung/alpaca-lora-30b>
     - 🇯🇵 <https://huggingface.co/kunishou/Japanese-Alapaca-LoRA-30b-v0>
-    - 🇰🇷 <https://huggingface.co/beomi/KoAlpaca-30B-LoRA>
-  - 65B:
-    - 🇰🇷 <https://huggingface.co/beomi/KoAlpaca-65B-LoRA>
 - [alpaca-native](https://huggingface.co/chavinlo/alpaca-native), a replication using the original Alpaca code

 ### Example outputs
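
Step 2 of the Docker instructions above notes that `finetune.py` can be run through the same image. A hypothetical sketch of that invocation, assuming the `--base_model`, `--data_path`, and `--output_dir` flags from the training section of the README; the data file and output directory here are illustrative:

```bash
# Mount an output directory so the trained LoRA weights survive the container;
# the image's WORKDIR is /workspace, so ./lora-alpaca resolves there
docker run --gpus=all --shm-size 64g \
    -v ${HOME}/.cache:/root/.cache \
    -v $(pwd)/lora-alpaca:/workspace/lora-alpaca \
    --rm alpaca-lora finetune.py \
    --base_model 'decapoda-research/llama-7b-hf' \
    --data_path 'alpaca_data_cleaned.json' \
    --output_dir './lora-alpaca'
```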

docker-compose.yml (+28, -0)

@@ -0,0 +1,28 @@
+version: '3'
+
+services:
+  alpaca-lora:
+    build:
+      context: ./
+      dockerfile: Dockerfile
+      args:
+        BUILDKIT_INLINE_CACHE: "0"
+    image: alpaca-lora
+    shm_size: '64gb'
+    command: generate.py --load_8bit --base_model $BASE_MODEL --lora_weights 'tloen/alpaca-lora-7b'
+    restart: unless-stopped
+    volumes:
+      - alpaca-lora:/root/.cache # Location downloaded weights will be stored
+    ports:
+      - 7860:7860
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [ gpu ]
+
+volumes:
+  alpaca-lora:
+    name: alpaca-lora
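
The `command` above references `$BASE_MODEL`, which `docker-compose` interpolates from the host shell environment (or an `.env` file) at startup, while the LoRA weights are fixed in the command itself. One way to supply it, with the model name shown as an example:

```bash
# BASE_MODEL is substituted into the service command before the container starts
BASE_MODEL='decapoda-research/llama-7b-hf' docker-compose up -d --build
```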

generate.py (+3, -1)

@@ -1,3 +1,4 @@
+import os
 import sys

 import fire
@@ -29,6 +30,7 @@ def main(
     server_name: str = "127.0.0.1",  # Allows to listen on all interfaces by providing '0.0.0.0'
     share_gradio: bool = False,
 ):
+    base_model = base_model or os.environ.get("BASE_MODEL", "")
     assert (
         base_model
     ), "Please specify a --base_model, e.g. --base_model='decapoda-research/llama-7b-hf'"
@@ -146,7 +148,7 @@ def main(
         ],
         title="🦙🌲 Alpaca-LoRA",
         description="Alpaca-LoRA is a 7B-parameter LLaMA model finetuned to follow instructions. It is trained on the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) dataset and makes use of the Huggingface LLaMA implementation. For more information, please visit [the project's website](https://github.com/tloen/alpaca-lora).",  # noqa: E501
-    ).launch(server_name=server_name, share=share_gradio)
+    ).launch(server_name="0.0.0.0", share=share_gradio)
     # Old testing code follows.

     """

requirements.txt (+2, -0)

@@ -1,5 +1,6 @@
 accelerate
 appdirs
+loralib
 bitsandbytes
 black
 black[jupyter]
@@ -7,5 +8,6 @@ datasets
 fire
 git+https://github.com/huggingface/peft.git
 git+https://github.com/huggingface/transformers.git
+sentencepiece
 gradio
 sentencepiece