Deploying Language Models With Gradio On Hugging Face
See live deployed app and source code here
For local development, create the following Dockerfile. It differs from production Dockerfile in how secrets are loaded and the use of
CMD ["gradio", "app.py"]which runs (and reloads) source files every time a change is noticed.
docker-compose will launch the development Dockerfile using command
export HF_TOKEN=paste_HF_token && docker-compose -f docker-compose.yml up gradiohfwhere HF_TOKEN is an optional personal token provided by Hugging Face to ensure that license restrictions are being followed for certain models (such as Llama 2).
Develop your Gradio app.py. This deployed example represents the absolute smallest version that selects a language model based on environmenal variable os.environ.get("MODEL"). The selections includes Llama 2 which will require a paid Spaces plan to run on Hugging Face (with no code changes!). The live example runs a small toy model google/flan-t5-small that easily runs on the free tier.
View your Gradio app running locally in browser: