Local GenAI with Raycast, ollama, and PyTorch

March 14, 2024

I wanted to experiment with current generative “Artificial Intelligence” (AI) trends, understand limitations and benefits, as well as performance and quality aspects, and see if I could integrate large language models and other generative “AI” use cases into my workflow or use them for inspiration. Since I’d like my data to stay on my machine, using online services was out of the question. Also, being familiar with the performance of my local machine, and using it to put the efficiency of the technology (or lack thereof) into context, makes cost and complexity very tangible.

I am going to outline my setup for various text2text, img2text, and text2img use cases, all of which run comfortably on a 2020 M1 MacBook Air with 16GB of unified memory. More RAM and more GPU cores will make the experience faster and better, though.

Installation

I have been using Homebrew as a package manager on macOS for years, so I also use it to bootstrap the software for this setup wherever possible. You can get it from https://brew.sh/ by following the installation instructions there.

After installing Homebrew, use the following commands in the Terminal app. They install ollama, which runs large language models locally, and Raycast, a launcher that serves as the interface to these models and lets you feed them the clipboard, text selections, or files. To provide a locally hosted ChatGPT-like interface in the browser through PrivateGPT later on, we also install make, which PrivateGPT needs to run and which might not be installed on your system.

brew install ollama
rehash
brew services start ollama
brew install --cask raycast
brew install make

As a next step you can start downloading models for the text2text and img2text use cases. Good models to start with are mistral, llama2, or gemma for text2text, and llava for img2text. We start with mistral because PrivateGPT, which we set up later, uses it by default. We also download nomic-embed-text as an embedding model, which will come in handy later for the ChatGPT-like functionality.

ollama pull mistral
ollama pull llava
ollama pull nomic-embed-text
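
Once a model has finished downloading, you can also talk to it without Raycast. The following is a minimal sketch, assuming ollama is running as a service on its default port 11434 and that mistral has been pulled. The function names are my own and purely illustrative; only the /api/generate endpoint and its model, prompt, and stream fields come from ollama itself.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ollama's default local endpoint


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request asking ollama for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt to the local ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as response:
        return json.loads(response.read())["response"]
```

With the service running, generate("mistral", "Why is the sky blue?") should return the answer as a string. Setting stream to false asks for one JSON object instead of a stream of partial responses, which keeps the sketch short.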

For the specific case of explaining code step by step, you can install codellama. If you ever want to update all your downloaded models, you can use the following command until ollama provides a built-in way to do that.

ollama list | tail -n +2 | awk '{print $1}' | while read -r model; do ollama pull $model; done

I have added that command to my .zshrc file as the alias ollamaup, so I can call the short alias instead of typing or copying the rather complex command over and over.

echo "alias ollamaup=\"ollama list | tail -n +2 | awk '{print \\\$1}' | while read -r model; do ollama pull \\\$model; done\"" >> ~/.zshrc
source ~/.zshrc
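
If you prefer Python over a shell one-liner, the same update loop can be sketched as follows. This is only an illustration under the assumption that ollama list prints one header row followed by one model per line with the name in the first column, which is what the tail and awk pipeline relies on as well. The function names are hypothetical.

```python
import subprocess


def model_names(listing: str) -> list[str]:
    """Return the first column of every non-empty line after the header row."""
    lines = listing.splitlines()[1:]  # skip the header, like `tail -n +2`
    return [line.split()[0] for line in lines if line.strip()]


def update_all_models() -> None:
    """Run `ollama list` and pull every listed model again to update it."""
    listing = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    ).stdout
    for name in model_names(listing):
        subprocess.run(["ollama", "pull", name], check=True)
```

Keeping the parsing in its own function makes it easy to check against a pasted listing before letting it trigger any downloads.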

Now start Raycast like any other app on macOS and set it up to your liking. The tutorial will get you started. The large language models are likely still downloading, so in the meantime you can install the Raycast extension that connects Raycast with ollama. Go to the ollama extension in the Raycast Store and click Install Extension.

To let Raycast use text selected in any window as input, and not just text from your clipboard, you need to turn on Accessibility for Raycast in the Privacy & Security section of your System Settings.

If you want to generate images, you need to install PyTorch to run the text2img model PixArt-alpha. If you already have some of the prerequisites installed, feel free to skip the corresponding commands. If you have not worked with Python 3 or PyTorch before, execute all of the following steps.

brew install pyenv
rehash
pyenv install 3.11
pyenv global 3.11

Then configure pyenv so that it is available from the command line. Execute the following commands if you use zsh as your shell, which is the default on macOS in 2024. If you use a different shell or OS, refer to the pyenv documentation.

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(pyenv init -)"' >> ~/.zshrc
source ~/.zshrc

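Create a Python virtual environment for the image generation and the additional integration with Raycast, and activate it. The commands below create the environment in the current directory; the example path used later in this post assumes you run them from your home directory.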

python -m venv .venv-raycast
source .venv-raycast/bin/activate
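
To confirm that the activation worked, you can ask the interpreter itself. The following is a small sketch, not part of the original setup: inside a virtual environment sys.prefix points at the environment while sys.base_prefix still points at the underlying Python installation. The printed interpreter path is also the one the Raycast script will need later.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter path:", sys.executable)
```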

Install PyTorch and the required dependencies for the respective text2img model via pip.

pip install torch
pip install transformers diffusers accelerate sentencepiece beautifulsoup4 ftfy
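
Before generating images, it can be worth checking whether PyTorch sees the GPU of your Mac. This is a minimal sketch, assuming a PyTorch version that ships the Metal (MPS) backend for Apple Silicon; the helper name pick_device is hypothetical. It falls back gracefully when torch is not installed in the active environment.

```python
def pick_device() -> str:
    """Return the best available PyTorch device name, or 'unavailable' without torch."""
    try:
        import torch
    except ImportError:
        return "unavailable"
    if torch.backends.mps.is_available():  # Apple Silicon GPU via Metal
        return "mps"
    if torch.cuda.is_available():  # NVIDIA GPU, not relevant on a MacBook
        return "cuda"
    return "cpu"


print("PyTorch device:", pick_device())
```

On an M1 machine with a working installation this should report mps. If it reports cpu, image generation will still work but is likely to be considerably slower.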

Download the generate-image.py Python script and copy it to a convenient location for use with Raycast (e.g. ~/Documents/Raycast). You will need to adapt the first line of the script (the shebang) so that it points to the Python binary in the virtual environment you set up earlier (e.g. /Users/yourname/.venv-raycast/bin/python). Then add this directory as a script directory, as explained in the Raycast documentation for adding script commands:

Open the Extensions tab in the Raycast preferences
Click the plus button
Click Add Script Directory
Select directories containing your Script Commands

The instructions and the Python script linked above were adapted from Félix Sanz’ amazing blog, where he explains how you can run the PixArt-alpha model with less than 8GB of VRAM.
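
To illustrate what Raycast expects from a script command, here is a hypothetical skeleton. It is not the generate-image.py script mentioned above and it does not call any model. It only shows the shebang pointing at the virtual environment, the metadata comments Raycast reads (schemaVersion, title, mode, and one text argument), and how the prompt arrives as the first command-line argument. The title, the output directory, and the helper names are assumptions made for this sketch.

```python
#!/Users/yourname/.venv-raycast/bin/python
# Hypothetical skeleton of a Raycast script command; adapt the shebang path above.

# Required parameters:
# @raycast.schemaVersion 1
# @raycast.title Generate Image (sketch)
# @raycast.mode fullOutput

# Optional parameters:
# @raycast.argument1 { "type": "text", "placeholder": "Prompt" }

import re
import sys
from pathlib import Path


def output_path(prompt: str, directory: Path) -> Path:
    """Derive a file name from the prompt: lowercase, dashes, at most 40 characters."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower())[:40].strip("-")
    return directory / f"{slug or 'image'}.png"


def main(argv: list[str]) -> None:
    """Read the prompt Raycast passes as the first argument and report the target file."""
    if len(argv) < 2 or not argv[1].strip():
        print("No prompt given; pass one as the first argument.")
        return
    target = output_path(argv[1], Path.home() / "Pictures")
    # The actual text2img model call would go here; this sketch only prints the plan.
    print(f"Would generate an image for {argv[1]!r} at {target}")


if __name__ == "__main__":
    main(sys.argv)
```

The fullOutput mode makes Raycast show everything the script prints, which is convenient for long-running jobs such as image generation.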

Use Cases

In the following videos I demonstrate a few examples of how to use Raycast to interact with ollama and PyTorch.

You can use the command “Explain this in simple terms” to do exactly that for any text you select. In the example below I chose Anil Dash’s blog post Today’s AI is unreasonable.

You can also “Chat with Ollama” to get inspiration for whatever you are trying to write, especially if you are struggling with getting started. In the following example I ask ollama to please generate a strong Dungeons & Dragons campaign hook, to see if it can help me with a campaign I might want to run with my tabletop roleplaying group.

From this generated text I can pick something and create an image based on that input. On a 2020 M1 MacBook Air with 16GB of unified memory this takes about 7 minutes in total, so I sped up the lengthy generation process by a factor of 20 in the video.

This is the generated image, which can be clearly identified as algorithmically created, but still passes my subjective “at first glance it looks ok” test.

Using llava I can let the model describe the contents of the image I generated earlier.
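
Outside of Raycast, the same image description can be requested from ollama directly. The following is a rough sketch, assuming ollama listens on its default port 11434 and llava has been pulled. ollama's /api/generate endpoint accepts images as base64 strings in an images list; the function names and the default prompt are my own.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ollama's default local endpoint


def build_llava_payload(image_bytes: bytes, prompt: str = "Describe this image.") -> dict:
    """Build the JSON body: the image travels as a base64 string in `images`."""
    return {
        "model": "llava",
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(image_bytes: bytes) -> str:
    """POST the payload to the local ollama server and return its description."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_llava_payload(image_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

To describe a generated file, read it with pathlib.Path("image.png").read_bytes() and pass the bytes to describe_image; the file name here is just a placeholder.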

Full PrivateGPT

If you have done all of the above, you already have most of the prerequisites for running PrivateGPT. You still need to install poetry to manage dependencies (for example with brew install poetry), then clone the PrivateGPT repository and change into its directory.

git clone https://github.com/zylon-ai/private-gpt
cd private-gpt

Now install the dependencies with poetry and run PrivateGPT.

poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
PGPT_PROFILES=ollama make run

You should be able to access the interface with your web browser at http://localhost:8001/
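
If the page does not load, it helps to find out whether anything is listening on that port at all. This is a small sketch that only tests the TCP connection and makes no assumptions about PrivateGPT's API; the helper name is hypothetical.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_port_open("localhost", 8001) should return True once make run has finished starting the server. The same check with port 11434 tells you whether the ollama service is up.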