With llama.cpp, Georgi Gerganov has greatly simplified the process of running LLMs locally. There are also a number of other tools and frameworks that make this possible, such as gpt4all. The barrier to entry has been lowered significantly, much as it was for speech recognition with whisper.cpp, also from Georgi.
If you want to hear about it in his own words, have a listen to this Changelog episode.
Now let’s get started with the guide to trying out an LLM locally:
First, you need a suitable model, ideally in ggml format. ggml is a C library that lets you run LLMs on the CPU alone. On Intel and AMD processors, however, this is relatively slow. Too slow for my taste, but it can be done with some patience. If I have understood correctly, it runs considerably faster on M1 Macs because the CPU's AI acceleration can be used there.
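If you are starting from the original weights, llama.cpp ships with scripts to convert and quantize them. A rough sketch, assuming the directory layout from the llama.cpp README (script names and arguments have changed between versions, so check the current docs):

```sh
# Convert the original PyTorch weights to ggml (f16), then quantize to 4 bits.
# Paths are examples; adjust them to where your weights actually live.
python3 convert-pth-to-ggml.py models/7B/ 1
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
```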
Anyway, back to the model. I used llama.cpp to give gpt4-x-alpaca-13b a try. After downloading the model, a command like the following can be entered in the checked-out directory; the path to the model must be adjusted as needed. I basically snagged the parameters from somewhere without giving them much thought:
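Roughly like this, for illustration; the model file name and the sampling parameters below are placeholders rather than the exact values I used, and the flags of the `main` example change between llama.cpp versions:

```sh
# Interactive mode with a reverse prompt, so generation stops and hands control
# back after each answer. Model path and sampling values are placeholders.
./main -m ./models/gpt4-x-alpaca-13b-ggml-q4_0.bin \
  --color -i -r "### Human:" \
  -n 256 --temp 0.7 --repeat_penalty 1.2 \
  -p "### Human: Tell me something about llamas. ### Assistant:"
```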
A test output looks like this:
This is followed by a bunch of empty lines. Whatever that means. :)
But you can cancel with Ctrl-C and enter a new prompt:
gpt4all is based on llama.cpp. They created a fork and have been developing it from there, which means the two programs are no longer compatible, at least at the moment.
gpt4all also links to models that come in a format similar to ggml but are unfortunately incompatible with it. My first attempt was with a patched version of llama.cpp, which also runs in the terminal. I tried it out like so:
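Something along these lines; the binary name and flags depend on the platform and the state of the fork, so treat this as a sketch:

```sh
# Run the gpt4all terminal chat with the downloaded model placed next to the binary.
# The binary name and the -m flag are assumptions based on its llama.cpp heritage.
./chat -m ./gpt4all-lora-quantized.bin
```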
The file gpt4all-lora-quantized.bin can be found on this page or obtained directly from here.
This is a model with 7 billion parameters. That makes it significantly smaller than the one above, and the difference is easy to see: it runs much faster, but the quality is also considerably worse.
However, gpt4all also offers an installer with another model, presumably trained on different sources. This version also comes with a graphical interface.
The text-generation-webui is supposed to be a hassle-free tool, similar to AUTOMATIC1111 for Stable Diffusion. Unfortunately, I can't run it (yet), since I don't have an Nvidia card, only an AMD one, which doesn't support CUDA.
Unfortunately, the license and rights situation is generally a bit hazy, as far as I can tell. There are tons of models lying around on Hugging Face, some of which claim very permissive licenses. I have a hard time believing that it's all on the up and up.
I was pleased to see that a developer of AI platforms has trained a GPT LLM himself, which can be found here. However, it still lacks the "Alpaca" / ChatGPT features; in other words, it is "only" a text completer.
But I’d bet there will be more in store here in the future.