Running an AI Chatbot on Your Own PC


With llama.cpp, Georgi Gerganov has made it much easier to run LLMs locally. There are a number of other tools and frameworks that can do this as well, for example gpt4all. The barrier to entry is now considerably lower, much like with whisper.cpp, which is also Georgi's work. If you would like to hear him speak for himself, listen to this Changelog episode.

llama.cpp

Now for the quick guide to trying this out locally:

git clone git@github.com:ggerganov/llama.cpp.git
cd llama.cpp
make

Next you need a suitable model, ideally one that is already in ggml format. ggml is a tensor library written in C that lets you run LLMs using only the CPU. On Intel and AMD CPUs this is relatively slow, a bit too slow for my taste, but with some patience it is still usable. If I understand correctly, llama.cpp runs considerably faster on Macs with an M1 or newer chip, because it can use the AI acceleration built into those chips.

Back to the model. With llama.cpp I tried gpt4-x-alpaca-13b. Once it is downloaded, the following line can be run from within the llama.cpp directory. You still have to adjust the path to the model. I took the parameters over more or less uncritically from somewhere:

./main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.2 --instruct -m /path/to/model/file/ggml-model-q4_1.bin
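
The flags in this command line are easier to follow when listed one per line. The sketch below only assembles and prints the same invocation; it does not run llama.cpp. The function name build_llama_command is made up for illustration, and the comments reflect a reading of the flags that matches the values echoed in the log further down (temp, n_ctx, n_predict, repeat_penalty, reverse prompt), so treat them as a guide rather than authoritative documentation.

```python
import shlex

# Placeholder path from the post; point this at your own model file.
MODEL_PATH = "/path/to/model/file/ggml-model-q4_1.bin"


def build_llama_command(model_path):
    """Assemble the ./main invocation from the post as an argument list."""
    return [
        "./main",
        "-i",                       # interactive mode
        "--interactive-first",      # wait for user input before generating
        "-r", "### Human:",         # reverse prompt: return control when this text appears
        "--temp", "0",              # sampling temperature
        "-c", "2048",               # context size in tokens (n_ctx in the log)
        "-n", "-1",                 # tokens to predict, -1 = no limit (n_predict in the log)
        "--ignore-eos",             # keep generating past the end-of-sequence token
        "--repeat_penalty", "1.2",  # penalty for repeated tokens
        "--instruct",               # instruction mode (adds the '### Instruction:' prompt)
        "-m", model_path,           # path to the model file
    ]


if __name__ == "__main__":
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(build_llama_command(MODEL_PATH)))
```

Keeping the arguments in a list like this makes it easy to change a single value, for example a smaller context size, without retyping the whole line.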

A test run looks like this:

main: seed = 1681807028
llama.cpp: loading model from /media/davidk/big/ai/gpt-x-alpaca-13b-ggml/ggml-model-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 3 (mostly Q4_1)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  73.73 KB
llama_model_load_internal: mem required  = 11359.03 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size  = 1600.00 MB

system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
main: interactive mode on.
Reverse prompt: '### Human:'
Reverse prompt: '### Instruction:

'
sampling: temp = 0.000000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.200000
generate: n_ctx = 2048, n_batch = 8, n_predict = -1, n_keep = 2


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.

 
> I need a basic HTML-page with menu, footer, etc. Please generate HTML and CSS for me.
<!DOCTYPE html>
<html>
<head>
    <title>Page Title</title>
    <link rel="stylesheet" href="style.css">
</head>
<body>
    <nav>
        <ul>
            <li>Home</li>
            <li>About Us</li>
            <li>Products</li>
            <li>Contact Us</li>
        </ul>
    </nav>
    <div>
        <h1>Welcome to Our Site!</h1>
        <p>Learn more about our products and services.</p>
    </div>
    <footer>
            <p>Copyright &copy; 2020. All Rights Reserved.</p>
    </footer>
</body>
</html>
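
As an aside on the load log at the top of this output: the "kv self size = 1600.00 MB" line can be reproduced from the other logged values. The following is a rough sketch, assuming that keys and values are cached as 16-bit floats (an assumption, the log does not say so) and that "MB" in the log means 1024 * 1024 bytes. The function name kv_cache_mib is made up for illustration.

```python
# Values copied from the load log above.
N_CTX = 2048    # n_ctx: context length in tokens
N_LAYER = 40    # n_layer: number of transformer layers
N_EMBD = 5120   # n_embd: embedding width

# Assumption: each cached key/value element is a 16-bit float (2 bytes).
BYTES_PER_ELEMENT = 2


def kv_cache_mib(n_ctx, n_layer, n_embd, bytes_per_element=BYTES_PER_ELEMENT):
    """Key/value cache size in MiB: one key and one value vector per layer and position."""
    elements = 2 * n_layer * n_ctx * n_embd  # factor 2 = keys plus values
    return elements * bytes_per_element / (1024 * 1024)


if __name__ == "__main__":
    print(f"{kv_cache_mib(N_CTX, N_LAYER, N_EMBD):.2f} MiB")  # prints 1600.00 MiB
```

Under these assumptions the cache grows linearly with the context size, so halving -c to 1024 would halve this part of the memory footprint.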

After the HTML, the model only emits blank lines. Whatever that means :) You can abort with Ctrl+C, though, and submit a new request:

> Please generate a corresponding CSS-File for this page!
body {
    background-color: white;
    font-family: 'Open Sans', sans-serif;
    font-size: 16px;
    color: #3333;
    line-height: 1.5;
    padding: 32px;
}

nav {
    float: left;
    width: 100%;
    list-style: none;
    padding: 16px;
}

.footer {
    padding: 16px;
    color: #3333;
    font-size: 12px;
}

.copyright {
    float: right;
}

.copyright a {
    color: #3333;
    text-decoration: none;
    font-weight: bold;
    font-size: 12px;
}
....

A model with 30 billion parameters produces:

<!DOCTYPE html>
<html>
<head>
    <title>My Page</title>
    <link rel="stylesheet" href="style.css">
</head>
<body>
    <header>
        <h1>My Page</h1>
    </header>
    <nav>
        <ul>
            <li><a href="#">Home</a></li>
            <li><a href="#">About</a></li>
            <li><a href="#">Contact</a></li>
        </ul>
    </nav>
    <main>
        <article>
            <h2>Main Content</h2>
            <p>This is the main content of the page.</p>
        </article>
    </main>
    <footer>
        <p>Copyright &copy; 2020</p>
    </footer>
</body>
</html>

and

body {
    font-family: 'Open Sans', sans-serif;
    font-size: 16px;
    color: #333;
    background-color: #fff;
    margin: 0;
    padding: 0;
}

header {
    background-color: #333;
    padding: 10px;
    color: #fff;
}

nav {
    float: left;
    width: 100%;
    padding: 10px;
    background-color: #333;
    color: #fff;
}

nav ul {
    list-style-type: none;
    margin: 0;
    padding: 0;
}

nav ul li {
    display: inline-block;
    margin-right: 10px;
}

nav ul li a {
    color: #fff;
    text-decoration: none;
}

main {
    padding: 10px;
    background-color: #fff;
}

article {
    padding: 10px;
    margin: 0;
}

article h2 {
    font-size: 24px;
    font-weight: bold;
    margin: 0;
}

article p {
    margin: 0;
}

footer {
    background-color: #333;
    padding: 10px;
    color: #fff;
}

footer p {
    margin: 0;
}

That already looks fairly convincing, but at around 10 minutes it also took a really long time.

gpt4all

gpt4all builds on llama.cpp. The developers created a fork and continue development from there. In the process, however, they also changed the file format for the models, so the two programs are no longer compatible, at least for now.
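
When it is unclear which container format a downloaded model file uses, the first bytes of the file give a hint. The sketch below reads the leading magic number and, where present, the version field. The magic values are the ones found in the llama.cpp sources from around the time of this post; they are an assumption here and should be checked against your own checkout. The function name detect_model_format is made up for illustration, and nothing is claimed about which of these formats the gpt4all fork expects.

```python
import struct

# Assumption: magic numbers as defined in llama.cpp around spring 2023,
# stored as a little-endian 32-bit integer at the start of the file.
MAGIC_GGML = 0x67676D6C  # 'ggml', original format without a version field
MAGIC_GGMF = 0x67676D66  # 'ggmf', followed by a 32-bit version
MAGIC_GGJT = 0x67676A74  # 'ggjt', followed by a 32-bit version

KNOWN_MAGICS = {
    MAGIC_GGML: "ggml (unversioned)",
    MAGIC_GGMF: "ggmf",
    MAGIC_GGJT: "ggjt",
}


def detect_model_format(path):
    """Return a short description of the model file format based on its header."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 4:
        return "unknown (file too short)"
    (magic,) = struct.unpack("<I", header[:4])
    name = KNOWN_MAGICS.get(magic)
    if name is None:
        return f"unknown (magic 0x{magic:08x})"
    # Only the versioned formats carry a version number after the magic.
    if magic != MAGIC_GGML and len(header) == 8:
        (version,) = struct.unpack("<I", header[4:8])
        return f"{name} v{version}"
    return name
```

For the 13B model used earlier, this would be expected to report "ggjt v1", matching the "format = ggjt v1 (latest)" line in the load log.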

Suitable models are linked on the GitHub page. The first implementation I tried still runs in the terminal. This is how I tried it:

git clone git@github.com:nomic-ai/gpt4all.git
cd gpt4all/chat
./gpt4all-lora-quantized-linux-x86 -m /path/to/model/gpt4all-lora-quantized.bin

The gpt4all-lora-quantized.bin file is linked from that page. It is a model with 6B parameters, so it is considerably smaller than the one above, and it shows: the speed is much better, but the quality is much worse.

> I need a basic HTML-page with menu, footer, etc. Please generate HTML and CSS for me.
Here's your requested page! The code is below in plain text format so you can easily copy it into an editor or CMS of your choice to customize as needed.
> Please output the code!
<html><head>
  <title></title>
</head>
<body>
	<!-- Menu -->
    <!-- Footer -->
</body>
</html>.

This response is generated by a simple Python script that reads in an HTML file and generates the corresponding plain text code. The resulting output can then be copied into any editor or CMS for further customization as needed!

gpt4all also offers an installer with a different model, which was presumably trained from other sources. That version comes with a graphical interface as well.

The request from above turned out quite well here:

> I need a basic HTML-page with menu, footer, etc. Please generate HTML and CSS for me.

Sure, I can generate an HTML-page with menu, footer, etc. Here's the code:
<!DOCTYPE html>
<html>
<head>
  <title>My HTML Page</title>
  <style>
    /* Style for menu */
    #menu {
      position: absolute;
      top: 0;
      left: 0;
      width: 100%;
      height: 100%;
      background-color: #333;
      color: #fff;
    }
    #menu ul {
      list-style: none;
      margin: 0;
      padding: 0;
    }
    #menu ul li {
      float: left;
      padding: 10px;
    }
    #menu ul li a {
      display: block;
      color: #fff;
      text-decoration: none;
      font-size: 20px;
      font-family: Arial, sans-serif;
    }
    #menu ul li a:hover {
      background-color: #555;
    }
    /* Style for footer */
    #footer {
      position: absolute;
      bottom: 0;
      left: 0;
      width: 100%;
      height: 50px;
      background-color: #555;
      color: #fff;
    }
    #footer p {
      margin: 0;
      padding: 10px;
    }
  </style>
</head>
<body>
  <div id="menu">
    <ul>
      <li><a href="#">Home</a></li>
      <li><a href="#">About</a></li>
      <li><a href="#">Contact</a></li>
    </ul>
  </div>
  <div id="footer">
    <p>&copy; My Website</p>
  </div>
</body>
</html>

Style is, of course, debatable.

dalai

Here is one more frontend, this one from the JavaScript universe. I have not tried it yet, though.

Oobabooga

text-generation-webui aims to be an all-in-one tool, similar to what AUTOMATIC1111 is for Stable Diffusion. Unfortunately it does not run for me (yet), because I have an AMD card rather than an NVIDIA one, and it therefore does not support CUDA.

Licenses / Rights

Unfortunately, the overall license and rights situation is somewhat unclear to me. Hugging Face hosts a great many different models, some of which are listed under very permissive licenses. I doubt that all of that is really in order.

It is encouraging that a vendor of AI platforms has trained a GPT LLM of its own and made it available. For now it still lacks the "Alpaca" / ChatGPT qualities, so for the time being it is "only" a text completer. My guess is that more is coming, though.
