This is actually not that hard. You could use llama.cpp.
Running 14.0-RELEASE-p6 on a Pi 4:
- install gmake
- git clone https://github.com/ggerganov/llama.cpp
- cd llama.cpp; gmake # use -j <number of cores>
- get a model from Hugging Face whose RAM requirements match your machine; I used phi-2.Q4_K_M (see the fetch sketch after this list)
- place the model file into the models/ subdirectory of llama.cpp
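For that download step, something like this should work from the llama.cpp directory; the TheBloke/phi-2-GGUF repository and file name are assumptions here, so point fetch at whatever model you actually pick:
Code:
# Grab a quantized phi-2 build straight into models/.
# URL is an assumption: check the model's page on huggingface.co first.
fetch -o models/phi-2.Q4_K_M.gguf \
    https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf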
Use this shell script to launch it:
Bash:
#!/usr/local/bin/bash
# Join all script arguments into one instruction prompt
# ($* is better-defined than $@ inside a quoted assignment).
PROMPT="Instruct: $*\nOutput:\n"
# -e expands the \n escapes in the prompt; -n -1 keeps generating until end-of-text.
./main -m models/phi-2.Q4_K_M.gguf --color --temp 0.7 --repeat_penalty 1.1 -n -1 -p "$PROMPT" -e
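Save it as run-phi2.sh inside the llama.cpp directory (the name the example below assumes) and make it executable:
Code:
chmod +x run-phi2.sh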
Example:
Code:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp/
doas pkg install gmake
gmake -j4
mv ~/phi-2.Q4_K_M.gguf models/
./run-phi2.sh "Tell me something about FreeBSD"
It's not very fast here, but it works:
Code:
... initialization output omitted ...
Instruct: Tell me something about FreeBSD
Output:
- FreeBSD is an open source, distributed operating system for Unix-like devices.
- It was created in 1995 and is known for its stability, security, and scalability.
- It is used in a variety of settings, from small enterprises to large organizations.
- It has a number of different distributions, each tailored for different tasks and needs.
- It allows for the customization of the operating system, allowing users to modify and improve it.
- It features a strong password policy and advanced security measures.
<|endoftext|> [end of text]
llama_print_timings: load time = 1187.23 ms
llama_print_timings: sample time = 121.36 ms / 108 runs ( 1.12 ms per token, 889.94 tokens per second)
llama_print_timings: prompt eval time = 3147.98 ms / 11 tokens ( 286.18 ms per token, 3.49 tokens per second)
llama_print_timings: eval time = 54504.98 ms / 107 runs ( 509.39 ms per token, 1.96 tokens per second)
llama_print_timings: total time = 57837.63 ms / 118 tokens
Log end

Beautiful, and there actually already is a port/package for that, so no need to compile it yourself.
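If you go the package route, something along these lines should do it; the package name llama-cpp (from the misc/llama-cpp port) is an assumption here, so verify it first:
Code:
# Hypothetical: confirm the package name, then install the prebuilt binary.
pkg search llama
doas pkg install llama-cpp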
Aw, such a shame that it doesn't work on FreeBSD yet, but it is understandable, as FreeBSD focuses on other aspects of computing. I guess Linux it is for now.

Yes, it's possible on FreeBSD with CUDA using a specific Nvidia driver and select generations of their GPUs (for reasons of firmware and hardware design), but it's not a simple 1-2-3 set of commands, at least not as of right now.
Another GPU question, just in case someone here knows the answer: some graphics cards are labeled "LHR" (low hash rate), which I understand makes them unsuitable for crypto mining. Does the "LHR" limiter make them unsuitable for running LLMs too?

No, you can use an LHR card without problems for AI/ML; the limiter targets crypto-mining workload patterns, not general GPU compute.
As time goes on, I am getting more interested in running a local LLM and training it to help me write sh scripts for my own FreeBSD usage. I would really appreciate it if you could recommend some links or resources you found most helpful on how to set up an LLM and make it "focus" only on code, rather than general chat or image generation.
Can't you download pre-trained LLMs?

Yes, you can.
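For example, with ollama (which shows up packaged for FreeBSD later in this thread), grabbing a pre-trained model is a one-liner; the model tag here is just an illustration:
Code:
# Pull a small pre-trained model, then chat with it.
ollama pull llama3.2:1b
ollama run llama3.2:1b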
Is there a how-to?

I do this stuff for work and pleasure, so here's some insight:
1. Yes, it's possible on FreeBSD with CUDA using a specific Nvidia driver and select generations of their GPUs (for reasons of firmware and hardware design), but it's not a simple 1-2-3 set of commands, at least not as of right now.
2. CUDA support is not official; rather, it is in PoC stages of development, and the Linuxulator is used in the integration stage, so that's... fun.
3. CUDA support is essential for well-performing LLMs at present, and only Nvidia cards have CUDA support. It's an industry problem, not a FreeBSD-vs-whatever problem, and it's not being resolved any time soon. A longer conversation is possible, but it tends to devolve into pissing matches between fanboy groups.
Some fun!
- https://github.com/intel/intel-extension-for-pytorch
- https://bigscience.huggingface.co/blog/which-hardware-to-train-a-176b-parameters-model
Is there a how-to?

Sure thing, I'll convert some of my notes to a blog post and send a link when it's up.
I'm using LLMs, but running on Linux and using Oatmeal to connect remotely to them; if I could do this on FreeBSD with a GPU, it would be great.
tingo@locaal:~ $ ollama list
NAME            ID            SIZE    MODIFIED
llama3.2:1b     baf6a787fdff  1.3 GB  About an hour ago
mistral:latest  f974a74358d6  4.1 GB  About an hour ago
tingo@locaal:~ $ ollama run mistral
Error: Post "http://127.0.0.1:11434/api/chat": EOF
tingo@locaal:~ $ ollama run mistral
>>> what is FreeBSD
FreeBSD is a free and open-source Unix-like operating system based on BSD (Berkeley Software Distribution) versions of the Unix source code. It is known for its
high-performance, stability, security, and compatibility with a wide range of software and hardware platforms. FreeBSD was initially developed at the University of
California, Berkeley, as an extension of Research Unix, which served as the basis for many commercial Unix systems in the 1970s and 1980s. Since becoming open-source in
1993, FreeBSD has grown into a popular choice among server administrators, developers, and hobbyists who value its flexibility, reliability, and customization options. It
provides a variety of features such as a modular kernel, support for the ZFS file system, built-in virtualization with jails, and the Ports Collection, which allows users to
easily install thousands of third-party applications. FreeBSD is licensed under the BSD License, which permits unrestricted use, modification, and distribution of the source
code as long as copyright notices are preserved and modifications are clearly marked.
>>>
root@locaal:~ # freebsd-version -ku
13.4-RELEASE-p1
13.4-RELEASE-p2
root@locaal:~ # pkg -vv | grep url
url : "pkg+http://pkg.FreeBSD.org/FreeBSD:13:amd64/quarterly",
root@locaal:~ # pkg info ollama\*
ollama-0.3.6_1
To watch GPU/VRAM usage while the model loads, you can run:
Code:
nvidia-smi --loop=1
root@Secure_Ollama:/ # ollama run context-mistral-nemo:latest
Error: Post "http://127.0.0.1:11434/api/chat": EOF
root@Secure_Ollama:/ # ollama run context-mistral-nemo:latest
time=2024-12-11T23:14:28.040Z level=WARN source=sched.go:642 msg="gpu VRAM usage didn't recover within timeout" seconds=8.543873626 model=/root/.ollama/models/blobs/sha256-b559938ab7a0392fc9ea9675b82280f2a15669ec3e0e0fc491c9cb0a7681cf94
time=2024-12-11T23:14:28.166Z level=WARN source=sched.go:642 msg="gpu VRAM usage didn't recover within timeout" seconds=8.669209817 model=/root/.ollama/models/blobs/sha256-b559938ab7a0392fc9ea9675b82280f2a15669ec3e0e0fc491c9cb0a7681cf94
time=2024-12-11T23:14:28.446Z level=WARN source=sched.go:642 msg="gpu VRAM usage didn't recover within timeout" seconds=8.949622423 model=/root/.ollama/models/blobs/sha256-b559938ab7a0392fc9ea9675b82280f2a15669ec3e0e0fc491c9cb0a7681cf94
The Post error means your card or CPU is not enough to run the model.

I would agree with you, except that if I try again, the model runs. So it runs, but not every time.
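Since the EOF error is intermittent here, a crude retry wrapper is one way to live with it. This is just a sketch, not an ollama feature:
Code:
#!/bin/sh
# Retry the model a few times, since the Post .../api/chat EOF error
# sometimes clears on a second attempt.
for i in 1 2 3; do
    ollama run mistral "$@" && exit 0
    echo "attempt $i failed, retrying..." >&2
    sleep 5
done
exit 1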