How is your prompt engineering?

cracauer@

Developer
Are you getting useful things out of LLMs (AI chatbots)?

I spent some time making a serious run with both web-based LLMs (ChatGPT, and Claude via GitHub Copilot) and local ones (ollama). The main reason is that they do things I wouldn't have thought they could do, so I obviously have some self-education to do. Now the question is whether the results are actually actionable. The jury is still out on that one.

Free-floating questions are often tainted by inaccuracies. I usually strike up a conversation about WW2 naval actions. If they had trained on Wikipedia only, they would get it right - but they don't, and they obviously get confused later. What I find concerning is that LLMs released in the last couple of months seem to have lost the ability to correct themselves. Now they argue back and forth with me instead when something is wrong.

Coding is also a mixed bag. I get good results for individual functions in languages that I can read but not write. But there's a lot of garbage coming out, starting with C++ code that has compile-time type errors. Python code runs, but the type errors might pop up at random times at runtime. A larger coding task just failed miserably - I fed whole files in and let Claude re-code them to use a different graphics API. The AI wouldn't leave the structure of the files intact enough to use them inside the rest of the program. I wish I could just ask an LLM to point me to specific places inside a git repository (e.g. "where does ZFS store the table of readonly-compatible features?"), but that doesn't exist yet.

The best results I get out of LLMs come with a restricted answering space. notebooklm.google.com is very good. You feed one or more PDFs in and it answers questions directly out of those PDFs. It has its limits - I fed it a redacted document and it didn't understand a word. But my DCS plane manuals, for example, found a good home there. It doesn't do pictures and illustrations, though.
 
None.
After testing them - for finding answers on the internet, and for coding (there may be other uses I didn't check) - my overall experience so far is:
The effort to get something reliably useful out of them is larger than it is with conventional ways.
 
I use them in the same way I would use any search engine. They might point me in the right direction but I have to verify everything they point me to in the same way I would handle a Google search.
 
OK, so for as anti-AI as I am, I've come to appreciate the AI-based quick Google search. It's usually just me trying to remember some obscure shell programming thing, and a quick query "usually" returns something useful. Just today I needed to refresh my memory on inner vs. outer SQL joins, and to quickly look up the correct options for mounting a vfat filesystem with custom uid/gid ownership... reading the manpage would have been much more tedious.
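
For the record, the vfat case boils down to a single mount invocation. A minimal sketch, assuming the stick is /dev/sdb1, the mountpoint already exists, and your uid/gid are both 1000 (all assumptions):

    # Linux: mount a vfat stick so a regular user owns all files on it
    # (device, mountpoint and ids are assumptions - check lsblk and id)
    mount -t vfat -o uid=1000,gid=1000,umask=022 /dev/sdb1 /mnt/usb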
 
Are you getting useful things out of LLMs (AI chatbots)?

The time I would need to prompt a meaningful answer out of a chatbot (with my Kagi subscription I could now use some of them virtually for free), I believe I can invest much more meaningfully: spending 2-3 minutes searching the web for the correct answer and then having a beer for the rest of the hour.
 
I would frankly avoid using a chatbot for a political topic like history.

Finding out whether the water main at Hapuna Beach got fixed yet (it's been broken since Covid times) - chatbots are not a bad option for that. Figuring out the opening hours for some local businesses, or directions to some interesting spot to drive to - sure.

Self-education is a bit of a mixed bag. Some things, I actually do know enough about to be able to do some meaningful research, but then I don't really need the chatbot. But when it comes to things like PowerShell, I discovered that prompt engineering for stuff like that is actually way more efficient for me than trying to wade through tutorials and documentation, and trying a few things and options. It does take a bit of thinking on my part to formulate the query to the chatbot, and refine it turn by turn.
 
I am skeptical about AI, even though my son is a professional AI engineer (he recently published a book; see the attached picture).
[attached photo: 20250911_093240.jpg]
 
Coding is also a mixed bag. I get good results for individual functions in languages that I can read but not write. But there's a lot of garbage coming out, starting with C++ code that has compile-time type errors. Python code runs, but the type errors might pop up at random times at runtime. A larger coding task just failed miserably - I fed whole files in and let Claude re-code them to use a different graphics API. The AI wouldn't leave the structure of the files intact enough to use them inside the rest of the program. I wish I could just ask an LLM to point me to specific places inside a git repository (e.g. "where does ZFS store the table of readonly-compatible features?"), but that doesn't exist yet.
In my little experiment with three different LLMs, I got working C code out. Different LLMs approached the task differently. I liked Claude the best, but I only have free accounts on all of them.

Also, the task I gave was intentionally a little vague: create a network traffic sonifier that reads all the traffic from a given interface, applies some heuristics to the packets received and then, using an IFFT, turns that into sound which is played through /dev/dsp. I played around for some time and also tasked one LLM with improving the code generated by another.

Got it working, but did not like the sound generated. This is where things got difficult: usually, asking for certain improvements broke the code. But the prototype works. In the next phase I asked for threads and a math/fftw3 IFFT, and I got the multithreaded version working as well. It was able to use the port correctly.

For output I tasked it to use audio/portaudio.

I attached the latest Claude-generated file here. It works, but the sound is not pleasant. Note that /dev/dsp7 is hard-coded; you need to change that if you want to try this code.
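
For readers who want a feel for the overall shape of such a program, here is a minimal single-threaded sketch of the same idea - capture packets, excite frequency bins based on packet length, inverse-FFT with fftw3, write to /dev/dsp. It is an illustration under my own assumptions (default interface name, length-to-bin heuristic, 8-bit output format), not the attached Claude code:

    /* sonify.c -- illustrative sketch, not the attached program.
     * Build: cc sonify.c -lpcap -lfftw3 -o sonify  (run as root) */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <pcap/pcap.h>
    #include <fftw3.h>

    #define N 2048                      /* samples per audio block */

    int main(int argc, char **argv)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        const char *ifname = argc > 1 ? argv[1] : "em0"; /* assumed */
        pcap_t *p = pcap_open_live(ifname, 96, 1, 100, errbuf);
        if (p == NULL) { fprintf(stderr, "pcap: %s\n", errbuf); return 1; }

        int dsp = open("/dev/dsp", O_WRONLY); /* assumes 8 kHz u8 mono */
        if (dsp < 0) { perror("/dev/dsp"); return 1; }

        fftw_complex *spec = fftw_alloc_complex(N / 2 + 1);
        double *wave = fftw_alloc_real(N);
        fftw_plan plan = fftw_plan_dft_c2r_1d(N, spec, wave, FFTW_ESTIMATE);
        unsigned char out[N];

        for (;;) {
            memset(spec, 0, (N / 2 + 1) * sizeof(fftw_complex));
            for (int i = 0; i < 64; i++) {   /* gather up to 64 packets */
                struct pcap_pkthdr *h;
                const u_char *data;
                if (pcap_next_ex(p, &h, &data) != 1)
                    break;                   /* timeout or error */
                /* Heuristic: packet length picks the bin -> the pitch */
                spec[1 + h->len % (N / 4)][0] += 1.0;
            }
            fftw_execute(plan);              /* inverse FFT -> waveform */
            for (int i = 0; i < N; i++) {    /* clip and scale to u8 */
                double s = wave[i] / 64.0;
                s = s > 1.0 ? 1.0 : (s < -1.0 ? -1.0 : s);
                out[i] = (unsigned char)(128.0 + 127.0 * s);
            }
            write(dsp, out, N);              /* blocks at the audio rate */
        }
    }

The block size and the length-to-bin heuristic are the obvious knobs to play with if the sound is unpleasant.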
 


I'm very skillful with ANSI ^[ and in crafting useful and beautiful prompts, both with tcsh(1) set prompt and with bash(1) export PS1.

Sorry, couldn't help myself 🤪

Seriously now, I only used Copilot, and it helped me in at least two instances - writing a .spec for rpmbuild, and getting the proper settings for Linux NVIDIA resume in /etc/modprobe.d/nvidia_common.conf, which it found somewhere in SUSE-related forums. I would never have thought of looking there for an NVIDIA resume problem on ALT Linux...
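
For context, the usual suspend/resume settings for the proprietary NVIDIA driver look something like the following - these are the commonly recommended values, not necessarily the exact ones Copilot dug up:

    # /etc/modprobe.d/nvidia_common.conf
    # Preserve video memory across suspend/resume (typical fix; the
    # values are assumptions, not quoted from the post)
    options nvidia NVreg_PreserveVideoMemoryAllocations=1
    options nvidia NVreg_TemporaryFilePath=/var/tmp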

Almost forgot - Copilot also helped me with the proper syntax for Linux GNU sed (gsed(1) in our world). I'm so used to our sed(1) version that I sometimes have problems with the GNU version.
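
The classic trip-up between the two is in-place editing; a guess at the kind of difference meant here:

    # GNU sed: the -i backup suffix is optional and must be glued on
    gsed -i 's/foo/bar/' file.txt
    # BSD sed: -i takes the suffix as a separate argument, even if empty
    sed -i '' 's/foo/bar/' file.txt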
 
Hmmm... Seems like Copilot would be pretty useful for PowerShell reference research. My problem with Copilot is that it's a Microsoft thing, so I don't have any control over where it sends my real name whenever I use it. Using that thing at work - no problem there. But for personal shopping or research - sorry. Microsoft has no business knowing THAT much about me specifically.

It would be a different ballgame if Microsoft were to corner the entire market. But right now, with Google's in-house AI and ChatGPT being major players for general-purpose AI assistance, I'm gonna enjoy the variety.

I do know that Microsoft is not quite all that bad - there's a story I remember off the top of my head and would like to share: in the 2000-2010 decade, there was a boom in online ads and in the development of special 'advertising networks'. The executives behind those 'advertising networks' met with Microsoft execs, hoping to get MS to grant them an exception in the default pop-up blocker rules that IE had at the time. Well, MS told the advertisers to 'drop dead'.

It helps to remember that this was pretty much the height of online harassment by ads, to the point that the ads interfered with browsing altogether. If I did not remember the story correctly, corrections are welcome!

BTW, Phoronix recently published an article about a GIMP plugin that is basically in the same mold as Copilot. It does require a paid subscription to the specific cloud service to generate images based on the prompting, though...

I think it's a matter of time before KDE creates something similar. I'd trust that more than Copilot.

Thinking about the whole thing, though - There are free services out there that one can connect to and get useful info without giving up too much privacy - but how sustainable can it be?
 
Thinking about the whole thing, though - There are free services out there that one can connect to and get useful info without giving up too much privacy - but how sustainable can it be?

You can use ollama and then download LLMs of your choice for purely local interaction. AFAIK the models don't have the ability to call home when running in ollama.
 
You can use ollama and then download LLMs of your choice for purely local interaction. AFAIK the models don't have the ability to call home when running in ollama.
LLMs by themselves - probably not. But when you put all the components together into one convenient package, I just can't put it past the packager to want to call home.
 
LLMs by themselves - probably not. But when you put all the components together into one convenient package, I just can't put it past the packager to want to call home.

Package?

You can write code (usually Python) around an LLM in ollama.
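
Even before writing any wrapper code, you can poke at the whole local interface from the command line - ollama serves a plain HTTP API on localhost. A minimal sketch, assuming the default port and an already-pulled model named "llama3" (both assumptions):

    # Talk to the local ollama server; nothing leaves localhost.
    # The model name is an assumption - use whatever you have pulled.
    curl http://localhost:11434/api/generate \
        -d '{"model": "llama3", "prompt": "Say hello.", "stream": false}'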
 
OK, so for as anti-AI as I am, I've come to appreciate the AI-based quick Google search. It's usually just me trying to remember some obscure shell programming thing, and a quick query "usually" returns something useful. Just today I needed to refresh my memory on inner vs. outer SQL joins, and to quickly look up the correct options for mounting a vfat filesystem with custom uid/gid ownership... reading the manpage would have been much more tedious.

LLMs are good at googling while Google is bad at it - and has been for years. SEO has shafted any prospect of Google giving relevant results.
 
Package?

You can write code (usually Python) around an LLM in ollama.
LLMs can be specialized or general-purpose. Besides, 'package' as in: presenting the user with an input window to type the query, working that window into the existing UI the user is comfortable with, setting up a web server (like for Google's Gemini API to generate images), doing the chores related to web server setup, and making sure the whole thing works end to end and is not too clunky to use - that's what packaging is about.

As a reference point, you can think of Copilot as a 'package'. And Microsoft has ironed out the clunky and error-prone details of setting it up. Trouble is, Copilot wants to call home.
 
You can use ollama and then download LLMs of your choice for purely local interaction. AFAIK the models don't have the ability to call home when running in ollama.

I have done this, and when I tried it with some ordinary level-1 trivia questions it started hallucinating hard, inventing stuff along the way.
 
I have done this, and when I tried it with some ordinary level-1 trivia questions it started hallucinating hard, inventing stuff along the way.
Thanks for doing that and sharing!

Yeah, setting it up for local use is one thing, and is probably something a private rank-and-file user can pull off. But making a go of it - that takes a LOT of metal and money, not something a random Joe has easy access to.

Yeah, makes me want to stay away from trying to pull off a setup under my own steam.
 
I sometimes try to use a GPT model to request configuration, to see what it can produce and the quality of the output.

ChatGPT can nowadays write correct FreeBSD configuration for things like:
  • log rotation with newsyslog
  • rc service scripts (see the skeleton below)
  • pf configuration
  • help to design a series of scripts to migrate and sort databases between production and lab environments, then claim them
Results were less shiny for:
  • ZFS tasks like renaming a pool
  • Poudriere
It can also write for Ansible, Salt, etc., even if, without prompting for it, you won't get your own style (for example, it loves to create files with inline content, where I prefer separate templates with a clear autogenerated header).
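
As a reference for the rc service item above, the canonical FreeBSD rc.d skeleton these models reproduce looks roughly like this - the name "myapp" and its paths are placeholders, not anything from the thread:

    #!/bin/sh
    # Minimal FreeBSD rc.d skeleton ("myapp" and paths are placeholders)

    # PROVIDE: myapp
    # REQUIRE: NETWORKING
    # KEYWORD: shutdown

    . /etc/rc.subr

    name=myapp
    rcvar=myapp_enable
    command="/usr/local/sbin/myapp"
    pidfile="/var/run/${name}.pid"

    load_rc_config $name
    : ${myapp_enable:=NO}
    run_rc_command "$1"

Enable it with myapp_enable="YES" in /etc/rc.conf, then start it with service myapp start.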
 
Never tried "vibe coding". I hope I never have to. So far, I've escaped it.
Complicated.

Tried it to write a Java quality of life plugin for Minecraft.

Your own reflexes to own the code and structure it how you want kick in; my conversation tended to be like one with an intern, explaining antipatterns to the model.

Still, that had the merit of motivating me to start writing, and my first request (reduce the rain rate and speed up that mob) produced fairly correct code.
(Funny thing: Java is one of the languages with the most use of dependency injection, but Minecraft examples tend to pass god objects all around the code, or write everything in one class.)

____

What about unit tests?

On the other hand, I prepared unit tests today with Claude Sonnet 4, and that worked well.

Prompt: https://devcentral.nasqueron.org/D3706#57411
Result produced by Claude: https://claude.ai/public/artifacts/ab00d3f9-28ef-4f3d-8f5b-624cc1c4cf54
Final commit: https://devcentral.nasqueron.org/rRPRTa0f60307e6f7a6dba58da2530ff56f088f59bff2#change-xqqwXaxl4uta
 
I sometimes try to use a GPT model to request configuration, to see what it can produce and the quality of the output.

ChatGPT can nowadays write correct FreeBSD configuration for things like:
  • log rotation with newsyslog
  • rc service
  • pf configuration
  • help to design a series of scripts to migrate and sort databases between production and lab environments, then claim them
I think you brought up an interesting topic - which model is best with FreeBSD? I have noticed that Gemini and Claude tend to outperform even ChatGPT in this case.
 
(Yesterday I saw a comment on another thread explaining that AI-assisted coding is different from vibe coding; apparently vibe coding is when you only use conversation to produce a piece of software - if you interact with the code and refactor it, it's not vibe coding anymore.)
 
(Yesterday I saw a comment on another thread explaining that AI-assisted coding is different from vibe coding; apparently vibe coding is when you only use conversation to produce a piece of software - if you interact with the code and refactor it, it's not vibe coding anymore.)

The way I learned it, vibe coding is when you don't understand the code and put it in a product anyway.
 