cracauer@
Developer
Are you getting useful things out of LLMs (AI chatbots)?
I spent some time making a serious run at both web-based LLMs (ChatGPT, and Claude via GitHub Copilot) and local ones (ollama). The main reason is that they do things I wouldn't have thought they could do, so I obviously have some self-education to do. The question is whether the results are actually actionable. The jury is still out on that one.
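(For what it's worth, the local setup is nothing fancy. Here is a minimal Python sketch of what asking a local ollama instance a question over its HTTP API boils down to - assuming ollama is running on its default port and you have already pulled a model; the model name below is just a placeholder.)

import json
import urllib.request

# Ask a locally running ollama instance a single question.
# Assumes the default port 11434 and an already-pulled model.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3",  # placeholder; use whatever model you pulled
        "prompt": "Summarize the Battle of Savo Island in three sentences.",
        "stream": False,
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])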
Free-floating questions are often tainted by inaccuracies. I usually strike up a conversation about WW2 naval actions. If they had trained on Wikipedia alone they would get it right - but they don't; they obviously get confused somewhere along the way. What I find concerning is that LLMs released in the last couple of months seem to have lost the ability to correct themselves. Instead they now argue back and forth with me when something is wrong.
Coding is also a mixed bag. I get good results for individual functions in languages that I can read but not write. But there's a lot of garbage coming out, starting with C++ code that has compile-time type errors. Python code runs, but the type errors can pop up at random times at runtime. A larger coding task failed miserably: I fed whole files in and let Claude re-code them to use a different graphics API. The AI wouldn't leave the structure of the files intact enough to use them inside the rest of the program. I wish I could just ask an LLM to point me to specific places inside a git repository (e.g. "where does ZFS store the table of readonly-compatible features?"), but that doesn't exist yet.
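(To illustrate the Python point with a toy example of my own, not taken from the generated code: a wrong type in a rarely-taken branch sails through until that branch actually executes.)

# Toy example: the bad return type in the empty-input branch is invisible
# until someone actually hits that branch at runtime.
def scale(values, factor):
    if not values:
        return "empty"  # bug: callers expect a list here
    return [v * factor for v in values]

print(sum(scale([1, 2, 3], 2)))  # works fine
print(sum(scale([], 2)))         # TypeError only surfaces here, at runtime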
The best results I get out of LLMs are with a restricted answering space. notebooklm.google.com is very good. You feed in one or more PDFs and it answers questions directly out of those PDFs. It has its limits - I fed it a redacted document and it didn't understand a word - but e.g. my DCS plane manuals found a good home there. It doesn't do pictures and illustrations, though.