I don’t see anything obvious here, and the server is neither CPU- nor RAM-constrained (Core i7, 32 GB RAM, running Ubuntu). Anyone else seeing these errors?
ChatGPT suggested that raising the memlock ulimit to unlimited would fix the problem, so I tried that by editing /etc/security/limits.conf and adding:

* hard memlock unlimited
* soft memlock unlimited
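To double-check whether that change actually takes effect, one quick way is to read RLIMIT_MEMLOCK from the process side, for example with a short Python script (just a minimal sketch; also note that limits.conf applies to login sessions, so if the app runs inside a Docker container, as Umbrel apps do, the container's own ulimit is what counts):

```python
import resource

# Read the mlock limit that applies to this process.
# After the limits.conf change and a fresh login/reboot, both values
# should report as unlimited; if they still show something like 64 KiB,
# the new limit never reached the process that is calling mlock().
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("memlock soft:", "unlimited" if soft == resource.RLIM_INFINITY else soft)
print("memlock hard:", "unlimited" if hard == resource.RLIM_INFINITY else hard)
```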
After a reboot, I’m still seeing the mlock allocation error. I can’t imagine I’m the only one having this issue. LlamaGPT was recently updated, around the same time I started seeing the problem. Is there a way to reach out to the devs to see if they can help?
Yes, I’m using the official one from the UmbrelOS App Store. On a lark, I did a fresh install of Umbrel on another computer (also running Ubuntu) to see if I could reproduce the problem, and the new install is working fine, so I’m wondering what went wrong on the original machine. I’d still like to troubleshoot it to get a better understanding, though; this is all about learning for me, and I have a lab set up at home for that express purpose. Happy to entertain your thoughts.
In my experience, the timeouts only happened in longer conversations. Once I turned the model’s temperature down to zero and used only one prompt per conversation, I started getting fewer timeouts. I’m running on a Raspberry Pi, and it turned out it was getting close to overheating during some of those larger conversations. Given your system, overheating seems unlikely, but I wouldn’t be surprised if the app just isn’t that optimized yet and certain conversations/models fail ungracefully.
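If you want to test that outside the web UI, and assuming your LlamaGPT install exposes its OpenAI-compatible API (the host, port, and model name below are placeholders; adjust them for your instance), a single low-temperature request looks roughly like this:

```python
import requests

# Placeholder address: substitute your LlamaGPT server's host and port.
API_URL = "http://umbrel.local:3001/v1/chat/completions"

payload = {
    "model": "llama-2-7b-chat",   # whichever model your install actually serves
    "temperature": 0,             # deterministic output; this is what reduced my timeouts
    "messages": [
        {"role": "user", "content": "Explain what mlock does in one sentence."}
    ],
}

resp = requests.post(API_URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```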
As a comparison, I’ve tried https://webllm.mlc.ai/, which is a similar idea. It says “you will need a GPU with about 6GB memory to run Llama-7B”, but I got errors despite having an 8 GB GPU. I was able to get RedPajama-3B to work, which it says needs “about 3GB memory to run.”
Admittedly, this is a kludgey way of troubleshooting, but you might try https://webllm.mlc.ai/ on your Ubuntu machine (assuming it’s not headless) to compare performance. Keep in mind that WebLLM runs entirely in the browser, as opposed to on the server like LlamaGPT.