LlamaGPT Internal Server Errors (constantly)

Hello All -

For the last few days, LlamaGPT has been consistently throwing an Internal Server Error. I’ve reinstalled a few times with no success. Here is the container output:



Attaching to llama-gpt_llama-gpt-api_1, llama-gpt_app_proxy_1, llama-gpt_llama-gpt-ui_1
llama-gpt-api_1 | llama_model_load_internal: ggml ctx size = 0.08 MB
llama-gpt-api_1 | llama_model_load_internal: mem required = 4141.73 MB (+ 2048.00 MB per state)
llama-gpt-api_1 | warning: failed to mlock 73728000-byte buffer (after previously locking 0 bytes): Cannot allocate memory
llama-gpt-api_1 | Try increasing RLIMIT_MLOCK (‘ulimit -l’ as root).
llama-gpt-api_1 | llama_new_context_with_model: kv self size = 2048.00 MB
llama-gpt-api_1 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
llama-gpt-api_1 | INFO: Started server process [1]
llama-gpt-api_1 | INFO: Waiting for application startup.
llama-gpt-api_1 | INFO: Application startup complete.
llama-gpt-api_1 | INFO: Uvicorn running on (Press CTRL+C to quit)
app_proxy_1 | Waiting for llama-gpt-ui:3000 to open…
app_proxy_1 | LlamaGPT is now ready…
app_proxy_1 | Listening on port: 1234
app_proxy_1 | Validating token: 42bd08616629 …
app_proxy_1 | Validating token: 42bd08616629 …
app_proxy_1 | Validating token: 42bd08616629 …
app_proxy_1 | Validating token: 42bd08616629 …
app_proxy_1 | Validating token: 42bd08616629 …
app_proxy_1 | Validating token: 42bd08616629 …
app_proxy_1 | Validating token: 42bd08616629 …
llama-gpt-ui_1 | localPort: 53284,
llama-gpt-ui_1 | remoteAddress: ‘’,
llama-gpt-ui_1 | remotePort: 8000,
llama-gpt-ui_1 | remoteFamily: ‘IPv4’,
llama-gpt-ui_1 | timeout: undefined,
llama-gpt-ui_1 | bytesWritten: 3816,
llama-gpt-ui_1 | bytesRead: 0
llama-gpt-ui_1 | }
llama-gpt-ui_1 | }
llama-gpt-ui_1 | }

I don’t see anything obvious here, and the server is neither CPU- nor RAM-constrained (Core i7, 32 GB RAM, running Ubuntu). Is anyone else seeing these errors?
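As a quick sanity check on the RAM claim, here is a minimal sketch that pulls the figures out of the log and adds them up. It assumes a single state/context, and the `required_mb` helper is just an illustrative name, not something from LlamaGPT.

```python
import re

# Two lines copied from the llama-gpt-api log above.
LOG = """\
llama_model_load_internal: mem required = 4141.73 MB (+ 2048.00 MB per state)
llama_new_context_with_model: kv self size = 2048.00 MB
"""

def required_mb(log_text):
    """Return (base_mb, per_state_mb) parsed from the 'mem required' line, or None."""
    pattern = r"mem required\s*=\s*([\d.]+) MB \(\+\s*([\d.]+) MB per state\)"
    m = re.search(pattern, log_text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))

base, per_state = required_mb(LOG)
# Assumption: exactly one state is allocated.
total = base + per_state
print(f"base={base} MB, per state={per_state} MB, total={total:.2f} MB")
```

With one state that comes to roughly 6.2 GB, which is well under 32 GB, so plain RAM exhaustion on the host looks unlikely. It says nothing about limits applied to the container itself.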


ChatGPT seems to think that raising the memlock ulimit to unlimited will fix the problem. I’ve tried that by editing /etc/security/limits.conf and adding:

  *   hard    memlock unlimited
  *   soft    memlock unlimited

After a reboot, I’m still seeing the mlock allocation error. I can’t imagine I’m the only one having this issue. LlamaGPT was recently updated, at about the same time I started seeing it. Is there a way to reach out to the devs to see if they can help?
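One thing worth checking is whether the new limit actually reaches the process that calls mlock. /etc/security/limits.conf is applied by pam_limits to login sessions, so a process started by a service (such as a container launched by the Docker daemon) may not pick it up. Below is a minimal sketch using Python’s standard `resource` module to print the memlock limit of whatever process runs it. Running it inside the llama-gpt-api container assumes Python is available there, and the function names are only illustrative.

```python
import resource

def format_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"

def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK values for the current process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)

if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("memlock soft:", format_limit(soft))
    print("memlock hard:", format_limit(hard))
```

If the host shell reports unlimited but the container does not, the limits.conf change is not what the API process sees. Note also that the log prints the mlock failure as a warning and the server still starts afterwards, so it may not be the cause of the Internal Server Error at all.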

I have some hunches/kludges that might help you get it working, but before getting into the weeds, can you share whether you’re using the app via umbrelOS (inside Ubuntu) or, if not, which of the other install methods listed in the getumbrel/llama-gpt GitHub repository you used?

Yes, I’m using the official one from the umbrelOS App Store. On a lark, I did a fresh install of Umbrel on another computer (also running Ubuntu) to see if I could reproduce the problem, and the new install is working fine, so I’m wondering what went wrong on the original machine. I’d like to troubleshoot it to get a better understanding, though; this is all about learning for me. I have a lab set up here at home for that express purpose. Happy to entertain your thoughts.

In my experience the timeouts only happened in longer conversations. Once I turned the model’s temperature down to zero and used only one prompt per conversation, I started to get fewer timeouts. I’m running on a Raspberry Pi, and it turned out it was getting close to overheating in some of these longer conversations. Given your system, overheating seems unlikely, but I would not be surprised if the app isn’t that well optimized yet and certain conversations/models fail ungracefully.
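If you want to rule heat in or out, here is a minimal sketch for reading the CPU temperature while a long conversation is running. It assumes the usual Linux sysfs file, which reports millidegrees Celsius; the zone index (`thermal_zone0`) is an assumption and can differ between machines.

```python
from pathlib import Path

# Common Linux sysfs location; the zone index can differ per board (assumption).
THERMAL_FILE = Path("/sys/class/thermal/thermal_zone0/temp")

def millidegrees_to_celsius(raw):
    """Convert the sysfs text value (millidegrees Celsius) to degrees Celsius."""
    return int(raw.strip()) / 1000.0

def read_cpu_temp(path=THERMAL_FILE):
    """Return the temperature in degrees Celsius, or None if the file is unavailable."""
    try:
        return millidegrees_to_celsius(path.read_text())
    except (OSError, ValueError):
        return None

if __name__ == "__main__":
    temp = read_cpu_temp()
    if temp is None:
        print("temperature unavailable")
    else:
        print(f"CPU temperature: {temp:.1f} C")
```

Running this every few seconds during a prompt shows whether the temperature climbs just before a timeout.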

As a comparison, I’ve tried https://webllm.mlc.ai/, which is a similar idea. It says “you will need a GPU with about 6GB memory to run Llama-7B”, and I received errors despite having an 8 GB GPU. I was able to get RedPajama-3B to work, for which it says “and about 3GB memory to run.”

Admittedly, this is a kind of kludgey way of troubleshooting, but you might try https://webllm.mlc.ai/ on your Ubuntu machine (assuming it’s not headless) to compare performance. Keep in mind that WebLLM runs entirely in the browser, as opposed to on the server like LlamaGPT.