Frustratingly bad at self hosting. Can someone help me access LLMs on my rig from my phone

BlackSnack@lemmy.zip · 7 days ago

Frustratingly bad at self hosting. Can someone help me access LLMs on my rig from my phone

fubarx@lemmy.world · 7 days ago

Sounds like the issue is getting to the server, not the LLM server itself. If so, may want to look into running a reverse proxy, or if you want to access it remotely, tunnels: https://github.com/anderspitman/awesome-tunneling
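
A quick way to confirm that is a plain TCP check run from another machine on the same network. This is only a rough sketch for narrowing things down, not a fix, and the host and port in the usage note are made-up placeholders rather than values from this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `can_connect("192.168.1.50", 8080)` (hypothetical address and port). If the check succeeds on the rig itself against 127.0.0.1 but fails from another device, the server is probably listening on localhost only, or a firewall is in the way.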

brucethemoose@lemmy.world · edit-2 7 days ago

At risk of getting more technical, ik_llama.cpp has a good built in webui:

https://github.com/ikawrakow/ik_llama.cpp/

Getting more technical, it’s also way better than ollama. You can run way smarter models than ollama can on the same hardware.

For reference, I’m running GLM-4.5 (667 GB of raw weights) on a single RTX 3090/Ryzen gaming rig, at reading speed, with pretty low quantization distortion.

And if you want a ‘look this up on the internet for me’ assistant (which you need for them to be truly useful), you need another docker project as well.

…That’s just how LLM self hosting is now. It’s simply too hardware-intensive and ad hoc to be easy and smart and cheap. You can indeed host a small ‘default’ LLM without much tinkering, but it’s going to be pretty dumb, and pretty slow on ollama defaults.

BlackSnack@lemmy.zip · 7 days ago

Bet. Looking into that now. Thanks!

I believe I have 11 GB of VRAM, so I should be good to run decent models from what I’ve been told by the other AIs.

brucethemoose@lemmy.world · edit-2 7 days ago

In case I miss your reply, assuming a 3080 + 64 GB of RAM, you want the IQ4_KSS (or IQ3_KS, for more RAM for tabs and stuff) version of this:

https://huggingface.co/ubergarm/GLM-4.5-Air-GGUF

Part of it will run on your GPU, part will live in system RAM, but ik_llama.cpp handles the quantization, the CPU/GPU split, and the GPU offloading in a particularly efficient way for these kinds of ‘MoE’ models. Follow the instructions on that page.

If you ‘only’ have 32 GB of RAM or less, that’s trickier, and the next question is what kind of speeds you want. But it’s probably best to wait a few days and see how Qwen3 80B looks when it comes out. Or just go with the IQ4_K version of this: https://huggingface.co/ubergarm/Qwen3-30B-A3B-Thinking-2507-GGUF

And you don’t strictly need the hyper optimization of ik_llama.cpp for a small model like Qwen3 30B. Something easier like LM Studio or the llama.cpp docker image would be fine.

Alternatively, you could try to squeeze Gemma 27B into that 11GB VRAM, but it would be tight.
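
To put rough numbers on ‘tight’, here is a back-of-the-envelope sketch. It only counts the weights (parameters × bits per weight ÷ 8). The bits-per-weight values and the 1.5 GB headroom for context cache and runtime overhead are illustrative assumptions, not measurements of any particular quant file.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, headroom_gb: float = 1.5) -> bool:
    """True if the weights plus an assumed headroom fit in the given VRAM."""
    return weight_size_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


# A 27B model on an 11 GB card at a few assumed quantization levels.
for bpw in (4.5, 3.0, 2.5):
    size = weight_size_gb(27, bpw)
    print(f"{bpw} bits/weight: ~{size:.1f} GB, fits: {fits_in_vram(27, bpw, 11)}")
```

Under those assumptions a 27B model only squeezes into 11 GB somewhere below 3 bits per weight, which is why splitting a model between GPU and system RAM is usually the more comfortable route.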

brucethemoose@lemmy.world · edit-2 7 days ago

How much system RAM, and what kind? DDR5?

ik doesn’t have great documentation, so it’d be a lot easier for me to just point you places, heh.

november@piefed.blahaj.zone · 7 days ago

Why don’t you ask your LLMs how to do it.

BlackSnack@lemmy.zip · 7 days ago

lol I have! They all say similar things but it’s just not working for me.

november@piefed.blahaj.zone · 7 days ago

How strange.

illusionist@lemmy.zip · 7 days ago

Why do you want to set it up if you’re getting bad results?

bitwolf@sh.itjust.works · 7 days ago

To eliminate another subscription I imagine.

MTK@lemmy.world · 7 days ago

Ollama + open webui + tailscale/netbird

Open WebUI provides a fully functional Docker image bundled with Ollama, so just find the section that applies to you (AMD, NVIDIA, etc.): https://github.com/open-webui/open-webui?tab=readme-ov-file#quick-start-with-docker-

Then install NetBird or Tailscale on that host, and install the same on your phone. In Tailscale you need to enable MagicDNS; in NetBird I think DNS is provided by default.

Once the Docker container is running and both your server and phone are connected to the VPN (NetBird or Tailscale), you just type your server’s DNS name in your phone’s browser (in NetBird it would be “yourserver.netbird.cloud” and in Tailscale it would be “yourserver.yourtsnet.ts.net”).
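
If the page doesn’t load, a small script run from any other device on the VPN can tell you whether the web UI answers at all. This is a minimal sketch: the machine and tailnet names are the placeholders from above, and port 3000 is an assumption based on the port mapping in Open WebUI’s Docker quick start, so change it if you mapped a different port.

```python
import urllib.error
import urllib.request
from typing import Optional


def server_url(machine: str, tailnet: str, port: int, scheme: str = "http") -> str:
    """Build the URL for a machine reachable through its VPN DNS name."""
    return f"{scheme}://{machine}.{tailnet}:{port}"


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx; still proves reachability.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return None


# Placeholder names; substitute your own machine and tailnet.
url = server_url("yourserver", "yourtsnet.ts.net", 3000)
print(url)
```

Calling `http_status(url)` then returns a status code if the UI is reachable and `None` if it is not, in which case the usual suspects are the VPN not being connected on one side, DNS not being enabled, or the wrong port.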

Check out NetworkChuck on YouTube, as he has a lot of simple tutorials.

BlackSnack@lemmy.zip · 7 days ago

Bet. I believe what you mentioned is best for accessing my LLM no matter where I am in the world, correct? If so I will try this one after I try what the other person suggested.

Thank you!