Wondering about services to test on either a 16 GB RAM “AI capable” arm64 board or on a laptop with a modern RTX card. Only looking for open source options, but curious to hear what people say. Cheers!
I run kobold.cpp, a cutting-edge local model engine, on my local gaming rig turned server. I like to play around with the latest models to see how they improve/change over time. The current chain-of-thought “thinking” models, like the DeepSeek R1 distills and Qwen QwQ, are fun to poke at with advanced open-ended STEM questions.
STEM questions like “What does Gödel’s incompleteness theorem imply about scientific theories of everything?” or “Could the speed of light be more accurately referred to as ‘the speed of causality’?”
As for actual daily use, I prefer Mistral Small 24B, treating it like a local search engine with the legitimacy of Wikipedia. It’s a starting point to ask questions about general things I don’t know about or want advice on, then do further research through more legitimate sources.
It’s important not to take the LLM too seriously, as there’s always a small statistical chance it hallucinates some bullshit, but most of the time it’s fairly accurate and is a pretty good jumping-off point for further research.
Let’s say I want an overview of how to repair small holes forming in concrete, general ideas on how to invest financially, how to change the fluids in a car, how much fat and protein is in an egg, etc.
If the LLM says a word or related concept I don’t recognize, I grill it for clarifying info and follow it through the infinite branching garden of related information.
I’ve used an LLM to help me go through old declassified documents and speculate on internal government terminology I was unfamiliar with.
I’ve hooked it up to speech-to-text and text-to-speech models and gotten it to speak, just for fun. I’ve used a multimodal model and gotten it to see/scan documents for info.
I’ve used web search to get the model to retrieve information it didn’t know via a DDG search, again mostly for fun.
Feel free to ask me anything, I’m glad to help get newbies started.
I use Ollama & Open-WebUI: Ollama on my gaming rig and Open-WebUI as a frontend on my server.
It’s been a really powerful combo!
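For anyone wanting to replicate this split setup, here’s a minimal sketch, assuming both run in Docker and the rig has an NVIDIA GPU with the container toolkit installed (the IP is a placeholder for the gaming rig’s LAN address):

```bash
# On the gaming rig: run Ollama with GPU access, exposed on the LAN
docker run -d --name ollama --gpus=all \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

# On the server: run Open WebUI and point it at the rig
# (replace 192.168.1.50 with your rig's actual IP)
docker run -d --name open-webui \
  -e OLLAMA_BASE_URL=http://192.168.1.50:11434 \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```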
Would you please talk more about it? I forgot about Open-WebUI, but I’m intending to start playing with it. Honestly, what do you actually do with it?
I have Linkwarden pointed at my Ollama deployment, so it auto-tags links that I archive, which is nice.
I’ve seen other people send images captured by their Frigate security cameras to Ollama to get it to describe the image.
There’s a bunch of other use cases I’ve thought of for coding projects, but I haven’t started on any of them yet.
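Integrations like these generally just call Ollama’s REST API, so you can sanity-check a deployment with plain curl before wiring anything up (the model name here is just an example of one you might have pulled):

```bash
# Ask the Ollama API directly; any service "pointed at" Ollama
# is making calls like this under the hood
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Suggest three tags for an article about self-hosted photo backup.",
  "stream": false
}'
```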
I have this exact same setup. Open Web UI has more features than I’ve been able to use, such as functions and pipelines.
I use it to share my LLMs across my network. It has really good user management, so I can set up a user for my wife or brother-in-law and give them a general-use LLM, while my dad and I can take advantage of coding-tuned models.
The code formatting and code execution functions are great. It’s overall a great UI.
I’ve used LLMs to rewrite code, help format PowerPoint slides, summarize my notes from work, create D&D characters, plan lessons, etc.
None currently. Wish I could afford a GPU to play with some stuff.
Well, let me know your suggestions if you wish. I took the plunge and am willing to test on your behalf, assuming I can.
I have Immich, which has AI search for my photos. Pretty useful for finding stuff, actually.
Once I changed the default model, Immich search became amazing. I want to show it off to people, but alas, way too many NSFW pics in my library. I would create a second “clean” version to show off to people, but I’ve been too lazy.
Can’t you tag the NSFW to filter it out?
LM Studio is pretty much the standard. I think it’s open source except for the UI. Even if you don’t end up using it long-term, it’s great for getting used to a lot of the models.
Otherwise there’s Open WebUI, which I would imagine works via docker compose, as I think there are ARM images for OWUI and Ollama.
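If you want to verify the ARM story on the board itself, both images publish multi-arch manifests as far as I know, and Docker picks the matching architecture on pull; a quick check (tags and arch availability may change):

```bash
# On the arm64 board: pull; Docker selects the linux/arm64 variant if published
docker pull ollama/ollama
docker pull ghcr.io/open-webui/open-webui:main

# Confirm what architecture you actually got
docker image inspect ollama/ollama --format '{{.Architecture}}'
```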
Well, they’re fully closed source except for the open source project they’re a wrapper around. The open source part is llama.cpp.
Fair enough, but it’s damn handy and simple to use. And I don’t know how to do speculative decoding with ollama, which massively speeds up the models for me.
Their software is pretty nice. That’s what I’d recommend to someone who doesn’t want to tinker. It’s just a shame they don’t want to open source their software and we have to reinvent the wheel ten times. If you’re willing to tinker a bit, koboldcpp + OpenWebUI/LibreChat is a pretty nice combo.
That koboldcpp is pretty interesting. Looks like I can load a draft model for speculative decoding, as well as a pile of other things.
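For reference, loading a draft model in koboldcpp looks roughly like this (flag names are from memory of recent builds, so check `--help` on your version; the filenames are placeholders, and the draft model must share the main model’s vocabulary):

```bash
# Main model answers; the small draft model proposes tokens ahead of it
# (speculative decoding), which the big model then verifies in batches
python koboldcpp.py \
  --model Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf \
  --draftmodel Qwen2.5-Coder-0.5B-Instruct-Q8_0.gguf \
  --gpulayers 99 --contextsize 8192
```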
What local models have you been using for coding? I’ve been disappointed with things like deepseek-coder and qwen-coder; they’re not even a patch on Claude, but that damn Anthropic cost has been killing me.
As much as I’d like to praise the open-weight models, nothing comes close to Claude Sonnet in my experience either. I use local models when the info is sensitive and Claude when the problem requires being somewhat competent.
What setup do you use for coding? I might have a tip for minimizing Claude costs for you, depending on what your setup is.
I’m using VSCode/RooCode with the Gosucoder shortprompt, with Requesty providing models. Generally I’ll use R1 to outline a project and Claude to implement. The shortprompt seems to reduce the context quite a bit, and hence the cost. I’ve heard about Cursor but haven’t tried it yet.
When you’re using local models, which ones are you using? The ones I mentioned don’t seem to give me much I can use, but I’m also probably asking more of them because I’ve seen what Claude can do. It might also be a problem with how RooCode uses them, though when I just jump into a chat and ask them to spit out code, I don’t get much better.
If you’re willing to pay $10 a month, you should get GitHub Copilot; it provides near-unlimited Claude 3.5 usage. RooCode can hook into the GitHub Copilot API and use it for its generations.
I use Qwen Coder and Mistral Small locally too. They work okay, but they’re nowhere near GPT/Claude in terms of response quality.
I run Ollama and Auto1111 on my desktop, when it’s powered on. I use Open WebUI in my homelab, always on, also connected to OpenRouter. This way I can always use Open WebUI with OpenRouter models; it’s pretty cheap per query and a little more private than using a big tech chatbot. And if I want local, I turn on the desktop and have local Llama and Stable Diffusion.
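If anyone wants to copy the OpenRouter part: Open WebUI treats OpenRouter as a generic OpenAI-compatible connection, so roughly (env var names per Open WebUI’s docs; the key is a placeholder for your own):

```bash
# Point Open WebUI's OpenAI-compatible connection at OpenRouter
docker run -d --name open-webui \
  -e OPENAI_API_BASE_URL=https://openrouter.ai/api/v1 \
  -e OPENAI_API_KEY=sk-or-your-key-here \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```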
I also get bugger-all benefit out of it; it’s a cute toy.
How do you like Auto1111? I’ve never heard of it.
Try running an AI Horde worker, it’s a really great service!
Not sure I know what that is. As in Hoarder?
It’s a distributed cluster where anyone can generate images/text using workers connected to the service.
So if you ran a worker, people could generate stuff using your PC. For that you would gain kudos, which in turn you can use to generate stuff on other people’s computers.
Basically you do two things: help ordinary people without access to powerful machines, and earn kudos while your machine has spare capacity so you can generate whenever you want, even on the road where you can’t turn on your PC, if you fancy.
I’ve got an old gaming PC with a decent GPU lying around and I’ve thought of doing that (currently I use it for Linux gaming and GPU-related tasks like photo editing, etc.). However, I’m currently stuck using LLMs on demand locally with Ollama. The energy cost of having it powered on all the time for on-demand queries seems a bit overkill to me…
I put my Plex media server to work running Ollama; it has a GPU for transcoding that’s not awful for simple LLMs.
That sounds like a great way of leveraging existing infrastructure! I host Plex together with other services on a server with an Intel CPU capable of transcoding. I’m quite sure I would get much better performance with the GPU machine; I might end up following this path!
Have to agree on that. Certainly only makes sense to have up when you are using it.
If Immich counts for its search system, then there’s that.
Otherwise I’ve tried various things and found them lacking in functionality, and they would require leaving my PC on all the time to use.
What else did you try and what was lacking?
I was able to run a distilled version of DeepSeek on Linux. I ran it inside a Podman container with ROCm support (I have an AMD GPU). It wasn’t super fast, but for a locally deployed and self-hosted option the performance was okay. Apart from that, I have deployed Fooocus for image generation in a similar manner. Currently, I am working on deploying Stable Diffusion with either ComfyUI or Automatic1111 inside a Podman container with ROCm support.
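For other AMD folks wanting to try this, here’s a sketch of one easy route; the commenter doesn’t say which engine they used, but Ollama’s rocm-tagged image works with Podman by passing through the standard ROCm device nodes (model tag is an example distill from the Ollama library):

```bash
# Pass the AMD GPU devices through to the container
podman run -d --name ollama \
  --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  docker.io/ollama/ollama:rocm

# Pull and run a distilled DeepSeek R1 model interactively
podman exec -it ollama ollama run deepseek-r1:14b
```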
Didn’t know about these image generation tools, besides Stable Diffusion. Thanks!