Open models are going to kick the stool out. Hopefully.
GLM 4.5 is already #2 on LMArena, above Grok and ChatGPT, and it's runnable on homelab rigs with just 32B active parameters (which is mad). Extrapolate that a bit, and it’s just a race to the zero-cost bottom. None of this is sustainable.
I did not understand half of what you’ve written. But what do I need to get this running on my home PC?
You can probably just use Ollama and import the model.
It’s going to be slow as molasses on Ollama. It needs a better runtime, and GLM 4.5 probably isn’t supported there yet anyway.
I am referencing this: https://z.ai/blog/glm-4.5
The full GLM? Basically a 3090 or 4090 and a budget EPYC CPU. Or maybe 2 GPUs on a Threadripper system.
GLM Air? Now this would work on a desktop with 16GB+ of VRAM, just slap in 96GB+ (maybe 64GB?) of fast RAM. Or the recent Framework Desktop, or any mini PC/laptop with the 128GB Ryzen AI Max+ 395 config, or a 128GB+ Mac.
You’d download the weights, quantize yourself if needed, and run them in ik_llama.cpp (which should get support imminently).
https://github.com/ikawrakow/ik_llama.cpp/
But these are…not lightweight models. If you don’t want a homelab, there are better ones that will fit on more typical hardware configs.
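For a rough feel of why those hardware numbers come out the way they do, here is a back-of-envelope sketch. The total parameter counts (355B for GLM 4.5, 106B for Air) are what I recall from the z.ai post; the bits-per-weight figures and the flat 8GB overhead for KV cache and the OS are my own assumptions, not measured values, and the helper names are made up for illustration.

```python
# Back-of-envelope sizing for quantized MoE weights.
# Parameter counts: as recalled from the z.ai GLM-4.5 announcement.
# Bits-per-weight and overhead values: assumptions for illustration only.
MODELS = {"GLM-4.5": 355e9, "GLM-4.5-Air": 106e9}


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal GB."""
    return params * bits_per_weight / 8 / 1e9


def fits(params: float, bits_per_weight: float,
         vram_gb: float, ram_gb: float, overhead_gb: float = 8.0) -> bool:
    """True if weights plus a flat overhead allowance fit in VRAM + RAM combined."""
    return weight_gb(params, bits_per_weight) + overhead_gb <= vram_gb + ram_gb


if __name__ == "__main__":
    for name, params in MODELS.items():
        for bpw in (3.5, 4.5, 8.5):  # hypothetical ~3-bit, ~4-bit, ~8-bit quants
            print(f"{name} @ {bpw} bpw: ~{weight_gb(params, bpw):.0f} GB")
    # Air on a 16GB GPU + 64GB RAM desktop, ~4-bit quant
    print("Air on 16+64:", fits(MODELS["GLM-4.5-Air"], 4.5, 16, 64))
    # Full GLM on a 24GB GPU + 64GB RAM desktop, ~4-bit quant
    print("Full on 24+64:", fits(MODELS["GLM-4.5"], 4.5, 24, 64))
```

Under those assumptions Air at roughly 4 bits is about 60GB, which is why 64GB of RAM plus a 16GB GPU is borderline and 96GB is comfortable. The full model at the same quant is around 200GB, hence the EPYC or Threadripper box with lots of memory channels.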