Llama-3 (Open Source) at 70B is pretty capable if you can manage to run it. I’d put it somewhere between GPT-3.5 and GPT-4.
In second place is WizardLM-2 at 7B parameters (if you are memory constrained).
You should run the largest model that you can fit completely in VRAM for maximum speed. Higher precision is better: FP32 > 16-bit > 8-bit > 4-bit > 2-bit. 8-bit is probably more than enough for most consumer/local LLM deployments, and 4-bit is worth trying if you want to experiment with the size vs. accuracy tradeoff.
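To make the "fit it in VRAM" rule concrete, here is a rough back-of-the-envelope sketch. It only counts the weights (parameters × bits / 8) and pads that with a 20% overhead factor for KV cache and activations. That overhead number is my own assumption, not a measured value, and the function names are made up for illustration, so treat the output as a first guess rather than a guarantee.

```python
# Rough VRAM estimate for model weights at a given precision.
# Assumptions (hypothetical, for illustration only):
#   - 1 GB = 1e9 bytes
#   - overhead_frac=0.2 is a guessed allowance for KV cache/activations

def weight_gb(params_billion: float, bits: int) -> float:
    """Size of the weights alone, in GB: params * bits / 8 bytes."""
    return params_billion * bits / 8


def fits_in_vram(params_billion: float, bits: int, vram_gb: float,
                 overhead_frac: float = 0.2) -> bool:
    """True if weights plus the assumed overhead fit in the given VRAM."""
    return weight_gb(params_billion, bits) * (1 + overhead_frac) <= vram_gb


def highest_precision_that_fits(params_billion: float, vram_gb: float):
    """Try 32, 16, 8, 4, 2 bits in that order; return the first that fits."""
    for bits in (32, 16, 8, 4, 2):
        if fits_in_vram(params_billion, bits, vram_gb):
            return bits
    return None  # does not fit even at 2-bit


if __name__ == "__main__":
    for size in (8, 70):
        for bits in (16, 8, 4):
            print(f"{size}B @ {bits}-bit: ~{weight_gb(size, bits):.0f} GB of weights")
    print("8B on a 24 GB card:", highest_precision_that_fits(8, 24), "bit")
    print("70B on a 48 GB setup:", highest_precision_that_fits(70, 48), "bit")
```

By this estimate a 70B model needs about 140 GB for weights at 16-bit and about 35 GB at 4-bit, which is why the 70B is hard to run locally and why the smaller models matter if you are memory constrained.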
LLM Arena is a good place to compare the different models yourself on an A/B basis. Everyone has different needs for what a model should do, from help with coding to translation, medical diagnoses, and so on.
They all have various strengths and weaknesses at present, as optimizing a model for a specific domain or task seems to make it weaker at other tasks (this is not guaranteed, it only seems to be the case).
OK, so you’ve got some good experience with LLMs, but I would be careful about characterizing yourself as an expert. Actually, the LLM-based assistant stacks we build may fit that term better ontologically, even though "expert" is presently associated strongly with terminology such as MoE (mixture of experts).
And I’m not trying to gatekeep expertise here. I’m just warning that traditional AI scholars don’t respect the minuscule niche that we’d otherwise call ourselves experts in, and for good reason. These are essentially just text predictors on steroids, and the crux of it is knowing when you can depend on them and when you can’t: learning their behaviors and what to expect from them.
Also, if you haven’t already, grab Vicuna 33B (the original from lmsys) and compare it to the models mentioned. I think you may find that it behaves surprisingly differently from other models, in a very intriguing manner. It was the first and only one to truly shock me.
Also: avoid accelerationists, for your own good. e/acc is a meme started by some 4chan robots, and the people whose opinion matters will dismiss you for associating with them, just as we’ve dismissed Altman for that and other reasons.