LLMs
Context:
Running LLMs (large language models) locally is now possible [1] thanks to the abundance of highly parallelised compute (GPUs) at affordable prices and the advances in deep learning over the past decade.
As such, even modestly powerful consumer devices, such as my M1 MacBook Pro with 8 GB of RAM, can run a small LLM. The purpose of this post is to investigate the token speed and accuracy of a variety of LLMs on my machine.
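As a rough idea of how token speed can be measured, here is a minimal sketch using llama-cpp-python; the choice of library, the model file path, and the prompt are my assumptions for illustration, not something prescribed by this post.

```python
# Minimal sketch: measure generation speed (tokens/second) locally.
# Library choice (llama-cpp-python) and the model path are assumptions.
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical path to a quantised GGUF model small enough for 8 GB of RAM.
llm = Llama(model_path="models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)

prompt = "Explain what a large language model is in one paragraph."

start = time.perf_counter()
result = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

# The completion dict reports how many tokens were generated.
generated = result["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```

Dividing generated tokens by wall-clock time gives the tokens-per-second figure compared across models later in the post.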