I wanted to experiment with running LLM models locally. I liked the idea of my prompts and data not leaving my machine, plus having the chance to experiment with different models.
At first, I assumed that I would need a powerful machine loaded down with GPUs in order to run anything. It turns out that there’s all sorts of models available, and some of the smaller ones run just fine on my Dell Windows laptop (16 gigs of RAM and the integrated graphics card).
Using Ollama, you can choose, download, and run a wide range of models on your machine.
There’s also an Ollama Cloud, where we can run models. There’s a generous free tier for us to use.

Getting Started:
You can download a program from Ollama to manage and run the models. You can get the program from the Ollama site.
If you’re running Windows, you can use winget to install it:

winget Ollama.Ollama

There’s a quickstart guide on the Ollama site, as well.

Ollama comes with a GUI to manage everything. I found it easier to manage things from the command line.

You can verify the install was successful by running the version command from the command line:

ollama --version

Selecting and Running Models:
We can browse the Ollama library to find a model that we would like to use. Most models come in different sizes. Most of the sizes will be a number followed by a b: 8b would be for 8 billion parameters. I’ve found I can run a 3b model with no problems, and running a 7b model is very, very slow, but will run. You may need to experiment to see how large of a model that you can run.
Once you find one that looks interesting, we can download and install it. For example, if we want to try the Qwen 2.5 coder model, the 3b size, we can run this command:

ollama pull qwen2.5-coder:3b

The show command will display some information on the specified model:

ollama show qwen2.5-coder:3b

We have a few choices on running the model. We can start the model and run it within our terminal window, by using the run command:

ollama run qwen2.5-coder:3b

Control + D will stop the running model.

Alternatively, we can run Ollama as a service, and call models from another program using the REST API.

ollama serve

I have a database development agent that can call various AI models, so I’ve used that with Ollama. I used the compatible OpenAI Python package to interact with the local model. Ollama is running on localhost, the default port is 11434.
Control + C will stop the service.

The rm (Remove) command will remove the model from our machine:

ollama rm

Ollama Cloud:
Back on the Ollama home page, you can create an account and run models from the Ollama cloud. The account is free, and there is a free tier of usage.
From the settings page, you can create an API key to use with your program. My Python program uses the OpenAI package for Ollama cloud, as well, since it’s compatible.
With the cloud, you obviously lose the ability to run a model locally, but you can access much more powerful models.

Links:

Medium- Arun Patidar: Running LLM Locally
Tech Insider: How to Run LLMs Locally with Ollama in 11 Steps
Machine Learning + : Ollama Tutorial: Your Guide to running LLMs Locally
Dev.to: The Complete Guide to Ollama