This article is a guide to running large language models (LLMs) using Ollama on the H100 GPUs offered by DigitalOcean. DigitalOcean GPU Droplets provide a powerful, scalable solution for AI/ML training, inference, and other compute-intensive tasks such as deep learning, high-performance computing (HPC), data analytics, and graphics rendering. Designed to handle demanding workloads, GPU Droplets enable businesses to scale AI/ML operations on demand without incurring unnecessary costs. Offering simplicity, flexibility, and affordability, DigitalOcean’s GPU Droplets ensure quick deployment and ease of use, making them ideal for developers and data scientists.
Now, with support for NVIDIA H100 GPUs, users can accelerate AI/ML development and test, deploy, and optimize their applications without the extensive setup or maintenance typically associated with traditional platforms. Ollama is an open-source tool that provides access to a diverse library of pre-trained models, installs easily across different operating systems, and exposes a local API for integration into applications and workflows. Users can customize and fine-tune LLMs, optimize performance with hardware acceleration, and benefit from interactive user interfaces for intuitive interactions.
Access to H100 GPUs: Ensure you have access to NVIDIA H100 GPUs, either through on-premise hardware or through DigitalOcean GPU Droplets.
Supported Frameworks: Familiarity with Python and basic Linux commands.
CUDA and cuDNN Installed: Ensure NVIDIA CUDA and cuDNN libraries are installed for optimal GPU performance.
Sufficient Storage and Memory: Have ample storage and memory available to handle large model datasets and weights.
Basic Understanding of LLMs: A foundational understanding of large language models and their structure to effectively manage and optimize them.
These prerequisites help ensure a smooth and efficient experience when running LLMs with Ollama on H100 GPUs.
Ollama lets you download large language models from its model library, which includes Llama 3.1, Mistral, Code Llama, Gemma, and many more. Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. It provides a flexible platform for creating, importing, and using custom or pre-existing language models, which makes it well suited to chatbots, text summarization, and similar applications. It emphasizes privacy, runs on Windows, macOS, and Linux, and is free to use. Ollama also makes it easy to deploy models locally, and it supports real-time interaction through a REST API, which suits LLM-powered web apps and tools. The workflow is similar to Docker: with Docker, we pull images from a central hub and run them in containers; with Ollama, we pull models from a library and run them locally. Ollama also lets us customize a model by writing a Modelfile. Below is an example Modelfile:
FROM llama2
# Set the temperature
PARAMETER temperature 1
# Set the system prompt
SYSTEM """
You are a helpful teaching assistant created by DO.
Answer questions asked based on Artificial Intelligence, Deep Learning.
"""
Next, create the custom model from the Modelfile and run it:
ollama create MLexp -f ./Modelfile
ollama run MLexp
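When several variants of a custom model are needed, the Modelfile text can be generated from a script instead of being written by hand. The following is a minimal sketch, not part of Ollama itself: `build_modelfile` and its parameters are hypothetical names chosen for illustration, and the sketch only emits the three instructions used in the example above.

```python
def build_modelfile(base_model: str, temperature: float, system_prompt: str) -> str:
    """Return Modelfile text with FROM, PARAMETER and SYSTEM instructions.

    Hypothetical helper for illustration; it is not an Ollama API.
    """
    if '"""' in system_prompt:
        # A triple quote inside the prompt would end the SYSTEM block early.
        raise ValueError("system prompt must not contain triple quotes")
    lines = [
        f"FROM {base_model}",
        "# Set the temperature",
        f"PARAMETER temperature {temperature}",
        "# Set the system prompt",
        f'SYSTEM """{system_prompt.strip()}"""',
    ]
    return "\n".join(lines) + "\n"


text = build_modelfile(
    base_model="llama2",
    temperature=1,
    system_prompt=(
        "You are a helpful teaching assistant created by DO. "
        "Answer questions asked based on Artificial Intelligence, Deep Learning."
    ),
)
print(text)
```

Saving the printed text to a file named `Modelfile` and passing that file to `ollama create` with the `-f` flag should give the same result as the handwritten version.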
DigitalOcean GPU Droplets offer a simple, flexible, and cost-effective solution for your AI/ML workloads. These scalable machines are ideal for reliably running training and inference tasks on AI/ML models. DigitalOcean GPU Droplets are also well suited to high-performance computing (HPC) tasks, making them a versatile choice for use cases including simulation, data analysis, and scientific computing. Try GPU Droplets now by signing up for a DigitalOcean account. Watch the video to learn the steps to create a GPU Droplet.
To run Ollama efficiently, a GPU is strongly recommended, and this guide uses an NVIDIA GPU. On a CPU alone, users can expect slow responses.
Ollama is compatible with Windows, macOS, and Linux. Here we use Linux commands, as our GPU Droplets run a Linux OS.
Run the command below in your terminal to check the GPU specifications.
nvidia-smi
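The same check can be scripted, which is handy when a setup script needs to confirm that a GPU is visible before pulling a large model. The sketch below assumes `nvidia-smi` is on the `PATH` and uses its `--query-gpu` and `--format` options; the helper names `parse_gpu_lines` and `list_gpus` are hypothetical, and the sample line is illustrative rather than captured from a real Droplet.

```python
import shutil
import subprocess

# Ask nvidia-smi for one CSV line per GPU: "<name>, <total memory>".
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]


def parse_gpu_lines(output: str) -> list[tuple[str, str]]:
    """Split each 'name, memory' CSV line into a (name, memory) tuple."""
    gpus = []
    for line in output.splitlines():
        if "," not in line:
            continue  # skip blank or unexpected lines
        name, memory = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, memory))
    return gpus


def list_gpus() -> list[tuple[str, str]]:
    """Run nvidia-smi if it is installed; return an empty list otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    return parse_gpu_lines(result.stdout) if result.returncode == 0 else []


# Illustrative line only; real output depends on the driver and the card.
sample = "NVIDIA H100 80GB HBM3, 81559 MiB\n"
print(parse_gpu_lines(sample))
print(list_gpus())
```

An empty list from `list_gpus()` suggests that the driver is missing or that no GPU is attached, which is worth resolving before installing Ollama.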
Next, install Ollama from the same terminal.
curl -fsSL https://ollama.com/install.sh | sh
This starts the Ollama installation immediately.
Once the installation is done, we can pull any LLM, such as Llama 3.1, Phi 3, Mistral, or Gemma 2, and start working with it.
To run and chat with a model, use the commands below, replacing the model name as needed. Running a model with Ollama is straightforward, and because we are using the powerful H100, responses are generated quickly.
ollama run example_model
ollama run qwen2:7b
If you see the error "could not connect to ollama app, is it running?", use the commands below to enable and start the Ollama service:
sudo systemctl enable ollama
sudo systemctl start ollama
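To confirm from a script that the service is up, one option is to probe the local HTTP endpoint. This is a minimal sketch under the assumption that Ollama is listening on its default address, `http://localhost:11434`; `ollama_is_running` is a hypothetical helper name, and it only checks that something answers with HTTP 200 at that address.

```python
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def ollama_is_running(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts and HTTP error responses.
        return False


if ollama_is_running():
    print("Ollama server is reachable")
else:
    print("Ollama server is not reachable; try: sudo systemctl start ollama")
```

If the check fails even after starting the service, the service logs are the next place to look.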
Ollama supports a wide range of models. Here are some examples that can be downloaded and used:
Model | Parameters | Size | Download |
---|---|---|---|
Llama 3.1 | 8B | 4.7GB | ollama run llama3.1 |
Llama 3.1 | 70B | 40GB | ollama run llama3.1:70b |
Llama 3.1 | 405B | 231GB | ollama run llama3.1:405b |
Phi 3 Mini | 3.8B | 2.3GB | ollama run phi3 |
Phi 3 Medium | 14B | 7.9GB | ollama run phi3:medium |
Gemma 2 | 27B | 16GB | ollama run gemma2:27b |
Mistral | 7B | 4.1GB | ollama run mistral |
Code Llama | 7B | 3.8GB | ollama run codellama |
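The sizes in the table give a rough way to judge which models fit in GPU memory. The sketch below compares each download size with the 80 GB of memory on a single H100. It is only a first approximation: the `headroom` factor of 0.9 is an assumption made for illustration, `fits_on_gpus` is a hypothetical helper, and real memory use also depends on context length and runtime overhead.

```python
H100_VRAM_GB = 80  # memory of a single H100 GPU

# Download sizes in GB, copied from the table above.
MODEL_SIZES_GB = {
    "llama3.1": 4.7,
    "llama3.1:70b": 40,
    "llama3.1:405b": 231,
    "phi3": 2.3,
    "phi3:medium": 7.9,
    "gemma2:27b": 16,
    "mistral": 4.1,
    "codellama": 3.8,
}


def fits_on_gpus(size_gb: float, gpu_count: int = 1, headroom: float = 0.9) -> bool:
    """Rough check: do the weights fit in a fraction of the total GPU memory?

    headroom is an assumed safety margin, not a value taken from Ollama.
    """
    return size_gb <= gpu_count * H100_VRAM_GB * headroom


for tag, size in MODEL_SIZES_GB.items():
    verdict = "fits" if fits_on_gpus(size) else "does not fit"
    print(f"{tag}: {size} GB {verdict} on one H100")
```

By this estimate every model in the table except the 405B variant fits on one H100, while the 405B variant needs several GPUs.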
With Ollama, users can run LLMs without an internet connection once a model has been downloaded, because the model and its dependencies are stored locally.
>>> Write a python code for a fibonacci series.
def fibonacci(n):
    """
    This function prints the first n numbers of the Fibonacci sequence.
    Parameters:
    @param n (int): The number of elements in the Fibonacci sequence to print.
    Returns:
    None
    """
    # Initialize the first two numbers of the Fibonacci sequence.
    a, b = 0, 1
    # Iterate over the range and generate Fibonacci sequence.
    for i in range(n):
        print(a)
        # Update the next number in the sequence
        a, b = b, a + b
# Test function with first 10 numbers of the Fibonacci sequence.
if __name__ == "__main__":
    fibonacci(10)
This python code defines a simple `fibonacci` function that takes an integer argument and prints the first n numbers in the Fibonacci sequence. The Fibonacci sequence starts with 0 and 1, and each subsequent number is the sum of the previous two.
The `if __name__ == "__main__":` block at the end tests this function by calling it with a parameter value of 10, which prints out the first 10 numbers in the Fibonacci sequence.
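The same prompt can also be sent through Ollama's local REST API rather than the interactive prompt. The sketch below assumes the server is at its default address and uses the `/api/generate` endpoint with the `model`, `prompt`, and `stream` fields; with `stream` set to false, the reply is a single JSON object whose `response` field holds the generated text. The helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request

GENERATE_URL = "http://localhost:11434/api/generate"  # default local endpoint


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for the generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the text from the 'response' field."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


try:
    print(generate("qwen2:7b", "Write a python code for a fibonacci series."))
except (OSError, ValueError, KeyError) as error:
    # Raised when the server is down, the model is missing, or the reply is unexpected.
    print(f"Could not get a response from Ollama: {error}")
```

Because the model is sampled, the generated code will usually differ in wording from the transcript above.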
Ollama is a generative AI tool for working with large language models locally, offering enhanced privacy, customization, and offline accessibility. It has made working with LLMs simpler, and by letting users explore and experiment with open-source LLMs directly on their own machines, Ollama promotes innovation and a deeper understanding of AI. To access a powerful GPU like the H100, consider using DigitalOcean’s GPU Droplets. DigitalOcean’s GPU Droplets are currently in Early Availability.
For getting started with Python, we recommend checking out this beginner’s guide to set up your system and prepare for running introductory tutorials.