Deploying NVIDIA NIM Microservice in a Private Environment

This guide explains how to securely and efficiently deploy LLMs such as llama-3.2-3b-instruct in a private environment using the NVIDIA NIM microservice on PAASUP DIP. The setup supports both Python code integration and no-code workflows with Flowise, making it suitable for a wide range of enterprise AI applications.

Introduction

With the rapid expansion of AI model usage, safely and efficiently deploying and operating AI models in enterprise environments has become a critical challenge. NVIDIA NIM (NVIDIA Inference Microservice) is a container-based microservice designed to meet these requirements.

In this post, we'll walk through the step-by-step process of deploying and running the llama-3.2-3b-instruct model in a private environment using the NVIDIA NIM catalog of PAASUP DIP (Data Intelligence Platform).

Table of Contents

  1. What is NVIDIA NIM?
  2. Deployment Process
  3. Key Advantages
  4. Conclusion

What is NVIDIA NIM?

NVIDIA NIM is a container-based inference microservice optimized for NVIDIA GPUs, designed to simplify the AI inference workflow and maximize performance. It standardizes the complex AI model deployment process, helping developers build AI services more easily.

Deployment Process

Step 1: Copy NIM Microservice Container Image Pull Path

First, locate the container image for the required model in the NVIDIA NGC catalog and copy its pull path.

Access NGC Catalog and Select Model

  1. Visit https://catalog.ngc.nvidia.com
  2. Click on Containers menu
  3. Check the NVIDIA NIM filter to display only NIM-related containers
    Screenshot 2025-08-21 160059.png
  4. Search and select the desired model (here llama-3.2-3b-instruct)
    Screenshot 2025-08-21 162553.png
  5. Navigate to the Tags tab in the model card and copy the container image pull path
    Screenshot 2025-08-21 155947.png

Step 2: Create NIM Catalog in PAASUP DIP

Configure a catalog to create the NIM service in the DIP environment.

Basic Configuration

  • Catalog Version: Enter the desired version
  • Catalog Name: Enter an identifiable name

Service Model Configuration

CASE 1: When the catalog deployment environment is connected to the internet
Fill in the fields below using the container image pull path copied in Step 1:

  • Model repository: nvcr.io/nim/meta/llama-3.2-3b-instruct (the pull path without the version tag)
  • Model repository tag: 1.8.3 (the version tag from the pull path)
  • NGC API KEY: Enter your NVIDIA NGC API key
  • Model name: meta/llama-3.2-3b-instruct (the model name from the pull path)

CASE 2: When the catalog deployment environment is not connected to the internet
Download the container image in an internet-connected environment, then import it into the DIP environment before deploying the catalog (see the sketch after the list below):

  • Model repository: nvcr.io/nim/meta/llama-3.2-3b-instruct
  • Model repository tag: 1.8.3
  • NGC API KEY: Enter any placeholder value; the key is not used in an offline deployment
  • Model name: meta/llama-3.2-3b-instruct
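
The exact import mechanism depends on how your DIP environment sources images (for example, loading images onto cluster nodes or pushing to an internal registry). As a minimal sketch using standard Docker commands, assuming you move the saved archive to the offline environment by hand:

# On an internet-connected machine: log in to NGC and pull the NIM image
# (the docker login username is $oauthtoken; the password is your NGC API key)
docker login nvcr.io
docker pull nvcr.io/nim/meta/llama-3.2-3b-instruct:1.8.3

# Save the image to a portable archive and transfer it to the offline environment
docker save -o llama-3.2-3b-instruct-1.8.3.tar nvcr.io/nim/meta/llama-3.2-3b-instruct:1.8.3

# In the DIP environment: load the archive so the catalog can use the image
docker load -i llama-3.2-3b-instruct-1.8.3.tar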

After completing the configuration, click the Create button to create the catalog.
Screenshot 2025-08-21 164405.png

Step 3: Call NIM Microservice from Jupyter Lab

Now you can call and use the created NIM service with Python code.

Access Jupyter Lab

  1. Click Kubeflow link in PAASUP DIP portal
  2. Select Notebooks in Kubeflow
  3. Access Jupyter Lab
    Screenshot 2025-08-21 171410.png

Write and Execute Python Code

When writing the Python code, set base_url to the NIM catalog's domain. The api_key value is not used in a private environment, but it is a required parameter for the OpenAI client.
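
Before running the chat loop, it is worth confirming that the endpoint is reachable. The following minimal check lists the models the service exposes through its OpenAI-compatible API; it assumes the same catalog domain as the main example:

from openai import OpenAI

# Configure the client against the NIM endpoint
client = OpenAI(
    base_url="http://demo01-nimtest01.demo01-nimtest01.svc.cluster.local:8000/v1",
    api_key="not-used"  # placeholder; ignored in a private environment
)

# The deployed model (meta/llama-3.2-3b-instruct) should appear in the output
for model in client.models.list():
    print(model.id)

With the endpoint verified, the full chat example follows: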

from openai import OpenAI

# --- Client Setup ---
# Configure NIM microservice endpoint
client = OpenAI(
    base_url = "http://demo01-nimtest01.demo01-nimtest01.svc.cluster.local:8000/v1",
    api_key = "[YOUR-API-KEY]"  # Not actually used in private environment
)

# --- Conversation History ---
messages = [
    {"role": "system", "content": "You are a helpful assistant."}
]

print("Chatbot initialized. Type 'exit' to end the conversation.")
print("-" * 20)

# --- Main Chat Loop ---
while True:
    try:
        # 1. Get user input
        user_input = input("You: ")

        # 2. Check for exit command
        if user_input.lower() in ["exit", "quit"]:
            print("Exiting chat. Goodbye!")
            break

        # 3. Add user message to conversation history
        messages.append({"role": "user", "content": user_input})

        # 4. Send entire conversation history to model
        completion = client.chat.completions.create(
            model="meta/llama-3.2-3b-instruct",  
            messages=messages,
            temperature=0.5,
            top_p=1,
            max_tokens=1024,
            stream=True
        )

        # 5. Stream model response output
        print("Assistant: ", end="")
        assistant_response = ""
        for chunk in completion:
            if chunk.choices[0].delta.content is not None:
                content = chunk.choices[0].delta.content
                print(content, end="", flush=True)
                assistant_response += content
        
        # 6. Add model response to history
        messages.append({"role": "assistant", "content": assistant_response})
        
        print()  # Line break for clean formatting

    except KeyboardInterrupt:
        print("\nExiting chat. Goodbye!")
        break
    except Exception as e:
        print(f"\nAn error occurred: {e}")
        break

Execute the above code to start chatting:
Screenshot 2025-08-26 175427.png

Step 4: Call NIM Microservice from Flowise

To visually configure AI flows without writing code, you can utilize Flowise.

What is Flowise?

Flowise is a no-code tool that lets you quickly build chatbots and AI agents by visually composing LangChain flows.
Flowise is provided as a catalog in PAASUP DIP.

Flowise Configuration Process

  1. Click Flowise link in DIP Catalog

  2. Select Chatflows > + Add New

  3. Configure ChatLocalAI Component:

    • Base Path: Enter NIM catalog's domain information
    • Model Name: Enter model name to use
  4. Add Buffer Memory Component and Conversation Chain Component

  5. Connect each component to Conversation Chain
    Screenshot 2025-08-21 173427.png

  6. Save and click chat icon to start chatting
    Screenshot 2025-08-26 175255.png
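
Once the chatflow is saved, it can also be called programmatically. Flowise exposes a REST prediction endpoint per chatflow; the sketch below assumes a hypothetical Flowise domain, and the chatflow ID placeholder should be replaced with the ID shown in the Flowise UI:

import requests

# Hypothetical Flowise domain; the chatflow ID comes from the Flowise UI
url = "http://flowise.example.internal/api/v1/prediction/<your-chatflow-id>"

response = requests.post(url, json={"question": "What is NVIDIA NIM?"})
print(response.json())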

Key Advantages

1. Security in Private Environment

  • Operates fully isolated within the internal environment, with no dependence on external APIs
  • No risk of sensitive data leakage to external sources

2. Performance Optimization

  • Provides high inference performance with optimizations specialized for NVIDIA GPUs
  • Ensures consistent performance and stability through container-based approach

3. Development Convenience

  • Enables reuse of existing code with OpenAI API-compatible interface
  • Rapid prototyping through no-code tools like Flowise

4. Scalability

  • Supports automatic scaling in Kubernetes environment
  • Flexible system configuration with microservice architecture

Conclusion

By building the NVIDIA NIM microservice in the PAASUP DIP environment, we confirmed that the latest AI models can be used safely and efficiently even in private environments.

In particular, by presenting two approaches, direct API calls through Python code and no-code flows through Flowise, this guide offers the flexibility to meet a wide range of user requirements.

As AI technology continues to advance and security requirements strengthen, such private AI deployment methods are expected to become essential technologies in enterprise environments. We encourage you to build AI services that satisfy both security and performance using optimized solutions like NVIDIA NIM.
