Custom Rest Backend

This tutorial guides you through integrating a custom backend with k8sgpt over a RESTful API. This setup is particularly useful when you want to connect a Retrieval-Augmented Generation (RAG) pipeline or an AI agent to k8sgpt. In this tutorial, we will store a CNCF Q&A dataset in a vector database for knowledge retrieval, build a simple RAG application on top of it, and integrate that application with k8sgpt.

Prerequisites

  • K8sGPT CLI
  • Go 1.22 or higher
  • langchaingo library for building RAG applications
  • gin for handling RESTful APIs in Go
  • Qdrant vector database for storing and searching through knowledge bases
  • Ollama service to run large language models

Writing a simple RAG backend

Setup

Let's start by creating a simple Go project.

mkdir -p custom-backend
cd custom-backend
go mod init github.com/<username>/custom-backend

Install necessary dependencies for the RAG application and RESTful API:

go get -u github.com/tmc/langchaingo
go get -u github.com/gin-gonic/gin

Once we have this structure, let's create a simple main.go file with the following content:

// main.go
package main

import (
    "context"
    "fmt"
    "net/http"
    "net/url"
    "strings"
    "time"

    "github.com/gin-gonic/gin"
    "github.com/tmc/langchaingo/embeddings"
    "github.com/tmc/langchaingo/llms"
    "github.com/tmc/langchaingo/llms/ollama"
    "github.com/tmc/langchaingo/vectorstores"
    "github.com/tmc/langchaingo/vectorstores/qdrant"
)

var (
    ollamaURL  = "http://localhost:11434"
    listenAddr = ":8090"
)

func main() {
    server := gin.Default()
    // /completion receives a CustomRestRequest from k8sgpt, runs the RAG
    // pipeline, and returns a CustomRestResponse
    server.POST("/completion", func(c *gin.Context) {
        var req CustomRestRequest
        if err := c.ShouldBindJSON(&req); err != nil {
            c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
            return
        }
        content, err := rag(ollamaURL, req)
        if err != nil {
            c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
            return
        }
        resp := CustomRestResponse{
            Model:     req.Model,
            CreatedAt: time.Now(),
            Response:  content,
        }
        c.JSON(http.StatusOK, resp)
    })
    // start the backend server
    if err := server.Run(listenAddr); err != nil {
        fmt.Printf("Error: %v\n", err)
    }
}

This basic implementation sets up a RESTful API endpoint, /completion, that receives a CustomRestRequest from k8sgpt and returns a CustomRestResponse. The rag function, implemented below, handles the RAG logic. The request and response structures are as follows:

type CustomRestRequest struct {
    Model string `json:"model"`

    // Prompt is the textual prompt to send to the model.
    Prompt string `json:"prompt"`

    // Options lists model-specific options. For example, temperature can be
    // set through this field, if the model supports it.
    Options map[string]interface{} `json:"options"`
}

type CustomRestResponse struct {
    // Model is the model name that generated the response.
    Model string `json:"model"`

    // CreatedAt is the timestamp of the response.
    CreatedAt time.Time `json:"created_at"`

    // Response is the textual response itself.
    Response string `json:"response"`
}

Implementing a simple RAG

Now, we will build the RAG pipeline using langchaingo. The RAG application will query a knowledge base stored in Qdrant and use a large language model served by Ollama to generate responses. First, ensure that you have Ollama and Qdrant running locally.

# run Ollama
ollama run llama3.1

# run Qdrant
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

We can download the CNCF Q&A dataset from Hugging Face and then load it into Qdrant using the Python script below.

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.document_loaders import CSVLoader
from langchain_qdrant import QdrantVectorStore

embeddings = OllamaEmbeddings(base_url="http://localhost:11434", model="llama3.1")
loader = CSVLoader(file_path='./cncf_qa.csv', csv_args={
    'delimiter': ',',
    'quotechar': '"',
    'fieldnames': ['Question', 'Answer', 'Project', 'Filename', 'Subcategory', 'Category']
})
data = loader.load()
qdrant = QdrantVectorStore.from_documents(
    data,
    embeddings,
    url="localhost:6333",
    prefer_grpc=False,
    collection_name="my_documents",
)


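Before moving on, you can quickly confirm that the documents are searchable from Go. The snippet below is an optional, standalone check rather than part of the backend (keep it in its own directory so its main package doesn't clash with the service); it reuses the my_documents collection and the page_content payload key from the loader script, and the query string is just an example.

// verify_ingest.go (optional sanity check)
package main

import (
    "context"
    "fmt"
    "log"
    "net/url"

    "github.com/tmc/langchaingo/embeddings"
    "github.com/tmc/langchaingo/llms/ollama"
    "github.com/tmc/langchaingo/vectorstores/qdrant"
)

func main() {
    // reuse the same Ollama model for embeddings as during ingestion
    llm, err := ollama.New(ollama.WithServerURL("http://localhost:11434"), ollama.WithModel("llama3.1"))
    if err != nil {
        log.Fatal(err)
    }
    embedder, err := embeddings.NewEmbedder(llm)
    if err != nil {
        log.Fatal(err)
    }
    qdrantURL, err := url.Parse("http://localhost:6333")
    if err != nil {
        log.Fatal(err)
    }
    store, err := qdrant.New(
        qdrant.WithURL(*qdrantURL),
        qdrant.WithCollectionName("my_documents"),
        qdrant.WithEmbedder(embedder),
        qdrant.WithContentKey("page_content"),
    )
    if err != nil {
        log.Fatal(err)
    }
    // run a sample similarity search against the ingested Q&A data
    docs, err := store.SimilaritySearch(context.Background(), "How do I debug a CrashLoopBackOff?", 3)
    if err != nil {
        log.Fatal(err)
    }
    for _, doc := range docs {
        fmt.Println(doc.PageContent)
    }
}
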
Next, implement the RAG pipeline logic.

func rag(serverURL string, req CustomRestRequest) (string, error) {
    model := req.Model
    llm, err := ollama.New(ollama.WithServerURL(serverURL), ollama.WithModel(model))
    if err != nil {
        return "", err
    }

    embedder, err := embeddings.NewEmbedder(llm)
    if err != nil {
        return "", err
    }

    qdrantURL, err := url.Parse("http://localhost:6333")
    if err != nil {
        return "", err
    }

    // create a Qdrant vector store client
    store, err := qdrant.New(
        qdrant.WithURL(*qdrantURL),
        qdrant.WithCollectionName("my_documents"),
        qdrant.WithEmbedder(embedder),
        qdrant.WithContentKey("page_content"),
    )
    if err != nil {
        return "", err
    }

    optionsVector := []vectorstores.Option{
        vectorstores.WithScoreThreshold(0.6),
    }

    retriever := vectorstores.ToRetriever(store, 10, optionsVector...)
    // the analyzer error text arrives under the "message" option;
    // fall back to the prompt if it is missing
    errMessage, ok := req.Options["message"].(string)
    if !ok {
        errMessage = req.Prompt
    }
    // search local knowledge
    resDocs, err := retriever.GetRelevantDocuments(context.Background(), errMessage)
    if err != nil {
        return "", err
    }

    // get content
    x := make([]string, len(resDocs))
    for i, doc := range resDocs {
        x[i] = doc.PageContent
    }

    // generate the final answer with the LLM
    ragPromptTemplate := `Based on the following context: %s;
    Please generate a response to the query below. Do not repeat the context in the response. If the context is empty, answer using the model's own knowledge and capabilities:
    %s`
    prompt := fmt.Sprintf(ragPromptTemplate, strings.Join(x, "; "), req.Prompt)
    ctx := context.Background()
    completion, err := llms.GenerateFromSinglePrompt(ctx, llm, prompt)
    if err != nil {
        return "", err
    }
    fmt.Println("Error: "+errMessage, "Answer: "+completion)
    return completion, err
}

Testing it out

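Before wiring the service into K8sGPT, start it with go run . and send a request straight to the /completion endpoint to make sure it responds. The small client below is only an illustrative sketch: the model, prompt, and "message" values are made-up examples, but the payload mirrors the CustomRestRequest defined earlier, and the "message" option is the field the rag function reads. Keep it in its own directory so it doesn't clash with the service's main package.

// client.go (throwaway test client)
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

func main() {
    // build a request shaped like CustomRestRequest
    payload, err := json.Marshal(map[string]interface{}{
        "model":  "llama3.1",
        "prompt": "Simplify the following Kubernetes error and suggest a fix.",
        "options": map[string]interface{}{
            "message": "Back-off restarting failed container",
        },
    })
    if err != nil {
        log.Fatal(err)
    }

    resp, err := http.Post("http://localhost:8090/completion", "application/json", bytes.NewReader(payload))
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // decode the CustomRestResponse and print the generated answer
    var out struct {
        Model    string `json:"model"`
        Response string `json:"response"`
    }
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        log.Fatal(err)
    }
    fmt.Println(out.Response)
}
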
To test this with K8sGPT, we need to add a customrest AI backend configuration pointing to this RAG service. We can do this by running the following command:

./k8sgpt auth add --backend customrest --baseurl http://localhost:8090/completion --model llama3.1

This will add the custom RAG service to the list of available backends in the K8sGPT CLI. To explain the analysis results using the custom RAG pipeline, we can run the following command:

./k8sgpt analyze --backend customrest --explain 

What's next?

Now that you've got the basics of how to write a custom AI backend, you can extend this to use private datasets for knowledge retrieval. You can also build more complex AI pipelines that explain the results obtained from analyzers and provide more detailed recommendations.