Search

The AI search feature is integrated with the documentation portal. It helps find answers not only by exact keyword matches, but also by meaning — taking context and phrasing similarity into account.

Requirements

OpenAI API-compatible inference server:

  • Cloud services: OpenAI, DeepSeek.

  • Local or open-source models: LM Studio, Llama.cpp, vLLM, or another server that supports the OpenAI API.

Two models are used for search:

  1. Embedding model — converts articles into vector representations (embeddings), which are stored in a vector database.

  2. Chat model — generates the final relevant answer for the user based on found articles.

How it works

  1. Catalogs in the documentation portal are split into chunks. These chunks are converted into embeddings and stored in a vector database (Qdrant).

  2. A user enters a query in the Gramax reading portal.

  3. Gramax sends the query to the docportal-ai server.

  4. The docportal-ai AI server processes the query and searches for matches in the vector DB.

  5. Based on found matches, the chat model generates a coherent and clear answer.

  6. The answer is returned to the Gramax documentation portal interface, where the user reads it.

The docportal-ai server is a middleware layer connecting the documentation portal, vector DB, and models. It can be scaled and used for multiple portals (see the AI_INSTANCE_NAME parameter).

Setup

  1. Deploy Docker Compose with the search server as described here — https://hub.docker.com/r/gramax/docportal-ai.

    Example docker-compose.yml:

    version: "3.8" services: db: image: qdrant/qdrant environment: - QDRANT__SERVICE__MAX_REQUEST_SIZE_MB=512 ports: - 3006:6333 volumes: - server-ai-qdrant-storage:/qdrant/storage docportal-ai: image: gramax/docportal-ai environment: - VECTORDB__TYPE=qdrant - VECTORDB__HOST=http://db:6333 - EMBEDDING__TYPE=openai - EMBEDDING__MODEL=text-embedding-3-small - EMBEDDING__DIMENSIONS=1536 - EMBEDDING__APIKEY=<your API key> - CHAT__TYPE=openai - CHAT__MODEL=gpt-4o - CHAT__APIKEY=<your API key> - AUTH__ADMIN__TOKEN=<token for Gramax> ports: - 3005:3005 volumes: - server-ai-api-storage:/app/storage - ./config.yaml:/app/config/config.yaml depends_on: - db volumes: server-ai-qdrant-storage: external: false server-ai-api-storage: external: false
  2. In docker-compose.yaml of the documentation portal, add environment variables to connect to the AI service:

    • AI_SERVER_URL — URL of the docportal-ai service. Do not include a trailing slash (/).
      Example: https://ai.docportal.local.

    • AI_TOKEN — authorization token set when launching docportal-ai.

    • AI_INSTANCE_NAME — unique identifier of your portal. One docportal-ai instance can serve multiple portals.

  3. Restart docker-compose.yaml.

Streaming Search Responses

For streaming to work correctly, you need to properly configure a proxy that will proxy requests to the LLM service.

Example for Angie

upstream vm-ai-gramax { zone http:vm-ai-gramax 1m; server <ip>:<port>; } server { listen <ip>:80; server_name ai-server.gramax www.ai-server.gramax; return 301 https://$server_name$request_uri; } server { listen <ip>:443 ssl; http2 on; server_name ai-server.gramax www.ai-server.gramax; status_zone https:ai-server.gramax; include /etc/angie/http.d/common/ssl-gram-ax.conf; include /etc/angie/http.d/common/noindex_robots.conf; include /etc/angie/http.d/common/error_pages.conf; location / { include /etc/angie/http.d/common/reverse-proxy.conf; client_max_body_size 500m; proxy_http_version 1.1; proxy_pass http://vm-ai-gramax; proxy_buffering off; proxy_cache off; proxy_read_timeout 3600s; proxy_send_timeout 3600s; tcp_nodelay on; } }