Hierarchy Data Guide

Overview

The Haya Routing Service organizes routing sources in a level-based hierarchy. The hierarchy supports any number of levels. A node's level determines which fields it carries and how it behaves; its type is only descriptive metadata.

Note: This guide explains the hierarchy structure. For instructions on updating hierarchy data, see the Configuration Guide.


Core Concepts

Level-Based Architecture

Key Principle: Level determines fields and behavior; type is just metadata.

  • Level 0 (Root): Services - Top-level routing sources
  • Level 1 (Intermediate): Categories - Groupings within services
  • Level 2 (Leaf): Collections - Final routing destinations

The system is not limited to 3 levels - it can be extended to support more levels as needed.

Level Configuration

Each level has a configuration that defines:

  • Fields: What fields are available at this level
  • Parent Levels: Which levels can be parents (null = root level)
  • Embedding Method: How embeddings are generated
  • Matching Strategy: How semantic matching works

// Level 0 (Service/Root)
{
  level: 0,
  type: 'Service',
  fields: ['service_type', 'service_context'],
  parentLevels: null,  // Root - no parent
  embeddingMethod: 'service',
  matchingStrategy: 'service-level'
}

// Level 1 (Category)
{
  level: 1,
  type: 'Category',
  fields: ['category_context'],
  parentLevels: [0],  // Must have level 0 parent
  embeddingMethod: 'category',
  matchingStrategy: 'category-level'
}

// Level 2 (Collection)
{
  level: 2,
  type: 'Collection',
  fields: ['collection_context', 'routing_info'],
  parentLevels: [0, 1],  // Can have level 0 or 1 parent
  embeddingMethod: 'collection',
  matchingStrategy: 'collection-level'
}
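
The parentLevels rule can be checked mechanically before a node is accepted. The following is a minimal sketch, not the service's actual validation code: the LevelConfig interface, the LEVEL_CONFIGS table, and the isValidParent helper are hypothetical names that simply mirror the three configurations listed above.

```typescript
// Hypothetical shape mirroring the level configurations listed above.
interface LevelConfig {
  level: number;
  type: string;
  fields: string[];
  parentLevels: number[] | null; // null marks a root level
}

const LEVEL_CONFIGS: LevelConfig[] = [
  { level: 0, type: 'Service', fields: ['service_type', 'service_context'], parentLevels: null },
  { level: 1, type: 'Category', fields: ['category_context'], parentLevels: [0] },
  { level: 2, type: 'Collection', fields: ['collection_context', 'routing_info'], parentLevels: [0, 1] },
];

// Returns true when a node at `childLevel` may sit under a parent at `parentLevel`.
// Pass null as `parentLevel` for a node that has no parent.
function isValidParent(childLevel: number, parentLevel: number | null): boolean {
  const config = LEVEL_CONFIGS.find((c) => c.level === childLevel);
  if (!config) return false; // unknown level: reject
  if (config.parentLevels === null) return parentLevel === null; // root nodes must have no parent
  return parentLevel !== null && config.parentLevels.includes(parentLevel);
}
```

Under this sketch, a Collection may attach to a Service or a Category, while a Category without a parent is rejected. Supporting a further level would only require one more entry in the table.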

Hierarchy Node Structure

Base Hierarchy Node

All nodes share these common fields:

{
  id: string;                    // Unified identifier
  type: 'Service' | 'Category' | 'Collection';
  level: number;                 // 0, 1, or 2 (or more)
  tenant_id: string;            // Multi-tenant isolation
  app_id: string;               // Multi-app isolation
  parent_id: string | null;     // Parent node ID (null for root)
  name: string;                  // Node name
  description: string;           // Node description
  status: 'active' | 'inactive' | 'maintenance';
  embeddings: HierarchyNodeEmbeddings;
  created_at: Date;
  updated_at: Date;
  created_by?: string;
  metadata?: Record<string, any>;
  
  // Role-based access control
  allowed_roles?: string[];
  denied_roles?: string[];
  role_conditions?: RoleCondition[];
  
  // Dynamic configuration
  final_route_source?: boolean;  // Can be used directly (no need to traverse deeper)
  embedding_config?: NodeEmbeddingConfig;
  query_synonyms?: Record<string, string>;  // Level-based synonyms
}

Level 0 (Service/Root) Fields

{
  service_type: 'SEARCH' | 'SQL' | 'REST' | 'MCP';
  service_context: {
    detailed_description?: string;
    use_cases?: UseCase[];
    capabilities?: Capabilities;
    domain_context?: DomainContext;
    query_patterns?: QueryPatterns;
    routing_keywords?: {
      boost_keywords?: string[];
      penalty_keywords?: string[];
      boost_value?: number;      // Default: 0.3
      penalty_value?: number;    // Default: -0.2
    };
    query_intent_boosts?: {
      documentation?: number;
      data_query?: number;
      api_call?: number;
      mcp_config?: number;
    };
    // ... other context fields
  };
}
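
To make the routing_keywords defaults concrete, here is a minimal sketch of how a boost or penalty might be applied to a similarity score. Only the field names and the default values (0.3 and -0.2) come from the structure above. The applyKeywordAdjustments function, the case-insensitive substring matching, the "apply each group at most once" rule, and the clamping to [0, 1] are all assumptions made for illustration.

```typescript
interface RoutingKeywords {
  boost_keywords?: string[];
  penalty_keywords?: string[];
  boost_value?: number;   // Default: 0.3
  penalty_value?: number; // Default: -0.2
}

// Adjusts a raw similarity score using keyword hits in the query.
// Assumptions: substring matching, each group applied at most once, result clamped to [0, 1].
function applyKeywordAdjustments(
  query: string,
  similarity: number,
  keywords: RoutingKeywords = {},
): number {
  const q = query.toLowerCase();
  const hit = (words?: string[]): boolean =>
    (words ?? []).some((w) => q.includes(w.toLowerCase()));

  let score = similarity;
  if (hit(keywords.boost_keywords)) score += keywords.boost_value ?? 0.3;
  if (hit(keywords.penalty_keywords)) score += keywords.penalty_value ?? -0.2;
  return Math.min(1, Math.max(0, score));
}
```

Because penalty_value is documented as a negative number, the sketch adds it rather than subtracting it.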

Level 1 (Category) Fields

{
  category_context: {
    purpose: string;
    description: string;
    typical_queries?: string[];
    keywords?: string[];  // Simple keyword array
  };
}

Level 2 (Collection) Fields

{
  collection_context: {
    purpose: string;
    description: string;
    document_types?: string[];
    content_characteristics?: ContentCharacteristics;
  };
  routing_info: {
    connection_type: 'vector_db' | 'sql' | 'rest' | 'mcp';
    connection_details: Record<string, any>;
    filters?: Record<string, any>;
  };
}

Hierarchy Data Flow

1. Registration

Nodes are registered via the hierarchy registry service:

// Register a root node (Service)
await hierarchyRegistry.registerNode({
  id: 'document-search-service',
  type: 'Service',
  level: 0,
  tenant_id: 'tenant-001',
  app_id: 'app-001',
  parent_id: null,
  name: 'Document Search Service',
  description: '...',
  service_type: 'SEARCH',
  service_context: { ... }
});

// Register a category (Level 1)
await hierarchyRegistry.registerNode({
  id: 'troubleshooting-category',
  type: 'Category',
  level: 1,
  tenant_id: 'tenant-001',
  app_id: 'app-001',
  parent_id: 'document-search-service',
  name: 'Troubleshooting Guides',
  category_context: { ... }
});

// Register a collection (Level 2)
await hierarchyRegistry.registerNode({
  id: 'network-troubleshooting',
  type: 'Collection',
  level: 2,
  tenant_id: 'tenant-001',
  app_id: 'app-001',
  parent_id: 'troubleshooting-category',
  name: 'Network Troubleshooting',
  collection_context: { ... },
  routing_info: { ... }
});

2. Storage

Nodes are stored in Qdrant (vector database):

  • Collection: hierarchy_nodes
  • Vector: Combined embedding of all node fields
  • Payload: All node metadata (complex fields are serialized as JSON strings)
  • Point ID: UUID generated from node ID
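
The "JSON strings for complex fields" convention can be illustrated with a small round trip. This is a sketch under assumptions: toPayload and fromPayload are hypothetical helpers, and the rule "primitives stay as they are, objects and arrays are stringified" is one plausible reading of the payload description above, not a confirmed detail of the service.

```typescript
type Payload = Record<string, string | number | boolean | null>;

// Flattens a node into a payload: primitives are kept, objects and arrays become JSON strings.
function toPayload(node: Record<string, unknown>): Payload {
  const payload: Payload = {};
  for (const [key, value] of Object.entries(node)) {
    if (value === undefined) continue; // skip unset optional fields
    if (
      value === null ||
      typeof value === 'string' ||
      typeof value === 'number' ||
      typeof value === 'boolean'
    ) {
      payload[key] = value;
    } else {
      payload[key] = JSON.stringify(value);
    }
  }
  return payload;
}

// Reverses toPayload for the keys known to hold JSON-encoded values.
function fromPayload(payload: Payload, jsonKeys: string[]): Record<string, unknown> {
  const node: Record<string, unknown> = { ...payload };
  for (const key of jsonKeys) {
    const raw = payload[key];
    if (typeof raw === 'string') node[key] = JSON.parse(raw);
  }
  return node;
}
```

The caller passes the list of JSON-encoded keys explicitly, so a plain string field such as description is never parsed by mistake.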

3. Embedding Generation

Embeddings are generated based on:

  • Level: Determines which fields to include
  • Embedding Config: Weights for different fields
  • Ignore Fields: Fields to exclude from embedding

// Level 0 embedding includes:
// - name (weight: 0.02)
// - description (weight: 0.10)
// - service_context.detailed_description (weight: 0.06)
// - service_context.use_cases (weight: 0.28)
// - service_context.capabilities (weight: 0.18)
// - service_context.domain_context (weight: 0.20)
// - service_context.query_patterns (weight: varies)
// - service_type keywords (weight: 0.15)

// Level 1 embedding includes:
// - name (weight: 0.02)
// - description (weight: 0.08)
// - category_context.purpose (weight: 0.10)
// - category_context.typical_queries (weight: 0.25)
// - category_context.keywords (weight: 0.22)

// Level 2 embedding includes:
// - name (weight: 0.02)
// - description (weight: 0.15)
// - collection_context.purpose (weight: 0.12)
// - collection_context.document_types (weight: 0.18)
// - collection_context.content_characteristics (weight: 0.20)
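
The guide lists per-field weights but does not say how the weighted fields are merged into one vector. One common approach, shown here purely as an assumption, is a weighted sum of the per-field embedding vectors followed by L2 normalization. The combineEmbeddings function is hypothetical.

```typescript
// Combines per-field embedding vectors into a single vector using field weights.
// Assumption: weighted sum followed by L2 normalization; the service may do this differently.
function combineEmbeddings(parts: Array<{ vector: number[]; weight: number }>): number[] {
  if (parts.length === 0) return [];
  const dim = parts[0].vector.length;
  const combined = new Array<number>(dim).fill(0);

  for (const { vector, weight } of parts) {
    if (vector.length !== dim) {
      throw new Error('All field vectors must share one dimension');
    }
    for (let i = 0; i < dim; i++) combined[i] += weight * vector[i];
  }

  // Normalize to unit length so the result is comparable under cosine similarity.
  const norm = Math.sqrt(combined.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? combined : combined.map((x) => x / norm);
}
```

With normalization, only the ratio between weights matters, so a field weighted 0.28 contributes four times as much as one weighted 0.07, whatever the weights sum to.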

4. Retrieval

Nodes are retrieved from Qdrant and parsed back into node objects:

// Get node by ID
const node = await hierarchyRegistry.getNode(nodeId, tenantId, appId);

// Get children of a node
const children = await hierarchyRegistry.getChildren(parentId, tenantId, appId);

// Get all nodes of a given type (here: every Level 0 service)
const services = await hierarchyRegistry.getAllNodes('Service', tenantId, appId);

Hierarchy Traversal

Routing Flow

When a query comes in, the system:

  1. Matches Root Nodes (Level 0)

    • Semantic matching against all root nodes
    • Applies keyword boosts/penalties
    • Applies query intent boosts
    • Selects best matching root node
  2. Traverses to Children (Level 1)

    • Gets children of selected root node
    • Semantic matching against categories
    • Selects best matching category
  3. Traverses to Collections (Level 2)

    • Gets children of selected category
    • Semantic matching against collections
    • Selects best matching collection
    • Returns routing_info for final route
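
The three steps above amount to a greedy top-down walk: choose the best match at one level, then repeat among its children. The sketch below works on an in-memory node list and is an illustration only. RouteNode, Scorer, and route are hypothetical names, and the score callback stands in for semantic matching plus any boosts.

```typescript
interface RouteNode {
  id: string;
  parent_id: string | null;
  routing_info?: Record<string, unknown>;
}

// Hypothetical scoring callback standing in for semantic matching plus boosts.
type Scorer = (query: string, node: RouteNode) => number;

// Greedy top-down traversal: pick the best-scoring node among the current candidates,
// then descend into its children until a node without children is reached.
function route(query: string, nodes: RouteNode[], score: Scorer): RouteNode[] {
  const path: RouteNode[] = [];
  let candidates = nodes.filter((n) => n.parent_id === null);

  while (candidates.length > 0) {
    const best = candidates.reduce((a, b) => (score(query, b) > score(query, a) ? b : a));
    path.push(best);
    candidates = nodes.filter((n) => n.parent_id === best.id);
  }
  return path; // the last element is the selected leaf; its routing_info is the final route
}
```

The loop does not hard-code a depth, so the same walk handles a Service with direct Collection children as well as deeper hierarchies.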

Example Traversal

Query: "How do I troubleshoot network issues?"

1. Root Matching:
   - document-search-service (similarity: 0.75) ✅
   - sql-database-service (similarity: 0.30)
   - rest-api-service (similarity: 0.25)
   → Select: document-search-service

2. Category Matching (children of document-search-service):
   - troubleshooting-category (similarity: 0.85) ✅
   - policy-documents-category (similarity: 0.40)
   → Select: troubleshooting-category

3. Collection Matching (children of troubleshooting-category):
   - network-troubleshooting (similarity: 0.90) ✅
   - login-troubleshooting (similarity: 0.60)
   → Select: network-troubleshooting

4. Return Route:
   {
     source_id: "network-troubleshooting",
     source_type: "vector_db",
     routing_info: { ... }
   }

Multi-Tenant & Multi-App Isolation

Tenant/App Isolation

Every node must have:

  • tenant_id: Tenant identifier
  • app_id: Application identifier

All queries are scoped to tenant/app:

  • Nodes are stored with tenant/app in payload
  • Retrieval filters by tenant/app
  • Cache keys include tenant/app

Example

// Node for tenant-001, app-001
{
  id: 'service-1',
  tenant_id: 'tenant-001',
  app_id: 'app-001',
  ...
}

// Same ID, different tenant/app
{
  id: 'service-1',
  tenant_id: 'tenant-002',  // Different tenant
  app_id: 'app-001',
  ...
}

These are separate nodes - no data leakage between tenants/apps.
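
One way to picture this isolation is a composite lookup key built from tenant, app, and node ID. The following is a minimal in-memory sketch. The scopedKey helper, the "tenant:app:id" key layout, and the InMemoryRegistry class are assumptions for illustration; only the getNode(nodeId, tenantId, appId) argument order follows the retrieval calls shown earlier in this guide.

```typescript
interface ScopedNode {
  id: string;
  tenant_id: string;
  app_id: string;
  name: string;
}

// Builds a composite key so identical node IDs in different tenants/apps never collide.
// Each part is URI-encoded so a ':' inside an ID cannot be confused with the separator.
function scopedKey(tenantId: string, appId: string, nodeId: string): string {
  return [tenantId, appId, nodeId].map((part) => encodeURIComponent(part)).join(':');
}

class InMemoryRegistry {
  private nodes = new Map<string, ScopedNode>();

  register(node: ScopedNode): void {
    this.nodes.set(scopedKey(node.tenant_id, node.app_id, node.id), node);
  }

  // Lookups without the matching tenant/app never see the node.
  getNode(nodeId: string, tenantId: string, appId: string): ScopedNode | undefined {
    return this.nodes.get(scopedKey(tenantId, appId, nodeId));
  }
}
```

The same key construction could serve as a cache key, which is how tenant and app end up in every cache entry.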


Role-Based Access Control

Access Control Fields

{
  allowed_roles?: string[];      // Roles that CAN access
  denied_roles?: string[];       // Roles EXPLICITLY denied (highest priority)
  role_conditions?: RoleCondition[];  // Complex logic (AND/OR)
}

Access Evaluation

  1. Denied Roles Check (highest priority)

    • If user has any role in denied_roles → DENY
  2. Allowed Roles Check

    • If allowed_roles exists and user has at least one role → ALLOW
    • If allowed_roles is empty/undefined → DENY
  3. Role Conditions Check (if present)

    • Evaluate AND/OR conditions
    • Must match at least one condition → ALLOW

Example

{
  id: 'policy-documents',
  allowed_roles: ['admin', 'it-support'],
  denied_roles: ['user'],
  // User with 'user' role → DENIED (denied_roles)
  // User with 'admin' role → ALLOWED (allowed_roles)
  // User with 'developer' role → DENIED (not in allowed_roles)
}
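
The evaluation order can be written as a small function. This sketch is one possible reading of the rules, not the service's implementation. The guide does not define RoleCondition, so the { operator, roles } shape below is hypothetical, as is the canAccess helper. The sketch also assumes that role conditions are consulted only when allowed_roles did not already grant access, and that access is denied when nothing matches.

```typescript
// Hypothetical shape: the guide does not define RoleCondition.
interface RoleCondition {
  operator: 'AND' | 'OR';
  roles: string[];
}

interface AccessRules {
  allowed_roles?: string[];
  denied_roles?: string[];
  role_conditions?: RoleCondition[];
}

function canAccess(userRoles: string[], rules: AccessRules): boolean {
  const has = (role: string): boolean => userRoles.includes(role);

  // 1. Denied roles take priority over everything else.
  if ((rules.denied_roles ?? []).some(has)) return false;

  // 2. Any overlap with allowed_roles grants access.
  if ((rules.allowed_roles ?? []).some(has)) return true;

  // 3. Otherwise at least one role condition must match; with none present, deny.
  return (rules.role_conditions ?? []).some(
    (c) => c.roles.length > 0 && (c.operator === 'AND' ? c.roles.every(has) : c.roles.some(has)),
  );
}
```

Applied to the policy-documents example, a user holding both 'admin' and 'user' is denied, because the denied_roles check runs first.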

Embedding Configuration

Node Embedding Config

{
  embedding_config: {
    weight_strategy?: 'context_heavy' | 'balanced' | 'description_heavy';
    ignore_fields?: string[];  // Default: ['none'] = include all
    // ... other weight overrides
  }
}

Ignore Fields

Controls which fields are excluded from embedding generation:

  • ['none']: Include all fields (default)
  • ['all']: Exclude all context fields (only name/description)
  • ['service_context.detailed_description']: Exclude specific field
  • ['service_context.use_cases', 'service_context.capabilities']: Exclude multiple fields
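
The four cases above can be expressed as a single filter over candidate field paths. The sketch below is illustrative: resolveIgnoreFields and the FieldDecision shape are hypothetical, and it assumes that 'all' leaves only name and description in place, as the second bullet describes.

```typescript
interface FieldDecision {
  field_path: string;
  included: boolean;
  ignore_reason: string | null;
}

// Decides, per candidate field, whether it takes part in embedding generation.
function resolveIgnoreFields(
  fieldPaths: string[],
  ignoreFields: string[] = ['none'],
): FieldDecision[] {
  const baseFields = new Set(['name', 'description']); // kept even under ['all']
  const ignoreAll = ignoreFields.includes('all');
  const explicit = new Set(ignoreFields.filter((f) => f !== 'none' && f !== 'all'));

  return fieldPaths.map((field_path) => {
    if (explicit.has(field_path)) {
      return { field_path, included: false, ignore_reason: 'listed in ignore_fields' };
    }
    if (ignoreAll && !baseFields.has(field_path)) {
      return { field_path, included: false, ignore_reason: "ignore_fields contains 'all'" };
    }
    return { field_path, included: true, ignore_reason: null };
  });
}
```

Recording a reason alongside each decision makes it easier to explain later why a field did or did not contribute to a match.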

Weight Strategy

Predefined weight configurations:

  • context_heavy: Emphasizes context fields
  • balanced: Equal weight distribution
  • description_heavy: Emphasizes description

Source Texts Collection

Purpose

Stores the original text strings used for embedding generation, along with metadata about which fields were included/excluded.

Structure

{
  node_id: string;
  source_texts: string[];           // Text strings used for embedding
  source_weights: number[];         // Weights for each text
  text_sources: Array<{
    index: number;
    source: string;                 // Actual text
    field_path: string;             // e.g., "service_context.detailed_description"
    weight: number;
    included: boolean;              // Whether field was included
    ignore_reason: string | null;   // Why field was ignored (if applicable)
    original_value?: string;
  }>;
  embedding_config: {
    ignore_fields?: string[];
  };
  is_edited: boolean;               // Whether texts were manually edited
  last_embedded_at: string;
  embedding_version: number;
}

Use Cases

  1. Frontend Editing: Show what was embedded
  2. Transparency: Understand why a node matched
  3. Debugging: See which fields contributed to matching
  4. Re-indexing: Regenerate embeddings with different ignore_fields

Best Practices

1. Level Design

  • Level 0: Broad service types (SEARCH, SQL, REST, MCP)
  • Level 1: Logical groupings within services
  • Level 2: Specific routing destinations

2. Naming

  • Use descriptive, unique IDs
  • Include tenant/app in all operations
  • Follow consistent naming conventions

3. Embeddings

  • Include relevant context in descriptions
  • Use ignore_fields to exclude noise
  • Test embedding quality with sample queries

4. Access Control

  • Set allowed_roles explicitly
  • Use denied_roles for explicit exclusions
  • Test with different role combinations

5. Keywords & Synonyms

  • Add level-appropriate keywords
  • Use synonyms to improve matching
  • Test keyword boosts/penalties

Common Patterns

Pattern 1: Simple Service → Collection

Level 0: Service (final_route_source: true)
  └─ Level 2: Collection (direct child)

Pattern 2: Service → Category → Collection

Level 0: Service
  └─ Level 1: Category
      └─ Level 2: Collection

Pattern 3: Multiple Categories

Level 0: Service
  ├─ Level 1: Category A
  │   ├─ Level 2: Collection A1
  │   └─ Level 2: Collection A2
  └─ Level 1: Category B
      └─ Level 2: Collection B1

Troubleshooting

Node Not Found

  • Check tenant_id and app_id match
  • Verify node exists in Qdrant
  • Check node status is 'active'

Wrong Matching

  • Review embeddings (check source texts)
  • Verify keywords are correct
  • Check query intent boosts
  • Review synonym mappings

Access Denied

  • Check user roles match allowed_roles
  • Verify user doesn't have denied_roles
  • Review role_conditions logic

Summary

The hierarchy system is:

  • Level-based: Level determines fields and behavior
  • Flexible: Supports any number of levels
  • Multi-tenant: Isolated by tenant/app
  • Role-aware: Access control at every level
  • Embedding-driven: Semantic matching at each level
  • Configurable: Weights, keywords, synonyms, query intent all configurable

© 2025 All rights reserved