Hierarchy Data Guide
Overview
The Haya Routing Service uses a level-based hierarchy to organize routing sources. The hierarchy supports any number of levels: a node's level determines which fields it carries and how it behaves, while its type is only descriptive metadata.
Note: This guide explains the hierarchy structure. For instructions on updating hierarchy data, see the Configuration Guide.
Core Concepts
Level-Based Architecture
Key Principle: Level determines fields and behavior; type is just metadata.
- Level 0 (Root): Services - Top-level routing sources
- Level 1 (Intermediate): Categories - Groupings within services
- Level 2 (Leaf): Collections - Final routing destinations
The system is not limited to three levels; it can be extended to support more levels as needed.
Level Configuration
Each level has a configuration that defines:
- Fields: What fields are available at this level
- Parent Levels: Which levels can be parents (null = root level)
- Embedding Method: How embeddings are generated
- Matching Strategy: How semantic matching works
// Level 0 (Service/Root)
{
level: 0,
type: 'Service',
fields: ['service_type', 'service_context'],
parentLevels: null, // Root - no parent
embeddingMethod: 'service',
matchingStrategy: 'service-level'
}
// Level 1 (Category)
{
level: 1,
type: 'Category',
fields: ['category_context'],
parentLevels: [0], // Must have level 0 parent
embeddingMethod: 'category',
matchingStrategy: 'category-level'
}
// Level 2 (Collection)
{
level: 2,
type: 'Collection',
fields: ['collection_context', 'routing_info'],
parentLevels: [0, 1], // Can have level 0 or 1 parent
embeddingMethod: 'collection',
matchingStrategy: 'collection-level'
}
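Because behavior is keyed on level, the parent rules above can be checked with a table lookup. The sketch below is illustrative only: `LevelConfig`, `LEVEL_CONFIGS`, and `isValidParent` are hypothetical names, not the registry's actual API. Under this sketch, supporting a fourth level would only mean adding a table entry.

```typescript
// Hypothetical shape mirroring the level configurations listed above.
interface LevelConfig {
  level: number;
  type: string; // descriptive metadata only
  fields: string[];
  parentLevels: number[] | null; // null = root level
  embeddingMethod: string;
  matchingStrategy: string;
}

// Table keyed by level; adding a level means adding an entry.
const LEVEL_CONFIGS: Record<number, LevelConfig> = {
  0: {
    level: 0, type: "Service", fields: ["service_type", "service_context"],
    parentLevels: null, embeddingMethod: "service", matchingStrategy: "service-level",
  },
  1: {
    level: 1, type: "Category", fields: ["category_context"],
    parentLevels: [0], embeddingMethod: "category", matchingStrategy: "category-level",
  },
  2: {
    level: 2, type: "Collection", fields: ["collection_context", "routing_info"],
    parentLevels: [0, 1], embeddingMethod: "collection", matchingStrategy: "collection-level",
  },
};

// True when a node at childLevel may sit under a parent at parentLevel
// (pass null when the node has no parent).
function isValidParent(childLevel: number, parentLevel: number | null): boolean {
  const config = LEVEL_CONFIGS[childLevel];
  if (!config) return false; // unknown level
  if (config.parentLevels === null) return parentLevel === null; // roots take no parent
  return parentLevel !== null && config.parentLevels.includes(parentLevel);
}
```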
Hierarchy Node Structure
Base Hierarchy Node
All nodes share these common fields:
{
id: string; // Unified identifier
type: 'Service' | 'Category' | 'Collection';
level: number; // 0, 1, or 2 (or more)
tenant_id: string; // Multi-tenant isolation
app_id: string; // Multi-app isolation
parent_id: string | null; // Parent node ID (null for root)
name: string; // Node name
description: string; // Node description
status: 'active' | 'inactive' | 'maintenance';
embeddings: HierarchyNodeEmbeddings;
created_at: Date;
updated_at: Date;
created_by?: string;
metadata?: Record<string, any>;
// Role-based access control
allowed_roles?: string[];
denied_roles?: string[];
role_conditions?: RoleCondition[];
// Dynamic configuration
final_route_source?: boolean; // Can be used directly (no need to traverse deeper)
embedding_config?: NodeEmbeddingConfig;
query_synonyms?: Record<string, string>; // Level-based synonyms
}
Level 0 (Service/Root) Fields
{
service_type: 'SEARCH' | 'SQL' | 'REST' | 'MCP';
service_context: {
detailed_description?: string;
use_cases?: UseCase[];
capabilities?: Capabilities;
domain_context?: DomainContext;
query_patterns?: QueryPatterns;
routing_keywords?: {
boost_keywords?: string[];
penalty_keywords?: string[];
boost_value?: number; // Default: 0.3
penalty_value?: number; // Default: -0.2
};
query_intent_boosts?: {
documentation?: number;
data_query?: number;
api_call?: number;
mcp_config?: number;
};
// ... other context fields
};
}
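The routing_keywords block documents default boost and penalty values but not how they are applied. The sketch below assumes each adjustment is added to the similarity score at most once, whenever any listed keyword appears in the query as a case-insensitive substring. `adjustScore` and that matching rule are assumptions, and query_intent_boosts are not modeled.

```typescript
// Hypothetical subset of service_context.routing_keywords.
interface RoutingKeywords {
  boost_keywords?: string[];
  penalty_keywords?: string[];
  boost_value?: number; // documented default: 0.3
  penalty_value?: number; // documented default: -0.2
}

// Assumption: each adjustment applies at most once, when any keyword
// appears in the query (case-insensitive substring match).
function adjustScore(similarity: number, query: string, kw: RoutingKeywords = {}): number {
  const q = query.toLowerCase();
  const hits = (words: string[] = []) => words.some((w) => q.includes(w.toLowerCase()));
  let score = similarity;
  if (hits(kw.boost_keywords)) score += kw.boost_value ?? 0.3;
  if (hits(kw.penalty_keywords)) score += kw.penalty_value ?? -0.2;
  return score;
}
```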
Level 1 (Category) Fields
{
category_context: {
purpose: string;
description: string;
typical_queries?: string[];
keywords?: string[]; // Simple keyword array
};
}
Level 2 (Collection) Fields
{
collection_context: {
purpose: string;
description: string;
document_types?: string[];
content_characteristics?: ContentCharacteristics;
};
routing_info: {
connection_type: 'vector_db' | 'sql' | 'rest' | 'mcp';
connection_details: Record<string, any>;
filters?: Record<string, any>;
};
}
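None of the level-specific top-level fields above is marked optional, so a registration path could reject nodes that omit them. A minimal sketch, assuming a hypothetical `REQUIRED_FIELDS` table and `missingLevelFields` helper (neither is part of the documented API):

```typescript
// Hypothetical per-level required fields, taken from the Level 0-2 field lists.
const REQUIRED_FIELDS: Record<number, string[]> = {
  0: ["service_type", "service_context"],
  1: ["category_context"],
  2: ["collection_context", "routing_info"],
};

// Returns the level-specific fields that are missing (undefined or null) on a node.
function missingLevelFields(node: { level: number } & Record<string, unknown>): string[] {
  const required = REQUIRED_FIELDS[node.level];
  if (!required) throw new Error(`No field configuration for level ${node.level}`);
  return required.filter((field) => node[field] === undefined || node[field] === null);
}
```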
Hierarchy Data Flow
1. Registration
Nodes are registered via the hierarchy registry service:
// Register a root node (Service)
await hierarchyRegistry.registerNode({
id: 'document-search-service',
type: 'Service',
level: 0,
tenant_id: 'tenant-001',
app_id: 'app-001',
parent_id: null,
name: 'Document Search Service',
description: '...',
service_type: 'SEARCH',
service_context: { ... }
});
// Register a category (Level 1)
await hierarchyRegistry.registerNode({
id: 'troubleshooting-category',
type: 'Category',
level: 1,
tenant_id: 'tenant-001',
app_id: 'app-001',
parent_id: 'document-search-service',
name: 'Troubleshooting Guides',
category_context: { ... }
});
// Register a collection (Level 2)
await hierarchyRegistry.registerNode({
id: 'network-troubleshooting',
type: 'Collection',
level: 2,
tenant_id: 'tenant-001',
app_id: 'app-001',
parent_id: 'troubleshooting-category',
name: 'Network Troubleshooting',
collection_context: { ... },
routing_info: { ... }
});
2. Storage
Nodes are stored in Qdrant (vector database):
- Collection: hierarchy_nodes
- Vector: Combined embedding of all node fields
- Payload: All node metadata (JSON strings for complex fields)
- Point ID: UUID generated from node ID
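Qdrant accepts either unsigned integers or UUIDs as point IDs, so a string node ID has to be mapped to one of those. One deterministic option is a name-based (version 5) UUID, sketched below with Node's `crypto` module. The guide does not say which scheme the service uses. `HIERARCHY_NAMESPACE` and `pointIdFor` are hypothetical, and folding tenant and app into the name is an assumption made so that the same node ID under two tenants (see the isolation example later in this guide) does not map to the same point.

```typescript
import { createHash } from "node:crypto";

// RFC 4122 name-based UUID, version 5 (SHA-1 of namespace bytes + name).
function uuidV5(name: string, namespace: string): string {
  const ns = Buffer.from(namespace.replace(/-/g, ""), "hex");
  const bytes = createHash("sha1").update(ns).update(name, "utf8").digest().subarray(0, 16);
  bytes[6] = (bytes[6] & 0x0f) | 0x50; // version 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant
  const hex = bytes.toString("hex");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// Hypothetical namespace UUID for hierarchy nodes; any fixed value works.
const HIERARCHY_NAMESPACE = "3f2b8c1e-5a4d-4e7f-9b6a-0c1d2e3f4a5b";

// Assumption: tenant and app are part of the name, so identical node IDs
// in different tenants/apps produce different point IDs.
function pointIdFor(tenantId: string, appId: string, nodeId: string): string {
  return uuidV5(`${tenantId}:${appId}:${nodeId}`, HIERARCHY_NAMESPACE);
}
```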
3. Embedding Generation
Embeddings are generated based on:
- Level: Determines which fields to include
- Embedding Config: Weights for different fields
- Ignore Fields: Fields to exclude from embedding
// Level 0 embedding includes:
// - name (weight: 0.02)
// - description (weight: 0.10)
// - service_context.detailed_description (weight: 0.06)
// - service_context.use_cases (weight: 0.28)
// - service_context.capabilities (weight: 0.18)
// - service_context.domain_context (weight: 0.20)
// - service_context.query_patterns (weight: varies)
// - service_type keywords (weight: 0.15)
// Level 1 embedding includes:
// - name (weight: 0.02)
// - description (weight: 0.08)
// - category_context.purpose (weight: 0.10)
// - category_context.typical_queries (weight: 0.25)
// - category_context.keywords (weight: 0.22)
// Level 2 embedding includes:
// - name (weight: 0.02)
// - description (weight: 0.15)
// - collection_context.purpose (weight: 0.12)
// - collection_context.document_types (weight: 0.18)
// - collection_context.content_characteristics (weight: 0.20)
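The guide describes the stored vector as a combined embedding of weighted fields but does not give the combination formula. A common approach, assumed here, is a weighted sum of per-field vectors followed by L2 normalization. Normalizing also means the weights need not sum to 1 (the Level 1 and Level 2 weights listed above each sum to 0.67). `WeightedVector` and `combineEmbeddings` are hypothetical names.

```typescript
// One embedded field text and its weight (hypothetical shape).
interface WeightedVector {
  fieldPath: string;
  vector: number[];
  weight: number;
}

// Assumption: node vector = L2-normalized weighted sum of field vectors,
// so cosine similarity against it reduces to a dot product.
function combineEmbeddings(parts: WeightedVector[]): number[] {
  if (parts.length === 0) throw new Error("nothing to combine");
  const dim = parts[0].vector.length;
  const out = new Array<number>(dim).fill(0);
  for (const { vector, weight } of parts) {
    if (vector.length !== dim) throw new Error("dimension mismatch");
    for (let i = 0; i < dim; i++) out[i] += weight * vector[i];
  }
  const norm = Math.sqrt(out.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? out : out.map((x) => x / norm);
}
```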
4. Retrieval
Nodes are retrieved from Qdrant and parsed:
// Get node by ID
const node = await hierarchyRegistry.getNode(nodeId, tenantId, appId);
// Get children of a node
const children = await hierarchyRegistry.getChildren(parentId, tenantId, appId);
// Get all nodes of one type (here, all Level 0 services)
const services = await hierarchyRegistry.getAllNodes('Service', tenantId, appId);
Hierarchy Traversal
Routing Flow
When a query comes in, the system:
1. Matches Root Nodes (Level 0)
- Semantic matching against all root nodes
- Applies keyword boosts/penalties
- Applies query intent boosts
- Selects best matching root node
2. Traverses to Children (Level 1)
- Gets children of selected root node
- Semantic matching against categories
- Selects best matching category
3. Traverses to Collections (Level 2)
- Gets children of selected category
- Semantic matching against collections
- Selects best matching collection
- Returns routing_info for final route
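The routing flow amounts to a greedy top-down descent. The sketch below assumes a caller-supplied `score` function standing in for semantic similarity plus any boosts, and assumes that a node flagged `final_route_source` ends the descent early. `TraversalNode`, `best`, and `traverse` are hypothetical names.

```typescript
// Minimal node shape for the sketch (subset of the base node fields).
interface TraversalNode {
  id: string;
  level: number;
  parent_id: string | null;
  final_route_source?: boolean;
}

// Returns the highest-scoring item, or undefined for an empty list.
function best<T>(items: T[], score: (item: T) => number): T | undefined {
  let top: T | undefined;
  let topScore = -Infinity;
  for (const item of items) {
    const s = score(item);
    if (s > topScore) {
      top = item;
      topScore = s;
    }
  }
  return top;
}

// Picks the best root, then repeatedly the best child, stopping at a leaf
// or at a node flagged final_route_source. Returns the chosen node IDs.
function traverse(nodes: TraversalNode[], score: (n: TraversalNode) => number): string[] {
  const childrenOf = (id: string | null) => nodes.filter((n) => n.parent_id === id);
  const path: string[] = [];
  let current = best(childrenOf(null), score);
  while (current) {
    path.push(current.id);
    if (current.final_route_source) break;
    current = best(childrenOf(current.id), score);
  }
  return path;
}
```

With the similarity scores from the example traversal below, `traverse` returns the path document-search-service → troubleshooting-category → network-troubleshooting.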
Example Traversal
Query: "How do I troubleshoot network issues?"
1. Root Matching:
- document-search-service (similarity: 0.75) ✅
- sql-database-service (similarity: 0.30)
- rest-api-service (similarity: 0.25)
→ Select: document-search-service
2. Category Matching (children of document-search-service):
- troubleshooting-category (similarity: 0.85) ✅
- policy-documents-category (similarity: 0.40)
→ Select: troubleshooting-category
3. Collection Matching (children of troubleshooting-category):
- network-troubleshooting (similarity: 0.90) ✅
- login-troubleshooting (similarity: 0.60)
→ Select: network-troubleshooting
4. Return Route:
{
source_id: "network-troubleshooting",
source_type: "vector_db",
routing_info: { ... }
}
Multi-Tenant & Multi-App Isolation
Tenant/App Isolation
Every node must have:
- tenant_id: Tenant identifier
- app_id: Application identifier
All queries are scoped to tenant/app:
- Nodes are stored with tenant/app in payload
- Retrieval filters by tenant/app
- Cache keys include tenant/app
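A sketch of how retrieval and caching could enforce the tenant/app scope. The filter objects follow Qdrant's JSON filter syntax (`must` conditions with `key` and `match.value`) and assume tenant_id, app_id, and parent_id are top-level payload keys. `scopedFilter`, `childrenFilter`, `cacheKey`, and the key format are hypothetical.

```typescript
// Shapes follow Qdrant's JSON filter syntax (must / key / match.value).
type MatchCondition = { key: string; match: { value: string | number | boolean } };
interface QdrantFilter {
  must: MatchCondition[];
}

// Every lookup starts from the tenant/app scope; callers can only add conditions.
function scopedFilter(tenantId: string, appId: string, extra: MatchCondition[] = []): QdrantFilter {
  if (!tenantId || !appId) throw new Error("tenant_id and app_id are required");
  return {
    must: [
      { key: "tenant_id", match: { value: tenantId } },
      { key: "app_id", match: { value: appId } },
      ...extra,
    ],
  };
}

// Hypothetical helper: filter selecting the children of one node.
function childrenFilter(parentId: string, tenantId: string, appId: string): QdrantFilter {
  return scopedFilter(tenantId, appId, [{ key: "parent_id", match: { value: parentId } }]);
}

// Hypothetical cache-key format; tenant and app always lead the key.
function cacheKey(tenantId: string, appId: string, ...parts: string[]): string {
  return [tenantId, appId, ...parts].map(encodeURIComponent).join(":");
}
```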
Example
// Node for tenant-001, app-001
{
id: 'service-1',
tenant_id: 'tenant-001',
app_id: 'app-001',
...
}
// Same ID, different tenant/app
{
id: 'service-1',
tenant_id: 'tenant-002', // Different tenant
app_id: 'app-001',
...
}
These are separate nodes; no data leaks between tenants or apps.
Role-Based Access Control
Access Control Fields
{
allowed_roles?: string[]; // Roles that CAN access
denied_roles?: string[]; // Roles EXPLICITLY denied (highest priority)
role_conditions?: RoleCondition[]; // Complex logic (AND/OR)
}
Access Evaluation
1. Denied Roles Check (highest priority)
- If the user has any role in denied_roles → DENY
2. Allowed Roles Check
- If allowed_roles exists and the user has at least one of those roles → ALLOW
- If allowed_roles is empty or undefined → DENY
3. Role Conditions Check (if present)
- Evaluate the AND/OR conditions
- If at least one condition matches → ALLOW
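The three checks can be sketched as a single function. The guide does not define `RoleCondition`, so the `{ operator, roles }` shape below is hypothetical, as are `canAccess` and `conditionMatches`. The guide also leaves open whether role_conditions can grant access when allowed_roles does not; this sketch assumes they can, so a node with no allowed_roles is denied unless a role condition matches. Applied to the policy-documents example below, it yields the same three outcomes.

```typescript
// Hypothetical RoleCondition shape; the guide does not define its fields.
interface RoleCondition {
  operator: "AND" | "OR";
  roles: string[];
}

interface AccessControl {
  allowed_roles?: string[];
  denied_roles?: string[];
  role_conditions?: RoleCondition[];
}

// AND: user holds every listed role; OR: user holds at least one.
function conditionMatches(cond: RoleCondition, userRoles: Set<string>): boolean {
  if (cond.roles.length === 0) return false; // an empty condition grants nothing
  return cond.operator === "AND"
    ? cond.roles.every((r) => userRoles.has(r))
    : cond.roles.some((r) => userRoles.has(r));
}

// Runs the checks in order; anything not explicitly allowed is denied.
function canAccess(node: AccessControl, roles: string[]): boolean {
  const userRoles = new Set(roles);
  // 1. Denied roles always win.
  if ((node.denied_roles ?? []).some((r) => userRoles.has(r))) return false;
  // 2. Any overlap with allowed_roles grants access.
  if ((node.allowed_roles ?? []).some((r) => userRoles.has(r))) return true;
  // 3. Assumption: role_conditions may still grant access.
  return (node.role_conditions ?? []).some((c) => conditionMatches(c, userRoles));
}
```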
Example
{
id: 'policy-documents',
allowed_roles: ['admin', 'it-support'],
denied_roles: ['user'],
// User with 'user' role → DENIED (denied_roles)
// User with 'admin' role → ALLOWED (allowed_roles)
// User with 'developer' role → DENIED (not in allowed_roles)
}
Embedding Configuration
Node Embedding Config
{
embedding_config: {
weight_strategy?: 'context_heavy' | 'balanced' | 'description_heavy';
ignore_fields?: string[]; // Default: ['none'] = include all
// ... other weight overrides
}
}
Ignore Fields
Controls which fields are excluded from embedding generation:
- ['none']: Include all fields (default)
- ['all']: Exclude all context fields (only name and description are embedded)
- ['service_context.detailed_description']: Exclude a specific field
- ['service_context.use_cases', 'service_context.capabilities']: Exclude multiple fields
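A minimal sketch of these ignore rules, assuming that every field other than name and description counts as a context field and that ignoring a path also ignores anything nested under it. `isFieldIncluded` and `BASE_FIELDS` are hypothetical names.

```typescript
// Fields kept even under ['all'] (assumption: everything else is "context").
const BASE_FIELDS = new Set(["name", "description"]);

// Decides whether a field path contributes to the node's embedding.
function isFieldIncluded(fieldPath: string, ignoreFields: string[] = ["none"]): boolean {
  if (ignoreFields.includes("all")) return BASE_FIELDS.has(fieldPath);
  // 'none' is a no-op entry; other entries exclude the path and its sub-paths.
  return !ignoreFields.some(
    (p) => p !== "none" && (fieldPath === p || fieldPath.startsWith(`${p}.`)),
  );
}
```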
Weight Strategy
Predefined weight configurations:
- context_heavy: Emphasizes context fields
- balanced: Equal weight distribution
- description_heavy: Emphasizes the description
Source Texts Collection
Purpose
Stores the original text strings used for embedding generation, along with metadata about which fields were included/excluded.
Structure
{
node_id: string;
source_texts: string[]; // Text strings used for embedding
source_weights: number[]; // Weights for each text
text_sources: Array<{
index: number;
source: string; // Actual text
field_path: string; // e.g., "service_context.detailed_description"
weight: number;
included: boolean; // Whether field was included
ignore_reason: string | null; // Why field was ignored (if applicable)
original_value?: string;
}>;
embedding_config: {
ignore_fields?: string[];
};
is_edited: boolean; // Whether texts were manually edited
last_embedded_at: string;
embedding_version: number;
}
Use Cases
- Frontend Editing: Show what was embedded
- Transparency: Understand why a node matched
- Debugging: See which fields contributed to matching
- Re-indexing: Regenerate embeddings with different ignore_fields
Best Practices
1. Level Design
- Level 0: Broad service categories (SEARCH, SQL, REST, MCP)
- Level 1: Logical groupings within services
- Level 2: Specific routing destinations
2. Naming
- Use descriptive, unique IDs
- Include tenant/app in all operations
- Follow consistent naming conventions
3. Embeddings
- Include relevant context in descriptions
- Use ignore_fields to exclude noise
- Test embedding quality with sample queries
4. Access Control
- Set allowed_roles explicitly
- Use denied_roles for explicit exclusions
- Test with different role combinations
5. Keywords & Synonyms
- Add level-appropriate keywords
- Use synonyms to improve matching
- Test keyword boosts/penalties
Common Patterns
Pattern 1: Simple Service → Collection
Level 0: Service (final_route_source: true)
└─ Level 2: Collection (direct child)
Pattern 2: Service → Category → Collection
Level 0: Service
└─ Level 1: Category
   └─ Level 2: Collection
Pattern 3: Multiple Categories
Level 0: Service
├─ Level 1: Category A
│  ├─ Level 2: Collection A1
│  └─ Level 2: Collection A2
└─ Level 1: Category B
   └─ Level 2: Collection B1
Troubleshooting
Node Not Found
- Check tenant_id and app_id match
- Verify node exists in Qdrant
- Check node status is 'active'
Wrong Matching
- Review embeddings (check source texts)
- Verify keywords are correct
- Check query intent boosts
- Review synonym mappings
Access Denied
- Check that the user's roles match allowed_roles
- Verify the user doesn't have any denied_roles
- Review role_conditions logic
Summary
The hierarchy system is:
- Level-based: Level determines fields and behavior
- Flexible: Supports any number of levels
- Multi-tenant: Isolated by tenant/app
- Role-aware: Access control at every level
- Embedding-driven: Semantic matching at each level
- Configurable: Weights, keywords, synonyms, query intent all configurable