abba-360-dev

ABBA-360: An Agnostic Browser-Based Sandbox Architecture for AI Audio Generation in Networks of 360° Images

Welcome to the ABBA-360 sandbox. This system is designed as a strictly agnostic orchestration engine for AI generation of spatial audio from interconnected 360° images. The client is set up to be hosted on GitHub Pages and connects to the server through a zrok tunnel.

📂 Project Structure

abba360_v0/
├── client/                     # Frontend Environment
│   ├── index.html              
│   └── js/
│       ├── client.js           # Bootstrapper & Dependency Injection
│       ├── NavigationManager.js # Core Orchestrator
│       ├── NetworkService.js   # WebSocket client
│       ├── SpatialAudioPlayer.js
│       ├── UIManager.js
│       ├── TopologyRadar.js
│       ├── AcousticTreadmill.js
│       ├── VR/                 # WebXR & A-Frame lifecycle
│       └── strategies/         # ⬅️ IMPLEMENT CLIENT STRATEGIES HERE
│           ├── nodeselectionstrategies/
│           ├── semanticproviders/
│           ├── topologyproviders/
│           ├── viewproviders/
│           └── vrproviders/
├── server/                     # Backend Environment
│   ├── server.js               # Bootstrapper
│   ├── PipelineService.js      # Core Orchestrator
│   ├── SocketController.js     # WebSocket server
│   ├── CacheManager.js
│   ├── GPUResourceManager.js
│   ├── .env                    # ⬅️ IMPLEMENT ACTIVE STRATEGIES CONFIG
│   └── AIEngine/
│       ├── AIEngine.js         # Strategy Delegator
│       ├── pythonscripts/      # Python code goes here
│       └── strategies/         # ⬅️ IMPLEMENT SERVER STRATEGIES HERE
│           ├── audio/
│           ├── context/
│           ├── imagesource/
│           └── vision/
└── docs/                       # Auto-generated Documentation

You do not need to edit the core orchestration files (such as PipelineService, NavigationManager, NetworkService, or SocketController). The entire system is built on the Strategy Pattern: as a researcher, you write new Strategy classes to connect your own image sources, models, APIs, or mapping SDKs, and then activate them in the .env file.
Implement only the concrete strategies. The only other things you should change are the .env file and the TUNNEL constant at the top of client.js.


⚙️ How to Configure Strategies (.env)

The system uses dynamic dependency injection. It reads your .env file at boot and dynamically imports the exact JavaScript classes you request. To use a custom strategy, place your file in the appropriate directory, ensure the class name matches the filename exactly, and update your .env:

# ==========================================
# SERVER STRATEGIES (AI ENGINE)
# ==========================================
IMAGE_PROVIDER="MapillarySource"
CONTEXT_PROVIDER="GeoapifyContextProvider"
VISION_PROVIDER="LMStudioVisionProvider"
AUDIO_PROVIDER="StableAudioGradioProvider"

# ==========================================
# CLIENT STRATEGIES (MAPS/360 IMAGE NETWORK/VR 360 IMAGE SOURCE etc)
# ==========================================
CLIENT_VIEWER_PROVIDER="MapillaryViewerProvider"
CLIENT_TOPOLOGY_PROVIDER="MapillaryTopologyProvider"
CLIENT_VR_LOADER_PROVIDER="MapillaryVRLoader"
CLIENT_NODE_SELECTION_STRATEGY="AcousticHorizonStrategy"
CLIENT_SEMANTIC_PROVIDER="DefaultSemanticProvider"
CLIENT_SEMANTIC_LAYERS="spatial, horizon"

# ==========================================
# PYTHON SCRIPTS [OPTIONAL, set to "" if unused]
# ==========================================
PYTHON_VISION_SCRIPT="vision_adapter.py"
PYTHON_AUDIO_SCRIPT="audio_adapter.py"
PYTHON_EXEC="python3"

Place your API keys in the .env file. The system is set up to pass them to the client.
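
The loading rule described in this section (a name in .env maps to a file of the same name that exports a class of the same name) can be sketched roughly as follows. This is an illustration, not the project's actual loader: the helper names strategyModulePath and loadStrategy, the path prefix, and the injectable importer parameter are all assumptions made for the example.

```javascript
// Hypothetical helper: maps a strategy category and class name to a module path.
// The directory layout follows the project tree; the path prefix is an assumption.
function strategyModulePath(category, className) {
    if (!/^[A-Za-z_$][\w$]*$/.test(className)) {
        throw new Error(`Invalid strategy name: "${className}"`);
    }
    return `./AIEngine/strategies/${category}/${className}.js`;
}

// Loads the module and instantiates the export whose name matches the file name.
// `importer` is injectable so the sketch can be exercised without real files.
async function loadStrategy(category, className, importer = (p) => import(p)) {
    const mod = await importer(strategyModulePath(category, className));
    const StrategyClass = mod[className];
    if (typeof StrategyClass !== 'function') {
        throw new Error(`${className}.js must export a class named ${className}`);
    }
    return new StrategyClass();
}
```

Under these assumptions, a call such as loadStrategy('vision', process.env.VISION_PROVIDER) would fail early with a readable message when the class name and filename do not match.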


🛠️ Provided Concrete Examples (Out of the Box)

To help you get started, the repository includes several fully functional, concrete implementations of the strategy interfaces. These demonstrate how to wrap real-world APIs and local models. The system is configured to run with the client hosted on GitHub Pages. Add your domain to pinokioconfig.json.

1. Mapillary & MapLibre GL (Visuals & Topology)

The system uses Mapillary as the default provider for 360-degree street-level imagery and graph navigation.

2. Geoapify (Context Grounding)

3. LM Studio (Vision-Language Analysis)

4. Stable Audio / Gradio / Pinokio (Audio Synthesis)

5. Python Adapters (Custom AI Fallbacks)

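If you prefer writing your AI inference logic in Python instead of Node.js, the system provides standard subprocess adapters. They are selected through the PYTHON_VISION_SCRIPT, PYTHON_AUDIO_SCRIPT, and PYTHON_EXEC entries in the .env file.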


📦 Core Payload Contracts

The architecture is strictly decoupled. These payloads act as the universal language between the Client, the Node.js Core, and your custom Strategies. Example payloads are shown below.

1. The Vision Payload (VisionProvider.analyse())

Your Vision Provider must return an object with an intents array. Every intent must contain the strict routing keys (eventName, identity, prompt, type) to pass validation.

{
  "intents": [
    {
      "layer": "spatial",                 
      "label": "Dog, Barking, Slapback",  
      "prompt": "Dog, Barking, Slapback, recorded at London, UK...",
      "type": "object_organic",           
      "eventName": "instance_ready",      
      "identity": "instance",             
      "persistent": false,                
      "positional": true,                 
      "envType": "organic",               
      "h": 270,                           
      "p": 0,                             
      "dist": 5                           
    },
    {
      "layer": "ambient",                 
      "label": "Ambient",                 
      "prompt": "Low rumble of distant traffic, dry acoustics...",
      "type": "ambient",                 
      "eventName": "node_ready",          
      "identity": "node",                 
      "persistent": true,                 
      "positional": false,                
      "envType": "city"                   
    }
  ]
}
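
Before returning from analyse(), it can help to check your own payload against the four routing keys. The following is a minimal sketch of such a check, not the validator the core uses; the function name findIntentProblems is hypothetical, and treating each key as a non-empty string is an assumption based on the example above.

```javascript
// The four routing keys named in the contract above.
const REQUIRED_INTENT_KEYS = ['eventName', 'identity', 'prompt', 'type'];

// Returns a list of human-readable problems; an empty list means the payload looks valid.
// Assumption: every routing key must be a non-empty string.
function findIntentProblems(payload) {
    if (!payload || !Array.isArray(payload.intents)) {
        return ['payload.intents must be an array'];
    }
    const problems = [];
    payload.intents.forEach((intent, index) => {
        for (const key of REQUIRED_INTENT_KEYS) {
            const value = intent ? intent[key] : undefined;
            if (typeof value !== 'string' || value.trim() === '') {
                problems.push(`intents[${index}] is missing "${key}"`);
            }
        }
    });
    return problems;
}
```

Logging the returned list during development makes a rejected payload easier to diagnose than a silent validation failure.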

2. The Audio Task Payload (AudioProvider.generate())

The AIEngine takes each vision intent and appends internal caching and queueing identifiers before sending it to the AudioProvider.

{
  "layer": "spatial",               
  "label": "Dog, Barking, Slapback",
  "prompt": "Dog, Barking, Slapback...",
  "type": "object_organic",         
  "eventName": "instance_ready",    
  "identity": "instance",           
  "persistent": false,              
  "positional": true,               
  "envType": "organic",             
  "h": 270,                         
  "p": 0,                           
  "dist": 5,                        
  
  "id": "london_uk_dog_barking_v1_34985734985_0", 
  "nodeId": "34985734985",                                 
  "audioContentId": "london_uk_dog_barking_v1",   
  "locationContext": "London, UK",                         
  "displayName": "Dog, Barking, Slapback",                 
  "visualMetadata": { /* raw copy of original intent */ }  
}
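
To make the relationship between an intent and its audio task concrete, here is a rough sketch of the enrichment step. It is not the AIEngine's actual code: the function name buildAudioTask is hypothetical, the id format (audioContentId, nodeId, and intent index joined by underscores) is inferred from the example values above, and audioContentId is taken as an input because its derivation is not documented here.

```javascript
// Builds an audio task from a vision intent plus engine-side identifiers.
// Assumption: id = `${audioContentId}_${nodeId}_${index}`, inferred from the example payload.
function buildAudioTask(intent, { nodeId, index, audioContentId, locationContext }) {
    return {
        ...intent,
        id: `${audioContentId}_${nodeId}_${index}`,
        nodeId,
        audioContentId,
        locationContext,
        displayName: intent.label,
        visualMetadata: { ...intent } // raw copy of the original intent
    };
}
```

The practical consequence for an AudioProvider is that every field of the original intent is still available on the task, alongside the identifiers.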

3. The Client-to-Server Payload (spatial_sync)

Emitted by NetworkService when navigating to a new panorama.

{
  "nodeId": "34985734985",          
  "fromId": "12938471293",          
  "navEpoch": 14,                   
  "isAnchor": true,                 
  "location": { "lat": 40.7128, "lng": -74.0060 },
  "requestedLayers": ["spatial", "ambient"],
  "nearbyAnchors": [                
    {
      "nodeId": "98237498237",
      "hops": 1,                    
      "requestedLayers": ["horizon"]
    }
  ],
  "dbPayload": { /* cached graph geometry */ }                 
}

4. The Server-to-Client Completion Payload (instance_ready / node_ready)

Emitted by PipelineService when audio generation is finished.

{
  "url": "/audio/stream.wav?id=london_uk_dog_barking_v1", 
  "nodeId": "34985734985",                   
  "navEpoch": 14,                            
  "taskData": {                              
    "id": "london_uk_dog_barking_v1_34985734985_0",
    "prompt": "Dog, Barking, Slapback...",
    "displayName": "Dog, Barking, Slapback",
    "persistent": false,
    "positional": true,
    "envType": "organic",
    "audioContentId": "london_uk_dog_barking_v1"
  }
}
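
A client-side consumer of this payload might look like the sketch below. Two assumptions are made and should be checked against the real client: that navEpoch is meant to let the client discard completions from an earlier navigation, and that the relative url is resolved against the tunnel origin (the tunnelBase argument stands in for the TUNNEL constant in client.js). The function name resolveCompletion is hypothetical.

```javascript
// Turns a completion payload into something a player could consume, or null if stale.
function resolveCompletion(payload, currentNavEpoch, tunnelBase) {
    // Assumption: a completion from a different navigation epoch is stale and can be ignored.
    if (payload.navEpoch !== currentNavEpoch) return null;
    return {
        // Resolve the server-relative path against the tunnel origin.
        audioUrl: new URL(payload.url, tunnelBase).href,
        nodeId: payload.nodeId,
        positional: Boolean(payload.taskData && payload.taskData.positional)
    };
}
```

Resolving the URL explicitly matters when the client is served from a different origin than the server, as is the case with a GitHub Pages client and a tunnelled backend.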

5. The Topology Graph Payload (BaseTopologyProvider.getNode())

This is the return shape expected when spidering the topology map.

{
  "id": "34985734985",
  "lat": 40.7128,
  "lng": -74.0060,
  "links": [
    { "id": "neighbor_1_id", "heading": 90 },
    { "id": "neighbor_2_id", "heading": 270 }
  ]
}
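
As an illustration of how this shape supports spidering, the sketch below walks the graph breadth-first through getNode() and records how many hops each node is from the start. It is a generic traversal, not the TopologyRadar implementation; the function name spider and the maxHops parameter are assumptions for the example.

```javascript
// Breadth-first walk over a topology provider, up to `maxHops` links from the start node.
// Returns a Map of node id -> { node, hops }.
async function spider(provider, startId, maxHops) {
    const visited = new Map();
    let frontier = [startId];
    for (let hops = 0; hops <= maxHops && frontier.length > 0; hops++) {
        const next = [];
        for (const id of frontier) {
            if (visited.has(id)) continue;
            const node = await provider.getNode(id);
            if (!node) continue; // provider had no data for this id
            visited.set(id, { node, hops });
            for (const link of node.links || []) {
                if (!visited.has(link.id)) next.push(link.id);
            }
        }
        frontier = next;
    }
    return visited;
}
```

Because the traversal only relies on id and links, any provider that returns the shape above can be walked this way.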

🖥️ Server-Side Strategies (The AI Engine)

Server strategies live in server/AIEngine/strategies/. They dictate how the backend fetches 360 images, evaluates them with VLMs, and generates audio.

1. ImageSourceProvider

Location: server/AIEngine/strategies/imagesource/
Purpose: Fetches raw equirectangular image buffers from a mapping service.

import { ImageSourceProvider } from './ImageSourceProvider.js';

export class MyCustomImageSource extends ImageSourceProvider {
    /**
     * @param {string} id - The agnostic node identifier.
     * @returns {Promise<Buffer>} - The raw binary image data.
     */
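    async getImage(id) {
        // Fetch image bytes from your API. The URL below is a placeholder, not a real endpoint.
        // The global fetch used here requires Node.js 18 or later.
        const response = await fetch(`https://example.com/images/${encodeURIComponent(id)}`);
        if (!response.ok) throw new Error(`Image request failed with status ${response.status}`);
        const arrayBuffer = await response.arrayBuffer();
        return Buffer.from(arrayBuffer);
    }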
}

2. ContextProvider

Location: server/AIEngine/strategies/context/
Purpose: Converts raw Lat/Lng coordinates into a human-readable location string.

import { ContextProvider } from './ContextProvider.js';

export class MyContextProvider extends ContextProvider {
    /**
     * @param {number} lat 
     * @param {number} lng 
     * @returns {Promise<string>} - Human readable location (e.g., "Urban Street, London")
     */
    async resolve(lat, lng) {
        return "Custom Location String";
    }

    /**
     * @returns {Object} - Safe config pushed to the client on boot
     */
    getPublicConfig() {
        return { customApiKey: process.env.MY_API_KEY };
    }
}

3. VisionProvider

Location: server/AIEngine/strategies/vision/
Purpose: Evaluates visual buffers to extract sonic intents.

import { VisionProvider } from './VisionProvider.js';

export class MyVisionProvider extends VisionProvider {
    async init() {}

    /**
     * @param {Buffer} buffer - The 360 image buffer
     * @param {string} context - The resolved location string
     * @param {Object} options - Dictionary parameters (layers, max objects, etc.)
     * @returns {Promise<Object>} - Must return an object containing an 'intents' array.
     */
    async analyse(buffer, context, options) {
        // Evaluate buffer, generate intents based on the payload schema above
        return {
            intents: [
                {
                    layer: "spatial",
                    label: "Dog",
                    prompt: "A dog barking...",
                    type: "object_organic",
                    eventName: "instance_ready",
                    identity: "instance",
                    persistent: false,
                    positional: true,
                    envType: "organic",
                    h: 270, p: 0, dist: 5
                }
            ]
        };
    }
}

4. AudioProvider

Location: server/AIEngine/strategies/audio/
Purpose: Synthesizes text prompts into .wav audio buffers.

import { AudioProvider } from './AudioProvider.js';

export class MyAudioProvider extends AudioProvider {
    /**
     * @param {Object} task - The intent payload
     * @param {Object} context - Execution hooks: { signal, socket, progressCallback }
     * @returns {Promise<{buffer: Buffer, duration: string}>}
     */
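    async generate(task, context) {
        // Placeholder: replace with the WAV bytes produced by your model or API.
        const generatedWavBuffer = Buffer.alloc(0);
        // Return the raw WAV buffer and its duration in seconds (as a string)
        return {
            buffer: generatedWavBuffer,
            duration: "10.0"
        };
    }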
}
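
For testing the pipeline without a real model, a provider can return a silent clip. The sketch below builds a canonical 44-byte-header, 16-bit PCM WAV in memory and reads the duration back from the header, formatted as a string to match the return type above. The helper names makeSilentWav and wavDurationSeconds are hypothetical, and the duration helper assumes that canonical header layout, which real model output may not follow.

```javascript
// Builds a silent 16-bit PCM WAV with the canonical 44-byte header.
function makeSilentWav(seconds, sampleRate = 44100, channels = 1) {
    const bytesPerSample = 2;
    const numSamples = Math.round(seconds * sampleRate);
    const dataSize = numSamples * channels * bytesPerSample;
    const buffer = Buffer.alloc(44 + dataSize); // zero-filled samples are silence
    buffer.write('RIFF', 0, 'ascii');
    buffer.writeUInt32LE(36 + dataSize, 4);     // file size minus the first 8 bytes
    buffer.write('WAVE', 8, 'ascii');
    buffer.write('fmt ', 12, 'ascii');
    buffer.writeUInt32LE(16, 16);               // fmt chunk size
    buffer.writeUInt16LE(1, 20);                // audio format 1 = PCM
    buffer.writeUInt16LE(channels, 22);
    buffer.writeUInt32LE(sampleRate, 24);
    buffer.writeUInt32LE(sampleRate * channels * bytesPerSample, 28); // byte rate
    buffer.writeUInt16LE(channels * bytesPerSample, 32);              // block align
    buffer.writeUInt16LE(16, 34);               // bits per sample
    buffer.write('data', 36, 'ascii');
    buffer.writeUInt32LE(dataSize, 40);
    return buffer;
}

// Reads the duration from a canonical header: data size divided by byte rate.
function wavDurationSeconds(buffer) {
    const byteRate = buffer.readUInt32LE(28);
    const dataSize = buffer.readUInt32LE(40);
    return (dataSize / byteRate).toFixed(1);
}
```

Inside generate(), such a stub would return { buffer: wav, duration: wavDurationSeconds(wav) } with wav = makeSilentWav(10).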

🌐 Client-Side Strategies (UI & Map Abstractions)

Client strategies live in client/js/strategies/. They wrap proprietary SDKs so the core engine never touches external code.

1. BaseViewerProvider

Location: client/js/strategies/viewproviders/
Purpose: Wraps 2D panorama viewers (e.g., Street View, MapillaryJS). Must emit standard events.

import { BaseViewerProvider } from './BaseViewerProvider.js';

export class MyViewerProvider extends BaseViewerProvider {
    async init() {
        // Boot your 2D Viewer SDK (e.g., attach to this.containerId)
        
        // CONTRACT: You MUST emit these 3 events when the SDK reports the corresponding change:
        // this.trigger('visible_changed', boolean);
        // this.trigger('node_changed', { id: "newNodeId", location: { lat, lng } });
        // this.trigger('pov_changed', { heading: 180, pitch: 0 });
    }

    getCurrentNodeId() { return "current_id"; }
    getLocation() { return { lat: 0, lng: 0 }; }
    isVisible() { return true; }
    getNativeViewer() { return this.myNativeMapObject; }
}

2. BaseTopologyProvider

Location: client/js/strategies/topologyproviders/
Purpose: Retrieves the graph mapping data for neighbors.

import { BaseTopologyProvider } from './BaseTopologyProvider.js';

export class MyTopologyProvider extends BaseTopologyProvider {
    /**
     * @param {string} nodeId
     * @returns {Promise<Object>}
     */
    async getNode(nodeId) {
        return {
            id: nodeId,
            lat: 40.7128,
            lng: -74.0060,
            links: [
                { id: "neighbor_id_1", heading: 90 }
            ]
        };
    }
}

3. NodeSelectionStrategy

Location: client/js/strategies/nodeselectionstrategies/
Purpose: Math logic to determine if a node acts as a background acoustic anchor.

import { NodeSelectionStrategy } from './NodeSelectionStrategy.js';

export class MySelectionStrategy extends NodeSelectionStrategy {
    /**
     * @param {string} nodeId 
     * @param {TopologyRadar} radar 
     * @returns {Promise<boolean>}
     */
    async isAnchor(nodeId, radar) {
        return true; 
    }
    reset() {}
}
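
As one possible example of such math logic, the sketch below treats a node as an anchor only when it is at least a minimum great-circle distance from the previous anchor. This is not the bundled AcousticHorizonStrategy. The class name DistanceAnchorStrategy and the 50 m default are made up, and radar.getNode(nodeId) returning { lat, lng } is an assumed lookup, since the TopologyRadar API is not described here. In the project the class would extend NodeSelectionStrategy; the base class is left out so the sketch is self-contained.

```javascript
// Great-circle distance in meters between two { lat, lng } points (haversine formula).
function haversineMeters(a, b) {
    const R = 6371000; // mean Earth radius in meters
    const toRad = (deg) => (deg * Math.PI) / 180;
    const dLat = toRad(b.lat - a.lat);
    const dLng = toRad(b.lng - a.lng);
    const h = Math.sin(dLat / 2) ** 2 +
        Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
    return 2 * R * Math.asin(Math.sqrt(h));
}

// Hypothetical strategy: anchors are spaced at least `minSpacingMeters` apart.
class DistanceAnchorStrategy {
    constructor(minSpacingMeters = 50) {
        this.minSpacingMeters = minSpacingMeters;
        this.lastAnchor = null;
    }

    async isAnchor(nodeId, radar) {
        const node = await radar.getNode(nodeId); // assumed radar lookup
        if (!node) return false;
        if (this.lastAnchor && haversineMeters(this.lastAnchor, node) < this.minSpacingMeters) {
            return false; // too close to the previous anchor
        }
        this.lastAnchor = { lat: node.lat, lng: node.lng };
        return true;
    }

    reset() {
        this.lastAnchor = null;
    }
}
```

Keeping the state in lastAnchor is what makes reset() necessary: without it, a new session would inherit the spacing of the previous one.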

4. BaseSemanticProvider

Location: client/js/strategies/semanticproviders/
Purpose: Defines the semantic layers the system should look for.

import { BaseSemanticProvider } from './BaseSemanticProvider.js';

export class MySemanticProvider extends BaseSemanticProvider {
    getActiveLayers() { return ['spatial', 'ambient']; }
    getBackgroundLayers() { return ['horizon']; }
    requiresBackgroundProcessing() { return true; }
}
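
A provider that takes its layers from configuration needs to turn a value such as CLIENT_SEMANTIC_LAYERS="spatial, horizon" into an array. How DefaultSemanticProvider actually consumes that setting is not described here, so the helper below (parseLayerList, a hypothetical name) is only a sketch of one tolerant way to parse it.

```javascript
// Splits a comma-separated layer string into a clean, de-duplicated array.
function parseLayerList(raw) {
    if (typeof raw !== 'string') return [];
    const seen = new Set();
    for (const part of raw.split(',')) {
        const name = part.trim();
        if (name) seen.add(name); // skip empty entries such as "a,,b"
    }
    return [...seen];
}
```

Trimming each entry means "spatial, horizon" and "spatial,horizon" are treated the same, which avoids a layer silently named " horizon".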

5. BaseVRLoader

Location: client/js/strategies/vrproviders/
Purpose: Fetches and paints image tiles to a canvas for WebXR environments.

import { BaseVRLoader } from './BaseVRLoader.js';

export class MyVRLoader extends BaseVRLoader {
    async getLowResBase(nodeId, ctx, width, height) {
        // Draw low-res placeholder to ctx
    }

    async stitchProgressively(nodeId, zoom, ctx, width, height, onTileDrawn) {
        // Draw HD tiles
        onTileDrawn();
        return true; 
    }
}
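
The body of stitchProgressively() usually comes down to computing where each tile lands on the canvas and drawing it. The sketch below assumes a uniform cols x rows tile grid and a hypothetical fetchTile(col, row) loader that resolves to something drawable; neither is prescribed by BaseVRLoader, and how zoom maps to a grid size depends on your image source. It uses the standard five-argument ctx.drawImage(image, x, y, width, height) call.

```javascript
// Computes destination rectangles for a uniform tile grid.
// Edges are rounded so that adjacent tiles neither overlap nor leave gaps.
function tileRects(cols, rows, width, height) {
    const rects = [];
    for (let row = 0; row < rows; row++) {
        for (let col = 0; col < cols; col++) {
            const x = Math.round((col * width) / cols);
            const y = Math.round((row * height) / rows);
            const w = Math.round(((col + 1) * width) / cols) - x;
            const h = Math.round(((row + 1) * height) / rows) - y;
            rects.push({ col, row, x, y, w, h });
        }
    }
    return rects;
}

// Draws tiles one at a time and reports progress after each.
// `fetchTile` is a hypothetical loader supplied by your strategy.
async function drawTiles(ctx, cols, rows, width, height, fetchTile, onTileDrawn) {
    for (const rect of tileRects(cols, rows, width, height)) {
        const image = await fetchTile(rect.col, rect.row);
        ctx.drawImage(image, rect.x, rect.y, rect.w, rect.h);
        onTileDrawn();
    }
    return true;
}
```

Calling onTileDrawn() after every tile, rather than once at the end, is what lets the scene sharpen progressively.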