abba-360-dev

ABBA-360: An Agnostic Browser-Based Sandbox Architecture for AI Audio Generation in Networks of 360° Images

Welcome to the ABBA-360 sandbox. This system is designed as a strictly agnostic orchestration engine for AI generation of spatial audio from interconnected 360° images. The system is set up to run from GitHub Pages, using zrok to connect to the server.

📂 Project Structure

abba360_v0/
├── client/                     # Frontend Environment
│   ├── index.html              
│   └── js/
│       ├── client.js           # Bootstrapper & Dependency Injection
│       ├── NavigationManager.js # Core Orchestrator
│       ├── NetworkService.js   # WebSocket client
│       ├── SpatialAudioPlayer.js
│       ├── UIManager.js
│       ├── TopologyRadar.js
│       ├── AcousticTreadmill.js
│       ├── VR/                 # WebXR & A-Frame lifecycle
│       └── strategies/         # ⬅️ IMPLEMENT CLIENT STRATEGIES HERE
│           ├── nodeselectionstrategies/
│           ├── semanticproviders/
│           ├── topologyproviders/
│           ├── viewproviders/
│           └── vrproviders/
├── server/                     # Backend Environment
│   ├── server.js               # Bootstrapper
│   ├── PipelineService.js      # Core Orchestrator
│   ├── SocketController.js     # WebSocket server
│   ├── CacheManager.js
│   ├── GPUResourceManager.js
│   ├── .env                    # ⬅️ IMPLEMENT ACTIVE STRATEGIES CONFIG
│   └── AIEngine/
│       ├── AIEngine.js         # Strategy Delegator
│       ├── pythonscripts/      # Python code goes here
│       └── strategies/         # ⬅️ IMPLEMENT SERVER STRATEGIES HERE
│           ├── audio/
│           ├── context/
│           ├── imagesource/
│           └── vision/
└── docs/                       # Auto-generated Documentation

You do not need to edit the core orchestration files (such as PipelineService, NavigationManager, NetworkService, or SocketController). The entire system is built on the Strategy Pattern. As a researcher, you simply write new Strategy classes to connect your own image sources, models, APIs, or mapping SDKs, and then activate them in the .env file.
Implement only the concrete strategies for the strategy pattern. Do not change any file other than .env and the TUNNEL constant at the top of client.js.

⚙️ How to Configure Strategies (`.env`)

The system uses dynamic dependency injection. It reads your .env file at boot and dynamically imports the exact JavaScript classes you request. To use a custom strategy, place your file in the appropriate directory, ensure the class name matches the filename exactly, and update your .env:

# ==========================================
# SERVER STRATEGIES (AI ENGINE)
# ==========================================
IMAGE_PROVIDER="MapillarySource"
CONTEXT_PROVIDER="GeoapifyContextProvider"
VISION_PROVIDER="LMStudioVisionProvider"
AUDIO_PROVIDER="StableAudioGradioProvider"

# ==========================================
# CLIENT STRATEGIES (MAPS/360 IMAGE NETWORK/VR 360 IMAGE SOURCE etc)
# ==========================================
CLIENT_VIEWER_PROVIDER="MapillaryViewerProvider"
CLIENT_TOPOLOGY_PROVIDER="MapillaryTopologyProvider"
CLIENT_VR_LOADER_PROVIDER="MapillaryVRLoader"
CLIENT_NODE_SELECTION_STRATEGY="AcousticHorizonStrategy"
CLIENT_SEMANTIC_PROVIDER="DefaultSemanticProvider"
CLIENT_SEMANTIC_LAYERS="spatial, horizon"

# ==========================================
# PYTHON SCRIPTS [OPTIONAL, set to "" if unused]
# ==========================================
PYTHON_VISION_SCRIPT="vision_adapter.py"
PYTHON_AUDIO_SCRIPT="audio_adapter.py"
PYTHON_EXEC="python3"

Place your API keys in the .env file. The system is set up to pass them to the client.
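
As a rough illustration of the dynamic import described above, the sketch below turns a .env value into a module path and instantiates the class of the same name. It is not the repository's loader: `STRATEGY_DIRS`, `resolveStrategyPath`, and `loadStrategy` are hypothetical names, and the relative path assumes the calling file sits in server/AIEngine/.

```javascript
// Hypothetical map from .env keys to the strategy folders shown in the project tree.
const STRATEGY_DIRS = {
  IMAGE_PROVIDER: 'imagesource',
  CONTEXT_PROVIDER: 'context',
  VISION_PROVIDER: 'vision',
  AUDIO_PROVIDER: 'audio',
};

function resolveStrategyPath(envKey, env) {
  const dir = STRATEGY_DIRS[envKey];
  if (!dir) throw new Error(`Unknown strategy key: ${envKey}`);
  const className = (env[envKey] || '').trim();
  // Accept plain identifiers only, so a .env value cannot point outside the folder.
  if (!/^[A-Za-z_$][\w$]*$/.test(className)) {
    throw new Error(`${envKey} must be a class name, got "${className}"`);
  }
  // The class name doubles as the file name, as the README requires.
  return { className, path: `./strategies/${dir}/${className}.js` };
}

async function loadStrategy(envKey, env) {
  const { className, path } = resolveStrategyPath(envKey, env);
  const mod = await import(path); // resolved relative to the file containing this call
  const StrategyClass = mod[className];
  if (typeof StrategyClass !== 'function') {
    throw new Error(`${path} does not export a class named ${className}`);
  }
  return new StrategyClass();
}
```

With the sample .env above, `loadStrategy('VISION_PROVIDER', process.env)` would try to import ./strategies/vision/LMStudioVisionProvider.js. This is why a mismatch between class name and file name fails at boot rather than at first use.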

🛠️ Provided Concrete Examples (Out of the Box)

To help you get started, the repository includes several fully functional, concrete implementations of the strategy interfaces. These demonstrate how to wrap real-world APIs and local models. The system is configured to run with the client hosted on GitHub Pages. Add your domain to pinokioconfig.json.

1. Mapillary & MapLibre GL (Visuals & Topology)

The system uses Mapillary as the default provider for 360-degree street-level imagery and graph navigation.

MapillaryViewerProvider (Client): Wraps MapLibre GL JS to render the 2D map and WebGL viewer, translating user clicks into agnostic pov_changed and node_changed events.
MapillaryTopologyProvider (Client): Queries the Mapillary API to extract the navigation graph (edges/links) so the Acoustic Treadmill can calculate distances to neighboring panoramas.
MapillaryVRLoader (Client): Progressively downloads high-resolution equirectangular tiles to paint onto the WebXR A-Frame sphere.
MapillarySource (Server): Fetches the raw image buffer for the current panorama ID and passes it to the AI Engine for VLM analysis. The client and the server must use the same image provider, so that node IDs sent by the client resolve to images on the server.

2. Geoapify (Context Grounding)

GeoapifyContextProvider (Server): A reverse-geocoding adapter. It takes the raw Lat/Lng coordinates from the client and converts them into a human-readable location string (e.g., “Times Square, New York”). This string grounds the VLM prompt to ensure region-accurate sonic generation.

3. LM Studio (Vision-Language Analysis)

LMStudioVisionProvider (Server): An adapter for communicating with locally hosted Vision-Language Models (like LLaVA or Qwen-VL) via LM Studio’s local server. It structures system prompts based on semantic layers (spatial, ambient, horizon) and parses the JSON output to locate sound sources in the 360 frame.
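
Models often wrap their JSON in extra prose, so parsing usually needs a tolerant extraction step. The sketch below shows one way to do that. `extractJsonObject` and `locateSources` are hypothetical helpers, not part of LMStudioVisionProvider, and the sketch assumes `h` is a heading in degrees, as in the payload examples in this README.

```javascript
// Hypothetical helper: pull the outermost JSON object out of free-form model text.
function extractJsonObject(text) {
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model output');
  }
  return JSON.parse(text.slice(start, end + 1));
}

// Hypothetical helper: keep only intents that carry a usable heading and pitch.
function locateSources(modelText) {
  const parsed = extractJsonObject(modelText);
  const intents = Array.isArray(parsed.intents) ? parsed.intents : [];
  return intents
    .filter((intent) => Number.isFinite(intent.h) && Number.isFinite(intent.p))
    .map((intent) => ({
      label: intent.label,
      h: ((intent.h % 360) + 360) % 360, // normalise the heading into [0, 360)
      p: intent.p,
    }));
}
```

Non-positional intents (for example an ambient bed without `h` and `p`) are filtered out here only because this sketch is about locating sources. A real provider would still return them.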

4. Stable Audio / Gradio / Pinokio (Audio Synthesis)

StableAudioGradioProvider (Server): Connects via WebSockets to a local Gradio API endpoint (commonly managed via Pinokio). It passes the text prompts generated by the VLM to Stable Audio Open, streams the generation progress back to the UI, and captures the resulting .wav buffer.

5. Python Adapters (Custom AI Fallbacks)

If you prefer writing your AI inference logic in Python instead of Node.js, the system provides standard subprocess adapters:

PythonVisionProvider & PythonAudioAdapter (Server): These strategies use child_process.spawn to execute standard Python scripts (vision_adapter.py and audio_adapter.py). They pipe the base64 image data and prompts via stdin and parse the JSON outputs from stdout. Mock Python scripts are included in the pythonscripts/ directory as templates.
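
The stdin/stdout exchange described above can be sketched as follows. This is a minimal illustration, not the shipped adapter code: the request fields `image` and `prompt` and the function names are assumptions, so check the mock scripts in pythonscripts/ for the actual contract.

```javascript
// Assumed request shape: { image: <base64 string>, prompt: <string> }.
function buildVisionRequest(imageBuffer, prompt) {
  return JSON.stringify({ image: imageBuffer.toString('base64'), prompt });
}

// Parse whatever the script printed; fail with a readable message otherwise.
function parseScriptOutput(stdout) {
  try {
    return JSON.parse(stdout.trim());
  } catch (err) {
    throw new Error(`Script printed invalid JSON: ${err.message}`);
  }
}

async function runJsonScript(executable, scriptPath, requestJson) {
  // Dynamic import keeps the sketch usable from ES modules and CommonJS alike.
  const { spawn } = await import('node:child_process');
  return new Promise((resolve, reject) => {
    const child = spawn(executable, [scriptPath]);
    let stdout = '';
    let stderr = '';
    child.stdout.setEncoding('utf8');
    child.stderr.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.stderr.on('data', (chunk) => { stderr += chunk; });
    child.stdin.on('error', () => {}); // failures surface via 'error' or 'close'
    child.on('error', reject); // e.g. the executable was not found
    child.on('close', (code) => {
      if (code !== 0) {
        reject(new Error(`Script exited with code ${code}: ${stderr}`));
        return;
      }
      try { resolve(parseScriptOutput(stdout)); } catch (err) { reject(err); }
    });
    child.stdin.end(requestJson); // write the request, then close stdin
  });
}
```

A call such as `runJsonScript(process.env.PYTHON_EXEC, 'pythonscripts/vision_adapter.py', buildVisionRequest(buffer, prompt))` would then resolve with the parsed object. The script must print nothing but JSON to stdout, so any logging belongs on stderr.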

📦 Core Payload Contracts

The architecture is strictly decoupled. These payloads act as the universal language between the Client, the Node.js Core, and your custom Strategies. Example payloads follow.

1. The Vision Payload (`VisionProvider.analyse()`)

Your Vision Provider must return an object with an intents array. Every intent must contain the strict routing keys (eventName, identity, prompt, type) to pass validation.

{
  "intents": [
    {
      "layer": "spatial",                 
      "label": "Dog, Barking, Slapback",  
      "prompt": "Dog, Barking, Slapback, recorded at London, UK...",
      "type": "object_organic",           
      "eventName": "instance_ready",      
      "identity": "instance",             
      "persistent": false,                
      "positional": true,                 
      "envType": "organic",               
      "h": 270,                           
      "p": 0,                             
      "dist": 5                           
    },
    {
      "layer": "ambient",                 
      "label": "Ambient",                 
      "prompt": "Low rumble of distant traffic, dry acoustics...",
      "type": "ambient",                 
      "eventName": "node_ready",          
      "identity": "node",                 
      "persistent": true,                 
      "positional": false,                
      "envType": "city"                   
    }
  ]
}
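
Before returning from `analyse()`, a provider can check its own output against the four routing keys named above. The validator below is a sketch of such a self-check. `validateVisionPayload` is a hypothetical helper, and the rule that each key must be a non-empty string is an assumption drawn from the example payload, not the core's actual validation code.

```javascript
const REQUIRED_KEYS = ['eventName', 'identity', 'prompt', 'type'];

// Returns a list of human-readable problems; an empty list means the payload passed.
function validateVisionPayload(payload) {
  if (!payload || !Array.isArray(payload.intents)) {
    return ['payload.intents must be an array'];
  }
  const errors = [];
  payload.intents.forEach((intent, index) => {
    for (const key of REQUIRED_KEYS) {
      if (typeof intent?.[key] !== 'string' || intent[key].trim() === '') {
        errors.push(`intents[${index}].${key} must be a non-empty string`);
      }
    }
  });
  return errors;
}
```

Returning every problem at once, rather than throwing on the first, tends to be more useful when debugging VLM output that is wrong in several places.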

2. The Audio Task Payload (`AudioProvider.generate()`)

The AIEngine takes the vision intents and appends internal caching and queueing identifiers before sending them to the AudioProvider.

{
  "layer": "spatial",               
  "label": "Dog, Barking, Slapback",
  "prompt": "Dog, Barking, Slapback...",
  "type": "object_organic",         
  "eventName": "instance_ready",    
  "identity": "instance",           
  "persistent": false,              
  "positional": true,               
  "envType": "organic",             
  "h": 270,                         
  "p": 0,                           
  "dist": 5,                        
  
  "id": "london_uk_dog_barking_v1_34985734985_0", 
  "nodeId": "34985734985",                                 
  "audioContentId": "london_uk_dog_barking_v1",   
  "locationContext": "London, UK",                         
  "displayName": "Dog, Barking, Slapback",                 
  "visualMetadata": { /* raw copy of original intent */ }  
}
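
To make the relationship between an intent and its task concrete, the sketch below assembles a task of the shape shown above. It is an illustration only: `buildAudioTask` is a hypothetical function, the `id` pattern (`audioContentId`, `nodeId`, and an index joined by underscores) is inferred from the example values, and `audioContentId` is taken as an input because this README does not say how it is derived.

```javascript
// Hypothetical reconstruction of the task shape; not the AIEngine's actual code.
function buildAudioTask(intent, { nodeId, index, audioContentId, locationContext }) {
  return {
    ...intent, // routing and spatial fields pass through unchanged
    id: `${audioContentId}_${nodeId}_${index}`,
    nodeId,
    audioContentId,
    locationContext,
    displayName: intent.label,
    visualMetadata: { ...intent }, // copy of the original intent
  };
}
```

Your AudioProvider only needs to read this object. The identifiers are added by the core, so a strategy should not generate or modify them.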

3. The Client-to-Server Payload (`spatial_sync`)

Emitted by NetworkService when navigating to a new panorama.

{
  "nodeId": "34985734985",          
  "fromId": "12938471293",          
  "navEpoch": 14,                   
  "isAnchor": true,                 
  "location": { "lat": 40.7128, "lng": -74.0060 },
  "requestedLayers": ["spatial", "ambient"],
  "nearbyAnchors": [                
    {
      "nodeId": "98237498237",
      "hops": 1,                    
      "requestedLayers": ["horizon"]
    }
  ],
  "dbPayload": { /* cached graph geometry */ }                 
}

4. The Server-to-Client Completion Payload (`instance_ready` / `node_ready`)

Emitted by PipelineService when audio generation is finished.

{
  "url": "/audio/stream.wav?id=london_uk_dog_barking_v1", 
  "nodeId": "34985734985",                   
  "navEpoch": 14,                            
  "taskData": {                              
    "id": "london_uk_dog_barking_v1_34985734985_0",
    "prompt": "Dog, Barking, Slapback...",
    "displayName": "Dog, Barking, Slapback",
    "persistent": false,
    "positional": true,
    "envType": "organic",
    "audioContentId": "london_uk_dog_barking_v1"
  }
}
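
A client receiving this payload has two practical jobs: turning the relative `url` into something it can fetch through the tunnel, and deciding whether the result still belongs to the current navigation. The sketch below shows both under stated assumptions: the tunnel base is an https origin, `navEpoch` is a counter that changes on every navigation, and both function names are hypothetical.

```javascript
// Hypothetical helpers; the project's client may handle both steps differently.
function resolveAudioUrl(payload, tunnelBase) {
  // new URL() resolves a root-relative path against the tunnel origin.
  return new URL(payload.url, tunnelBase).href;
}

function isCurrentNavigation(payload, currentEpoch) {
  // Assumption: a completion from an earlier epoch is stale and can be ignored.
  return payload.navEpoch === currentEpoch;
}
```

Because the client is hosted on GitHub Pages while the audio lives on the server, a root-relative URL would otherwise resolve against the wrong origin.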

5. The Topology Graph Payload (`BaseTopologyProvider.getNode()`)

The expected return shape for topology map spidering.

{
  "id": "34985734985",
  "lat": 40.7128,
  "lng": -74.0060,
  "links": [
    { "id": "neighbor_1_id", "heading": 90 },
    { "id": "neighbor_2_id", "heading": 270 }
  ]
}
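
Since each link carries only an `id` and a `heading`, the distance to a neighbor has to be computed from the coordinates of two fetched nodes. A common way to do this is the haversine formula, sketched below. `distanceMetres` is a hypothetical helper and may not match how the Acoustic Treadmill measures distance.

```javascript
const EARTH_RADIUS_M = 6371000; // mean Earth radius in metres
const toRad = (deg) => (deg * Math.PI) / 180;

// Great-circle distance between two objects that expose { lat, lng } in degrees.
function distanceMetres(a, b) {
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}
```

Two nodes returned by `getNode()` can be passed in directly, for example `distanceMetres(currentNode, neighborNode)`. At street-level spacing, the spherical approximation is more than accurate enough.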

🖥️ Server-Side Strategies (The AI Engine)

Server strategies live in server/AIEngine/strategies/. They dictate how the backend fetches 360 images, evaluates them with VLMs, and generates audio.

1. `ImageSourceProvider`

Location: server/AIEngine/strategies/imagesource/
Purpose: Fetches raw equirectangular image buffers from a mapping service.

import { ImageSourceProvider } from './ImageSourceProvider.js';

export class MyCustomImageSource extends ImageSourceProvider {
    /**
     * @param {string} id - The agnostic node identifier.
     * @returns {Promise<Buffer>} - The raw binary image data.
     */
    async getImage(id) {
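        // Fetch image bytes from your API. The URL is a placeholder; the global
        // fetch() used here requires Node.js 18 or later.
        const response = await fetch(`https://example.com/images/${encodeURIComponent(id)}`);
        if (!response.ok) throw new Error(`Image fetch failed: HTTP ${response.status}`);
        const arrayBuffer = await response.arrayBuffer();
        return Buffer.from(arrayBuffer);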
    }
}

2. `ContextProvider`

Location: server/AIEngine/strategies/context/
Purpose: Converts raw Lat/Lng coordinates into a human-readable location string.

import { ContextProvider } from './ContextProvider.js';

export class MyContextProvider extends ContextProvider {
    /**
     * @param {number} lat 
     * @param {number} lng 
     * @returns {Promise<string>} - Human readable location (e.g., "Urban Street, London")
     */
    async resolve(lat, lng) {
        return "Custom Location String";
    }

    /**
     * @returns {Object} - Safe config pushed to the client on boot
     */
    getPublicConfig() {
        return { customApiKey: process.env.MY_API_KEY };
    }
}

3. `VisionProvider`

Location: server/AIEngine/strategies/vision/
Purpose: Evaluates visual buffers to extract sonic intents.

import { VisionProvider } from './VisionProvider.js';

export class MyVisionProvider extends VisionProvider {
    async init() {}

    /**
     * @param {Buffer} buffer - The 360 image buffer
     * @param {string} context - The resolved location string
     * @param {Object} options - Dictionary parameters (layers, max objects, etc.)
     * @returns {Promise<Object>} - Must return an object containing an 'intents' array.
     */
    async analyse(buffer, context, options) {
        // Evaluate buffer, generate intents based on the payload schema above
        return {
            intents: [
                {
                    layer: "spatial",
                    label: "Dog",
                    prompt: "A dog barking...",
                    type: "object_organic",
                    eventName: "instance_ready",
                    identity: "instance",
                    persistent: false,
                    positional: true,
                    envType: "organic",
                    h: 270, p: 0, dist: 5
                }
            ]
        };
    }
}

4. `AudioProvider`

Location: server/AIEngine/strategies/audio/
Purpose: Synthesizes text prompts into .wav audio buffers.

import { AudioProvider } from './AudioProvider.js';

export class MyAudioProvider extends AudioProvider {
    /**
     * @param {Object} task - The intent payload
     * @param {Object} context - Execution hooks: { signal, socket, progressCallback }
     * @returns {Promise<{buffer: Buffer, duration: string}>}
     */
    async generate(task, context) {
        // Return raw WAV buffer and duration (in seconds)
        return {
            buffer: generatedWavBuffer,
            duration: "10.0"
        };
    }
}
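
If your model or API returns WAV bytes without a duration, the value can be read from the file itself. The sketch below walks the RIFF chunks and divides the data size by the byte rate. `wavDurationSeconds` is a hypothetical helper, and it assumes an uncompressed WAV whose header states its real data size (streamed files sometimes do not).

```javascript
// Hypothetical helper: derive the duration of a WAV buffer from its RIFF chunks.
function wavDurationSeconds(buffer) {
  const isWave =
    buffer.length >= 12 &&
    buffer.toString('ascii', 0, 4) === 'RIFF' &&
    buffer.toString('ascii', 8, 12) === 'WAVE';
  if (!isWave) throw new Error('Not a RIFF/WAVE buffer');

  let byteRate = null;
  let dataSize = null;
  let offset = 12; // first chunk starts right after "RIFF<size>WAVE"
  while (offset + 8 <= buffer.length) {
    const chunkId = buffer.toString('ascii', offset, offset + 4);
    const chunkSize = buffer.readUInt32LE(offset + 4);
    // In the fmt chunk, byteRate follows format (2), channels (2), sampleRate (4).
    if (chunkId === 'fmt ') byteRate = buffer.readUInt32LE(offset + 8 + 8);
    if (chunkId === 'data') dataSize = chunkSize;
    offset += 8 + chunkSize + (chunkSize % 2); // chunks are padded to even sizes
  }
  if (!byteRate || dataSize === null) throw new Error('Missing fmt or data chunk');
  return dataSize / byteRate;
}
```

The return value of `generate()` could then use `wavDurationSeconds(buffer).toFixed(1)` to produce a string such as the "10.0" shown in the template.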

🌐 Client-Side Strategies (UI & Map Abstractions)

Client strategies live in client/js/strategies/. They wrap proprietary SDKs so the core engine never touches external code.

1. `BaseViewerProvider`

Location: client/js/strategies/viewproviders/
Purpose: Wraps 2D Panoramas (StreetView, MapillaryJS). Must emit standard events.

import { BaseViewerProvider } from './BaseViewerProvider.js';

export class MyViewerProvider extends BaseViewerProvider {
    async init() {
        // Boot your 2D Viewer SDK (e.g., attach to this.containerId)
        
        // CONTRACT: You MUST emit these 3 events when the SDK interacts:
        // this.trigger('visible_changed', boolean);
        // this.trigger('node_changed', { id: "newNodeId", location: { lat, lng } });
        // this.trigger('pov_changed', { heading: 180, pitch: 0 });
    }

    getCurrentNodeId() { return "current_id"; }
    getLocation() { return { lat: 0, lng: 0 }; }
    isVisible() { return true; }
    getNativeViewer() { return this.myNativeMapObject; }
}

2. `BaseTopologyProvider`

Location: client/js/strategies/topologyproviders/
Purpose: Retrieves the graph mapping data for neighbors.

import { BaseTopologyProvider } from './BaseTopologyProvider.js';

export class MyTopologyProvider extends BaseTopologyProvider {
    /**
     * @param {string} nodeId
     * @returns {Promise<Object>}
     */
    async getNode(nodeId) {
        return {
            id: nodeId,
            lat: 40.7128,
            lng: -74.0060,
            links: [
                { id: "neighbor_id_1", heading: 90 }
            ]
        };
    }
}
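
To show how a caller might consume `getNode()`, the sketch below spiders outward from a start node and records the hop count of every node it reaches, similar in spirit to the `hops` field in the `spatial_sync` payload. `spider` is a hypothetical function and a plain breadth-first search; the real TopologyRadar may traverse the graph differently.

```javascript
// Hypothetical breadth-first traversal over any provider exposing getNode(id).
async function spider(provider, startId, maxHops) {
  const hopsById = new Map([[startId, 0]]);
  let frontier = [startId];
  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop += 1) {
    // Fetch the whole frontier in parallel, then expand it by one hop.
    const nodes = await Promise.all(frontier.map((id) => provider.getNode(id)));
    const next = [];
    for (const node of nodes) {
      for (const link of node?.links ?? []) {
        if (!hopsById.has(link.id)) {
          hopsById.set(link.id, hop); // first visit is the shortest hop count
          next.push(link.id);
        }
      }
    }
    frontier = next;
  }
  return hopsById;
}
```

Because the traversal relies only on `id` and `links`, any provider that honors the return shape above works without changes, which is the point of keeping the topology contract this small.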

3. `NodeSelectionStrategy`

Location: client/js/strategies/nodeselectionstrategies/
Purpose: Decides whether a node acts as a background acoustic anchor.

import { NodeSelectionStrategy } from './NodeSelectionStrategy.js';

export class MySelectionStrategy extends NodeSelectionStrategy {
    /**
     * @param {string} nodeId 
     * @param {TopologyRadar} radar 
     * @returns {Promise<boolean>}
     */
    async isAnchor(nodeId, radar) {
        return true; 
    }
    reset() {}
}

4. `BaseSemanticProvider`

Location: client/js/strategies/semanticproviders/
Purpose: Defines the semantic layers the system should look for.

import { BaseSemanticProvider } from './BaseSemanticProvider.js';

export class MySemanticProvider extends BaseSemanticProvider {
    getActiveLayers() { return ['spatial', 'ambient']; }
    getBackgroundLayers() { return ['horizon']; }
    requiresBackgroundProcessing() { return true; }
}

5. `BaseVRLoader`

Location: client/js/strategies/vrproviders/
Purpose: Fetches and paints image tiles to a canvas for WebXR environments.

import { BaseVRLoader } from './BaseVRLoader.js';

export class MyVRLoader extends BaseVRLoader {
    async getLowResBase(nodeId, ctx, width, height) {
        // Draw low-res placeholder to ctx
    }

    async stitchProgressively(nodeId, zoom, ctx, width, height, onTileDrawn) {
        // Draw HD tiles
        onTileDrawn();
        return true; 
    }
}
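
When stitching tiles, each one needs a destination rectangle on the canvas. The sketch below computes those rectangles for a regular grid. `tileRects` is a hypothetical helper, and the grid dimensions are parameters because the relationship between `zoom` and tile count depends on your image source.

```javascript
// Hypothetical layout helper: where each tile of a cols x rows grid lands on the canvas.
function tileRects(cols, rows, width, height) {
  const rects = [];
  for (let y = 0; y < rows; y += 1) {
    for (let x = 0; x < cols; x += 1) {
      // Round the edges rather than the sizes so adjacent tiles never leave a gap.
      const left = Math.round((x * width) / cols);
      const top = Math.round((y * height) / rows);
      const right = Math.round(((x + 1) * width) / cols);
      const bottom = Math.round(((y + 1) * height) / rows);
      rects.push({ x, y, dx: left, dy: top, dw: right - left, dh: bottom - top });
    }
  }
  return rects;
}
```

Inside `stitchProgressively`, each loaded tile image could then be painted with `ctx.drawImage(img, r.dx, r.dy, r.dw, r.dh)` followed by a call to `onTileDrawn()`.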
