Floor Plan Image Recognition — Feature Design

Task: t-c2921
Author: inventor
Status: Design proposal


Problem

Users want to import an existing floor plan image (architect drawing, realtor photo, hand sketch) and have it automatically converted into the project's house JSON format so they can immediately view it in 3D, furnish rooms, and iterate on the design.

Approach: LLM Vision API

After evaluating four approaches, the recommended solution uses a multimodal LLM vision API (Claude or GPT-4o) to analyze floor plan images and output structured house JSON.

Why LLM Vision over alternatives

| Approach | Pros | Cons | Verdict |
| --- | --- | --- | --- |
| Classical CV (OpenCV.js edge detection) | No API needed, offline | Can't identify room types, fails on varied styles, needs heavy heuristics | Too fragile |
| LLM Vision (Claude/GPT-4o) | Understands semantics, handles variety, outputs JSON directly | Needs API key + network | Best fit |
| Dedicated ML (YOLO/CubiCasa models) | High accuracy for specific styles | Heavy model files (~100MB+), complex setup, breaks vanilla JS philosophy | Too heavy |
| Hybrid CV + LLM | Best of both worlds | More complexity for marginal gain | Overengineered for v1 |

Key reasons:

  1. Project is vanilla JS with no build system — adding ML runtimes is architecturally wrong
  2. Floor plans are inherently semantic — you need to know "this is a kitchen" not just "this is a rectangle"
  3. LLMs can output the exact house JSON format in a single call
  4. LLMs handle architectural drawings, realtor floor plans, and hand sketches without style-specific tuning (dimensions from sketches will be approximate)
  5. Standard door widths (~0.9m) give LLMs reliable dimensional anchors

Architecture

New module: src/floorplan-import.js

FloorplanImporter
├── constructor(renderer, options)
├── open()                          // Shows the import modal
├── _buildModal()                   // Creates DOM for the modal overlay
├── _handleImageUpload(file)        // Processes uploaded image
├── _preprocessImage(file)          // Canvas preprocessing (contrast, resize)
├── _analyzeWithLLM(base64Image)    // Sends to vision API, gets house JSON
├── _buildPrompt()                  // Constructs the system+user prompt
├── _validateHouseJSON(json)        // Validates output matches schema
├── _applyToRenderer(houseData)     // Loads result into the 3D viewer
├── _showPreview(houseData)         // Shows result for user review
└── close()                         // Closes modal, cleans up

Integration point: src/index.html

New button in the sidebar File section:

<button class="export-btn" id="btn-import-floorplan">Import Floor Plan</button>

Wired in the wireExportButtons() function.
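
A minimal sketch of that wiring, assuming `wireExportButtons()` has access to a `FloorplanImporter` instance. The helper name `wireImportButton` and its `doc` parameter are hypothetical, introduced here so the snippet can be exercised without a browser.

```javascript
// Hypothetical helper: attaches the click handler for the new sidebar button.
// `doc` is any object exposing getElementById (normally `document`);
// `importer` is any object exposing open() (normally a FloorplanImporter).
function wireImportButton(doc, importer) {
  const button = doc.getElementById('btn-import-floorplan');
  if (!button) return false; // button missing from the page: nothing to wire
  button.addEventListener('click', () => importer.open());
  return true;
}
```

Inside `wireExportButtons()` this would be called as `wireImportButton(document, importer)`.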


User Flow

1. User clicks "Import Floor Plan" in sidebar
          │
2. Modal overlay appears with:
   ┌──────────────────────────────────┐
   │  Import Floor Plan               │
   │                                  │
   │  ┌────────────────────────────┐  │
   │  │                            │  │
   │  │   Drop image here or       │  │
   │  │   click to browse          │  │
   │  │                            │  │
   │  │   PNG, JPG, WebP           │  │
   │  └────────────────────────────┘  │
   │                                  │
   │  Building name: [___________]    │
   │  Floors shown:  [1 ▼]           │
   │                                  │
   │  API: [Claude ▼] Key: [••••••]  │
   │                                  │
   │  [Analyze Floor Plan]            │
   └──────────────────────────────────┘
          │
3. Image uploaded → shown in preview area
          │
4. User clicks "Analyze" → spinner + progress text
          │
5. LLM returns house JSON
          │
6. Preview mode shows:
   ┌──────────────────────────────────┐
   │  Result Preview                  │
   │                                  │
   │  Found: 6 rooms, 8 doors,       │
   │         12 windows               │
   │                                  │
   │  Rooms:                          │
   │  ☑ Living Room    4.5 × 5.5m    │
   │  ☑ Kitchen        4.0 × 3.5m    │
   │  ☑ Hallway        2.0 × 9.0m    │
   │  ☑ Bathroom       2.5 × 3.0m    │
   │  ☑ Bedroom        4.5 × 4.0m    │
   │  ☑ Office         3.5 × 3.0m    │
   │                                  │
   │  [Accept & Load]  [Edit JSON]    │
   │  [Re-analyze]     [Cancel]       │
   └──────────────────────────────────┘
          │
7a. "Accept" → loads house JSON into renderer,
     rebuilds floor buttons, room list, 3D view
          │
7b. "Edit JSON" → opens raw JSON in textarea
     for manual corrections before loading
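
Steps 3 to 6 of the flow can be sketched as a single orchestration function. This is an illustration under stated assumptions, not the module's actual code: `runImport` and the three property names on `steps` are hypothetical stand-ins for `_preprocessImage`, `_analyzeWithLLM`, and `_validateHouseJSON`.

```javascript
// Sketch of the upload -> analyze -> validate sequence. `steps` bundles the
// operations so the orchestration can be read (and tested) separately from
// DOM and network code. All property names on `steps` are assumptions.
async function runImport(file, steps) {
  const image = await steps.preprocess(file);   // resize + base64 encode
  const rawText = await steps.analyze(image);   // vision API call
  let houseData;
  try {
    houseData = JSON.parse(rawText);
  } catch (err) {
    // Invalid JSON: hand the raw text to the "Edit JSON" path
    return { status: 'parse-error', rawText };
  }
  const { valid, errors } = steps.validate(houseData);
  return valid
    ? { status: 'ok', houseData }
    : { status: 'needs-review', houseData, errors };
}
```

The three `status` values map onto the preview screen: `ok` enables "Accept & Load", while `needs-review` and `parse-error` steer the user toward "Edit JSON" or "Re-analyze".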

The Prompt (Core of the Feature)

The prompt engineering is the most critical part. It must produce valid house JSON from any floor plan style.

System prompt

You are a floor plan analyzer. Given an image of a floor plan or floor layout,
extract the room structure and output valid JSON matching the exact schema below.

Rules:
- All dimensions in meters. Use standard architectural conventions if no scale bar
  is visible (standard interior door = 0.9m wide, entry door = 1.0-1.1m)
- Rooms are axis-aligned rectangles positioned on a coordinate grid
- Position {x, y} is the room's bottom-left corner (x = west-east, y = south-north)
- Each room has walls on 4 cardinal directions (north, south, east, west)
- Walls are "exterior" if they face outside the building, "interior" otherwise
- Doors have: id, type (entry|interior|patio|open), position (meters from wall start),
  width, height, connectsTo (adjacent room id or "exterior")
- Windows have: id, type (casement|fixed), position, width, height, sillHeight
- Generate unique IDs: "{floorId}-{roomSlug}" for rooms, "{roomId}-d{n}" for doors,
  "{roomId}-w{n}" for windows
- Room types: living, kitchen, dining, bedroom, bathroom, hallway, office, utility,
  storage, laundry, garage
- Flooring: "tile" for kitchen/bathroom/utility/hallway, "hardwood" for others

Output ONLY valid JSON, no markdown fences, no explanation.

User prompt template

Analyze this floor plan image. The building is named "{name}".
{scaleHint ? "Scale reference: " + scaleHint : "Estimate dimensions from standard door widths."}
This image shows {floorCount} floor(s).

Output the house JSON with this structure:
{
  "name": "...",
  "description": "...",
  "units": "meters",
  "building": {
    "footprint": { "width": <number>, "depth": <number> },
    "wallThickness": 0.24,
    "roofType": "gable"
  },
  "floors": [
    {
      "id": "eg",
      "name": "...",
      "nameEN": "...",
      "level": 0,
      "ceilingHeight": 2.6,
      "rooms": [
        {
          "id": "eg-room-slug",
          "name": "...",
          "nameEN": "...",
          "type": "living|kitchen|...",
          "position": { "x": <meters>, "y": <meters> },
          "dimensions": { "width": <meters>, "length": <meters> },
          "flooring": "tile|hardwood",
          "walls": {
            "south": { "type": "exterior|interior", "doors": [...], "windows": [...] },
            "north": { ... },
            "east": { ... },
            "west": { ... }
          }
        }
      ]
    }
  ]
}
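
The template above could be filled in by a small function along these lines. This is a sketch: the function name `buildUserPrompt` and the decision to pass the JSON skeleton in as `schemaExample` are assumptions, not part of the existing code.

```javascript
// Builds the user prompt from the modal inputs. `schemaExample` is the JSON
// skeleton shown in the template, passed in as a string so it lives in one place.
function buildUserPrompt({ name, floorCount = 1, scaleHint = '' }, schemaExample) {
  const scaleLine = scaleHint
    ? `Scale reference: ${scaleHint}`
    : 'Estimate dimensions from standard door widths.';
  return [
    `Analyze this floor plan image. The building is named "${name}".`,
    scaleLine,
    `This image shows ${floorCount} floor(s).`,
    '',
    'Output the house JSON with this structure:',
    schemaExample
  ].join('\n');
}
```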

API Integration

Multi-provider support

const API_PROVIDERS = {
  claude: {
    name: 'Claude (Anthropic)',
    endpoint: 'https://api.anthropic.com/v1/messages',
    model: 'claude-sonnet-4-5-20250929',
    // apiKey is passed in by the caller (read from the modal / localStorage)
    buildRequest(apiKey, base64Image, mediaType, systemPrompt, userPrompt) {
      return {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'x-api-key': apiKey,
          'anthropic-version': '2023-06-01',
          'anthropic-dangerous-direct-browser-access': 'true'
        },
        body: JSON.stringify({
          model: this.model,
          max_tokens: 8192,
          system: systemPrompt,
          messages: [{
            role: 'user',
            content: [
              { type: 'image', source: { type: 'base64', media_type: mediaType, data: base64Image } },
              { type: 'text', text: userPrompt }
            ]
          }]
        })
      };
    },
    extractJSON(response) {
      return response.content[0].text;
    }
  },
  openai: {
    name: 'OpenAI (GPT-4o)',
    endpoint: 'https://api.openai.com/v1/chat/completions',
    model: 'gpt-4o',
    // apiKey is passed in by the caller (read from the modal / localStorage)
    buildRequest(apiKey, base64Image, mediaType, systemPrompt, userPrompt) {
      return {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${apiKey}`
        },
        body: JSON.stringify({
          model: this.model,
          max_tokens: 8192,
          messages: [
            { role: 'system', content: systemPrompt },
            { role: 'user', content: [
              { type: 'image_url', image_url: { url: `data:${mediaType};base64,${base64Image}` } },
              { type: 'text', text: userPrompt }
            ]}
          ],
          response_format: { type: 'json_object' }
        })
      };
    },
    extractJSON(response) {
      return response.choices[0].message.content;
    }
  }
};
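
Both `extractJSON` methods return a string, and models do not always honor the "no markdown fences" rule. A tolerant parser, sketched below under the assumption that the payload is a single JSON object, can sit between `extractJSON` and validation. The name `parseModelJSON` is hypothetical.

```javascript
// Extracts a JSON object from model text. Tolerates a Markdown code fence or
// stray prose around the object; throws SyntaxError if nothing parses.
function parseModelJSON(text) {
  const trimmed = text.trim();
  try {
    return JSON.parse(trimmed); // happy path: the model followed instructions
  } catch (_) {
    // Fall back to the outermost {...} span
    const start = trimmed.indexOf('{');
    const end = trimmed.lastIndexOf('}');
    if (start === -1 || end <= start) {
      throw new SyntaxError('No JSON object found in model output');
    }
    return JSON.parse(trimmed.slice(start, end + 1));
  }
}
```

If the fallback also fails, the resulting `SyntaxError` is the signal to show the raw text and offer "Edit JSON".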

API key management

  • Stored in localStorage under floorplan-api-key-{provider}
  • Entered once via the import modal; persists across browser sessions until cleared
  • Never sent to any server except the chosen API provider
  • Key input field uses type="password" and shows masked value
  • "Clear key" button to remove from localStorage
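
The storage rules above might be implemented with three small helpers. This is a sketch: the function names are hypothetical, and `storage` is passed in (normally `window.localStorage`) so the helpers can run outside a browser.

```javascript
// Key helpers around a localStorage-like object (getItem/setItem/removeItem).
const keyName = (provider) => `floorplan-api-key-${provider}`;

function saveApiKey(storage, provider, key) {
  storage.setItem(keyName(provider), key.trim());
}

// Returns '' when no key is stored, so callers can disable "Analyze"
function loadApiKey(storage, provider) {
  return storage.getItem(keyName(provider)) || '';
}

// Backs the "Clear key" button
function clearApiKey(storage, provider) {
  storage.removeItem(keyName(provider));
}
```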

Image Preprocessing

Before sending to the LLM, apply lightweight canvas preprocessing:

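_preprocessImage(file) {
  return new Promise((resolve, reject) => {
    const img = new Image();
    const objectUrl = URL.createObjectURL(file);

    img.onload = () => {
      URL.revokeObjectURL(objectUrl); // release the blob reference once decoded

      // Resize if larger than 2048px on any side (API limits + cost reduction)
      const maxDim = 2048;
      let { width, height } = img;
      if (width > maxDim || height > maxDim) {
        const scale = maxDim / Math.max(width, height);
        width = Math.round(width * scale);
        height = Math.round(height * scale);
      }

      const canvas = document.createElement('canvas');
      canvas.width = width;
      canvas.height = height;
      const ctx = canvas.getContext('2d');

      // Keep PNG as PNG; everything else (JPEG, WebP) is re-encoded as JPEG
      const mediaType = file.type === 'image/png' ? 'image/png' : 'image/jpeg';

      // JPEG has no alpha channel, so paint a white background first
      if (mediaType === 'image/jpeg') {
        ctx.fillStyle = '#fff';
        ctx.fillRect(0, 0, width, height);
      }

      // Draw the (possibly downscaled) image. Contrast enhancement for faded
      // plans is optional and not implemented here.
      ctx.drawImage(img, 0, 0, width, height);

      const quality = mediaType === 'image/jpeg' ? 0.9 : undefined;
      const base64 = canvas.toDataURL(mediaType, quality).split(',')[1];

      resolve({ base64, mediaType, width, height });
    };

    // Without this handler the promise would never settle on a corrupt file
    img.onerror = () => {
      URL.revokeObjectURL(objectUrl);
      reject(new Error('Could not decode image file'));
    };

    img.src = objectUrl;
  });
}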

Validation

After receiving LLM output, validate before loading:

_validateHouseJSON(data) {
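  const errors = [];

  // JSON.parse can legally return null, a number, or a string
  if (!data || typeof data !== 'object') {
    return { valid: false, errors: ['Output is not a JSON object'] };
  }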

  if (!data.name) errors.push('Missing building name');
  if (!data.building?.footprint) errors.push('Missing building footprint');
  if (!data.floors?.length) errors.push('No floors found');

  for (const floor of (data.floors || [])) {
    if (!floor.rooms?.length) {
      errors.push(`Floor "${floor.name}" has no rooms`);
      continue;
    }
    for (const room of floor.rooms) {
      if (!room.id) errors.push(`Room missing id`);
      if (!room.position) errors.push(`Room "${room.id}" missing position`);
      if (!room.dimensions) errors.push(`Room "${room.id}" missing dimensions`);
      if (!room.walls) errors.push(`Room "${room.id}" missing walls`);

      // Validate wall references
      for (const dir of ['north', 'south', 'east', 'west']) {
        const wall = room.walls?.[dir];
        if (!wall) errors.push(`Room "${room.id}" missing ${dir} wall`);
        if (wall && !['exterior', 'interior'].includes(wall.type)) {
          errors.push(`Room "${room.id}" ${dir} wall has invalid type "${wall.type}"`);
        }
      }
    }
  }

  return { valid: errors.length === 0, errors };
}

Auto-repair

Common LLM output issues and fixes:

  • Missing wall entries → default to { "type": "interior" }
  • String numbers → parse to float
  • Missing IDs → auto-generate from room name
  • Missing flooring → infer from room type
  • Rooms without walls object → generate empty walls
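
The fixes listed above could be applied per room roughly as follows. This is a sketch, not the project's implementation: `repairRoom` and `slugify` are hypothetical names, and the tile/hardwood rule is taken from the system prompt.

```javascript
const TILE_TYPES = ['kitchen', 'bathroom', 'utility', 'hallway'];
const DIRECTIONS = ['north', 'south', 'east', 'west'];

// Turns a room name into an id fragment, e.g. "Living Room" -> "living-room"
function slugify(name) {
  return String(name).toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

// Repairs one room in place and returns it. `floorId` prefixes generated ids.
function repairRoom(room, floorId) {
  if (!room.id) {
    const slug = slugify(room.nameEN || room.name || '') || 'room';
    room.id = `${floorId}-${slug}`;
  }

  // String numbers -> floats
  for (const group of ['position', 'dimensions']) {
    for (const key of Object.keys(room[group] || {})) {
      if (typeof room[group][key] === 'string') {
        room[group][key] = parseFloat(room[group][key]);
      }
    }
  }

  // Missing flooring -> infer from room type
  if (!room.flooring) {
    room.flooring = TILE_TYPES.includes(room.type) ? 'tile' : 'hardwood';
  }

  // Missing walls object or missing entries -> default interior wall
  room.walls = room.walls || {};
  for (const dir of DIRECTIONS) {
    if (!room.walls[dir]) room.walls[dir] = { type: 'interior' };
  }
  return room;
}
```

Running `repairRoom` over every room before `_validateHouseJSON` means validation only reports gaps that could not be repaired, such as a missing `position`.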

Scale Detection Strategy

Dimensions are the hardest part. Scale is established in four ways:

  1. Standard references — Interior doors are ~0.9m, entry doors ~1.0-1.1m, windows ~1.2m. The LLM uses these as implicit scale anchors.

  2. User-provided scale — Optional input: "The living room is approximately 5m wide" or "Scale: 1cm = 0.5m". Passed as a hint in the prompt.

  3. Scale bar detection — If the floor plan has a scale bar, the LLM reads it directly.

  4. Post-import adjustment — After loading, user can use the existing House Editor to manually adjust any room dimensions.
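
The LLM applies the door-width anchor implicitly, but the underlying arithmetic is simple and could also serve as a sanity check on its output. The sketch below is illustrative only; both function names are hypothetical and nothing in the current design measures pixels.

```javascript
const INTERIOR_DOOR_WIDTH_M = 0.9; // standard reference from item 1

// Meters represented by one image pixel, given a door's measured pixel width
function metersPerPixel(doorWidthPx, doorWidthM = INTERIOR_DOOR_WIDTH_M) {
  if (!(doorWidthPx > 0)) throw new RangeError('doorWidthPx must be positive');
  return doorWidthM / doorWidthPx;
}

// Converts a pixel length to meters, rounded to the nearest 0.1 m
function pxToMeters(lengthPx, scale) {
  return Math.round(lengthPx * scale * 10) / 10;
}
```

For example, an interior door drawn 45 px wide implies about 0.02 m per pixel, so a wall spanning 225 px comes out at 4.5 m.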


Loading into Renderer

After validation, the house JSON replaces the current house:

_applyToRenderer(houseData) {
  // Replace house data in renderer
  this.renderer.houseData = houseData;
  this.renderer.currentFloor = 0;

  // Clear and re-render
  this.renderer._clearFloor();
  const floor = houseData.floors[0];
  for (const room of floor.rooms) {
    this.renderer._renderRoom(room, floor.ceilingHeight);
  }

  // Dispatch event for UI to rebuild floor buttons, room list, etc.
  this.renderer.container.dispatchEvent(new CustomEvent('houseloaded', {
    detail: { name: houseData.name, floors: houseData.floors.length }
  }));
}

The index.html would listen for houseloaded and refresh:

  • Floor buttons
  • Room list
  • House editor state
  • Camera position (reset to default)
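
A possible shape for that listener is sketched below. The four callbacks on `ui` are hypothetical placeholders for whatever index.html already uses to build its sidebar; only the event name and the `detail` fields come from `_applyToRenderer`.

```javascript
// Registers the rebuild steps on the renderer's container element.
// `ui` bundles four hypothetical callbacks supplied by index.html.
function onHouseLoaded(container, ui) {
  container.addEventListener('houseloaded', (event) => {
    const { floors } = event.detail; // floor count set by _applyToRenderer
    ui.rebuildFloorButtons(floors);
    ui.rebuildRoomList();
    ui.resetHouseEditor();
    ui.resetCamera();
  });
}
```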

File Structure

src/
  floorplan-import.js    # New module — FloorplanImporter class
  index.html             # Modified — add button + wire up + houseloaded event

No new dependencies. No build changes. Pure vanilla JS using:

  • fetch() for API calls
  • Canvas API for image preprocessing
  • URL.createObjectURL() / Blob for image handling
  • localStorage for API key persistence

CSS (inline in modal, consistent with project style)

The modal uses the same design language as existing UI:

  • rgba(255, 255, 255, 0.95) backgrounds
  • #4a90d9 accent color
  • -apple-system, BlinkMacSystemFont font stack
  • border-radius: 4-6px on elements
  • Same button styles as .export-btn

Edge Cases

| Case | Handling |
| --- | --- |
| Multi-floor image (side by side) | Prompt asks LLM to detect multiple floors |
| Hand-drawn sketch | LLM handles well; dimensions will be approximate |
| Photo of printed plan | Canvas preprocessing helps; LLM reads spatial layout |
| Non-English labels | LLM translates; output uses both original + English names |
| Very large image (>10MB) | Canvas resizes to max 2048px before base64 encoding |
| LLM returns invalid JSON | Parse error → show raw text → let user "Edit JSON" |
| LLM returns partial data | Validation finds gaps → auto-repair what's possible, flag rest |
| API rate limit | Show error, suggest retry after delay |
| No API key | Modal won't allow "Analyze" without key entered |
| Curved walls / non-rectangular rooms | Approximate as rectangles (project constraint) |
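
For the API failure rows, a small status-to-message mapping keeps the modal's error text in one place. This is a sketch: `describeApiError` is a hypothetical name, and it assumes the provider signals rate limiting with HTTP 429 and rejected credentials with 401 or 403, which is the usual HTTP convention.

```javascript
// Maps an HTTP status from the vision API to a message for the modal.
function describeApiError(status) {
  if (status === 401 || status === 403) {
    return 'The API key was rejected. Check the key and try again.';
  }
  if (status === 429) {
    return 'Rate limit reached. Wait a moment, then retry.';
  }
  if (status >= 500) {
    return 'The API is temporarily unavailable. Retry shortly.';
  }
  return `Request failed with status ${status}.`;
}
```

It would be called when `fetch()` resolves with `response.ok === false`, passing `response.status`.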

Cost Estimate

Per floor plan analysis:

  • Claude Sonnet: ~$0.01-0.03 per image (vision + ~2K output tokens)
  • GPT-4o: ~$0.01-0.05 per image
  • Negligible for individual use

Implementation Recommendations

For the coder:

  1. Start with the prompt — get _buildPrompt() right first, test with various floor plan images manually via the API before building the UI.

  2. Build the modal — use a simple overlay div, matching the existing UI (the project uses no modal library).

  3. Wire up the API — start with Claude support, add OpenAI second. The provider abstraction makes this easy.

  4. Add validation + auto-repair — defensive parsing of LLM output is essential.

  5. Handle the houseloaded event in index.html — rebuild all sidebar UI.

  6. Test with varied floor plans:

    • Clean architectural drawing (should work great)
    • Realtor-style colored floor plan (should work well)
    • Hand sketch on paper (should work, approximate dimensions)
    • Photo of a floor plan on screen (should work with preprocessing)

Testing approach:

  • Save example floor plan images in data/test-floorplans/
  • Compare LLM output against manually created house JSON
  • Check that output loads in 3D viewer without errors
  • Verify rooms don't overlap and walls connect properly
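
The overlap check in the last bullet can be automated with a rectangle intersection test. The sketch below assumes `dimensions.width` runs along x and `dimensions.length` along y, which the schema implies but does not state. Both function names are hypothetical.

```javascript
// True when two rooms' rectangles share interior area. Rooms that merely
// touch along a wall do not count. `eps` absorbs rounding in model output.
function roomsOverlap(a, b, eps = 0.01) {
  const ax2 = a.position.x + a.dimensions.width;
  const ay2 = a.position.y + a.dimensions.length;
  const bx2 = b.position.x + b.dimensions.width;
  const by2 = b.position.y + b.dimensions.length;
  return a.position.x < bx2 - eps && b.position.x < ax2 - eps &&
         a.position.y < by2 - eps && b.position.y < ay2 - eps;
}

// Returns [idA, idB] pairs for every overlapping pair on one floor
function findOverlaps(rooms) {
  const pairs = [];
  for (let i = 0; i < rooms.length; i++) {
    for (let j = i + 1; j < rooms.length; j++) {
      if (roomsOverlap(rooms[i], rooms[j])) {
        pairs.push([rooms[i].id, rooms[j].id]);
      }
    }
  }
  return pairs;
}
```

An empty result from `findOverlaps(floor.rooms)` for every floor is a cheap regression check to run against each image in data/test-floorplans/.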

Future Enhancements (out of scope for v1)

  • Local model support — Run a local vision model (via Ollama) for offline use
  • PDF import — Extract floor plan pages from architectural PDFs
  • Multi-floor stitching — Upload separate images per floor, align them
  • Overlay comparison — Show original image as ground texture under 3D rooms
  • Iterative refinement — "The kitchen should be wider" → re-prompt with corrections
  • Scale calibration tool — Click two points on image, enter real distance