mAi/house-design


m 6e498818a7 Add floor plan import design doc and project documentation

2026-02-07 16:39:40 +01:00


Floor Plan Image Recognition — Feature Design

Task: t-c2921 Author: inventor Status: Design proposal

Problem

Users want to import an existing floor plan image (architect drawing, realtor photo, hand sketch) and have it automatically converted into the project's house JSON format so they can immediately view it in 3D, furnish rooms, and iterate on the design.

Approach: LLM Vision API

After evaluating four approaches, the recommended solution uses multimodal LLM vision (Claude or OpenAI) to analyze floor plan images and output structured house JSON.

Why LLM Vision over alternatives

Approach	Pros	Cons	Verdict
Classical CV (OpenCV.js edge detection)	No API needed, offline	Can't identify room types, fails on varied styles, needs heavy heuristics	Too fragile
LLM Vision (Claude/GPT-4o)	Understands semantics, handles variety, outputs JSON directly	Needs API key + network	Best fit
Dedicated ML (YOLO/CubiCasa models)	High accuracy for specific styles	Heavy model files (~100MB+), complex setup, breaks vanilla JS philosophy	Too heavy
Hybrid CV + LLM	Best of both worlds	More complexity for marginal gain	Overengineered for v1

Key reasons:

Project is vanilla JS with no build system — adding ML runtimes is architecturally wrong
Floor plans are inherently semantic — you need to know "this is a kitchen" not just "this is a rectangle"
LLMs can output the exact house JSON format in a single call
LLMs handle architectural drawings, realtor floor plans, and hand sketches without style-specific tuning, though dimensional accuracy is lower for sketches
Standard door widths (~0.9m) give LLMs a usable dimensional anchor when no scale bar is present

Architecture

New module: `src/floorplan-import.js`

FloorplanImporter
├── constructor(renderer, options)
├── open()                          // Shows the import modal
├── _buildModal()                   // Creates DOM for the modal overlay
├── _handleImageUpload(file)        // Processes uploaded image
├── _preprocessImage(file)          // Canvas preprocessing (contrast, resize)
├── _analyzeWithLLM(base64Image)    // Sends to vision API, gets house JSON
├── _buildPrompt()                  // Constructs the system+user prompt
├── _validateHouseJSON(json)        // Validates output matches schema
├── _applyToRenderer(houseData)     // Loads result into the 3D viewer
├── _showPreview(houseData)         // Shows result for user review
└── close()                         // Closes modal, cleans up

Integration point: `src/index.html`

New button in the sidebar File section:

<button class="export-btn" id="btn-import-floorplan">Import Floor Plan</button>

Wired in the wireExportButtons() function.

User Flow

1. User clicks "Import Floor Plan" in sidebar
          │
2. Modal overlay appears with:
   ┌──────────────────────────────────┐
   │  Import Floor Plan               │
   │                                  │
   │  ┌────────────────────────────┐  │
   │  │                            │  │
   │  │   Drop image here or       │  │
   │  │   click to browse          │  │
   │  │                            │  │
   │  │   PNG, JPG, WebP           │  │
   │  └────────────────────────────┘  │
   │                                  │
   │  Building name: [___________]    │
   │  Floors shown:  [1 ▼]           │
   │                                  │
   │  API: [Claude ▼] Key: [••••••]  │
   │                                  │
   │  [Analyze Floor Plan]            │
   └──────────────────────────────────┘
          │
3. Image uploaded → shown in preview area
          │
4. User clicks "Analyze" → spinner + progress text
          │
5. LLM returns house JSON
          │
6. Preview mode shows:
   ┌──────────────────────────────────┐
   │  Result Preview                  │
   │                                  │
   │  Found: 6 rooms, 8 doors,        │
   │         12 windows               │
   │                                  │
   │  Rooms:                          │
   │  ☑ Living Room    4.5 × 5.5m    │
   │  ☑ Kitchen        4.0 × 3.5m    │
   │  ☑ Hallway        2.0 × 9.0m    │
   │  ☑ Bathroom       2.5 × 3.0m    │
   │  ☑ Bedroom        4.5 × 4.0m    │
   │  ☑ Office         3.5 × 3.0m    │
   │                                  │
   │  [Accept & Load]  [Edit JSON]    │
   │  [Re-analyze]     [Cancel]       │
   └──────────────────────────────────┘
          │
7a. "Accept" → loads house JSON into renderer,
     rebuilds floor buttons, room list, 3D view
          │
7b. "Edit JSON" → opens raw JSON in textarea
     for manual corrections before loading
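
The counts and room list in step 6 can be derived directly from the returned house JSON. A minimal sketch, assuming the schema used in the prompt (rooms under floors[].rooms, doors and windows in each wall's doors/windows arrays); summarizeHouse is a hypothetical helper, not an existing project function.

```javascript
// Hypothetical helper: builds the numbers shown in the "Result Preview" panel
// from a house JSON object that follows the schema in the prompt.
function summarizeHouse(houseData) {
  const summary = { roomCount: 0, doorCount: 0, windowCount: 0, rooms: [] };

  for (const floor of houseData.floors || []) {
    for (const room of floor.rooms || []) {
      summary.roomCount += 1;

      // Count openings on all walls. A door shared by two rooms may be listed
      // on both rooms' walls, in which case it is counted twice here.
      for (const wall of Object.values(room.walls || {})) {
        summary.doorCount += (wall.doors || []).length;
        summary.windowCount += (wall.windows || []).length;
      }

      const { width, length } = room.dimensions || {};
      summary.rooms.push({
        id: room.id,
        label: room.nameEN || room.name || room.id,
        size: `${Number(width).toFixed(1)} × ${Number(length).toFixed(1)}m`
      });
    }
  }
  return summary;
}
```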

The Prompt (Core of the Feature)

The prompt engineering is the most critical part. It must produce valid house JSON from any floor plan style.

System prompt

You are a floor plan analyzer. Given an image of a floor plan or floor layout,
extract the room structure and output valid JSON matching the exact schema given in the user message.

Rules:
- All dimensions in meters. Use standard architectural conventions if no scale bar
  is visible (standard interior door = 0.9m wide, entry door = 1.0-1.1m)
- Rooms are axis-aligned rectangles positioned on a coordinate grid
- Position {x, y} is the room's bottom-left corner (x = west-east, y = south-north)
- Each room has walls on 4 cardinal directions (north, south, east, west)
- Walls are "exterior" if they face outside the building, "interior" otherwise
- Doors have: id, type (entry|interior|patio|open), position (meters from wall start),
  width, height, connectsTo (adjacent room id or "exterior")
- Windows have: id, type (casement|fixed), position, width, height, sillHeight
- Generate unique IDs: "{floorId}-{roomSlug}" for rooms, "{roomId}-d{n}" for doors,
  "{roomId}-w{n}" for windows
- Room types: living, kitchen, dining, bedroom, bathroom, hallway, office, utility,
  storage, laundry, garage
- Flooring: "tile" for kitchen/bathroom/utility/hallway, "hardwood" for others

Output ONLY valid JSON, no markdown fences, no explanation.

User prompt template

Analyze this floor plan image. The building is named "{name}".
{scaleHint ? "Scale reference: " + scaleHint : "Estimate dimensions from standard door widths."}
This image shows {floorCount} floor(s).

Output the house JSON with this structure:
{
  "name": "...",
  "description": "...",
  "units": "meters",
  "building": {
    "footprint": { "width": <number>, "depth": <number> },
    "wallThickness": 0.24,
    "roofType": "gable"
  },
  "floors": [
    {
      "id": "eg",
      "name": "...",
      "nameEN": "...",
      "level": 0,
      "ceilingHeight": 2.6,
      "rooms": [
        {
          "id": "eg-room-slug",
          "name": "...",
          "nameEN": "...",
          "type": "living|kitchen|...",
          "position": { "x": <meters>, "y": <meters> },
          "dimensions": { "width": <meters>, "length": <meters> },
          "flooring": "tile|hardwood",
          "walls": {
            "south": { "type": "exterior|interior", "doors": [...], "windows": [...] },
            "north": { ... },
            "east": { ... },
            "west": { ... }
          }
        }
      ]
    }
  ]
}
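
Filling the template is plain string assembly. A sketch of what _buildPrompt() might do for the user prompt, written as a standalone function; buildUserPrompt and SCHEMA_TEXT are hypothetical names, and the schema string is abbreviated here.

```javascript
// Hypothetical stand-in for the full JSON structure above (abbreviated).
const SCHEMA_TEXT = '{ "name": "...", "units": "meters", "floors": [ ... ] }';

function buildUserPrompt({ name, scaleHint, floorCount = 1 }) {
  const hint = (scaleHint || '').trim();
  const scaleLine = hint
    ? `Scale reference: ${hint}`
    : 'Estimate dimensions from standard door widths.';

  return [
    // JSON.stringify adds the quotes and escapes any quotes inside the name.
    `Analyze this floor plan image. The building is named ${JSON.stringify(name || 'Imported floor plan')}.`,
    scaleLine,
    `This image shows ${floorCount} floor(s).`,
    '',
    'Output the house JSON with this structure:',
    SCHEMA_TEXT
  ].join('\n');
}
```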

API Integration

Multi-provider support

const API_PROVIDERS = {
  claude: {
    name: 'Claude (Anthropic)',
    endpoint: 'https://api.anthropic.com/v1/messages',
    model: 'claude-sonnet-4-5-20250929',
    // apiKey: the user's key for this provider (see API key management)
    buildRequest(apiKey, base64Image, mediaType, systemPrompt, userPrompt) {
      return {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'x-api-key': apiKey,
          'anthropic-version': '2023-06-01',
          // Required when calling the Messages API directly from a browser (CORS)
          'anthropic-dangerous-direct-browser-access': 'true'
        },
        body: JSON.stringify({
          model: this.model,
          max_tokens: 8192,
          system: systemPrompt,
          messages: [{
            role: 'user',
            content: [
              { type: 'image', source: { type: 'base64', media_type: mediaType, data: base64Image } },
              { type: 'text', text: userPrompt }
            ]
          }]
        })
      };
    },
    extractJSON(response) {
      // Return the text of the first text block in the response content
      const textBlock = response.content.find((block) => block.type === 'text');
      return textBlock ? textBlock.text : '';
    }
  },
  openai: {
    name: 'OpenAI (GPT-4o)',
    endpoint: 'https://api.openai.com/v1/chat/completions',
    model: 'gpt-4o',
    buildRequest(apiKey, base64Image, mediaType, systemPrompt, userPrompt) {
      return {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${apiKey}`
        },
        body: JSON.stringify({
          model: this.model,
          max_tokens: 8192,
          messages: [
            { role: 'system', content: systemPrompt },
            { role: 'user', content: [
              { type: 'image_url', image_url: { url: `data:${mediaType};base64,${base64Image}` } },
              { type: 'text', text: userPrompt }
            ]}
          ],
          response_format: { type: 'json_object' }
        })
      };
    },
    extractJSON(response) {
      return response.choices[0].message.content;
    }
  }
};
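
Around these provider objects, _analyzeWithLLM still has to send the request, handle HTTP errors, and parse the model's text. A minimal sketch, assuming the request object has already been produced by the provider's buildRequest; callVisionAPI and parseLLMJSON are hypothetical names, and fetchImpl is injectable only so the function can be tested without a network. Stripping a stray code fence is a defensive assumption: the system prompt forbids fences, but models do not always comply.

```javascript
// Parse the model's text as JSON, tolerating a surrounding code fence
// (three backticks, optionally tagged "json") despite the prompt's instruction.
function parseLLMJSON(text) {
  const trimmed = text.trim();
  const fenced = trimmed.match(/^`{3}(?:json)?\s*([\s\S]*?)\s*`{3}$/i);
  return JSON.parse(fenced ? fenced[1] : trimmed);
}

// `provider` is an entry from API_PROVIDERS; `requestInit` is the object its
// buildRequest() returned. `fetchImpl` defaults to the browser's fetch.
async function callVisionAPI(provider, requestInit, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(provider.endpoint, requestInit);

  if (res.status === 429) {
    // Rate limited: a distinct message lets the modal suggest retrying later.
    throw new Error('Rate limit reached, please retry after a short delay.');
  }
  if (!res.ok) {
    throw new Error(`${provider.name} request failed with HTTP ${res.status}`);
  }

  const text = provider.extractJSON(await res.json());
  try {
    return { ok: true, data: parseLLMJSON(text) };
  } catch {
    // Invalid JSON: return the raw text so the user can fix it via "Edit JSON".
    return { ok: false, raw: text };
  }
}
```

Returning { ok: false, raw } instead of throwing on a parse failure keeps the raw text available for the "Edit JSON" path listed under Edge Cases.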

API key management

Stored in localStorage under floorplan-api-key-{provider}
Entered once via the import modal and reused on later visits until cleared (localStorage persists across sessions)
Never sent to any server except the chosen API provider
Key input field uses type="password" and shows masked value
"Clear key" button to remove from localStorage

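The storage rules above map onto three small helpers. A sketch assuming the floorplan-api-key-{provider} naming; the helper names are hypothetical, and storage is a parameter only so a stub can stand in for localStorage in tests.

```javascript
// Hypothetical helpers for per-provider key storage. `storage` defaults to
// window.localStorage; any object with getItem/setItem/removeItem works.
function apiKeyName(provider) {
  return `floorplan-api-key-${provider}`;
}

function loadApiKey(provider, storage = globalThis.localStorage) {
  return storage.getItem(apiKeyName(provider)) || ''; // '' when nothing is stored
}

function saveApiKey(provider, key, storage = globalThis.localStorage) {
  const trimmed = key.trim();
  if (trimmed) storage.setItem(apiKeyName(provider), trimmed); // ignore blank input
}

// Backs the "Clear key" button.
function clearApiKey(provider, storage = globalThis.localStorage) {
  storage.removeItem(apiKeyName(provider));
}
```
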
Image Preprocessing

Before sending to the LLM, apply lightweight canvas preprocessing:

_preprocessImage(file) {
  return new Promise((resolve, reject) => {
    const img = new Image();
    const objectUrl = URL.createObjectURL(file);

    img.onload = () => {
      URL.revokeObjectURL(objectUrl); // release the blob URL once decoded
      const canvas = document.createElement('canvas');

      // Resize if larger than 2048px on any side (API limits + cost reduction)
      const maxDim = 2048;
      let { width, height } = img;
      if (width > maxDim || height > maxDim) {
        const scale = maxDim / Math.max(width, height);
        width = Math.round(width * scale);
        height = Math.round(height * scale);
      }

      canvas.width = width;
      canvas.height = height;
      const ctx = canvas.getContext('2d');

      // Fill white first so transparent areas don't turn black when encoded as JPEG
      ctx.fillStyle = '#ffffff';
      ctx.fillRect(0, 0, width, height);
      ctx.drawImage(img, 0, 0, width, height);
      // Contrast enhancement for faded plans would go here (not implemented)

      // PNG input stays PNG (lossless, suits line drawings);
      // everything else (JPEG, WebP) is re-encoded as JPEG
      const mediaType = file.type === 'image/png' ? 'image/png' : 'image/jpeg';
      const quality = mediaType === 'image/jpeg' ? 0.9 : undefined;
      const base64 = canvas.toDataURL(mediaType, quality).split(',')[1];

      resolve({ base64, mediaType, width, height });
    };
    img.onerror = () => {
      URL.revokeObjectURL(objectUrl);
      reject(new Error(`Could not decode image "${file.name}"`));
    };
    img.src = objectUrl;
  });
}
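
The listing does not implement contrast enhancement. One simple option, an assumption rather than part of the original design, is a linear contrast stretch over the pixels returned by ctx.getImageData(): find the darkest and brightest luminance and rescale so that range spans 0 to 255. stretchContrast is a hypothetical name.

```javascript
// Linear contrast stretch on RGBA pixel data (e.g. ctx.getImageData(...).data).
// Finds the darkest and brightest luminance and rescales every color channel
// so that range maps to 0..255. Modifies `data` in place and returns it.
function stretchContrast(data) {
  let min = 255;
  let max = 0;
  for (let i = 0; i < data.length; i += 4) {
    // Rec. 601 luma approximation of perceived brightness.
    const lum = 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
    if (lum < min) min = lum;
    if (lum > max) max = lum;
  }
  const range = max - min;
  if (range < 1) return data; // flat image: nothing to stretch

  const scale = 255 / range;
  for (let i = 0; i < data.length; i += 4) {
    for (let c = 0; c < 3; c++) {
      // Uint8ClampedArray rounds and clamps the result into 0..255.
      data[i + c] = (data[i + c] - min) * scale;
    }
    // Alpha (data[i + 3]) is left unchanged.
  }
  return data;
}
```

In _preprocessImage it would run between drawImage and toDataURL: read with ctx.getImageData(0, 0, width, height), stretch imageData.data, then write back with ctx.putImageData(imageData, 0, 0). A single global stretch suits faded scans; unevenly lit phone photos may need a local (adaptive) method instead.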

Validation

After receiving LLM output, validate before loading:

_validateHouseJSON(data) {
  const errors = [];

  if (!data.name) errors.push('Missing building name');
  if (!data.building?.footprint) errors.push('Missing building footprint');
  if (!data.floors?.length) errors.push('No floors found');

  for (const floor of (data.floors || [])) {
    if (!floor.rooms?.length) {
      errors.push(`Floor "${floor.name}" has no rooms`);
      continue;
    }
    for (const room of floor.rooms) {
      if (!room.id) errors.push(`Room missing id`);
      if (!room.position) errors.push(`Room "${room.id}" missing position`);
      if (!room.dimensions) errors.push(`Room "${room.id}" missing dimensions`);
      if (!room.walls) errors.push(`Room "${room.id}" missing walls`);

      // Validate wall references
      for (const dir of ['north', 'south', 'east', 'west']) {
        const wall = room.walls?.[dir];
        if (!wall) errors.push(`Room "${room.id}" missing ${dir} wall`);
        if (wall && !['exterior', 'interior'].includes(wall.type)) {
          errors.push(`Room "${room.id}" ${dir} wall has invalid type "${wall.type}"`);
        }
      }
    }
  }

  return { valid: errors.length === 0, errors };
}

Auto-repair

Common LLM output issues and fixes:

Missing wall entries → default to { "type": "interior" }
String numbers → parse to float
Missing IDs → auto-generate from room name
Missing flooring → infer from room type
Rooms without walls object → generate empty walls
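
These rules can run as one pass before validation. A minimal sketch; autoRepairHouse and its helpers are hypothetical, the ID pattern follows the {floorId}-{roomSlug} convention from the system prompt, and the flooring rule mirrors the prompt's tile/hardwood rule.

```javascript
const DIRECTIONS = ['north', 'south', 'east', 'west'];
const TILE_TYPES = new Set(['kitchen', 'bathroom', 'utility', 'hallway']);

// "4.0" -> 4; numbers and non-numeric strings pass through unchanged.
const toNumber = (v) =>
  typeof v === 'string' && v.trim() !== '' && Number.isFinite(Number(v)) ? Number(v) : v;

// "Küche" -> "kuche", "Living Room" -> "living-room" (accents stripped first).
const slugify = (s) =>
  String(s).normalize('NFD').replace(/[\u0300-\u036f]/g, '')
    .toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');

// Mutates and returns the parsed house object; whatever it cannot fix is left
// for _validateHouseJSON to report.
function autoRepairHouse(data) {
  for (const floor of data.floors || []) {
    const floorId = floor.id || 'floor';
    for (const [index, room] of (floor.rooms || []).entries()) {
      // Missing IDs: generate from the room name (or its position in the list).
      if (!room.id) {
        room.id = `${floorId}-${slugify(room.nameEN || room.name || `room-${index + 1}`)}`;
      }
      // String numbers in position and dimensions.
      for (const key of ['position', 'dimensions']) {
        if (!room[key]) continue;
        for (const [k, v] of Object.entries(room[key])) room[key][k] = toNumber(v);
      }
      // Missing flooring: infer from room type.
      if (!room.flooring) room.flooring = TILE_TYPES.has(room.type) ? 'tile' : 'hardwood';
      // Missing walls object or missing directions: plain interior walls.
      room.walls = room.walls || {};
      for (const dir of DIRECTIONS) {
        if (!room.walls[dir]) room.walls[dir] = { type: 'interior' };
      }
    }
  }
  return data;
}
```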

Scale Detection Strategy

Dimensions are the hardest part. The LLM handles this through:

Standard references — Interior doors are ~0.9m, entry doors ~1.0-1.1m, windows ~1.2m. The LLM uses these as implicit scale anchors.
User-provided scale — Optional input: "The living room is approximately 5m wide" or "Scale: 1cm = 0.5m". Passed as a hint in the prompt.
Scale bar detection — If the floor plan has a scale bar, the LLM reads it directly.
Post-import adjustment — After loading, user can use the existing House Editor to manually adjust any room dimensions.
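
The door-width convention can also be used after the fact to catch a plan that came back at the wrong scale. A hypothetical check, not part of the original design: take the median width of interior doors in the returned JSON and compare it with 0.9 m; the ratio is a candidate correction factor that the user could accept, or ignore and fix in the House Editor.

```javascript
// Hypothetical post-import check. Returns null when there are no interior
// doors to compare against; otherwise the median width and the factor that
// would bring it to `expectedDoorWidth`.
function estimateScaleCorrection(houseData, expectedDoorWidth = 0.9) {
  const widths = [];
  for (const floor of houseData.floors || []) {
    for (const room of floor.rooms || []) {
      for (const wall of Object.values(room.walls || {})) {
        for (const door of wall.doors || []) {
          const w = Number(door.width);
          if (door.type === 'interior' && w > 0) widths.push(w);
        }
      }
    }
  }
  if (widths.length === 0) return null;

  // Median is less sensitive than the mean to one misread door.
  widths.sort((a, b) => a - b);
  const mid = Math.floor(widths.length / 2);
  const median = widths.length % 2 ? widths[mid] : (widths[mid - 1] + widths[mid]) / 2;
  return { median, factor: expectedDoorWidth / median, samples: widths.length };
}
```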

Loading into Renderer

After validation, the house JSON replaces the current house:

_applyToRenderer(houseData) {
  // Replace house data in renderer
  this.renderer.houseData = houseData;
  this.renderer.currentFloor = 0;

  // Clear and re-render
  this.renderer._clearFloor();
  const floor = houseData.floors[0];
  for (const room of floor.rooms) {
    this.renderer._renderRoom(room, floor.ceilingHeight);
  }

  // Dispatch event for UI to rebuild floor buttons, room list, etc.
  this.renderer.container.dispatchEvent(new CustomEvent('houseloaded', {
    detail: { name: houseData.name, floors: houseData.floors.length }
  }));
}

The index.html would listen for houseloaded and rebuild:

Floor buttons
Room list
House editor state
Reset camera position
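
A sketch of that listener, assuming the page already has a function for each step; the four callback names are placeholders, not existing functions in index.html.

```javascript
// Hypothetical wiring for index.html: `container` is the element the renderer
// dispatches 'houseloaded' on; `ui` bundles the page's rebuild functions.
function wireHouseLoaded(container, ui) {
  container.addEventListener('houseloaded', (event) => {
    const { floors = 1 } = event.detail || {};
    ui.rebuildFloorButtons(floors); // one button per floor in the new house
    ui.rebuildRoomList();           // rooms of the floor now displayed (floor 0)
    ui.resetHouseEditor();          // drop editor state tied to the old house
    ui.resetCamera();               // frame the new footprint
  });
}
```

It would be called once at startup with the renderer's container element and the page's existing rebuild functions.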

File Structure

src/
  floorplan-import.js    # New module — FloorplanImporter class
  index.html             # Modified — add button + wire up + houseloaded event

No new dependencies. No build changes. Pure vanilla JS using:

fetch() for API calls
Canvas API for image preprocessing
File / Blob object URLs (URL.createObjectURL) for image handling
localStorage for API key persistence

The modal uses the same design language as existing UI:

rgba(255, 255, 255, 0.95) backgrounds
#4a90d9 accent color
-apple-system, BlinkMacSystemFont font stack
border-radius: 4-6px on elements
Same button styles as .export-btn

Edge Cases

Case	Handling
Multi-floor image (side by side)	User sets "Floors shown"; the prompt passes that count so the LLM extracts each floor
Hand-drawn sketch	LLM handles well; dimensions will be approximate
Photo of printed plan	Canvas preprocessing helps; LLM reads spatial layout
Non-English labels	LLM translates; output uses both original + English names
Very large image (>10MB)	Canvas resizes to max 2048px before base64 encoding
LLM returns invalid JSON	Parse error → show raw text → let user "Edit JSON"
LLM returns partial data	Validation finds gaps → auto-repair what's possible, flag rest
API rate limit	Show error, suggest retry after delay
No API key	Modal won't allow "Analyze" without key entered
Curved walls / non-rectangular rooms	Approximate as rectangles (project constraint)

Cost Estimate

Per floor plan analysis:

Claude Sonnet: ~$0.03-0.05 per image (image + prompt input plus ~2K output tokens; at $15 per million output tokens the output alone is ~$0.03)
GPT-4o: ~$0.01-0.05 per image
Negligible for individual use
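
These figures come from simple token arithmetic. A rough calculator, assuming Anthropic's documented approximation of (width × height) / 750 tokens per image; prices are per million tokens and should be taken from the provider's current price list. The function names and the example inputs are assumptions.

```javascript
// Rough per-request cost estimate. Prices are USD per million tokens and are
// inputs, not constants: take them from the provider's current price list.
function estimateCostUSD({ imageTokens, promptTokens, outputTokens, inputPrice, outputPrice }) {
  const inputCost = ((imageTokens + promptTokens) / 1e6) * inputPrice;
  const outputCost = (outputTokens / 1e6) * outputPrice;
  return inputCost + outputCost;
}

// Anthropic's documented approximation for image input: (width * height) / 750.
function claudeImageTokens(width, height) {
  return Math.ceil((width * height) / 750);
}

// Example with assumed values: 1092x1092 image, ~1,000 prompt tokens,
// ~2,000 output tokens, $3 / $15 per million input / output tokens.
const exampleCost = estimateCostUSD({
  imageTokens: claudeImageTokens(1092, 1092),
  promptTokens: 1000,
  outputTokens: 2000,
  inputPrice: 3,
  outputPrice: 15
});
console.log(exampleCost.toFixed(3)); // prints 0.038
```

With these assumed inputs roughly 80% of the cost is output tokens, so a longer house JSON (more rooms, doors, windows) raises the cost more than a larger image does.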

Implementation Recommendations

For the coder:

Start with the prompt — get _buildPrompt() right first, test with various floor plan images manually via the API before building the UI.
Build the modal — follow the existing modal-free overlay pattern (the project uses no modal library; use a simple overlay div).
Wire up the API — start with Claude support, add OpenAI second. The provider abstraction makes this easy.
Add validation + auto-repair — defensive parsing of LLM output is essential.
Handle the houseloaded event in index.html — rebuild all sidebar UI.
Test with varied floor plans:
- Clean architectural drawing (should work great)
- Realtor-style colored floor plan (should work well)
- Hand sketch on paper (should work, approximate dimensions)
- Photo of a floor plan on screen (should work with preprocessing)

Testing approach:

Save example floor plan images in data/test-floorplans/
Compare LLM output against manually created house JSON
Check that output loads in 3D viewer without errors
Verify rooms don't overlap and walls connect properly
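
The overlap check can be automated for the test images. A sketch for axis-aligned rooms, assuming dimensions.width runs along x (west-east) and dimensions.length along y (south-north), which matches the position convention in the prompt but is not stated explicitly in the schema; findOverlaps is a hypothetical helper.

```javascript
// Hypothetical test helper: reports pairs of rooms on one floor whose
// rectangles overlap by more than `tolerance` meters on both axes.
function findOverlaps(floor, tolerance = 0.05) {
  const rooms = floor.rooms || [];
  const overlaps = [];
  for (let i = 0; i < rooms.length; i++) {
    for (let j = i + 1; j < rooms.length; j++) {
      const a = rooms[i];
      const b = rooms[j];
      // Length of the shared interval on each axis (zero or negative if disjoint).
      const dx = Math.min(a.position.x + a.dimensions.width, b.position.x + b.dimensions.width)
               - Math.max(a.position.x, b.position.x);
      const dy = Math.min(a.position.y + a.dimensions.length, b.position.y + b.dimensions.length)
               - Math.max(a.position.y, b.position.y);
      if (dx > tolerance && dy > tolerance) overlaps.push([a.id, b.id]);
    }
  }
  return overlaps;
}
```

The tolerance keeps rooms that merely share a wall line, or touch within rounding error, from being flagged.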

Future Enhancements (out of scope for v1)

Local model support — Run a local vision model (via Ollama) for offline use
PDF import — Extract floor plan pages from architectural PDFs
Multi-floor stitching — Upload separate images per floor, align them
Overlay comparison — Show original image as ground texture under 3D rooms
Iterative refinement — "The kitchen should be wider" → re-prompt with corrections
Scale calibration tool — Click two points on image, enter real distance
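
The calibration tool reduces to one ratio. A hypothetical sketch: the real distance the user enters, divided by the pixel distance between the two clicked points, gives meters per pixel.

```javascript
// Hypothetical core of a calibration tool. Points are in image pixels.
function metersPerPixel(p1, p2, realDistanceMeters) {
  const pixels = Math.hypot(p2.x - p1.x, p2.y - p1.y);
  if (pixels === 0) throw new Error('Pick two different points');
  return realDistanceMeters / pixels;
}
```

The clicked points need to be in the same pixel space as the image sent to the LLM; if the preview is displayed smaller than the preprocessed image, the click coordinates would need scaling first. The result could then be passed to the prompt as the scale hint.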
