I am building an iOS 17 application that integrates Apple's RoomPlan (RoomCaptureSession) with a custom edge AI model (YOLOv8 via Core ML and the Vision framework) to detect structural defects (like cracks) on walls in real time.
The Architecture:
RoomPlan runs in the background to generate the 3D room mesh. I intercept the camera frames (CVPixelBuffer) and pass them to a VNCoreMLRequest. The model successfully returns a 2D VNRecognizedObjectObservation (bounding box) with normalized coordinates.

The Challenge:
I need to anchor this 2D bounding box to a precise absolute 3D coordinate (simd_float3) in the real world, so that the defect marker aligns perfectly with the exported .usdz model and the 2D .dxf floor plan.
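For context, the Vision side of my pipeline looks roughly like this (the YOLOv8Detector model wrapper name and the 0.5 confidence threshold are placeholders for my actual setup):

```swift
import Vision
import CoreML

// Placeholder: the Core ML-generated wrapper class for my YOLOv8 model.
let visionModel = try VNCoreMLModel(for: YOLOv8Detector().model)

func detectDefects(in pixelBuffer: CVPixelBuffer,
                   orientation: CGImagePropertyOrientation,
                   completion: @escaping ([VNRecognizedObjectObservation]) -> Void) {
    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        let observations = (request.results as? [VNRecognizedObjectObservation]) ?? []
        // Keep only reasonably confident detections (threshold is arbitrary here).
        completion(observations.filter { $0.confidence > 0.5 })
    }
    request.imageCropAndScaleOption = .scaleFill

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                        orientation: orientation,
                                        options: [:])
    try? handler.perform([request])
}
```

Each returned bounding box is in Vision's normalized coordinate space (origin at the bottom-left, values 0...1), which is what I then convert to screen coordinates below.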
Currently, I convert the Vision normalized coordinates to screen coordinates, and perform an ARKit raycast using .estimatedPlane:
```swift
// Pseudo-code of current approach.
// Vision's normalized box is bottom-left origin, so flip the y-axis for screen space.
let screenPoint = CGPoint(x: boundingBox.midX * screenSize.width,
                          y: (1 - boundingBox.midY) * screenSize.height)

// Note: raycastQuery(from:allowing:alignment:) actually lives on ARFrame / ARView,
// not on ARSession itself (and ARFrame's variant expects normalized image
// coordinates rather than screen points).
let query = arSession.currentFrame?.raycastQuery(from: screenPoint,
                                                 allowing: .estimatedPlane,
                                                 alignment: .vertical)
if let query, let result = arSession.raycast(query).first {
    let worldTransform = result.worldTransform
    // Save the coordinate (translation column of worldTransform)...
}
```

My Questions:
1. Since RoomPlan creates its own optimized walls (CapturedRoom.Surface), is it better to raycast against ARKit's .estimatedPlane, or is there a way to mathematically intersect the ray directly with RoomPlan's geometric output for better accuracy?

2. Are there any known coordinate-space mismatches between the ARSession world origin and the final CapturedRoom export origin that I should account for, to prevent the 3D markers from drifting in the final USDZ?

Any insights on handling this cross-dimensional coordinate mapping would be highly appreciated.
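For reference, the direct intersection I have in mind for the first question would look something like this: build a world-space ray through the screen point, then intersect it with the plane of each candidate wall surface. This is only a sketch, and it assumes (I have not verified this) that a wall's local plane is its XY plane with the local Z axis as the normal:

```swift
import simd
import RoomPlan

/// Intersects a world-space ray with the plane of a RoomPlan wall surface.
/// Assumption: the surface lies in its local XY plane, so the transform's
/// third column (local +Z) is the plane normal and the fourth column is a
/// point on the plane.
func intersect(rayOrigin: simd_float3,
               rayDirection: simd_float3,
               with surface: CapturedRoom.Surface) -> simd_float3? {
    let transform = surface.transform
    let planePoint  = simd_make_float3(transform.columns.3)
    let planeNormal = simd_normalize(simd_make_float3(transform.columns.2))

    let denom = simd_dot(planeNormal, rayDirection)
    guard abs(denom) > 1e-6 else { return nil }   // ray parallel to the wall plane

    let t = simd_dot(planeNormal, planePoint - rayOrigin) / denom
    guard t > 0 else { return nil }               // intersection behind the camera

    return rayOrigin + t * rayDirection
}
```

The ray origin would be the camera transform's translation, with the direction obtained by unprojecting the screen point (e.g. via ARCamera's unprojection APIs or the inverse projection matrix). I would presumably also need to transform the hit point into the surface's local frame and reject hits outside surface.dimensions, so the marker can't land on the wall plane's infinite extension.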
