Abstract
As applications of Augmented Reality (AR) and Virtual Reality (VR) expand rapidly with growing demands for enhanced visual realism, photorealistic image generation and insertion have become essential features for emerging AR applications that provide real-time workplace and household visual assistance. Physically Based Ray-Tracing (PBRT) is often used for this purpose: synthesized images are generated by simulating the real environment and tracing light transport to achieve photorealistic effects such as reflection, refraction, and soft shadows. PBRT is widely used in product design, medical visualization, video games, and movie effects. To enable photorealistic rendering, there is a strong demand to support ray-tracing (RT) on mobile devices [1]. However, the challenges are: (1) unstructured memory access patterns and complex control flow make scheduling difficult; (2) high memory requirements exhaust the limited SRAM space on edge devices; (3) low error tolerance requires high-precision computation; (4) complex computations, such as division and square root, require significant computing resources on edge devices. As a result, common rendering engines, such as Apple ARKit and OpenGL, are mainly based on the lower-cost rasterization rendering technique. Unfortunately, rasterization rendering fails to produce photorealistic synthesis, as shown in Fig. 2.5.1. Few ASICs have been fabricated so far as mobile photorealistic rendering solutions, and those that exist may not support RT [2] or may suffer from low efficiency [3]. This work develops a ray-tracing processor that also supports inverse rendering (IR) for background extraction [4].
The key features of this work include: (1) an ASIC rendering processor that embeds an end-to-end PBRT solution with IR for AR on mobile devices; (2) a reconfigurable mixed-precision PE design supporting diverse computing tasks for both IR and RT; (3) background-clustered, Field of View (FOV)-focused 3D construction that reduces conventional background scene complexity from O(n log n) to O(1); (4) a scalable partitioning scheme for complex 3D objects, with an average 13× speedup on test scenes; (5) a Global RT Scheduler (GRTS) and a Global Memory Access Controller (GMAC) that overcome the challenges of irregular memory access patterns and varied PE run-time, with an overall 684× speedup compared with the baseline design. The 28nm test chip achieves 3.95× to 28.8× higher rendering efficiency compared with existing ASIC solutions, enabling real-time PBRT rendering on mobile edge devices.
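
To make challenge (4) concrete, the following is a minimal sketch of a ray-sphere intersection test, a textbook ray-tracing primitive. It is not taken from this work; the function name `ray_sphere_hit` and its parameters are hypothetical and chosen only for illustration. It shows where the square root and division operations mentioned above typically arise in an intersection test.

```python
import math


def ray_sphere_hit(origin, direction, center, radius):
    """Return the nearest positive hit distance t along the ray, or None on a miss.

    Illustrative only: a generic quadratic-formula intersection test,
    not the method implemented by the processor described above.
    """
    # Vector from the sphere center to the ray origin.
    oc = [o - c for o, c in zip(origin, center)]

    # Coefficients of the quadratic a*t^2 + b*t + c = 0.
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius

    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # The ray misses the sphere.

    root = math.sqrt(disc)  # Square root: one of the costly operations.

    # Division: the other costly operation. Try the nearer root first.
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 0.0:
            return t
    return None  # Both intersections lie behind the ray origin.
```

A renderer evaluates tests of this kind many times per pixel, which is one reason such operations weigh heavily on resource-limited edge devices.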
Original language | English (US) |
---|---|
Title of host publication | 2024 IEEE International Solid-State Circuits Conference, ISSCC 2024 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 44-46 |
Number of pages | 3 |
ISBN (Electronic) | 9798350306200 |
DOIs | |
State | Published - 2024 |
Event | 2024 IEEE International Solid-State Circuits Conference, ISSCC 2024 - San Francisco, United States; Duration: Feb 18 2024 → Feb 22 2024 |
Publication series
Name | Digest of Technical Papers - IEEE International Solid-State Circuits Conference |
---|---|
ISSN (Print) | 0193-6530 |
Conference
Conference | 2024 IEEE International Solid-State Circuits Conference, ISSCC 2024 |
---|---|
Country/Territory | United States |
City | San Francisco |
Period | 2/18/24 → 2/22/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.