Over the past decade, 3D sensors have emerged as some of the most versatile and ubiquitous sensors used in robotics. In many robotic applications, 3D sensing has become the de facto choice for tasks such as near-field object detection and collision avoidance, surface and object inspection, and map creation.
There are a number of 3D sensing modalities available. In this roundup, we'll be covering the three most common: stereo (both active and passive), structured light, and time of flight. We won't be covering LiDAR here, even though LiDAR data is also three dimensional. LiDAR users, fret not: we'll be covering LiDAR as its own class of sensors in a future post!
We've also created an interactive depth sensor visualizer that lets you compare the range and field of view of some of the most popular 3D sensors available. You can explore it here!
Stereo 3D Sensors
These ubiquitous sensors bring a combination of good overall performance coupled with low cost. Stereo 3D sensors come in two basic types: passive and active.
Passive stereo 3D sensors are the least expensive of all 3D sensors, as they use off-the-shelf components that are inexpensive to source and manufacture. In addition, they are available in a wide range of baselines, giving users a choice of sensing ranges that are best adapted to their use case. However, most rely on visible light to operate effectively, which means they struggle to perform in low-light or unlit conditions.
Active stereo 3D sensors add an infrared (IR) pattern projector, which both increases the fidelity of the 3D data captured and makes operation more reliable in low-light conditions. However, the IR projectors on these sensors are limited in range, which relegates them to near- and mid-range applications, regardless of baseline. The addition of an IR pattern projector effectively turns these sensors into hybrid stereo/structured light sensors (more on structured light 3D sensors in the next section!).
Examples of stereo 3D sensors:
Stereolabs ZED — Passive stereo 3D sensors available in large and small form factors, with wide and narrow baselines. The ZED Mini and ZED 2 feature integrated IMUs for kinetic data streams.
Ensenso — Industrial-grade stereo 3D sensors with ruggedized construction and Gigabit Ethernet connections.
Tangram Vision HiFi 3D Sensor — Our very own 3D sensor is a high-resolution active stereo sensor with a built-in neural processing unit with 8 TOPS of AI compute.
Occipital Structure Core — Active stereo 3D sensor with bespoke IR projector and built-in IMU for kinetic data. Available as an embeddable module or as an encased standalone sensor.
Intel RealSense — The D series (D415, D435, D435i, D455) is a diverse range of low-cost active stereo 3D sensors. The D435i adds a built-in IMU for kinetic data, while the newest D455 adds extended range suitable for mid-range robotic navigation.
Structured Light Sensors
Structured light is the most ubiquitous 3D sensing modality used in robotics, thanks to the popularity of the original PrimeSense 3D sensors, a family that included the first-generation Microsoft Kinect.
Structured light 3D sensors combine low cost with high-fidelity 3D data capture, and they perform well across a wide range of lighting conditions, with one notable exception: direct sunlight, or bright indirect sunlight. This is because the infrared light projected by structured light sensors gets overpowered by the infrared light in the same wavelength band that is naturally present in sunlight.
Like stereo 3D sensors, structured light sensors are available in an array of small to large baselines that are suited to tasks that occur at different ranges from the sensor.
Examples:
Orbbec Astra — A low-cost structured light 3D sensor that is often used as a drop-in substitute for the original Microsoft Kinect.
Occipital Structure Sensor (No longer available new) — The first standalone 3D sensor designed specifically to be used with mobile devices like Apple iPad and iPhone models. Compatible with OpenNI for depth-powered application development. Now replaced by the Occipital Structure Sensor Mark II active stereo sensor.
Microsoft Kinect V1 (No longer available new) — The most ubiquitous consumer-grade 3D sensor ever produced. Originally launched to support Xbox gaming, the first Kinect became popular with roboticists and hackers for adding robust 3D sensing at low cost to robots, drones and other platforms. Compatible with OpenNI for depth-powered application development.
PrimeSense Carmine (No longer available new) — The Carmine series is near-identical in specification to the original Microsoft Kinect, and was available in short- and mid-range models. Unlike the original Microsoft Kinect, the Carmine does not include a microphone array. Compatible with OpenNI for depth-powered application development.
Photoneo — Ruggedized 3D sensors for industrial applications. The Photoneo PhoXi line is available in multiple baselines.
Zivid — A range of structured light 3D sensors available in multiple baselines, with USB3 connection for the Zivid One series, and CAT 6A Ethernet connection for the Zivid Two.
Time of Flight Sensors
Time of flight (ToF) sensors work on the same principle as LiDAR: they send out pulses of infrared light and record the time it takes for that light to return. Despite using infrared light sources like structured light sensors do, the manner in which ToF sensors transmit and receive light makes them less susceptible to interference from bright direct and indirect sunlight.
ToF sensors often have wider range capabilities than structured light sensors, and can operate accurately over longer distances. However, they aren't able to return 3D images with the fidelity offered by structured light or stereo 3D sensors, so they are better suited to tasks like navigation assistance than to surface inspection.
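To make the principle concrete, here's a minimal Python sketch of the core ToF relationship (distance is the speed of light times half the measured round-trip time); the timing value is an illustrative assumption, not a real sensor measurement:

```python
# The core time-of-flight relationship: distance = (speed of light * round-trip time) / 2.
C_M_PER_S = 299_792_458.0  # speed of light in a vacuum, meters per second

def tof_distance_m(round_trip_s: float) -> float:
    """Distance to a target, given the measured round-trip time of a light pulse."""
    return C_M_PER_S * round_trip_s / 2.0

# A pulse returning ~66.7 nanoseconds after emission implies a target ~10 m away.
print(f"{tof_distance_m(66.7e-9):.2f} m")
```

At these time scales, a centimeter of depth corresponds to tens of picoseconds of timing, which is one reason many ToF sensors measure the phase shift of a modulated signal rather than timing individual pulses.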
Examples:
Microsoft Azure Kinect — The newest member of the Microsoft Kinect family of 3D sensors is a compact ToF module that adds a 12MP RGB video camera that is registered to the depth stream for high quality color imaging and textures. The Azure Kinect includes a built-in IMU for kinetic data.
PMD Pico — Simple, compact ToF 3D sensors that can be purchased as embeddable modules, or as enclosed standalone sensors.
ASUS Xtion 2 — Increasingly hard to find. However, like the original Microsoft Kinect, PrimeSense Carmine, and original Occipital Structure Sensor, the Xtion 2 is compatible with OpenNI for depth-powered application development.
Comparing Stereo, Structured Light, and ToF Perception Sensors
So why choose one modality over another? Well, we provided hints in the above descriptions for each modality, but let’s get more concrete on why you might choose one over another.
At the highest level, the primary factors that determine 3D sensor selection are:
Sensor performance
Impact on host
Environmental limitations
Let’s explore each of these factors in detail, and then show how each sensing modality compares.
Sensor Performance
The main parameters we’ll be exploring here are range, resolution and reliability.
Range is perhaps the most important parameter, because different robot tasks are defined by the range in which they occur. For example, mapping environments at high speed for autonomous vehicles is typically left to rotating LiDAR units, as they have ranges that can exceed 100 meters, which allows them to see distant obstacles well before the autonomous vehicle is upon those obstacles. Those same LiDAR units, however, are poorly suited for close visual inspections, as these tasks benefit from sensors with sub-meter ranges that can support enhanced detail capture and accuracy.
For stereo and structured light 3D sensors, range is a function of the baseline between the optical elements on the sensor itself. The rule of thumb: the wider the baseline, the longer the range; the narrower the baseline, the shorter the range. So why not just spec a 3D sensor with the widest baseline possible? For one, form factor limitations can preclude this approach. More importantly, a wider-baseline sensor's sensing range also starts further away from the sensor. A wide-baseline sensor might ignore objects that are less than two meters away, while a shorter-baseline sensor might not reach as far, but might begin sensing within 0.25 meters of the sensor. Baseline, and the range that results from it, is therefore a tradeoff between how near you need to sense and how far you need to sense. You get one or the other, but you can't get both.
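The geometry behind this tradeoff is the standard stereo depth relation Z = f · B / d: depth equals focal length (in pixels) times baseline (in meters), divided by disparity (in pixels). Below is a minimal Python sketch of how baseline shifts both ends of the usable range; the focal length and disparity limits are assumed values for illustration, not the specs of any particular sensor.

```python
# Why baseline sets a stereo sensor's usable range: Z = (f * B) / d.
# All values below are illustrative assumptions, not real sensor specs.

def stereo_depth_m(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    return (focal_px * baseline_m) / disparity_px

FOCAL_PX = 700.0  # assumed focal length, in pixels
MAX_DISP = 128.0  # widest disparity the matcher searches -> nearest sensable depth
MIN_DISP = 8.0    # smallest disparity resolved reliably -> farthest sensable depth

for baseline_m in (0.05, 0.12):  # a 5 cm baseline vs. a 12 cm baseline
    near = stereo_depth_m(FOCAL_PX, baseline_m, MAX_DISP)
    far = stereo_depth_m(FOCAL_PX, baseline_m, MIN_DISP)
    print(f"{baseline_m * 100:.0f} cm baseline: ~{near:.2f} m to ~{far:.1f} m")
```

With these assumed numbers, the 5 cm baseline senses roughly 0.27 m to 4.4 m, while the 12 cm baseline senses roughly 0.66 m to 10.5 m: widening the baseline pushes both the near and far limits outward together.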
Most 3D sensors have ranges that start at around 0.5–1m from the sensor, and extend to 3–6m at best. Keep in mind that at the furthest reaches of a sensor’s range, resolution and accuracy can start to diminish rapidly. ToF 3D sensors typically have much greater range than structured light or stereo 3D sensors, and can often receive reliable signals from objects that are up to 25m away. However, ToF sensors often suffer from excessive noise, and resolution can be much lower than similarly priced 3D modalities.
Speaking of resolution, it is a function of both range and the optics used by the sensor. A structured light sensor's resolution is limited by the density of the pattern it projects (which gets less dense as distance increases) and by the resolution of the IR camera that reads the pattern as it falls on the objects and environments being sensed. A stereo sensor's resolution is limited by the native resolution of the cameras used in the sensor.
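For stereo sensors in particular, this falloff with range can be quantified: if the matcher resolves disparity to within Δd pixels, depth error grows roughly as ΔZ ≈ Z² · Δd / (f · B). A short sketch, reusing the assumed optics from the example above:

```python
# Stereo depth error grows with the square of range: dZ ~= Z^2 * dd / (f * B).
# Figures are illustrative assumptions, continuing the example above.

def depth_error_m(z_m: float, focal_px: float = 700.0,
                  baseline_m: float = 0.05, disp_err_px: float = 0.25) -> float:
    return (z_m ** 2) * disp_err_px / (focal_px * baseline_m)

for z_m in (0.5, 2.0, 5.0):
    print(f"at {z_m} m: ~{depth_error_m(z_m) * 1000:.0f} mm of depth error")
# at 0.5 m: ~2 mm; at 2.0 m: ~29 mm; at 5.0 m: ~179 mm
```

Millimeter-level error at half a meter becomes decimeter-level error at five meters, which is why close-range inspection and long-range sensing place such different demands on a sensor.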
For tasks such as navigation, high resolution isn’t strictly necessary, unless objects that might cause an obstruction are very fine (an extension cable, for instance, may be on the edge of detectability for some low-resolution 3D sensors). For tasks such as surface inspections and 3D modeling, high resolution is much more desirable. Because these tasks often occur at close range, many structured light and stereo sensors are well suited to this kind of task. ToF sensors are best suited for mid-distance navigation tasks, or skeletal tracking. These tasks don’t require high-resolution sensing, but benefit from the range and operational speed of ToF sensors.
Lastly, let's talk about reliability. For sensors, this comes in two forms: data acquisition reliability and mechanical reliability. Data acquisition reliability comes down to the inherent limitations of each sensing modality. Passive stereo sensors, for instance, need sufficient ambient light in the visible spectrum to work properly. Conversely, infrared-based sensors like structured light and ToF suffer when exposed to too much ambient light, causing sensor washout and data integrity issues. These sensors also struggle with black surfaces (which absorb IR wavelengths) and shiny surfaces (which scatter projected IR light). Active stereo sensors with infrared pattern projectors offer a “best of both worlds” approach that broadens sensor reliability across a range of environments and lighting conditions.
Mechanical reliability issues often result from compromises inherent to a specific modality, or from manufacturer cost cutting. Many consumer-grade sensors use USB for data and power transfer; this is fine in a lab or test environment. In real-world use, however, USB can be flaky and error-prone, resulting in intermittent sensor failures and compromised data. Ethernet is much more stable, but is rarely found on consumer-grade sensors. It is far more common on expensive industrial-grade sensors, where it is often a prerequisite for a purchase decision.
Impact on Host
The main parameters we’ll be exploring here are power consumption, compute use, heat dissipation, data transmission and form factor.
All robotic hosts have finite amounts of compute and power available. Robots, drones, and other vision-enabled devices are often resource balancing acts, where multiple sensors, motors, actuators, and other components fight for scarce system resources. Different sensors have different impacts: sensors with active IR components are much more power hungry than passive stereo 3D sensors, for instance. On the other hand, a passive stereo sensor with high dynamic range (HDR) cameras may use much more compute than a structured light sensor, as the images it captures are much more data rich. Thus, sensor decisions must be made holistically, in the context of the other components that require system resources (a subject complex enough to deserve its own blog post).
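As a toy illustration of this balancing act, the sketch below checks candidate sensor options against a fixed payload power budget. Every wattage figure is an assumption made up for illustration, not a measured value for any real component.

```python
# Toy power-budget check during sensor selection.
# All wattage figures are illustrative assumptions, not measured values.
BUDGET_W = 40.0  # assumed watts available for the perception payload

SENSOR_OPTIONS_W = {
    "passive stereo": 1.5,
    "active stereo (IR projector on)": 3.5,
    "ToF module": 5.5,
}
OTHER_PERCEPTION_W = 35.0  # assumed draw of the rest of the vision stack

for name, draw_w in SENSOR_OPTIONS_W.items():
    total_w = draw_w + OTHER_PERCEPTION_W
    status = "fits" if total_w <= BUDGET_W else "over budget"
    print(f"{name}: {total_w:.1f} W of {BUDGET_W:.1f} W -> {status}")
```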
Similarly, data transmission and storage can often be limited on robots and drones. Our earlier example of an HDR-equipped 3D sensor applies here too: it can strain even robust WiFi networks, given the sheer size of the data streams being captured and transmitted.
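Some back-of-the-envelope arithmetic shows why. The resolutions, bit depths, and frame rates below are illustrative assumptions, not any sensor's published spec:

```python
# Raw (uncompressed) data rate of a depth + RGB stream, in megabits per second.
# Resolutions, bit depths, and frame rates are illustrative assumptions.

def stream_mbps(width: int, height: int, bytes_per_px: int, fps: int) -> float:
    return width * height * bytes_per_px * fps * 8 / 1e6

depth_mbps = stream_mbps(1280, 720, 2, 30)  # 16-bit depth at 30 fps
rgb_mbps = stream_mbps(1920, 1080, 3, 30)   # 24-bit color at 30 fps
print(f"depth: {depth_mbps:.0f} Mbps, RGB: {rgb_mbps:.0f} Mbps, "
      f"total: {depth_mbps + rgb_mbps:.0f} Mbps")
```

That works out to roughly 442 Mbps of depth and 1,493 Mbps of color: nearly 2 Gbps of raw data, more than most real-world wireless links can sustain without aggressive compression.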
Lastly, there are industrial design considerations that can limit which sensor options can be specified. These include the size and weight of a sensor (particularly important for lightweight devices like drones), as well as a sensor's ability to dissipate the heat it generates. The latter is of particular importance for IR-equipped sensors. The laser-based IR components in these sensors must operate within a specific temperature range to maintain the wavelength at which they emit. Deviating by even a few degrees Fahrenheit outside that range can cause significant variations in sensor readings, corrupting the generated data enough to make it useless. Therefore, these sensors must be ventilated or insulated as appropriate for the environments in which they will be used.
Environmental Limitations
The main parameters we'll be exploring here are ambient lighting and connectivity. We mentioned above that many of these sensors are sensitive to ambient lighting, but it bears repeating: lighting challenges are one of the most common causes of 3D sensor failures on robots. A robot that worked exceptionally well in a laboratory with consistent lighting can suddenly experience drastic and intermittent failures when placed in a real-world context with changing, inconsistent lighting. Therefore, it is important to know all the lighting permutations that may exist upon deployment, and to account for them during sensor selection.
Connectivity is also important. If sensor data is processed and used locally, connectivity isn't an issue. However, for robots deployed in low-bandwidth environments, it is necessary either to compress or filter sensor data on the host before transmitting it, or to choose a low-bandwidth sensor modality (ToF, for instance). Again, it is important to test data rates and packet sizes under real-world bandwidth constraints, as laboratory environments may not sufficiently reveal deficiencies.
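As a minimal sketch of the filter-on-the-host approach: decimating a 16-bit depth frame by 2x in each axis cuts its size by 4x before it ever reaches the radio. The frame below is synthetic, and a real pipeline would typically also compress the result and drop invalid pixels.

```python
import numpy as np

# Decimate a 16-bit depth frame 2x along each axis before transmission,
# cutting its size by 4x. The frame below is synthetic, for illustration.

def decimate_depth(depth: np.ndarray, factor: int = 2) -> np.ndarray:
    """Keep every `factor`-th pixel along each axis."""
    return depth[::factor, ::factor]

frame = np.random.randint(0, 6000, size=(720, 1280), dtype=np.uint16)
small = decimate_depth(frame)
print(f"{frame.nbytes // 1024} KB -> {small.nbytes // 1024} KB per frame")  # 1800 KB -> 450 KB
```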
Synthesizing the Data and Deciding
We’ve put together a handy chart showing different robot tasks to be done, and the 3D sensors that work best for those scenarios. We’ve noted what key decision factors should be considered for each application, and added notes to further clarify why specific choices should be made.
We hope that you've found this guide to 3D sensors useful. We've also created an interactive depth sensor visualizer that lets you compare the range and field of view of some of the most popular 3D sensors available. You can explore it here! Our next sensor roundup covers LiDAR sensors, which you can read here.
If you’re currently working on a sensor-enabled product like a robot or a drone, please take a look at what we do at Tangram Vision. We can help you get your product to market faster, and keep it functioning more reliably upon deployment, too.
Tangram Vision helps perception teams develop and scale autonomy faster.