Depth Sensor Shootout: Kinect, Leap, Intel and Duo

(Updated Jan 12 2018) Stimulant’s mission is to create “smart spaces” which engage visitors in ways that can’t be duplicated with devices they have in their home or their pocket. We achieve this through a variety of sensors and cameras feeding data into custom software, running on bespoke computing hardware, and outputting to any number of display or projection devices. Because it all begins with the sensing technologies, we spend plenty of time evaluating various products that help us determine how people move through a space. Depth-sensing cameras are a great way to do that, and here we present a comparison of the cameras we’ve been able to get into our lab.

In this article we’ll give brief descriptions of twelve different cameras, and end with a comparison of their hardware specifications. We won’t end up with a recommendation for “best camera”, because different devices are suited to different applications. Instead, we’ll help you narrow the field of devices which might work for your situation.

We’ll add more products as we’re able to get our hands on them. Follow @stimulant or our RSS feed to be notified. If you’re a manufacturer and you’d like your product included here, get in touch at hello@stimulant.com.

The Devices

E-con Systems Tara

Tara is a stereo camera from e-con Systems. It uses two OnSemi MT9V024 sensors to produce a stereo pair of monochrome images that are 10 bit WVGA (752×480) with a 60fps refresh rate over USB 3.0. The two sensors are synced on the device and are delivered together as a single side-by-side image. The camera is backwards-compatible with USB 2.0, but at half the framerate. E-con Systems provides a C++ SDK for Windows and Linux and includes some standard analysis examples using OpenCV such as height estimation, face detection, and point cloud generation. Compared to the Kinect or the RealSense, Tara’s SDK is very lightweight, providing functions to get the disparity map between the left and right eyes, estimate depth at a given point, and set camera parameters such as auto exposure. More advanced image analysis such as skeleton tracking or facial feature tracking would need to be provided by a secondary toolkit.

The raw depth map from the Tara.
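Because the Tara delivers both views as one side-by-side frame, a first processing step is usually to split it, and a disparity value can then be turned into a distance with the standard pinhole stereo relation Z = f × B / d. The sketch below is not the e-con SDK; the function names are ours, and the focal length and baseline are placeholder values that would come from the camera's calibration in practice.

```python
def split_side_by_side(frame):
    """Split one side-by-side frame (a list of pixel rows) into left and right views."""
    width = len(frame[0])
    if width % 2 != 0:
        raise ValueError("side-by-side frame must have an even width")
    half = width // 2
    left = [row[:half] for row in frame]
    right = [row[half:] for row in frame]
    return left, right


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Standard stereo relation Z = f * B / d; returns None when there is no valid match."""
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


# Hypothetical calibration values, for illustration only.
FOCAL_PX = 700.0    # assumed focal length in pixels
BASELINE_M = 0.06   # assumed distance between the two sensors, in meters

# A tiny 2x8 stand-in for a real side-by-side frame.
frame = [[0, 1, 2, 3, 10, 11, 12, 13],
         [4, 5, 6, 7, 14, 15, 16, 17]]
left, right = split_side_by_side(frame)
print(left[0], right[0])

# Larger disparity means a closer object; zero disparity has no depth estimate.
print(depth_from_disparity(21.0, FOCAL_PX, BASELINE_M))
print(depth_from_disparity(0.0, FOCAL_PX, BASELINE_M))
```

Skeleton or face tracking would still have to be layered on top of depth values like these by another toolkit.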

Tara relies on ambient lighting for building its depth map; there’s no IR projector here. The physical design of the Tara is intended for very light use as its cast acrylic case has plastic mounting threads, and its lenses are exposed with no shielding.

Tara is a good choice for medium-range indoor applications that can take advantage of ambient-lit stereo pair images where detailed image analysis is not required or is provided by another toolkit.

Occipital Structure

The Structure sensor is designed to attach physically to iOS devices to provide 3D scanning capabilities and enable mixed reality scenarios. There is also some support for Windows, macOS, Linux and Android using the OpenNI 2 project.

Unlike other sensors compared here, it’s not really for tracking people or gestures, but more for scanning and tracking the world itself. Using the meshes generated by the sensor and SDK, it’s possible to create mixed reality experiences in which virtual objects appear to interact with the physical world, with proper occlusion and physics.

The Occipital Structure attaches to the iPad with a nifty clip.

Orbbec Persee

The Orbbec Persee is an interesting entry in that it pairs a depth camera with an ARM-based SoC. This allows for a complete system with low power consumption and a small form factor. The sensor itself is the same as the Astra Pro and is programmed the same way, using either OpenNI2 or the Astra SDK; the Astra SDK is the preferred approach due to internal optimizations not present in the OpenNI2 SDK. The SoC supports both Android and Ubuntu 14.04 and comes preloaded with Android. As of this writing the SDK is still only available for C++ and Java via JNI bindings. Many of the examples have not been ported over to Android or ARM Linux and documentation is very sparse, so be prepared to go digging in the forums if you have an issue. One of the most exciting features was that we were able to stream a depth image and point cloud over the network using ROS and the gigabit Ethernet link. The ability to simply stream depth data over the network resolves a key pain point for many of our projects, namely USB extension.
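To see why streaming over gigabit Ethernet is practical, a rough estimate of the raw, uncompressed data rate helps. The sketch below uses the 640×480, 16 bit, 30 FPS depth figures from the specification table; it ignores protocol overhead and any compression, and the function name is ours.

```python
def raw_stream_mbps(width, height, bits_per_pixel, fps):
    """Uncompressed data rate of an image stream in megabits per second (1 Mbit = 1,000,000 bits)."""
    return width * height * bits_per_pixel * fps / 1_000_000


depth_mbps = raw_stream_mbps(640, 480, 16, 30)
print(f"Raw depth stream: {depth_mbps:.1f} Mbit/s")              # Raw depth stream: 147.5 Mbit/s
print(f"Share of a 1000 Mbit/s link: {depth_mbps / 1000:.0%}")   # Share of a 1000 Mbit/s link: 15%
```

On these assumptions a single raw depth stream uses only a fraction of the link, which leaves headroom for a point cloud or a second sensor.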

The Orbbec Persee is good for distributed sensing solutions where direct access via C++ is helpful and localized processing can reduce your hardware costs.

Intel SR300

The SR300 is the spiritual successor to the F200. The SR300 does everything the F200 does but with better quality and accuracy. We found the depth feed from this camera less noisy than that from the F200, so even though they have the same resolution, the SR300 performed significantly better at tasks such as 3D face tracking. The packaging for this device is a bit unusual: while it has a standard 1/4 in. mount, if mounted horizontally on a tripod there was no way to tilt the camera up. A nice feature was the removable USB3 cord, which allows users to use a longer or shorter cord based on their needs. The SR300 is compatible with the RealSense SDK, which is extremely capable in its current iteration and provides very good documentation and examples for a number of platforms and languages, including face tracking, hand tracking, and user background segmentation.

The Intel RealSense SR300 is good for medium-range indoor applications developed in a variety of frameworks, especially for tracking faces or for augmented reality experiences.

Orbbec Astra

Orbbec is the newest entrant into the 3D camera space, but the team has been at it for a while. One of the company’s founders also kickstarted the open-source hacking of the original Kinect in 2011. Their first products are the Astra and Astra Pro, which are both infrared depth sensors with a 640×480 resolution at 30FPS, but the Pro version has an enhanced RGB camera. The SDK is rather basic though, supporting only the older C++ OpenNI framework. Support for openFrameworks, Cinder, and Unity 3D is said to be forthcoming. The SDK supports basic hand tracking which can be used for gestural interfaces, but not full skeleton tracking. The unit can sense as far as 8 meters away, which beats the range of most other sensors.

The Orbbec Astra is a good choice for longer-range indoor applications developed in C++, where raw point cloud data or hand positions are needed for interaction.
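Turning a depth image into the kind of raw point cloud mentioned above is typically a pinhole back-projection of each pixel. The sketch below is independent of the Astra SDK and OpenNI; the function names are ours, and the intrinsics (fx, fy, cx, cy) are made-up values for a 640×480 image, where real ones would come from the device's calibration.

```python
def depth_pixel_to_point(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in meters to a camera-space (x, y, z) point."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def depth_image_to_points(depth, fx, fy, cx, cy):
    """Convert a depth image (rows of meters, 0 = no reading) into a list of 3D points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # skip pixels with no depth reading
                points.append(depth_pixel_to_point(u, v, z, fx, fy, cx, cy))
    return points


# Hypothetical intrinsics for a 640x480 depth image.
FX, FY, CX, CY = 570.0, 570.0, 320.0, 240.0

# The pixel at the principal point maps straight down the optical axis.
print(depth_pixel_to_point(320, 240, 2.0, FX, FY, CX, CY))  # (0.0, 0.0, 2.0)
```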

Intel RealSense R200

Intel’s RealSense cameras are meant to be integrated into OEM products, but the developer toolkits are available for use in installation projects. The R200 is the second RealSense product to ship from Intel, and it’s a tiny USB 3 device with an infrared sensing range of about 0.5m-3.5m. The “R” is for rear-facing, meaning its primary use case is to be integrated into the back of a tablet or laptop display. The SDK is quite robust, supporting C++, C#, JavaScript, Processing, Unity, and Cinder. The SDK supports face and expression tracking, but not hand tracking or full skeletons. The device really comes into its own when the camera is in motion for augmented reality or 3D scanning applications.

The Intel RealSense R200 is good for medium-range indoor applications developed in a variety of frameworks, especially for tracking faces or for augmented reality experiences.

Stereolabs ZED Stereo Camera

The Stereolabs ZED product is unique among this list as it does not use infrared light for sensing, but rather a pair of visible light sensors to produce a stereo image, which is then delivered to software as a video stream of depth data. It works well outdoors to a depth of 20 meters and provides a high-resolution depth image of up to 2208×1242 at 15FPS, or VGA at 120FPS. While the hardware is quite powerful, the provided SDK is pretty limited to simply capturing the depth stream, without any higher-level interpretation. Any tracking of objects, hands, faces, or bodies would need to be implemented by the developer.
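As an example of the kind of interpretation left to the developer, a raw depth frame can be reduced to a rough object position by keeping only the pixels inside a distance band and averaging their coordinates. This is a simple illustration on a plain list-of-rows depth frame in meters, not a use of the ZED SDK; the function name and thresholds are ours.

```python
def centroid_in_depth_band(depth, near_m, far_m):
    """Return the (u, v) centroid of pixels whose depth lies in [near_m, far_m], or None if there are none."""
    sum_u = sum_v = count = 0
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if near_m <= z <= far_m:
                sum_u += u
                sum_v += v
                count += 1
    if count == 0:
        return None
    return (sum_u / count, sum_v / count)


# Tiny 3x4 depth frame in meters: an object about 2 m away in front of a 10 m background.
frame = [
    [10.0, 10.0, 10.0, 10.0],
    [10.0,  2.1,  2.0, 10.0],
    [10.0,  2.2,  1.9, 10.0],
]
print(centroid_in_depth_band(frame, 1.5, 3.0))  # (1.5, 1.5)
```

Anything more robust, such as separating several people or tracking them over time, would build on a step like this one.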

The ZED Stereo Camera is great for high frame rate, outdoor, or long range applications which only require a raw depth stream.

Intel RealSense F200

The F200 version of the RealSense product is meant to be front-facing, and excels at tracking faces, hands, objects, gestures, and speech. It’s meant to be mounted to the front of a display or tablet and has a sensing range of about 0.2m-1.2m and a 60FPS VGA depth stream. The SDK is quite robust, supporting C++, C#, JavaScript, Processing, Unity, and Cinder.

The Intel RealSense F200 is a good choice for short-range applications that rely on tracking the face and hands of a single user.

Microsoft Kinect for XBox One

The second generation of the Kinect hardware is a beast — it’s physically the largest sensor we’ve looked at, and it requires a dedicated USB 3.0 bus and its own power source. For all that, you get a wider field of view and very clean depth data at a range of .5m-4.5m, further away if you can put up with some noise in the data. Where Microsoft really shines is in the quality of the SDK, which provides full skeleton tracking of six people simultaneously, basic hand open/close gestures, and face tracking. The SDK works out-of-the-box with Microsoft application frameworks, but the Kinect Common Bridge enables support for Cinder and openFrameworks, and Microsoft provides a plugin for Unity 3D. On the downside, it’s tough to extend the device very far from the host computer, you can only use one sensor per computer, and only on Windows 8.

The Kinect for XBox One is great for medium-range tracking of multiple skeletons and faces in a space, and works with most popular application frameworks, but the sensor must be located close to the host computer.

DUO mini lx

The DUO mini lx camera is a tiny USB-powered stereo infrared camera that provides high-frame-rate depth sensing to a range of about 3m. It includes IR emitters for indoor use, but can be run in a passive mode to accept ambient infrared light — meaning it can be used outdoors in sunlight. The Dense3D SDK provides a basic depth map via a C interface, but no higher-level tracking of hands, faces, or skeletons. It does however work on OS X and Linux, and even ARM-based systems.

The DUO mini lx camera is great for high frame rate or outdoor C/C++ applications which only require raw depth data.

Leap Motion Controller

The Leap Motion Controller is a small, specialized device just for tracking hand joints. The original use case was to place it in front of a screen. Hands and fingers above it are tracked, and can be used for gestural control of software. This still works, but the newer use case is to bolt it to the front of a VR headset like the Oculus Rift to enable the tracking of hands in VR, which lets you interact with virtual objects in the VR scene. The SDK provides 3D positions of the joints of two hands at a high frame rate within a range of .6m, and integrates with nearly any framework you’d like to use. It does not provide any IR, RGB, or point cloud data.

The Leap Motion Controller is a great choice if you only want to track a pair of human hands with high speed and accuracy.

Microsoft Kinect for XBox 360

The original Kinect sensor is still supported by Microsoft, but the hardware was discontinued early in 2015. If you can find the hardware, the sensor is still very useful for a variety of applications. The sensor works indoors to a range of about 4.5m and can track the skeletons of two people simultaneously. At closer range it supports face tracking and speech detection as well. The official SDK supports only Microsoft platforms, but the community has implemented support for Cinder and other frameworks. Web applications can use Kinect data via a socket driver provided by Microsoft. The sensor connects via USB 2 and requires its own power source, but we’ve experimented by connecting up to 16 of them to one PC to create a huge sensing area.

RIP Kinect v1. You were great for fairly accurate indoor tracking of skeletons and point clouds.

Specification Comparison

(scroll horizontally)

| | E-con Systems Tara | Occipital Structure | Orbbec Persee | RealSense SR300 | Orbbec Astra | RealSense R200 | ZED Stereo Camera | RealSense F200 | Kinect for XBox One | DUO mini lx | Leap Motion | Kinect for XBox 360 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Released | July 2016 | February 2014 | December 2016 | March 2016 | September 2015 | September 2015 | May 2015 | January 2015 | July 2014 | May 2013 | October 2012 | June 2011 |
| Price | $250 | $499 (bundle) | $240 | $150 | $150 | $99 | $449 | $99 | $100 | $695 | $100 | Unavailable |
| Tracking Method | Stereo monochrome cameras | IR | IR | IR | IR | IR | Stereo RGB cameras | IR | IR | Passive IR | IR | IR |
| Range | 0.5m – 3m | 0.4m – 3.5m | 0.4m – 8m | 0.2m – 1.2m | 0.4m – 8m | 0.5m – 3.5m | 1.5m – 20m | 0.2m – 1.2m | 0.5m – 4.5m | 0.3m – 2.4m | 0.025m – 0.6m | 0.4m – 4.5m |
| RGB Image | N/A | iOS Camera resolution | 1280×720, 30 FPS | 1920×1080, 30 FPS | 1280×960, 10 FPS | 1920×1080, 30 FPS | configurable between 1280×480, 120 FPS and 4416×1242, 15 FPS | 1920×1080, 30 FPS | 1920×1080, 30 FPS | configurable between 320×120, 360 FPS and 752×480, 56 FPS | N/A | 640×480, 30 FPS |
| Depth Image | 752×480, 640×480, 320×240 30/60FPS, 10bit | 640×480 at 30fps, 320×240 at 60fps | 640×480, 16 bit, 30 FPS | 640×480, 60 FPS | 640×480, 16 bit, 30 FPS | 640×480, 60 FPS | configurable between 640×480, 120 FPS and 2208×1242, 15 FPS | 640×480, 60 FPS | 512×424, 30 FPS | configurable between 320×120, 360 FPS and 752×480, 56 FPS | 20 to 200+ FPS | 320×240, 30 FPS |
| Connectivity | USB 3.0 & USB 2.0 | Lightning on iOS, USB elsewhere | Ethernet | USB 3.0 | USB 2.0 | USB 3.0 | USB 3.0 | USB 3.0 | USB 3.0 | USB 2.0 | USB 2.0 | USB 2.0 |
| Physical Dimensions | 100×30×35 mm | 119.2×28×29 mm | 172×63×56 mm | 14×20×4 mm | 160×30×40 mm | 130×20×7 mm | 175×30×33 mm | 150×30×58 mm | 250×66×67 mm | 52×25×11 mm | 76×30×17 mm | 280×64×38 mm |
Works outdoors?
Skeleton tracking? ✓ (only hand positions) ✓ (six skeletons) ✓ (two skeletons)
Facial tracking? ✓ (detection only) soon
3D scanning?
Simultaneous apps? soon
Gesture Training? ✓ (Visual Gesture Builder) ✗ (only via third-party tools)
Gesture Detection? ✓ (hand open, closed, lasso) ✓ (hand grip, release, press, scroll)
Toolkits C++, ROS iOS, Unity3D, OpenNI C++, Java, OpenNI, ROS Java, JavaScript, Processing, Unity3D, Cinder OpenNI Java, JavaScript, Processing, Unity3D, Cinder Java, JavaScript, Processing, Unity3D, Cinder WPF, Cinder, OpenFrameworks, JavaScript, vvvv, Processing, Unity3D, more Dense3D, OpenCV, Qt5 Javascript, Oculus Rift, Unity3D, Unreal WPF, Cinder, OpenFrameworks, JavaScript, vvvv, Processing, Unity3D, more
| Project Examples | Face tracking and various depth analysis examples, C++ only | Lots of examples on 3D scanning, mixed reality and indoor navigation | Depth Data Viewer, RGB Data Viewer | Various face tracking examples, mostly with C++. Only one Unity3D sample. | HandViewer, Depth Data Viewer, RGB Data Viewer | Various face tracking examples, mostly with C++. Only one Unity3D sample. | Background subtraction, right image disparity, depth map | Many examples of face tracking, gesture tracking, speech detection on a variety of different platforms and frameworks | Many examples of skeleton tracking, face tracking, and speech detection on a variety of different platforms and frameworks | Very few samples in each of the supported languages, mostly to get raw image and depth data | There are a number of examples available for each of the languages and platforms supported | Many examples of skeleton tracking, face tracking, and speech detection on a variety of different platforms and frameworks |
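
One way to use the table to narrow the field is to filter on a single requirement. The sketch below copies the Range row into a dictionary and lists the devices whose stated range covers a given working distance; the helper name is ours, and a real selection would weigh the other rows as well.

```python
# Sensing range in meters (min, max), copied from the Range row of the table.
RANGES_M = {
    "E-con Systems Tara": (0.5, 3.0),
    "Occipital Structure": (0.4, 3.5),
    "Orbbec Persee": (0.4, 8.0),
    "RealSense SR300": (0.2, 1.2),
    "Orbbec Astra": (0.4, 8.0),
    "RealSense R200": (0.5, 3.5),
    "ZED Stereo Camera": (1.5, 20.0),
    "RealSense F200": (0.2, 1.2),
    "Kinect for XBox One": (0.5, 4.5),
    "DUO mini lx": (0.3, 2.4),
    "Leap Motion": (0.025, 0.6),
    "Kinect for XBox 360": (0.4, 4.5),
}


def devices_covering(near_m, far_m, ranges=RANGES_M):
    """Names of devices whose stated range spans the whole interval [near_m, far_m]."""
    return [name for name, (lo, hi) in ranges.items() if lo <= near_m and far_m <= hi]


# Which sensors can see visitors anywhere between 1.5 m and 6 m away?
print(devices_covering(1.5, 6.0))
# ['Orbbec Persee', 'Orbbec Astra', 'ZED Stereo Camera']
```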