FAQ – Allied Vision Technologies: Nerian

What is stereo vision? A definition

In the human sciences, stereo vision (or stereoscopic vision) is the ability to obtain a spatial visual impression with both eyes through image comparison. When viewing an object, each eye sees it from a slightly different angle. Each eye filters the information and sends it to the brain, where both visual impressions are processed into a combined image. This is what creates our three-dimensional depth perception.

What is computer stereo vision?

Computer stereo vision mimics human depth perception. A stereo camera records two images synchronously, and these images are then compared. When an image is taken with a conventional camera, the distance between the camera and the observed object is lost. This depth information can be recovered by comparing several images taken from different, known camera viewpoints.

What is stereo matching?

Stereo matching is the process used in computer vision to compare stereo images. It is based on finding all pixels in the two stereo images that correspond to the same 3D point in the captured scene.

Once all correspondences have been found, the 3D points are obtained by triangulation, which takes into account the intrinsic and extrinsic geometry of the cameras as determined by their calibration.

With Nerian’s SceneScan this process is performed on an FPGA, which speeds up the calculation by orders of magnitude and provides real-time 3D measurement.
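For a rectified stereo pair, the triangulation step reduces to a simple relation between disparity and distance. The following is a minimal sketch, assuming an ideal pinhole camera model; the function name and the example camera values (focal length in pixels, baseline in metres) are hypothetical and not taken from any Nerian device.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Return the distance Z (in metres) of a point seen with the given disparity.

    Uses the rectified-stereo relation Z = f * B / d, where f is the focal
    length in pixels, B the baseline in metres and d the disparity in pixels.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero corresponds to infinity")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 800 px focal length, 0.25 m baseline.
print(depth_from_disparity(40.0, 800.0, 0.25))  # 5.0
```

The relation also shows why nearby objects produce large disparities: halving the distance doubles the disparity.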

What is stereo disparity?

In the context of stereo vision and depth image evaluation, disparity is the offset between the positions that an object occupies on two different image planes.

To stay with the example of human vision: our eyes are located at different lateral positions, so each eye perceives a slightly different image. With the left eye we see more of the left side of an object, while with the right eye we see more of its right side. This positional difference is called disparity. You can check this yourself: close one eye and look at an object, then close that eye and open the other one. You will see the object at a slightly different position. The nearer the object, the larger the difference.

To determine the disparity in machine vision, stereo matching compares the column positions of matching pixels in the two images.
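The idea of searching for a matching pixel can be illustrated with a very small block-matching example. This is only a sketch of the general principle using a sum of absolute differences (SAD) cost; it is not the algorithm that runs on the SceneScan or Scarlet FPGA, which this FAQ does not describe. The function name and the toy data are made up for illustration.

```python
def match_disparity(left_row, right_row, x, half_window, max_disparity):
    """Find the disparity of pixel x in left_row by searching right_row.

    A small window around x in the left row is compared with windows shifted
    0..max_disparity pixels to the left in the right row. The shift with the
    smallest sum of absolute differences (SAD) is returned.
    """
    best_disparity, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if x - d - half_window < 0:
            break  # the shifted window would leave the right image
        cost = sum(
            abs(left_row[x + k] - right_row[x - d + k])
            for k in range(-half_window, half_window + 1)
        )
        if cost < best_cost:
            best_disparity, best_cost = d, cost
    return best_disparity


# Toy image rows: the bright feature sits 3 pixels further left in the right image.
left = [0, 0, 0, 0, 0, 0, 9, 5, 1, 0, 0, 0]
right = [0, 0, 0, 9, 5, 1, 0, 0, 0, 0, 0, 0]
print(match_disparity(left, right, x=7, half_window=1, max_disparity=5))  # 3
```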

What is the disparity range in stereo vision?

The disparity range in stereo vision specifies the overlapping image region that is searched for pixel correspondences. The larger the selected disparity range, the more accurate the measurement results. However, a large disparity range causes a high computational load and thus lowers the achievable frame rate.

What is the difference between a small and a large disparity range? What are the recommendations for the disparity range while using SceneScan or Scarlet?

A small disparity range is suited to high-speed measurement applications, while a large disparity range is appropriate for measurements at closer range and when higher accuracy is required. However, a larger disparity range also widens the invalid border on the left side of the image, where the camera images cannot be compared.
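The effect of the disparity range on the closest measurable distance and on the usable image width can be estimated with a few lines of code. This is a rough sketch under two assumptions that are not stated in this FAQ: the rectified-stereo relation Z = f * B / d holds, and the invalid left border is about as wide as the disparity range. The camera values are hypothetical.

```python
def min_measurable_distance(focal_length_px, baseline_m, disparity_range_px):
    """Closest distance (metres) whose disparity still fits into the search range."""
    return focal_length_px * baseline_m / disparity_range_px


def valid_width(image_width_px, disparity_range_px):
    """Image columns left after discarding the left border.

    Assumption: the invalid border is as wide as the disparity range.
    """
    return max(image_width_px - disparity_range_px, 0)


# Hypothetical camera: 800 px focal length, 0.25 m baseline, 1024 px wide image.
for disparity_range in (128, 256):
    distance = min_measurable_distance(800.0, 0.25, disparity_range)
    width = valid_width(1024, disparity_range)
    print(disparity_range, distance, width)
```

Under these assumptions, doubling the disparity range halves the minimum distance but costs twice as many image columns on the left.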

With SceneScan we offer a configurable disparity range from 32 to 256 pixels. We recommend the following combinations of resolution, disparity range and frame rate:

		Image resolution
Model	Disparity range	640 × 480	800 × 592	1024 × 768	1600 × 1200	2016 × 1536
SceneScan monochrome	64 pixels	45 fps	30 fps	n/a	n/a	n/a
SceneScan monochrome	128 pixels	30 fps	20 fps	n/a	n/a	n/a
SceneScan Pro monochrome	128 pixels	135 fps	90 fps	55 fps	22 fps	13 fps
SceneScan Pro monochrome	256 pixels	75 fps	53 fps	34 fps	12 fps	7 fps
SceneScan Pro color	128 pixels	80 fps	53 fps	32 fps	13 fps	8 fps
SceneScan Pro color	256 pixels	72 fps	49 fps	32 fps	12 fps	7 fps

For Scarlet we have the following combinations:

	Image resolution
Disparity range	832 × 608	1024 × 768	1216 × 1024	2432 × 2048
256 pixels	120 fps	84 fps	55 fps	15 fps
512 pixels	n/a	n/a	38 fps	11 fps

What is the difference between a disparity map and a 3D point cloud?

Disparity is the apparent displacement of pixels in a stereo image pair (see “What is stereo disparity?”). The disparity map is computed from this image pair and contains the depth information of each recorded pixel. Structurally it is a 2D image, so many 2D image processing algorithms are applicable.
In contrast, a 3D point cloud provides 3D coordinates for each pixel, which allows actual 3D measurements. However, the 3D output requires more complex algorithms and more data than disparity maps, which slows down image processing.
For faster results, working with disparity maps should be preferred over 3D point clouds.
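The relationship between the two representations can be sketched as follows. The example assumes an ideal rectified pinhole camera with focal length f (pixels), baseline B (metres) and principal point (cx, cy); all names and values are hypothetical, and a real system would use the parameters from its calibration instead.

```python
def disparity_map_to_points(disparity_map, focal_length_px, baseline_m, cx, cy):
    """Convert a disparity map (list of rows) into a list of (X, Y, Z) points.

    Pixels with a disparity <= 0 are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(disparity_map):
        for u, d in enumerate(row):
            if d <= 0:
                continue
            z = focal_length_px * baseline_m / d   # distance along the optical axis
            x = (u - cx) * z / focal_length_px     # horizontal offset
            y = (v - cy) * z / focal_length_px     # vertical offset
            points.append((x, y, z))
    return points


# Hypothetical 2 x 3 disparity map; 0 marks an invalid pixel.
disparity_map = [
    [40.0, 0.0, 20.0],
    [40.0, 40.0, 20.0],
]
points = disparity_map_to_points(disparity_map, 800.0, 0.25, cx=1.0, cy=0.5)
print(len(points))  # 5
```

Each valid pixel of the 2D disparity map becomes three floating-point coordinates, which illustrates why point clouds carry more data than disparity maps.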

How does the SceneScan stereo vision sensor work, how does the Scarlet 3D depth camera work?

SceneScan and SceneScan Pro are embedded image processing systems for real-time stereo matching. SceneScan connects to a dedicated stereo camera or two industrial USB cameras, which are mounted at different viewing positions.

SceneScan sends out trigger signals in order to synchronously capture stereoscopic image pairs from the connected cameras. By correlating the image data from both cameras on the integrated FPGA, SceneScan can infer the depth of the observed scene. The computed depth map is transmitted through Gigabit Ethernet to a connected computer or another embedded system. SceneScan captures up to 100 frames per second, providing depth measurement in real time.

Our Scarlet 3D depth camera works on the same principle, but combines a 3D stereo camera and image processing in one device. Its particularly powerful FPGA achieves up to 120 fps and over 70 million 3D points per second, at resolutions of up to 5 megapixels.

How do your 3D stereo vision sensors differ from other depth and 3D sensors?

In contrast to conventional 3D sensing solutions, our sensors work passively. This means that no light needs to be emitted for performing measurements. This makes our stereo vision solutions particularly robust towards the illumination conditions, and it facilitates long-range measurements, the use of multiple sensors with overlapping fields of view, and a flexible reconfiguration of the system for different measurement ranges.
Furthermore, we provide faster, higher-resolution and higher-quality depth maps, because we can harness the high computational capabilities of an FPGA and use high-quality image sensors. Because all image processing is done on SceneScan or Scarlet, there is also no computational load on the host PC.

Compared to LiDAR sensors, what is the advantage of your 3D stereo vision sensor?

Our main advantage over LiDAR sensors is a much higher vertical resolution: typical LiDARs measure in at most 64 rows. In some applications, this low vertical resolution can cause a LiDAR to miss objects of low height.

In comparison with GPU-based image processing, what are the advantages of FPGA-based image processing?

Lower power consumption
Smaller size
Faster results and more reliable timing, because no operating system or other software competes for resources

What is the output of SceneScan or Scarlet?

SceneScan and Scarlet deliver the stereo matching results in the form of a disparity map from the perspective of the left camera. The disparity map associates each pixel in the left camera image with a corresponding pixel in the right camera image. Because both images were previously rectified to match an ideal stereo camera geometry, corresponding pixels should only differ in their horizontal coordinates. The disparity map thus only encodes a horizontal coordinate difference. The left camera image is output as well, and 3D point clouds are also available.

What is the difference between SceneScan Pro and SceneScan?

Two different models exist for the given image processing system: SceneScan and SceneScan Pro. Both models provide the same functionality, but SceneScan Pro has significantly more computational power than SceneScan. This means that SceneScan Pro can process a given input stereo image much faster than SceneScan.

Thanks to the additional processing power, SceneScan Pro is also capable of processing higher image resolutions, color images and larger disparity ranges. Due to these benefits, SceneScan Pro can achieve a higher measurement accuracy than SceneScan.

You can find a brief comparison between both 3D sensor devices on the SceneScan product page.

A detailed comparison of the achievable frame rates at different image resolutions and disparity ranges can be found in the frame rate tables in this FAQ.

I am interested in SceneScan. Is it possible to acquire it without your Karmin stereo camera?

Yes, you can purchase SceneScan without our stereoscopic camera. However, SceneScan still requires a stereo camera configuration, so you will need to acquire third-party cameras. For recommendations on cameras please see “Can I use third party cameras?”

What are the maximal achievable frame rates and resolutions?

The maximum frame rate that can be achieved depends on the image size and the configured disparity range. The following table provides a list of recommended configurations. This is only a subset of the available configuration space. Differing image resolutions and disparity ranges can be used to meet specific application requirements. Here is an overview for SceneScan:

		Image resolution
Model	Disparity range	640 × 480	800 × 592	1024 × 768	1600 × 1200	2016 × 1536
SceneScan monochrome	64 pixels	45 fps	30 fps	n/a	n/a	n/a
SceneScan monochrome	128 pixels	30 fps	20 fps	n/a	n/a	n/a
SceneScan Pro monochrome	128 pixels	135 fps	90 fps	55 fps	22 fps	13 fps
SceneScan Pro monochrome	256 pixels	75 fps	53 fps	34 fps	12 fps	7 fps
SceneScan Pro color	128 pixels	80 fps	53 fps	32 fps	13 fps	8 fps
SceneScan Pro color	256 pixels	72 fps	49 fps	32 fps	12 fps	7 fps

Our recommendations for Scarlet are as follows:

	Image resolution
Disparity range	832 × 608	1024 × 768	1216 × 1024	2432 × 2048
256 pixels	120 fps	84 fps	55 fps	15 fps
512 pixels	n/a	n/a	38 fps	11 fps
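When choosing a configuration, the table can be treated as a simple lookup. The sketch below encodes the recommended Scarlet values listed above and picks the largest listed resolution that still reaches a required frame rate. The data structure and the function name are illustrative only and are not part of any Nerian software.

```python
# Recommended Scarlet frame rates from the table above:
# {disparity range in pixels: {(width, height): frames per second}}
SCARLET_FPS = {
    256: {(832, 608): 120, (1024, 768): 84, (1216, 1024): 55, (2432, 2048): 15},
    512: {(1216, 1024): 38, (2432, 2048): 11},
}


def largest_resolution(disparity_range_px, min_fps):
    """Return the largest listed resolution that reaches min_fps, or None."""
    candidates = [
        resolution
        for resolution, fps in SCARLET_FPS.get(disparity_range_px, {}).items()
        if fps >= min_fps
    ]
    return max(candidates, key=lambda res: res[0] * res[1], default=None)


print(largest_resolution(256, 50))  # (1216, 1024)
print(largest_resolution(512, 30))  # (1216, 1024)
print(largest_resolution(512, 60))  # None
```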

How can I save the point cloud or disparity map?

You can save the 3D point cloud as a PLY file, which can be viewed with tools such as MeshLab or CloudCompare. To do this in our NVCom application, check the “3D” icon. You can also use our API to write PLY files with the method Reconstruct3D::writePlyFile().
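To show what such a file contains, here is a minimal sketch that writes a list of points as an ASCII PLY file in plain Python. It is a stand-in for illustration and does not use or reproduce the Nerian API; the function name, file name and point data are made up.

```python
import os
import tempfile


def write_ascii_ply(path, points):
    """Write a list of (x, y, z) tuples to an ASCII PLY file."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    with open(path, "w", encoding="ascii") as ply_file:
        ply_file.write("\n".join(header) + "\n")
        for x, y, z in points:
            ply_file.write(f"{x} {y} {z}\n")


# Write two hypothetical points to a file in the system's temporary directory.
ply_path = os.path.join(tempfile.gettempdir(), "example_cloud.ply")
write_ascii_ply(ply_path, [(0.0, 0.0, 1.0), (0.1, -0.2, 2.5)])
```

The resulting file can be opened in a PLY viewer such as MeshLab or CloudCompare.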

Is a white balance performed during pre-processing? Is a focus/auto focus required for this application?

For the white balance, you can select different presets, or an auto mode.
Fixed-focus lenses should be used, because with any lens a change of focus slightly changes the focal length. Since the cameras are precisely calibrated, the focal length must not be changed afterwards.

What is the latency (incl. image capture) with SceneScan?

The latency depends on the chosen configuration. Typically it is the time between two frames plus approx. 9 ms until a processed frame is fully received by the host computer.
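As a quick worked example of this rule of thumb, the typical latency can be estimated from the configured frame rate. The function name is illustrative, and the 9 ms constant is the approximate value quoted above, so the results are estimates rather than guaranteed figures.

```python
def estimated_latency_ms(frame_rate_fps, processing_overhead_ms=9.0):
    """Estimate the latency as one frame period plus the approximate overhead."""
    frame_period_ms = 1000.0 / frame_rate_fps
    return frame_period_ms + processing_overhead_ms


print(estimated_latency_ms(100))  # 19.0
print(estimated_latency_ms(30))   # roughly 42.3
```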

Is time synchronization possible?

Each pair of rectified left camera image and disparity map transmitted by SceneScan also includes a timestamp and a sequence number. The timestamp is measured with microsecond accuracy and is set to either the time at which a camera trigger signal was generated or the time at which a frame was received from the cameras. Additionally, we offer various options for clock synchronization. The preferred option is PTP (Precision Time Protocol).

How can I increase the frame rates?

You can increase the frame rate by reducing the resolution. This can be done by cropping the image to an arbitrary aspect ratio. When doing so, please keep in mind the following recommendations on frame rates and resolution for SceneScan:

		Image resolution
Model	Disparity range	640 × 480	800 × 592	1024 × 768	1600 × 1200	2016 × 1536
SceneScan monochrome	64 pixels	45 fps	30 fps	n/a	n/a	n/a
SceneScan monochrome	128 pixels	30 fps	20 fps	n/a	n/a	n/a
SceneScan Pro monochrome	128 pixels	135 fps	90 fps	55 fps	22 fps	13 fps
SceneScan Pro monochrome	256 pixels	75 fps	53 fps	34 fps	12 fps	7 fps
SceneScan Pro color	128 pixels	80 fps	53 fps	32 fps	13 fps	8 fps
SceneScan Pro color	256 pixels	72 fps	49 fps	32 fps	12 fps	7 fps

For Scarlet:

	Image resolution
Disparity range	832 × 608	1024 × 768	1216 × 1024	2432 × 2048
256 pixels	120 fps	84 fps	55 fps	15 fps
512 pixels	n/a	n/a	38 fps	11 fps

Is there an impact of the disparity map resolution on fps and latency? Is there a frame-rate vs. max range trade-off?

There is no trade-off between frame rate and maximum measurement range. The real trade-off is between image resolution and frame rate. The image resolution does not affect the depth resolution (if the disparity range is kept equal).

Is it possible to calculate the disparity map on a subsampled image (e.g. with higher fps), but also send the full resolution input image?

Unfortunately this is not possible with our solution.

You offer automatic re-calibration. How does the automatic re-calibration work?

For stereo vision it is very important that the cameras are precisely calibrated and that there will be no mechanical movements after the calibration is performed. Tiny deformations can otherwise severely affect the calibration and disrupt the image processing results.

This is the reason why we implemented the auto re-calibration feature. It tracks the camera calibration at runtime and continuously updates it. For this to work, the software monitors natural landmark features that the camera identifies. There is an older YouTube video (from our predecessor system SP1) on our channel that demonstrates this technology:

https://www.youtube.com/watch?v=2QGnOwfQKYo

The auto re-calibration only adjusts the most critical calibration parameters. It is thus still necessary to perform a full manual calibration. In SceneScan’s default configuration the auto re-calibration adjusts the calibration parameters approx. every 2 minutes, but it can be configured more aggressively.

If you want to use our IP core, please note that the auto re-calibration is not implemented in the FPGA. We perform this step in software on an ARM CPU (which is part of the Zynq SoC that we use, but you could also use a separate CPU). The source code for the auto re-calibration is provided with the IP core license.

Do you have a recommendation for reducing the LED flickering?

With our system it is possible to reduce the effects of flickering by setting the frame rate to the power grid frequency (e.g. 50 Hz) or an integer fraction of it, so 50 or 25 fps should work without flickering. With LEDs we found that some lights flicker while others do not.

Is there any function to adapt to a sudden change of environment lighting conditions, for example when exiting a tunnel?
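The example rates given above (50 and 25 fps on a 50 Hz grid) are the grid frequency divided by a whole number. The sketch below lists such candidate frame rates. It assumes that the light intensity repeats with the grid period, so that every frame samples the same phase of the flicker; the function name is hypothetical, and whether a given lamp is actually free of visible flicker still has to be checked in practice.

```python
def flicker_safe_frame_rates(grid_frequency_hz, min_fps):
    """List frame rates grid_frequency / n (n = 1, 2, ...) that are >= min_fps."""
    rates = []
    divisor = 1
    while grid_frequency_hz / divisor >= min_fps:
        rates.append(grid_frequency_hz / divisor)
        divisor += 1
    return rates


print(flicker_safe_frame_rates(50, 20))  # [50.0, 25.0]
print(flicker_safe_frame_rates(60, 20))  # [60.0, 30.0, 20.0]
```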

We have an auto exposure algorithm implemented to quickly adapt the exposure and gain settings to varying lighting conditions. So this is not a problem.


