Understanding BDD100K data

The BDD100K dataset was recently released for research purposes; the full name is Berkeley DeepDrive 100K.

The dataset contains the following components:


  • Videos (1.8 TB): 100K video clips
  • Video Torrent (n/a): torrent for the 100K video clips
  • Info (3.9 GB): the GPS/IMU information recorded along with the videos
  • Images (6.5 GB): 2 subfolders:
      • 100K labeled key frame images extracted from the videos at the 10th second
      • 10K key frames for full-frame semantic segmentation
  • Labels (147 MB): annotations of road objects, lanes, and drivable areas, in JSON format
  • Drivable maps (661 MB): segmentation maps of drivable areas
  • Segmentation (1.2 GB): full-frame semantic segmentation maps; the corresponding images are in the same folder
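The label JSON files are easy to explore with a few lines of Python. The snippet below is a minimal sketch, and the miniature sample mimics the schema I assume from the release (one dict per image, with an image-level "attributes" dict and a "labels" list of annotated objects); field names like "box2d" and "laneDirection" are taken from that assumed schema, not verified against every release version:

```python
import json
from collections import Counter

# Miniature sample mimicking the assumed BDD100K label schema:
# one entry per image, each with a "labels" list of annotated objects.
sample = json.loads("""
[
  {"name": "b1c66a42-6f7d68ca.jpg",
   "attributes": {"weather": "clear", "timeofday": "daytime"},
   "labels": [
     {"category": "car",  "box2d": {"x1": 45.2, "y1": 254.4, "x2": 357.8, "y2": 487.9}},
     {"category": "lane", "attributes": {"laneDirection": "parallel"}},
     {"category": "drivable area"}
   ]}
]
""")

# Count annotated object categories across all images
counts = Counter(obj["category"] for img in sample for obj in img["labels"])
print(counts)  # one count per category
```

For the real dataset, replace the inline sample with `json.load(open(path))` on one of the downloaded label files.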

Understanding the source code of libviso2



libviso2 was designed to estimate the motion of a car using wide-angle cameras. Cameras with large focal lengths have less overlap between consecutive images, especially during rotations, and are therefore not recommended.

Monocular Odometry

In general, monocular odometry and SLAM systems cannot estimate motion or position on a metric scale; all estimates are relative to some unknown scaling factor. libviso2 overcomes this by assuming a fixed transformation from the ground plane to the camera (parameters camera_height and camera_pitch). To apply these values, the ground plane has to be estimated in each iteration. That is why features on the ground, as well as features above the ground, are mandatory for the mono odometer to work.

Roughly the steps are the following:

  1. Find the fundamental matrix F from point correspondences using RANSAC and the 8-point algorithm
  2. Compute the essential matrix E using the camera calibration
  3. Compute 3D points and R|t up to scale
  4. Estimate the ground plane from the 3D points
  5. Use camera_height and camera_pitch to scale the 3D points and R|t
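Steps 4 and 5 can be illustrated with a small numpy sketch (not libviso2's actual code). Assuming libviso2's camera convention (x right, y down, z forward) and a pitch-only camera orientation, the unscaled distance from the camera to the estimated ground plane fixes the metric scale; the function name and the "lower half of points" ground heuristic here are hypothetical simplifications:

```python
import numpy as np

def scale_from_ground(points, camera_height, camera_pitch):
    """Recover metric scale from up-to-scale 3D points and known camera mounting.

    Convention: x right, y down, z forward; pitch rotates about the x-axis.
    """
    c, s = np.cos(camera_pitch), np.sin(camera_pitch)
    R = np.array([[1.0, 0.0, 0.0],        # undo the camera pitch so the
                  [0.0,   c,  -s],        # ground plane becomes y = const
                  [0.0,   s,   c]])
    y = (points @ R.T)[:, 1]              # height coordinate, positive downwards
    ground = y[y >= np.percentile(y, 50)] # crude heuristic: lowest points = ground
    plane_height = np.median(ground)      # unscaled camera-to-ground distance
    return camera_height / plane_height   # metric scale for the 3D points and t

# Synthetic check: ground points at unscaled height 0.5, camera mounted 1.7 m high
pts = np.vstack([np.column_stack([np.zeros(60), np.full(60, 0.5), np.arange(60.0)]),
                 np.column_stack([np.zeros(40), np.full(40, 0.1), np.arange(40.0)])])
print(scale_from_ground(pts, camera_height=1.7, camera_pitch=0.0))  # 3.4
```

libviso2 itself fits the plane robustly from feature heights rather than taking a fixed percentile, but the idea is the same: the known mounting height turns an up-to-scale reconstruction into a metric one.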

Unfortunately libviso2 does not provide sufficient introspection to signal if one of these steps fails.

Another problem occurs when the camera performs pure rotation: even if there are enough features, the linear system used to calculate the F matrix degenerates.
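This degeneracy is easy to demonstrate numerically: under pure rotation the two views are related by a homography H, so F = skew(a) · H satisfies the epipolar constraint for any vector a, and the 8-point linear system loses rank. A small numpy sketch with synthetic data (normalized image coordinates, not libviso2 code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform([-5, -2, 4], [5, 2, 20], size=(50, 3))  # 3D points in front of camera

a = np.deg2rad(5.0)
R = np.array([[ np.cos(a), 0.0, np.sin(a)],   # 5 degree rotation about the y-axis
              [ 0.0,       1.0, 0.0      ],
              [-np.sin(a), 0.0, np.cos(a)]])

def proj(P):
    return P[:, :2] / P[:, 2:3]               # normalized image coordinates

def eight_point_matrix(x1, x2):
    """One row [u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1] per correspondence."""
    u1, v1 = x1[:, 0], x1[:, 1]
    u2, v2 = x2[:, 0], x2[:, 1]
    return np.column_stack([u2*u1, u2*v1, u2, v2*u1, v2*v1, v2,
                            u1, v1, np.ones_like(u1)])

x1 = proj(X)
x2_rot = proj(X @ R.T)                              # pure rotation, no translation
x2_gen = proj(X @ R.T + np.array([0.5, 0.0, 0.1]))  # rotation + translation

A_rot = eight_point_matrix(x1, x2_rot)
A_gen = eight_point_matrix(x1, x2_gen)

print(np.linalg.matrix_rank(A_rot))  # <= 6: a 3D family of solutions, F not unique
print(np.linalg.matrix_rank(A_gen))  # 8: F unique up to scale
```

With translation present, the null space of the 8-point matrix is one-dimensional (just F up to scale); under pure rotation it is at least three-dimensional, which is exactly the degeneracy described above.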

Stereo Odometry

In a properly calibrated stereo system, 3D points can be calculated from a single image pair. The linear system used to calculate camera motion is therefore based on 3D-3D point correspondences. There are no limitations on the camera movement or the feature distribution.
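As a concrete illustration (a sketch with assumed calibration values, not libviso2's code): for a rectified stereo pair, depth follows directly from disparity, z = f·B/d, and the full 3D point from back-projection through the pinhole model:

```python
import numpy as np

def stereo_to_3d(u, v, d, f, cu, cv, baseline):
    """Back-project pixel (u, v) with disparity d from a rectified stereo pair."""
    z = f * baseline / d        # depth from disparity
    x = (u - cu) * z / f        # pinhole back-projection
    y = (v - cv) * z / f
    return np.array([x, y, z])

# Example with assumed calibration: f = 700 px, baseline = 0.54 m,
# principal point (320, 240)
p = stereo_to_3d(u=480.0, v=240.0, d=7.0, f=700.0, cu=320.0, cv=240.0, baseline=0.54)
print(p)  # depth z = 700 * 0.54 / 7 = 54 m
```

Doing this in two consecutive frames yields the 3D-3D correspondences from which the motion is estimated.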

A look into the source files of libviso2

All header and .cpp files are organized in a single folder, libviso2/src, which contains the following:

demo.cpp includes an off-the-shelf stereo VO process based on libviso2. Starting from this demo file, only small modifications are required by a user of libviso2.
filter.cpp and filter.h are part of libelas (Library for Efficient Large-scale Stereo Matching). The image-processing components, including Sobel, blob, and corner filters, are implemented here.
matcher.cpp and matcher.h: functions and data structures for storing and matching image features, computing features, and removing outliers.
matrix.cpp and matrix.h implement memory allocation, basic input/output from/to matrices, and common matrix operations (such as reshape, rotate, add, subtract, multiply by matrix or scalar, element-wise division by matrix, vector, or scalar, transpose, negation, Euclidean norm, mean, cross product, inverse, determinant, etc.) in class Matrix. Note that this part is self-contained and only needs to be included; libviso2 does not use Eigen for matrix calculations.
reconstruction.cpp and reconstruction.h: given a set of monocular feature matches, the egomotion estimate between the two frames, and the calibration parameters (intrinsics), Reconstruction tries to compute 3D points. (Available only for the monocular case?)
timer.h defines class Timer, which contains functions to measure execution time.
triangle.cpp and triangle.h: this file has been modified from "A Two-Dimensional Quality Mesh Generator and Delaunay Triangulator". The functions in this file are called in Matcher::removeOutliers.
viso_mono.cpp and viso_mono.h implement monocular VO.
viso_stereo.cpp and viso_stereo.h contain the constructor, destructor, and the whole stereo VO process.
viso.cpp and viso.h define the base class, which is inherited by class VisualOdometryStereo and class VisualOdometryMono.

How to plot training log (for visualizing loss or accuracy)

In caffe/tools/extra there is plot_training_log.py.example

Here are the options:

 ./plot_log.sh chart_type[0-7] /where/to/save.png /path/to/first.log ...

 1. Multiple log files are supported.
 2. Log file names must end with the lower-cased ".log".

Supported chart types:
 0: Test accuracy vs. Iters
 1: Test accuracy vs. Seconds
 2: Test loss vs. Iters
 3: Test loss vs. Seconds
 4: Train learning rate vs. Iters
 5: Train learning rate vs. Seconds
 6: Train loss vs. Iters
 7: Train loss vs. Seconds

Here is one example of Training loss vs Iters (option 6):
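The idea behind these charts is simply to scan the log for the solver's loss lines. A minimal sketch of that (the regex and the sample log lines here are illustrative, not Caffe's own parse_log code):

```python
import re

# Caffe's solver prints lines like:
#   I0409 ... solver.cpp:218] Iteration 100 (25 iter/s), loss = 1.874
LOSS_RE = re.compile(r"Iteration (\d+).*?, loss = ([0-9.eE+-]+)")

def parse_train_loss(log_text):
    """Extract (iteration, loss) pairs from Caffe training log text."""
    return [(int(m.group(1)), float(m.group(2)))
            for m in LOSS_RE.finditer(log_text)]

# Synthetic log lines mimicking the solver output format
sample = """\
I0409 10:00:01.0 123 solver.cpp:218] Iteration 0 (0 iter/s), loss = 2.302
I0409 10:00:05.0 123 solver.cpp:218] Iteration 100 (25 iter/s), loss = 1.874
"""
print(parse_train_loss(sample))  # [(0, 2.302), (100, 1.874)]
```

The resulting pairs can be fed straight into matplotlib to reproduce the Train-loss-vs-Iters chart by hand.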

