Abstract: In this paper, we propose an object reconstruction framework that uses so-called Generic Primitives (GP) to complete shapes. A GP is a 3D point cloud depicting the generalized shape of a class of objects. To reconstruct the objects in a scene, we first fit a GP onto each occluded object to obtain an initial raw structure. Second, we use a model-based deformation technique to fold the surface of the GP over the occluded object. The deformation model is encoded within the layers of a Deep Neural Network (DNN), coined GFPNet. The objective of the network is to transfer the particularities of the object from the scene to the raw volume represented by the GP. We show that GFPNet competes with state-of-the-art shape completion methods, providing performance results on the ModelNet and KITTI benchmark datasets.
3D volumetric reconstruction from partial point clouds remains one of the fundamental problems in 3D perception, mainly due to occlusions and the limited resolution of perception sensors. As a consequence, many researchers focus on 3D reconstruction approaches that use one or multiple views of the object of interest to fill in the occluded information. The robotics and computer vision communities tackle the problem mainly by using constraints and prior knowledge of object shapes.
In this research we focus on the specific problem of registering and completing 3D shapes based on sparse and occluded 3D point observations, as illustrated in the figure below. To cope with this problem, we propose GFPNet, a two-step 3D volumetric object reconstruction framework operating on a single view. The method uses a DNN to improve the appearance of a generic volume registered onto the point cloud of a partially observed object. First, we fit a Generic Primitive (GP) onto the 2.5D perceived object to obtain an initial volume. Second, in order to capture the particularities of the perceived object, we model the GP's surface using a DNN.
Let O be a set of 3D points lying on the observed surfaces of an object that is perceived from a single perspective. Let GP be a dense set of 3D points that describe the generic shape of the observed object, while MGP is a clone of the GP’s point cloud whose 3D points have been repositioned. We define the shape completion problem as predicting the MGP given the GP as an initial shape and O as the desired appearance (objective).
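The problem setup above can be sketched with plain numpy arrays. This is a minimal illustration of the data layout, not the paper's implementation; the `complete_shape` interface and the identity stand-in model are hypothetical.

```python
import numpy as np

# Hypothetical data layout for the shape completion problem:
# O   - (n, 3) observed surface points from a single view
# GP  - (m, 3) dense Generic Primitive cloud (initial shape)
# MGP - (m, 3) modeled GP: the same points, repositioned toward O
rng = np.random.default_rng(0)
O = rng.normal(size=(128, 3))    # partial observation
GP = rng.normal(size=(1024, 3))  # generic primitive

def complete_shape(GP, O, model):
    """Predict the MGP: reposition each GP point so the cloud
    matches the observed object's appearance (hypothetical API)."""
    return model(GP, O)

MGP = complete_shape(GP, O, lambda gp, o: gp)  # identity stand-in
assert MGP.shape == GP.shape  # MGP is a repositioned clone of GP
```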
The block diagram of the proposed shape completion framework is illustrated below. The framework has three stages. In the first stage, we apply the PointRCNN 3D object detector to extract the objects directly from the raw point cloud depicting the scene. We chose this particular detector based on our extensive experiments on the KITTI dataset, in which PointRCNN showed the best performance compared to other state-of-the-art methods.
In the second stage, we register a GP onto the observation points in O. For this stage we use PCRNet's neural network, with the objective of finding the transformation that best aligns the two point clouds. As with PointRCNN, we chose PCRNet based on its alignment accuracy and computation time compared with other techniques, such as Iterative Closest Point (ICP). For computational efficiency, we limit the number of iterations to five, as the network converges rapidly. During testing, we determined that the average run-time for registering a GP is around three milliseconds in our proposed pipeline. Finally, in the last stage, we apply GFPNet to model the surface of the GP such that it captures the particularities of the perceived object.
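To make the registration stage concrete, here is a minimal classical point-to-point ICP sketch (the baseline mentioned above, not PCRNet itself), using brute-force nearest neighbors and a Kabsch/SVD rigid alignment per iteration. All names are illustrative.

```python
import numpy as np

def icp_step(src, dst):
    """One point-to-point ICP iteration (Kabsch/SVD alignment)."""
    # brute-force nearest neighbor in dst for each src point
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    nn = dst[d2.argmin(axis=1)]
    # best rigid transform mapping src onto its matched points
    mu_s, mu_n = src.mean(0), nn.mean(0)
    H = (src - mu_s).T @ (nn - mu_n)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_n - R @ mu_s
    return src @ R.T + t

def register(gp, obs, iters=5):  # five iterations, as in the pipeline
    for _ in range(iters):
        gp = icp_step(gp, obs)
    return gp
```

With nearly overlapping clouds the alignment error drops sharply after the first iteration, which is why a small fixed iteration budget is viable.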
The entire GP is modeled by applying the GFPNet modeling approach to each GP point. The proposed DNN architecture for modeling 3D surfaces is presented in the image below. To achieve the modeling task, the GFPNet architecture uses an encoder-decoder schema composed of sequences of convolutional (CNN) layers. The first half of the network behaves as a feature extractor, encoding the geometrical particularities of the two inputs (Source and Template), while the second half behaves as a decoder that regresses towards a modeled version of the source cloud.
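The encoder-decoder data flow can be illustrated with a toy numpy forward pass. This is only a shape-level sketch under our own assumptions: fully connected layers stand in for the actual CNN layers, and all layer sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, sizes):
    """Toy fully connected stack standing in for the conv layers;
    ReLU on hidden layers, linear output layer."""
    for i, out in enumerate(sizes):
        W = rng.normal(scale=0.1, size=(x.shape[-1], out))
        x = x @ W
        if i < len(sizes) - 1:
            x = np.maximum(x, 0.0)
    return x

def gfpnet_forward(source, template):
    """Encoder half: per-point features max-pooled into a global code
    for each input cloud. Decoder half: regress per-point offsets
    that reposition the source cloud."""
    code = np.concatenate([mlp(source, [64, 128]).max(axis=0),
                           mlp(template, [64, 128]).max(axis=0)])
    per_point = np.concatenate(
        [source, np.broadcast_to(code, (len(source), code.size))], axis=1)
    return source + mlp(per_point, [128, 64, 3])  # deformed source

out = gfpnet_forward(rng.normal(size=(256, 3)), rng.normal(size=(64, 3)))
assert out.shape == (256, 3)  # one offset per source point
```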
Throughout the deformation process, GFPNet optimizes two loss functions:
- loss1: minimizes the distance between the two point cloud densities;
- loss2: ensures a smooth modeled surface.
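A minimal numpy sketch of the two loss terms, under our own assumptions: loss1 is written as a symmetric Chamfer-style distance, and loss2 as the spread of each point's k nearest neighbors (a simple stand-in for the paper's smoothness term; the weight `w` is hypothetical).

```python
import numpy as np

def chamfer(a, b):
    """loss1: symmetric average closest-point distance."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d2.min(1)).mean() + np.sqrt(d2.min(0)).mean()

def smoothness(cloud, k=8):
    """loss2: penalize rough surfaces via the spread of each
    point's k nearest neighbors (illustrative stand-in)."""
    d2 = ((cloud[:, None, :] - cloud[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]  # skip self-match
    nbrs = cloud[idx]                         # (n, k, 3)
    return ((nbrs - nbrs.mean(1, keepdims=True)) ** 2).sum(-1).mean()

def total_loss(mgp, obs, w=0.1):
    """Combined objective driving the deformation."""
    return chamfer(mgp, obs) + w * smoothness(mgp)
```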
To demonstrate the performance of GFPNet, we have defined a benchmark dataset based on both synthetic and real data. The benchmark contains synthetic CAD models from the ModelNet database and real 2.5D objects from the KITTI database. We included CAD models for two reasons: to determine a quantitative measure of the reconstruction process, as well as to compare our reconstruction results against shape-prior-based methods.
We evaluate the GFPNet’s performance on the ModelNet test set using the Chamfer Distance (CD). This distance provides a quantitative measure of similarity between the modeled GP and the ground truth shape. The similarity is determined as the average closest point distance between the modeled GP and the ground truth cloud.
Because the CD metric can only be used to compare full shapes, as in the case of comparing modeled GFPNet shapes with the CAD models in ModelNet, we evaluated GFPNet's performance on the KITTI test set using the following three metrics:
- Fidelity (F): the average distance from each point of the modeled GP to its nearest neighbor in the ground truth;
- Minimal Matching Distance (MMD): the CD between the modeled GP and the ModelNet object point cloud closest to the GP’s points in terms of CD;
- Consistency (C): the average CD between the modeled GPs of the same instance in consecutive frames.
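The three KITTI metrics can be sketched directly from the definitions above. This is an illustrative numpy implementation under our own assumptions (brute-force nearest neighbors; function names are ours, not the paper's).

```python
import numpy as np

def nn_dist(a, b):
    """Average distance from each point in a to its nearest point in b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d2.min(1)).mean()

def fidelity(mgp, gt):
    """F: how well the modeled GP covers the observed ground truth."""
    return nn_dist(mgp, gt)

def chamfer(a, b):
    """Symmetric CD used by the MMD and Consistency metrics."""
    return nn_dist(a, b) + nn_dist(b, a)

def mmd(mgp, modelnet_clouds):
    """MMD: CD to the closest ModelNet cloud in terms of CD."""
    return min(chamfer(mgp, m) for m in modelnet_clouds)

def consistency(mgps):
    """C: average CD between modeled GPs of consecutive frames."""
    return float(np.mean([chamfer(a, b) for a, b in zip(mgps, mgps[1:])]))
```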
For performance analysis, we address several competing data-driven and learning-based 3D reconstruction systems. The shape retrieval algorithm class is referred to as 3DSR. The shape prior approaches are referred to as SPrior, while the fitting algorithms are referred to as ICP. Among existing deep learning approaches built around standard encoder-decoder neural network architectures, we compare our proposed approach against the methods called VRLY and DAI. Qualitative results of the performance analysis are illustrated in the following image.
A short summary of the research can be found in the video below: