### Multiple Object Tracking

##### Problem Formulation

The goal of the proposed tracking system is to track, as accurately as possible, the 3D poses $\phi$ of a set of rigid bodies $\mathbf C$ from RGB-D data streams, such as those delivered by a Kinect sensor:

$\hat{\phi} = \arg\max_{\phi} P(\mathbf C, \phi \mid I, D),$

where $\phi$ is an affine 3D transformation that maps the 3D pose of an object between two frames, and $P$ is the probability of $\mathbf C$ and $\phi$ given $I$ and $D$, the RGB and depth information, respectively, delivered by the sensor. The tracked objects are defined as 3D point clusters, or Point Distribution Models (PDMs), $\mathbf C = \{ c_1, c_2, \ldots, c_K \}$, where $K$ is the total number of tracked clusters. At every frame, the pose of each cluster model $c_k$ is given by its corresponding relative affine transformation $\hat{\phi}_k$, and the position of each cluster is given by its centroid.
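
As a minimal sketch of this representation, assuming Python with NumPy, each cluster $c_k$ can be held as an $(N, 3)$ array of points and each $\hat{\phi}_k$ as a $4 \times 4$ homogeneous matrix (restricted here to a rotation plus a translation, as suits rigid bodies). The helper names `centroid`, `make_transform` and `apply_transform` and the example coordinates are hypothetical, not taken from the original system.

```python
import numpy as np


def centroid(cluster: np.ndarray) -> np.ndarray:
    """3D centroid of an (N, 3) point cluster; used as the cluster's position."""
    return cluster.mean(axis=0)


def make_transform(rotation: np.ndarray, translation) -> np.ndarray:
    """Pack a 3x3 rotation and a translation 3-vector into a 4x4 homogeneous matrix."""
    phi = np.eye(4)
    phi[:3, :3] = rotation
    phi[:3, 3] = translation
    return phi


def apply_transform(phi: np.ndarray, cluster: np.ndarray) -> np.ndarray:
    """Map every point of an (N, 3) cluster through the 4x4 transform phi."""
    homogeneous = np.hstack([cluster, np.ones((len(cluster), 1))])
    return (homogeneous @ phi.T)[:, :3]


# Hypothetical clusters C = {c_1, c_2} (K = 2), coordinates in metres.
clusters = {
    1: np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0], [0.0, 0.1, 1.0]]),
    2: np.array([[0.5, 0.2, 1.2], [0.6, 0.2, 1.2]]),
}
# One relative transform phi_k per cluster: c_1 slides 5 cm along x,
# c_2 rotates 90 degrees about the z axis of the sensor frame.
rot_z90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
phis = {
    1: make_transform(np.eye(3), [0.05, 0.0, 0.0]),
    2: make_transform(rot_z90, [0.0, 0.0, 0.0]),
}
moved = {k: apply_transform(phis[k], c) for k, c in clusters.items()}
for k in moved:
    print(k, centroid(moved[k]))
```

Keeping one transform per cluster lets every object move independently, and the centroid of the transformed points gives each object's tracked position.
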

##### Methodology

The tracking is initialized through tabletop object segmentation, which computes the 3D reference model, or cluster, $c_k$ of the object of interest. Once $c_k$ is known, its shape is projected into the 2D image, and a classifier for tracking is trained on the resulting projection. Inside the tracking loop, the classifier establishes 2D correspondences between the consecutive frames $t-1$ and $t$, where $t$ denotes discrete time. These 2D image matches are used to select the 3D point correspondences $\hat{m}_k$ between the corresponding point clouds $D[t-1]$ and $D[t]$ in the Kinect data. From $\hat{m}_k$, an initial coarse transform $A_{coarse}$ is computed for the reference cluster. Depending on the quality of the $\hat{m}_k$ matches, $A_{coarse}$ alone can give acceptable tracking results, but it does not map $c_k$ precisely onto the current cloud $D[t]$. A second transform, $A_{fine}$, is therefore estimated with an Iterative Closest Point (ICP) algorithm applied to the non-occluded object points. The final object model transform $\hat{\phi}_k$, which tracks the pose of a 3D object between consecutive frames, is thus the composition of the two:

$\hat{\phi}_k = A_{fine} \cdot A_{coarse}$

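
The coarse-to-fine step can be sketched in Python with NumPy. The source does not say how the coarse transform is estimated from the correspondences, so this sketch assumes a least-squares rigid fit (the SVD-based Kabsch method), and it uses a plain point-to-point ICP with brute-force nearest neighbours for the fine transform. It skips occlusion handling, and the classifier's 2D matches are simulated by noisy 3D point pairs. `rigid_fit`, `transform` and `icp` are hypothetical helpers, not part of the authors' code.

```python
import numpy as np


def transform(A: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform A to (N, 3) points (column-vector convention)."""
    return pts @ A[:3, :3].T + A[:3, 3]


def rigid_fit(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares rigid transform (Kabsch/SVD) mapping src[i] onto dst[i]."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    A = np.eye(4)
    A[:3, :3] = R
    A[:3, 3] = mu_d - R @ mu_s
    return A


def icp(model: np.ndarray, scene: np.ndarray, iterations: int = 20) -> np.ndarray:
    """Point-to-point ICP: 4x4 transform refining the alignment of model onto scene."""
    A = np.eye(4)
    current = model.copy()
    for _ in range(iterations):
        # Brute-force nearest scene point for every model point.
        dists = np.linalg.norm(current[:, None, :] - scene[None, :, :], axis=2)
        nearest = scene[dists.argmin(axis=1)]
        step = rigid_fit(current, nearest)
        current = transform(step, current)
        A = step @ A  # accumulate: the newest step is applied last
    return A


# Hypothetical data: cluster c_k at frame t-1 and the object's points in D[t].
rng = np.random.default_rng(0)
model = rng.uniform(-0.1, 0.1, size=(100, 3))
theta = np.deg2rad(10.0)
true_phi = np.eye(4)
true_phi[:3, :3] = [[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta), np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]]
true_phi[:3, 3] = [0.02, -0.01, 0.03]
cloud_t = transform(true_phi, model)

# Stand-in for the matches m_k: 30 point pairs, 1 mm noise on the frame-t side.
idx = rng.choice(len(model), size=30, replace=False)
A_coarse = rigid_fit(model[idx], cloud_t[idx] + rng.normal(scale=0.001, size=(30, 3)))

# ICP starts from the coarsely aligned cluster; the final transform composes both.
A_fine = icp(transform(A_coarse, model), cloud_t)
phi_k = A_fine @ A_coarse
```

Because `transform` treats points as column vectors, the coarse transform is applied first and the ICP correction second, so the product is taken as `A_fine @ A_coarse`. The brute-force neighbour search is quadratic in the number of points; for full Kinect clouds a spatial index such as a k-d tree would be the usual choice.
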
##### References

S.M. Grigorescu, D. Pangercic and M. Beetz, "2D-3D Collaborative Tracking (23CT): Towards Stable Robotic Manipulation", Proceedings of the 2012 IEEE-RSJ International Conference on Intelligent Robots and Systems (IROS), Workshop on Active Semantic Perception, Vilamoura, Algarve, Portugal, October 7-12, 2012.

S.M. Grigorescu and C. Pozna "Towards a Stable Robotic Object Manipulation through 2D-3D Features Tracking", International Journal of Advanced Robotic Systems, InTech, vol. 10, no. 200, pp. 1-8, 2013.

##### LaTeX BibTeX Citation

    @inproceedings{grigorescu2d3d,
        author = {Grigorescu, Sorin M and Pangercic, Dejan and Beetz, Michael},
        title = {2D--3D Collaborative tracking (23CT): towards stable robotic manipulation},
        booktitle = {IEEE-RSJ International Conference on Intelligent Robots and Systems (IROS), Workshop on Active Semantic Perception},
        year = {2012},
        organization = {Citeseer},
    }