Robotics, Vision and Control Laboratory

Gaze following, understood as continuously measuring (with a greater or lesser degree of anticipation) the head pose and gaze direction of an interlocutor in order to determine their focus of attention, is important in several areas of computer vision, such as the development of non-intrusive gaze-tracking equipment for psychophysical experiments in neuroscience, specialized telecommunication devices, Human-Computer Interfaces (HCI) and artificial cognitive systems for Human-Robot Interaction (HRI).

The gaze following image processing chain, depicted in Fig. 1, contains four main steps. We assume that the input is an 8-bit gray-scale image $I \in J^{V \times W}$, of width $V$ and height $W$, containing a face viewed from either a frontal or a profile direction, where $J = \{0, \ldots, 255\}$ and $(v,w)$ denotes the 2D coordinates of a specific pixel. The face region is obtained from a face detector.

Fig. 1. Block diagram of the proposed gaze following system for facial feature extraction and 3D gaze orientation reconstruction. Each processing block within the cascade provides a measure of feature extraction quality, fused within the controlled variable $y_f$.

Firstly, a set of facial feature ROI hypotheses $\mathbf{H} \in \{ h_{le}, h_{re}, h_n, h_m \}$, consisting of possible instances of the left $h_{le}$ and right $h_{re}$ eyes, nose $h_n$ and mouth $h_m$, is extracted using a local features estimator, which determines the probability measure $p(\mathbf{H} | I)$ of finding one of the searched local facial regions. The number of computed ROI hypotheses is governed by a probability threshold $T_h$, which rejects hypotheses with a low $p(\mathbf{H} | I)$ confidence measure. Choosing $T_h$ is not trivial for time-critical systems such as the gaze estimator, which, for successful HRI, has to deliver the 3D gaze orientation of the human subject in real time. The lower $T_h$ is, the more hypotheses are passed on to the subsequent stages and the higher the computation time. On the other hand, a higher $T_h$ may reject "true positive" facial regions, leading to a failure in gaze estimation. In order to obtain a robust value for the hypotheses selection threshold, we adapt $T_h$ with respect to the confidences provided by the subsequent estimators from Fig. 1, which take the facial region hypotheses as input. The output probabilities of these estimators, that is, the spatial estimator and the GMM for point-wise feature extraction, are fed back to adapt $T_h$ within the extremum seeking control paradigm.
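The role of $T_h$ and of the feedback loop can be illustrated with a minimal Python sketch. It is not the controller used in the system: the `RoiHypothesis` type, the function names, the step size and the clamping bounds are all hypothetical, and a simple perturb-and-observe rule stands in for the extremum seeking controller, whose details are not given here. The `quality` arguments play the role of the fused measure $y_f$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoiHypothesis:
    """Hypothetical container for one ROI hypothesis."""
    feature: str       # "le", "re", "n" or "m"
    box: tuple         # (v, w, width, height) in pixels
    confidence: float  # stands in for p(H | I)


def select_hypotheses(hypotheses, t_h):
    """Keep only the hypotheses whose confidence reaches the threshold T_h."""
    return [h for h in hypotheses if h.confidence >= t_h]


def adapt_threshold(t_h, step, quality_now, quality_prev, lo=0.05, hi=0.95):
    """One perturb-and-observe update of T_h (assumed stand-in for the
    extremum seeking controller).

    If the fused quality dropped after the previous move, reverse the
    direction of the step; otherwise keep moving the same way. The new
    threshold is clamped to [lo, hi].
    """
    if quality_now < quality_prev:
        step = -step
    t_h = min(hi, max(lo, t_h + step))
    return t_h, step


hyps = [
    RoiHypothesis("le", (40, 50, 20, 12), 0.91),
    RoiHypothesis("re", (80, 50, 20, 12), 0.34),
    RoiHypothesis("n", (60, 70, 18, 20), 0.72),
    RoiHypothesis("m", (58, 95, 30, 14), 0.15),
]
kept_low = select_hypotheses(hyps, 0.3)   # permissive: more work downstream
kept_high = select_hypotheses(hyps, 0.8)  # strict: may drop true positives
```

With the permissive threshold three hypotheses survive, while the strict one keeps only the left eye, which shows the trade-off between computation time and missed facial regions described above.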

Once the hypotheses vector $\mathbf{H}$ has been built, the facial features are combined into the spatial hypotheses $\mathbf{g} = \{g_0, g_1, \ldots, g_n\}$, thus forming different combinations of facial regions. Since one of the main objectives of the presented algorithm is to identify facial points of frontal as well as profile faces, a spatial vector $g_i$ is composed of either four or three facial ROIs:

$g_i = \{h_0, h_1, h_2, h_3\} \lor \{h_0, h_1, h_2\}$

where $h_i \in \{ h_{le}, h_{re}, h_n, h_m \}$.
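As an illustration of how such spatial hypotheses could be enumerated, the following Python sketch picks one ROI hypothesis per facial feature for each requested feature subset. The function name and the dictionary layout are assumptions made for this example; which three-feature subsets are actually used is defined by Fig. 2 and is therefore passed in as a parameter rather than hard-coded.

```python
from itertools import product


def spatial_combinations(hyps_by_feature, feature_subsets):
    """Build every combination that takes one hypothesis per feature.

    hyps_by_feature maps a feature name ("le", "re", "n", "m") to its list
    of ROI hypotheses; feature_subsets lists the four- or three-feature
    groups to combine. A subset with a missing feature yields nothing.
    """
    combos = []
    for subset in feature_subsets:
        pools = [hyps_by_feature.get(f, []) for f in subset]
        for choice in product(*pools):
            combos.append(dict(zip(subset, choice)))
    return combos


# Profile-like case: two left-eye candidates, no mouth hypothesis.
hyps_by_feature = {"le": ["le0", "le1"], "re": ["re0"], "n": ["n0"], "m": []}
subsets = [("le", "re", "n", "m"), ("le", "re", "n")]
combos = spatial_combinations(hyps_by_feature, subsets)
```

Here the four-feature subset produces no combination because no mouth hypothesis survived, while the three-feature subset produces two, one per left-eye candidate.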

The extraction of the best spatial feature combination can be seen as a graph search problem $g_j = f : G(\mathbf{g}, \mathbf{E}) \rightarrow \Re$, where $\mathbf{E}$ are the edges of the graph connecting the hypotheses in $\mathbf{g}$. The considered feature combinations are illustrated in Fig. 2. Each combination has a specific spatial probability value $p(g_j | \mathbf{H})$, given by a spatial estimator trained on the spatial distances between the facial features in a training database.

Fig. 2. Different spatial combinations of features used for training the four classifiers. (a) All four facial features. (b,c,d) Cases where only three features are visible in the sample image.
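The selection of the best combination can be sketched in Python as an exhaustive search over the candidate combinations. This is only an illustration under stated assumptions: each ROI is reduced to a hypothetical centre point `(v, w)`, the graph edges are taken to be the pairwise distances between centres, and `toy_probability` is a placeholder for the trained spatial estimator, which is not specified here.

```python
from itertools import combinations
from math import hypot


def edge_lengths(combo):
    """Pairwise distances between ROI centres, keyed by sorted feature pair."""
    names = sorted(combo)
    return {
        (a, b): hypot(combo[a][0] - combo[b][0], combo[a][1] - combo[b][1])
        for a, b in combinations(names, 2)
    }


def best_combination(combos, spatial_probability):
    """Return (combo, score) with the highest score, or None if combos is empty.

    spatial_probability is a caller-supplied stand-in for p(g_j | H).
    """
    if not combos:
        return None
    scored = [(spatial_probability(edge_lengths(c)), c) for c in combos]
    score, combo = max(scored, key=lambda pair: pair[0])
    return combo, score


def toy_probability(edges, expected_eye_distance=60.0):
    """Placeholder score in (0, 1]: highest when the eyes are as far apart
    as expected. Not a trained estimator."""
    d = edges.get(("le", "re"))
    if d is None:
        return 0.0
    return 1.0 / (1.0 + abs(d - expected_eye_distance))


candidates = [
    {"le": (100, 100), "re": (160, 100), "n": (130, 140)},
    {"le": (100, 100), "re": (220, 100), "n": (130, 140)},
]
best, score = best_combination(candidates, toy_probability)
```

With these toy inputs the first candidate is selected, since its eye distance matches the assumed nominal value. A trained estimator would replace `toy_probability` and use all edge lengths rather than a single one.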

References

S.M. Grigorescu and F. Moldoveanu "Human-Robot Interaction through Robust Gaze Following", Memoirs of the Scientific Sections of the Romanian Academy, 2016 (to be published).

LaTeX BibTeX Citation

@inproceedings{grigorescu2016human,
  author       = {Grigorescu, Sorin M and Macesanu, Gigel},
  title        = {Human--Robot Interaction Through Robust Gaze Following},
  booktitle    = {Congress on Information Technology, Computational and Experimental Physics},
  pages        = {165--178},
  year         = {2016},
  organization = {Springer},
}
