surface_matching.hpp 22 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402
  1. //
  2. // IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
  3. //
  4. // By downloading, copying, installing or using the software you agree to this license.
  5. // If you do not agree to this license, do not download, install,
  6. // copy or use the software.
  7. //
  8. //
  9. // License Agreement
  10. // For Open Source Computer Vision Library
  11. //
  12. // Copyright (C) 2014, OpenCV Foundation, all rights reserved.
  13. // Third party copyrights are property of their respective owners.
  14. //
  15. // Redistribution and use in source and binary forms, with or without modification,
  16. // are permitted provided that the following conditions are met:
  17. //
  18. // * Redistribution's of source code must retain the above copyright notice,
  19. // this list of conditions and the following disclaimer.
  20. //
  21. // * Redistribution's in binary form must reproduce the above copyright notice,
  22. // this list of conditions and the following disclaimer in the documentation
  23. // and/or other materials provided with the distribution.
  24. //
  25. // * The name of the copyright holders may not be used to endorse or promote products
  26. // derived from this software without specific prior written permission.
  27. //
  28. // This software is provided by the copyright holders and contributors "as is" and
  29. // any express or implied warranties, including, but not limited to, the implied
  30. // warranties of merchantability and fitness for a particular purpose are disclaimed.
  31. // In no event shall the Intel Corporation or contributors be liable for any direct,
  32. // indirect, incidental, special, exemplary, or consequential damages
  33. // (including, but not limited to, procurement of substitute goods or services;
  34. // loss of use, data, or profits; or business interruption) however caused
  35. // and on any theory of liability, whether in contract, strict liability,
  36. // or tort (including negligence or otherwise) arising in any way out of
  37. // the use of this software, even if advised of the possibility of such damage.
  38. #ifndef __OPENCV_SURFACE_MATCHING_HPP__
  39. #define __OPENCV_SURFACE_MATCHING_HPP__
  40. #include "surface_matching/ppf_match_3d.hpp"
  41. #include "surface_matching/icp.hpp"
  42. /** @defgroup surface_matching Surface Matching
  43. Note about the License and Patents
  44. -----------------------------------
  45. The following patents have been issued for methods embodied in this
  46. software: "Recognition and pose determination of 3D objects in 3D scenes
  47. using geometric point pair descriptors and the generalized Hough
  48. Transform", Bertram Heinrich Drost, Markus Ulrich, EP Patent 2385483
  49. (Nov. 21, 2012), assignee: MVTec Software GmbH, 81675 Muenchen
  50. (Germany); "Recognition and pose determination of 3D objects in 3D
  51. scenes", Bertram Heinrich Drost, Markus Ulrich, US Patent 8830229 (Sept.
  52. 9, 2014), assignee: MVTec Software GmbH, 81675 Muenchen (Germany).
  53. Further patents are pending. For further details, contact MVTec Software
  54. GmbH (info@mvtec.com).
  55. Note that restrictions imposed by these patents (and possibly others)
  56. exist independently of and may be in conflict with the freedoms granted
  57. in this license, which refers to copyright of the program, not patents
  58. for any methods that it implements. Both copyright and patent law must
  59. be obeyed to legally use and redistribute this program and it is not the
  60. purpose of this license to induce you to infringe any patents or other
  61. property right claims or to contest validity of any such claims. If you
  62. redistribute or use the program, then this license merely protects you
  63. from committing copyright infringement. It does not protect you from
  64. committing patent infringement. So, before you do anything with this
  65. program, make sure that you have permission to do so not merely in terms
  66. of copyright, but also in terms of patent law.
  67. Please note that this license is not to be understood as a guarantee
  68. either. If you use the program according to this license, but in
  69. conflict with patent law, it does not mean that the licensor will refund
  70. you for any losses that you incur if you are sued for your patent
  71. infringement.
  72. Introduction to Surface Matching
  73. --------------------------------
  74. Cameras and similar devices with the capability of sensation of 3D structure are becoming more
  75. common. Thus, using depth and intensity information for matching 3D objects (or parts) are of
  76. crucial importance for computer vision. Applications range from industrial control to guiding
  77. everyday actions for visually impaired people. The task in recognition and pose estimation in range
  78. images aims to identify and localize a queried 3D free-form object by matching it to the acquired
  79. database.
  80. From an industrial perspective, enabling robots to automatically locate and pick up randomly placed
  81. and oriented objects from a bin is an important challenge in factory automation, replacing tedious
  82. and heavy manual labor. A system should be able to recognize and locate objects with a predefined
  83. shape and estimate the position with the precision necessary for a gripping robot to pick it up.
  84. This is where vision guided robotics takes the stage. Similar tools are also capable of guiding
  85. robots (and even people) through unstructured environments, leading to automated navigation. These
  86. properties make 3D matching from point clouds a ubiquitous necessity. Within this context, I will
  87. now describe the OpenCV implementation of a 3D object recognition and pose estimation algorithm
  88. using 3D features.
  89. Surface Matching Algorithm Through 3D Features
  90. ----------------------------------------------
  91. The state of the algorithms in order to achieve the task 3D matching is heavily based on
  92. @cite drost2010, which is one of the first and main practical methods presented in this area. The
  93. approach is composed of extracting 3D feature points randomly from depth images or generic point
  94. clouds, indexing them and later in runtime querying them efficiently. Only the 3D structure is
  95. considered, and a trivial hash table is used for feature queries.
  96. While being fully aware that utilization of the nice CAD model structure in order to achieve a smart
  97. point sampling, I will be leaving that aside now in order to respect the generalizability of the
  98. methods (Typically for such algorithms training on a CAD model is not needed, and a point cloud
  99. would be sufficient). Below is the outline of the entire algorithm:
  100. ![Outline of the Algorithm](img/outline.jpg)
  101. As explained, the algorithm relies on the extraction and indexing of point pair features, which are
  102. defined as follows:
  103. \f[\bf{{F}}(\bf{{m1}}, \bf{{m2}}) = (||\bf{{d}}||_2, <(\bf{{n1}},\bf{{d}}), <(\bf{{n2}},\bf{{d}}), <(\bf{{n1}},\bf{{n2}}))\f]
  104. where \f$\bf{{m1}}\f$ and \f$\bf{{m2}}\f$ are feature two selected points on the model (or scene),
  105. \f$\bf{{d}}\f$ is the difference vector, \f$\bf{{n1}}\f$ and \f$\bf{{n2}}\f$ are the normals at \f$\bf{{m1}}\f$ and
  106. \f$\bf{m2}\f$. During the training stage, this vector is quantized, indexed. In the test stage, same
  107. features are extracted from the scene and compared to the database. With a few tricks like
  108. separation of the rotational components, the pose estimation part can also be made efficient (check
  109. the reference for more details). A Hough-like voting and clustering is employed to estimate the
  110. object pose. To cluster the poses, the raw pose hypotheses are sorted in decreasing order of the
  111. number of votes. From the highest vote, a new cluster is created. If the next pose hypothesis is
  112. close to one of the existing clusters, the hypothesis is added to the cluster and the cluster center
  113. is updated as the average of the pose hypotheses within the cluster. If the next hypothesis is not
  114. close to any of the clusters, it creates a new cluster. The proximity testing is done with fixed
  115. thresholds in translation and rotation. Distance computation and averaging for translation are
  116. performed in the 3D Euclidean space, while those for rotation are performed using quaternion
  117. representation. After clustering, the clusters are sorted in decreasing order of the total number of
  118. votes which determines confidence of the estimated poses.
  119. This pose is further refined using \f$ICP\f$ in order to obtain the final pose.
  120. PPF presented above depends largely on robust computation of angles between 3D vectors. Even though
  121. not reported in the paper, the naive way of doing this (\f$\theta = cos^{-1}({\bf{a}}\cdot{\bf{b}})\f$
  122. remains numerically unstable. A better way to do this is then use inverse tangents, like:
  123. \f[<(\bf{n1},\bf{n2})=tan^{-1}(||{\bf{n1} \wedge \bf{n2}}||_2, \bf{n1} \cdot \bf{n2})\f]
  124. Rough Computation of Object Pose Given PPF
  125. ------------------------------------------
  126. Let me summarize the following notation:
  127. - \f$p^i_m\f$: \f$i^{th}\f$ point of the model (\f$p^j_m\f$ accordingly)
  128. - \f$n^i_m\f$: Normal of the \f$i^{th}\f$ point of the model (\f$n^j_m\f$ accordingly)
  129. - \f$p^i_s\f$: \f$i^{th}\f$ point of the scene (\f$p^j_s\f$ accordingly)
  130. - \f$n^i_s\f$: Normal of the \f$i^{th}\f$ point of the scene (\f$n^j_s\f$ accordingly)
  131. - \f$T_{m\rightarrow g}\f$: The transformation required to translate \f$p^i_m\f$ to the origin and rotate
  132. its normal \f$n^i_m\f$ onto the \f$x\f$-axis.
  133. - \f$R_{m\rightarrow g}\f$: Rotational component of \f$T_{m\rightarrow g}\f$.
  134. - \f$t_{m\rightarrow g}\f$: Translational component of \f$T_{m\rightarrow g}\f$.
  135. - \f$(p^i_m)^{'}\f$: \f$i^{th}\f$ point of the model transformed by \f$T_{m\rightarrow g}\f$. (\f$(p^j_m)^{'}\f$
  136. accordingly).
  137. - \f${\bf{R_{m\rightarrow g}}}\f$: Axis angle representation of rotation \f$R_{m\rightarrow g}\f$.
  138. - \f$\theta_{m\rightarrow g}\f$: The angular component of the axis angle representation
  139. \f${\bf{R_{m\rightarrow g}}}\f$.
  140. The transformation in a point pair feature is computed by first finding the transformation
  141. \f$T_{m\rightarrow g}\f$ from the first point, and applying the same transformation to the second one.
  142. Transforming each point, together with the normal, to the ground plane leaves us with an angle to
  143. find out, during a comparison with a new point pair.
  144. We could now simply start writing
  145. \f[(p^i_m)^{'} = T_{m\rightarrow g} p^i_m\f]
  146. where
  147. \f[T_{m\rightarrow g} = -t_{m\rightarrow g}R_{m\rightarrow g}\f]
  148. Note that this is nothing but a stacked transformation. The translational component
  149. \f$t_{m\rightarrow g}\f$ reads
  150. \f[t_{m\rightarrow g} = -R_{m\rightarrow g}p^i_m\f]
  151. and the rotational being
  152. \f[\theta_{m\rightarrow g} = \cos^{-1}(n^i_m \cdot {\bf{x}})\\
  153. {\bf{R_{m\rightarrow g}}} = n^i_m \wedge {\bf{x}}\f]
  154. in axis angle format. Note that bold refers to the vector form. After this transformation, the
  155. feature vectors of the model are registered onto the ground plane X and the angle with respect to
  156. \f$x=0\f$ is called \f$\alpha_m\f$. Similarly, for the scene, it is called \f$\alpha_s\f$.
  157. ### Hough-like Voting Scheme
  158. As shown in the outline, PPF (point pair features) are extracted from the model, quantized, stored
  159. in the hashtable and indexed, during the training stage. During the runtime however, the similar
  160. operation is perfomed on the input scene with the exception that this time a similarity lookup over
  161. the hashtable is performed, instead of an insertion. This lookup also allows us to compute a
  162. transformation to the ground plane for the scene pairs. After this point, computing the rotational
  163. component of the pose reduces to computation of the difference \f$\alpha=\alpha_m-\alpha_s\f$. This
  164. component carries the cue about the object pose. A Hough-like voting scheme is performed over the
  165. local model coordinate vector and \f$\alpha\f$. The highest poses achieved for every scene point lets us
  166. recover the object pose.
  167. ### Source Code for PPF Matching
  168. ~~~{cpp}
  169. // pc is the loaded point cloud of the model
  170. // (Nx6) and pcTest is a loaded point cloud of
  171. // the scene (Mx6)
  172. ppf_match_3d::PPF3DDetector detector(0.03, 0.05);
  173. detector.trainModel(pc);
  174. vector<Pose3DPtr> results;
  175. detector.match(pcTest, results, 1.0/10.0, 0.05);
  176. cout << "Poses: " << endl;
  177. // print the poses
  178. for (size_t i=0; i<results.size(); i++)
  179. {
  180. Pose3DPtr pose = results[i];
  181. cout << "Pose Result " << i << endl;
  182. pose->printPose();
  183. }
  184. ~~~
  185. Pose Registration via ICP
  186. -------------------------
  187. The matching process terminates with the attainment of the pose. However, due to the multiple
  188. matching points, erroneous hypothesis, pose averaging and etc. such pose is very open to noise and
  189. many times is far from being perfect. Although the visual results obtained in that stage are
  190. pleasing, the quantitative evaluation shows \f$~10\f$ degrees variation (error), which is an acceptable
  191. level of matching. Many times, the requirement might be set well beyond this margin and it is
  192. desired to refine the computed pose.
  193. Furthermore, in typical RGBD scenes and point clouds, 3D structure can capture only less than half
  194. of the model due to the visibility in the scene. Therefore, a robust pose refinement algorithm,
  195. which can register occluded and partially visible shapes quickly and correctly is not an unrealistic
  196. wish.
  197. At this point, a trivial option would be to use the well known iterative closest point algorithm .
  198. However, utilization of the basic ICP leads to slow convergence, bad registration, outlier
  199. sensitivity and failure to register partial shapes. Thus, it is definitely not suited to the
  200. problem. For this reason, many variants have been proposed . Different variants contribute to
  201. different stages of the pose estimation process.
  202. ICP is composed of \f$6\f$ stages and the improvements I propose for each stage is summarized below.
  203. ### Sampling
  204. To improve convergence speed and computation time, it is common to use less points than the model
  205. actually has. However, sampling the correct points to register is an issue in itself. The naive way
  206. would be to sample uniformly and hope to get a reasonable subset. More smarter ways try to identify
  207. the critical points, which are found to highly contribute to the registration process. Gelfand et.
  208. al. exploit the covariance matrix in order to constrain the eigenspace, so that a set of points
  209. which affect both translation and rotation are used. This is a clever way of subsampling, which I
  210. will optionally be using in the implementation.
  211. ### Correspondence Search
  212. As the name implies, this step is actually the assignment of the points in the data and the model in
  213. a closest point fashion. Correct assignments will lead to a correct pose, where wrong assignments
  214. strongly degrade the result. In general, KD-trees are used in the search of nearest neighbors, to
  215. increase the speed. However this is not an optimality guarantee and many times causes wrong points
  216. to be matched. Luckily the assignments are corrected over iterations.
  217. To overcome some of the limitations, Picky ICP @cite pickyicp and BC-ICP (ICP using bi-unique
  218. correspondences) are two well-known methods. Picky ICP first finds the correspondences in the
  219. old-fashioned way and then among the resulting corresponding pairs, if more than one scene point
  220. \f$p_i\f$ is assigned to the same model point \f$m_j\f$, it selects \f$p_i\f$ that corresponds to the minimum
  221. distance. BC-ICP on the other hand, allows multiple correspondences first and then resolves the
  222. assignments by establishing bi-unique correspondences. It also defines a novel no-correspondence
  223. outlier, which intrinsically eases the process of identifying outliers.
  224. For reference, both methods are used. Because P-ICP is a bit faster, with not-so-significant
  225. performance drawback, it will be the method of choice in refinment of correspondences.
  226. ### Weighting of Pairs
  227. In my implementation, I currently do not use a weighting scheme. But the common approaches involve
  228. *normal compatibility* (\f$w_i=n^1_i\cdot n^2_j\f$) or assigning lower weights to point pairs with
  229. greater distances (\f$w=1-\frac{||dist(m_i,s_i)||_2}{dist_{max}}\f$).
  230. ### Rejection of Pairs
  231. The rejections are done using a dynamic thresholding based on a robust estimate of the standard
  232. deviation. In other words, in each iteration, I find the MAD estimate of the Std. Dev. I denote this
  233. as \f$mad_i\f$. I reject the pairs with distances \f$d_i>\tau mad_i\f$. Here \f$\tau\f$ is the threshold of
  234. rejection and by default set to \f$3\f$. The weighting is applied prior to Picky refinement, explained
  235. in the previous stage.
  236. ### Error Metric
  237. As described in , a linearization of point to plane as in @cite koklimlow error metric is used. This
  238. both speeds up the registration process and improves convergence.
  239. ### Minimization
  240. Even though many non-linear optimizers (such as Levenberg Mardquardt) are proposed, due to the
  241. linearization in the previous step, pose estimation reduces to solving a linear system of equations.
  242. This is what I do exactly using cv::solve with DECOMP_SVD option.
  243. ### ICP Algorithm
  244. Having described the steps above, here I summarize the layout of the ICP algorithm.
  245. #### Efficient ICP Through Point Cloud Pyramids
  246. While the up-to-now-proposed variants deal well with some outliers and bad initializations, they
  247. require significant number of iterations. Yet, multi-resolution scheme can help reducing the number
  248. of iterations by allowing the registration to start from a coarse level and propagate to the lower
  249. and finer levels. Such approach both improves the performances and enhances the runtime.
  250. The search is done through multiple levels, in a hierarchical fashion. The registration starts with
  251. a very coarse set of samples of the model. Iteratively, the points are densified and sought. After
  252. each iteration the previously estimated pose is used as an initial pose and refined with the ICP.
  253. #### Visual Results
  254. ##### Results on Synthetic Data
  255. In all of the results, the pose is initiated by PPF and the rest is left as:
  256. \f$[\theta_x, \theta_y, \theta_z, t_x, t_y, t_z]=[0]\f$
  257. ### Source Code for Pose Refinement Using ICP
  258. ~~~{cpp}
  259. ICP icp(200, 0.001f, 2.5f, 8);
  260. // Using the previously declared pc and pcTest
  261. // This will perform registration for every pose
  262. // contained in results
  263. icp.registerModelToScene(pc, pcTest, results);
  264. // results now contain the refined poses
  265. ~~~
  266. Results
  267. -------
  268. This section is dedicated to the results of surface matching (point-pair-feature matching and a
  269. following ICP refinement):
  270. ![Several matches of a single frog model using ppf + icp](img/gsoc_forg_matches.jpg)
  271. Matches of different models for Mian dataset is presented below:
  272. ![Matches of different models for Mian dataset](img/snapshot27.jpg)
  273. You might checkout the video on [youTube here](http://www.youtube.com/watch?v=uFnqLFznuZU).
  274. A Complete Sample
  275. -----------------
  276. ### Parameter Tuning
  277. Surface matching module treats its parameters relative to the model diameter (diameter of the axis
  278. parallel bounding box), whenever it can. This makes the parameters independent from the model size.
  279. This is why, both model and scene cloud were subsampled such that all points have a minimum distance
  280. of \f$RelativeSamplingStep*DimensionRange\f$, where \f$DimensionRange\f$ is the distance along a given
  281. dimension. All three dimensions are sampled in similar manner. For example, if
  282. \f$RelativeSamplingStep\f$ is set to 0.05 and the diameter of model is 1m (1000mm), the points sampled
  283. from the object's surface will be approximately 50 mm apart. From another point of view, if the
  284. sampling RelativeSamplingStep is set to 0.05, at most \f$20x20x20 = 8000\f$ model points are generated
  285. (depending on how the model fills in the volume). Consequently this results in at most 8000x8000
  286. pairs. In practice, because the models are not uniformly distributed over a rectangular prism, much
  287. less points are to be expected. Decreasing this value, results in more model points and thus a more
  288. accurate representation. However, note that number of point pair features to be computed is now
  289. quadratically increased as the complexity is O(N\^2). This is especially a concern for 32 bit
  290. systems, where large models can easily overshoot the available memory. Typically, values in the
  291. range of 0.025 - 0.05 seem adequate for most of the applications, where the default value is 0.03.
  292. (Note that there is a difference in this paremeter with the one presented in @cite drost2010 . In
  293. @cite drost2010 a uniform cuboid is used for quantization and model diameter is used for reference of
  294. sampling. In my implementation, the cuboid is a rectangular prism, and each dimension is quantized
  295. independently. I do not take reference from the diameter but along the individual dimensions.
  296. It would very wise to remove the outliers from the model and prepare an ideal model initially. This
  297. is because, the outliers directly affect the relative computations and degrade the matching
  298. accuracy.
  299. During runtime stage, the scene is again sampled by \f$RelativeSamplingStep\f$, as described above.
  300. However this time, only a portion of the scene points are used as reference. This portion is
  301. controlled by the parameter \f$RelativeSceneSampleStep\f$, where
  302. \f$SceneSampleStep = (int)(1.0/RelativeSceneSampleStep)\f$. In other words, if the
  303. \f$RelativeSceneSampleStep = 1.0/5.0\f$, the subsampled scene will once again be uniformly sampled to
  304. 1/5 of the number of points. Maximum value of this parameter is 1 and increasing this parameter also
  305. increases the stability, but decreases the speed. Again, because of the initial scene-independent
  306. relative sampling, fine tuning this parameter is not a big concern. This would only be an issue when
  307. the model shape occupies a volume uniformly, or when the model shape is condensed in a tiny place
  308. within the quantization volume (e.g. The octree representation would have too much empty cells).
  309. \f$RelativeDistanceStep\f$ acts as a step of discretization over the hash table. The point pair features
  310. are quantized to be mapped to the buckets of the hashtable. This discretization involves a
  311. multiplication and a casting to the integer. Adjusting RelativeDistanceStep in theory controls the
  312. collision rate. Note that, more collisions on the hashtable results in less accurate estimations.
  313. Reducing this parameter increases the affect of quantization but starts to assign non-similar point
  314. pairs to the same bins. Increasing it however, wanes the ability to group the similar pairs.
  315. Generally, because during the sampling stage, the training model points are selected uniformly with
  316. a distance controlled by RelativeSamplingStep, RelativeDistanceStep is expected to equate to this
  317. value. Yet again, values in the range of 0.025-0.05 are sensible. This time however, when the model
  318. is dense, it is not advised to decrease this value. For noisy scenes, the value can be increased to
  319. improve the robustness of the matching against noisy points.
  320. */
  321. #endif