Computer Vision - points and patches

4 Feature detection and matching

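Feature detection and matching are essential building blocks of many computer vision applications.

Feature detection: the process of finding interest points.
Feature matching: the process of establishing correspondences between features in different images.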

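For example, given two images, what kind of relationship do we want to establish between them? (Figure 4.2)

Alignment: for the top two images, we want to align them well so that they can be combined into a mosaic (see Chapter 9).
Finding a correspondence set: for the bottom two images, we want a dense set of correspondences in order to build a 3D model or to generate an in-between view (see Chapter 11).
In all of these cases, which features should we detect and match in order to obtain the alignment or the correspondence set?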

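Types of features

Keypoint features (or interest points, corners) (see Section 4.1):

Examples: mountain peaks, building corners, doorways, interestingly shaped patches of snow.
Often described by the appearance of the local patch around the point.

Edge features (see Section 4.2):

Example: the boundary between the sky and the mountains.
Matching cues: edge orientation and local appearance (edge profile).
Good indicators of object boundaries and of occlusion events in image sequences.
Can be grouped further into straight line segments or curves.
Provide cues for finding vanishing points (Section 4.3).
Provide cues for recovering the internal and external camera parameters (Section 4.3).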

4.1 Points and patches

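Uses:

Point features: can be used to find a sparse set of corresponding locations in different images.

A precursor to computing camera pose (Chapter 7).
A prerequisite for obtaining a denser set of correspondences with stereo matching (Chapter 11).

Correspondences are used for image alignment (Chapter 9).

stitching image mosaics
performing video stabilization

They are also used extensively for object instance and category recognition (Sections 14.3 and 14.4).

Advantages of keypoints

Matching works quite well even in the presence of clutter (occlusion) and large changes in scale and orientation.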

Technical content

Two main approaches to detecting feature points and finding their correspondences

Tracking: find features in one image that can be tracked accurately, then follow them with a local search (Section 4.1.4)

Local search techniques: correlation, least squares
Best suited to:

nearby viewpoints
images taken in rapid succession (e.g., video sequences)

Independent detection and matching: detect features independently in every image, then match them (Section 4.1.3)

Best suited to:

large amounts of motion
large changes in appearance

Example applications:

stitching together panoramas
wide baseline stereo
object recognition

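Four stages (keypoint detection and matching pipeline)

Feature detection (extraction) stage (Section 4.1.1)

Search each image for locations that are likely to match well in other images.

Feature description stage (Section 4.1.2)

The region around each keypoint is converted into a more compact and stable descriptor.
The descriptor can then be matched against other descriptors.

Feature matching stage (Section 4.1.3)

Efficiently search for likely matching candidates in other images.

Feature tracking stage (Section 4.1.4)

An alternative to the third stage.
Only searches a small neighborhood around each detected feature.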

4.1.1 Feature Detectors

Introduction:

How can we find image locations [1] where correspondences can be reliably found in other images?
What are good features to track? (Shi and Tomasi 1994; Triggs 2004). Figure 4.3

Consider three sample patches and how well each might be matched or tracked.
Textureless patches are nearly impossible to localize.
Patches with large contrast changes (gradients) are easier to localize.

Aperture problem (Horn and Schunck 1981; Lucas and Kanade 1981; Anandan 1989). Figure 4.4 Aperture problems for different image patches

Motion seen through a small aperture does not reveal the true overall motion.
Change in a single direction: a straight line segment is hard to localize along its own direction.
Patches with strong changes in at least two directions are the easiest to localize.

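How to compute it

Simplest approach: compare two image patches with the weighted summed square difference (Section 8.1)
$E_{WSSD}(\mathbf{u}) = \sum_i w(\mathbf{x}_i) [I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i)]^2$ (Equation 4.1)

Here u = (u, v) is the displacement vector.
I_0 and I_1 are the two images being compared.
w(x) is a weighting function (window function).
The sum runs over all pixels i in the patch.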
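As a rough illustration of Equation 4.1, the sketch below evaluates the weighted SSD for an integer displacement. The function name `wssd`, the `(x, y, weight)` window representation, and the two tiny test images are assumptions made for this example; plain Python lists stand in for real image arrays.

```python
def wssd(img0, img1, u, v, window):
    """Weighted SSD (Equation 4.1) for an integer displacement (u, v).

    img0, img1: images as lists of rows, indexed img[y][x].
    window: iterable of (x, y, weight) samples, i.e. the pixels x_i and w(x_i).
    """
    total = 0.0
    for x, y, weight in window:
        diff = img1[y + v][x + u] - img0[y][x]
        total += weight * diff * diff
    return total


# Hypothetical test data: img1 is img0 shifted one pixel to the right.
img0 = [[0, 0, 0, 0, 0],
        [0, 1, 2, 0, 0],
        [0, 3, 4, 0, 0],
        [0, 0, 0, 0, 0]]
img1 = [[0, 0, 0, 0, 0],
        [0, 0, 1, 2, 0],
        [0, 0, 3, 4, 0],
        [0, 0, 0, 0, 0]]
# Uniform weights over the 2x2 patch that holds the pattern.
window = [(x, y, 1.0) for y in (1, 2) for x in (1, 2)]

print(wssd(img0, img1, 1, 0, window))  # error at the true displacement
print(wssd(img0, img1, 0, 0, window))  # error with no displacement
```

The error is zero at the true displacement (1, 0) and larger elsewhere, which is what a local search over u would exploit.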

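With a single image: the auto-correlation function or surface
$E_{AC}(\Delta\mathbf{u}) = \sum_i w(\mathbf{x}_i) [I_0(\mathbf{x}_i + \Delta\mathbf{u}) - I_0(\mathbf{x}_i)]^2$ (Equation 4.2)

At detection time, the other image to match against is not available.
Instead, we can test how stable the patch is when compared against itself.
This measures how well the patch withstands small changes, that is, how uncertain its location is.
As in Equation 4.2, apply a small displacement Δu and observe how stable the patch is.
Related detectors: Harris corner detector, Förstner-Harris, Shi-Tomasi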
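Auto-correlation matrix A:
A Taylor series expansion of Equation 4.2, using I_0(x_i + Δu) ≈ I_0(x_i) + ∇I_0(x_i) · Δu, gives E_AC(Δu) ≈ Δu^T A Δu, where A is given by Equation 4.8.
$A = w * \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$ (Equation 4.8)
Here I_x is the partial derivative along the x axis, I_y is the partial derivative along the y axis [2], and w * denotes convolution with the weighting kernel.
For sample images, see Figure 4.5:
Figure 4.5 Three auto-correlation surfaces. The three red crosses mark the locations where the auto-correlation was computed.
Eigenvalue analysis of matrix A
Figure 4.6 Uncertainty ellipse associated with the eigenvalues of the auto-correlation matrix A

The two eigenvalues of A can be interpreted through this ellipse.
The relationship between the two eigenvalues predicts how stable the current point is: two large eigenvalues indicate a well-localized point, while a small eigenvalue indicates an edge-like or textureless patch.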
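The eigenvalue analysis can be tried on toy images. The following is a minimal sketch, assuming central-difference gradients and a uniform (box) weight w over a small window instead of a Gaussian; the helper names `autocorrelation_matrix`, `eigenvalues`, and `make_image` are hypothetical.

```python
import math


def autocorrelation_matrix(img, cx, cy, radius=1):
    """Entries (a, b, c) of A = [[a, b], [b, c]] around pixel (cx, cy).

    Assumptions for this sketch: central-difference gradients and a uniform
    weight over the (2 * radius + 1)^2 window.
    """
    a = b = c = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            a += ix * ix
            b += ix * iy
            c += iy * iy
    return a, b, c


def eigenvalues(a, b, c):
    """Eigenvalues (smaller, larger) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    root = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean - root, mean + root


def make_image(predicate, size=7):
    """Binary test image: 1.0 where predicate(x, y) holds, else 0.0."""
    return [[1.0 if predicate(x, y) else 0.0 for x in range(size)]
            for y in range(size)]


flat = make_image(lambda x, y: False)                # textureless patch
edge = make_image(lambda x, y: x >= 3)               # vertical step edge
corner = make_image(lambda x, y: x >= 3 and y >= 3)  # corner

for name, img in (("flat", flat), ("edge", edge), ("corner", corner)):
    print(name, eigenvalues(*autocorrelation_matrix(img, 3, 3)))
```

Only the corner patch has two clearly non-zero eigenvalues. Using the smaller eigenvalue as the interest score corresponds to the Shi-Tomasi criterion.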

Adaptive non-maximal suppression (ANMS):

Most feature detectors simply look for local maxima of the interest function.

However, this produces an uneven distribution of interest points across the image.
That is, interest points cluster in regions of higher contrast.

Alternative 1 (Brown, Szeliski, and Winder 2005):

Accept as interest points only those that are local maxima and whose response is 10% stronger than that of all neighbors within a radius r.
This yields interest points that are distributed evenly across the whole image.
How does this affect repeatability?

Compare with Figure 4.9:
Figure 4.9 Adaptive non-maximal suppression (ANMS) (Brown, Szeliski, and Winder 2005). (C) 2005 IEEE. The top two images show the strongest 250 and 500 interest points. The bottom two images show interest points selected with ANMS. The latter are distributed much more uniformly across the image.
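A brute-force sketch of the ANMS idea: each point gets a suppression radius equal to the distance to the nearest point that is significantly stronger, and the points with the largest radii are kept. The function name `anms`, the `(x, y, strength)` tuples, and the factor `c_robust=0.9` (mirroring the 10% margin above) are assumptions for this example; the O(n^2) loop is for clarity, not speed.

```python
import math


def anms(points, num_keep, c_robust=0.9):
    """Keep the num_keep points with the largest suppression radius.

    points: list of (x, y, strength).
    A point is suppressed by any neighbor j with strength_i < c_robust * strength_j;
    its radius is the distance to the nearest such neighbor.
    """
    radii = []
    for i, (xi, yi, fi) in enumerate(points):
        radius = math.inf
        for j, (xj, yj, fj) in enumerate(points):
            if i != j and fi < c_robust * fj:
                radius = min(radius, math.hypot(xi - xj, yi - yj))
        radii.append(radius)
    order = sorted(range(len(points)), key=lambda i: radii[i], reverse=True)
    return [points[i] for i in order[:num_keep]]


# Hypothetical detections: a tight high-contrast cluster plus one isolated weak point.
detections = [(0, 0, 10.0), (1, 0, 8.0), (0, 1, 7.0), (50, 50, 3.0)]
selected = anms(detections, 2)
print(selected)
```

Picking the two strongest responses would return two points from the cluster, whereas ANMS keeps the cluster's strongest point and the isolated one.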

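Measuring repeatability:

Many feature detectors have been developed in computer vision.
How do we decide which ones are good?
Schmid, Mohr, and Bauckhage (2000) were the first to propose measuring the repeatability of a feature detector:

How frequently are keypoints detected in the original image also found at the corresponding location in the transformed image? (A keypoint counts as found if it lies within a distance threshold ε of the corresponding location, with ε = 1.5 pixels.)
The experiments applied the following transformations to planar images: rotation, scale changes, illumination changes, viewpoint changes, and added noise.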
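The repeatability measure can be sketched as follows, assuming the transformation between the two images is known and is passed in as a function. The name `repeatability`, the keypoint lists, and the pure-translation transform are hypothetical; this simplified version also ignores whether a point is visible in both images.

```python
import math


def repeatability(keypoints_a, keypoints_b, transform, eps=1.5):
    """Fraction of keypoints from image A that reappear in image B.

    transform maps (x, y) in image A to its location in image B.
    A keypoint counts as repeated if some keypoint in B lies less than
    eps pixels from the transformed location.
    """
    if not keypoints_a:
        return 0.0
    repeated = 0
    for x, y in keypoints_a:
        tx, ty = transform(x, y)
        if any(math.hypot(tx - bx, ty - by) < eps for bx, by in keypoints_b):
            repeated += 1
    return repeated / len(keypoints_a)


def shift(x, y):
    """Hypothetical known transformation: image B is image A shifted by 10 pixels in x."""
    return x + 10, y


kp_a = [(0, 0), (5, 5), (20, 20), (7, 1)]
kp_b = [(10, 0), (15.5, 5.5), (40, 40)]
print(repeatability(kp_a, kp_b, shift))
```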

Scale invariance:

Problem: detecting features at the finest stable scale may not be appropriate.

This is the case when matching images that contain little high-frequency detail.
Example: in images of clouds, fine-scale features may not exist.

Alternative 1) (Brown, Szeliski, and Winder 2005)

Goal: extract features at a variety of scales.

How: build an image pyramid and apply the same operations at multiple resolutions.

This solution is suitable when the scale does not change much between images.

With a fixed focal-length camera:

matching successive aerial images
stitching panoramas
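The pyramid idea can be sketched as below. A 2x2 box average stands in for proper Gaussian smoothing before subsampling, and the names `downsample` and `build_pyramid` are hypothetical; the same detector would then be run on every level.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block (a stand-in for Gaussian smoothing)."""
    height, width = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(width)]
            for y in range(height)]


def build_pyramid(img, levels):
    """Return [img, half-size, quarter-size, ...] with at most `levels` entries."""
    pyramid = [img]
    for _ in range(levels - 1):
        top = pyramid[-1]
        if len(top) < 2 or len(top[0]) < 2:
            break  # cannot halve any further
        pyramid.append(downsample(top))
    return pyramid


# Hypothetical 4x4 test image with values 0..15.
image = [[float(4 * y + x) for x in range(4)] for y in range(4)]
pyramid = build_pyramid(image, 3)
print([(len(level), len(level[0])) for level in pyramid])
```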

Alternative 2) (Lowe 2004; Mikolajczyk and Schmid 2004)

Motivation: in practice, the scale of objects in the image is unknown.
How: instead of Alternative 1, extract features that are stable in both location and scale.
Resulting problem:

Scale selection problem

Scale selection problem

Extrema of the LoG (Laplacian of Gaussian) were first proposed as interest points by Lindeberg (1993; 1998b).
Building on the LoG work, Lowe (2004) proposed using the 3D (space + scale) extrema of the DoG (Difference of Gaussian) as interest points.
Harris corner detector with an added scale selection mechanism (Mikolajczyk and Schmid 2004).
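The 3D (space + scale) extremum test can be sketched as below, assuming the DoG levels have already been computed and all have the same resolution (no octave subsampling). The function name `scale_space_extrema` and the nested-list layout `dog[s][y][x]` are assumptions for this example.

```python
def scale_space_extrema(dog):
    """Return (s, y, x) positions that are strict extrema among their 26 neighbors.

    dog[s][y][x] is a stack of DoG levels of identical size.
    Border samples (first/last level, row, or column) are skipped.
    """
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                center = dog[s][y][x]
                neighbors = [dog[s + ds][y + dy][x + dx]
                             for ds in (-1, 0, 1)
                             for dy in (-1, 0, 1)
                             for dx in (-1, 0, 1)
                             if (ds, dy, dx) != (0, 0, 0)]
                if center > max(neighbors) or center < min(neighbors):
                    found.append((s, y, x))
    return found


# Hypothetical 3-level stack of 3x3 responses with one blob-like peak.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 5.0
print(scale_space_extrema(stack))
```

The level index s of each extremum is what selects the scale of the detected feature.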

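Rotational invariance and orientation estimation

Problem: in addition to scale changes, most image matching and object recognition algorithms need to deal with in-plane image rotation.
Ways to handle this problem

Designing a rotation-invariant descriptor [3]:

Design the descriptor to be invariant to rotation from the start (Schmid and Mohr 1997).
Drawback: poor discriminability. Patches that look different can be mapped to the same descriptor.

Estimating a dominant orientation at the keypoint:

Use the neighborhood of the keypoint to estimate a dominant orientation, then align the patch to a canonical axis.
Orientation estimation method 1)

Figure 4.12 shows the orientation estimation method (Lowe 2004).
Figure 4.12 A dominant orientation estimate can be computed by building a histogram of all the gradient orientations. Gradients with large magnitude are accumulated with a high weight, and gradients with small magnitude with a low weight. The highest peaks are then found. (Lowe 2004) (C) 2004 Springer.

Orientation estimation method 2)

The simplest approach is the signed average of all the gradients around the keypoint.
The averaged gradient is often small, which makes it an unreliable indicator of orientation.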
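Both orientation estimates can be compared with a short sketch. The gradients are assumed to be given as `(gx, gy)` pairs, the 36 bins follow Lowe (2004), and the function names are hypothetical. Details of the full method (Gaussian weighting by distance from the keypoint, peak interpolation) are left out.

```python
import math


def dominant_orientation(gradients, num_bins=36):
    """Method 1: magnitude-weighted orientation histogram.

    Returns the center angle (degrees, in [0, 360)) of the highest bin.
    """
    hist = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    for gx, gy in gradients:
        magnitude = math.hypot(gx, gy)
        angle = math.degrees(math.atan2(gy, gx)) % 360.0
        hist[int(angle // bin_width) % num_bins] += magnitude
    peak = max(range(num_bins), key=lambda b: hist[b])
    return (peak + 0.5) * bin_width


def mean_gradient(gradients):
    """Method 2: signed average of the gradients (can be close to zero)."""
    n = len(gradients)
    return (sum(gx for gx, _ in gradients) / n,
            sum(gy for _, gy in gradients) / n)


# Hypothetical gradients: two strong ones pointing "up", two weak opposing ones.
grads = [(0.0, 2.0), (0.0, 3.0), (1.0, 0.0), (-0.5, 0.0)]
print(dominant_orientation(grads))
# Opposing gradients cancel in the average, which is why method 2 is unreliable.
print(mean_gradient([(1.0, 0.0), (-1.0, 0.0)]))
```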

Affine invariance

Scale and rotation invariance are highly desirable, but for applications such as wide baseline stereo matching (Pritchett and Zisserman 1998; Schaffalitzky and Zisserman 2004) or location recognition (Chum, Philbin, Sivic et al. 2007), full affine invariance is preferred.
Problem: the detector must respond at consistent locations not only under scale and orientation changes but also under affine deformations such as local perspective foreshortening.
Solutions

Alternative 1) Fit an affine coordinate frame through eigenvalue analysis (Lindeberg and Garding 1997; Baumberg 2008; Mikolajczyk and Schmid 2004; Mikolajczyk, Tuytelaars, Schmid et al. 2005; Tuytelaars and Mikolajczyk 2007)

Matrices analyzed: the auto-correlation matrix or the Hessian matrix
Use the principal axes and the ratio of the eigenvalues to define the affine coordinate frame.
Figure 4.14 Affine normalization using the second moment matrices (Mikolajczyk, Tuytelaars, Schmid et al. 2005) (C) 2005 Springer. After the image coordinates are transformed by the matrices A^{-1/2}_{0} (left) and A^{-1/2}_{1} (right), the two patches are related only by a pure rotation R, which can be estimated with a dominant orientation technique.
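The normalizing matrix A^{-1/2} in Figure 4.14 can be computed in closed form for a 2x2 symmetric positive-definite matrix. The sketch below uses the identity sqrt(A) = (A + sqrt(det A) I) / sqrt(trace A + 2 sqrt(det A)); the function name `inverse_sqrt_2x2` and the example matrix are assumptions for illustration.

```python
import math


def inverse_sqrt_2x2(a, b, c):
    """Return (p, q, r) such that [[p, q], [q, r]] = A^{-1/2} for A = [[a, b], [b, c]].

    A must be symmetric positive definite (a*c - b*b > 0 and a + c > 0).
    """
    s = math.sqrt(a * c - b * b)       # sqrt(det A)
    t = math.sqrt(a + c + 2.0 * s)     # sqrt(trace A + 2 sqrt(det A))
    # Matrix square root: sqrt(A) = (A + s I) / t
    ra, rb, rc = (a + s) / t, b / t, (c + s) / t
    # Invert the 2x2 square root.
    det = ra * rc - rb * rb
    return rc / det, -rb / det, ra / det


def normalize_point(n, x, y):
    """Apply the normalizing matrix n = (p, q, r) to the image coordinate (x, y)."""
    p, q, r = n
    return p * x + q * y, q * x + r * y


# Hypothetical second moment matrix of a patch stretched along x.
n = inverse_sqrt_2x2(4.0, 0.0, 1.0)
print(normalize_point(n, 2.0, 1.0))
```

After normalization the second moment matrix becomes the identity, so two normalized patches of the same surface can differ only by a rotation.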

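Alternative 2) Affine invariant region detector

MSER (maximally stable extremal region) detector (Matas, Chum, Urban et al. 2004)

Input: a grayscale image
Procedure:

Threshold the image at every possible gray level to produce binary images.
Compute the connected components of these binary images.

Efficient implementation (proposed by Nister and Stewenius 2008):
sort the pixels by gray value,
raise the threshold incrementally,
and add the pixels that change at each threshold to the connected components.

Observe how the area of each region changes as the threshold changes.
Regions whose rate of area change is small are defined as maximally stable.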
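The stability test can be illustrated with a naive sketch that follows a single region, the connected component containing a chosen seed pixel, across a set of thresholds. This is not the efficient algorithm of Nister and Stewenius; the function names, the seed-based simplification, and the `delta` parameter are assumptions for this example.

```python
from collections import deque


def component_area(img, seed, threshold):
    """Area of the 4-connected region of pixels <= threshold that contains seed."""
    sx, sy = seed
    if img[sy][sx] > threshold:
        return 0
    height, width = len(img), len(img[0])
    seen = {(sx, sy)}
    queue = deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in seen and img[ny][nx] <= threshold):
                seen.add((nx, ny))
                queue.append((nx, ny))
    return len(seen)


def most_stable_threshold(img, seed, levels, delta=1):
    """Threshold at which the seed's region changes area the least.

    Stability is measured as (area[i + delta] - area[i - delta]) / area[i].
    """
    areas = [component_area(img, seed, t) for t in levels]
    best, best_rate = None, float("inf")
    for i in range(delta, len(levels) - delta):
        if areas[i] == 0:
            continue
        rate = (areas[i + delta] - areas[i - delta]) / areas[i]
        if rate < best_rate:
            best, best_rate = levels[i], rate
    return best, areas


# Hypothetical image: a dark 2x2 blob (value 10) on a bright background (value 200).
blob = [[200] * 5 for _ in range(5)]
for y in (1, 2):
    for x in (1, 2):
        blob[y][x] = 10

best_level, areas = most_stable_threshold(blob, (1, 1), [0, 50, 100, 150, 200, 250])
print(best_level, areas)
```

The blob keeps the same area over a wide range of thresholds, so it is reported as stable well before the threshold reaches the background level.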

Maximally stable regions are invariant to affine geometric and photometric [4] transformations.
Figure 4.15 Maximally stable extremal regions (MSERs) extracted and matched (Matas, Chum, Urban et al. 2004) (C) 2004 Elsevier.

[1] Localization: determining where A is located within B.
[2] The partial derivatives are computed either with a [-2, -1, 0, 1, 2] filter (Harris and Stephens 1988) or by convolving the image with derivatives of a Gaussian with standard deviation 1 (Schmid, Mohr, and Bauckhage 2000; Triggs 2004).
[3] A descriptor is a set (vector) of features that best describes a local patch.
[4] Linear bias-gain, smooth monotonic, photon noise, etc.


whitebytes
