API Documentation

class hogpp.IntegralHOGDescriptor(*, n_bins=9, binning='unsigned', cell_size=(8, 8), block_size=(16, 16), block_stride=(8, 8), magnitude='identity', block_norm='l2-hys', clip_norm=0.2, epsilon=1e-12)

Rectangular Histogram of Oriented Gradients (R-HOG) feature descriptor [DT05] implemented in terms of an integral histogram [Por05]. Employing an integral histogram makes it possible to efficiently compute the feature descriptor in overlapping image regions, e.g., in sliding window object detection approaches.

Computing feature descriptors involves two stages:

The representation of a (possibly large) image is precomputed in an initial step using IntegralHOGDescriptor.compute().
After the preprocessing step, feature descriptors of individual image subregions can be repeatedly extracted using a function call on an IntegralHOGDescriptor instance, i.e., using IntegralHOGDescriptor.__call__().

Note

To ensure maximum performance when extracting features, do not compute the feature descriptor on individual image patches of a larger image. Instead, perform the initial computation once on the original image. The feature descriptors of individual patches can then be extracted much more efficiently than with the naive per-patch approach.
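The two-stage idea can be illustrated with a minimal NumPy sketch of an integral histogram. This is a simplified illustration and not hogpp's actual implementation: the function names `integral_histogram` and `region_histogram` are hypothetical, and the sketch assumes that every pixel already carries a single bin index and a vote weight.

```python
import numpy as np


def integral_histogram(bin_index, weight, n_bins):
    """Stage 1: accumulate per-bin votes cumulatively over the whole image."""
    h, w = bin_index.shape
    votes = np.zeros((h, w, n_bins))
    rows, cols = np.indices((h, w))
    # Each pixel votes with its weight into exactly one orientation bin.
    votes[rows, cols, bin_index] = weight
    # A leading zero row and column avoids special cases at the image border.
    table = np.zeros((h + 1, w + 1, n_bins))
    table[1:, 1:] = votes.cumsum(axis=0).cumsum(axis=1)
    return table


def region_histogram(table, top, left, height, width):
    """Stage 2: histogram of a rectangle from four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom, right] - table[top, right]
            - table[bottom, left] + table[top, left])


# Hypothetical input: random bin indices and vote weights for a 32x48 image.
rng = np.random.default_rng(0)
bins = rng.integers(0, 9, size=(32, 48))
mags = rng.random((32, 48))

table = integral_histogram(bins, mags, n_bins=9)
hist = region_histogram(table, top=4, left=10, height=8, width=8)
```

The cost of the second stage does not depend on the size of the region, which is why repeated extraction from overlapping windows is cheap once the table exists. With hogpp, the same pattern corresponds to one compute() call followed by repeated calls on the descriptor instance.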

Parameters:

n_bins (int, optional) – Number of histogram bins.
binning (str, optional) –
Gradient orientation binning method. Possible choices are:

’unsigned’
The orientation bins are evenly spaced over \([0^\circ,180^\circ)\) with the sign of the gradient ignored. Gradient orientations falling into quadrants of the Cartesian plane with negative orientation are mapped to their positive quadrant counterparts.

Given an image gradient \(\vec g = (g_x,g_y)^\top = \left[\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}\right]^\top\), its orientation \(\alpha=\tan^{-1} \frac{g_y}{g_x} \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right)\) within the first and fourth quadrants of the Cartesian plane is computed. Using the mapping \(\angle_u\colon \left[-\frac{\pi}{2}, \frac{\pi}{2}\right) \to [0,\pi)\) given by

\[\angle_u(\alpha) \coloneqq \alpha+\frac{\pi}{2}\]

negative angles are mapped to their corresponding positive counterparts in the second quadrant.

’signed’
The orientation bins are evenly spaced over \([0^\circ,360^\circ)\), i.e., the sign of the gradient in the quadrants of the Cartesian plane is considered.

Given an image gradient \(\vec g = (g_x,g_y)^\top = \left[\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}\right]^\top\), its orientation \(\alpha=\arctan_2 (g_y, g_x) \in [-\pi,\pi)\) across the Cartesian plane is computed. The corresponding mapping \(\angle_s \colon [-\pi,\pi) \to [0,2\pi)\) is then

\[\angle_s(\alpha) \coloneqq \alpha+\pi \enspace .\]
cell_size (tuple (2, ), optional) – The size of a single block cell in pixels.
block_size (tuple (2, ), optional) – The size of a single block in pixels.
block_stride (tuple (2, ), optional) – The shift amount between neighboring blocks in pixels.
magnitude (str, optional) –
Function of the image gradient \(\vec g=(g_x,g_y)^\top\) that computes the value voted into each orientation bin. Possible choices are:

’identity’
Computes the magnitude in terms of the gradient’s \(\ell^2\) norm, i.e., as \(\lVert\vec g\rVert_2\).

’sqrt’
Computes the square root of the magnitude, i.e., \(\sqrt{\lVert\vec g\rVert_2}\).

’square’
Computes the magnitude in terms of a squared \(\ell^2\) norm, i.e., as \(\lVert\vec g\rVert_2^2\).
block_norm (str, optional) –
Contrast normalization applied to individual blocks \(\vec v\). Possible choices are:

’l1-sqrt’
Computes the square root of the \(\ell^1\) normalized block as

\[\vec v \gets \sqrt{\frac{\vec v}{\lVert \vec v \rVert_1 + \epsilon}}\]

’l1’
Normalizes the blocks using the \(\ell^1\) norm as

\[\vec v \gets \frac{\vec v}{\lVert \vec v \rVert_1 + \epsilon}\]

’l1-hys’
Similar to l1 normalization but additionally followed by clipping of values larger than clip_norm.

’l2’
Normalizes the blocks using the \(\ell^2\) norm as

\[\vec v \gets \frac{\vec v}{\sqrt{\lVert \vec v \rVert_2^2 + \epsilon^2}}\]

’l2-hys’
Similar to l2 normalization but additionally followed by clipping of values larger than clip_norm.
clip_norm (float, optional) – Maximum block norm. Applicable only to ‘l1-hys’ and ‘l2-hys’ block normalization.
epsilon (float, optional) – The regularization amount.
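
The binning and magnitude options can be sketched in NumPy as follows. This is an illustration under assumptions, not hogpp's code: the helper names `orientation_bins` and `vote_magnitude` are hypothetical, each gradient is assigned to a single bin (whether hogpp interpolates votes between neighboring bins is not stated here), and arctan2 followed by a modulo is used in place of a plain arctangent so that \(g_x = 0\) needs no special handling.

```python
import numpy as np


def orientation_bins(gx, gy, n_bins=9, binning="unsigned"):
    """Map gradient components to hard-assigned orientation bin indices."""
    alpha = np.arctan2(np.asarray(gy, dtype=float), np.asarray(gx, dtype=float))
    if binning == "unsigned":
        # Folding modulo pi merges opposite gradient directions; for g_x > 0
        # the result equals atan(g_y / g_x) + pi / 2.
        span = np.pi
        angle = np.mod(alpha + np.pi / 2.0, span)
    elif binning == "signed":
        # Shift [-pi, pi] to [0, 2 * pi] and wrap the upper end back to 0.
        span = 2.0 * np.pi
        angle = np.mod(alpha + np.pi, span)
    else:
        raise ValueError(f"unknown binning: {binning!r}")
    index = np.floor(angle / span * n_bins).astype(int)
    # Guard against rounding that could push an angle into bin n_bins.
    return np.minimum(index, n_bins - 1)


def vote_magnitude(gx, gy, magnitude="identity"):
    """Value voted into the orientation bin for a gradient (g_x, g_y)."""
    norm = np.hypot(gx, gy)  # l2 norm of the gradient
    if magnitude == "identity":
        return norm
    if magnitude == "sqrt":
        return np.sqrt(norm)
    if magnitude == "square":
        return norm ** 2
    raise ValueError(f"unknown magnitude: {magnitude!r}")
```

With unsigned binning, the gradients (1, 0) and (-1, 0) fall into the same bin, whereas signed binning keeps them apart.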

num_bins_

The number of histogram bins being used.

Type: int

binning_

Gradient orientation binning method.

Type: str

cell_size_

The size of a single block cell in pixels.

Type: tuple (2, )

block_size_

The size of a single block in pixels.

Type: tuple (2, )

block_stride_

The shift amount between neighboring blocks in pixels.

Type: tuple (2, )

magnitude_

Magnitude function that determines the voted value.

Type: str

block_norm_

Contrast normalization applied to individual blocks.

Type: str

clip_norm_

Maximum block norm. Values above it are clipped to the specified value. Applicable only to l1-hys and l2-hys block normalization.

Type: float

epsilon_

The regularization amount.

Type: float
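
How block_norm_, clip_norm_ and epsilon_ interact can be sketched in NumPy as shown below. The function `normalize_block` is a hypothetical helper, not part of hogpp. The sketch assumes the usual \(\ell^2\) normalization \(\vec v / \sqrt{\lVert \vec v \rVert_2^2 + \epsilon^2}\) and implements the "-hys" variants as normalization followed by clipping only; the renormalization step that Dalal and Triggs apply after clipping is not mentioned above and is therefore left out.

```python
import numpy as np


def normalize_block(v, block_norm="l2-hys", clip_norm=0.2, epsilon=1e-12):
    """Contrast-normalize one block of non-negative histogram values."""
    v = np.asarray(v, dtype=float)
    if block_norm in ("l1", "l1-sqrt", "l1-hys"):
        out = v / (np.abs(v).sum() + epsilon)
        if block_norm == "l1-sqrt":
            # Element-wise square root of the l1-normalized block.
            out = np.sqrt(out)
    elif block_norm in ("l2", "l2-hys"):
        out = v / np.sqrt(np.sum(v ** 2) + epsilon ** 2)
    else:
        raise ValueError(f"unknown block_norm: {block_norm!r}")
    if block_norm.endswith("-hys"):
        # Assumption: only clipping, without a second normalization pass.
        out = np.minimum(out, clip_norm)
    return out


block = np.array([3.0, 4.0])
plain = normalize_block(block, "l2")        # approximately [0.6, 0.8]
clipped = normalize_block(block, "l2-hys")  # both entries limited to 0.2
```

The regularization term only matters for blocks whose norm is close to zero, where it prevents a division by zero.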

__bool__(self)

Determines whether the descriptor was initialized by a previous compute() call.

Returns: True if compute() was previously called and the input was not empty, and False otherwise.
Return type: bool

__call__(self, roi)

Extracts the features of the specified region of interest roi.

Parameters: roi (array_like (4, )) – An array specifying the top-left coordinate and the size of the image region whose feature descriptor will be extracted.
Returns: A 5-D array whose first two dimensions represent the block, the following two dimensions the cell, and the final dimension represents the orientation bins.
Return type: numpy.ndarray
Raises: ValueError – Raised if roi describes a negative area.
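
To get a feeling for the size of the returned 5-D array, the following sketch estimates its shape from the region size and the descriptor parameters. The helper `descriptor_shape` is hypothetical, and the formula is an assumption based on the standard R-HOG layout (only blocks that fit entirely inside the region are counted); hogpp may treat partial blocks differently.

```python
import numpy as np


def descriptor_shape(roi_size, cell_size=(8, 8), block_size=(16, 16),
                     block_stride=(8, 8), n_bins=9):
    """Estimated (blocks, blocks, cells, cells, bins) shape for a region size."""
    if any(extent < 0 for extent in roi_size):
        raise ValueError("roi describes a negative area")
    # Number of block positions along each axis when sliding by block_stride.
    blocks = tuple(
        max(0, (extent - block) // stride + 1)
        for extent, block, stride in zip(roi_size, block_size, block_stride)
    )
    # Number of cells per block along each axis.
    cells = tuple(block // cell for block, cell in zip(block_size, cell_size))
    return blocks + cells + (n_bins,)


# The classic 128x64 detection window with the default parameters.
shape = descriptor_shape((128, 64))
n_features = int(np.prod(shape))
```

Under these assumptions, the default parameters yield a shape of (15, 7, 2, 2, 9), i.e., 3780 values for a 128x64 window.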

compute(image, /, *, mask=None)

Computes the feature descriptor of the specified image.

Parameters:

image (array_like (m, n, (3, ))) – 2-D or 3-D tensor representing the image whose feature descriptor shall be computed.
mask (collections.abc.Callable, array_like (m, n, (3, ))) – A callable that indicates whether the pixel at the coordinate passed to the callable as a tuple is masked or not. Alternatively, the mask can be specified in terms of a tensor with the same rank and dimensions as the specified image.
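
The two ways of specifying a mask carry the same information, which the following NumPy sketch makes explicit by converting a callable into a boolean array. The helper `mask_from_callable` is hypothetical, and two conventions are assumed because they are not stated above: the coordinate tuple is ordered (row, column), and a truthy return value marks the pixel as masked.

```python
import numpy as np


def mask_from_callable(is_masked, shape):
    """Evaluate a per-pixel mask callable on every coordinate of an image."""
    height, width = shape
    mask = np.zeros((height, width), dtype=bool)
    for row in range(height):
        for col in range(width):
            # Assumed convention: the callable receives a (row, column) tuple.
            mask[row, col] = bool(is_masked((row, col)))
    return mask


# Hypothetical mask that marks the left half of a 4x6 image as masked.
left_half = mask_from_callable(lambda coord: coord[1] < 3, (4, 6))
```

A callable is convenient when the mask follows a simple rule, whereas an array is the natural choice when the mask comes from a previous processing step such as a segmentation.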