@Reference(type=Article, author={"Jegou, Herve","Douze, Matthijs","Schmid, Cordelia"}, title="Product Quantization for Nearest Neighbor Search", year="2011", journal="IEEE Trans. Pattern Anal. Mach. Intell.", pages={"117","","128"}, url="http://dx.doi.org/10.1109/TPAMI.2010.57", month="January", number="1", publisher="IEEE Computer Society", volume="33", customData={"issn","01628828","numpages","12","doi","10.1109/TPAMI.2010.57","acmid","1916695","address","Washington, DC, USA","keywords","Highdimensional indexing, Highdimensional indexing, image indexing, very large databases, approximate search., approximate search., image indexing, very large databases"}) public class LongProductQuantiser extends Object
This is achieved by breaking down the input vectors into nonoverlapping subvectors, and applying quantisation to these subvectors individually. The number of bins (cluster centroids) for the subvectors is small (up to 256 in this implementation), but when combined over all subvectors, the number of bins is much larger as it accounts for all combinations of bins across subvectors. As only a small set of centroids needs to be held for the subvectors, the memory requirements are quite modest. The output of the quantisation action in this implementation is an array of bytes corresponding to the index of the matching centroid for each subvector (index numbers are offset by 128 so that 256 centroids indexes can fit in a single byte). The bitpattern of this byte array could be interpreted as a numeric value of global cluster index, however in practice this is not useful.
Typically the product quantiser is "trained" so that it adapts to the data
that is is being applied too. The standard approach to this is to use
KMeans, however, this is not required. Insofar as this implementation is
concerned, any set of compatible NearestNeighbours
implementations
can be provided to the constructor. Each of the NearestNeighbours
could even potentially have a different number of dimensions (corresponding
to the subvector lengths).
In the standard case, where you just want to use KMeans to train the Product Quantiser, a set of utility methods can be found in the org.openimaj.knn.pq.LongProductQuantiserUtilities class which can be found in the clustering subproject (due to the dependence on the KMeans algorithm).
Modifier and Type  Field and Description 

protected LongNearestNeighboursExact[] 
assigners 
protected int 
ndims 
Constructor and Description 

LongProductQuantiser(LongNearestNeighboursExact[] assigners)
Construct a
LongProductQuantiser with the given
nearestneighbour assigners. 
Modifier and Type  Method and Description 

long[] 
decompress(byte[] qdata)
Decompress the quantised data by replacing each encoded index with the actual centroid subvector.

byte[] 
quantise(long[] data)
Quantise the given data using this Product Quantiser.

protected LongNearestNeighboursExact[] assigners
protected int ndims
public LongProductQuantiser(LongNearestNeighboursExact[] assigners)
LongProductQuantiser
with the given
nearestneighbour assigners. The number of dimensions of the assigners
determines how long each subvector is. There is a onetoone mapping
between in the order of assigners and subvectors.assigners
 the nearestneighbour assigners.public byte[] quantise(long[] data)
data
 the data to quantisepublic long[] decompress(byte[] qdata)
qdata
 the quantised data