Fast Multidimensional Scaling Through Hybrid Nonlinear Algorithms

 
 

In exploratory data analysis, it is often desirable to create a two-dimensional layout of a multivariate data set. Commonly created so that inter-object positioning reflects high-dimensional relationships, such a layout can offer insight into a data set's inherent structure. Linear projection techniques can offer fast solutions, but create layouts that are constrained to be a linear combination of input dimensions and will often fail to identify complex patterns. Iterative nonlinear techniques can create layouts that more effectively reveal data structure, but are commonly burdened by O(N3) computational complexity and high run times, making them impractical for use within an interactive framework.


Several novel nonlinear layout algorithms are proposed to solve this problem. It is shown that, through a combination of techniques, multi-stage `hybrid' algorithms can be created that offer significant reductions in complexity and run times, while still producing layouts that effectively represent high-dimensional relationships. This thesis documents improvements that reduce complexity first to O(N√N), then to O(N5/4).


Algorithms are evaluated in terms of comparisons to a benchmark nonlinear technique, on synthetic test data of known structure and several real-world sets. A comparison is also offered between the layouts generated by the novel algorithm and a linear technique, to empirically highlight the benefits offered by a nonlinear approach. It is concluded that nonlinear algorithms have a worthwhile place in an exploratory analyst's toolkit. The algorithms making up the contribution of this thesis provide substantial increases in the speed with which nonlinear algorithms can be run, enlarging the size of data set to which such techniques can be applied.

Alistair Morrison, Ph.D. thesis