Individual word frequencies (fi) and joint frequencies (fij) for pairs of words (i, j), both expressed in terms of the chosen unit of context, together with the corresponding standardised joint frequencies sij = fij / (fi + fj - fij), are organised in a similarities matrix. This matrix can be submitted to a combination of cluster analysis and multi-dimensional scaling to discover significant word associations. (Instead of the Jaccard coefficient given above, it is possible to apply Sokal's 'matching coefficient', which also takes account of joint non-occurrences.) Word co-occurrences within specified context units can also be submitted to correspondence analysis, providing further information about usage within a text.
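As an illustration only, the following Python sketch computes the quantities described above from a small set of context units. It is not HAMLET II's own code: the function names, the toy data, and the choice to count a word at most once per context unit are assumptions made for the example. The matching coefficient is taken here to be the simple matching form, (joint occurrences + joint non-occurrences) / number of units.

```python
from itertools import combinations


def cooccurrence_counts(units, vocabulary):
    """Return (f, fij): per-word and per-pair counts over context units.

    Assumption: a word is counted at most once per context unit.
    """
    vocab = sorted(set(vocabulary))
    f = {w: 0 for w in vocab}
    fij = {pair: 0 for pair in combinations(vocab, 2)}
    for unit in units:
        present = sorted(set(unit) & set(vocab))
        for w in present:
            f[w] += 1
        for pair in combinations(present, 2):
            fij[pair] += 1
    return f, fij


def jaccard(f_i, f_j, f_ij):
    """Standardised joint frequency s_ij = f_ij / (f_i + f_j - f_ij)."""
    denom = f_i + f_j - f_ij
    return f_ij / denom if denom else 0.0


def simple_matching(f_i, f_j, f_ij, n_units):
    """Matching coefficient: also credits units containing neither word."""
    neither = n_units - (f_i + f_j - f_ij)
    return (f_ij + neither) / n_units


def similarity_matrix(units, vocabulary):
    """Build a symmetric Jaccard similarity matrix as a list of lists."""
    vocab = sorted(set(vocabulary))
    f, fij = cooccurrence_counts(units, vocab)
    size = len(vocab)
    matrix = [[1.0] * size for _ in range(size)]
    for a, b in combinations(range(size), 2):
        s = jaccard(f[vocab[a]], f[vocab[b]], fij[(vocab[a], vocab[b])])
        matrix[a][b] = matrix[b][a] = s
    return vocab, matrix


# Hypothetical toy data: each inner list is one context unit (e.g. a sentence).
units = [
    ["king", "queen", "crown"],
    ["king", "crown"],
    ["queen", "poison"],
    ["sword", "poison"],
]
vocab, sims = similarity_matrix(units, ["king", "queen", "crown", "poison"])
```

In this toy example 'king' and 'crown' always occur together, so their Jaccard coefficient is 1, while 'crown' and 'poison' never co-occur and score 0. A matrix of this kind is what would then be passed to cluster analysis or multi-dimensional scaling.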
It then becomes possible to compare the results of applying
multi-dimensional scaling to matrices of joint frequencies of
equivalent vocabulary lists derived from a number of texts, using
Procrustean Individual Differences
Scaling (PINDIS), or to apply Individual
Differences Scaling (INDSCAL) to the matrices themselves.
Forrest Young's SUBJSTAT procedure, which transforms the resulting non-Euclidean 'subject spaces'
into arc-distances, permits more rigorous analysis of the results.
The unique graphics of HAMLET II© summarise the results of each of these analyses, for inclusion in other documents and reports.
Further procedures help to determine the broad characteristics
of word usage in a text:
Full documentation for the new generation of HAMLET II
is available in the download section.
To run HAMLET II for Microsoft Windows under WINE on free Debian GNU/Linux, consult our recent
documentation, 'Hamlet II on Debian GNU/LINUX'.