Identify Clusters
offers a means of provisionally identifying nodal words in a given text, according to the density of their co-occurrences within a selected unit of context. The result provides the basis for a provisional vocabulary list for use with Hamlet II - Joint Frequencies. For this process to work convincingly, it is generally advisable first to apply a suitable stoplist to the text file, so that commonly occurring words which are unlikely to be significant in determining its main content are ignored. The procedure automatically disregards numerals, treats upper- and lower-case initial letters as equivalent, and offers optional manual lemmatization, to reduce the number of entries to be considered in searching for collocations.
Words additionally disregarded during an application of this routine can also be saved separately, or optionally added to an existing stoplist, so that a few successive applications will quickly develop general stoplists for use with specific languages and bodies of text.
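
The general idea can be illustrated with a minimal Python sketch. This is not Hamlet II's own algorithm: the function names, the sample sentences, and the definition of "density" used here (the total number of co-occurrences a word takes part in, counted once per context unit) are assumptions made purely for illustration.

```python
import re
from collections import Counter
from itertools import combinations

def tokenize(unit, stoplist):
    """Return the distinct words of one context unit, lower-cased,
    with numerals and stoplisted words removed."""
    # [^\W\d_]+ matches runs of letters only, so "1601" is skipped.
    words = re.findall(r"[^\W\d_]+", unit.lower())
    return {w for w in words if w not in stoplist}

def nodal_candidates(units, stoplist, top_n=5):
    """Rank words by the number of co-occurrences they take part in.

    'Density' here is an assumed measure, not Hamlet II's definition."""
    pair_counts = Counter()
    for unit in units:
        types = sorted(tokenize(unit, stoplist))
        # Each unordered pair of distinct words counts once per unit.
        pair_counts.update(combinations(types, 2))
    density = Counter()
    for (a, b), n in pair_counts.items():
        density[a] += n
        density[b] += n
    return density.most_common(top_n), pair_counts

# Hypothetical context units (here, sentences) and stoplist.
units = [
    "The king spoke to the queen in 1601.",
    "The Queen feared the king and the ghost.",
    "A ghost appeared to the king.",
]
stoplist = {"the", "to", "in", "and", "a"}

ranked, pairs = nodal_candidates(units, stoplist, top_n=3)
print(ranked)
```

In this toy example "king" ranks first because it co-occurs with other retained words in all three units. Without the stoplist, "the" would dominate the ranking, which is why applying a stoplist first is advisable.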

If the text to be read is in a language other than English, use the pull-down menu to apply the correct lexicographic conventions. Stoplists can be selected and edited from the corresponding menu item. Alternatively, use the full vocabulary list editor to maintain your stoplists in detail.

A log file is displayed periodically, providing details of the procedures followed and ending with a list of the nodal items provisionally identified. These can be edited as required, viewed by cluster analysis and/or plotted using MINISSA, and finally saved to form the basis of a provisional vocabulary list for use in Hamlet II - Joint Frequencies and other procedures.
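
As a rough illustration of how a list of nodal items might feed a cluster analysis or a scaling plot, the sketch below builds pairwise similarities from joint occurrences across context units. The choice of the Jaccard coefficient, the function name, and the sample data are assumptions for illustration only; the source does not state which measure the program applies at this stage.

```python
from itertools import combinations

def jaccard_similarities(units, vocabulary):
    """Return {(a, b): Jaccard coefficient} for each pair of vocabulary words.

    The coefficient is the number of units containing both words divided
    by the number of units containing at least one of them."""
    # Record, for each word, the indices of the units in which it occurs.
    occurs = {w: set() for w in vocabulary}
    for i, unit in enumerate(units):
        present = set(unit)
        for w in vocabulary:
            if w in present:
                occurs[w].add(i)
    sims = {}
    for a, b in combinations(sorted(vocabulary), 2):
        union = occurs[a] | occurs[b]
        sims[(a, b)] = len(occurs[a] & occurs[b]) / len(union) if union else 0.0
    return sims

# Hypothetical context units, already tokenized and filtered.
units = [
    ["king", "queen", "spoke"],
    ["feared", "ghost", "king", "queen"],
    ["appeared", "ghost", "king"],
]
sims = jaccard_similarities(units, ["king", "queen", "ghost"])
for pair, value in sims.items():
    print(pair, round(value, 3))
```

A matrix of this kind is the usual input to hierarchical clustering or non-metric multidimensional scaling, where words that frequently share a context unit end up close together.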