Note 1: This bootstrapping documentation is now also appended to README.DOC
Note 2: (Oct, 2000) These programs have not yet been upgraded to include Comp2 weighting.

                  Hyball's BOOTSTRAP-SUPPLEMENT programs.

                              July, 1999

       Hyball's extended family of factoring routines now includes two
routines, HYBOOT and BOOTSUMM, for appraising the sampling noise in Hyball
solutions.  Both ascertain the central tendency and variation in results when
factoring many bootstrap samples from the same rawdata file.  But they conduct
this inquiry in rather different ways.  BOOTSUMM (named by contraction of
"bootstrap summary") provides the finer control of factor rotation.  However
it is labor-intensive, making only a limited number of bootstrap repetitions
practical.  In contrast, at the cost of slightly reduced rotation flexibility
(which should seldom matter), HYBOOT summarizes results from arbitrarily many
bootstrap repetitions while requiring scarcely any user effort or storage space.
HYBOOT may take considerable computer time if you want many repetitions, but
you can interrupt its run whenever you wish and resume this later when your
computer has nothing better to do.


PRELIMINARIES.

       You are presumably familiar with the three or advisedly four main stages
of Hyball factor analysis, starting with an ASCII datafile--call it DATA.RAW--
that contains scores for each of some number NS of subjects on the same array
of variables.  (1) Stage 1, which is optional but strongly recommended, is
transcription of DATA.RAW by program HYDATA into a Hydata-standard ASCII datafile
containing the same scores reformatted and possibly rescaled to fit the READ
presumptions of other Hyball programs that operate on raw data.  Call this
transcribed datafile DATA.D1.  (All Hydata-standard datafiles have names whose
extensions are "D" followed by a numerical index.)  (2) Secondly, HYDATA is
applied to DATA.D1 (or to DATA.RAW) to compute the standardized covariances
(correlations) among some or all of these variables over all NS subjects.  These
correlations are recorded in a file, say DATA1.COV, whose basename ends with a
numerical index and is always followed by extension "COV".  (3) Next, DATA1.COV
is factored by program MODA for an extraction pattern of the data variables on
however many factors you choose.  (4) Finally, this extraction pattern is loaded
into HYBALL for rotation of factor axes to a positioning you judge best.

       Hyball's bootstrap appraisals also follow this basic computation sequence
with some modifications, of which the production of data covariances is the
most fundamental:  Unlike normal computation of covariances from DATA.RAW or
DATA.D1, which uses each of its NS subjects exactly once without differential
weighting, Hyball's bootstrap covariances are computed from this datafile for
the same count NS of subjects drawn from it RANDOMLY WITH REPLACEMENT.  (If
interest ever warrants, the size of these bootstrap samples can easily be made
a user's-choice parameter.)  Differences among the covariance arrays produced
by repetitions of this procedure estimate the sampling noise in the data
covariances in fact obtained by this study, and allow you to study the
consequences of that uncertainty for analyses performed upon them.  More
specifically, if you compute bootstrap covariances repeatedly from these data,
and factor each bootstrap covariance matrix in the same manner you choose for
normal covariances from this datafile, you thereby obtain the approximate
sampling distribution underlying your preferred normal factor solution from
these data.  In particular, you can compute the mean, standard deviation, and
other moment information about the distribution over bootstrap repetitions of
each rotated pattern coefficient and factor correlation while also comparing
these to your favored solution from the normal data covariances.
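
       As a rough illustration of the resampling step just described, the
Python sketch below draws NS subjects at random with replacement from a score
matrix and computes correlations for the resample.  It is not Hyball code: the
function names are hypothetical, scores are held in memory as a list of rows
rather than read from a D-file, and missing-data treatment is ignored.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlations among the columns of `rows` (one row per subject)."""
    n = len(rows)
    nv = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(nv)]
    # Centered cross-products; the common 1/n factor cancels in each correlation.
    cp = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
           for j in range(nv)] for i in range(nv)]
    return [[cp[i][j] / math.sqrt(cp[i][i] * cp[j][j]) for j in range(nv)]
            for i in range(nv)]


def bootstrap_correlations(rows, rng):
    """Correlations for NS subjects drawn at random WITH replacement from the
    NS rows of `rows`, so some subjects appear several times and others not
    at all.  (A column with zero variance in the resample would raise
    ZeroDivisionError.)"""
    ns = len(rows)
    sample = [rows[rng.randrange(ns)] for _ in range(ns)]
    return correlation_matrix(sample)


if __name__ == "__main__":
    rng = random.Random(1999)
    # Made-up scores: 50 subjects on 3 variables, purely for illustration.
    data = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
    normal_r = correlation_matrix(data)[0][1]
    boot_r = [bootstrap_correlations(data, rng)[0][1] for _ in range(500)]
    mean = sum(boot_r) / len(boot_r)
    sd = math.sqrt(sum((r - mean) ** 2 for r in boot_r) / (len(boot_r) - 1))
    print(f"r12 = {normal_r:.3f}, bootstrap SD of r12 = {sd:.3f}")
```

The spread of one coefficient over many such repetitions is the kind of
sampling-noise estimate discussed above; the __main__ block computes it for a
single correlation on made-up scores.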

       This bootstrap procedure is entirely straightforward in principle, but
its technical implementation requires concern for (a) the many procedure options
that intervene between computation of data covariances and the terminally rotated
factors taken from them, and (b) proper alignment of the factor solutions to be
compared.  Concern (a) acknowledges the obvious point that variation in choices
such as the number of factors extracted, the method of extraction (MODA currently
offers five alternatives), and HYBALL's many options for producing and selecting
from a diverse array of rotated solutions, can make a considerable difference to
the final result.  And concern (b) recognizes that two factor solutions cannot be
meaningfully compared until their pattern columns have been permuted and
reflected to optimize similarity.  BOOTSUMM and HYBOOT differ mainly in how
they deal with these practicalities:

BOOTSUMM.

       Whenever HYDATA computes and records a normal COV-file, say DATA1.COV,
it now inquires whether the user would also like some bootstrap covariance
productions from the input datafile using the same selection of variables
(usually all) and the same treatment of missing data used for DATA1.COV.
If this option is accepted, HYDATA then asks for the number (up to 156) of
bootstrap COV-files wanted and writes these under the same name as DATA1.COV
except for insertion, before the digit ending its basename, of one of the six
bootstrap flags "(", ")", "[", "]", "{", or "}" followed by a sequential
alphabetic index A through Z (6 x 26 = 156 possible names).  Thus the
bootstrap COV-files produced to accompany DATA1.COV will be successively
named DATA(A1.COV, DATA(B1.COV, ..., DATA(Z1.COV, DATA)A1.COV, ...,
DATA}Z1.COV.  (Unlike HYDATA's regular COV-files, which are written in
ASCII, these bootstrap COV-files are binary.  If you find that you would
prefer ASCII versions, please tell me.)  Also, the real covariances in
DATA1.COV are written to a master bootsource DATA(-1.COV.  The user can then
factor all these files by MODA/HYBALL under more or less the same procedure
options used to factor DATA1.COV, thereby collecting HYBALL output files that
are written in the same ASCII format as HYBALL's FAC-file outputs but under
names starting with bootstrap flag "(", ")", "[", "]", "{", or "}".
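
       The naming rule above can be checked with a short Python sketch.  The
function names are hypothetical, and the sketch assumes the two-character tag
goes before the single final digit of the basename, as in the DATA1.COV
example; how multi-digit indices are handled is not stated here.

```python
import string

BOOT_FLAGS = "()[]{}"  # the six bootstrap flags, in the order given above


def bootstrap_cov_names(cov_name, count):
    """Names for `count` (<= 156) bootstrap COV-files accompanying `cov_name`,
    e.g. "DATA1.COV" -> DATA(A1.COV, DATA(B1.COV, ..., DATA}Z1.COV."""
    base, ext = cov_name.rsplit(".", 1)
    if not base[-1].isdigit():
        raise ValueError("basename must end with a digit")
    if not 0 <= count <= len(BOOT_FLAGS) * 26:
        raise ValueError("at most 156 bootstrap files")
    stem, digit = base[:-1], base[-1]
    names = []
    for flag in BOOT_FLAGS:
        for letter in string.ascii_uppercase:
            if len(names) == count:
                return names
            names.append(f"{stem}{flag}{letter}{digit}.{ext}")
    return names


def master_bootsource_name(cov_name):
    """The master bootsource, e.g. "DATA1.COV" -> "DATA(-1.COV"."""
    base, ext = cov_name.rsplit(".", 1)
    return f"{base[:-1]}(-{base[-1]}.{ext}"
```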

       When BOOTSUMM is run in a subdirectory containing such a collection of
bootstrap factor solutions including a master derived from the collection's
master bootsource, it first lists to screen the names of all local files with
a leading bootstrap flag and, if more than one master solution is present,
asks you to pick the one having the same origin as the bootstrap results
you want to summarize.  BOOTSUMM then (1) permutes/reflects the axes in
each bootstrap solution that matches the chosen master in origin into best
alignment with the master solution, (2) computes the mean and standard
deviation of each (aligned) pattern loading and factor correlation over the
matching bootstrap solutions, and (3) writes this information to an ASCII
file named SEEBOOT.
(Higher-moment summary statistics may be added to BOOTSUMM's output at a
later date if interest warrants.)
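
       Step (2) amounts to an element-by-element mean and standard deviation
over the aligned solutions.  A minimal Python sketch follows, assuming each
solution is already aligned and held as a list of rows; the function name is
hypothetical, and since this document does not say whether BOOTSUMM divides
by n or by n - 1, the n - 1 divisor used here is an assumption.

```python
import math


def summarize(solutions):
    """Element-wise mean and standard deviation over a list of aligned
    matrices (pattern loadings or factor correlations), n - 1 divisor."""
    n = len(solutions)
    if n < 2:
        raise ValueError("need at least two bootstrap solutions")
    rows, cols = len(solutions[0]), len(solutions[0][0])
    mean = [[sum(s[i][j] for s in solutions) / n for j in range(cols)]
            for i in range(rows)]
    sd = [[math.sqrt(sum((s[i][j] - mean[i][j]) ** 2 for s in solutions)
                     / (n - 1))
           for j in range(cols)] for i in range(rows)]
    return mean, sd
```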

       Advantages of BOOTSUMM:  Given commitment to a particular selection of
variables to be factored and choice of NF, the user is free to develop each
bootstrap factor solution by whatever interactive parameter adjustments and
intuitive quality judgments (notably, after HYLOG study of the solutions in
HYBUF store) would be exercised were these bootstrap covariances the real thing.
And since full records of each bootstrap solution's production remain on disk
until you choose to delete them, a solution that deviates interestingly from
the norm can be analyzed in detail for how this deviancy came about.  Also,
HYBALL bootstrap solutions can be passed to HYFAC for appraisal of the sampling
noise in your preferred derivation of item weights for estimating these factors.
Finally, BOOTSUMM can appraise the sampling noise in factor solutions on which
HYBLOCK has imposed rotation constraints, which is not feasible for HYBOOT.

       Disadvantages of BOOTSUMM.  Producing a decent collection of bootstrap
factor solutions will generally be a great deal of work.  You will seldom be
willing to persist at this for more than a small number of repetitions, though
ten or twenty may be enough to yield all the bootstrap sampling information
you really need.  Also, the files generated by many bootstrap factor solutions
collected for BOOTSUMM summary will occupy considerable disk space.  This
should be no real problem for a modern PC:  Even if you save everything from
50 bootstrap factorings of 150 variables, the total accumulation won't run
over ten megabytes.  But you still have to think about space management when
doing a BOOTSUMM study.


HYBOOT.

       Whenever in the course of normal Hyball factoring you find a rotated
pattern that interests you enough to prompt concern for its sampling
uncertainty, you can generally initiate HYBOOT assessment of this by calling
one of HYBALL's Main Menu options.  This is because the menu almost always
includes an option to write the currently active factor pattern to the
BOOTDATA startup file needed to guide HYBOOT study of the pattern it specifies
as the bootstrap "target".  This Main Menu option (No. 12) is unavailable only
if the rawdata file from which the currently active pattern originates is not
Hydata-standard, if HYBALL has been unable to retrieve the variables' names,
or if the currently active pattern is under more rotation constraints than an
X-set stipulated during factor extraction.  In addition to its selected target
pattern, the binary BOOTDATA file so written includes the name of the D-file
which HYBOOT is to sample, which of its variables are in the target pattern,
the treatment of missing scores and factor-extraction method preceding the
target pattern, and all rotation controls stored with the target pattern
in the HYBUF archive.  (These include all controls in force at the end of this pattern's
production, although that does not fully identify the target pattern's
derivational history.)  You don't have to remember any of this information
when writing your choice of target pattern to BOOTDATA:  After loading
whichever one you want from store (Main Menu Option 6), simply enter "12"
at the Main Menu query and you're done.

       Once the startup BOOTDATA has been set, all that remains to launch
the bootstrap study it controls is to enter "HYBOOT" at the DOS prompt in a
directory containing both this BOOTDATA and the D-file it instructs HYBOOT to
sample.  Once started, HYBOOT requests your preference on a couple of control
options and thereafter runs through the full bootstrap production from data
sampling to factor rotation with no user involvement except occasional
decisions to stop or continue.  Raw summary results are accumulated in a
storage bin appended to BOOTDATA whose modest size is unaffected by the
number of repetitions.  This bin is updated after each bootstrap repetition,
preserves the accumulated results if the program is interrupted before final
summaries are printed from it, and permits break/resumption of the run to
continue indefinitely so long as BOOTDATA is not erased.  The program can be
stopped anytime without harm to BOOTDATA by hitting Ctrl-C; but it also pauses
periodically to advise how many repetitions are in hand together with average
repetition time, and prompts the user either to print results or to declare
how many more repetitions are wanted before the next programmed pause.
Results are written to ASCII file SEEBOOT, which can be inspected at any
pause without precluding resumption of the run.
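
       One way a bin of fixed size can hold everything needed for later
summaries is to keep, for each solution element, only the repetition count
and running sums of the element's first four powers.  The Python sketch below
does that and recovers the mean, standard deviation, skew, and kurtosis (the
four statistics SEEBOOT reports) from those sums.  It is an assumption for
illustration, not a description of BOOTDATA's actual layout, which is binary;
JSON text stands in for it here, and the class and method names are
hypothetical.

```python
import json
import math


class BootBin:
    """Fixed-size accumulation bin: for each solution element it keeps only
    the repetition count and the power sums of x, x^2, x^3, x^4, so its size
    is unaffected by how many repetitions are folded in."""

    def __init__(self, size):
        self.n = 0
        # sums[k][j] holds the running sum of x_j ** (k + 1), k = 0..3.
        self.sums = [[0.0] * size for _ in range(4)]

    def add(self, values):
        """Fold one aligned bootstrap repetition (flat list of elements) in."""
        self.n += 1
        for j, x in enumerate(values):
            for k in range(4):
                self.sums[k][j] += x ** (k + 1)

    def to_json(self):
        """Serialize the bin, e.g. for writing to disk after each repetition."""
        return json.dumps({"n": self.n, "sums": self.sums})

    @classmethod
    def from_json(cls, text):
        """Rebuild a bin from to_json() output so an interrupted run resumes."""
        d = json.loads(text)
        b = cls(len(d["sums"][0]))
        b.n, b.sums = d["n"], d["sums"]
        return b

    def summary(self, j):
        """Mean, SD, skew, kurtosis of element j (population moments;
        kurtosis is not excess, so a normal distribution gives 3)."""
        n = self.n
        m1, m2, m3, m4 = (self.sums[k][j] / n for k in range(4))
        # Central moments recovered from the raw moments.
        mu2 = max(m2 - m1 ** 2, 0.0)
        mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
        mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
        sd = math.sqrt(mu2)
        if sd == 0.0:
            return m1, sd, float("nan"), float("nan")
        return m1, sd, mu3 / sd ** 3, mu4 / mu2 ** 2
```

Raw power sums lose precision when an element's spread is tiny relative to
its mean; an incremental central-moment update (Welford-style, extended to
higher moments) would be numerically steadier for slightly more bookkeeping.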


       When HYBOOT is running, the screen scrolls messages on the current
repetition's state of progress.  This information is mainly for reassurance
that matters are progressing as they should and will pass by too rapidly for
more than fleeting impressions.  But if wanted, these progress reports can
be captured in flight by agile use of the PAUSE key.

       With one small group of exceptions, HYBOOT's bootstrap repetitions
are fully controlled by the production parameters, recorded in BOOTDATA, by
which the target pattern was generated.  The exceptions center on HYBOOT's
restriction to rotation by Spin search, regardless of whether the target
pattern was also found by Spin.  First, you need to set the thoroughness of
HYBOOT's Spin search by choosing two parameters, MAXTRY and NUFF.
(Details on these parameters are given both in HYBALL's documentation and on
screen during HYBOOT start-up.)  Each programmed pause during a HYBOOT run
allows revision of MAXTRY/NUFF in light of the reported mean repetition time.
And secondly, you get to choose whether the pattern selected from each Spin
series for bootstrap accumulation is (a) the one having the highest rating
under the current parameterization of HYBALL's pattern-quality measure, or
(b) the pattern that recurs most frequently under Spin search with these
rotation-control settings.  If (b) is elected, a parameter controlling how
finely pattern differences are discriminated (GAP) must also be chosen.
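
       The difference between options (a) and (b) can be sketched in Python as
follows.  How HYBALL actually measures recurrence is not described here, so
the reading of GAP below (two patterns count as the same when no coefficient
differs by more than GAP) is a hypothetical stand-in, as are the function
names; the patterns in a Spin series are assumed already column-aligned.

```python
def max_abs_diff(p, q):
    """Largest absolute coefficient difference between two patterns."""
    return max(abs(x - y) for x, y in zip(p, q))


def pick_pattern(trials, mode, gap=None):
    """Choose one pattern from a Spin series for bootstrap accumulation.

    trials: list of (rating, pattern) pairs; each pattern is a flat list of
            coefficients, assumed already column-aligned with the others.
    mode "best":     option (a), the highest-rated pattern.
    mode "frequent": option (b), the most recurrent pattern, where two
                     patterns count as the same if no coefficient differs
                     by more than `gap` (hypothetical reading of GAP).
    """
    if mode == "best":
        return max(trials, key=lambda t: t[0])[1]
    if mode != "frequent" or gap is None:
        raise ValueError("mode must be 'best', or 'frequent' with a gap")
    # Greedy grouping; each cluster is [representative, count, (rating, pattern)]
    # where the last entry is the highest-rated member seen so far.
    clusters = []
    for rating, pat in trials:
        for c in clusters:
            if max_abs_diff(c[0], pat) <= gap:
                c[1] += 1
                if rating > c[2][0]:
                    c[2] = (rating, pat)
                break
        else:
            clusters.append([pat, 1, (rating, pat)])
    # The largest cluster wins (first one on ties); report its best member.
    return max(clusters, key=lambda c: c[1])[2][1]
```

A coarser gap merges more patterns into one recurrent group, so the choice of
GAP directly shapes what option (b) reports.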

       The target pattern's only direct role in HYBOOT's production of its
bootstrap repetitions is to serve as the template for pattern alignment.
That is, the pattern columns of each bootstrap solution are permuted and
reflected into closest match with the target pattern before it is added to
the running accumulation of results.  But the target pattern is also salient
interpretively, in that one of the summary tables in results file SEEBOOT gives
the differences between elements of the mean bootstrap solution (coefficients,
communalities, and factor correlations) and the corresponding target elements.
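
       The permute-and-reflect step can be illustrated by brute force for a
modest number of factors NF: try every column order, score each by the summed
absolute Tucker congruence between matched columns, and reflect any column
whose congruence is negative.  This criterion is an assumption, since this
document does not say which similarity measure HYBOOT or BOOTSUMM optimizes,
and the function names are hypothetical.  The same reordering and sign changes
must also be applied to the factor correlations.

```python
import itertools
import math


def congruence(a, b):
    """Tucker's congruence coefficient between two loading columns."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den


def align_to_target(pattern, target):
    """Permute and reflect the columns of `pattern` (rows = variables,
    columns = factors) into closest match with `target`.  Returns the
    aligned pattern plus the chosen column order and signs."""
    nf = len(target[0])
    pcols = [[row[k] for row in pattern] for k in range(nf)]
    tcols = [[row[k] for row in target] for k in range(nf)]
    best = None
    for perm in itertools.permutations(range(nf)):
        # Pair pattern column perm[k] with target column k.
        cs = [congruence(pcols[perm[k]], tcols[k]) for k in range(nf)]
        score = sum(abs(c) for c in cs)
        if best is None or score > best[0]:
            best = (score, perm, [1 if c >= 0 else -1 for c in cs])
    _, perm, signs = best
    aligned = [[signs[k] * row[perm[k]] for k in range(nf)] for row in pattern]
    return aligned, perm, signs


def align_factor_correlations(phi, perm, signs):
    """Apply the same permutation/reflection to a factor correlation matrix."""
    nf = len(perm)
    return [[signs[i] * signs[j] * phi[perm[i]][perm[j]] for j in range(nf)]
            for i in range(nf)]
```

Trying all NF! orders grows quickly (3,628,800 for ten factors); because the
score is a sum over matched column pairs, an assignment algorithm such as the
Hungarian method would find the same best order far more cheaply.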


       Advantages of HYBOOT.  Scarcely any pre-planning or running effort
is required for its effective utilization.  You need only remember to start
Hyball's normal factoring with Hydata-standard datafiles and, when HYBOOT
is running, occasionally to instruct it to print results or continue with
more repetitions.  Moreover, the only limit on the size of HYBOOT's sampling
collection is the computer time you can spare for this job.  And if you
save the full array of intermediate and production files leading to the
target pattern (notably its COV-file, its MODA extraction pattern, and the
HYBUF rotation archive containing the target pattern, all of which you are
strongly advised to store on a floppy disk dedicated to this solution if you
intend to interpret it), storing the bootstrap accumulation in BOOTDATA with
them will require only modestly more space.
You should, however, modify the archival name of this BOOTDATA and the
SEEBOOT you take from it to distinguish them from other BOOTDATA and
SEEBOOT files that you want to save.

       Disadvantages of HYBOOT.  The bootstrap rotations collected by HYBOOT
are produced exactly like the target pattern only if that was a Spin solution
picked either as best by criterion or most commonly recurrent at the same
GAP grain picked for the HYBOOT run.  But even when the target pattern's
production is not by Spin or has been chosen from the Spin collection by
consideration, say, of substantive interpretability, it is hard to think of
circumstances under which this production difference would significantly
degrade the bootstrap summary as an estimate of sampling noise in the target
pattern.

   Note 1.  At present, SEEBOOT reports only nonrelational parameters of
            each solution element's bootstrap distribution, namely, mean,
            standard deviation, skew, and kurtosis.  But expansion to include
            mixed-moment information will be undertaken if interest warrants.

   Note 2.  In case you wish to delete results already accumulated in your
            current BOOTDATA's collection bin and start anew without calling
            HYBALL to set BOOTDATA again, simply call utility program FIXBOOT
            whose executable code is bundled with that of BOOTSUPP and HYBOOT.
            (Your operating system must, of course, know where to find that.)


