                                                                     April 2003
                             SIMDATA.TXT

  ----------------------------------------------------------------------------
   This document is primarily the manual for creating artificial data with a  
   stipulated causal-path structure, imposed by program SEMCOV, for study of  
   SEM efficacy in source recovery.  But it also covers production of data    
   appropriate for testing other approaches to linear multivariate analysis,  
   traditional or modern EFA in particular. (Note: Names of specific programs 
   in this package, or more precisely basenames thereof having source-code    
   and executable versions respectively demarked by extensions FOR and EXE,   
   are fully capitalized below. In contrast, names of program collections or  
   multi-program procedures have only first-letter caps.)                     
  ----------------------------------------------------------------------------

 Structural Models: An Overview

     SEMCOV generates a matrix of factor covariances, intended for import into
 production of simulation data by GENSCOR, whose source structure contains as
 much as you may want of the causal-path complexities envisioned by modern
 Structural Equations Modelling (SEM).  Production of covariances so structured
 is computationally simple once the wanted structure is specified; but to bring
 that off you need to be clear on certain orthodoxies of SEM algebra that will
 be reviewed here.  What follows is not the account of multivariate regularities
  I would proffer were conceptual enlightenment the main issue (see PRELOOPS.TXT
  in this package for an introduction to that).  But little of that matters for
 generating simulation data in compliance with current algebraic orthodoxies.
  Since the acronym SEM is generally understood nowadays to denote specifically
  the approach to covariance analysis that emphasizes hypothetico-deductive
  testing of antecedently conceived causal-path conjectures of permissibly
  elaborate complexity even with frugally few data variables, and is held by
  its more fervid proponents to have trashed classic exploratory factor
  analysis (EFA) as obsolete (an opinion I do not share), I will use the
  trimmed acronym SM to denote generic exploitation of the linear models
  described here without differential allegiance to any partisan version
  thereof.  In particular, SM includes EFA no
 less than SEM.

    The generic model of modern multivariate analysis for explaining the
 covariances Czz among NZ jointly distributed, numerically scaled variables
 Z = {z<1>,...,z<NZ>} (the angle-brackets demark subscripts) posits the
 existence of NS source variables S = {s<1>,...,s<NS>} such that scores on
 Z derive in some poorly specified population from scores on S in accord,
 after centering, with some linear equation

 (1a)                Z = WS

 where W is an NZ-by-NS matrix of real numbers while Z and S are column vectors
 of the variables they respectively comprise or, if you prefer, variables-by-
 subjects score matrices.  (In subsequent formulas, capital letters will
 variously represent matrices of coefficients, covariances, and column vectors
 of variables which can just as well be construed as variables-by-subjects
 score matrices.)  From (1a), it follows that

 (1b)              Czz = WCssW'

 where for any two jointly distributed tuples of variables X and Y (not
 necessarily disjoint), Cxy is the covariance matrix whose i,j'th element
 is the covariance of x<i> with y<j> in the collection of individuals whose
 values on the X-variables and Y-variables are suitably assembled in score
 matrices X and Y.  (Centering - that is, scaling all the variables to have
 zero means - can easily be waived but avoids minor algebraic complications
  that would be a pointless nuisance here.)  In the present
  typeface-impoverished notation, an upper-case letter C followed without
  space by two lower-case letters xy denotes a matrix of kind C whose
  rows/columns are distinguished from those of other C-kind matrices by the
  double subscript xy, whose components x and y have referents established
  by context.

     So far this is boringly familiar to you, except perhaps for seeming too
 simple to subsume the multivariate equations you know best, and not yet
 having made explicit - more on this shortly - that some of the S-variables
 may also be in the Z-set.  But (1)'s simplicity is misleading:  The number
 NS of variables in S is generically much greater than the number NZ in Z; so
 before real data on the left in (1a) or (1b) can be solved for information
 about their alleged sources on the right, massive constraints on (1) must
 be imposed.  Making these notationally explicit expands W, Css, and S into
 partitioned matrices whose products in (1) are most efficiently expressed as
 sums of submatrix products having different computationally salient features
 in accord with one's model constraints.
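
     In code, (1b) is a single pair of matrix products.  Here is a minimal
  pure-Python sketch, with every number in W and Css invented purely for
  illustration (nothing below comes from the Gendata programs themselves):

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Invented toy numbers: NZ = 2 data variables, NS = 2 sources.
W = [[1.0, 0.5],
     [0.0, 2.0]]            # NZ-by-NS weight matrix of equation (1a)
Css = [[1.0, 0.3],
       [0.3, 1.0]]          # source covariances

# Equation (1b): Czz = W Css W'
Czz = matmul(matmul(W, Css), transpose(W))
# Czz is symmetric, approximately [[1.55, 1.6], [1.6, 4.0]]
```

  The symmetry of the result is a quick sanity check on any hand-rolled
  matrix routine.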

     The aim of such models in applications is to derive from known scores
 on identified variables Z, and sometimes a known subset of S, informative
 estimates foremostly of parameters on the right in (1b) and, when solution
 of (1b) inspires confidence in its accuracy, for S-score estimates as well.
 (The substantive nature of S-variables on which scores are not antecedently
 in hand is generally unknown, albeit SEM practice invites conjectures about
 that.)  Or more precisely that is true when (1) is a stand-alone model.
 Alternatively, (1) may be part of a larger model wherein some or all of
 its Z-variables are input to, and/or its S-variables are output from, other
 form-(1) submodels.  In SEMCOV's contribution to creation of simulation
 data, variables Z are the common sources (factors, latent variables) whose
 covariances are intended for use by program GENSCOR to create correlations
 among simulated data variables having these common factors.  Given your
 specification of (1)'s W and Css, or rather of the righthand parameters in
 model variant (2b) below after some change of notation, SEMCOV generates just
 the factor covariances.  Scores on factors so covaried, jointly distributed
 with data scores having these common factors according to a measurement model
  (SEM jargon) or 1st-order factor pattern (EFA-speak) that you create
  separately from SEMCOV operations in one of two ways described later, are
  subsequently generated by program GENSCOR as an optional concomitant to its
  production of the data correlations.

     In most if not all substantive applications, the proportion of elements
 in (1) having empirically known/estimated numerical values is insufficient
 for algebraic identification/estimation of the rest unless we hypothesize
 many additional model constraints which must be decently plausible if model
 solution is to merit any payoff respect.  Those which generally seem least
  gratuitous are embodied in (1) by taking W and Css to be "sparse" (lots of
 zeros) in certain special ways.  (Other element constraints are also sometimes
 imposed, but those are best viewed as add-ons to be invoked only if zeroing
 needs help.)  Zeros have major theoretical significance albeit where to put
 them in W and Css is often problematic:  Given the model's causal-order and
 linearity premises, the weight W<i,j> in row i and column j of W should be
 zero if[f] variable s<j> has no effect on variable z<i> in this population
 that is not entirely mediated by other variables in the S-set; and Css<i,j>
 should be zero if[f] variables s<i> and s<j> have no common causes, direct
 or indirect, apart from influences that are invariant over the members of
 this population.  (In modern logic, "iff" is standard shorthand for "if and
 only if"; here, "if[f]" is to be read as nonstandard shorthand for "if and
 almost always only if".  Sampling noise contaminates this ideal import of
 zeros in Css, but not in W.)

     When specifying a form-(1) model, it is useful to guide one's first
 pass at zeroing by partitioning Z's sources modeled in S between two major
 subsets, endogenous vs. exogenous, with the latter further divided when
 convenient between common-exogenous and unique.  (Cross-cutting these is the
 important "manifest"/"latent" aka "observed"/"unobserved" contrast which you
 surely don't need clarified.)  A variable z<i> or s<j> in a form-(1) model is
 "endogenous" if it is in the model's Z-set (local output), or "exogenous" if
 it is only in the model's S-set (local input).  And an input variable s<j>
 is "unique" (otherwise "common") just in case it is modeled as directly
 affecting just one of the model's outputs, that is, iff all but one of
 the coefficients in column j of W are zero.  This endogenous/exogenous
 distinction is very much relative to the specific model at issue - an
 important qualification, because a variable's status in this respect is
  not invariant over all compatibly different models that contain it.  In
 particular, when a compound model concatenates two or more submodels,
 variables that are local output of one submodel may be local input to
 another.

     When (1) is rewritten to make the endogenous/exogenous distinction
 explicit while also providing separation of common vs. unique inputs, the
 generic linear model schema becomes

 (2a)             F  =  AF + BG + DE

 where F and E are NF-tuples respectively of endogenous and unique variables,
 G is an NG-tuple of exogenous variables with NG advisedly but not requisitely
 smaller than NF, and A,B,D are matrices of numerical coefficients with D here
 and henceforth diagonal.  And writing M~ =def M+M' for any matrix M to reduce
 visual clutter, the model covariances entailed by (2a) are

 (2b)       Cff  =  ACffA' + BCggB' + DCeeD' + [ACfgB']~
                          + [ACfeD']~ + [BCgeD']~

 Algebraic formulas (2), possibly concatenated with supplementary SMs whose
 local output is G or E, or whose local input includes F, comprise the generic
 (fully permissive) SM schema.  In applications, however, we want sleeker
 formats that express our models' solution-enabling presumptions more deftly.
 There are assorted moves that one can make on this; but most important when A
 is not null is what the SEM literature calls "reduced form":  If we subtract
 AF from both sides of (2a), note that F-AF = (I-A)F, and to keep
 notation simple define

            Q =def Inverse(I-A) ,  Qb =def QB ,  Qd =def QD ,

 it is plain with one small caveat that (2) is algebraically equivalent to

 (3a)                F  =  QbG + QdE

 (3b)       Cff  = QbCggQb' + QdCeeQd' + [QbCgeQd']~ .

  (The caveat is that I-A must be invertible, that is, none of its rows or
 columns can be a linear combination of the others.  Only by contrivance
 should this ever be a problem in SM practice.)  It can be shown that if the
 coefficients in (2) are causal then so are the ones in reduced-form (3) so
 long as matrix A is "recursive" (clarified below):  The difference between
 these two structures is that (2a) explicates how the causal effects of
 variables G,E on F described in (3a) are mediated by some of the F-variables.
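
     The reduced form is easy to mimic numerically.  In the sketch below
  (pure Python, with an invented two-variable A), recursiveness makes the
  powers of A vanish, so Q = Inverse(I-A) can be accumulated as the finite
  sum I + A + A^2 + ...; the final product checks that (I-A)Q is the
  identity:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Recursive (subdiagonal) path matrix: f2 depends on f1 with weight 0.7.
A = [[0.0, 0.0],
     [0.7, 0.0]]
n = len(A)
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# For recursive A, A^n = 0, so Inverse(I-A) = I + A + ... + A^(n-1).
Q = [row[:] for row in I]
P = [row[:] for row in I]
for _ in range(n - 1):
    P = matmul(P, A)
    Q = [[Q[i][j] + P[i][j] for j in range(n)] for i in range(n)]

IminusA = [[I[i][j] - A[i][j] for j in range(n)] for i in range(n)]
check = matmul(IminusA, Q)    # equals the identity matrix
```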

      Computationally, the manifest simplicity of (3) is deceptive inasmuch
 as solution algorithms that solve Cff for source parameters under a SEM
 hypothesis positing intra-F dependencies require nonlinear optimizations
 whose loss gradients are explicitly functions of the open parameters in (2).
 Even so, (3) helps to clarify the difference between endogenous and exogenous
 variables, and is essential to SEMCOV's production of simulation data.
 Moreover, there now exist EFA-based solutions for path structure that first
 solve (3b) for Qb,Cgg without elementwise constraints thereon, and seek
  afterward to pull A out of Qb by inductivist search for simple structure.
 (See note on Hyblock, later.)

     When specifying a SM, either for fit to extant data or to create a
 simulation, we have great flexibility in how to parse our holistic form-(1)
 model schema into the constituents differentiated in (2) by F,G,E:

    a) Any variable in G can be shifted into F by modeling some input
         for it, even if just one unstructured E-antecedent.
    b) Any or all of DE can be made part of BG merely by algebraic
         rearrangement.  Or, conversely,
    c) We can replace BG + DE in (2a) simply by NF unique variables
         U while positing in a side model that Cuu in this compressed
         (2b) has the complexly structured covariances entailed by
         U =def BG + DE.  In what follows, we will refer to such
         U-variables, each of which is contrived unique to just one
         endogenous factor by gathering together all the exogenous
         impingements on this factor, as the "exo-roots" in model (2).
          (I would prefer "exogenous condensates" were seven syllables
          not so unwieldy.  Neither of these labels is standard SEM
          lingo.)  Henceforth writing Inverse(M) for the inverse of any
          invertible matrix M (i.e. Inverse(M)M = I = M Inverse(M)),
          since the standard -1 exponent is unavailable in ASCII
          typeface, this reduces the notational complexity of (2) to a
          stunningly simple

  (4a)           F  =  QU             Q =def Inverse(I-A)
  (4b)         Cff  =  QCuuQ'        U =def BG + DE

 Our use of these tactics should be guided by how causally interconnected
 we believe, or in simulations want, the more remote antecedents of F to be.
 So far as possible, we would like to make those so endogenously explicit by
 tactic (a) that G and Cee shrink respectively to null and diagonal, allowing
 us to exploit model form (4) with Cuu diagonal.  How best to undertake that
 is far from clear in applications but doesn't much matter for data simulation.
 So it suffices here to note that developing a particular model in SM practice
 standardly comprises foreground concern for a posited array F of endogenous
 factors (latent variables) whose interdependencies and manifestations in a
 set Y of observational indicators have been precisely formulated up to but
 not including numeric identification of path coefficients not presumed zero,
 supplemented by some fuzzier notions, initially managed by tactic (c), about
 the exogenous sources of F - in short, a model concatenation of form

 (5a)        Y = WF + E,  F = AF + U,  U = ?

 Under the standard albeit dubious presumption that measurement errors E are
 uncorrelated both among themselves and with F, so that Cfe = 0 with Cee a
 diagonal De, (5a) entails that the data covariances analyze as

  (5b)  Cyy = WCffW' + De,  Cff = QCuuQ',  Cuu = ??  ( Q =def Inverse(I-A) ).

 where we write Dz for any covariance matrix Czz stipulated to be diagonal.
 But to solve Cyy for W and A, or for SEMCOV to derive simulated Cyy from
 stipulated W and A, Cuu cannot remain vague:  Applications need a low-
 parameter guess at this, while form-(5) simulation of Cyy cannot proceed
 without some numerical specification of it.  Unless we can convince ourselves
 that our model's exogenous sources have no common sources of their own (one
 of SEMCOV's options yielding diagonal Cuu), we should by rights posit a
 form-(3) model whose output is U - as SEMCOV also allows.  But that puts
 decomposition of correlated exo-roots into our concatenation of submodels,
 launching a potentially infinite regress that can be truncated only by
 positing some configuration of exogenous covariances for which no SEM-modeled
 explanation is proffered.  So SEMCOV encourages you to set Cuu in (5b) as a
 scatter of obliquities in a sea of zeros, assigned by whatever balance
 between precise control and randomization you prefer. (More on this below.)
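
     The whole of pipeline (5b) fits in a few lines.  The sketch below is a
  worked miniature with every number invented for illustration (it is not
  SEMCOV or GENSCOR output, just the same algebra):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Invented toy model: NF = 2 endogenous factors, NY = 3 indicators.
A   = [[0.0, 0.0],
       [0.6, 0.0]]                  # recursive path matrix: f1 -> f2
Cuu = [[1.0, 0.0],
       [0.0, 1.0]]                  # exo-roots taken uncorrelated here
W   = [[0.8, 0.0],
       [0.7, 0.0],
       [0.0, 0.9]]                  # sparse measurement pattern
De  = [[0.36, 0.0, 0.0],
       [0.0, 0.51, 0.0],
       [0.0, 0.0, 0.19]]            # diagonal measurement-error variances

Q = [[1.0, 0.0],
     [0.6, 1.0]]                    # Inverse(I-A); here just I + A

Cff = matmul(matmul(Q, Cuu), transpose(Q))    # Cff = Q Cuu Q'
WCW = matmul(matmul(W, Cff), transpose(W))
Cyy = [[WCW[i][j] + De[i][j] for j in range(3)] for i in range(3)]
```

  Note that Cff's second diagonal entry comes out 1.36 rather than 1.0,
  which is precisely the standardization wrinkle taken up in the indented
  notes that follow.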

        An additional constraint on solving or simulating SM-structured
    data, which can just as well be mentioned at this point though you can
    largely ignore it until late in the game, is stipulation of scale units.
    Whereas real-world values of data variables and their posited source
    factors are _attributes_ of the objects under study, the score matrices
    in SM equations comprise just numbers that are scale representations of
    those attributes.  Any dimension of attribute alternatives can be scaled
    numerically in arbitrarily diverse ways; and while some of these may
    well be more desirable than others (scaling theory has debatable things
    to say about that), any numerical scale for a data variable or factor
    in a SM application can be replaced with no loss beyond convenience
    by any linear transformation of that scale so long as the old scale's
    representational significance is also transferred correspondingly.
    All parameter matrices in equations (1)-(5) above are scale-specific,
    meaning that if we shift from one more-or-less arbitrary scaling of the
    variables therein to another, the model's best-fit parameters change
    accordingly - though one beauty of linear analysis is that perfect fit
    or more generally proportionate best-fit error can always be preserved
    under linear rescaling.  Absent good reasons to choose otherwise, it is
    established SM practice to stipulate unit-variance ("standard") scales
    for all variables at issue except, optionally, unique residuals whose
    variances are usually more convenient to leave in Cuu than to pull out
    as a diagonal coefficient matrix of standard deviations multiplying
    standardized Cuu fore and aft in formulas.  SEMCOV will continue this
    orthodoxy so far as feasible, with Cuu remaining unstandardized, until
    such time as its relaxation becomes wanted.

       Variance standardization is mentioned here to warn that this is
    a little tricky in (5).  Standardizing the F-variances after initial
    derivation of Cff from F = AF + U under a free choice of A and Cuu
    requires both a row/column rescaling of A and variance adjustments
    in Cuu that yield exo-root variances generally less than unity.
    Details later.
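
     Pending those details, here is one way the adjustment can run, sketched
  in pure Python with invented numbers (SEMCOV's actual procedure may differ
  in particulars):  writing D for the diagonal matrix of F's standard
  deviations, substituting F* = Inverse(D)F into F = AF + U forces
  A* = Inverse(D)AD and Cuu* = Inverse(D)Cuu Inverse(D), after which the
  rebuilt Cff has unit diagonal while the second exo-root variance falls
  below unity:

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A   = [[0.0, 0.0], [0.6, 0.0]]      # invented recursive path matrix
Cuu = [[1.0, 0.0], [0.0, 1.0]]      # unit-variance exo-roots, initially
Q   = [[1.0, 0.0], [0.6, 1.0]]      # Inverse(I-A) for this A
Cff = matmul(matmul(Q, Cuu), transpose(Q))    # diagonal is [1.0, 1.36]

d = [math.sqrt(Cff[i][i]) for i in range(2)]  # factor standard deviations

# Row/column rescaling of A plus matching variance adjustment of Cuu:
A_std   = [[A[i][j] * d[j] / d[i] for j in range(2)] for i in range(2)]
Cuu_std = [[Cuu[i][j] / (d[i] * d[j]) for j in range(2)] for i in range(2)]

Q_std   = [[1.0, 0.0], [A_std[1][0], 1.0]]    # Inverse(I - A_std)
Cff_std = matmul(matmul(Q_std, Cuu_std), transpose(Q_std))
# Cff_std now has unit diagonal, while Cuu_std[1][1] is about 0.735 < 1,
# matching the claim that standardized exo-root variances are
# generally below unity.
```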

     Finally, solvable SEM models are finished by lavishing simple-structure
 zeroes on A and any other coefficient matrices that appear in a concatenation
 of submodels.  These constitute one's specific conjecture on how impact of
 the model's exogenous sources works its way through a sparse web of directed
 paths to impinge upon the model's output variables.  According to the model,
 variable f<i> additively affects variable f<j> through every path, manifest
 in powers of A, such that for some positive integer r, the pathweight of f<i>
 for f<j> in A^r is nonzero.  If this is true for any r>1, f<i> affects f<j>
 indirectly through one or more sequences f<i> > f<h> > ... > f<k> > f<j>
 of r direct-effect steps in which f<h>,...,f<k> are some of the other
 variables in F.  (The algebra demonstrating that Qb in model (3) expresses
 the combined direct and F-mediated effects of G on F for recursive A is a
 beautiful theorem too digressive to review here.) "Direct" is model-dependent:
 Arguably, all effects modeled as direct are mediated by other variables not
 explicit in the model.
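
     The path-tracing claim is easy to watch in miniature.  With an invented
  three-factor chain f1 > f2 > f3, the sole nonzero cell of A^2 is the
  product of the two direct pathweights, and A^3 vanishes because a
  recursive A is nilpotent:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Invented direct effects: f1 -> f2 (weight 0.5) and f2 -> f3 (weight 0.4).
A = [[0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.4, 0.0]]

A2 = matmul(A, A)    # A2[2][0] = 0.5 * 0.4 = 0.2, the f1 -> f3 two-step path
A3 = matmul(A2, A)   # all zeros: no variable can reach itself
```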

    Apart from the roundhouse point that a concatenation of submodels can
 always be algebraically squeezed into a single-step model of form (1)
 containing zero pattern blocks that the concatenation suppresses, for
  example conjoining the measurement and endogenous factors in (5a) as

               | Y |     | 0  W | | Y |     | E |
  (5c)         |   |  =  |      | |   |  +  |   |  ,
               | F |     | 0  A | | F |     | U |

 there seem to be no prevalent patternings of zeros in (2) save one that
 yield useful subgeneric simplifications of (3) or (4).  But the exception -
 recursiveness - is fundamental even though it doesn't alter the matrix
 form of (3).  Model (2) is "recursive" iff none of the paths traced by
 powering A carry any F-variable into itself.  This is true just in case,
 under a suitable ordering of the variables in F, coefficient matrix A is
 subdiagonal, that is, all nonzero endogenous pathweights are off-diagonal
 in A's lower triangle.  (Note: Contrary to the many SEM texts that describe
 such matrices as "triangular", not all triangular A are recursive.  Matrix
 algebraists allow matrices they consider triangular to have nonzero diagonal;
 SEM recursiveness does not.  In principle we could just as well permute the
 factor order to make A superdiagonal instead; but lower triangles, which are
 nicer than upper triangles to program and print, are industry standard.)
  An obvious benefit of recursiveness is that at least NF(NF+1)/2 of the
  NF*NF coefficients in A are zero; and A will be explicitly subdiagonal
 just in case the variables in F are so ordered that each f<i> therein
 precedes every f<j> directly or indirectly affected by f<i> in this
 model.  Recursiveness is prerequisite to some solution methods, Hyblock
 in particular.  But most important for applications, this is arguably
 required for a SEM solution to be interpretively meaningful.

          Nonrecursive structural models appear to violate our most
      fundamental intuition about causality, namely, that no event can
      be a cause of itself.  But some SEM theorists think there are
      ways to interpret nonrecursive models that evade this intolerable
      implication.  For present purposes the dispute is digressive:
      SEMCOV can create artificial data satisfying nonrecursive models
      just as easily as it can instantiate recursive ones.  Even so,
      should a more extensive opinionated overview of recursiveness
      issues interest you, browse the LOOPS.TXT document (also
      PRELOOPS.TXT and MORLOOPS.TXT) included in this package.


 Creating SE-structured Factor Covariances with Gendata program SEMCOV.

     Operation of SEMCOV as a computer program is quite simple once you
 have decided on what you want from it; but working out specifics thereof in
 terms of SEMCOV options, which may at first perplex you despite considerable
 on-screen advice, requires advance planning.  (That is, you need to do some
 conceptual heavy lifting before calling the program to implement your intent.)
 Here are some guidelines to consider, prefaced by certain distinctions that
 recur in these and a flow diagram of model-components assembly.

 SEM-focus vs. EFA-focus:
    The classic EFA model structure is form (5) with A = 0, whence
    Q = I, F = U, and Cff = Cuu - which reduces (5) to Y = WF + E and
    Cyy = WCffW' + De, with solutions for Cff generally unconstrained
    apart from variance standardization.  In contrast, solving Cyy for
     non-null A with (usually) a sparse recursive structure is definitive of
    SEM analyses.  We shall say that a production of simulated covariances
    Cyy in accord with structure (5b) is "EFA-focused" if its A-matrix is
    null and is "SEM-focused" otherwise.  Note that basic SEM-focused
    simulation creates source covariances (ignoring measurement noise) at
    two stages of production, distal Cuu leading to proximal Cff leading
    to observable relations Cyy, whereas basic EFA-focus simulates just
    Cff leading to Cyy.  ("Basic" here recognizes that advanced SEM/EFA can
    also pick up the recovered Cuu/Cff as a target of iterated analysis often
    referred to as "higher order" factoring.)  Also note that since Gendata's
    creation of Cuu in SEM-focus or Cff in EFA-focus has no mandated
    constraints apart from scale adjustments, you are free to generate
    this or a collection of alternatives for it prior to its use for Cyy
    production under any form-(5) model you consider appropriate.

 Modeled vs. free-form root covariances:
    By "root covariances" Crr let us mean Cuu in SEM-focused simulations or
    Cff in EFA-focused ones.  Crr can be generated as output of an SE model
    and archived for upload when wanted or, less arduously, constructed
    free-form with modest advance planning that can be waived altogether.
    "Free-form" creation of Crr assigns zeros to its off-diagonal elements
    except for a scatter of obliquities (appreciably nonzero values)
    positioned within Crr either randomly at time of need or at the specific
    locations (row/column coordinates) you stipulate during a prior run of
    program SCHEMAS.  Free-form obliquities simulate common-source relations
    not derived from any SE model of Crr's production, as a whimsical deity
    might bestow.

 Block-structured vs. free-form factor patterns:
    Simulation package Gendata was originally developed for easy production
    of EFA simulations with considerable diversity in the items' factor-
    loading complexities initiated by GENPAT for input to GENSCOR.  But
    the block-parsed framework within which GENPAT assembles these is not
    entirely congenial to creation of the low-complexity measurement models
    favored by SEM; so stand-alone program SCHEMAS now alternatively enables
    schemata of free-form factor patterns to be archived for upload and
    instantiation when wanted in a SEMCOV production.  You should ignore
    the GENPAT option until you have become practiced with these programs'
    data simulations using free-form factor patterns.  You may well find
    that running SCHEMAS is not just easy but fun.  (Note: For programming
    reasons, SCHEMAS does not at present allow any matrix to exceed 99 rows.
    Should this ever become a bind, programming can be renegotiated.)

  ---------------------------------------------------------------------------

                                +-----------+
                                |  SCHEMAS  |
                                | <w, a, c> |
                                +-----------+
                               /             \
                    +----------+             +----------+
                    |    w     |             |  A, Cuu  |
                    |  GENPAT  |             |  SEMCOV  |
                    |    W     |             |   Cff    |
                    +----------+             +----------+
                               \             /
                              +-------------+
                              |   W, Cff    |
                              |   GENSCOR   |
                              |  Cyy, Y, F  |
                              +-------------+
                                     |
                                   payout

     FLOW OF FORM-(5) MODEL SIMULATION.  Above the program name in each box
     are listed the model-parameter matrices, if any, this program requires
     to create the output named below it.  Lower-case w, a, c in the SCHEMAS
     box denote schematic W, A, and Cuu/Cff which are numerically completed
     by the programs that upload them.  Population scores on the data
     variables and common factors (score matrices Y and F) are optional.
  ---------------------------------------------------------------------------

 GUIDELINES 1: Planning and pacing your simulations.

     1.  Primary payout of Gendata program GENSCOR or SEMCOV is a covariance
 matrix Czz either for simulated data variables (Cyy from GENSCOR) or for
 their endogenous common sources (Cff from SEMCOV) on which a form-(5) model
 structure has been imposed.  Each derives its Czz in part from a root
 covariance matrix Crr which can be created (a) at runtime if you are willing
 to let the program semi-randomize Crr's obliquities, but otherwise requires
 advance preparation either (b) as a template (easily created in SCHEMAS)
 whose numeric specifics can be set at time of use, or (c) as finely structured
 correlations induced by a model of these roots' more remote determiners.
  Option (c) enables your present simulation to impose a specified pattern of
  your 1st-order factors' dependence on their 2nd-order common sources (EFA
  focus) or a causal-path structure in their exo-roots' constituents (SEM
 focus).  But it also requires considerably more advance planning than do the
 other Czz options, as well as some experience with the Gendata procedures.
 Even so, once you become familiar with these programs using easy-Czz options
 (a) and (b), you will also know how to create an SE-structured Czz for
 upload during a later production of Cyy.

     2.  All simulations enabled by this package are best conceived as
 instantiations of equations (5) above.  That is, attaining the payout
 charted above is a process that should start with a numeric detailing
 of W, De, A, and Cuu that in principle you work out prior to calling
  any of the Gendata programs, albeit your hardcore need-to-know at the
  outset is only where zeroes are to go in W, A, and maybe Cuu.  Here are
  your setup choices, in more or less the order you need to make them:

    a) First of all, do you want some endogenous interplay (non-null A) in
    the common factors F directly underlying your simulated data covariances
     Cyy?  That is, is your simulation SEM-focused or EFA-focused?  Deciding
     this is not so much a first step as a choice of direction.

    b)  Second, choose your simulation's number NF of endogenous factors,
    which in SEM-focus is also the number of exo-roots.  If you want your
    measurement submodel to be patterned in accord with some block style in
    GENPAT's repertoire, selection of that should precede your choice of NF.
    Otherwise, you can defer measurement planning until later though of
     course you have nothing to lose by giving thought to that at the outset.

    c)  Third, decide whether (i) your simulation's purpose requires its root
    covariances Crr (that is, Cuu in SEM-focus or Cff in EFA-focus) to have
    an explicitly modeled SE structure of their own, or (ii) it suffices for
    you to insert some obliquities (departures from orthogonality) into Crr
    without concern for what production model might yield those.  If (i),
    which may well be wanted in an advanced simulation study, creating this
    Crr utilizes Gendata procedures not yet fully familiar to you if you still
    need the present guidelines.  So you should initially practice Gendata
    using free-form Crr, advisedly starting with its variant that Gendata
    programs using Crr create at runtime with little or no preplanning
    required.  But you will soon learn how easily a prior run of SCHEMAS
    will let you specify free-form Crr as finely as you want short of
    SE-modeling it.  In brief, your need for outset Crr planning can range
    from negligible to severe, with negligible your best choice when
    you are unsure of what you want.

    d) What comes next (though you can also plan it earlier in this sequence)
    depends on your simulation focus.  EFA-focus omits this step.  But if
    SEM-focused, you must decide which pathweights in your NF-by-NF A-matrix
    are to be nonzero or, more broadly, must archive one or more direct-path
    schemata you would like to be available when program SEMCOV asks for your
    choice of A.  You create such path structures by calling program SCHEMAS
    and, for each A-schema you define in one or more runs thereof, enter a
    list of triples <i,j,L>, each of which assigns to the A-cell having
    row/column coordinates <i,j> a letter L that goes proxy for a pathweight
    to be generated at use-time under the control of L-specific parameters
    that can also be specified when this schema is created.  A schema's
    cells to which you assign the same L will have the same L-proxied
    production parameters.  Schemata once saved can easily be revised by
    another run of SCHEMAS, and their L-parameters can be respecified when
    uploaded for use.

    e) SEMCOV time (SEM-focus only):  Once you have schematized the path
    structure (A-matrix) you want, and have either readied a Cuu (modeled or
    schematic) to upload or plan to let SEMCOV generate this, running SEMCOV
    yields your model's endogenous-factor covariances Cff with scarcely any
    further effort from you beyond selecting from the A and Cuu schemata you
    have stored which ones you want to use and approving/revising their proxy
    entries' parameter settings.  (If you don't remember the filename of your
    wanted A or Cuu when more than one passes SEMCOV's screen for NF, the
    program lets you browse these.)

     f) In penultimate preparation by either focus - though this can also
     be done at any prior stage - production of output covariances Cyy also
    requires prior creation of either a raw NY-by-NF output pattern W by
    GENPAT or the schema of one by SCHEMAS.  GENPAT has been designed for
    EFA-focused assembly of common-factor pattern blocks wherein items have
    generally diverse factor complexities while each factor is generally
    salient for more items than applied SEM seems to favor.  So if your
    simulation is SEM-focused you will probably find it much preferable to
    create free-form W-schemata.  SCHEMAS produces these exactly as it does
    A-schemata, except that W requires NY to be larger than NF.  In both
    cases, results are saved under basenames that identify the schema's type.
    SCHEMAS is so easy to use that you may well find yourself making more
    schemata of assorted types than you really need, just for fun.

    g) Finally, when you have made all the preparations described above, run
    GENSCOR to obtain the simulated data covariances Cyy and, if you want
    them, a population of standardized scores on variables Y and F satisfying
    Y = WwF + E and F = AaF + U for identified rescalings Ww and Aa of the
    raw W and A you entered previously.  Other than choosing which previously
    prepared alternatives for W and Cff should be uploaded and whether to
    change parameter settings if either of these is a schema, the only
    runtime decisions GENSCOR requires of you are allocation of measurement
    noise and, if you want to override the joint score distribution's default
    multi-Normality, what departures from that you want to try:  Should you
    wish to study whether non-Normal skew and kurtosis appreciably affect
    source recovery, GENSCOR provides "twist" disturbances of its Normal-
    random generator's output that give you considerable indirect control over
    the common/unique factors' 3rd and 4th marginal moments.  Or you can
    simply waive score production if Cyy suffices for your simulation purpose.
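As a rough numpy sketch of what step (g) amounts to -- raw, unrescaled
    coefficients only, with hypothetical names and a hypothetical noise_sd
    parameter standing in for GENSCOR's actual noise allocation and its
    rescalings Ww and Aa:

```python
import numpy as np

def simulate_scores(A, W, ns, noise_sd=0.5, seed=0):
    """Draw ns joint score records satisfying F = A F + U and
    Y = W F + E, via the reduced form F = inv(I - A) U."""
    rng = np.random.default_rng(seed)
    nf = A.shape[0]
    Q = np.linalg.inv(np.eye(nf) - A)       # reduced-form multiplier
    U = rng.standard_normal((ns, nf))       # exo-roots
    F = U @ Q.T                             # endogenous factors
    E = noise_sd * rng.standard_normal((ns, W.shape[0]))
    Y = F @ W.T + E                         # data variables
    return Y, F, U

A = np.array([[0.0, 0.0], [0.6, 0.0]])      # f2 <- f1
W = np.array([[0.8, 0.0], [0.7, 0.0],       # two indicators per factor
              [0.0, 0.8], [0.0, 0.7]])
Y, F, U = simulate_scores(A, W, ns=1000)
```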


 GUIDELINES 2: Program esoterica to appreciate or at least comprehend.

    Indexing your model's source factors

        When diagramming causal dependencies, or more generally the relations
    in any partially ordered set, we are strongly disposed (a disposition
    quite possibly derived from the left-to-right sequencing of written
    English) to place effects to the right of their causes.  But modern
    mathematics, when stating that an entity y is some function phi of an
    entity x, standardly writes this as a leftward dependency "y = phi(x)".
    In particular, formula
    F = AF + U in SEM-model (5a) above posits that each f<i> on the left
    is directly dependent on just the f<j> in this F-set having nonzero
    coefficients in the i'th row of matrix A.  Accordingly, when such a model
    is subdiagonal recursive and a schema for A is entered by listing <i,j,L>
    triples, the indices j of factors declared to be direct sources of factor
    f<i> must all be smaller than i -- just the opposite of putting causes to
    the left of their effects when drawing a model's path-diagram or indeed
    listing a sequence of indices for most other reasons.  Consequently, when
    running SCHEMAS to create path schemata you may well be disposed to enter
    direct-path index pairs in the wrong within-pair order.  The program will
    set the correct order anyway if you declare that the path matrix is to be
    subdiagonal, as should always be your choice for recursive models; but
    it can't correct order inversions if you elect to enter a nonrecursive
    path structure.  So be alert to this prospective bungle; you'll be more
    susceptible to it than you expect.  And be sure to acquaint yourself with
    this package's utility program ORDER for helping you to choose the most
    felicitous indexing of your endogenous factors and corresponding exo-roots
    if your path structure is at all complicated.
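For a sense of the job ORDER helps with, here is a Python sketch (a
    hypothetical function, not ORDER's actual code) that finds a renumbering
    of factors under which every direct path's source index is smaller than
    its effect's -- possible exactly when the structure is recursive:

```python
def recursive_indexing(edges, nf):
    """Find a renumbering of factors 1..nf under which every direct
    path runs from a smaller index to a larger one.  edges are
    (effect, source) pairs, as in the <i,j,L> convention above.
    Raises ValueError when the structure is cyclic (nonrecursive)."""
    sources = {k: set() for k in range(1, nf + 1)}
    for effect, source in edges:
        sources[effect].add(source)
    order, placed = [], set()
    while len(order) < nf:
        # factors whose direct sources have all been renumbered already
        ready = [k for k in sources
                 if k not in placed and sources[k] <= placed]
        if not ready:
            raise ValueError("path structure is nonrecursive (cyclic)")
        for k in sorted(ready):
            order.append(k)
            placed.add(k)
    # new_index[old] gives the subdiagonal-friendly index
    return {old: new for new, old in enumerate(order, 1)}
```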


    Pattern scaling.

        Even if you declare precise values for all nonzero elements in your
    simulation's coefficient matrices (W and A) rather than randomizing
    within assigned bounds, these will not be the numeric values reported
    in this simulation's logfile of results.  This is due to imposition of
    standardized scales (unit variances) on the variables these coefficients
    affect, which perforce backs up into the material constructively
    antecedent to the variables rescaled.  Consider a generic SE equation
    Z = BX + U wherein variables in X may also comprise all or part of Z.
    If R and S are diagonal matrices of rescaling multipliers chosen to
    give the variances in Czz and Cxx stipulated values, this rescaling
    Zr =def RZ and Xs =def SX correspondingly rescales their SE dependency
    as Zr = [RBS^-1]Xs + RU.  In an EFA model or SEM measurement component
    Y = WF + E here, F is standardized at outset by creation of Cff as
    a correlation matrix, which simplifies the rescaling formula to
    Yr = [DW]F + DE where D inverts the diagonal matrix of standard
    deviations the Y-variables would otherwise have were rescaling omitted.
    Matrix W in this formula is numerically the one you explicitly make input
    to the construction, and the rescaling preserves the ratios of loadings
    in each row though not between rows.  But norming F in a SEM's system-
    structure submodel F = AF + U, isn't quite so simple.  For any rescaling
    Fr = DF of F, substituting F for both Z and X, and D for both R and S,
    in the generic lemma above yields Fr = [DAD^-1]Fr + DU -- which tells us
    how to rescale the path matrix but only after we determine what scaling
    multiplier D is wanted.  To get that, we need this submodel's reduced
    form F = QU (Q =def (I-A)^-1), which entails that raw Cff equals QCuuQ'.
    Since Q and Cuu are both known by construction, we can easily compute the
    diagonal of this raw Cff; and the diagonal matrix comprising the inverse
    square-roots of those terms is the D that yields pathweight matrix
    [DAD^-1] for the standardized endogenous factors.  Note also that the
    outset Cuu which you initially imported or created internally as a
    correlation matrix is replaced by DCuuD' comprising exo-root covariances
    that are no longer standardized as correlations.  You needn't be concerned
    with these computational details; just be aware that the A-matrix recorded
    in your results log -- the path coefficients that SEM analysis of this
    simulation's Cyy hopes to recover -- differs from your initial numeric
    stipulation of raw A by both row and column rescalings with column
    multipliers that invert the row multipliers.
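In Python terms, the standardization just described runs as follows (a
    numpy sketch of the computation, not Gendata's code; the example A and
    Cuu are arbitrary):

```python
import numpy as np

# Standardizing the system submodel F = A F + U, following the text:
# reduced form F = Q U with Q = inv(I - A), raw Cff = Q Cuu Q',
# D = inverse square-roots of raw Cff's diagonal, rescaled path
# matrix D A inv(D).
A = np.array([[0.0, 0.0],
              [0.7, 0.0]])                 # f2 depends on f1
Cuu = np.eye(2)                            # exo-roots start as correlations
Q = np.linalg.inv(np.eye(2) - A)
Cff_raw = Q @ Cuu @ Q.T
D = np.diag(1.0 / np.sqrt(np.diag(Cff_raw)))
A_std = D @ A @ np.linalg.inv(D)           # the A your results log reports
Cff_std = D @ Cff_raw @ D.T                # now a correlation matrix
```

Here the raw variance of f2 is 0.7^2 + 1 = 1.49, so the reported pathweight
    is 0.7/sqrt(1.49): rescaled by row, with the (unit) column multiplier
    inverted.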


    Obliquity tweaking.

        One problem for free-form creation of Crr (EFA-focused Cff or SEM-
    focused Cuu) is that correlation matrices so produced can easily be
    inadmissible.  This is because solution of an SE-model is almost always
    indeterminate unless none of its factors on which some model parameters
    are open is an exact linear function of the other co-modeled factors in
    the population at issue.  In principle, this requirement is satisfied
    just in case all eigenvalues of Crr are positive, though if Lo is Crr's
    smallest eigenvalue, a sufficiently small positive Lo can also defeat SE
    solution of any simulation data grounded on Crr.  Happily, a raw Crr can
    always be tweaked into eigen-acceptability no matter how degenerate may
    be its original.  Specifically, it can easily be shown that if C is an
    order-N symmetric real unit-diagonal matrix whose smallest eigenvalue Lo
    you would like increased to s <= 1, multiplying all off-diagonal elements
    in C by (1-s)/(1-Lo) yields an order-N symmetric real unit-diagonal matrix
    having the same eigenvectors as C but s for its smallest eigenvalue.
    GENSCOR and SEMCOV both avail such tweaking of freeform-created Crr,
    enabling you to guard against inadvertent degeneracies and, should you
    consider this worth exploring, to study how SE solutions deteriorate as
    a function of Lo's proximity to zero.
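The tweak is easy to verify numerically; here is a sketch (hypothetical
    function name, assuming Lo < s as the text requires):

```python
import numpy as np

def tweak_min_eigen(C, s):
    """Raise a unit-diagonal symmetric matrix's smallest eigenvalue to
    s (Lo < s <= 1) by shrinking all off-diagonal elements by the
    factor (1-s)/(1-Lo), as described above.  Eigenvectors and the
    unit diagonal are preserved."""
    n = len(C)
    lo = np.linalg.eigvalsh(C).min()
    k = (1.0 - s) / (1.0 - lo)
    return np.eye(n) + k * (C - np.eye(n))

# A degenerate 'correlation' matrix: the third variable is (almost)
# an exact linear function of the first two, so Lo is ~0.
C = np.array([[1.0,    0.0,    0.7071],
              [0.0,    1.0,    0.7071],
              [0.7071, 0.7071, 1.0   ]])
Ct = tweak_min_eigen(C, 0.1)
```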


    Item names.

       In SE applications, data variables are usually assigned names designed
    to manifest, or at least helpfully code, their substantive identity.  And
    although simulation variables lack substance, it is still possible
    to give them feature-coding names by which their individual identities
    can be tracked through augmentations, deletions, and permutations of
    the original item set.  Gendata currently does this by assigning each
    simulated data variable a name, comprising one or more capital letters
    perhaps followed by one or two integers, that identifies which factors in
    the output pattern are most salient for it.  Specifically, with factors
    up to L (the Lesser of NF,26) sequentially indexed by the first L alphabet
    letters, each output variable's name comprises the letters indexing factors
    salient for this item in decreasing order of salience up to a maximum of
    six followed, if more than one item has this same salience profile, by
    numeric ranking on the leading factor within this common-profile set.
    (A factor loading is "salient" if implicitly or explicitly declared so
    in the raw pattern's stipulation.  Should a simulation's NF exceed 26,
    its item naming this way perforce ignores factors after the 26th.)
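The naming rule can be sketched in Python as follows (a hypothetical
    reconstruction: a fixed salience threshold stands in for the raw
    pattern's explicit salience declarations, and names are not Gendata's
    byte-for-byte output):

```python
def item_names(pattern, threshold=0.3):
    """Name each item by the letters of its salient factors in
    decreasing order of salience (at most six), appending a numeric
    rank on the leading factor when several items share a profile."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    profiles, names = {}, []
    for row in pattern:
        salient = sorted((j for j, w in enumerate(row[:26])
                          if abs(w) >= threshold),
                         key=lambda j: -abs(row[j]))[:6]
        stem = "".join(letters[j] for j in salient)
        profiles.setdefault(stem, []).append(len(names))
        names.append(stem)
    # rank common-profile items on their leading factor's loading
    for stem, idxs in profiles.items():
        if len(idxs) > 1 and stem:
            lead = letters.index(stem[0])
            order = sorted(idxs, key=lambda i: -abs(pattern[i][lead]))
            for rank, i in enumerate(order, 1):
                names[i] = stem + str(rank)
    return names
```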


    Gendata filetypes.

        One of Gendata's lesser felicities is its profusion of workfiles.
    The problem with these is not so much their potential numerosity as
    your need to recall which ones contain the material you need at the
    moment you need it.  An effort has been made to mitigate this load by
    giving transfer files basenames that identify the subset matching your
    current demand features; but you still need an overview of the filetype
    repertoire.  And each simulation you finish requires your choice of a
    basename common to its payout fileset that will memorably distinguish
    this from other simulation products you may accumulate.  Focused
    complaints and suggestions for improvement are welcome.  Meanwhile,
    here are the names of files that matter produced by each Gendata
    program that matters.

        SCHEMAS.  The schemata created by this are of four kinds with names
    respectively of form PAT<nvnf>.<i>, APATH<nf>.<i>, APATR<nf>.<i>, and
    COV<nf>.<i>.  The PAT*.* are schemata of NV-by-NF measurement patterns W.
    The APATH*.* and APATR*.* are schemata of NF-by-NF pathweight matrices A,
    with "R" identifying that this path structure is subdiagonal and hence
    recursive whereas "H" flags that it is not subdiagonal and may well be
    nonrecursive.  And the COV*.* are standardized (unit-diagonal) covariance
    schemata.  Each <nf> and <nvnf> is a digit string identifying the
    schema's dimensionality.  And each <i> is an index in the list of this
    basename's repetitions.  Since these schema files are binary, they cannot
    be read in a text editor; however, for each basename, you can instruct
    SCHEMAS to print all the currently saved schemata named <base>.* to a
    textfile named <base>.LST for your contemplative convenience.

        SEMCOV.  A successful run of this writes just one output file,
    written in ASCII so you can easily inspect its contents.  Its name
    has form KOV<nf>.<ext> with <nf> identifying the number NF of this
    simulation's common factors.  Its extension ends in an index, but
    you choose <ext>'s beginning (one or two letters) to be whatever is
    mnemonically best for you.  And later you can use DOS command RENAME
    to extend basename KOV<nf> by three or four letters to increase its
    distinctive recognizability.  (But don't disturb the starting KOV<nf>,
    else GENSCOR won't recognize it.)  A KOV-file's primary content, written
    after a couple of text lines that you can edit so long as you don't
    start any header line with a numeral, is an exo-root correlation matrix
    Cuu on which GENSCOR can build SEM-focused simulations having <nf> common
    factors.  (Its format is Hyball-standard, described in this package's
    COVFMT.TXT document.)  Following that in the KOV-file is information
    about this Cuu's production which you can delete or edit at whim
    though it will be prudent to save this information in some form.  (That
    is, KOV-files are also logfiles.)  Should you wish, you can collect
    KOV-files to be available when GENSCOR calls for Cuu candidates.  Don't
    forget that any Cuu in this KOV-collection can be used repeatedly for
    GENSCOR to conjoin with varied choices of causal-path and measurement
    patterns.

        GENSCOR.  This produces two or, if you choose, three payout files
    respectively named <base>.COV, <base>.LOG, and <base>.POP.  Their common
    basename can be whatever works best for you, though <base> should be at
    most six characters to let expansions thereof be basenames of ensuing
    files you derive from these.  <Base>.COV contains little more than
    the simulated data covariances Cyy, in Hyball-standard format (see
    COVFMT.TXT), which presumably you want either for testing source
    recovery by one or another EFA/SEM solution procedure or to be the Cuu
    in a larger simulation having SE-structured exo-roots; details on its
    production are in textfile <base>.LOG.  (To use <base>.COV as exo-root
    correlations, you need to rename or copy this to KOV<nf><end>.<ext> for
    the appropriate NF and your choice of <end> and <ext>.)  Finally, if you
    have called for scores on these data variables in a size-NS population
    (your choice of NS) satisfying your stipulated model structure, <base>.POP
    is a binary file containing NS score records comprising each subject's
    scores on the NY data variables followed by its scores on the NF common
    factors.  (If wanted GENSCOR can easily be upgraded to include Unique-
    factor scores in its POP-files, albeit those can already be derived
    as U = Y - WF from information now provided using the factor pattern
    W reported in <base>.LOG.)  Each <base>.POP file also includes all the
    <base>.LOG-reported production information, which MORSCOR can recover
    from it should loss of your original <base>.LOG motivate that.

        MORSCOR. Because the binary-coded score populations in POP-files may
    be troublesome for your own software to read (I'll be happy to send you
    details on how these are written if you want your system to upload them
    directly, or can send you a translation program if you will advise me
    what data-input coding works for you), program MORSCOR in this package
    transcribes these into Hydata-standard ASCII format.  (See documentation
    file HYSTAND.TXT, included in this package, for details on that.)  MORSCOR
    can do other things with these raw data as well -- the program will advise
    you of its current avails when you run it -- and, when conjoined with the
    Hydata-supplement programs also included in this package with documentation
    in HYSTAND.TXT, gives you considerable versatility at manipulating these
    scores.

       At present, MORSCOR's only options beyond retrieving the dataset's
       production specifications are (a) partitioning the score records
       in a received POP-file into your choice of NG groups of equal size
       distinguished by an index prepended to the record IDs, followed if
       you wish by (b) writing these groups' records in NG separate files
       which can then be manipulated by the Hydata-supplement programs in
       group-specific ways.  But MORSCOR could also be made to sort records
       into groups by user-selected criteria, once user feedback identifies
       what selection criteria might be wanted.  Note, however, that many
       score manipulations and record sortings can already be done by the
       Hydata-supplement programs RESCORE/SELECT/MERGE included here.

    Because MORSCOR's output scorefiles are Hydata-standard, their names
    have form <base>.D<i> wherein i is a sequential index and <base> is
    free for you to choose with the caveat that if you intend this for
    analysis by programs in the Hyball package, <base> should be no longer
    than six characters to leave room for indices appended thereto in the
    names of files that ensue from Hyball factoring of D-files.


    Bootstrapping

        Should you wish to study the sampling error in SE solutions, Gendata
    affords two ways to produce bootstrap-sample correlations.  One is
    program SAMPLCOV, which uploads a matrix construed to contain population
    correlations for NV variables and, using the Odell & Feiveson algorithm,
    simulates computing these correlations for size-NS random samples of
    this population under presumption that the population distribution is
    multi-Normal.  Each of these O/F-sampled COVs is written to a separate
    ASCII file whose name conveys considerable information about its content,
    with indexing of repetitions.  The Gendata package currently affords no
    digests of this raw bootstrap information, but that can be changed.
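The idea generally credited to Odell & Feiveson is to draw sample
    covariance matrices directly from the Wishart distribution, with no
    individual scores generated.  Here is a sketch of that idea via the
    Bartlett decomposition (hypothetical function, not SAMPLCOV's code):

```python
import numpy as np

def sample_cov(C, ns, rng):
    """One simulated size-ns sample covariance matrix for a
    multi-Normal population with covariance C, via the Bartlett
    decomposition of the Wishart distribution: no scores generated."""
    nv = C.shape[0]
    L = np.linalg.cholesky(C)
    T = np.zeros((nv, nv))
    for i in range(nv):
        T[i, i] = np.sqrt(rng.chisquare(ns - 1 - i))   # chi2 on diagonal
        T[i, :i] = rng.standard_normal(i)              # N(0,1) below
    W = T @ T.T                     # Wishart(ns-1, I)
    return (L @ W @ L.T) / (ns - 1)

rng = np.random.default_rng(0)
C = np.array([[1.0, 0.5], [0.5, 1.0]])
S = sample_cov(C, ns=500, rng=rng)
```

Averaged over many draws, these sample covariances recover the population C.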

        Alternatively, when program GENDATA in the Hyball package computes
    correlations for a received rawscore array for NS subjects, it also avails
    producing up to 156 size-NS bootstrap (random-with-replacement) samples
    from the Hydata-standard datafile whose correlations it is computing and
    returning correlations for each of these samples under filenames whose
    distinctive flags persist through subsequent separate-but-parallel Hyball
    analysis of these data until final results are brought together for
    comparison by program BOOTSUMM.  Accordingly, when you elicit a datafile
    from GENSCOR you can bootstrap-sample it by HYDATA to see how sampling
    noise affects your ensuing SE-analysis thereof.  (Before BOOTSUMM can
    collect and summarize these sampling results for you, however, I will
    need to be apprised of the format in which your SE program reports its
    solutions.  For that reason, BOOTSUMM code is not included in the
    Gendata package but will be responsive to request.)


    Hyblock: An EFA counterpart of SEM

        Hypotheses tested by SEM analysis typically envision a recursively
    ordered set of source factors F, each of which is manifested in a rather
    small number of imperfectly reliable but otherwise mostly pure indicators.
    Suppose that our conception of such a model is relaxed by replacing each
    f<i> in F by a set F<i> of factors whose dimensionality is initially
    unspecified, while the skimpy set of f<i>'s indicators is replaced by a
    more generous set (meager if unavoidable but preferably abundant) whose
    most immediate sources are mainly factors in F<i> but whose diversified
    pattern of loadings on these factors is unconstrained while additional
    direct influences from factors recursively antecedent to F<i> are also
    admissible.  Hyblock, a procedure involving several programs in the
    Hyball distribution package, solves the covariances within and between
    these blocks of indicators for (a) the corresponding factor blocks'
    dimensionalities, (b) the indicators' loadings on these factors located
    by rotation to simple structure, and (c) regression estimates of
    pathweights in the factors' posited dependency structure.  Like all EFA
    recoveries, what Hyblock can so disclose when the underlying causality
    cleanly fits the model's idealized suppositions may well be distorted
    if not misdirected by a reality of greater complexity.  But inductivist
    discovery doesn't expect to get everything exactly right; what matters
    is discerning interpretable data patterns that recur with analyzable
    variation of detail over many replications or, better, over systematically
    varied near-replications.  Its philosophy of science seeks validation
    of its provisional conclusions not in pass/fail grading of statistical
    hypotheses but in establishing reliable recurrences.  No Hyblock study
    should aspire to the levels of finality allegedly secured by SEM's
    sampling-theoretic rituals.  But a case can be made that it can
    informatively rough in the larger pictures which SEM orthodoxy glimpses
    only through the Popperian peepholes of simplistic causal-path hypotheses
    largely antecedent to evidence.  For more on Hyblock, see my 1997 SMEP
    handout on this included in this package under the name SMEPSHOW.TXT.


    Formatting covariance transfers

        When one program exports material to be processed by another, format
    compatibility can become a concern.  Transfers within a program package
    should never fail on that account; but making Gendata's simulation
    products accessible to a diversity of SE programs is rather more
    problematic.  Whether yours can upload raw scores written by GENSCOR
    doesn't much matter if you don't want to work with those at all; and
    even if you do, most operations on them that are likely to interest
    you can be executed by the Hydata-supplement programs included here,
    computation of correlations in particular.  The primary challenge is
    ensuring that your SE program -- henceforth call this "Yours" -- can
    read GENSCOR's covariance creations.

        Because all Gendata covariance outputs are written in ASCII, these
    can be edited manually to satisfy Yours.  And that may prove necessary to
    at least a modest extent.  But utility program COVFORM here allows you to
    set formatting parameters that should reduce need for that to a painless
    minimum.  COV-files written by programs in the Gendata and Hyball packages
    standardly begin with one or two documentation lines, next a line giving
    the number NV of variables followed by the number, NV*(NV+1)/2 or NV*NV,
    of covariances to be read, and below that the list of these covariances.
    (For more details, see COVCODE.TXT in this package.)  You can easily
    delete or revise the header text if Yours so requires, and the same is
    true of how you communicate to Yours the size of the covariance array
    to be read.  But if at all possible you want GENSCOR's layout of the
    covariances themselves to be Yours-readable without need for editing.
    Utility program COVFORM enables this by letting you specify:  a) The
    maximum number of characters in a line that Yours can read or, if that is
    quite large, a smaller maximum you choose to impose.  (COVFORM currently
    defaults this to 160 characters, which may exceed the line length Yours
    can read.)  b) Whether Yours allows covariance matrices to be entered
    as lower triangles (in which case the array for NV variables contains
    NV*(NV+1)/2 entries), or requires instead that the full NV-by-NV symmetric
    array be entered.  c) Whether Yours allows the stream of covariance
    entries to have line breaks only where demanded by its line-length limit,
    or whether it also wants a line break at end of each matrix row.  (An
    example below will clarify this question.)  d) Finally -- what shouldn't
    be a Yours problem but an important option nevertheless -- the decimal
    accuracy you want your created correlations to retain.  For your choice
    of integer ND from 2 to 5 (less than 2 is foolish while more than 5 or
    arguably 4 is pointless), GENSCOR covariance outputs are rounded to ND
    decimals and written to their transfer files with decimal point omitted.
    Presumably, it will be routine for Yours to restore the decimal points:
    It needs merely to treat the uploaded Cov-matrix as containing raw
    covariances that still need standardization as correlations.

  -----------------------------------------------------------------------------
   Here is an example of interplay among COVFORM parameter options (a) - (c).
   Each array is a fake correlation matrix for NV=7 variables, printed under
   (b)'s lower-triangle option with ND=2 accuracy omitting the decimal point.
   In the left and middle panels, option (c) stipulates a line break after
   completion of the entries for each matrix row: option (a)'s line limit
   allows completion of each row's entries on one physical line, whereas in
   the middle panel, rows exceeding the length limit are wrapped.  In the
   rightmost panel the matrix rows are concatenated as an aspirant one-line
   string wrapped only where line limit requires.  Chances are that Yours
   wants only the lower-triangle in a single string that can be broken
   wherever convenient.  But be ready to discover otherwise.

   LLEN (line-length) > 28     | LLEN = 20, row wrap  | LLEN = 24, minimal wrap
   ----------------------------+----------------------+------------------------
   100                         | 100                  | 100  21 100  31  32 100
    21 100                     |  21 100              |  41  42  43 100  51  52
    31  32 100                 |  31  32 100          |  53  54 100  61  62  63
    41  42  43 100             |  41  42  43 100      |  64  65 100  71  72  73
    51  52  53  54 100         |  51  52  53  54 100  |  74  75  76 100
    61  62  63  64  65 100     |  61  62  63  64  65  |
    71  72  73  74  75  76 100 | 100                  |
                               |  71  72  73  74  75  |
                               |  76 100              |
  -----------------------------------------------------------------------------
  When you run COVFORM, it writes your choice of these parameters to a 4-line
  ASCII file named KOVFMT (no extension) which other programs in this package
  will read without prompting if they want this information.  Should you want
  to change these settings you can run COVFORM again or simply adjust KOVFMT
  in your favorite text editor.
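The layout options (a) - (d) can be illustrated with a short Python sketch
  (a hypothetical reimplementation for illustration, not COVFORM itself):

```python
def format_cov(C, nd=2, llen=76, lower=True, row_breaks=True):
    """Lay out covariances per COVFORM-style options: entries rounded
    to nd decimals with the decimal point omitted (option d), lower
    triangle or full matrix (option b), optional break at the end of
    each matrix row (option c), lines capped at llen chars (option a)."""
    nv = len(C)
    width = nd + 2                     # room for lead digit(s) and nd decimals
    lines, line = [], ""
    for i in range(nv):
        cols = range(i + 1) if lower else range(nv)
        for j in cols:
            entry = str(round(C[i][j] * 10 ** nd)).rjust(width)
            if line and len(line) + len(entry) > llen:
                lines.append(line)     # wrap forced by line-length limit
                line = ""
            line += entry
        if row_breaks and line:
            lines.append(line)         # break at end of each matrix row
            line = ""
    if line:
        lines.append(line)
    return lines
```

With the 7-variable fake matrix of the boxed example, llen > 28 with row
  breaks reproduces the left panel, and llen = 24 without row breaks the
  rightmost one.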


    Twisted score distributions.

        When you accept GENSCOR's option to create a population of joint
    scores on simulated data variables and their common factors, you also
    acquire some control over the shape of these variables' distributions.
    For most purposes you will probably want these to be a size-NS sample
    (your choice of NS) whose expectation is multivariate Normal.  But in
    case you care to study how assorted departures from Normality may or
    may not affect source recovery by your favored SE procedures, or feel
    like playing with non-Normality just out of curiosity, GENSCOR avails a
    "twist" procedure that can impose varied degrees of skew and kurtosis
    on the raw axes of random score production.  When you elect this option,
    GENSCOR reminds you how to set Twist parameters by displaying the
    following information panel:

 ------------------------------------------------------------------------------
  NonNormal random-score distributions here are shaped by parameters RA/RB/P.
  Each raw score S derives from the output Z of a Normal random generator by
  giving S the sign of Z while the size (magnitude) of Z determines the size
  of S under a transformation of form

                  |S| = phi(|Z|) =def (|Z| + 1)^r - 1

  wherein parameter r is a positive real exponent.  With r fixed, function
  phi is a monotone increasing function on non-negative real numbers that is
  concave (positively accelerated) when r>1, an Identity transform
  (phi(x) = x) when r=1, and convex (negatively accelerated) when r<1.
  Accordingly, the distribution of scores generated under fixed r is Normal
  if r=1, and becomes increasingly leptokurtic (long slender tails,
  kurtosis > 3), or platykurtic (stubby tails, kurtosis < 3), with r taken
  increasingly larger, or smaller, than 1.

  Further, GENSCOR allows a created distribution's initial Zs to be shaped
  under a different r when Z is positive than when Z is negative.  Difference
  in r between the two distribution sides creates skew which can be quite
  large if one side is leptokurtic while the other is platykurtic.

    Production of nonNormal scores is controlled by your choice of RA/RB/P
    separately for the two groups of factors, common and unique.  For each
    factor f in each group, exponent r is set at a value r1 for one side
    of f's distribution and at r2 for the other. (r1 = r2 is symmetric.)
    RA and RB are allowed to vary from leptokurtic 6.0 through Normal 1.0
    down to platykurtic .167 (1/6) in magnitude; however, you may also enter
    RA and/or RB with negative sign to randomize as follows:

        Setting                 Action in factor group
  1. pos RA, pos RB    Fix r1 = RA and r2 = RB (symmetric if RA = RB)
  2. pos RA, neg RB    Fix r1 = RA, randomize r2 between RA and RB (asymmetric)
  3. neg RA, pos RB    Randomize r1 = r2 between RA and RB (symmetric)
  4. neg RA, neg RB    Randomize both r1 and r2 between RA and RB

   Finally you can choose P between 0 and 1 to be the probability that skewed
   distributions in this group are skewed positively.  (This option has no
   effect on symmetric distributions.)  And you will be allowed to waive
   "central smoothing", which is gradedly random sign reversal of scores
   near the median to mitigate density discontinuities at the median created
   by appreciable difference between r1 and r2.
 ------------------------------------------------------------------------------

    Note, however, that the twist so imposed on raw production scores as they
    emerge from the random generator will generally be shrunk toward Normality
    by subsequent compositing of the initial axes.  This can occur in two ways.
    The first, largely benign even when not evaded by tolerance for sampling
    noise in the unique-factor covariances, is rotation of production axes to
    strict orthogonality.  Although this can strongly push toward Normality,
    GENSCOR's method of orthogonalization disturbs initial extremities of
    skew/kurtosis only modestly unless sample size is rather small.  But
    rotating the common-factor precursors to achieve an assigned Cff can
    substantially degrade those.  So if you want to study effects of non-
    Normality in the common factors, you should try to keep the off-diagonal
    elements of Cff mostly negligible.  Be advised also that whereas the
    LOG-file deposited by a score-creating GENSCOR run gives only skew and
    kurtosis for its finished common and unique factor axes, the report on
    this that MORSCOR can retrieve contains the initial production axes'
    skew/kurtosis as well.
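The twist's effect on marginal moments is easy to demonstrate, assuming
    the panel's transform (S takes Z's sign, |S| = (|Z| + 1)^r - 1, with r
    chosen per side of zero); this is an illustrative sketch, not GENSCOR's
    generator:

```python
import numpy as np

def twist(z, r_pos, r_neg):
    """Twist-transform Normal deviates: S keeps Z's sign, and
    |S| = (|Z| + 1)**r - 1 with a side-specific exponent r."""
    r = np.where(z >= 0, r_pos, r_neg)
    return np.sign(z) * ((np.abs(z) + 1.0) ** r - 1.0)

def kurtosis(x):
    """Fourth standardized moment (Normal = 3)."""
    x = x - x.mean()
    return (x ** 4).mean() / (x ** 2).mean() ** 2

rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)

k_lep = kurtosis(twist(z, 3.0, 3.0))   # r > 1 both sides: leptokurtic
k_pla = kurtosis(twist(z, 0.5, 0.5))   # r < 1 both sides: platykurtic
```

Mixing a leptokurtic side with a platykurtic one (e.g. twist(z, 3.0, 0.5))
    yields the strong positive skew the panel describes.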

