STATISTICA







STATISTICA BASIC Program acf.stb

{ Lienert's and Krauth's analysis of configuration frequencies (ACF), in German "Konfigurationsfrequenzanalyse (KFA)"
Dr. Stefan Funke, Braunschweig, Germany (funke@ibm.net)
11/98
(nothing is guaranteed - use the program at your own risk)
Literature: Krauth/Lienert "Die Konfigurationsfrequenzanalyse (KFA) und ihre Anwendung in Psychologie und Medizin", Beltz Psychologie Verlagsunion, Reprint 1995

The ACF is a heuristic tool to analyze large sets of dichotomous (boolean) data, i.e. each variable can have only two states. If more states are desired, dummy coding can be used but makes interpretation harder. Each set of variable states is called a configuration. For example: male (1), smoker (1),no cancer (0) is one configuration and male (1), smoker (1), cancer (1) another. Each variable can contain the states 0 and 1 (false and true) or 100 and 101. The numbers 100 and 101 are automatically assigned by STATISTICA when text labels are entered in a column of empty cells. An additional variable is needed for the count of cases which conform with one type of configuration. Missing configurations in input data are allowed. Their count is assumed to be zero. Missing cell counts are not. Input data must consist of at least two colums which denote the configurations. and one column containing the cell count. See sample data file.

The expected cell frequencies for a given configuration and its chi squared is calculated and the configurations are sorted in descending order according to their chi squared (or their n is required). A key for the codes of all configuations is provided as well as the overall p and the p of the configuration (if ACF is to be used as a true test of significance it must be Bonferroni-adjusted, so the adjusted p is displayed).

This version does not display empty configurations in order to keep the amount of output within reasonable limits.

The program asks for the variables which contain the configurations and the variable which contains the count. Subsequently the codes for true and false can be freely selected. If only two different character strings are used for each configurations the defaults are appropriate. You can choose if you want the output to be sorted on chi squared or n (usually the former is needed) and you can specify if you want the p to be displayed. }

RandomAccess;
NoDataFileVariableNames; 
 
{ Auxiliary procedures } 
  
{ convert num to a binary number of bits bits. The result is returned as a string of +'s 
  and -'s } 
 
sub num2binstr(num, bits,byref binstr$) 
   begin 
    maxval:=2^bits; 
    x:=num; 
    binstr$:=''; 
    for i:=bits downto 1 do 
      begin 
         divisor:=2^(i-1); 
         res1:=x mod divisor; 
         res2:=x div divisor; 
         if (res2>0) then binstr$:=binstr$+'+' else binstr$:=binstr$+'.'; 
         x:=res1; 
      end;   
   end; 
 
{ convert num to a binary number of bits bits. The result is returned as an array of 1's 
  and 0's. bitarry[1] is the least significant bit, i.e. 2^0, because indices start from 
  1 in Statistica Basic } 
 
sub num2binarray(num, bits,bitarray) 
   begin 
    dim bitarray(bits); 
    maxval:=2^bits; 
    x:=num; 
    idx:=1; 
    for i:=bits downto 1 do 
      begin 
         divisor:=2^(i-1); 
         res1:=x mod divisor; 
         res2:=x div divisor; 
         if (res2>0) then bitarray(idx):=1 else bitarray(idx):=0; 
         idx:=idx+1; 
         x:=res1; 
      end;   
   end; 
   
  
{ Main code } 
  
ReDim xvar(NVars),yvar(NVars);  
 
SelectVariables2('KFAlong: Select variables', 2, Nvars-1, xvar, count1, 'Configuration 
variables',   1, 1, yvar, count2, 'Count'); 
if ((count1=0) or (count2=0)) then  abort; { skip if no variables were selected }  
 
redim truthval(4); 
truthval(1):=0; { numeric false } 
truthval(2):=1; { numeric true} 
truthval(3):=100; { text false } 
truthval(4):=101; { text true } 
res:=DisplayNumericInputBox ('Select codes', 'numeric false|numeric true|text false|text 
true',truthval); 
if res=0 then abort; 
numfalse:=truthval(1); 
numtrue:=truthval(2); 
textfalse:=truthval(3); 
texttrue:=truthval(4); 
 
ordering:=DisplayButtonBox ('Select order of output',  'Sort by chi squared|sort by 
number of cases'); 
if ordering=0 then abort; 
 
outmode:=DisplayButtonBox ('Select amount of output',  'Show p|short'); 
if outmode=0 then abort; 
 
ReDim validcase(NCases); 
validcnt:=0; 
for i:=1 to NCases do  
   begin 
       validcase(i):=1; 
       if not(Valid(Data(i,YVar(1)))) then  
           begin 
             validcase(i):=0; 
           end; 
       for j:=1 to count1 do 
		begin 
              dataitem:=data(i,XVar(j)); 
              if not(valid(dataitem)) then  
                begin 
                   validcase(i):=0; 
                end;       
		end; 
       if (validcase(i)=1) then validcnt:=validcnt+1; 
   end; 
 
n_of_cfg:=2^count1; { number of configurations  } 
ReDim f(n_of_cfg),nexpected(n_of_cfg); 
ReDim xsq(n_of_cfg),config(n_of_cfg),q(n_of_cfg),b(n_of_cfg);; 
ReDim colsum(2,count1); 
ReDim res1(n_of_cfg,6),res2(6); 
ReDim configur (n_of_cfg); 
 
 
n:=0; 
for i:=1 to count1 do  
   begin 
      for j:=1 to 2 do colsum(j,i):=0; 
   end; 
 
for i:=1 to n_of_cfg do  
   begin 
    f(i):=0; 
    nexpected(i):=0; 
    xsq(i):=0;   
    config(i):=0; 
    q(i):=0; 
   end; 
 
 
 
 
{1st pass: calculate cell freqs and column sums  
 Iterates over all entries in the database. Counts of multiple 
 entries with identical configurations are totalled 
} 
 
for i:=1 to NCases do  
   begin 
       if (validcase(i)=0) then goto skip; 
       n:=n+Data(i,Yvar(1)); 
       cfg:=0; 
       digit:=0; 
       if not(Valid(Data(i,YVar(1)))) then  
         begin 
           l$:=str(i,8,2); 
           DisplayMessageBox(MB_OK,'Aborting!', 
             'Empty cell count in case '+l$); 
           Exit; 
         end; 
       fcell:=Data(i,YVar(1)); 
       for j:=count1 downto 1 do { lsb belongs to rightmost variable } 
           begin 
               idx:=XVar(j);  
               bit:=Missing; 
               { Statistica coding for text vars starts with 100 } 
               if Data(i,idx)=textfalse then bit:=0; 
               if Data(i,idx)=texttrue then bit:=1;             
               if Data(i,idx)=numfalse then bit:=0; 
               if Data(i,idx)=numtrue then bit:=1; 
               if not(Valid(bit)) then 
                   begin 
                     n$:=str(Data(i,idx),8,2); 
                     l$:=str(i,8,1); 
                     v$:=VarName(idx); 
                     s$:='Flag ('+n$+ 
                          ') C:'+l$+' V:'+v$; 
                     errcond:=DisplayButtonBox (s$,   
                        'treat as negative|treat as positive'); 
                     if errcond=0 then exit; 
			   if errcond=1 then bit:=0; 
		         if errcond=2 then bit:=1;                     
                   end; 
               cfg:=cfg+(bit*2^digit); 
               digit:=digit+1;    
               colsum(bit+1,j):=colsum(bit+1,j)+fcell; { sum of columns }   
               { bit=0 -> 1st index=1, bit=1 -> 1st index=2 } 
           end; 
           f(cfg+1):=f(cfg+1)+fcell; { n of repetitive entries with identical            
                                       configuration is added }                          
    skip: 
    end; 
 
chisqsum:=0; 
 
{ 2nd pass: calculate expected values and chi squared. 
  Iterates over all possible congigurations 
} 
 
 
nzcount:=0; 
for i:=1 to n_of_cfg do  
   begin 
       digit:=0; 
       expt:=1; 
       cfg:=i-1; 
       num2binarray(cfg,count1,b); 
       for j:=count1 downto 1 do { lsb belongs to the righmost variable } 
          begin 
            bit:=b(j)+1; 
            prod:=colsum(bit,j); 
            expt:=expt*prod; 
          end; 
        expt:=expt/n^(count1-1); 
        fcell:=f(i); 
        nexpected(i):=expt; 
        chisq:=0; 
        chisq:=(fcell-expt)^2/expt; 
        xsq(i):=chisq; 
        config(i):=i; 
        num:=Abs(fcell-expt); 
        if (expt>n-expt) then den:=expt else den:=n-expt; 
        q(i):=num/den; 
        chisqsum:=chisqsum+chisq; 
    end; 
df:=2^count1-count1-1; 
 
if ordering=1 then VectorDualSort(xsq,config,SORT_DESCENDING); { sort by chi squared } 
if ordering=2 then VectorDualSort(f,config,SORT_DESCENDING); { sort by number of cases } 
 
rowstr$:=''; 
colstr$:=''; 
varstr$:=''; 
 
ReDim VarShow(count1,3); 
vh:=NewScrollsheet (count1,3, varshow, 'Variables in config.', ?Zeilennamen$, 'Variable 
name|+|.'); 
ScrollsheetSetColumnWidth (vh, 12, 3); 
ScrollsheetSetColumnName (vh, 1,'variables', '(left to right)'); 
ScrollsheetSetColumnFormat (vh, 2, SCF_INTEGER, 8); 
ScrollsheetSetColumnFormat (vh, 3, SCF_INTEGER, 8); 
 
for i:=1 to count1 do  
   begin 
     varstr$:=varstr$+VarName(XVar(i))+' '; 
     s0$:=GetText (XVar(i), 0); 
     s100$:=GetText (XVar(i), 100); 
     s1$:=GetText (XVar(i), 1); 
     s101$:=GetText (XVar(i), 101); 
     l0:=len(s0$); 
     l100:=len(s100$); 
     l1:=len(s1$); 
     l101:=len(s101$); 
     if ((l1=0) and (l101=0)) then  
       ScrollsheetSetTextValue (vh, i, 2,str(1,1,0)); 
     if ((l0=0) and (l100=0)) then  
       ScrollsheetSetTextValue (vh, i, 3,str(0,1,0)); 
     ScrollsheetSetTextValue (vh, i, 1,VarName(XVar(i))); 
     if (l0>0) then  
       ScrollsheetSetTextValue (vh, i, 3,s0$); 
     if (l100>0) then  
       ScrollsheetSetTextValue (vh, i, 3,s100$); 
     if (l1>0) then  
       ScrollsheetSetTextValue (vh, i, 2,s1$); 
     if (l101>0) then  
       ScrollsheetSetTextValue (vh, i, 2,s101$); 
   end; 
 
growstr$:=''; 
maxe:=0; maxf:=0; 
for i:=1 to n_of_cfg do 
     begin 
     	idx:=config(i); 
      if ordering=1 then checkval:=f(idx); 
      if ordering=2 then checkval:=f(i); 
     	if (checkval>0) then 
		begin 
                  nzcount:=nzcount+1; 
                  growstr$:=growstr$+str(nzcount,3,0)+'|'; 
     			num2binstr(idx-1,count1,s$);                
                  res1(nzcount,1):=val(s$); 
			if ordering =1 then { sort by chi squared } 
			begin 
                     res1(nzcount,2):=f(idx); { x(i,j) } 
  	               res1(nzcount,4):=xsq(i); { chi squared } 
                     res1(nzcount,5):=1-ichi2(xsq(i),1); { p }                  
			end; 
 	            if ordering =2 then { sort by number of cases } 
			begin 
                     res1(nzcount,2):=f(i); { x(i,j) } 
  	               res1(nzcount,4):=xsq(idx); { chi squared } 
	               res1(nzcount,5):=1-ichi2(xsq(idx),1); { p }                     
			end; 
     			res1(nzcount,3):=nexpected(idx); { expected }	 
     			res1(nzcount,6):=q(idx); { Q } 
                  
     		end; 
   end; 
 
plim:=0.05/n_of_cfg; 
if outmode=1 then 
begin 
   colstr$:='config|n|expected|chiČ|p|Q'; 
   h1:=NewScrollsheet (nzcount, 6, res1, varstr$, rowstr$, colstr$); 
   for i:=1 to nzcount do 
	begin 
         if (res1(i,5)<plim) then ScrollsheetSetHilite (h1, i, 5, 1); 
	end 
end; 
if outmode=2 then 
begin 
   colstr$:='config|n|expected|chiČ'; 
   h1:=NewScrollsheet (nzcount, 4, res1, varstr$, rowstr$, colstr$); 
end; 
 
colwidth:=count1+6; 
ScrollsheetSetColumnWidth (h1, colwidth, -1); 
ScrollsheetSetColumnFormat (h1,2, SCF_INTEGER, 8); 
ScrollsheetSetTitle (h1, 1, 'KFA (analysis of configuration frequency'); 
for i:=1 to nzcount do 
	begin 
	  if res1(i,2)>res1(i,3) then ScrollsheetSetHilite (h1, i, 2, 1); 
	end; 
 
nzcnt:=0; 
for i:=1 to n_of_cfg do 
     begin 
     	idx:=config(i); 
	idx:=config(i); 
      if ordering=1 then checkval:=f(idx); 
      if ordering=2 then checkval:=f(i); 
     	if (checkval>0) then 
		begin 
                  nzcnt:=nzcnt+1;                  
     			num2binstr(idx-1,count1,s$);                        
                  ScrollsheetSetTextValue (h1, nzcnt, 1, s$);   
 
            end; 
   end; 
 
res2(1):=chisqsum; 
res2(2):=df; 
res2(3):=1-ichi2(chisqsum,df);  
res2(4):=plim; 
res2(5):=n_of_cfg; 
res2(6):=n; 
rowstr$:='Total chiČ|Degrees of freedom|p|p limit for configs (0.05)|Number of possible 
configurations|Sum of observations'; 
h2:=NewScrollsheet(6,1, res2, 'Overall statistics', rowstr$, 'Parameters'); 
ScrollsheetSetColumnWidth (h2, 12, -1);
Back to List of Programs



[StatSoft]
2300 East 14th Street, Tulsa, OK 74104
Phone: (918) 749-1119; Fax: (918) 749-2217

[StatSoft]e-mail: info@statsoft.com

©Copyright StatSoft, Inc., 1984-2004.
StatSoft, StatSoft logo, STATISTICA, SEWSS, SEDAS, Data Miner, SEPATH and GTrees are trademarks of StatSoft, Inc.