***************************************************************************;
* CONTENT OF THIS FILE:
* 1. AUTHOR-REFERENCE
* 2. HOW TO USE THE SAS-MACRO IDARMA
* 3. HOW IDARMA WORKS
* 4. WHAT KIND OF OUTPUT IS PRODUCED THROUGH IDARMA
* 5. RESTRICTION AND ASSUMPTION OF IDARMA
* 6. SUB MACRO-CALLS
* 7. SAS-MACRO: IDARMA
*************************************************************************;
* 1. AUTHOR-REFERENCE
* Dr. Joerg Michael Mueller
* University of Tuebingen, Germany
* Version 1 Date: 30.5.1996
* Version 2 Date: 29.7.2002 Improved: International Version
*    Address:
*
*    Private:                University:
*    Joerg Michael Mueller   Abteilung fuer Allgemeine und Angewandte Psychologie
*    Hechingerstrasse 21     Psychological Institute
*    72072 Tuebingen         Friedrichstrasse 21
*    Tel.: 07070-365 120     72072 Tuebingen
*                            Tel.: 07071 - 297 8353
*                            e-mail: jmmueller@uni-tubingen.de
**************************************************************************;
*************************************************************************;
* 2. HOW TO USE THE SAS-MACRO IDARMA
* Note: ONLY FOR SCIENTIFIC PURPOSES -- NO COMMERCIAL DISTRIBUTION -- PLEASE CITE THE AUTHOR IF YOU USE THIS SAS MACRO
* a. Your data must have the following shape (one row per person and time point):
*    I.    A numeric variable that contains the repeated measurement points
*          (passed to the macro as the parameter time).
*    II.   The measured variables (you are completely free in naming them).
*          Each repeated measurement appears as a new row and is identified
*          by the person variable and the time variable.
*    III.  A numeric variable that contains the person number
*          (passed to the macro as the parameter person).
*          The person numbers must run from 1 up to the total number of persons.
*    IV.   The data set must not contain any other variables.
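* b. You have to define a library (LIBNAME statement) that points to the directory
*    where your input and output data sets are stored, and pass its libref
*    as the last argument of the IDARMA macro call.
*    Note that the helper macros and the preparation steps in this file refer
*    to the libref IDARMA directly, so the libref should be named idarma:
*       libname idarma 'd:\directory';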
* c. You have to run THIS file in SAS before you submit the IDARMA macro call;
* d. The arguments listed below have to be filled in in the IDARMA macro call;
/*
        %idarma(indata, dataout, person, time, per_num, lib)

        indata  : name of your original SAS data set
        dataout : name of the residual SAS data set to be created
        person  : name of your person-identification variable (numeric)
        time    : name of your time or repeated-measurement variable (numeric)
        per_num : total number of persons in the input SAS data set
        lib     : your libref (see point b. above)
*/

**************************************************************************;
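/* Illustrative example, not part of the original macro. It sketches the expected
   input shape and a matching call under stated assumptions. The data set name MYDATA,
   the output name RESID, the variables SUBJ, DAY, MOOD and STRESS, the path and the
   simulated values are all hypothetical. The libref is called IDARMA because steps
   later in this file refer to that libref directly. Submit the statements only after
   this file has been run.

   libname idarma 'd:\directory';

   data idarma.mydata;
     call streaminit(1996);
     do subj = 1 to 2;                          * persons are numbered 1, 2, ... ;
       mood = 0;
       do day = 1 to 60;                        * one row per person and time point ;
         mood   = 0.5*mood + rand('normal');    * autocorrelated series ;
         stress = rand('normal');               * uncorrelated series ;
         output;
       end;
     end;
   run;

   %idarma(mydata, resid, subj, day, 2, idarma)
*/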
* 3. HOW IDARMA WORKS
  * IDARMA computes the ACF (autocorrelation function) and the PACF (partial autocorrelation
  * function) and stores them permanently in the Ev_dat file.
  * It then computes the standard error of the correlations and decides whether white noise
  * has been reached or not;
  * If there are significant spikes in the ACF or PACF, IDARMA decides whether to add an
  * AR or an MA lag component.
  * It fits the current ARMA model to the data and computes the residuals.
  * Then the iteration starts again until white noise is reached, up to a maximum
  * of a six-component ARMA model;

  * Attention: Some files are overwritten during the iteration process;
  * Therefore only the last (and important) file is saved under
  * the name you have given in the IDARMA macro call;
  ***********************************************************************;
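/* Illustrative sketch, not part of the original macro: the white-noise check described
   above, done by hand for one person and one variable. The data set MYDATA and the
   variables SUBJ and MOOD are hypothetical, and the data are assumed to be sorted by
   time. The thresholds (two standard errors for the ACF, .20 for the PACF) are the ones
   used further down in this file. CORR, PARTCORR, STDERR and LAG are variables of the
   OUTCOV= data set of PROC ARIMA.

   proc arima data=idarma.mydata(where=(subj=1));
     identify var=mood outcov=acf1 noprint;
   run;
   quit;

   data spikes;
     set acf1;
     if lag > 0;
     acf_spike  = (abs(corr) > 2*stderr);
     pacf_spike = (abs(partcorr) > 0.20);
     not_white  = (acf_spike or pacf_spike);
   run;

   Any row with not_white=1 marks a lag at which IDARMA would consider adding a component.
*/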
/*
  The following only explains the naming convention used inside the IDARMA macro.
  Note:  each name in IDARMA contains a person index and a variable index.
         For some names the iteration index is appended as well.
         Different kinds of files get distinctive first letters.
  How a name is composed in the program:
         first   a letter that indicates the kind of file, e.g. : C
         second  the person index                               : &vplf
         third   a separator                                    : ._
         fourth  the variable index                             : &varlf
         fifth   a separator                                    : ._
         sixth   the iteration index                            : &itrat
  In each separator the period ends the macro variable reference and the
  underscore is kept as text. This leads to a name of the form:
         unresolved name                                        : C&vplf._&varlf._&itrat
         a typical resolved name                                : C1_1_1
*/
/*
  IDARMA generates the following new data sets and variables:
   1. Data files   : starting with D (for Data)
         Content/Function: data
   2. Check files  : starting with C (for Checking)
         Content/Function: used for model identification
   3. Lag variable : starting with L (for Lag)
         Content/Function: defines the lag of the AR or MA component
   4. PQ variable  : starting with P (for P or Q)
         Content/Function: defines whether an AR or an MA component is added to the model next
   5. R file       : starting with R (for Residuals)
         Content/Function: a pre-whitened (uncorrelated) residual data set
                  which contains a variable with the name v&varlf.r (an r is appended)
   6. M file       : starting with M (for Model)
         Content/Function: contains the estimated ARMA coefficients
   7. V file       : starting with V (for Variance)
         Content/Function: contains the residual variance in a variable whose name starts with S
   8. S variable   : starting with S
         Content/Function: variance of the residual values (computed as a standard deviation)
   9. E file       : starting with E (for the decision about a significant AR or MA component)
         Content/Function: takes in the C files and decides about
                   1. significance (B&varlf._&itrat)
                   2. MA or AR process
                   3. the lag of the AR or MA component
  10. B variable   : starting with B
         Content/Function: defines the significance
  11. O variable   : cOrrelation                        : O
  12. A variable   : pArtial correlation                : A
  13. T variable   : sTandard error of the cOrrelation  : T
*/
/*
  The starting values of the iteration indices:
    vplf=1        (person index)
    varlf=1       (variable index)
    vari=v&varlf  (variable index that builds the first main loop)

    itrat=1       (iteration index in ARMA model fitting)
    itratp=2      (iteration index + 1)
    itratm=0      (iteration index - 1)
    The two previous values are necessary to avoid a second SAS macro level.

  plag stands for p (autoregressive component) and lag,
  qlag stands for q (moving-average component) and lag:
    plag1=0  (maximum of 3 AR components)
    plag2=0
    plag3=0
    qlag1=0  (maximum of 3 MA components)
    qlag2=0
    qlag3=0
*/
  **********************************************************************;
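/* Illustrative sketch, not part of the original macro: how a composed name resolves.
   The index values below are arbitrary examples. In each reference the period ends
   the macro variable name and the underscore is literal text.

   %let vplf=1;
   %let varlf=2;
   %let itrat=3;
   %put c&vplf._&varlf._&itrat;

   With these values the %put statement writes c1_2_3 to the log, that is the name of
   the check file of person 1, variable 2, iteration 3.
*/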

**************************************************************************;
* 4. WHAT KIND OF OUTPUT IS PRODUCED BY IDARMA?
* There are five kinds of output. The important ones are all stored permanently in
*    the directory of your libref:
* a. The residual data set that you named in the IDARMA macro call. It contains the
*    residuals of your original SAS data set. This is the most important file if you
*    want to work with data that are free of autocorrelation. Note, however, that the
*    intercept of each variable is retained.
* b. The Ev_dat file: It documents the iteration process with all ACF and PACF correlations
*    as well as the diagnosis made by IDARMA, that is, which AR or MA term was added in
*    which iteration for each person and item.
* c. The Mod_dat file: It documents the final ARMA model, the values of the ARMA terms and
*    the status, that is, whether the model has converged, for each person and item.
* d. The ordinary log file: SAS always produces this file. It can be enriched with information
*    about the SAS source code and with additional SAS statements by enabling the
*    helper macros mac_on and log_on (see 6. SUB MACRO-CALLS).
*    Attention: The log file contains the following message many times:
*    WARNING: A polynomial list was compressed to eliminate negative or redundant degrees.  Be sure
*    to check the model description to ensure that the proper model was employed.
*    YOU CAN IGNORE this message. It appears because in earlier model stages the ARMA model is
*    not yet sufficient to fit the data. It is a temporary message from SAS that can be ignored.
* e. The ordinary SAS output file. It contains no important information, but unfortunately it
*    cannot be avoided.
**************************************************************************;
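/* Illustrative sketch, not part of the original macro: inspecting the permanent output
   after a run. It assumes the libref IDARMA and the hypothetical residual data set name
   RESID from the macro call. MOD_DAT and EV_DAT are the names used by this file, and P
   is the person number that the macro adds to these data sets before combining them.

   proc print data=idarma.mod_dat;            * final ARMA terms, person by person ;
     by p;
   run;

   proc print data=idarma.ev_dat(obs=20);     * ACF and PACF history, first 20 rows ;
   run;

   proc means data=idarma.resid n mean std;   * residual series under the original names ;
   run;
*/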
* 5. RESTRICTIONS AND ASSUMPTIONS OF IDARMA
*       IDARMA assumes that your data have the shape described above.
*       IDARMA assumes stationarity (at least a constant mean and a constant variance over time).
*       There must be no missing data (if you have missing data, they can easily be
*       interpolated and imputed with PROC EXPAND).
*       IDARMA assumes that no more than three AR components and three
*       moving-average components are involved.
*       ATTENTION: In very rare cases the ARMA model does not converge. This may be caused by
*       non-stationarity. In these cases you have to remove those persons and
*       fit an appropriate model by hand.
*       In addition, the models are not optimized for parsimony.
*       Because the program is designed to yield uncorrelated data, the fitted model
*       sometimes contains more components than necessary.
*       Keep this in mind if you want to interpret an ARMA model.
**************************************************************************;
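/* Illustrative sketch, not part of the original macro: filling in missing values before
   the macro is called. It requires SAS/ETS. The data sets MYDATA and MYDATA_FULL and the
   variables SUBJ, DAY, MOOD and STRESS are hypothetical. PROC EXPAND interpolates with a
   cubic spline by default, and the EXTRAPOLATE option additionally fills missing values
   at the start or the end of a series.

   proc sort data=idarma.mydata;
     by subj day;
   run;

   proc expand data=idarma.mydata out=idarma.mydata_full extrapolate;
     by subj;
     id day;
     convert mood stress;
   run;

   The completed data set would then be passed to the macro as the indata argument.
*/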
/* 6. SUB MACRO-CALLS
   There are four groups of helper SAS macros.
   First: the macro %content2. It puts the variable names of your SAS data set
          into macro variables. It is needed by the helper macros %rename
          and %rename2 that follow. */

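                %macro content2(datain, charnum);
                /* charnum selects the variables: 1=numeric, 2=character, 3=all */
                %global vmax obs;
                /* obs = number of observations in the input data set */
                data d1; set idarma.&datain;
                call symput('obs',left(_n_)); run;
                proc contents data=idarma.&datain noprint out=d2; run;

                /* first pass: count the selected variables and store the count in vmax */
                %if &charnum=1 %then %do;
                data d3; set d2; where type=1;
                %end;
                %if &charnum=2 %then %do;
                data d3; set d2; where type=2;
                %end;
                %if &charnum=3 %then %do;
                data d3; set d2;
                %end;
                call symput('vmax',left(_n_)); run;

                /* vmax is known now, so the per-variable macro variables can be declared */
                %do i= 1 %to &vmax;
                %global v&i l&i t&i f&i;
                %end;

                /* second pass: store name, label, type and format of each variable */
                %if &charnum=1 %then %do;
                data d3; set d2; where type=1;
                %end;
                %if &charnum=2 %then %do;
                data d3; set d2; where type=2;
                %end;
                %if &charnum=3 %then %do;
                data d3; set d2;
                %end;
                call symput('v'||left(_n_), name);
                call symput('l'||left(_n_), label);
                call symput('t'||left(_n_), type);
                call symput('f'||left(_n_), format);
                run;

                /* remove all temporary data sets from the WORK library */
                proc datasets lib=work kill; run cancel;
                %mend content2;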

/* Second: the macros %rename and %rename2. They change the variable names to v1 ... vn.
           At the beginning of IDARMA the variable names are changed.
           At the end of IDARMA the variable names are changed back again. */

                %macro rename(indata, outdata, abbriv);
                                data idarma.&outdata; set idarma.&indata;
                %do i=1 %to &vmax;
                rename &&v&i=&abbriv.&i;
                %end;
                run;
                %mend rename;

                %macro rename2(lib, indata, outdata, abbriv);
                data &lib..&outdata; set &lib..&indata;
                %do i=1 %to &vmax;
                rename &abbriv.&i.r=&&v&i;
                %end;
                run;
                %mend rename2;
/* Third: the macro %means. It estimates the mean of each variable so that the mean
   can be added back to the variable after model fitting and residual computation. */
                %macro means (indata);
                %do j=1 %to &vmax;
                %global m&j s&j n&j e&j u&j o&j;
                %end;
                proc means data=idarma.&indata noprint;
                        var
                        %do i=1 %to &vmax;
                        &&v&i
                        %end;
                        ;
                output out=o1
                        mean=
                        %do i=1 %to &vmax;
                        m&i
                        %end;
                        std=
                        %do i=1 %to &vmax;
                        s&i
                        %end;
                        n=
                        %do i=1 %to &vmax;
                        n&i
                        %end;
                        stderr=
                        %do i=1 %to &vmax;
                        e&i
                        %end;
                        min=
                        %do i=1 %to &vmax;
                        u&i
                        %end;
                        max=
                        %do i=1 %to &vmax;
                        o&i
                        %end;
                        ;
                data _null_; set o1;
                %do i=1 %to &vmax;
                call symput("m&i",m&i);
                call symput("e&i",e&i);
                call symput("n&i",n&i);
                call symput("s&i",s&i);
                call symput("o&i",o&i);
                call symput("u&i",u&i);
                %end;
                run;
                %mend means;

/* Fourth: the macros %mac_on, %mac_off, %log_on and %log_off let you control the amount of log output.
           If you want to follow the working of IDARMA, switch the documenting options on
           by submitting %mac_on and %log_on.
           If you want IDARMA to run faster, switch the documenting options off
           by submitting %mac_off and %log_off. */

                %macro log_on;
                options source notes date;
                %mend log_on;
                %macro log_off;
                options nosource nonotes nodate;
                %mend log_off;
                %macro mac_on;
                options mprint merror mlogic serror symbolgen;
                %mend mac_on;
                %macro mac_off;
                options  nomlogic nomprint nomerror nosymbolgen nomrecall;
                %mend mac_off;
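
/* Illustrative sketch, not part of the original macro: a quiet run that restores the
   usual log output afterwards. The argument values of the IDARMA call are hypothetical.

   %mac_off
   %log_off
   %idarma(mydata, resid, subj, day, 2, idarma)
   %log_on
*/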

**************************************************************************;
* 7. SAS-MACRO: IDARMA
**************************************************************************;

%macro idarma(indata, dataout, person, time, per_num, lib);

* PROC ARIMA expects data without missing values. The following optional step, which is
* commented out, imputes them with a third-degree polynomial (cubic spline) via PROC EXPAND;
/*proc sort data=&lib..&indata; by &person; run;
proc expand data=&lib..&indata out=&lib..&indata; id &time; by &person; run;*/

* the first big loop over persons;
%do a=1 %to &per_num;
%let vplf=&a;

%let varlf=0;

/* the variable names are replaced by abbreviated names (v1, v2, ...) by the rename macro */
/* the data sets named prep are intermediate preparation steps */
        data idarma.prep2; set idarma.&indata(drop=&person &time); run;
        %content2(prep2,3)
        %means(prep2)
        %rename(prep2,prep3,v)
        data idarma.prep4; set idarma.&indata(Keep=&person &time);
        data idarma.prep5; merge idarma.prep4 idarma.prep3;

* the second big loop over variables;
* vmax is the maximum number of items;
%do i=1 %to &vmax;

%let varlf=%eval(&varlf+1);
%let varlfm=%eval(&varlf-1);
%let itrat=1;
%let itratp=%eval(&itrat+1);
%let itratm=%eval(&itrat-1);

%let vari=v&varlf;

%let plag1=0;
%let plag2=0;
%let plag3=0;
%let plag4=0;
%let qlag1=0;
%let qlag2=0;
%let qlag3=0;
%let qlag4=0;

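* reset the decision flags so that values from a previous variable or person are not reused;
%let go_on=0;
%let between=0;

* the preparation is finished and the first identification loop can begin;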

data d&vplf._&varlf; set idarma.prep5; if &person=&vplf;
proc sort; by &time;

proc arima data=d&vplf._&varlf;

  ***************************************************;
  * the data set c&vplf._&varlf._&itrat contains the acf and pacf;
  ***************************************************;
identify var=v&varlf outcov=c&vplf._&varlf._&itrat noprint;

  ***************************************************;
  * model (0,0)
  * It may make little sense to predict values from an ARMA model without components;
  * but it defines the starting point for the later description of the iteration history;
  ***************************************************;
estimate p=(0)  q=(0)
 outmodel=m&vplf._&varlf._&itrat noprint;
forecast out=r&vplf._&varlf._&itrat noprint;

  ******************************************************;
  * takes in the ACF and PACF and adds the lag variable for a later decision about a p or q component;
  ******************************************************;
data e&vplf._&varlf._&itrat; set c&vplf._&varlf._&itrat;
l&varlf._&itrat=lag;

*****************************************************************;
  * The values of the ACF and PACF are now transformed to absolute values;
  * B&varlf._&itrat=1 if a spike in the ACF is greater than two standard errors;
  * B&varlf._&itrat=1 if a spike in the PACF is greater than the fixed value r=.20;
  * The result is stored in the new variable B&varlf._&itrat (1 = significant, 2 = not significant);
  *****************************************************************;
abscorr=abs(corr);
abspartc=abs(partcorr);
if abscorr>2*stderr then B&varlf._&itrat=1; else B&varlf._&itrat=2;
if abspartc> .2  then B&varlf._&itrat=1;
rename corr=O&varlf._&itrat;
rename partcorr=A&varlf._&itrat;
rename stderr=T&varlf._&itrat;

*******************************************************************;
* if B&varlf._&itrat=1 then go_on=1 (yes) and if the spike is at lag 1 then between=1;
*******************************************************************;
if B&varlf._&itrat=1 then call symput('go_on',1);
if B&varlf._&itrat=1 and l&varlf._&itrat=1 then call symput('between',1);
run;

        %macro combine;
          * the produced datasets are now prepared for the next loop;
          %if &varlf=1 %then %do;
           data r&vplf._&varlf; set r&vplf._&varlf._&itrat;
                                         drop v&varlf forecast std l95 u95;
           rename residual=v&varlf.r;

           data m&vplf._&varlf; set m&vplf._&varlf._&itrat;
           data e&vplf._&varlf; set e&vplf._&varlf._&itrat;
          %end;

          * there are actually as many datasets as variables;
          * they are merged (combined) now;
          * there are even as many datasets as persons;
          * they are merged later at the end of the program;
          %if &varlf>1 %then %do;
           data r&vplf._&varlf; merge r&vplf._&varlf._&itrat r&vplf._&varlfm;
                                         drop v&varlf forecast std l95 u95;
           rename residual=v&varlf.r;

           * The model-dataset of the last and the actual loop are combined;
           data m&vplf._&varlf; set m&vplf._&varlf._&itrat m&vplf._&varlfm;
           if _parm_='MA' or _parm_='AR';

           data e&vplf._&varlf; merge e&vplf._&varlf._&itrat e&vplf._&varlfm;
                                drop l&varlf._&itrat pq&varlf._&itrat T&varlf._&itrat;
          %end;

         /* this is for the last variable loop, where the data are stored permanently */
         %if &varlf=&vmax %then %do;
          data &lib..rv&vplf; set r&vplf._&varlf;
         /* in this step the intercept (mean) of each variable is added back to the residuals */
         /* the mean of variable number k is held in the macro variable m followed by k, */
         /* which is set by the means macro */
          %do m=1 %to &vmax;
          v&m.r=v&m.r + &&m&m;
          %end;

          data &lib..mv&vplf; set m&vplf._&varlf;
          data &lib..ev&vplf; set e&vplf._&varlf;
         %end;
        %mend combine;

* the next macro section decides whether an AR1 or an MA1 component
* should be added to the model in the first iteration loop;
        %macro defin_pq;
        %if &between=1  %then %do;

          ***************************************************;
          * Model (1,0)
          ***************************************************;
        %let itrat=%eval(&itrat+1);
        %let itratm=%eval(&itratm+1);
        %let itratp=%eval(&itratp+1);

        proc arima data=d&vplf._&varlf;
        identify var=v&varlf noprint;
        estimate p=(1)  q=(0) noprint;
        forecast out=r&vplf._&varlf._&itrat noprint;

        * the residual standard deviation is computed and compared with the AR(0)MA(1) model;
        proc means noprint data=r&vplf._&varlf._&itrat; var residual;
        output out=v&vplf._&varlf._&itrat std=s&vplf._&varlf._&itrat;

          ***************************************************;
          * Model (0,1)
          ***************************************************;
        %let itrat=%eval(&itrat+1);
        %let itratm=%eval(&itratm+1);
        %let itratp=%eval(&itratp+1);

        proc arima data=d&vplf._&varlf;
        identify var=v&varlf noprint;
        estimate p=(0)  q=(1) noprint;
        forecast out=r&vplf._&varlf._&itrat noprint;

        * the residual standard deviation is computed and compared with the AR(1)MA(0) model;
        proc means noprint data=r&vplf._&varlf._&itrat; var residual;
        output out=v&vplf._&varlf._&itrat std=s&vplf._&varlf._&itrat;

        * Now both residual averages are compared;
        data z&vplf._&varlf._&itrat; merge v&vplf._&varlf._&itratm v&vplf._&varlf._&itrat;

     if s&vplf._&varlf._&itratm >= s&vplf._&varlf._&itrat
            then call symput('qlag1',1);
         if s&vplf._&varlf._&itratm < s&vplf._&varlf._&itrat
            then call symput('plag1',1);
            run;
        %let itrat=0;

        %end;

        %mend defin_pq;

%defin_pq


**********************************************************************;
**********************************************************************;
* Now the result-driven automated iteration to fit an ARMA model starts;
**********************************************************************;
**********************************************************************;

%macro arma;

%let itrat=1;
%let itratm=0;
%let itratp=2;

%let sig=1;

%if  &go_on=1 %then %do;

%do %while(&sig=1 and &itrat<6);

%let sig=0;

  ********************************************************************;
  * The iteration values go up one step higher;
  ********************************************************************;
%let itrat=%eval(&itrat+1);
%let itratm=%eval(&itratm+1);
%let itratp=%eval(&itratp+1);

  ***************************************************;
  *  Now the raw scores are fitted with the current ARMA model;
  ***************************************************;
proc arima data=d&vplf._&varlf;
identify var=v&varlf noprint;

  ***************************************************;
  * the parameter from acf and pacf are now inserted;
  ***************************************************;
estimate p=(&plag1,&plag2,&plag3,&plag4)  q=(&qlag1,&qlag2,&qlag3,&qlag4)
         outmodel=m&vplf._&varlf._&itrat noprint;

  ***************************************************;
  * the residual dataset is now stored with the prefix r;
  ***************************************************;
forecast out=r&vplf._&varlf._&itrat noprint;

  ***************************************************;
  * Now check of white-noise through analysing the residual values;
  ***************************************************;
proc arima data=r&vplf._&varlf._&itrat;
identify var=residual outcov=c&vplf._&varlf._&itrat noprint;

data e&vplf._&varlf._&itrat; merge c&vplf._&varlf._&itrat e&vplf._&varlf._&itratm;
drop var T&varlf._&itratm l&varlf._&itratm cov T&varlf._&itratm invcorr abscorr abspartc;
by lag;
if lag=0 then delete;

abscorr=abs(corr);
abspartc=abs(partcorr);
if abscorr>2*stderr then B&varlf._&itrat=1; else B&varlf._&itrat=2;
if abspartc> .2 then B&varlf._&itrat=1;

  **************************************************;
  * Decision about adding one more component, stored in the variable pq&varlf._&itrat
  * with the values q (MA) or p (AR)
  **************************************************;
if abscorr > abspartc then pq&varlf._&itrat='q'; else pq&varlf._&itrat='p';

l&varlf._&itrat=lag;
rename corr=O&varlf._&itrat;
rename partcorr=A&varlf._&itrat;
rename stderr=T&varlf._&itrat;

  ******************************************************************;
  * defining the lag of the AR or MA component;
  ******************************************************************;
if B&varlf._&itrat=1 and pq&varlf._&itrat='p'
  then call symput('plag'||left(&itrat),l&varlf._&itrat);
if B&varlf._&itrat=1 and pq&varlf._&itrat='q'
  then call symput('qlag'||left(&itrat),l&varlf._&itrat);

if B&varlf._&itrat=1 then call symput('sig',1);
run;

%end;

%end;

%mend arma;

  %arma

  %combine

%end;
%end;

* now combining the person residualscores in one residual dataset;
%do i=1 %to &per_num;
data &lib..rv&i; set &lib..rv&i; p=&i; run;
data &lib..mv&i; set &lib..mv&i; p=&i; run;
data &lib..ev&i; set &lib..ev&i; p=&i; run;
%end;

data &lib..res_dat; set
%do i=1 %to &per_num;
&lib..rv&i
%end;
;
run;

data &lib..mod_dat; set
%do i=1 %to &per_num;
&lib..mv&i
%end;
;
run;

data &lib..ev_dat; set
%do i=1 %to &per_num;
&lib..ev&i
%end;
;
run;

%rename2(&lib, res_dat, &dataout, v)

%mend idarma;

run;
