function [rew,raw]=mcdcov(x,varargin)
%MCDCOV computes the MCD estimator of a multivariate data set. This
% estimator is given by the subset of h observations with smallest covariance
% determinant. The MCD location estimate is then the mean of those h points,
% and the MCD scatter estimate is their covariance matrix. The default value
% of h is roughly 0.75n (where n is the total number of observations), but the
% user may choose any value between n/2 and n. Based on the raw estimates,
% weights are assigned to the observations such that outliers get zero weight.
% The reweighted MCD estimator is then given by the mean and covariance matrix
% of the cases with non-zero weight. To compute the MCD estimator,
% the FASTMCD algorithm is used.
%
% The MCD method is intended for continuous variables, and assumes that
% the number of observations n is at least 5 times the number of variables p.
% If p is too large relative to n, it would be better to first reduce
% p by variable selection or robust principal components (see the functions
% robpca.m and rapca.m).
%
% The MCD method was introduced in:
%
%   Rousseeuw, P.J. (1984), "Least Median of Squares Regression,"
%   Journal of the American Statistical Association, Vol. 79, pp. 871-881.
%
% The MCD is a robust method in the sense that the estimates are not unduly
% influenced by outliers in the data, even if there are many outliers.
% Due to the MCD's robustness, we can detect outliers by their large
% robust distances. The latter are defined like the usual Mahalanobis
% distance, but based on the MCD location estimate and scatter matrix
% (instead of the nonrobust sample mean and covariance matrix).
%
% The FASTMCD algorithm uses several time-saving techniques which
% make it available as a routine tool to analyze data sets with large n,
% and to detect deviating substructures in them. A full description of the
% algorithm can be found in:
%
%   Rousseeuw, P.J. and Van Driessen, K. (1999), "A Fast Algorithm for the
%   Minimum Covariance Determinant Estimator," Technometrics, 41, pp. 212-223.
%
% An important feature of the FASTMCD algorithm is that it allows for exact
% fit situations, i.e. when more than h observations lie on a (hyper)plane.
% Then the program still yields the MCD location and scatter matrix, the latter
% being singular (as it should be), as well as the equation of the hyperplane.
%
% Required input argument:
%        x : a vector or matrix whose columns represent variables, and rows represent observations.
%            Missing values (NaN's) and infinite values (Inf's) are allowed, since observations (rows)
%            with missing or infinite values will automatically be excluded from the computations.
%
% Optional input arguments:
%      cor : If non-zero, the robust correlation matrix will be
%            returned. The default value is 0.
%        h : The quantile of observations whose covariance determinant will
%            be minimized. Any value between n/2 and n may be specified.
%            The default value is 0.75*n.
%    alpha : (1-alpha) measures the fraction of outliers the algorithm should
%            resist. Any value between 0.5 and 1 may be specified. (default = 0.75)
%   ntrial : The number of random trial subsamples that are drawn for
%            large datasets. The default is 500.
%    plots : If equal to one, a menu is shown which allows the user to draw several plots,
%            such as a distance-distance plot. (default)
%            If 'plots' is equal to zero, all plots are suppressed.
%            See also makeplot.m
%  classic : If equal to one, the classical mean and covariance matrix are computed as well.
%            (default = 0)
%
% Input arguments for advanced users:
%    Hsets : Instead of random trial h-subsets (default, Hsets = []), Hsets makes it possible to give certain
%            h-subsets as input. Hsets is a matrix that contains the indices of the observations of one
%            h-subset as a row.
%   factor : If not equal to 0 (default), the consistency factor is adapted. Only useful in case of the
%            kmax approach.
%
% I/O: result=mcdcov(x,'alpha',0.75,'h',h,'ntrial',500)
%  If only one output argument is listed, only the final result ('result')
%  is returned.
%  The user should only give the input arguments that have to change their default value.
%  Each input argument name must be followed by its value.
%  The order of the input arguments is of no importance.
%
% Examples: [rew,raw]=mcdcov(x);
%           result=mcdcov(x,'h',20,'plots',0);
%           [rew,raw]=mcdcov(x,'alpha',0.8,'cor',0)
%
% The output structure 'raw' contains intermediate results, with the following
% fields :
%
%   raw.center    : The raw MCD location of the data.
%   raw.cov       : The raw MCD covariance matrix (multiplied by a consistency factor).
%   raw.cor       : The raw MCD correlation matrix, if the input argument 'cor' was non-zero.
%   raw.wt        : Weights based on the estimated raw covariance matrix 'raw.cov' and
%                   the estimated raw location 'raw.center' of the data. These weights determine
%                   which observations are used to compute the final MCD estimates.
%   raw.objective : The determinant of the raw MCD covariance matrix.
%
% The output structure 'rew' contains the final results, namely:
%
%   rew.center   : The robust location of the data, obtained after reweighting, if
%                  the raw MCD is not singular. Otherwise the raw MCD center is
%                  given here.
%   rew.cov      : The robust covariance matrix, obtained after reweighting, if the raw MCD
%                  is not singular. Otherwise the raw MCD covariance matrix is given here.
%   rew.cor      : The robust correlation matrix, obtained after reweighting, if
%                  options.cor was non-zero.
%   rew.h        : The number of observations that have determined the MCD estimator,
%                  i.e. the value of h.
%   rew.Hsubsets : A structure that contains Hopt and Hfreq:
%                  Hopt  : The subset of h points whose covariance matrix has minimal determinant,
%                          ordered following increasing robust distances.
%                  Hfreq : The subset of h points which are selected most frequently during the whole
%                          algorithm.
%   rew.alpha    : (1-alpha) measures the fraction of outliers the algorithm should
%                  resist.
%   rew.md       : The distance of each observation from the classical
%                  center of the data, relative to the classical shape
%                  of the data. Often, outlying points fail to have a
%                  large Mahalanobis distance because of the masking
%                  effect.
%   rew.rd       : The distance of each observation to the final,
%                  reweighted MCD center of the data, relative to the
%                  reweighted MCD scatter of the data. These distances allow
%                  us to easily identify the outliers. If the reweighted MCD
%                  is singular, raw.rd is given here.
%   rew.cutoff   : Cutoff values for the robust and Mahalanobis distances.
%   rew.flag     : Flags based on the reweighted covariance matrix and the
%                  reweighted location of the data. These flags determine which
%                  observations can be considered as outliers. If the reweighted
%                  MCD is singular, raw.wt is given here.
%   rew.method   : A character string containing information about the method and
%                  about singular subsamples (if any).
%   rew.plane    : In case of an exact fit, rew.plane contains the coefficients
%                  of a (hyper)plane a_1(x_i1-m_1)+...+a_p(x_ip-m_p)=0
%                  containing at least h observations, where (m_1,...,m_p)
%                  is the MCD location of these observations.
%   rew.classic  : If the input argument 'classic' is equal to one, this structure
%                  contains results of the classical analysis: center (sample mean),
%                  cov (sample covariance matrix), md (Mahalanobis distances), class ('COV').
%   rew.class    : 'MCDCOV'
%   rew.X        : If x is bivariate, same as the x in the call to mcdcov,
%                  without rows containing missing or infinite values.
%
% This function is part of LIBRA: the Matlab Library for Robust Analysis,
% available at:
%              http://wis.kuleuven.be/stat/robust.html
%
% Written by Katrien Van Driessen and Bjorn Rombouts
% Revisions by Sanne Engelen, Sabine Verboven
% Last Update: 09/04/2004, 01/08/2007

% The FASTMCD algorithm works as follows:
%
%   The dataset contains n cases and p variables.
%   When n < 2*nmini (see below), the algorithm analyzes the dataset as a whole.
%   When n >= 2*nmini (see below), the algorithm uses several subdatasets.
%
%   When the dataset is analyzed as a whole, a trial subsample of p+1 cases
%   is taken, of which the mean and covariance matrix are calculated.
%   The h cases with smallest relative distances are used to calculate
%   the next mean and covariance matrix, and this cycle is repeated csteps1
%   times. For small n we consider all subsets of p+1 out of n, otherwise
%   the algorithm draws 500 random subsets by default.
%   Afterwards, the 10 best solutions (means and corresponding covariance
%   matrices) are used as starting values for the final iterations.
%   These iterations stop when two subsequent determinants become equal.
%   (At most csteps3 iteration steps are taken.) The solution with smallest
%   determinant is retained.
%
%   When the dataset contains more than 2*nmini cases, the algorithm does part
%   of the calculations on (at most) maxgroup nonoverlapping subdatasets, of
%   (roughly) maxobs cases.
%
%   Stage 1: For each trial subsample in each subdataset, csteps1 (see below) iterations are
%   carried out in that subdataset. For each subdataset, the 10 best solutions are
%   stored.
%
%   Stage 2 considers the union of the subdatasets, called the merged set.
%   (If n is large, the merged set is a proper subset of the entire dataset.)
%   In this merged set, each of the 'best solutions' of stage 1 is used as a starting
%   value for csteps2 (see below) iterations. Also here, the 10 best solutions are stored.
%
%   Stage 3 depends on n, the total number of cases in the dataset.
%   If n <= 5000, all 10 preliminary solutions are iterated.
%   If n > 5000, only the best preliminary solution is iterated.
%   The number of iterations decreases to 1 according to n*p (if n*p <= 100,000 we
%   iterate csteps3 (see below) times, whereas for n*p > 1,000,000 we take only one iteration step).

if rem(nargin-1,2)~=0
    error('The number of input arguments should be odd!');
end

% Assigning some input parameters
data = x;
raw.cor = [];
rew.cor = [];
rew.plane = [];

% The maximum value for n (= number of observations) is:
% nmax=50000;
nmax=realmax;
% The maximum value for p (= number of variables) is:
pmax=50;
% To change the number of subdatasets and their size, the values of
% maxgroup and nmini can be changed.
maxgroup=5;
nmini=300;
% The number of iteration steps in stages 1, 2 and 3 can be changed
% by adapting the parameters csteps1, csteps2, and csteps3.
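% The core operation iterated by these parameters is the concentration step
% (C-step) of Rousseeuw and Van Driessen (1999). A rough sketch of one C-step
% is given below; the variable names (Hold, Hnew, mu, S, d2) are illustrative
% only and do not occur in the implementation, which works on SVD scores and
% also handles singular and partitioned cases:
%
%   mu   = mean(data(Hold,:));                  % location of the current h-subset
%   S    = cov(data(Hold,:));                   % scatter of the current h-subset
%   d2   = libra_mahalanobis(data,mu,'cov',S);  % squared distances of all n cases
%   [d2sorted,ind] = sort(d2);
%   Hnew = sort(ind(1:h));                      % the h cases with smallest distances
%
% The covariance determinant of Hnew is at most that of Hold, so repeated
% C-steps converge to a (local) minimum of the objective.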
csteps1=2;
csteps2=2;
csteps3=100;

% dtrial : number of subsamples if not all (p+1)-subsets will be considered.
dtrial=500;

if size(data,1)==1
    data=data';
end

% Observations with missing or infinite values are omitted.
ok=all(isfinite(data),2);
data=data(ok,:);
xx=data;
[n,p]=size(data);

% Some checks are now performed.
if n==0
    error('All observations have missing or infinite values.')
end
if n > nmax
    error(['The program allows for at most ' int2str(nmax) ' observations.'])
end
if p > pmax
    error(['The program allows for at most ' int2str(pmax) ' variables.'])
end
if n < p
    error('Need at least (number of variables) observations.')
end

%internal variables
hmin=quanf(0.5,n,p);

%Assigning default values
h=quanf(0.75,n,p);
default=struct('alpha',0.75,'h',h,'plots',1,'ntrial',dtrial,'cor',0,'seed',0,'classic',0,'Hsets',[],'factor',0);
list=fieldnames(default);
options=default;
IN=length(list);
i=1;
counter=1;

%Reading optional input arguments
if nargin>2
    %
    % placing input fields in an array of strings
    %
    for j=1:nargin-1
        if rem(j,2)~=0
            chklist{i}=varargin{j};
            i=i+1;
        end
    end
    dummy=sum(strcmp(chklist,'h')+2*strcmp(chklist,'alpha'));
    switch dummy
        case 0 % default values should be taken
            alfa=options.alpha;
            h=options.h;
        case 3
            error('Both input arguments alpha and h are provided. Only one is required.')
    end
    %
    % Checking which default parameters have to be changed
    % and keep them in the structure 'options'.
    %
    while counter<=IN
        index=strmatch(list(counter,:),chklist,'exact');
        if ~isempty(index) % in case of similarity
            for j=1:nargin-2 % searching the index of the accompanying field
                if rem(j,2)~=0 % fieldnames are placed on odd index
                    if strcmp(chklist{index},varargin{j})
                        I=j;
                    end
                end
            end
            options=setfield(options,chklist{index},varargin{I+1});
            index=[];
        end
        counter=counter+1;
    end
    if dummy==1 % checking input variable h
        % hmin is the minimum number of observations whose covariance determinant
        % will be minimized.
        if isempty(options.Hsets)
            if options.h < hmin
                disp(['Warning: The MCD must cover at least ' int2str(hmin) ' observations.'])
                disp(['The value of h is set equal to ' int2str(hmin)])
                options.h = hmin;
            elseif options.h > n
                error('h is greater than the number of non-missings and non-infinites.')
            elseif options.h < p
                error(['h should be larger than the dimension ' int2str(p) '.'])
            end
        end
        options.alpha=options.h/n;
    elseif dummy==2
        if options.alpha < 0.5
            options.alpha=0.5;
            mess=sprintf(['Attention (mcdcov.m): Alpha should be larger than 0.5. \n',...
                'It is set to 0.5.']);
            disp(mess)
        end
        if options.alpha > 1
            options.alpha=0.75;
            mess=sprintf(['Attention (mcdcov.m): Alpha should be smaller than 1.\n',...
                'It is set to 0.75.']);
            disp(mess)
        end
        options.h=quanf(options.alpha,n,p);
    end
end

h=options.h;           %number of regular datapoints on which estimates are based, h=[alpha*n]
plots=options.plots;   %relevant plots are plotted
alfa=options.alpha;    %percentage of regular observations
ntrial=options.ntrial; %number of subsets to be taken in the first step
cor=options.cor;       %correlation matrix
seed=options.seed;     %seed of the random generator
cutoff.rd=sqrt(chi2inv(0.975,p)); %cutoff value for the robust distance
cutoff.md=cutoff.rd;              %cutoff value for the Mahalanobis distance
Hsets = options.Hsets;
if ~isempty(Hsets)
    Hsets_ind = 1;
else
    Hsets_ind = 0;
end
factor = options.factor;
if factor == 0
    factor_ind = 0;
else
    factor_ind = 1;
end

% Some initializations.
rew.flag=repmat(NaN,1,length(ok)); raw.wt=repmat(NaN,1,length(ok)); raw.rd=repmat(NaN,1,length(ok)); rew.rd=repmat(NaN,1,length(ok)); rew.mahalanobis=repmat(NaN,1,length(ok)); rew.method=sprintf('\nMinimum Covariance Determinant Estimator.'); correl=NaN; % z : if at least h observations lie on a hyperplane, then z contains the % coefficients of that plane. % weights : weights of the observations that are not excluded from the computations. % These are the observations that don't contain missing or infinite values. % bestobj : best objective value found. z(1:p)=0; weights=zeros(1,n); bestobj=inf; % The classical estimates are computed. [Pcl,Tcl,Lcl,rcl,centerXcl,cXcl] = classSVD(data); clasmean=cXcl; clascov=Pcl*diag(Lcl)*Pcl'; if p < 5 eps=1e-12; elseif p <= 8 eps=1e-14; else eps=1e-16; end % The standardization of the data will now be performed. med=median(data); mad=sort(abs(data-repmat(med,n,1))); mad=mad(h,:); ii=min(find(mad < eps)); if length(ii) % The h-th order statistic is zero for the ii-th variable. The array plane contains % all the observations which have the same value for the ii-th variable. plane=find(abs(data(:,ii)-med(ii)) < eps)'; meanplane=mean(data(plane,:)); weights(plane)=1; if p==1 rew.flag=weights; raw.wt=weights; [raw.center,rew.center]=deal(meanplane); [raw.cov,rew.cov,raw.objective]=deal(0); if plots rew.method=sprintf('\nUnivariate location and scale estimation.'); rew.method=strvcat(rew.method,sprintf('%g of the %g observations are identical.',length(plane),n)); end else z(ii)=1; rew.plane=z; covplane=cov(data(plane,:)); [raw.center,raw.cov,rew.center,rew.cov,raw.objective,raw.wt,rew.flag,... rew.method]=displ(3,length(plane),weights,n,p,meanplane,covplane,rew.method,z,ok,... raw.wt,rew.flag,0,NaN,h,ii); end rew.Hsubsets.Hopt = plane; rew.Hsubsets.Hfreq = plane; %classical analysis? if options.classic==1 classic.cov=clascov; classic.center=clasmean; classic.class='COV'; else classic=0; end %assigning the output rewo=rew;rawo=raw; rew=struct('center',{rewo.center},'cov',{rewo.cov},'cor',{cor},'h',{h},'Hsubsets',{rewo.Hsubsets},'alpha',{alfa},... 'flag',{rewo.flag},'plane', {rewo.plane},'method',{rewo.method},'class',{'MCDCOV'},'classic',{classic},'X',{xx}); raw=struct('center',{rawo.center},'cov',{rawo.cov},'cor',{rawo.cor},'objective',{rawo.objective},... 'wt',{rawo.wt},'class',{'MCDCOV'},'classic',{classic},'X',{x}); if size(data,2)~=2 rew=rmfield(rew,'X'); raw=rmfield(raw,'X'); end return end data=(data-repmat(med,n,1))./repmat(mad,n,1); % The standardized classical estimates are now computed. clmean=mean(data); clcov=cov(data); % The univariate non-classical case is now handled. if p==1 & h~=n [rew.center, rewsca, weights,raw.center,raw.cov,rawdist,raw.wt,Hopt]=unimcd(data,h); rew.Hsubsets.Hopt = Hopt'; rew.Hsubsets.Hfreq = Hopt'; raw.rd=sqrt(rawdist'); rew.cov=rewsca^2; raw.objective=raw.cov*prod(mad)^2; mah=(data-rew.center).^2/rew.cov; rew.rd=sqrt(mah'); rew.flag= rew.rd <= cutoff.rd; [raw.cov,raw.center]=trafo(raw.cov,raw.center,med,mad,p); [rew.cov,rew.center]=trafo(rew.cov,rew.center,med,mad,p); rew.mahalanobis=abs(data'-clmean)/sqrt(clcov); spec.ask=1; %classical analysis? if options.classic==1 classic.cov=clascov; classic.center=clasmean; classic.md=rew.mahalanobis; classic.class='COV'; else classic=0; end %assigning the output rewo=rew;rawo=raw; rew=struct('center',{rewo.center},'cov',{rewo.cov},'cor',{rewo.cor},'h',{h},'Hsubsets',{rewo.Hsubsets},... 
'alpha',{alfa},'rd',{rewo.rd},'cutoff',{cutoff},'flag',{rewo.flag}, 'plane',{[]},'method',{rewo.method},... 'class',{'MCDCOV'},'md',{rewo.mahalanobis},'classic',{classic},'X',{xx}); raw=struct('center',{rawo.center},'cov',{rawo.cov},'cor',{rawo.cor},'objective',{rawo.objective},... 'rd',{rawo.rd},'cutoff',{cutoff},'wt',{rawo.wt}, 'class',{'MCDCOV'},'classic',{classic},'X',{x}); if size(data,2)~=2 rew=rmfield(rew,'X'); raw=rmfield(raw,'X'); end try if plots & options.classic makeplot(rew,'classic',1) elseif plots makeplot(rew) end catch %output must be given even if plots are interrupted %> delete(gcf) to get rid of the menu return end return end %exact fit situation if rcl < p % all observations lie on a hyperplane. z = Pcl(:,1); rew.plane=z; weights(1:n)=1; if cor correl=clcov./(sqrt(diag(clcov))*sqrt(diag(clcov))'); end [clcov,clmean]=trafo(clcov,clmean,med,mad,p); [raw.center,raw.cov,rew.center,rew.cov,raw.objective,raw.wt,rew.flag,... rew.method]=displ(1,n,weights,n,p,clmean,clcov,rew.method,z./mad',ok,... raw.wt,rew.flag,cor,correl); if cor [rew.cor,raw.cor]=deal(correl); end %classical analysis? if options.classic==1 classic.cov=clascov; classic.center=clasmean; classic.class='COV'; else classic=0; end rew.Hsubsets.Hopt=1:n; rew.Hsubsets.Hfreq=1:n; %assigning the output rewo=rew;rawo=raw; rew=struct('center',{rewo.center},'cov',{rewo.cov},'cor',{rewo.cor},'h',{h},'Hsubsets',{rewo.Hsubsets},'alpha',{alfa},... 'rd',{rewo.rd},'cutoff',{cutoff},'flag',{rewo.flag},'plane',{rewo.plane},'method',{rewo.method},... 'class',{'MCDCOV'},'classic',{classic},'X',{xx}); raw=struct('center',{rawo.center},'cov',{rawo.cov},'cor',{rawo.cor},'objective',{rawo.objective},... 'cutoff',{cutoff},'wt',{rawo.wt}, 'class',{'MCDCOV'},'classic',{classic},'X',{x}); if size(data,2)~=2 rew=rmfield(rew,'X'); raw=rmfield(raw,'X'); end return end % The classical case is now handled. if h==n if plots msg=sprintf('The MCD estimates based on %g observations are equal to the classical estimates.\n',h); rew.method=strvcat(rew.method,msg); end raw.center=clmean; raw.cov=clcov; raw.objective=det(clcov); mah=libra_mahalanobis(data,clmean,'cov',clcov); rew.mahalanobis=sqrt(mah); raw.rd=rew.mahalanobis; weights= mah <= cutoff.rd^2; raw.wt=weights; [rew.center,rew.cov]=weightmecov(data,weights); if cor raw.cor=raw.cov./(sqrt(diag(raw.cov))*sqrt(diag(raw.cov))'); rew.cor=rew.cov./(sqrt(diag(rew.cov))*sqrt(diag(rew.cov))'); else raw.cor=0; rew.cor=0; end if det(rew.cov) < exp(-50*p) [center,covar,z,correl,plane,count]=fit(data,NaN,med,mad,p,z,cor,rew.center,rew.cov,n); rew.plane=z; if cor correl=covar./(sqrt(diag(cov))*sqrt(diag(covar))'); end rew.method=displrw(count,n,p,center,covar,rew.method,z,cor,correl); [raw.cov,raw.center]=trafo(raw.cov,raw.center,med,mad,p); [rew.cov,rew.center]=trafo(rew.cov,rew.center,med,mad,p); rew.rd=raw.rd; else mah=libra_mahalanobis(data,rew.center,'cov',rew.cov); weights = mah <= cutoff.md^2; [raw.cov,raw.center]=trafo(raw.cov,raw.center,med,mad,p); [rew.cov,rew.center]=trafo(rew.cov,rew.center,med,mad,p); rew.rd=sqrt(mah); end raw.objective=raw.objective*prod(mad)^2; rew.flag=weights; %classical analysis? if options.classic==1 classic.cov=clascov; classic.center=clasmean; classic.md=rew.mahalanobis; classic.class='COV'; else classic=0; end %assigning Hsubsets: rew.Hsubsets.Hopt = 1:n; rew.Hsubsets.Hfreq = 1:n; %assigning the output rewo=rew;rawo=raw; rew=struct('center',{rewo.center},'cov',{rewo.cov},'cor',{rewo.cor},'h',{h},'Hsubsets',{rewo.Hsubsets},'alpha',{alfa},... 
'rd',{rewo.rd},'cutoff',{cutoff},'flag',{rewo.flag},'plane',{rewo.plane},... 'method',{rewo.method},'class',{'MCDCOV'},'md',{rewo.mahalanobis},'classic',{classic},'X',{xx}); raw=struct('center',{rawo.center},'cov',{rawo.cov},'cor',{rawo.cor},'objective',{rawo.objective},... 'rd',{rawo.rd},'cutoff',{cutoff},'wt',{rawo.wt}, 'class',{'MCDCOV'},'classic',{classic},'X',{x}); if size(data,2)~=2 rew=rmfield(rew,'X'); raw=rmfield(raw,'X'); end try if plots & options.classic makeplot(rew,'classic',1) elseif plots makeplot(rew) end catch %output must be given even if plots are interrupted %> delete(gcf) to get rid of the menu return end return end percent=h/n; teller = zeros(1,n+1); if Hsets_ind csteps = csteps1; inplane = NaN; fine = 0; part = 0; final = 1; tottimes = 0; nsamp = size(Hsets,1); obsingroup = n; else % If n >= 2*nmini the dataset will be divided into subdatasets. For n < 2*nmini the set % will be treated as a whole. if n >= 2*nmini maxobs=maxgroup*nmini; if n >= maxobs ngroup=maxgroup; group(1:maxgroup)=nmini; else ngroup=floor(n/nmini); minquan=floor(n/ngroup); group(1)=minquan; for s=2:ngroup group(s)=minquan+double(rem(n,ngroup)>=s-1); end end part=1; adjh=floor(group(1)*percent); nsamp=floor(ntrial/ngroup); minigr=sum(group); obsingroup=fillgroup(n,group,ngroup,seed); % obsingroup : i-th row contains the observations of the i-th group. % The last row (ngroup+1-th) contains the observations for the 2nd stage % of the algorithm. else [part,group,ngroup,adjh,minigr,obsingroup]=deal(0,n,1,h,n,n); replow=[50,22,17,15,14,zeros(1,45)]; if n < replow(p) % All (p+1)-subsets will be considered. al=1; perm=[1:p,p]; nsamp=nchoosek(n,p+1); else al=0; nsamp=ntrial; end end % some further initialisations. csteps=csteps1; inplane=NaN; % tottimes : the total number of iteration steps. % fine : becomes 1 when the subdatasets are merged. % final : becomes 1 for the final stage of the algorithm. [tottimes,fine,final,prevdet]=deal(0); if part % bmean1 : contains, for the first stage of the algorithm, the means of the ngroup*10 % best estimates. % bcov1 : analogous to bmean1, but now for the covariance matrices. % bobj1 : analogous to bmean1, but now for the objective values. % coeff1 : if in the k-th subdataset there are at least adjh observations that lie on % a hyperplane then the coefficients of this plane will be stored in the % k-th column of coeff1. coeff1=repmat(NaN,p,ngroup); bobj1=repmat(inf,ngroup,10); bmean1=cell(ngroup,10); bP1 = cell(ngroup,10); bL1 = cell(ngroup,10); [bmean1{:}]=deal(NaN); [bL1{:}]=deal(NaN); [bP1{:}]=deal(NaN); end % bmean : contains the means of the ten best estimates obtained in the second stage of the % algorithm. % bcov : analogous to bmean, but now for the covariance matrices. % bobj : analogous to bmean, but now for the objective values. % coeff : analogous to coeff1, but now for the merged subdataset. % If the data is not split up, the 10 best estimates obtained after csteps1 iterations % will be stored in bmean, bcov and bobj. coeff=repmat(NaN,p,1); bobj=repmat(inf,1,10); bmean=cell(1,10); bP = cell(1,10); bL = cell(1,10); [bmean{:}]=deal(NaN); [bP{:}]=deal(NaN); [bL{:}] = deal(NaN); end seed=0; tellerwhilelus = 0; while final~=2 if fine | (~part & final) if ~Hsets_ind nsamp=10; end if final adjh=h; ngroup=1; if n*p <= 1e+5 csteps=csteps3; elseif n*p <=1e+6 csteps=10-(ceil(n*p/1e+5)-2); else csteps=1; end if n > 5000 nsamp=1; end else adjh=floor(minigr*percent); csteps=csteps2; end end % found : becomes 1 if we have a singular intermediate MCD estimate. 
found=0; for k=1:ngroup if ~fine found=0; end for i=1:nsamp tottimes=tottimes+1; % ns becomes 1 if we have a singular trial subsample and if there are at % least adjh observations in the subdataset that lie on the concerning hyperplane. % In that case we don't have to take C-steps. The determinant is zero which is % already the lowest possible value. If ns=1, no C-steps will be taken and we % start with the next sample. If we, for the considered subdataset, haven't % already found a singular MCD estimate, then the results must be first stored in % bmean, bcov, bobj or in bmean1, bcov1 and bobj1. If we, however, already found % a singular result for that subdataset, then the results won't be stored % (the hyperplane we just found is probably the same as the one we found earlier. % We then let adj be zero. This will guarantee us that the results won't be % stored) and we start immediately with the next sample. adj=1; ns=0; % For the second and final stage of the algorithm the array sortdist(1:adjh) % contains the indices of the observations corresponding to the adjh observations % with minimal relative distances with respect to the best estimates of the % previous stage. An exception to this, is when the estimate of the previous % stage is singular. For the second stage we then distinguish two cases : % % 1. There aren't adjh observations in the merged set that lie on the hyperplane. % The observations on the hyperplane are then extended to adjh observations by % adding the observations of the merged set with smallest orthogonal distances % to that hyperplane. % 2. There are adjh or more observations in the merged set that lie on the % hyperplane. We distinguish two cases. We haven't or have already found such % a hyperplane. In the first case we start with a new sample. But first, we % store the results in bmean1, bcov1 and bobj1. In the second case we % immediately start with a new sample. % % For the final stage we do the same as 1. above (if we had h or more observations % on the hyperplane we would already have found it). if ~Hsets_ind if final if ~isinf(bobj(i)) meanvct=bmean{i}; P=bP{i}; L = bL{i}; if bobj(i)==0 [dis,sortdist]=sort(abs(sum((data-repmat(meanvct,n,1))'.*repmat(coeff,1,n)))); else [dis,sortdist]=mahal2((data-repmat(meanvct,n,1))*P,sqrt(L),part,fine,final,k,obsingroup); end else break end elseif fine if ~isinf(bobj1(k,i)) meanvct=bmean1{k,i}; P=bP1{k,i}; L=bL1{k,i}; if bobj1(k,i)==0 [dis,ind]=sort(abs(sum((data(obsingroup{end},:)-repmat(meanvct,minigr,1))'.*repmat(coeff1(:,k),1,minigr)))); sortdist=obsingroup{end}(ind); if dis(adjh) < 1e-8 if found==0 obj=0; coeff=coeff1(:,k); found=1; else adj=0; end ns=1; end else [dis,sortdist]=mahal2((data(obsingroup{end},:)-repmat(meanvct,minigr,1))*P,sqrt(L),part,fine,final,k,obsingroup); end else break; end else if ~part if al k=p+1; perm(k)=perm(k)+1; while ~(k==1 |perm(k) <=(n-(p+1-k))) k=k-1; perm(k)=perm(k)+1; for j=(k+1):p+1 perm(j)=perm(j-1)+1; end end index=perm; else [index,seed]=randomset(n,p+1,seed); end else [index,seed]=randomset(group(k),p+1,seed); index=obsingroup{k}(index); end [P,T,L,r,centerX,meanvct] = classSVD(data(index,:)); if r==0 ns = 1; rew.center=meanvct; rew.cov=zeros(p,p); rew.flag = zeros(1,n); rew.flag(index) = 1; rew.Hsubsets.Hopt=1:n; rew.Hsubsets.Hfreq=1:n; elseif (r < p) & (r~=0) % The trial subsample is singular. % We distinguish two cases : % % 1. There are adjh or more observations in the subdataset that lie % on the hyperplane. If the data is not split up, we have adjh=h and thus % an exact fit. 
                        % If the data is split up we distinguish two cases.
                        %    We haven't or have already found such a hyperplane. In the first case
                        %    we check if there are more than h observations in the entire set
                        %    that lie on the hyperplane. If so, we have an exact fit situation.
                        %    If not, we start with a new trial subsample. But first, the
                        %    results must be stored in bmean1, bcov1 and bobj1. In the second case
                        %    we immediately start with a new trial subsample.
                        %
                        % 2. There aren't adjh observations in the subdataset that lie on the
                        %    hyperplane. We then extend the trial subsample until it isn't singular
                        %    anymore.
                        %
                        % eigvct : contains the coefficients of the hyperplane.
                        eigvct = P(:,1);
                        if ~part
                            dist=abs(sum((data-repmat(meanvct,n,1))'.*repmat(eigvct,1,n)));
                        else
                            dist=abs(sum((data(obsingroup{k},:)-repmat(meanvct,group(k),1))'.*repmat(eigvct,1,group(k))));
                        end
                        obsinplane=find(dist < 1e-8);
                        % count : number of observations that lie on the hyperplane.
                        count=length(obsinplane);
                        if count >= adjh
                            if ~part
                                [center,covar,eigvct,correl]=fit(data,obsinplane,med,mad,p,eigvct,cor);
                                rew.plane=eigvct;
                                weights(obsinplane)=1;
                                [raw.center,raw.cov,rew.center,rew.cov,raw.objective,...
                                    raw.wt,rew.flag,rew.method]=displ(2,count,weights,n,p,center,covar,...
                                    rew.method,eigvct,ok,raw.wt,rew.flag,cor,correl);
                                if cor
                                    [rew.cor,raw.cor]=deal(correl);
                                end
                                rew.Hsubsets.Hopt=obsinplane;
                                rew.Hsubsets.Hfreq=obsinplane;
                                return
                            elseif found==0
                                if count>=h
                                    [center,covar,eigvct,correl]=fit(data,obsinplane,med,mad,p,eigvct,cor);
                                    rew.plane=eigvct;
                                    weights(obsinplane)=1;
                                    [raw.center,raw.cov,rew.center,rew.cov,raw.objective,...
                                        raw.wt,rew.flag,rew.method]=displ(2,count,weights,n,p,center,covar,...
                                        rew.method,eigvct,ok,raw.wt,rew.flag,cor,correl);
                                    if cor
                                        [rew.cor,raw.cor]=deal(correl);
                                    end
                                    rew.Hsubsets.Hopt=obsinplane;
                                    rew.Hsubsets.Hfreq=obsinplane;
                                    return
                                end
                                obj=0;
                                inplane(k)=count;
                                coeff1(:,k)=eigvct;
                                found=1;
                                ns=1;
                            else
                                ns=1;
                                adj=0;
                            end
                        else
                            covmat = cov(data(index,:));
                            meanvct = mean(data(index,:));
                            while det(covmat) < exp(-50*p)
                                [index1,seed]=addobs(index,n,seed);
                                [covmat,meanvct] = updatecov(data(index,:),covmat,meanvct,data(setdiff(index1,index),:),[],1);
                                index = index1;
                            end
                        end
                    end
                end
                if ~ns
                    if ~part
                        [dis,sortdist] = mahal2((data - repmat(meanvct,n,1))*P,sqrt(L),part,fine,final,k,obsingroup);
                    else
                        [dis,sortdist] = mahal2((data(obsingroup{k},:) - repmat(meanvct,group(k),1))*P,sqrt(L),part,fine,final,k,obsingroup);
                    end
                end
            end
        end
        if ~ns
            for j=1:csteps
                tottimes=tottimes+1;
                if j == 1
                    if Hsets_ind
                        obs_in_set = Hsets(i,:);
                    else
                        obs_in_set = sort(sortdist(1:adjh));
                        teller(obs_in_set) = teller(obs_in_set) + 1;
                        teller(end) = teller(end) + 1;
                    end
                else
                    % The observations corresponding to the adjh smallest Mahalanobis
                    % distances determine the subset for the next iteration.
                    if ~part
                        [dis2,sortdist] = mahal2((data - repmat(meanvct,n,1))*P,sqrt(L),part,fine,final,k,obsingroup);
                    else
                        if final
                            [dis2,sortdist] = mahal2((data - repmat(meanvct,n,1))*P,sqrt(L),part,fine,final,k,obsingroup);
                        elseif fine
                            [dis2,sortdist] = mahal2((data(obsingroup{end},:) - repmat(meanvct,minigr,1))*P,sqrt(L),part,fine,final,k,obsingroup);
                        else
                            [dis2,sortdist] = mahal2((data(obsingroup{k},:) - repmat(meanvct,group(k),1))*P,sqrt(L),part,fine,final,k,obsingroup);
                        end
                    end
                    % Creation of an H-subset.
                    obs_in_set=sort(sortdist(1:adjh));
                    teller(obs_in_set) = teller(obs_in_set) + 1;
                    teller(end) = teller(end) + 1;
                end
                [P,T,L,r,centerX,meanvct] = classSVD(data(obs_in_set,:));
                if r==0
                    rew.center=meanvct;
                    rew.cov=zeros(p,p);
                    rew.flag=(data==data(obs_in_set(1),:));
                    rew.Hsubsets.Hopt=obs_in_set(1);
                    rew.Hsubsets.Hfreq=obs_in_set(1);
                else
                    obj=prod(L);
                    if obj < exp(-50*p)
                        % The adjh-subset is singular. If adjh=h we have an exact fit situation.
                        % If adjh < h we distinguish two cases :
                        %
                        % 1. We haven't found earlier a singular adjh-subset. We first check if
                        %    in the entire set there are h observations that lie on the hyperplane.
                        %    If so, we have an exact fit situation. If not, we stop taking C-steps
                        %    (the determinant is zero which is the lowest possible value) and
                        %    store the results in the appropriate arrays. We then begin with
                        %    the next trial subsample.
                        %
                        % 2. We have, for the subdataset concerned, already found a singular
                        %    adjh-subset. We then immediately begin with the next trial subsample.
                        if ~part | final | (fine & n==minigr)
                            covmat=cov(data(obs_in_set,:));
                            [center,covar,z,correl,obsinplane,count]=fit(data,NaN,med,mad,p,NaN,...
                                cor,meanvct,covmat,n);
                            rew.plane=z;
                            weights(obsinplane)=1;
                            [raw.center,raw.cov,rew.center,rew.cov,raw.objective,...
                                raw.wt,rew.flag,rew.method]=displ(2,count,weights,n,p,center,covar,...
                                rew.method,z,ok,raw.wt,rew.flag,cor,correl);
                            if cor
                                [rew.cor,raw.cor]=deal(correl);
                            end
                            rew.Hsubsets.Hopt=obsinplane;
                            rew.Hsubsets.Hfreq=obsinplane;
                            return
                        elseif found==0
                            eigvct = P(:,1);
                            dist=abs(sum((data-repmat(meanvct,n,1))'.*repmat(eigvct,1,n)));
                            obsinplane=find(dist<1e-8);
                            count=length(obsinplane);
                            if count >= h
                                [center,covar,eigvct,correl]=fit(data,obsinplane,med,mad,p,eigvct,cor);
                                rew.plane=eigvct;
                                weights(obsinplane)=1;
                                [raw.center,raw.cov,rew.center,rew.cov,raw.objective,...
                                    raw.wt,rew.flag,rew.method]=displ(2,count,weights,n,p,center,covar,...
                                    rew.method,eigvct,ok,raw.wt,rew.flag,cor,correl);
                                if cor
                                    [rew.cor,raw.cor]=deal(correl);
                                end
                                rew.Hsubsets.Hopt=obsinplane;
                                rew.Hsubsets.Hfreq=obsinplane;
                                return
                            end
                            obj=0;
                            found=1;
                            if ~fine
                                coeff1(:,k)=eigvct;
                                dist=abs(sum((data(obsingroup{k},:)-repmat(meanvct,group(k),1))'.*repmat(eigvct,1,group(k))));
                                inplane(k)=length(dist(dist<1e-8));
                            else
                                coeff=eigvct;
                                dist=abs(sum((data(obsingroup{end},:)-repmat(meanvct,minigr,1))'.*repmat(eigvct,1,minigr)));
                                inplane=length(dist(dist<1e-8));
                            end
                            break;
                        else
                            adj=0;
                            break;
                        end
                    end
                end
                % We stop taking C-steps when two subsequent determinants become equal.
                % We have then reached convergence.
                if j >= 2 & obj == prevdet
                    break;
                end
                prevdet=obj;
            end % C-steps
        end
        % After each iteration, it has to be checked whether the new solution
        % is better than some previous one. A distinction is made between the
        % different stages of the algorithm:
        %
        % - Let us first consider the first (second) stage of the algorithm.
        %   We distinguish two cases if the objective value is lower than the largest
        %   value in bobj1 (bobj) :
        %
        %   1. The new objective value did not yet occur in bobj1 (bobj). We then store
        %      this value, the corresponding mean and covariance matrix at the right
        %      place in resp. bobj1 (bobj), bmean1 (bmean) and bcov1 (bcov).
        %      The objective value is inserted by shifting the greater determinants
        %      upwards. We perform the same shifting in bmean1 (bmean) and bcov1 (bcov).
        %
        %   2. The new objective value already occurs in bobj1 (bobj). A comparison is
        %      made between the new mean vector and covariance matrix and those
        %      estimates with the same determinant.
When for an equal determinant, % the mean vector or covariance matrix do not correspond, the new results % will be stored in bobj1 (bobj), bmean1 (bmean) and bcov1 (bcov). % % If the objective value is not lower than the largest value in bobj1 (bobj), % nothing happens. % % - For the final stage of the algorithm, only the best solution has to be kept. % We then check if the objective value is lower than the till then lowest value. % If so, we have a new best solution. If not, nothing happens. if ~final & adj if fine | ~part if obj < max(bobj) & ~ns [bmean,bP,bL,bobj]=insertion(bmean,bP,bL,bobj,meanvct,P,L,obj,1,eps); end else if obj < max(bobj1(k,:)) & ~ns [bmean1,bP1,bL1,bobj1]=insertion(bmean1,bP1,bL1,bobj1,meanvct,P,L,obj,k,eps); end end end if final & obj< bestobj % bestset : the best subset for the whole data. % bestobj : objective value for this set. % initmean, initcov : resp. the mean and covariance matrix of this set. bestset=obs_in_set; bestobj=obj; initmean=meanvct; initcov=cov(data(bestset,:)); raw.initcov=cov(data(bestset,:)); end end % nsamp end % ngroup if part & ~fine fine=1; elseif (part & fine & ~final) | (~part & ~final) final=1; else final=2; end end % while loop [P,T,L,r,centerX,cX] = classSVD(data(bestset,:)); mah=libra_mahalanobis((data - repmat(cX,n,1))*P,zeros(size(P,2),1),'cov',L); sortmah=sort(mah); [sortset,indbestset] = sort(mah(bestset)); sortbestset = bestset(indbestset); rew.Hsubsets.Hopt = sortbestset; if ~factor_ind factor = sortmah(h)/chi2inv(h/n,p); else factor = sortmah(h)/chi2inv(h/n,p/2); end raw.cov=factor*initcov; % We express the results in the original units. [raw.cov,raw.center]=trafo(raw.cov,initmean,med,mad,p); raw.objective=bestobj*prod(mad)^2; if cor raw.cor=raw.cov./(sqrt(diag(raw.cov))*sqrt(diag(raw.cov))'); end % the mahalanobis distances are computed without the factor, therefore we % have to correct for it now. mah=mah/factor; raw.rd=sqrt(mah); weights=mah<=cutoff.md^2; raw.wt=weights; [rew.center,rew.cov]=weightmecov(data,weights); [trcov,trcenter]=trafo(rew.cov,rew.center,med,mad,p); % determination of Hfreq: [telobs,indobs] = greatsort(teller(1:(end - 1))); rew.Hsubsets.Hfreq = indobs(1:(h)); if size(rew.Hsubsets.Hfreq,2) == 1 rew.Hsubsets.Hfreq = rew.Hsubsets.Hfreq'; end if cor rew.cor=rew.cov./(sqrt(diag(rew.cov))*sqrt(diag(rew.cov))'); end if prod(sqrt(L)) < exp(-50*p) [center,covar,z,correl,plane,count]=fit(data,NaN,med,mad,p,z,cor,rew.center,rew.cov,n); rew.plane=z; if cor correl=covar./(sqrt(diag(covar))*sqrt(diag(covar))'); [rew.cor,raw.cor] = deal(correl); end rew.method=displrw(count,n,p,center,covar,rew.method,z,cor,correl); rew.flag=weights; rew.rd=raw.rd; else mah=libra_mahalanobis(data,rew.center,'cov',rew.cov); rew.flag=(mah <= cutoff.md^2); rew.rd=sqrt(mah); end rew.mahalanobis=sqrt(libra_mahalanobis(data,clmean,'cov',clcov)); rawo=raw; reso=rew; if options.classic==1 classic.cov=clascov; classic.center=clasmean; classic.md=rew.mahalanobis; classic.flag = (classic.md <= cutoff.md); if options.cor==1 classic.cor=clascov./(sqrt(diag(clascov))*sqrt(diag(clascov))'); end classic.class='COV'; else classic=0; end %assigning the output rew=struct('center',{trcenter},'cov',{trcov},'cor',{reso.cor},'h',{h},'Hsubsets',{reso.Hsubsets},'alpha',{alfa},... 'rd',{reso.rd},'flag',{reso.flag},'md',{reso.mahalanobis},'cutoff',{cutoff},... 'plane',{reso.plane},'method',{reso.method},'class',{'MCDCOV'},'classic',{classic},'X',{xx}); raw=struct('center',{rawo.center},'cov',{rawo.cov},'cor',{rawo.cor},'objective',{rawo.objective},... 
'rd',{rawo.rd},'cutoff',{cutoff},... 'wt',{rawo.wt},'class',{'MCDCOV'},'classic',{classic},'X',{x}); if size(data,2)~=2 rew=rmfield(rew,'X'); raw=rmfield(raw,'X'); end try if plots & options.classic makeplot(rew,'classic',1) elseif plots makeplot(rew) end catch %output must be given even if plots are interrupted %> delete(gcf) to get rid of the menu end %----------------------------------------------------------------------------------------- function [raw_center,raw_cov,center,covar,raw_objective,raw_wt,mcd_wt,method]=displ(exactfit,... count,weights,n,p,center,covar,method,z,ok,raw_wt,mcd_wt,cor,correl,varargin) % Determines some fields of the output argument REW for the exact fit situation. It also % displays and writes the messages concerning the exact fit situation. If the raw MCD % covariance matrix is not singular but the reweighted is, then the function displrw is % called instead of this function. [raw_center,center]=deal(center); [raw_cov,cover]=deal(covar); raw_objective=0; mcd_wt=weights; raw_wt=weights; switch exactfit case 1 msg='The covariance matrix of the data is singular.'; case 2 msg='The covariance matrix has become singular during the iterations of the MCD algorithm.'; case 3 msg=sprintf('The %g-th order statistic of the absolute deviation of variable %g is zero. ',varargin{1},varargin{2}); end msg=sprintf([msg '\nThere are %g observations in the entire dataset of %g observations that lie on the \n'],count,n); switch p case 2 msg=sprintf([msg 'line with equation %g(x_i1-m_1)%+g(x_i2-m_2)=0 \n'],z); msg=sprintf([msg 'where the mean (m_1,m_2) of these observations is the MCD location']); case 3 msg=sprintf([msg 'plane with equation %g(x_i1-m_1)%+g(x_i2-m_2)%+g(x_i3-m_3)=0 \n'],z); msg=sprintf([msg 'where the mean (m_1,m_2,m_3) of these observations is the MCD location']); otherwise msg=sprintf([msg 'hyperplane with equation a_1 (x_i1-m_1) + ... + a_p (x_ip-m_p) = 0 \n']); msg=sprintf([msg 'with coefficients a_i equal to : \n\n']); msg=sprintf([msg sprintf('%g ',z)]); msg=sprintf([msg '\n\nand where the mean (m_1,...,m_p) of these observations is the MCD location']); end method=strvcat(method,[msg '.']); disp(method) %----------------------------------------------------------------------------------------- function method=displrw(count,n,p,center,covar,method,z,cor,correl) % Displays and writes messages in the case the reweighted robust covariance matrix % is singular. msg=sprintf('The reweighted MCD scatter matrix is singular. \n'); msg=sprintf([msg 'There are %g observations in the entire dataset of %g observations that lie on the\n'],count,n); switch p case 2 msg=sprintf([msg 'line with equation %g(x_i1-m_1)%+g(x_i2-m_2)=0 \n\n'],z); msg=sprintf([msg 'where the mean (m_1,m_2) of these observations is : \n\n']); case 3 msg=sprintf([msg 'plane with equation %g(x_i1-m_1)%+g(x_i2-m_2)%+g(x_i3-m_3)=0 \n\n'],z); msg=sprintf([msg 'where the mean (m_1,m_2,m_3) of these observations is : \n\n']); otherwise msg=sprintf([msg 'hyperplane with equation a_1 (x_i1-m_1) + ... 
+ a_p (x_ip-m_p) = 0 \n']); msg=sprintf([msg 'with coefficients a_i equal to : \n\n']); msg=sprintf([msg sprintf('%g ',z)]); msg=sprintf([msg '\n\nand where the mean (m_1,...,m_p) of these observations is : \n\n']); end msg=sprintf([msg sprintf('%g ',center)]); msg=sprintf([msg '\n\nTheir covariance matrix equals : \n\n']); msg=sprintf([msg sprintf([repmat('% 13.5g ',1,p) '\n'],covar)]); if cor msg=sprintf([msg '\n\nand their correlation matrix equals : \n\n']); msg=sprintf([msg sprintf([repmat('% 13.5g ',1,p) '\n'],correl)]); end method=strvcat(method,msg); %----------------------------------------------------------------------------------------- function [initmean,initcov,z,correl,varargout]=fit(dat,plane,med,mad,p,z,cor,varargin) % This function is called in the case of an exact fit. It computes the correlation % matrix and transforms the coefficients of the hyperplane, the mean, the covariance % and the correlation matrix to the original units. if isnan(plane) [meanvct,covmat,n]=deal(varargin{:}); [z, eigvl]=eigs(covmat,1,0,struct('disp',0)); dist=abs(sum((dat-repmat(meanvct,n,1))'.*repmat(z,1,n))); plane=find(dist < 1e-8); varargout{1}=plane; varargout{2}=length(plane); end z=z./mad'; [initcov,initmean]=trafo(cov(dat(plane,:)),mean(dat(plane,:)),med,mad,p); if cor correl=initcov./(sqrt(diag(initcov))*sqrt(diag(initcov))'); else correl=NaN; end %------------------------------------------------------------------------------------------ function obsingroup=fillgroup(n,group,ngroup,seed) % Creates the subdatasets. obsingroup=cell(1,ngroup+1); jndex=0; for k=1:ngroup for m=1:group(k) [random,seed]=uniran(seed); ran=floor(random*(n-jndex)+1); jndex=jndex+1; if jndex==1 index(1,jndex)=ran; index(2,jndex)=k; else index(1,jndex)=ran+jndex-1; index(2,jndex)=k; ii=min(find(index(1,1:jndex-1) > ran-1+[1:jndex-1])); if length(ii) index(1,jndex:-1:ii+1)=index(1,jndex-1:-1:ii); index(2,jndex:-1:ii+1)=index(2,jndex-1:-1:ii); index(1,ii)=ran+ii-1; index(2,ii)=k; end end end obsingroup{k}=index(1,index(2,:)==k); obsingroup{ngroup+1}=[obsingroup{ngroup+1},obsingroup{k}]; end %----------------------------------------------------------------------------------------- function [ranset,seed]=randomset(tot,nel,seed) % This function is called if not all (p+1)-subsets out of n will be considered. % It randomly draws a subsample of nel cases out of tot. for j=1:nel [random,seed]=uniran(seed); num=floor(random*tot)+1; if j > 1 while any(ranset==num) [random,seed]=uniran(seed); num=floor(random*tot)+1; end end ranset(j)=num; end %----------------------------------------------------------------------------------------- function [index,seed]=addobs(index,n,seed) % Extends a trial subsample with one observation. jndex=length(index); [random,seed]=uniran(seed); ran=floor(random*(n-jndex)+1); jndex=jndex+1; index(jndex)=ran+jndex-1; ii=min(find(index(1:jndex-1) > ran-1+[1:jndex-1])); if length(ii)~=0 index(jndex:-1:ii+1)=index(jndex-1:-1:ii); index(ii)=ran+ii-1; end %----------------------------------------------------------------------------------------- function mahsort=mahal(dat,meanvct,covmat,part,fine,final,k,obsingroup,group,minigr,n,nvar) % Orders the observations according to the mahalanobis distances. 
if ~part | final [dis,ind]=sort(libra_mahalanobis(dat,meanvct,'cov',covmat)); mahsort=ind; elseif fine [dis,ind]=sort(libra_mahalanobis(dat(obsingroup{end},:),meanvct,'cov',covmat)); mahsort=obsingroup{end}(ind); else [dis,ind]=sort(libra_mahalanobis(dat(obsingroup{k},:),meanvct,'cov',covmat)); mahsort=obsingroup{k}(ind); end %----------------------------------------------------------------------------------------- function [dis,mahsort]=mahal2(score,sca,part,fine,final,k,obsingroup) % Orders the observations according to the mahalanobis distances for a diagonal % covariance matrix and zero mean. sca contains the squareroot of the diagonal elements. if ~part | final [dis,ind]=sort(libra_mahalanobis(score,zeros(size(score,2),1),'cov',sca.^2)); mahsort=ind; elseif fine [dis,ind]=sort(libra_mahalanobis(score,zeros(size(score,2),1),'cov',sca.^2)); mahsort=obsingroup{end}(ind); else [dis,ind]=sort(libra_mahalanobis(score,zeros(size(score,2),1),'cov',sca.^2)); mahsort=obsingroup{k}(ind); end %------------------------------------------------------------------------------------------ function [covmat,meanvct]=trafo(covmat,meanvct,med,mad,nvar) % Transforms a mean vector and a covariance matrix to the original units. covmat=covmat.*repmat(mad,nvar,1).*repmat(mad',1,nvar); meanvct=meanvct.*mad+med; %----------------------------------------------------------------------------------------- function [bestmean,bestP,bestL,bobj]=insertion(bestmean,bestP,bestL,bobj,meanvct,P,L,obj,row,eps) % Stores, for the first and second stage of the algorithm, the results in the appropriate % arrays if it belongs to the 10 best results. insert=1; equ=find(obj==bobj(row,:)); for j=equ if (meanvct==bestmean{row,j}) if all(P==bestP{row,j}) if all(L==bestL{row,j}) insert=0; end end end end if insert ins=min(find(obj < bobj(row,:))); if ins==10 bestmean{row,ins}=meanvct; bestP{row,ins} = P; bestL{row,ins} = L; bobj(row,ins)=obj; else [bestmean{row,ins+1:10}]=deal(bestmean{row,ins:9}); bestmean{row,ins}=meanvct; [bestP{row,ins+1:10}] = deal(bestP{row,ins:9}); bestP{row,ins} = P; [bestL{row,ins+1:10}] = deal(bestL{row,ins:9}); bestL{row,ins} = L; bobj(row,ins+1:10)=bobj(row,ins:9); bobj(row,ins)=obj; end end %----------------------------------------------------------------------------------------- function quan=quanf(alfa,n,rk) quan=floor(2*floor((n+rk+1)/2)-n+2*(n-floor((n+rk+1)/2))*alfa); %--------------------------------------------------------------------------
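% Note on quanf (an illustrative worked example; these numbers are not used by
% the code): quanf(alfa,n,rk) interpolates, up to rounding, between
% floor((n+rk+1)/2) for alfa=0.5 and n for alfa=1. For instance, with n=100
% observations and rk=5 variables, floor((n+rk+1)/2)=53, so
%   quanf(0.50,100,5) = floor(2*53-100+2*(100-53)*0.50) = 53   (= hmin)
%   quanf(0.75,100,5) = floor(2*53-100+2*(100-53)*0.75) = 76   (the default h)
%   quanf(1.00,100,5) = floor(2*53-100+2*(100-53)*1.00) = 100  (= n)
%--------------------------------------------------------------------------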