runHiLoExp takes an experiment with the following structure and splits the sample space at the median of a target flux solves the two spaces with a given sugar and compares the resulting mdvs to provide a z-score. experiment - model - S = the stoichiometric matrix - rxns = array of reaction names, corresponding the S - c = optimization target 1, or -1 - ub,lb = upper and lower bounds of reactions points = a #fluxes X #samples (~2000) array of the solution space if missing or empty, will generate a sample glcs = an array of sugars in isotopomer format, each column a separate sugar. Should not be in MDV format. Conversion is done automatically. will default to generate 1 random sugar if set to [] targets = an array of cells with string for the reaction to split on the solution space, defaults to 'PGL' thresholds = # targets X 1 array of thresholds metabolites = an optional parameter fed to calcMDVfromSamp.m which only calculates the MDVs for the metabolites listed in this array. e.g. - optionally, metabolites can also be a structure of fragments hilo = a #targets X #samples array of 0/1's, 0 id's the sample of fluxes as the lo side and 1 id's the sample for the hi side. hilo will only be calculated/recalculated if it's missing or if the targets have been replaced using the param list mdvs = structure of mdv results - (name) = name of the run = t + glc# e.g. t1, t2, glc# refers to the glc in the glcs array - mdv = array of mdv results - zscore = array of zscores from comparison btw the two mdvs - totalz Note that the split of mdvs are not stored, also since the only time mdvs should be regen'd is when glcs has changed, but we have no way of knowing when this happens, the user will have to manually empty out mdvs to have it regenerated. zscores = array of zscores from each run, targets X glcs rscores = array of ridge scores from each run, targets X glcs target is an optional string for a specific rxn to target. if supplied, it will override and replace the targets field in the experiment structure. threshold is an optional number to apply on the solution space fluxes if supplied, it will be applied to the hilo field and replace the hilo splits. outputs the experiment array. basically, this code will loop thru one experiment per sugar, per target Wing Choi 3/14/08
0001 function [experiment] = runHiLoExp(experiment) 0002 % runHiLoExp 0003 % takes an experiment with the following structure 0004 % and splits the sample space at the median of a target flux 0005 % solves the two spaces with a given sugar and compares the 0006 % resulting mdvs to provide a z-score. 0007 % 0008 % experiment - 0009 % model - S = the stoichiometric matrix 0010 % - rxns = array of reaction names, corresponding the S 0011 % - c = optimization target 1, or -1 0012 % - ub,lb = upper and lower bounds of reactions 0013 % points = a #fluxes X #samples (~2000) array of the solution space 0014 % if missing or empty, will generate a sample 0015 % glcs = an array of sugars in isotopomer format, each column a separate sugar. 0016 % Should not be in MDV format. Conversion is done automatically. 0017 % will default to generate 1 random sugar if set to [] 0018 % targets = an array of cells with string for the reaction to 0019 % split on the solution space, defaults to 'PGL' 0020 % thresholds = # targets X 1 array of thresholds 0021 % metabolites = an optional parameter fed to calcMDVfromSamp.m 0022 % which only calculates the MDVs for the metabolites listed in this 0023 % array. e.g. 0024 % - optionally, metabolites can also be a structure of fragments 0025 % hilo = a #targets X #samples array of 0/1's, 0 id's the sample of 0026 % fluxes as the lo side and 1 id's the sample for the hi side. 0027 % hilo will only be calculated/recalculated if it's missing or 0028 % if the targets have been replaced using the param list 0029 % mdvs = structure of mdv results 0030 % - (name) = name of the run = t + glc# 0031 % e.g. t1, t2, glc# refers to the glc in the glcs array 0032 % - mdv = array of mdv results 0033 % - zscore = array of zscores from comparison btw the two mdvs 0034 % - totalz 0035 % Note that the split of mdvs are not stored, 0036 % also since the only time mdvs should be regen'd is when glcs 0037 % has changed, but we have no way of knowing when this happens, 0038 % the user will have to manually empty out mdvs to have it 0039 % regenerated. 0040 % zscores = array of zscores from each run, targets X glcs 0041 % rscores = array of ridge scores from each run, targets X glcs 0042 % 0043 % target is an optional string for a specific rxn to target. 0044 % if supplied, it will override and replace the targets field in the 0045 % experiment structure. 0046 % threshold is an optional number to apply on the solution space fluxes 0047 % if supplied, it will be applied to the hilo field and replace the 0048 % hilo splits. 0049 % 0050 % 0051 % outputs the experiment array. 0052 % 0053 % basically, this code will loop thru one experiment 0054 % per sugar, per target 0055 % 0056 % Wing Choi 3/14/08 0057 0058 if (nargin < 1) 0059 disp '[experiment] = runHiLoExp(experiment,target,threshold)'; 0060 return; 0061 end 0062 0063 runzscore = true; %binary flag for z-scores 0064 runrscore = false; %binary flag for r-scores 0065 runkscore = false; %binary flag for k-scores 0066 0067 0068 %% if no model, exit with error. 0069 % if model exists, but no points 0070 % then points are generated 0071 % existing mdvs in experiment, if any, are wiped 0072 if (not(isfield(experiment,'model'))) 0073 disp 'ERROR: input structure experiment lacks the requisite model in the model field'; 0074 return; 0075 end 0076 0077 model = experiment.model; 0078 points = []; 0079 0080 if (isfield(experiment,'points')) 0081 points = experiment.points; 0082 end 0083 0084 if (isempty(points)) 0085 disp 'WARN: experiment.points is empty or missing'; 0086 disp ' generating a sample and empty out mdvs'; 0087 pointSample = generateRandomSample(model,2000); 0088 0089 points = pointSample.point; 0090 experiment.mfrac = pointSample.mf; 0091 %points = m.points; 0092 experiment.points = points; 0093 %experiment.mfrac = mf; 0094 % recalculate the mdvs since we have new points. 0095 experiment.mdvs = []; 0096 end 0097 0098 0099 %% if the rxns array is inverted 0100 % display error message indicating that rxns array is inverted 0101 dr = size(model.rxns,2); 0102 if (dr > 1), 0103 disp 'ERROR: rxns array is inverted'; 0104 return; 0105 end 0106 0107 if (isfield(experiment,'metabolites')) 0108 metabolites = experiment.metabolites; 0109 else 0110 metabolites = []; 0111 end 0112 0113 %% if no sugar, generate a random one and warn the user. 0114 % existing mdvs are wiped 0115 glcs = []; 0116 if (isfield(experiment,'glcs')) 0117 glcs = experiment.glcs; 0118 end 0119 if (isempty(glcs)) 0120 disp 'WARN: glcs not found, will generate 1 random sugar for experiment and empty out mdvs'; 0121 glcs = getRandGlc(); 0122 experiment.glcs = glcs; 0123 experiment.mdvs = []; 0124 end 0125 0126 %% set glucose mixture name description; 0127 glcsnames = {}; 0128 for i = 1:size(glcs,2) 0129 glcsnames{1,i} = ''; 0130 glci = glcs(:,i); 0131 fglc = find(glci); 0132 for j = 1:length(fglc) 0133 if j ~=1 0134 glcsnames{1,i} = strcat(glcsnames{1,i}, '+'); 0135 end 0136 if round(100*glci(fglc(j))) < 100 0137 glcsnames{1,i} = strcat(glcsnames{1,i}, num2str(round(100*glci(fglc(j)))), '% '); 0138 end 0139 if fglc(j)-1 == 0 % not labeled 0140 glcsnames{1,i} = strcat(glcsnames{1,i}, 'C0'); 0141 elseif abs(log2(fglc(j)-1) - round(log2(fglc(j)-1)))<1e-8 % if it's a perfect power 0142 glcsnames{1,i} = strcat(glcsnames{1,i}, 'C', num2str(6-round(log2(fglc(j)-1))) ); 0143 elseif fglc(j)-1 == 32+16 % C12 0144 glcsnames{1,i} = strcat(glcsnames{1,i}, 'C12'); 0145 elseif fglc(j)-1 == 63 % fully labeled 0146 glcsnames{1,i} = strcat(glcsnames{1,i}, 'CU'); 0147 else 0148 glcsnames{1,i} = strcat(glcsnames{1,i}, '#', dec2bin(fglc(j)-1, 6)); 0149 end 0150 end 0151 end 0152 experiment.glcsnames = glcsnames; 0153 0154 0155 %% if hilo is defined in experiment 0156 % then inform user and error out 0157 if (not(isfield(experiment,'hilo'))) 0158 disp 'ERROR: hilo field not found in input structure experiment'; 0159 return; 0160 else 0161 hilo = experiment.hilo; 0162 end 0163 0164 ntarget = size(hilo,1); 0165 nglc = size(glcs,2); 0166 0167 0168 %% we don't care about the thresholds field from this point on 0169 if (not(isfield(experiment,'mdvs'))) 0170 experiment.mdvs = []; 0171 end 0172 mdvs = experiment.mdvs; 0173 if (isempty(mdvs)) 0174 disp('no mdvs found, recalculating mdvs and emptying out zscores'); 0175 mdvs = struct; 0176 for iglc = 1:nglc 0177 fprintf('MDVS on glucose %d of %d\n',iglc, nglc); 0178 glc = glcs(:,iglc); 0179 % translate sugar from isotopomer to cuomer format 0180 if abs(sum(glc)-1)>1e-6 || any(glc <0) 0181 display('invalid glc. needs to be idv'); 0182 disp(iglc) 0183 glc 0184 pause; 0185 end 0186 0187 mdv = calcMDVfromSamp(glc,experiment.points,metabolites); 0188 0189 name = sprintf('t%d',iglc); 0190 mdvs.(name) = mdv; 0191 end 0192 experiment.mdvs = mdvs; 0193 experiment.zscores = []; 0194 end 0195 0196 %% regen the zscores 0197 if (not(isfield(experiment,'zscores'))) 0198 experiment.zscores = []; 0199 end 0200 zscores = experiment.zscores; 0201 if (isempty(zscores) && runzscore) 0202 disp('calculating zscores'); 0203 for iglc = 1:nglc 0204 fprintf('z-scores on glucose %d of %d\n',iglc, nglc); 0205 for itgt = 1:ntarget 0206 % target = char(targets(itgt)); 0207 hl = hilo(itgt,:); 0208 name = sprintf('t%d',iglc); 0209 mdv = mdvs.(name); 0210 [hiset,loset] = splitMDVbyTarget(mdv,hl); 0211 % if ((size(loset,1)) ~= (size(hiset,1))) 0212 % zscores(itgt,iglc) = -1; 0213 % disp('problem with the hi or lo set, cannot calculate zscore'); 0214 % continue; 0215 % end 0216 mdv1.names = mdv.names; 0217 mdv1.ave = mean(loset,2); 0218 mdv1.stdev = std(loset,0,2); 0219 mdv2.ave = mean(hiset,2); 0220 mdv2.stdev = std(hiset,0,2); 0221 0222 [totalz,zscore] = compareTwoMDVs(mdv1,mdv2); 0223 zscores(itgt,iglc) = totalz; 0224 end 0225 end 0226 experiment.zscores = zscores; 0227 end 0228 0229 0230 %% regen the ridge score 0231 0232 if (not(isfield(experiment,'rscores'))) 0233 experiment.rscores = []; 0234 end 0235 rscores = experiment.rscores; 0236 if (isempty(rscores) && runrscore) 0237 disp('calculating ridge scores'); 0238 for iglc = 1:nglc 0239 fprintf('r-scores on glucose %d of %d\n',iglc, nglc); 0240 for itgt = 1:ntarget 0241 hl = hilo(itgt,:)'; 0242 name = sprintf('t%d',iglc); 0243 mdv = mdvs.(name).mdv; 0244 rscore = score_ridge(mdv,hl); 0245 rscores(itgt,iglc) = rscore; 0246 end 0247 end 0248 experiment.rscores = rscores; 0249 end 0250 0251 %% regen the KS score 0252 if (not(isfield(experiment,'kscores'))) 0253 experiment.kscores = []; 0254 end 0255 kscores = experiment.kscores; 0256 if (isempty(kscores) && runkscore) 0257 disp('calculating KS scores'); 0258 for iglc = 1:nglc 0259 fprintf('KS-scores on glucose %d of %d\n',iglc, nglc); 0260 for itgt = 1:ntarget 0261 hl = hilo(itgt,:)'; 0262 name = sprintf('t%d',iglc); 0263 mdv = mdvs.(name).mdv; 0264 kscore = score_KS(mdv,hl); 0265 kscores(itgt,iglc) = kscore; 0266 end 0267 end 0268 experiment.kscores = kscores; 0269 end 0270 0271 return; 0272 0273 end 0274 0275 0276 %% findIndexToTarget 0277 % 0278 % function [targetind] = findIndexToTarget(model,target) 0279 % 0280 % % Given a model, find the index to the target in the model.rxns 0281 % 0282 % d = size(model.c); 0283 % %find index to target flux 0284 % found = false; 0285 % for r = 1:d(1), 0286 % if ~isempty(findstr(char(model.rxns(r)),target)) 0287 % found = true; 0288 % break; 0289 % end 0290 % end 0291 % if (~found) 0292 % disp(sprintf('could not locate %s flux',target)); 0293 % targetind = -1; 0294 % return; 0295 % end 0296 % targetind = r; 0297 % disp(sprintf('found target flux for %s at: %d',target,targetind)); 0298 % 0299 % return 0300 % end 0301 0302 %% splitMDVbyTarget 0303 % 0304 function [hiset,loset] = splitMDVbyTarget(mdv,hilo) 0305 0306 % Given an mdv set and a hilo discriminator, split the 0307 % mdvset into 2 sets: a lo and hi set each with numinset cols. 0308 hiset = mdv.mdv(:,hilo==1); 0309 loset = mdv.mdv(:,~hilo); 0310 return; 0311 % nmdv = size(mdv.mdv,2); 0312 % 0313 % hisetcount = 0; 0314 % losetcount = 0; 0315 % hiset = []; 0316 % loset = []; 0317 % mdva = mdv.mdv; 0318 % mdvnan = mdv.mdvnan; 0319 % cnan = size(mdvnan,2); 0320 % 0321 % % if ((2*numinset) > nmdv) 0322 % % disp('WARN: insufficient number of points to cover split into hi and lo set'); 0323 % % end 0324 % 0325 % 0326 % 0327 % for i = 1:nmdv 0328 % if (cnan >= i) 0329 % % do nan check only if nan array is larger than i index 0330 % if (sum(isnan(mdvnan(:,i))) > 1) 0331 % continue; 0332 % end 0333 % end 0334 % if (hilo(1,i) == 1) 0335 % if (hisetcount <= numinset) 0336 % hisetcount = hisetcount + 1; 0337 % hiset(:,hisetcount) = mdva(:,i); 0338 % end 0339 % else 0340 % if (losetcount <= numinset) 0341 % losetcount = losetcount + 1; 0342 % loset(:,losetcount) = mdva(:,i); 0343 % end 0344 % end 0345 % if ((hisetcount >= numinset) && (losetcount >= numinset)) 0346 % break; 0347 % end 0348 % end 0349 % 0350 % % might have read thru the entire set but lots of nan for mdv's 0351 % if ((hisetcount < numinset) || (losetcount < numinset)) 0352 % disp(sprintf('WARN: hisetcount = %d, losetcount = %d',hisetcount,losetcount)); 0353 % end 0354 0355 end