Hyperthreading part II

Some time ago I posted some questions / info on hyperthreading on P4 style CPUs.

I ran a couple of tests with a GNU Octave script. The script is part of a test suite for evaluating the performance of Octave, Matlab, and similar tools; see the script below.
I have two machines with exactly the same hardware configuration:
HP DL360 G4, Xeon 3.4 GHz, 1 MB L2 cache, 2 GB PC2700 memory.

OS: FreeBSD 7.1p4 AMD64
GNU Octave 3.0.3_4

Server 1 (S1): Hyperthreading enabled
Server 2 (S2): Hyperthreading disabled

I did the following test:

1. Run the script a single time on S1 and on S2.
2. Run the script twice concurrently on S1 and on S2.

For a 140x140 matrix I got the following results:

S1: one single run: 367.2 s (elapsed time)
S1: two concurrent runs: 655.5 s (elapsed time)

S2: one single run: 362.8 s (elapsed time)
S2: two concurrent runs: 726 s (elapsed time)

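Conclusion: the difference for a single run is minor (about 1%), but it is quite noticeable for concurrent runs: with hyperthreading enabled, two concurrent runs finish about 10% sooner (655.5 s versus 726 s).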
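
To put numbers on this, here is a minimal Octave sketch of the arithmetic. It only uses the four timings listed above; the variable names are my own and are not part of the benchmark script.

```octave
% Elapsed times from the results above, in seconds
s1_single = 367.2;  s1_double = 655.5;   % S1: hyperthreading enabled
s2_single = 362.8;  s2_double = 726.0;   % S2: hyperthreading disabled

% Cost of running two jobs at once, relative to a single job
s1_ratio = s1_double / s1_single;   % about 1.79 with hyperthreading
s2_ratio = s2_double / s2_single;   % about 2.00 without hyperthreading

% Fraction of time saved by hyperthreading on the two-job workload
ht_gain = 1 - s1_double / s2_double;   % about 0.097

printf('S1 two jobs / one job: %.2f\n', s1_ratio);
printf('S2 two jobs / one job: %.2f\n', s2_ratio);
printf('time saved by hyperthreading for two jobs: %.1f%%\n', 100 * ht_gain);
```

Without hyperthreading the second job simply doubles the elapsed time, while with hyperthreading the two jobs together cost about 1.79 times a single job.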

Code:
runs=3;
% (5)
cumulate = 0; p = 0; vt = 0; vr = 0; vrt = 0; rvt = 0; RV = 0; j = 0; k = 0;
x2 = 0; R = 0; Rxx = 0; Ryy = 0; Rxy = 0; Ryx = 0; Rvmax = 0; f = 0;
for i = 1:runs
  x = abs(randn(140,140));
  tic;
    % Calculation of Escoufier's equivalent vectors
    p = size(x, 2);
    vt = 1:p;                                  % Variables to test
    vr = [];                                   % Result: ordered variables
    RV = 1:p;                                  % Result: correlations
    for j = 1:p                                % loop on the variable number
      Rvmax = 0;
      for k = 1:(p-j+1)                        % loop on the variables
        if j == 1
          x2 = [x, x(:, vt(k))];
        else
          x2 = [x, x(:, vr), x(:, vt(k))];     % New table to test
        end
        R = corrcoef(x2);                      % Correlations table
        Ryy = R(1:p, 1:p);
        Rxx = R(p+1:p+j, p+1:p+j);
        Rxy = R(p+1:p+j, 1:p);
        Ryx = Rxy';
        rvt = trace(Ryx*Rxy)/((trace(Ryy^2)*trace(Rxx^2))^0.5); % RV calculation
        if rvt > Rvmax
          Rvmax = rvt;                         % test of RV
          vrt(j) = vt(k);                      % temporary held variable
        end
      end
      vr(j) = vrt(j);                          % Result: variable
      RV(j) = Rvmax;                           % Result: correlation
      f = find(vt~=vr(j));                     % identify the held variable
      vt = vt(f);                              % reidentify variables to test
    end
  timing = toc
  drawnow
  cumulate = cumulate + timing;
end
timing = cumulate/runs;                        % average over all runs (cumulate was otherwise unused)
times(5, 3) = timing;
disp(['Escoufier''s method on a 140x140 matrix (mixed)________ (sec): ' num2str(timing)])
clear x; clear p; clear vt; clear vr; clear vrt; clear rvt; clear RV; clear j; clear k;
clear x2; clear R; clear Rxx; clear Ryy; clear Rxy; clear Ryx; clear Rvmax; clear f;
 
Here is an additional update:

I did the same test on S1 (Hyper Threading Enabled) but with
NetBSD 4.0.1 (AMD64-MP) installed and GNU Octave 3.0.3:

S1: one single run: 540.5 s (elapsed time)
S1: two concurrent runs: 687 s (elapsed time)

Compared to FreeBSD, NetBSD is much slower for a single run (540.5 s versus
367.2 s, about 47 percent longer), but it makes up most of that for two
concurrent runs (687 s versus 655.5 s, about 5 percent longer).
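
The comparison can be worked out with a short Octave sketch. It uses only the S1 timings posted for the two operating systems; the variable names are my own.

```octave
% Elapsed times on S1 (hyperthreading enabled), in seconds
freebsd = [367.2, 655.5];   % [one run, two concurrent runs] on FreeBSD 7.1p4
netbsd  = [540.5, 687.0];   % the same two tests on NetBSD 4.0.1

% How much longer NetBSD takes than FreeBSD for each test
slowdown = netbsd ./ freebsd - 1;   % about [0.47, 0.05]

% Cost of two concurrent jobs relative to one job, per OS
scaling = [freebsd(2) / freebsd(1), netbsd(2) / netbsd(1)];   % about [1.79, 1.27]

printf('NetBSD extra time, one run:  %.0f%%\n', 100 * slowdown(1));
printf('NetBSD extra time, two runs: %.0f%%\n', 100 * slowdown(2));
printf('two runs / one run: FreeBSD %.2f, NetBSD %.2f\n', scaling(1), scaling(2));
```

On NetBSD the second concurrent job adds only about 27 percent to the elapsed time, which is why the two systems end up so close for two concurrent runs.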
 
Octave 3.0.5 improvement over Octave 3.0.3_4 in FreeBSD 7.1p4 amd64:

Octave 3.0.3_4 -> Octave 3.0.5

Hyper Threading enabled.

S1: one single run: 367.2 s -> 363 s (elapsed time)
S1: two concurrent runs: 655.5 s -> 645 s (elapsed time)
 
I did some additional testing today on the same server (see the initial post) with Hyper Threading enabled:

Three concurrent runs:

run 1: 643 s
run 2: 567 s
run 3: 751 s

AVG: 654 s

To verify, I repeated the test:

run 1: 644 s
run 2: 606 s
run 3: 702 s

AVG: 651 s

These runs puzzle me, since the average is about the same as with only two concurrent runs.

Performing four concurrent runs results in:

run 1: 1289 s
run 2: 1272 s
run 3: 1276 s
run 4: 1290 s

AVG: 1282 s

This is in line with the results from two concurrent runs: four jobs take about twice as long as two.
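
One way to compare the different levels of concurrency is to divide the average elapsed time by the number of jobs running at once. Below is a minimal Octave sketch using the timings from this thread. It assumes the one-run and two-run figures from the Octave 3.0.3_4 test on FreeBSD are the right baseline; the variable names are my own.

```octave
% Number of concurrent jobs in each test on S1 (hyperthreading enabled)
n = [1, 2, 3, 3, 4];

% Average elapsed time per test, in seconds
avg = [367.2, ...                         % one single run
       655.5, ...                         % two concurrent runs
       mean([643, 567, 751]), ...         % three concurrent runs, first try
       mean([644, 606, 702]), ...         % three concurrent runs, second try
       mean([1289, 1272, 1276, 1290])];   % four concurrent runs

% Elapsed time per job: lower means better overall throughput
per_job = avg ./ n;

for i = 1:numel(n)
  printf('%d concurrent: %.1f s average, %.1f s per job\n', n(i), avg(i), per_job(i));
end
```

The two-run and four-run tests come out at about 328 s and 320 s per job, while both three-run tests come out at about 217 to 218 s per job, which is the part that does not fit the pattern.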

If anybody else is interested in running this test, feel free to post your results!
 