awk's rand() not really random?

jrick

Member

Reaction score: 6
Messages: 84

I'm trying to select a random item from a list with awk, but I'm finding that awk's rand() function isn't working quite as expected.

Code:
% awk 'BEGIN{ srand(); print rand(); }'
0.889922
% awk 'BEGIN{ srand(); print rand(); }'
0.88993
% awk 'BEGIN{ srand(); print rand(); }'
0.88993
% awk 'BEGIN{ srand(); print rand(); }'
0.889938
% awk 'BEGIN{ srand(); print rand(); }'
0.889938
% awk 'BEGIN{ srand(); print rand(); }'
0.889946
% awk 'BEGIN{ srand(); print rand(); }'
0.889946
% awk 'BEGIN{ srand(); print rand(); }'
0.889954
% awk 'BEGIN{ srand(); print rand(); }'
0.889962
% awk 'BEGIN{ srand(); print rand(); }'
0.889962
% awk 'BEGIN{ srand(); print rand(); }'
0.889969
% awk 'BEGIN{ srand(); print rand(); }'
0.889969
...which isn't really random.

Am I doing something wrong? Is there a different way I should be using this?
 

anomie

Aspiring Daemon

Reaction score: 120
Messages: 781

I'm not sure if this will help or not (because the documentation is for - and I am using - gawk), but:

http://www.gnu.org/software/gawk/manual/html_node/Numeric-Functions.html#Numeric-Functions

In your example, if you're re-running the command at a quick pace I suppose the seed (based on date/time if one is not explicitly provided) could be very similar.

Here's an example session I just tried, waiting ~4 seconds between each press of Enter:
Code:
$ awk '{ srand() ; print rand() }'

0.669459

0.291102

0.614256

0.286518

0.362619

0.485851
^C
 

anomie

Aspiring Daemon

Reaction score: 120
Messages: 781

Yes, I just confirmed. Look what happens if I hit Enter rapidly:
Code:
$ awk '{ srand() ; print rand() }'

0.372023

0.372023

0.372023

0.372023

0.372023

0.602575

0.602575

0.602575
^C
So you are probably going to need to either provide a seed or slow things down a bit.
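To see why rapid runs repeat, here is a small sketch that pins the seed explicitly. The seed value 42 is arbitrary. Two separate awk processes seeded with it print the same first value, which is the situation two time-seeded runs end up in when they start within the same second.

```shell
# Two independent awk processes, same explicit seed (42 is an arbitrary choice).
first=$(awk 'BEGIN { srand(42); print rand() }')
second=$(awk 'BEGIN { srand(42); print rand() }')

# Identical seeds give identical output, just like two runs in the same second.
if [ "$first" = "$second" ]; then
    echo "same seed, same value: $first"
fi
```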
 

jrick

Member

Reaction score: 6
Messages: 84

Well, I have it kind of working with gawk, but I would really prefer to have a solution using awk from base for portability.

Code:
gawk 'BEGIN { srand(systime() + PROCINFO["pid"]); print rand() }'
(taken from here)

Unfortunately, systime() is only provided with gawk. Any good ideas on how to do this with /usr/bin/awk?
 

anomie

Aspiring Daemon

Reaction score: 120
Messages: 781

This time with awk:
Code:
$ awk --version
awk version 20070501 (FreeBSD)
OK for a single run:
Code:
$ _seed=`date +%s` ; awk '{ srand('${_seed}') ; print rand()}'
Each time that entire command is invoked, you'll have Epoch seconds assigned to _seed.
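A variation on the same idea, as a sketch: pass the epoch seconds in with -v instead of splicing them into the program text, and do the work in BEGIN so awk does not wait for input. The variable name seed is only an illustrative choice, and date +%s is assumed to be available as in the command above. Runs within the same second will still get the same seed.

```shell
# Seed from epoch seconds, handed to awk with -v rather than quote splicing.
# "seed" is an illustrative variable name, not anything awk requires.
value=$(awk -v seed="$(date +%s)" 'BEGIN { srand(seed); print rand() }')
echo "$value"
```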
 

jrick

Member

Reaction score: 6
Messages: 84

Awesome, thanks. But now it seems like awk doesn't respect PROCINFO like gawk does.

Code:
% gawk 'BEGIN{ print PROCINFO["pid"]}'
10580
% awk 'BEGIN{ print PROCINFO["pid"]}'
According to the gawk documentation, PROCINFO["pid"] returns the process ID of the current process. Any idea how to get something like this working?
 

anomie

Aspiring Daemon

Reaction score: 120
Messages: 781

Dunno. All I can think of is to use the Bourne shell's $$ variable, which holds the shell's own pid (assuming you're going to be running this from a Bourne shell or from a Bourne shell script).
Code:
$ awk '{ print '$$' }'

228
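Putting the two pieces together might look like the sketch below, which adds the shell's pid to the epoch seconds before seeding. The variable names are illustrative. Note that $$ is the pid of the invoking shell, not of awk, so repeated calls from one script within the same second would still share a seed.

```shell
# $$ is the PID of the invoking shell; adding it to the epoch seconds means
# two different shells started in the same second get different seeds.
seed=$(( $(date +%s) + $$ ))
value=$(awk -v seed="$seed" 'BEGIN { srand(seed); print rand() }')
echo "$value"
```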
 

jrick

Member

Reaction score: 6
Messages: 84

Alt said:
I tried following construct and seems it works ok
If you run it just once, yes. If you run it multiple times in a row, the "random" numbers aren't really all that random. This is because the seed that srand() uses is the system time.

I also found out that this whole thread seems kind of pointless, since the srand() in base awk doesn't seem to take an argument the way gawk's srand() does. Even though I can pass it the same values as with gawk, it doesn't make any difference.
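A quick way to check whether a given awk honors the argument to srand() is to seed it with two different values and compare the results. This is only a sketch; the seeds 1 and 2 are arbitrary.

```shell
# Seeds 1 and 2 are arbitrary; any two distinct values should do.
a=$(awk 'BEGIN { srand(1); print rand() }')
b=$(awk 'BEGIN { srand(2); print rand() }')

if [ "$a" != "$b" ]; then
    echo "srand() honors its argument: $a vs $b"
else
    echo "srand() appears to ignore its argument"
fi
```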
 

Alt

Aspiring Daemon

Reaction score: 82
Messages: 726

Then that's much harder, huh :p
The next version is:
Code:
awk -v rn=`jot -r 1 1 10000000` "BEGIN { srand(srand()+rn); } { print rand(); }"
Tested with:
Code:
echo "Test" | awk -v rn=`jot -r 1 1 10000000` "BEGIN { srand(srand()+rn); } { print rand(); }" ; echo "Test" | awk -v rn=`jot -r 1 1 10000000` "BEGIN { srand(srand()+rn); } { print rand(); }"
 

jrick

Member

Reaction score: 6
Messages: 84

Wow, that works much nicer. How exactly does that work, and how are you able to give arguments to srand()?
 

Alt

Aspiring Daemon

Reaction score: 82
Messages: 726

Code:
awk -v rn=`jot -r 1 1 10000000` "BEGIN { srand(srand()+rn); } { print rand(); }"
jot -r 1 1 10000000 - gives one random number from 1 to 10000000 =)
-v rn=.... - this passes a variable into the awk interpreter
srand() - it actually takes a seed number and returns the previous seed. The inner srand() call seeds from the time of day, and we add the 'rn' variable to the value it returns
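The return value of srand() can be seen directly with a small sketch like this one. The seeds 10 and 20 are arbitrary illustration values: the second call returns the seed that the first call installed.

```shell
# Seeds 10 and 20 are arbitrary illustration values.
# The second srand() call returns the seed set by the first one.
prev=$(awk 'BEGIN { srand(10); print srand(20) }')
echo "previous seed: $prev"
```

What the very first srand() call in a program returns depends on the implementation's initial seed, so it may be worth printing it on your own awk before relying on it.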

UPD: it can be simplified to
Code:
awk "BEGIN { srand(srand()+`jot -r 1 1 10000000`); } {print rand(); }"
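On systems without jot, one possible substitute (an assumption, not something tested in this thread) is to read a few bytes from /dev/urandom with od and use them as the seed. This sketch assumes /dev/urandom exists and that od accepts the -A, -N and -t options.

```shell
# Read 4 random bytes as one unsigned decimal integer.
# tr keeps only the digits, dropping od's padding and the newline.
seed=$(od -An -N4 -tu4 /dev/urandom | tr -dc '0-9')
value=$(awk -v seed="$seed" 'BEGIN { srand(seed); print rand() }')
echo "$value"
```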
 

ephemera

Member

Reaction score: 2
Messages: 33

I think the problem is that srand() is being called before every call to rand(). When srand(3) is initialized with the same value, the pseudo-random number sequence is repeated (awk's srand() is probably implemented as srand(time(0))).

Try: awk 'BEGIN {srand()} {print rand()}'
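Applied to the original goal of picking a random item from a list, seeding once in BEGIN might look like the sketch below. pick_random is a hypothetical helper name, and the seed is passed in by the caller so that any of the seeding approaches from this thread can be used.

```shell
# pick_random is a hypothetical helper; it prints one random line of its input.
# $1 is the seed to use.
pick_random() {
    awk -v seed="$1" '
        BEGIN { srand(seed) }      # seed once, before any input is read
        { items[NR] = $0 }         # remember every input line
        END { if (NR > 0) print items[int(rand() * NR) + 1] }
    '
}

choice=$(printf '%s\n' apple banana cherry | pick_random "$(date +%s)")
echo "$choice"
```

Since rand() returns a value of at least 0 and less than 1, int(rand() * NR) + 1 is an index from 1 to NR.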
 