Generate passwords from the commandline

I needed to generate a random password from a shell script. I figured this was solved long ago, so I turned to teh interwebz to quickly copy/paste a working solution. Inspecting the first few links that turned up, I noticed that many of the proposed solutions are dubious at best.

The date ain’t random, buddy
The most obviously wrong are:

Code:
$ date +%s | sha256sum | base64 | head -c32
$ date | md5sum
$ ping -c 1 yahoo.com | md5 | head -c8
To paraphrase a quote from that holy scripture, The Hitchhiker’s Guide to the Galaxy: “This is obviously some strange usage of the word ‘random’ that I hadn’t previously been aware of”.

Both SHA256 and MD5 also output hex, which limits the alphabet to just 16 characters instead of 94 (every printable ASCII character except space).

tr means translate characters
Most of the other commands suffer from a dubious usage of tr(1).

tr(1) works on characters, not bytes, while /dev/urandom outputs a byte stream, not characters. If your locale is set to (extended) ASCII or a variant thereof (ISO-8859-1, Windows-1252), this is more or less okay, since every byte is a character or a control code.

However, with UTF-8 or other multibyte character sets, it gets more complicated. Not every byte sequence is valid UTF-8, and the chance of a random byte stream also being a valid UTF-8 character stream is quite small.
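You can check this yourself by asking iconv(1) to convert UTF-8 to UTF-8, which fails on invalid input. A minimal sketch; the helper name is_utf8 is made up for this example:

```shell
#!/bin/sh
# is_utf8: succeed if stdin is valid UTF-8.
# Converting UTF-8 to UTF-8 makes iconv reject illegal or truncated sequences.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

printf 'I løv€ π\n' | is_utf8 && echo "text: valid UTF-8"
head -c 64 /dev/urandom | is_utf8 || echo "64 random bytes: not valid UTF-8"
```

Run the second line a few times: random bytes almost never pass.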

Yet, it seems to work on Linux with GNU tr. Why? Here’s a clue:

Code:
$ echo 'I løv€ π' | tr '[:lower:]' '[:upper:]' 
I LøV€ π

$ echo 'I løv€ π' | tr øπ€ X
I lXXvXXX XX
We would expect ø and π to be uppercased, but they’re not. And why are ø, π, and € each replaced by 2 or 3 X’s?

The astute reader will have recognized what this means: GNU tr doesn’t handle multibyte characters and always assumes a single-byte character set such as ASCII. That is somewhat disappointing, since it’s 2014, not 1974.
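One way to see what tr is actually being fed is to dump the bytes. A minimal sketch, assuming the snippet itself is saved as UTF-8 (the bytes come from how the string literal is encoded in the file, not from your locale):

```shell
#!/bin/sh
# Show the raw bytes behind each character.
printf '%s' 'I løv€ π' | od -An -tx1
# The 8 characters take 12 bytes: ø = c3 b8, € = e2 82 ac, π = cf 80

printf '%s' 'I løv€ π' | wc -c   # 12
printf '%s' '€' | od -An -tx1    # e2 82 ac
```

That is one X per byte in the GNU tr output: ø and π are 2 bytes each, € is 3.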

FreeBSD, for example, handles this correctly, and it also gives an error message on invalid UTF-8 sequences:

Code:
$ echo 'I løv€ π' | tr '[:lower:]' '[:upper:]'
I LØV€ Π

$ echo 'I løv€ π' | tr øπ€ X
I lXvX X

$ head -c5 /dev/urandom | tr X Y
tr: Illegal byte sequence

$ setenv LC_CTYPE C

$ head -c5 /dev/urandom | tr X Y
f��!�
The moral here is: byte streams are not character streams, so don’t use ’em as such. It may work for now, but whenever someone adds multibyte support to GNU tr, your command will fail. It’s 2014; always assume multibyte by default.
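If you do want to use tr(1) on random bytes, one way to sidestep the problem is to force the C locale for tr only, so every byte is exactly one character whatever the user’s locale is. A minimal sketch; the helper name rand_chars is hypothetical:

```shell
#!/bin/sh
# rand_chars LENGTH [CLASS]  (hypothetical helper)
# LC_ALL=C makes tr treat each byte as one character, so it never sees
# "illegal" multibyte sequences. LC_ALL overrides LANG and LC_CTYPE.
rand_chars() {
  LC_ALL=C tr -dc "[:${2:-graph}:]" < /dev/urandom | head -c "$1"
  echo
}

rand_chars 15          # printable, non-space ASCII
rand_chars 20 alnum    # letters and digits only
```

Setting LANG=C alone is not enough if the user has LC_ALL or LC_CTYPE set, since those take precedence.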

Other problems
While I’m whining anyway…

Code:
$ openssl rand -base64 8 | md5 | head -c8
Using openssl rand is a good idea, but piping it to md5 isn’t. Base64 draws from 64 characters, md5’s hex output from only 16, making the password a lot easier to brute force. Also, 8 characters is too short; use at least 15.
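The cost of a smaller alphabet is easy to quantify: a password of length L whose characters are picked uniformly and independently from N symbols has L · log2(N) bits of entropy. A small sketch of that arithmetic with awk (the helper name entropy_bits is hypothetical):

```shell
#!/bin/sh
# entropy_bits ALPHABET_SIZE LENGTH  (hypothetical helper)
# Upper bound on entropy for LENGTH uniformly random characters.
entropy_bits() {
  awk -v n="$1" -v l="$2" 'BEGIN { printf "%.1f\n", l * log(n) / log(2) }'
}

entropy_bits 16 8    # 8 hex digits from md5:         32.0
entropy_bits 64 8    # 8 base64 characters:           48.0
entropy_bits 94 15   # 15 printable non-space ASCII:  98.3
```

These are upper bounds. A value derived from the date has far less entropy than its length suggests.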

Code:
$ curl -s http://sensiblepassword.com/?harder=1
Getting a random password from the Internet is spectacularly stupid and naive. Someone now knows:

  • a password you are using for some service or site;
  • unique personal details about you (IP address, browser/environment info).
I can now cross-reference this information with other data collected about you. For example, you once posted to a mailing list and your IP address is in that mail’s headers, so I now have a password, a name, and an email address. I hope you can finish the scenario from here…

Just don’t do this. Ever. Randomly banging on the keyboard is a lot better.

Good solutions
Code:
$ head -c100 /dev/urandom | strings -n1 | tr -d '[:space:]' | head -c15
$ openssl rand -base64 15
$ gpg2 --armor --gen-random 1 15
The first solution could be considered slightly better, since it draws from more characters (94 instead of 64). It also doesn’t require openssl or gpg2 (although openssl is almost always available these days).

Lessons
  • A byte stream is not the same thing as a character stream.
  • Use strings(1) to convert a byte stream to a character stream.
  • Don’t use the hex output of a hashing algorithm (SHA256, MD5).
  • Don’t trust copy/paste solutions from the Internet; always think for yourself.
  • BSD > GNU
 
That only includes alphanumeric characters, I'd rather have as many characters as I want.
 
The genpw script below generates passwords from /dev/urandom. It accepts a length and a character class (see ctype(3)) as optional parameters.

Code:
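#!/bin/sh
# genpw: print a random password built from bytes read from /dev/urandom.

basename="$(basename -- "$0")"
def_ctype="graph"
def_len="24"

usage() {
  echo
  echo "Usage:  $basename [length] [ctype]"
  echo
  echo "Examples:  $basename 42 alnum"
  echo "           $basename 16"
  echo
  echo "Default:   $basename $def_len $def_ctype"
  echo "See Also:  man 3 ctype"
  echo
  exit 64  # EX_USAGE, see sysexits(3)
}

# Accept only a character class that tr(1) understands (alnum, graph, ...).
check_ctype() {
  local ctype="$1"
  [ -n "$ctype" ] || usage
  echo test | tr -d "[:$ctype:]" >/dev/null 2>&1 || usage
}

# Accept only a positive integer written with digits.
check_len() {
  local len="$1"
  local tmp="$(echo "$len" | tr -cd "[:digit:]")"
  [ -n "$len" ]       || usage
  [ "$len" = "$tmp" ] || usage  # POSIX test(1) uses "=", not "=="
  [ "$len" -gt 0 ]    || usage  # numeric test, so "00" is rejected too
}

generate_password() {
  local pw=""
  local len="$1"
  local ctype="$2"

  check_len "$len"
  check_ctype "$ctype"

  # Keep only the random bytes that fall in the class until we have enough.
  # LC_ALL=C makes tr treat every byte as one character even in a UTF-8
  # locale; LANG=C alone is overridden if LC_ALL or LC_CTYPE is set.
  while [ "${#pw}" -lt "$len" ]; do
    pw="$pw$(head -c "$len" /dev/urandom | LC_ALL=C tr -cd "[:$ctype:]")"
  done

  # printf, not echo: some sh implementations interpret backslashes in echo.
  printf '%s\n' "$pw" | cut -b 1-"$len"
}

case "$1" in
-h|--help)
  usage
  ;;
*)
  len="${1:-$def_len}"
  ctype="${2:-$def_ctype}"
  generate_password "$len" "$ctype"
  ;;
esac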
 
I always rely on sysutils/pwgen for this. What I like best about this program is that it doesn't generate one, but many passwords. So even if there are 'randomizer issues' you still get a little advantage because it allows you to add some "randomness" by picking one of the many passwords.

Another advantage is that it tries to create passwords which are still relatively strong but also not too hard to remember. Ideal for customers in my opinion (of course it also allows you to create harder, completely random passwords as well).
 
Carpetsmoker said:
That only includes alphanumeric characters, I'd rather have as many characters as I want.

True. I like to have 95 characters to choose from too. However, the extra entropy per character is only about 0.6 bits compared to a pool of 62 symbols, and (according to NIST's estimates for user-chosen passwords) the entropy gained per character decreases with each additional character. In practice, reaching ~80 bits of entropy takes just 1 character less with the larger pool, so you're not gaining much by using all available ASCII characters.
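The "1 character less" figure can be checked with a little arithmetic: for uniformly random characters, the minimum length is the smallest L with L · log2(pool) ≥ target bits. A minimal awk sketch; the helper name min_length is hypothetical:

```shell
#!/bin/sh
# min_length BITS POOL  (hypothetical helper)
# Smallest length L with L * log2(POOL) >= BITS, for uniformly random
# characters. The 1e-9 guard avoids rounding up on floating-point noise.
min_length() {
  awk -v b="$1" -v n="$2" 'BEGIN {
    l = b * log(2) / log(n)
    c = int(l)
    if (l - c > 1e-9) c++
    print c
  }'
}

# Extra bits per character going from 62 to 95 symbols: log2(95/62)
awk 'BEGIN { printf "%.2f\n", log(95 / 62) / log(2) }'   # 0.62

min_length 80 62   # 14 (letters and digits)
min_length 80 95   # 13 (all printable ASCII)
```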
 
That only includes alphanumeric characters, I'd rather have as many characters as I want.
I hope that's not too off topic, but keep in mind that a few more characters of length give you A LOT more than a few more non-alphanumeric symbols in the pool, and also far fewer headaches (because some software simply is crappy, and if you go to extremes you could also start using various UTF-8 whitespace characters, etc.).

Since most password fields accept more than twenty characters, and since you draw on the same source of entropy either way, you don't really gain a lot of practical security, and you have likely moved the weakest link somewhere else already.

If your password field is basically unlimited, it even makes sense to use Diceware, which uses words from a dictionary, giving you the benefits of both passwords that can easily be remembered and very high quality randomness. Another benefit is that you can calculate the entropy.
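The lookup step of Diceware is mechanical: five throws of a die give a five-digit key into a list of 6^5 = 7776 words, so each word adds about 12.9 bits. A minimal sketch, assuming a Diceware-style list whose lines are the five digits, a tab, and the word; the helper diceware_word and the two-entry toy list are made up for illustration:

```shell
#!/bin/sh
# diceware_word ROLLS LIST  (hypothetical helper)
# ROLLS: five digits 1-6 from five throws of a real die.
# LIST:  file with lines such as "11111<TAB>word".
diceware_word() {
  case "$1" in
    [1-6][1-6][1-6][1-6][1-6]) ;;
    *) echo "diceware_word: need five digits 1-6" >&2; return 64 ;;
  esac
  awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Toy two-entry list, NOT the real Diceware list.
list=$(mktemp)
printf '11111\tapple\n11112\tbanana\n' > "$list"
diceware_word 11112 "$list"    # banana
rm -f "$list"
```

Repeat once per word; the dice stay the source of randomness, the script only does the table lookup.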

In other words: if you are worried about a few more bits here, your password length is very likely the actual problem.

Btw. this is meant as a generic comment, not as a disagreement of some sort.
 
Very interesting thoughts, thank you. Diceware... of course.

Now I couldn't answer this myself: will it be more difficult to break a password like "IHndHe034mHjfUYgmddHge" which was created by diceware than the same one created by a script or simply by hand (as I usually create them)? Simply speaking, will the cracker software know how much randomness was used to create it?

I understand, the length itself + the number of possible elements used suggests a number of possible permutations. This number is always behind it. And the randomness... I don't know if typing on keyboard perfectly at random is more or less random than diceware produced combination. And I don't know if the cracker software can know the difference.

And finally, how random is Diceware? In the real world, dice are thrown by a human hand. OK, typing on a keyboard is the same human hand. And who can measure the randomness of typing at random?
 
Okay, that's multiple questions and I understand that what I wrote might be confusing in terms of entropy.

What matters about entropy here is what the attacker knows about you. In a way, you are just encoding entropy. In the end it doesn't matter whether it's bits, numbers, characters, words, etc. Of course it does matter in terms of storing it, but just keep in mind for a moment that you want to have *something* random.

Also keep in mind that some kind of hash function (and/or key derivation function) is usually used to actually store your password, so you end up with some kind of (usually fixed-length) value anyway (MD5, SHA, bcrypt[1], etc.).

So let's assume an attacker knows I am a fan of Diceware and even knows the specific dictionary I used. Wouldn't that be easier than trying every possible key combination? Well, the nice thing about Diceware is that you know exactly how long someone has to try, because you know how many sides a die has and how many times you throw it. So you can work out the maximum number of tries, and thereby the average. Always keep that in mind. In theory an attacker might guess right on the first try, so get your statistics right there. ;)
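The counting is plain arithmetic: S-sided dice thrown T times give S^T equally likely outcomes, so an attacker who knows the method needs at most S^T guesses, and S^T/2 on average. A small awk sketch (the helper name guesses is hypothetical), using 5 throws per word as in standard Diceware:

```shell
#!/bin/sh
# guesses SIDES THROWS  (hypothetical helper)
# Worst-case and average number of guesses, plus entropy in bits.
guesses() {
  awk -v s="$1" -v t="$2" 'BEGIN {
    max = s ^ t
    printf "max %.3g, average %.3g, %.1f bits\n", max, max / 2, t * log(s) / log(2)
  }'
}

guesses 6 5    # one word:  max 7.78e+03, average 3.89e+03, 12.9 bits
guesses 6 30   # six words: max 2.21e+23, average 1.11e+23, 77.5 bits
```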

So depending on what you mean by "how much randomness", the answer could be: if the attacker knows your pool (digits only, alphanumeric, words, ...), you want to make sure your picks are spread evenly over it. So you want a very good way of deciding which word/number/character/... to pick. Smashing on your keyboard looks a lot more random than it is, and randomly picking things off the top of your head is, most of the time, less random than one thinks. After all, our strength, and the reason we still beat computers at certain tasks, is pattern recognition, even if we perceive it as intuition. So both when typing and when coming up with our own sequences, humans make decisions based on the previously chosen characters.

Dice are really, really good, because your body, its physics, and basically the whole universe have some influence on how they roll, if you do it right. Computers have a harder time. They also try to draw on many sources, preferably ones at least indirectly connected to the outside world (who connected to your server, how fast the clock ticks, how you use your mouse and keyboard, etc.), but the code also has to be bug-free, and testing whether something is really random, especially in an automated way, is a huge undertaking and really hard[2]. That's why such bugs often stay undiscovered for a while (see the Debian OpenSSL bug).

In the end, if you manage to have a source of good randomness, how you encode it doesn't matter:

0 A ! Alpha Batman
1 B # Beta Robin
2 C ? Gamma Superman
3 D - Delta Captain America
4 E _ Epsilon Spiderman
5 F Ä Omega Tank Girl
...


These are all equal if you know the list.

But it's good to assume that the attacker knows how you came up with your password, because then it doesn't matter if the attacker does find out.

What words *can* be good for, though, is lengthening your password, because even though all of the above is true, more length usually means more resources spent by the attacker. At least it doesn't hurt, as long as you make sure the underlying algorithm that stores your password doesn't truncate it before saving.
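Such limits are usually counted in bytes, not characters, so a passphrase with multibyte characters can hit them sooner than its visible length suggests. A minimal check, assuming the 72-byte input limit of common bcrypt implementations (the helper fits_bcrypt is hypothetical):

```shell
#!/bin/sh
# fits_bcrypt PASSWORD  (hypothetical helper)
# Succeeds if PASSWORD is at most 72 bytes; common bcrypt implementations
# ignore everything past that. wc -c counts bytes, so "€" counts as 3.
fits_bcrypt() {
  bytes=$(printf '%s' "$1" | wc -c | tr -d ' ')
  [ "$bytes" -le 72 ]
}

fits_bcrypt "correct horse battery staple" && echo "fits"
fits_bcrypt "$(printf '%080d' 0)" || echo "80 bytes: would be truncated"
```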

So, in other words, the answer is: yes, how you gain your entropy does matter, of course. Then again, your RNG might come up with the very same thing; it might even come up with English words, and given enough tries it will produce every book and every piece of software ever written.

If you don't do something obviously stupid that completely ruins your password, the attacker's goal is to order the candidates so that the more likely ones are tried first. Entropy is what makes this ordering harder. A brute force attack will always succeed eventually, so the goal is to make sure the correct candidate isn't reached within your, the attacker's (or the universe's) lifetime.

Dice you know are one of the easiest random number generators to verify, especially compared to pretty much everything your computer might offer. It's just tedious if you want to create many big key files. ;)

[1] Keep in mind that bcrypt limits the input length by default. Some implementations even drop certain character types or encode things into the previous bytes (e.g. by XORing).
[2] this sums it up: http://dilbert.com/strip/2001-10-25
 
Right, right, I know: the keyboard layout limits the "randomness" in a way, and so does the "configuration" of the human palm along with one's typing habits...

But yes, I agree with you and understand the dangers here. It's just that common sense suggests one needs to strike a balance between how much randomness is really needed to protect which important data, and how much time and effort one is ready to devote to it.

It is very often like "oh you know, I really CANNOT remember a password like &*khOlP736ge..." in order to protect my bank account (!!!).
 