I am creating a small utility for generating passwords based on the diceware method. At the moment I am very close to the algorithm of real diceware, i.e. I simulate rolling dice n times to get a single word from the list.
I am wondering, though, if this is really necessary. Wouldn't it be sufficient to just get a random number between 0 and length(diceware_list) - 1 to obtain a single word?
Would such a simplified approach affect the security of the generated password? (I am using a cryptographically secure source of random numbers, so this is not a concern.)
I think that the simplified method should be ok, but I am unsure. Could anyone help with that, please?
The reason Diceware advocates using dice to select a password is that it ensures the password the user gets is generated randomly. So no, as long as you are certain your program is selecting the password in an unpredictable (cryptographically secure random selection with a uniform distribution) manner, it doesn't matter how the password is actually generated.
There are other reasons for using dice that are impossible to replicate in any program:
Unless your users are particularly paranoid though, there's a good chance they'll be okay with forgoing these benefits in favor of the added convenience of generating a new password instantly.
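To illustrate why the two approaches are equivalent, here is a minimal Python sketch. It assumes the usual Diceware setup of five six-sided dice per word and a 7776-entry list (neither is stated in the question), and the function names are made up for illustration. Five dice rolls are just the base-6 digits of an index, so simulating the rolls and drawing the index directly produce the same uniform distribution.

```python
import secrets

DICE_PER_WORD = 5                # assumption: five six-sided dice per word
LIST_SIZE = 6 ** DICE_PER_WORD   # 7776 possible outcomes


def index_from_dice(rolls):
    """Treat each roll (1..6) as a base-6 digit and return the list index."""
    index = 0
    for roll in rolls:
        index = index * 6 + (roll - 1)
    return index


def simulated_dice_index():
    """Simulate the dice rolls, then convert them to an index."""
    rolls = [secrets.randbelow(6) + 1 for _ in range(DICE_PER_WORD)]
    return index_from_dice(rolls)


def direct_index():
    """Skip the dice and draw the index in one step."""
    return secrets.randbelow(LIST_SIZE)
```

Every combination of rolls maps to exactly one index in 0..7775 and vice versa, so neither function is more or less predictable than the other as long as the underlying source is sound.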
Yes, this is perfectly fine to do. With a good PRNG, each element will have exactly the same odds of being chosen, if you do the limitation (1 to n) correctly. (I have personally done a diceware implementation that does exactly that.)
There are two simple reasons multiple dice are used in a physical diceware process:
Please note that I said "if you do the limitation (1 to n) correctly." If your random number generation library does not offer a primitive for 1 to n, do not simply take a larger value and modulo n! While this will give answers in the correct range, they will not be uniformly distributed. You should either:
rand_val * n / rand_max (which will require arbitrary precision math to avoid rounding error)
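One common way to get a uniform value without modulo bias is rejection sampling: draw just enough random bits to cover the range and retry whenever the result falls outside it. The sketch below is an illustration of that technique, not necessarily what the answer had in mind; `uniform_below` and its `random_bits` parameter are hypothetical names chosen here.

```python
import secrets


def uniform_below(n, random_bits=secrets.randbits):
    """Return an integer uniformly distributed in 0..n-1 via rejection sampling."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Smallest bit count whose range 0..2**k - 1 covers 0..n-1 (at least 1 bit).
    k = max(1, (n - 1).bit_length())
    while True:
        candidate = random_bits(k)   # uniform in 0..2**k - 1
        if candidate < n:            # accept only values inside the target range
            return candidate
        # Otherwise discard and draw again; this is what removes the bias.
```

Because the bit count is chosen as tightly as possible, more than half of all draws are accepted, so the loop finishes quickly on average. In Python specifically, `secrets.randbelow(n)` already provides a uniform value in 0..n-1, so a hand-written loop like this is mainly useful in environments that lack such a primitive.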
Ajedi32's answer is a great one. I wanted to emphasize one detail which may help answer your question better. The key to secure password generation is to ensure that your password is unpredictable. Not "random." The idea of random comes later. The goal is unpredictable. If you think about it, the most secure password in existence is not a random one, it's whatever the last password the attacker would guess is. It's the one they couldn't predict.
Now, in practice, there's a cat and mouse game going on here. If you try to come up with the most unpredictable password, they'll try to predict how you generate passwords. This is why passwords like qazxswedc appear random, but once they figure out what you were doing to generate the password, they'll break it easily.
This is where randomness finally comes in. For modern cryptography, we strive to use numbers that are not just unpredictable to the attacker, but unpredictable to anyone, including yourself! Random numbers are numbers that literally cannot be predicted in any way. You can only know what numbers got chosen if you were watching when the numbers were generated. Randomness means you can make mathematical statements about how hard it is to predict a password, because not even you, yourself, were in control of the generation.
For thousands of years, dice have been a "standard" source of random numbers. There are plenty of others (I Ching divination, for instance, traditionally used a bundle of yarrow sticks), but dice have persisted for a long time. If cast properly (no helicoptering!), they are sufficiently good random number sources because the bouncing of the die is highly chaotic and unpredictable. You would need telekinesis to affect the results (think Star Wars: The Phantom Menace).
If you are worried about loaded dice, as mentioned by some in the comments, you can do statistical analysis to determine how many bits of entropy per roll you can rely on. Password generation is less sensitive to loaded dice than casinos are: you can always make extra rolls, and each one multiplies the number of possibilities. Because casinos dole out cash, they don't get to multiply each round; they have to add, which makes them more sensitive.
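As a rough illustration of that kind of analysis, the Python sketch below estimates entropy per roll from a record of observed rolls. The function name and the choice to report both Shannon entropy and min-entropy are assumptions made here; min-entropy is the more conservative figure, since it depends only on the most likely face.

```python
import math
from collections import Counter


def entropy_per_roll(rolls):
    """Estimate (Shannon entropy, min-entropy) in bits per roll from observed rolls."""
    if not rolls:
        raise ValueError("need at least one observed roll")
    counts = Counter(rolls)
    total = len(rolls)
    probs = [count / total for count in counts.values()]
    shannon = -sum(p * math.log2(p) for p in probs)
    min_entropy = -math.log2(max(probs))   # driven by the most frequent face
    return shannon, min_entropy
```

A fair six-sided die gives about 2.585 bits per roll on both measures. If a loaded die turned out to give less, you would make proportionally more rolls to reach the same strength. The estimate is only as good as the sample, so a few dozen rolls will not tell you much.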
So now we can get closer to your question, because we understand what you are trying to accomplish. Simulating the dice actually doesn't work as well as you might think, because the simulation will do the same thing every time unless you have a source of randomness built into the sim, which kind of defeats the point.
The question is: what random source can you trust? This is where you have to understand your threat model. What can your attacker do? Can your attacker read your keystrokes? If so, you're in trouble because you're going to have to type the password in. So, practically speaking, we can assume that the attacker does not have enough control over your machine to be able to read the keystrokes.
You can keep working from there. My guess is that your threat model assumes that your computer is pristine, in which case you can rely on time-tested sources of entropy, like /dev/random. In this case, the best answer is to draw from that source and use it directly (no dice simulation required).
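A minimal sketch of drawing from the system source directly, assuming Python: the standard `secrets` module gets its randomness from the operating system's generator rather than from a seeded software PRNG. The `generate_passphrase` function, its `wordlist` argument, and the default of six words are hypothetical choices for illustration.

```python
import secrets


def generate_passphrase(wordlist, word_count=6):
    """Pick word_count words independently and uniformly from wordlist."""
    if not wordlist:
        raise ValueError("wordlist must not be empty")
    # secrets.choice draws from the OS randomness source; no dice simulation needed.
    return " ".join(secrets.choice(wordlist) for _ in range(word_count))
```

Called with a list loaded from your own word file, this returns something like six space-separated words. Words can repeat, which is intended: each pick is independent, just as each set of dice rolls would be.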
If you are more concerned with security, you can look at other aspects. Some algorithms have trouble with side channel attacks that let an attacker peer into their state in some specialized circumstances, so you might harden your algorithm against those. You might decide that you don't trust the normal sources of entropy on a computer (such as time between keystrokes and network activity), so you might invest in a hardware source of randomness (often built around noise in a resistor network).
But no matter where you go with it, remember that the key is to be unpredictable, and the standard way to make sure your opponent can't predict you is to be so random that you can't predict yourself.
A software random number generator is considered "pseudorandom": it is not truly random, because its output is calculated mathematically. Dice are considered truly random, though I suppose you could argue that physics is no different...
By simulating dice rolling programmatically, you are removing the true randomness that diceware requires. That is the source of your confusion between simulating dice and selecting a random word directly: both are pseudorandom.