# How does character repetition affect password entropy?

user2824371 05/06/2018. 3 answers, 91 views

From this wiki page, I learned that the strength of a password is affected by two main factors, the length (L) and the number of possible symbols (N), and that it is calculated using the equation:

H = L * log2(N)

Now, what about character repetition? Wikipedia says that repetition should be avoided, but how does it affect the equation or the strength of the password?

I thought of some solutions like:

1. Counting repeated characters as 1 character. This way it affects the length (e.g. Password = MIC3333, Length = 4). But that's not accurate, because the number of passwords a hacker actually has to try is higher than this calculated entropy suggests.

2. If L=10 with 5 repeated characters, the repetition percentage is 5/10 or 50%. So the strength of the password will be reduced by 50%. I think that's not accurate either.
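To make the two proposals concrete, here is a minimal Python sketch that computes the baseline H = L * log2(N) and both adjusted values for the example password. The function names, the alphabet size N = 36 (uppercase letters plus digits), and the reading of "repeated character" as "a character equal to the one right before it" are all assumptions made for illustration.

```python
import math
from itertools import groupby


def entropy_bits(length: int, symbols: int) -> float:
    """Baseline estimate H = L * log2(N) for a uniformly random password."""
    return length * math.log2(symbols)


def collapsed_length(password: str) -> int:
    """Proposal 1: count each run of identical adjacent characters once."""
    return sum(1 for _ in groupby(password))


def repeated_fraction(password: str) -> float:
    """Proposal 2: fraction of characters that repeat the one before them."""
    repeats = sum(1 for a, b in zip(password, password[1:]) if a == b)
    return repeats / len(password)


password = "MIC3333"
N = 36  # assumed alphabet: A-Z plus 0-9

baseline = entropy_bits(len(password), N)
proposal_1 = entropy_bits(collapsed_length(password), N)
proposal_2 = baseline * (1 - repeated_fraction(password))

print(f"baseline:   {baseline:.1f} bits")
print(f"proposal 1: {proposal_1:.1f} bits")
print(f"proposal 2: {proposal_2:.1f} bits")
```

Both adjusted numbers are heuristics only. Neither corresponds to a count of the passwords an attacker would actually have to try.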

I would like to know if there's an equation that includes character repetitions.

Thank you so much.

Therac 05/07/2018.

Password entropy is based on the number of possible combinations. For a pattern to reduce entropy, the pattern needs to be known to the attacker.

If the pattern isn't fixed and known, the reduction in complexity will be specific to the attacker's algorithm. It will be different for different attackers. E.g., with a dictionary attack, the effective complexity of a dictionary word is the dictionary size, while a word outside the dictionary keeps its full brute-force complexity.

I don't know whether there is or can be a mathematically optimal algorithm for letter repetitions. The simplest solution I can think of is to treat symbol repetitions, up to N repetitions, as extra graphemes. In that case, the password's strength will be the number of graphemes it contains.

Against such an algorithm, the added strength of each consecutive symbol from the 2nd to the Nth is 0, and equivalent to another grapheme up to 2N. But such an algorithm itself will be slower against random passwords due to a larger effective character set. For instance, checking for 2-long repetitions for every symbol will drop its speed by a factor of 2^(length-1). But against a password that is all double symbols, its speed will improve to only a square root of the above complexity.

If it's imperative that brute-force efficiency is not compromised, a free tweak is to always start by trying a last grapheme equal to the previous one. In that case the added strength from the first (for variable length) or all repeat symbols at the end (for fixed length) is 0. Doing so in the middle isn't a free tweak anymore. Neither is trying for extra repeats in a variable-length password, although it's so cheap as to be nearly free.

In short, against an algorithm optimized to pick passwords with repeated characters, a password's strength can be estimated by dropping all consecutive digits past the first to get effective length L'. Theoretical strength would be approximately (charset*N)^L', where N is the maximum number of repetitions the attacker is testing for.
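As a rough illustration of that estimate, the sketch below collapses each run of identical characters to get L' and returns log2 of (charset*N)^L'. The function names and the example parameters (36 symbols, attacker testing runs of up to 4) are assumptions, and the estimate only makes sense when no run in the password is longer than N.

```python
import math
from itertools import groupby


def effective_length(password: str) -> int:
    """L': length after dropping every consecutive repeat past the first."""
    return sum(1 for _ in groupby(password))


def repeat_aware_bits(password: str, charset: int, max_repeats: int) -> float:
    """log2 of (charset * N)^L', with N the attacker's assumed repeat limit."""
    return effective_length(password) * math.log2(charset * max_repeats)


# Assumed parameters: 36-symbol alphabet, attacker tries runs of up to 4.
print(repeat_aware_bits("MIC3333", charset=36, max_repeats=4))
print(len("MIC3333") * math.log2(36))  # naive L * log2(N) for comparison
```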

Against an algorithm optimized for brute-force efficiency, only the consecutive digits at the end should be dropped. Theoretical strength with a naive algorithm would be charset^(L'-1)*(charset+N). Any practical algorithm will be testing for a lot of suffixes already, though (passwords often end in "1" or "1!" to bypass complexity rules).
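The brute-force-friendly variant can be sketched the same way: only the repeats at the very end are dropped, and the result is log2 of charset^(L'-1)*(charset+N). Again, the names and parameters are hypothetical, and the password is assumed to be non-empty.

```python
import math


def trailing_collapsed_length(password: str) -> int:
    """L': length after collapsing the run of identical characters at the end."""
    if not password:
        return 0
    return len(password.rstrip(password[-1])) + 1


def suffix_aware_bits(password: str, charset: int, max_repeats: int) -> float:
    """log2 of charset^(L'-1) * (charset + N) for a non-empty password."""
    l_eff = trailing_collapsed_length(password)
    return (l_eff - 1) * math.log2(charset) + math.log2(charset + max_repeats)


# Assumed parameters: 36-symbol alphabet, up to 4 trailing repeats tested.
print(suffix_aware_bits("MIC3333", charset=36, max_repeats=4))
```

Repeats in the middle of the password are left alone here, which matches the point that only the trailing check comes for free.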

It's all a matter of what algorithm the attacker uses. Dictionaries, including leaked password lists, will generally be tried first.

Bob Brown 05/06/2018.

A friend whose doctorate is in statistics likes to say, "Often the sole significance of a statistical improbability is that the improbable has happened."

The formula you give holds only if each character is selected randomly, which implies selection independent of the other characters. So, it is improbable, but not impossible, that a completely random password might be 3333333333.

With that said, attackers use heuristics. One such might be to look for repeats and, finding such, test that the next character is another repeat before trying more difficult combinations. So, I'd be tempted to reject password suggestions with three or more repeated characters.
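A rejection rule like that is easy to sketch with a regular expression. This assumes "three or more repeated characters" means three or more identical characters in a row; the function name is made up for the example.

```python
import re

# Matches any character followed by at least two more copies of itself.
_TRIPLE_RUN = re.compile(r"(.)\1{2,}", re.DOTALL)


def has_long_run(candidate: str) -> bool:
    """True if the candidate has three or more identical characters in a row."""
    return _TRIPLE_RUN.search(candidate) is not None


for suggestion in ("MIC3333", "aabbcc", "3333333333"):
    verdict = "reject" if has_long_run(suggestion) else "accept"
    print(suggestion, verdict)
```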

Serge Ballesta 05/07/2018.

H = L * log(N) is the mathematical entropy of the set of possible passwords of L characters chosen randomly from a set of size N. It is just the log of the number of such passwords: H = log(N^L).

If you add rules such as:

• disallow repetition of the same symbol
• require the presence of characters from disjoint subsets

you reduce the number of possible patterns and reduce the strength (entropy). If an attacker knows that a password cannot contain repetitions of a character, he can optimize his algorithm with that. But in fact H measures the strength of the password against brute-force attacks, where the attacker assumes that any combination has the same probability and systematically searches the whole possible set.
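The size of that reduction can be worked out for a simple case. The sketch below assumes "no repetition" means no two adjacent characters are equal, so the number of allowed passwords is N * (N-1)^(L-1) instead of N^L. The function names and the example values are illustrative only.

```python
import math


def bits_unrestricted(length: int, symbols: int) -> float:
    """log2(N^L): every position is free."""
    return length * math.log2(symbols)


def bits_no_adjacent_repeats(length: int, symbols: int) -> float:
    """log2(N * (N-1)^(L-1)): each position must differ from the previous one."""
    return math.log2(symbols) + (length - 1) * math.log2(symbols - 1)


L, N = 8, 10  # assumed example: eight decimal digits
loss = bits_unrestricted(L, N) - bits_no_adjacent_repeats(L, N)
print(f"entropy lost to the rule: {loss:.2f} bits")
```

Under this assumption the rule costs brute-force strength, but only a small amount for realistic alphabet sizes.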

Such restrictions are nevertheless common, because most users do not choose passwords truly at random, and some damn simple patterns (00000000 or 12345678 for eight numeric characters) have a much higher probability of being chosen than others. So those restrictions try to avoid simple passwords that could be attacked by dictionary or, more exactly, heuristic attacks.

That's where we enter the psychological game. If I assume that the attacker will use heuristics to test specific patterns first, I should disallow them, even if I know that this lowers the number of possible passwords and hence the theoretical strength of the password.

TL;DR: The more constraints you add to the password, the more resistant it will be to short-time attacks using heuristics, and the more vulnerable it will be to brute-force attacks.

As it is hard to give precise probabilities for human-chosen passwords, it is hard to determine the actual entropy precisely. The above formula is only valid when every combination has exactly the same probability. But as soon as the probabilities vary in a major way, the theoretical entropy formula will give a much lower value. Extract from Wikipedia:

The entropy H of a discrete random variable X with possible values {x1, ..., xn} and probability mass function P(X) is:

H(X) = E[I(X)] = E[-ln(P(X))]

where E denotes the expected value and I(X) is the information content of X.
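To see how much a skewed choice lowers the entropy, here is a small Python sketch of that definition. It uses log2 rather than ln so the result is in bits (bits = nats / ln 2). The 10,000-value space and the 10% figure are invented for illustration and are not measured data.

```python
import math


def shannon_entropy_bits(probabilities) -> float:
    """H(X) = E[-log2 P(X)], in bits; zero-probability outcomes contribute 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


n = 10_000  # assumed space: all four-digit codes
uniform = [1 / n] * n

# Hypothetical skew: one code is picked 10% of the time, the rest share 90%.
skewed = [0.10] + [0.90 / (n - 1)] * (n - 1)

print(shannon_entropy_bits(uniform))  # matches log2(n) for the uniform case
print(shannon_entropy_bits(skewed))   # lower than the uniform case
```

In the uniform case this reduces to log2(n), the same value the L * log(N) formula gives. Any deviation from uniformity can only lower it.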