Cracking passwords – what is worth knowing about statistics?


When a hacker breaks into a system and obtains the password hashes stored in its database, he must first crack those hashes to recover the passwords behind them. Hackers try whatever attacks are likely to work for this purpose. This article demonstrates several effective password cracking methods and shows how statistical password analysis can be combined with a variety of efficient cracking tools.


Cracking passwords – why is it so important?

Cracking passwords is becoming more and more difficult. Users are forced to create increasingly complex passwords, and some developers are starting to use deliberately slow algorithms such as bcrypt instead of fast standard hash functions. Computing a bcrypt hash takes far longer, which makes cracking passwords efficiently much harder, and that is why bcrypt is an extremely powerful tool against these attacks. To illustrate: a password-cracking rig built in 2012 could compute 350 billion NTLM hashes per second, but only 71,000 bcrypt hashes per second.

Comparing the two figures, roughly 5 million NTLM hashes can be computed in the time it takes to compute a single bcrypt hash. Therefore, against bcrypt, hackers need far more computing power and cannot rely on brute force in every situation.
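The arithmetic behind this comparison can be sketched as follows. This is a minimal illustration that only reuses the two rates quoted above; the helper names and the example keyspace (every 8-character all-lowercase password) are assumptions chosen for illustration.

```python
# Throughput of the 2012 GPU rig quoted above, in hashes per second.
# The bcrypt figure depends heavily on the cost factor in use.
NTLM_RATE = 350_000_000_000
BCRYPT_RATE = 71_000


def ntlm_per_bcrypt(ntlm_rate: int = NTLM_RATE, bcrypt_rate: int = BCRYPT_RATE) -> float:
    """Number of NTLM guesses that fit in the time of one bcrypt guess."""
    return ntlm_rate / bcrypt_rate


def seconds_to_exhaust(keyspace: int, rate: int) -> float:
    """Worst-case time to try every candidate in a keyspace at a given rate."""
    return keyspace / rate


if __name__ == "__main__":
    # Example keyspace (assumption): every 8-character all-lowercase password.
    space = 26 ** 8
    print(f"NTLM guesses per bcrypt guess: {ntlm_per_bcrypt():,.0f}")
    print(f"NTLM:   {seconds_to_exhaust(space, NTLM_RATE):.1f} seconds")
    print(f"bcrypt: {seconds_to_exhaust(space, BCRYPT_RATE) / 86_400:.1f} days")
```

The ratio comes out at roughly 4.9 million, and the same lowercase keyspace that falls in under a second against NTLM takes about a month against bcrypt at these rates.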


Time efficiency is the key to effective password cracking. It is possible to check every combination for a password, but the time needed to do so is often unrealistic. Therefore, the fastest and most effective methods are tried first, and only if they fail are slower ones (which check a larger number of combinations) used. The fastest method is a simple dictionary attack based on a list of popular or previously cracked passwords. The dictionary entries are then modified by appending numbers and symbols to the end of the word or by changing the order of the letters.

This approach is called a hybrid attack or a rule-based attack. The next step is to use a generator that produces the most likely passwords. A perfect example is the Markov method. By chaining letter groups that are common in English (e.g. ing, er, gu), it can generate highly probable candidates such as Password1234, built from the pieces shown below. Although this password would be easy to crack with any of the methods above, here we will use it to demonstrate the mask method.

Pa + ss + word + 1234
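The Markov idea described above can be sketched with a character-bigram model, one simple kind of Markov chain: it learns from a training list which characters tend to follow which, then scores candidates so that word-like strings rank first. This is a minimal sketch under stated assumptions: the training list is a tiny hypothetical sample, add-one smoothing with an assumed vocabulary size is used, and the helper names are illustrative. Real tools enumerate candidates in probability order rather than only scoring a given list.

```python
import math
from collections import Counter, defaultdict

START, END = "^", "$"  # sentinel symbols marking word boundaries


def train_bigrams(words):
    """Count how often each character follows another in the training words."""
    model = defaultdict(Counter)
    for word in words:
        chars = [START, *word.lower(), END]
        for prev, nxt in zip(chars, chars[1:]):
            model[prev][nxt] += 1
    return model


def avg_log_prob(model, word, vocab_size=28):
    """Average log-probability per transition, with add-one smoothing.

    vocab_size is an assumed smoothing constant (26 letters plus sentinels).
    """
    chars = [START, *word.lower(), END]
    total = 0.0
    for prev, nxt in zip(chars, chars[1:]):
        row = model.get(prev, Counter())
        total += math.log((row[nxt] + 1) / (sum(row.values()) + vocab_size))
    return total / (len(chars) - 1)


def rank(model, candidates):
    """Order candidate strings from most to least likely under the model."""
    return sorted(candidates, key=lambda w: avg_log_prob(model, w), reverse=True)


if __name__ == "__main__":
    # Hypothetical training sample; a real model would be trained on a large leak.
    training = ["password", "passing", "working", "sword", "singer", "guard", "pass", "word"]
    model = train_bigrams(training)
    print(rank(model, ["zdtuiknsh", "password", "qxzvbjk"]))
```

Averaging per transition keeps longer candidates from being penalized just for their length.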

A mask attack, i.e. a modified brute-force method, checks all combinations that match a given password structure. By “structure” I mean the types of characters and the order in which they appear in the password.

For example, Password1234 has a structure consisting of one uppercase letter, seven lowercase letters, and four digits, which can be written as ullllllldddd using the following codes:

l: lowercase letter
u: uppercase letter
s: symbol
d: digit
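The structure notation above can be computed mechanically. A minimal sketch, assuming ASCII input and a symbol set of the 32 ASCII punctuation characters plus space (both assumptions); it derives a password's mask and the number of candidates a mask attack must try to exhaust that structure.

```python
import string

# Alphabet sizes for each class code. The symbol count is an assumption:
# the 32 ASCII punctuation characters plus space.
CLASS_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33}


def char_class(ch: str) -> str:
    """Return the class code (l, u, d or s) for one character."""
    if ch in string.ascii_lowercase:
        return "l"
    if ch in string.ascii_uppercase:
        return "u"
    if ch in string.digits:
        return "d"
    return "s"  # anything else, including non-ASCII, is treated as a symbol


def mask_of(password: str) -> str:
    """Describe a password's structure as one class code per character."""
    return "".join(char_class(ch) for ch in password)


def keyspace(mask: str) -> int:
    """Number of candidates a mask attack must try to exhaust this structure."""
    total = 1
    for code in mask:
        total *= CLASS_SIZES[code]
    return total


if __name__ == "__main__":
    m = mask_of("Password1234")
    print(m, f"{keyspace(m):,}")
```

For Password1234 the keyspace is 26^8 × 10^4, far smaller than brute-forcing every 12-character string over all classes.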

So if a hacker generates every combination that fits this structure, he will eventually come across the password Password1234. The question the hacker must then ask is: which password structures should I try first?

Statistical analysis

To help answer this question, I analyzed leaked passwords, looking for structures that occur more frequently than others. The analyzed sample consists of over 34 million publicly disclosed passwords from sites such as RockYou, LinkedIn, phpBB and others.

Number of analyzed passwords: 34,659,199

RockYou: 14,344,391
10-million-combos: 10,000,000
LinkedIn: 8,616,484
eHarmony: 1,513,935
phpBB: 184,389

The chart below shows each mask and how often it occurs. The point at which the most frequent masks together cover 50% of the sample is marked in red.

This means that the 13 most commonly used masks account for over 50% of the passwords: more than 20 million passwords in the sample use one of just 13 structures. These results are quite shocking given how uniform password structures turn out to be. The remaining passwords are spread across the long tail on the right of the chart, which has been cut off in the picture above.
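Finding how many masks are needed to cover half of a sample can be sketched like this. It is a minimal illustration, assuming the same l/u/d/s codes as above; the function names and the toy sample in the test are hypothetical, not the analysis code used for the figures in this article.

```python
import string
from collections import Counter


def mask_of(password: str) -> str:
    """Structure code per character: u, l, d, or s for anything else."""
    codes = []
    for ch in password:
        if ch in string.ascii_uppercase:
            codes.append("u")
        elif ch in string.ascii_lowercase:
            codes.append("l")
        elif ch in string.digits:
            codes.append("d")
        else:
            codes.append("s")
    return "".join(codes)


def masks_covering(passwords, fraction=0.5):
    """Most frequent masks, in order, until they cover `fraction` of the sample."""
    counts = Counter(mask_of(p) for p in passwords)
    target = fraction * len(passwords)
    chosen, covered = [], 0
    for mask, n in counts.most_common():
        chosen.append((mask, n))
        covered += n
        if covered >= target:
            break
    return chosen


if __name__ == "__main__":
    sample = ["Password1", "Princess1", "dragon", "qwerty", "Summer2015", "abc123"]
    print(masks_covering(sample))
```

Run over a full leak, the length of the returned list is the number of masks behind the red 50% line.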

In fact, of the 260,500 masks found, only 400 are shown in the chart. It may seem unlikely that so many people would independently build their passwords on the same few structures. However, this turns out to be quite normal given the way users create and remember passwords, and the analyzed data points to some logical reasons why.

When users are required to include an uppercase letter, 90% of the time it is the first letter of the password. When asked to include digits, they usually add two (probably from their year of birth). Another common choice is four digits at the end of the word (the current or previous calendar year). Another common ending is four digits in sequence, such as 1234. Habits like these allow hackers to predict what password structure we are likely to use.

Statistical hybrid – even faster

Now that we know the most common structures, we can also assume that a user is far more likely to choose a real word such as potato than a random string of letters such as zdtuiknsh, even though both consist only of lowercase letters.

Therefore, we assume that in a password built on a given mask, real words occur far more often than random letters. This assumption is very helpful for hackers, as it lets them eliminate a huge number of unnecessary combinations. This method is a hybrid attack driven by statistics on common password structures.
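The saving from filling the letter part of a mask with dictionary words can be sketched as follows. A minimal illustration, assuming a structure of one capitalized word followed by digits and a hypothetical wordlist; the helper names are made up for this example.

```python
from itertools import product
from string import ascii_letters, digits


def _bases(words, word_len):
    """Capitalized, de-duplicated ASCII words of the required length."""
    return sorted({
        w.capitalize()
        for w in words
        if len(w) == word_len and all(c in ascii_letters for c in w)
    })


def hybrid_candidates(words, word_len, n_digits):
    """Yield each capitalized dictionary word followed by every digit suffix."""
    for base in _bases(words, word_len):
        for tail in product(digits, repeat=n_digits):
            yield base + "".join(tail)


def hybrid_keyspace(words, word_len, n_digits):
    """Candidates when the letter part comes from the dictionary."""
    return len(_bases(words, word_len)) * 10 ** n_digits


def mask_keyspace(word_len, n_digits):
    """Same structure brute-forced letter by letter (26 options per letter)."""
    return 26 ** word_len * 10 ** n_digits


if __name__ == "__main__":
    wordlist = ["love", "blue", "cats", "potato"]  # hypothetical dictionary
    print(hybrid_keyspace(wordlist, 4, 4), "vs", mask_keyspace(4, 4))
```

With a dictionary of a few thousand four-letter words, the letter part shrinks from all 456,976 four-letter combinations to just those words, at the cost of missing passwords built on non-words.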

Productivity and time

When attacking a very large number of password hashes, you often do not have time to crack them all. However, cracking even some of them may give you access to the system and open up new opportunities. It therefore makes sense to decide in advance how much time to spend on cracking.

Using the structures found in the previous analysis, an attacker can take the 10 most popular structures and sort them by the time needed to exhaust each one, shortest first. This lets the attacker set a fixed time budget for the attack, e.g. exactly 60 minutes. The results of cracking passwords this way on a CPU are presented below.
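The scheduling step can be sketched as a simple greedy plan: compute each mask's exhaustion time at a given guess rate, sort shortest first, and keep masks until the budget runs out. This is a minimal sketch; the guess rate, the mask list and the symbol-set size are hypothetical assumptions.

```python
CLASS_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33}  # symbol count is an assumption


def keyspace(mask: str) -> int:
    """Candidates needed to exhaust a mask written in l/u/d/s codes."""
    total = 1
    for code in mask:
        total *= CLASS_SIZES[code]
    return total


def schedule(masks, guesses_per_second, budget_seconds):
    """Sort masks by time to exhaust (shortest first) and keep those that fit."""
    timed = sorted((keyspace(m) / guesses_per_second, m) for m in masks)
    plan, used = [], 0.0
    for seconds, mask in timed:
        if used + seconds > budget_seconds:
            break  # every remaining mask takes at least as long
        plan.append((mask, seconds))
        used += seconds
    return plan


if __name__ == "__main__":
    top_masks = ["ullldddd", "lllllldd", "llllll", "ulllllld"]  # hypothetical
    for mask, secs in schedule(top_masks, guesses_per_second=50_000_000, budget_seconds=3600):
        print(f"{mask:<10} {secs:10.1f} s")
```

Sorting purely by exhaustion time follows the text; a refinement would weight each mask by how many passwords it is expected to crack.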

In this case, the fastest structure to check was U(W3)dddd, defined as an uppercase letter, 3 lowercase letters (“W” for word), and 4 digits. There were 69 passwords in the set matching this structure, and it took my computer only a minute to check every combination. I stopped the attack after 62 minutes, having cracked 221 hashes that corresponded to 491 accounts, or 11% of the passwords in the set. The large difference between the number of cracked hashes and the number of cracked accounts is due to many accounts sharing the same password (e.g. in corporate offices, where default passwords are often never changed). Once a hacker recovers a password that is common in an environment, every user with that password is compromised.
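Why 221 hashes can unlock 491 accounts can be sketched by grouping accounts by stored hash. A minimal illustration, assuming SHA-256 as a stand-in for an unsalted scheme such as NTLM and hypothetical account data. It also shows that a random per-account salt makes identical passwords produce different hashes. Salted SHA-256 is still fast, so this is not a recommended storage scheme.

```python
import hashlib
import os
from collections import Counter


def unsalted_hash(password: str) -> str:
    # SHA-256 stands in for an unsalted scheme such as NTLM here.
    return hashlib.sha256(password.encode()).hexdigest()


def salted_hash(password: str) -> str:
    # A fresh random salt per account, stored alongside the digest.
    salt = os.urandom(16)
    return salt.hex() + ":" + hashlib.sha256(salt + password.encode()).hexdigest()


def accounts_per_hash(account_passwords, hash_fn):
    """Count how many accounts share each stored hash value."""
    return Counter(hash_fn(pw) for pw in account_passwords.values())


if __name__ == "__main__":
    accounts = {"alice": "Welcome1", "bob": "Welcome1", "carol": "Welcome1", "dave": "Tr0ub4dor"}
    for name, fn in [("unsalted", unsalted_hash), ("salted", salted_hash)]:
        shared = accounts_per_hash(accounts, fn)
        print(name, "distinct hashes:", len(shared), "largest group:", max(shared.values()))
```

Without salts, cracking one shared hash compromises every account in its group at once.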

Although a hybrid attack or a rule-based attack using a rule set such as best64 could crack some of these passwords faster, a mask attack covers more combinations. If the faster methods fail, the approach above is the natural next step. It is also worth remembering that this attack ran on an average CPU; on a powerful GPU, the same work could take just a few seconds. Raw speed therefore matters less than a well-planned order of attacks.


Statistical analysis helps us attack widely used structures, but some tools can help with specific targets. A tool such as CeWL can crawl a website and build a wordlist from the vocabulary used there, adapted to the target's industry. Additionally, if we already know which passwords are popular in a given work environment, we can use them as the basis for guessing similar ones. For example, if we noticed that “AcmeCorp1234” was used by one of the employees, we might add “AcmeCorp” to the wordlist to find related passwords such as “AcmeCorp@2015”.
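The core idea of a site-specific wordlist can be sketched in a few lines. CeWL itself crawls the site; this minimal sketch assumes you already have the page text and simply keeps words above a minimum length, most frequent first. The function name and the default minimum length are hypothetical.

```python
import re
from collections import Counter
from typing import Optional


def site_wordlist(text: str, min_len: int = 5, top: Optional[int] = None) -> list:
    """Words from page text, most frequent first (a rough CeWL-like idea)."""
    words = re.findall(r"[A-Za-z]+", text)
    counts = Counter(w for w in words if len(w) >= min_len)
    return [w for w, _ in counts.most_common(top)]


if __name__ == "__main__":
    page = "AcmeCorp builds turbines. AcmeCorp turbines power the grid."
    print(site_wordlist(page))
```

Case is preserved, so company names such as AcmeCorp keep their capitalization in the list.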

Taking previously cracked passwords such as “AcmeCorp” and reordering some adjacent characters is extremely effective against passwords belonging to large corporations. The analysis shows that this method can crack an additional 20% of passwords that had not been found before.
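Both steps can be sketched as small string transformations: strip a cracked password back to its root word, then generate variants with one pair of neighbouring characters swapped. A minimal sketch; the rule that the root is everything before a trailing run of non-letters is an assumption, and the function names are hypothetical.

```python
import re


def base_word(password: str) -> str:
    """Strip a trailing run of digits and symbols, e.g. 'AcmeCorp1234' -> 'AcmeCorp'."""
    return re.sub(r"[^A-Za-z]+$", "", password)


def adjacent_swaps(word: str) -> list:
    """Every distinct variant with exactly one pair of neighbouring characters swapped."""
    variants = []
    for i in range(len(word) - 1):
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped != word and swapped not in variants:
            variants.append(swapped)
    return variants


if __name__ == "__main__":
    root = base_word("AcmeCorp1234")
    print(root, adjacent_swaps(root))
```

The recovered roots and their variants can then be fed back into the wordlist for the hybrid and rule-based stages.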

It is also very important to prune the dictionary of words that cannot satisfy the target application's password requirements, so that we do not waste time guessing passwords that cannot exist because of their length or character composition.
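Pruning a wordlist against a known policy can be sketched as a simple filter. A minimal sketch, assuming a hypothetical policy of 8 to 64 characters with at least one lowercase letter, one uppercase letter and one digit; the real values would come from the target application.

```python
import string

# Character sets for each class code used earlier in the article.
CLASSES = {
    "l": set(string.ascii_lowercase),
    "u": set(string.ascii_uppercase),
    "d": set(string.digits),
    "s": set(string.punctuation + " "),
}


def meets_policy(candidate, min_len=8, max_len=64, required=("l", "u", "d")):
    """True if the candidate could be accepted under the (hypothetical) policy."""
    if not min_len <= len(candidate) <= max_len:
        return False
    return all(any(ch in CLASSES[c] for ch in candidate) for c in required)


def prune(words, **policy):
    """Drop every word the application would have rejected."""
    return [w for w in words if meets_policy(w, **policy)]


if __name__ == "__main__":
    print(prune(["password", "Password1", "Pass1", "Summer2015"]))
```

The same filter can be applied to generated candidates, not only to the base dictionary.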


Starting with the fastest attack, which checks the fewest combinations (a standard dictionary attack), and ending with the slowest, which checks every possible combination (pure brute force), is the best way to crack passwords when attack time is limited.

Hackers should build a standard methodology on this principle. If the target number of cracked accounts is reached in the first steps, there may be no point in continuing with further attacks. In many cases, however, a dictionary attack proves insufficient, or the attacker wants access to even more accounts, and that is when hybrid attacks and targeted brute force guided by statistical analysis come in.

Establishing a methodology is therefore critical to success, whether we use the methods described above or automated tools. Newer tools such as PRINCE can help crack passwords. Such tools are valuable for applying a methodology, but understanding how the tool actually works, rather than relying on it blindly, will make the cracking process much more efficient.

Cracking passwords – how to defend yourself?

By analyzing compromised passwords, it is possible to identify the structures most often used to create them. Developers should therefore build systems that stop users from choosing passwords with these standard structures, so that the distribution shown in the chart earlier becomes flatter.
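A structure-based check at password creation time can be sketched as follows: derive a blocklist of the most common masks from a leaked sample, then reject new passwords whose mask is on it. This is a minimal sketch; the function names, the leaked sample and the top-N cut-off are hypothetical, and it would complement rather than replace length and breached-password checks.

```python
import string
from collections import Counter


def mask_of(password: str) -> str:
    """Structure code per character: u, l, d, or s for anything else."""
    codes = []
    for ch in password:
        if ch in string.ascii_uppercase:
            codes.append("u")
        elif ch in string.ascii_lowercase:
            codes.append("l")
        elif ch in string.digits:
            codes.append("d")
        else:
            codes.append("s")
    return "".join(codes)


def common_masks(leaked_passwords, top_n):
    """The top_n most frequent structures in a leaked sample."""
    counts = Counter(mask_of(p) for p in leaked_passwords)
    return {mask for mask, _ in counts.most_common(top_n)}


def structure_allowed(password, blocked_masks):
    """Reject passwords whose structure is among the most common ones."""
    return mask_of(password) not in blocked_masks


if __name__ == "__main__":
    blocked = common_masks(["Password1", "Princess1", "Sunshine1", "abc123"], top_n=1)
    print(structure_allowed("Football1", blocked), structure_allowed("fo0tB@ll-River", blocked))
```

Blocking the most popular masks pushes users toward less predictable structures, which is what flattens the distribution.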

The downside is that without a familiar structure, users may find it hard to remember their passwords. In this situation I would recommend password managers protected by two-factor authentication. These applications generate and store all your passwords; the passwords they generate are random and can be as long as the target application allows. As mentioned earlier, hashing passwords with bcrypt is another great way to slow hackers down.
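Both defensive measures can be sketched with Python's standard library: random passwords of the kind a password manager produces, and salted, deliberately slow hashing for storage. bcrypt itself is not in the standard library, so this sketch swaps in PBKDF2-HMAC-SHA256 as a stand-in; the iteration count and the helper names are assumptions.

```python
import hashlib
import hmac
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation
ITERATIONS = 600_000  # assumed work factor; tune so one hash is noticeably slow


def generate_password(length=20):
    """Random password in the style of a password manager, using a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def hash_password(password, iterations=ITERATIONS):
    """Salted, deliberately slow hash (PBKDF2 as a stdlib stand-in for bcrypt)."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, digest


def verify_password(password, salt, iterations, digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    pw = generate_password()
    record = hash_password(pw)
    print(pw, verify_password(pw, *record))
```

The random salt defeats the shared-hash effect described earlier, and the iteration count plays the role of bcrypt's cost factor.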

Additionally, companies should introduce rules requiring employees to learn about the dangers of sharing passwords and reusing the same password. Even if such rules are not followed perfectly, they are still a fairly effective security measure.


Password cracking can seem chaotic. As it becomes more and more difficult, there is a need for targeted, performance-driven and personalized cracking methodologies. Investing money in constantly increasing the number of guesses checked per minute does not pay off.

That is why I have started applying a methodology and processes based on statistical analysis that make password guessing easier. Developers can build tools to counter hackers, and users can use password managers to reduce the effectiveness of attacks. However, such solutions are still not very widespread, so statistical password attacks remain effective both in the number of accounts cracked and in time efficiency.

Think about your password and ask yourself how quickly hackers are able to crack your password based on its structure and what factors in your business might cause your password to be guessed.
