Published Article in 2600 Magazine: We Will Rock You


We Will Rock You
gerbilByte

Hello peeps!
It’s me again, your friendly neighbourhood gerbil.
You may remember me from articles such as “Take Your Work Home After Work”, which appeared in the Winter 2014 issue of 2600 Magazine, and “My Voice Is My Key”, which appeared in the Autumn 2015 issue of the awesome 2600 Magazine. If you haven’t read them, buy the back copies and read them NOW! :)

I haven’t written in a long, long time because I have been so, so busy, so I thought I’d say hi by submitting a little snippet of something very useful.

Let’s talk about wordlists. What is a wordlist?

Well, a wordlist, as it says on the tin, is a file which is made up of a shit-load of words.

The Kali operating system has a few wordlists which can be found at /usr/share/wordlists.
Now, in there is a massive file called rockyou.txt. It’s HUGE!!!
This is a bit of a default file for people to use as it contains absolutely millions of words! Let’s have a look:

root@kali:/usr/share/wordlists# wc -l rockyou.txt
14344392 rockyou.txt

Here we can see that there are 14344392 lines in the rockyou file. But does this value reflect words? Well, a word is a word. But is each line in “rockyou” a single word? Let’s run a quick command to see whether any of these lines contain a space, i.e. are “phrases” or “sentences”:

root@kali:/usr/share/wordlists# grep ' ' rockyou.txt | head
rock you
i love you
te amo
fuck you
te iubesc
love you
i love u
chris brown
rock on
john cena

John Cena?!?! Ha! We see that the first 10 matches are not single words! So how many of these lines are phrases? Let’s run another command:

root@kali:/usr/share/wordlists# grep -c ' ' rockyou.txt
70619

Wow! Now if I wanted to run a wordlist testing for single words, these would be a waste of time as they are not single words. OK, the password cracking tool may strip these out, but that would be extra, unnecessary work too. You may argue that “they are phrases, keep them in.” Nah! With only 70619 phrases to try, the chance of one of them matching somebody’s actual passphrase is more or less nil. And anyway, we are interested in a word list rather than a phrase list.
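
To put that 70619 into perspective, here is a rough sketch that counts the lines containing a space and prints them as a share of the whole file. The function name phrase_share and the tiny stand-in wordlist are made up for illustration, they are not part of the article's script.

```shell
#!/bin/bash
# Hypothetical helper: how many lines contain a space, and what share
# of the file is that?
phrase_share() {
    awk '/ /{phrases++}
         END{printf "%d of %d lines (%.2f%%)\n", phrases, NR, (NR ? 100*phrases/NR : 0)}' "$1"
}

# Tiny stand-in wordlist so the sketch runs anywhere.
sample=$(mktemp)
printf '%s\n' 'password' 'rock you' 'iloveyou' 'john cena' > "$sample"
phrase_share "$sample"    # prints: 2 of 4 lines (50.00%)
rm -f "$sample"
```

Pointed at rockyou.txt, the counts above (70619 out of 14344392) work out at roughly 0.5% of the file.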

Before I go further, the rockyou.txt file contains LOADS of crap:

root@kali:/usr/share/wordlists# awk 'BEGIN{len=0;}{if(length($0)>len){len=length($0);printf("%i : %s\n",len,$0);}}' rockyou.txt
6 : 123456
9 : 123456789
10 : 1234567890
11 : christopher
13 : tequieromucho
16 : manchesterunited
17 : mychemicalromance
18 : 123456789123456789
39 : Lets you update your FunNotes and more!
40 : 1111111111111111111111111111111111111111
42 : RockYou account is required for Voicemail.
49 : /* {--friendster-layouts.com css code start--} */
awk: cmd. line:1: (FILENAME=rockyou.txt FNR=602044) warning: Invalid multibyte data detected. There may be a mismatch between your data and your locale.
59 : http://www.rockyou.com/fxtext/fxtext-create.php?partner=hi5
77 : vabfdvfdlvhjibfedblsfndilvbgilebvgdlsbgvhbesghklhyubvuwklfbrebgfyurerebgyureb
165 : lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
222 : <table style="border-collapse:collapse;"><tr><td colspan="2"><embed src="http://apps.rockyou.com/photofx.swf" quality="high" scale="noscale" salign="lt" width="325" height="260" wmode="transparent" flashvars="imgpath=http%
255 : <object width="206" height="224"><param name="movie" value="http://www.vivelatino.com.mx/contador.swf"></param><param name="wmode" value="transparent"></param><embed src="http://www.vivelatino.com.mx/contador.swf" type="application/x-shockwave-flash" wmod
257 : <style type=\\'text/css\\'>body{ background: url(http://recursos.fotocajon.com/enchulatupagina/img003/zxddXgCBLcTi.jpg) white center no-repeat fixed; } table, .heading_profile, .heading_profile_left, table td, #p_container, #p_nav_primary, #top_header, #p_n
262 : <style type=\\'text/css\\'>.bg_content{background-image:url(http://img360.imageshack.us/img360/5198/escanear00532wq9.jpg);}.bg_content{background-repeat:repeat;}</STYLE><a href=\\'http://hi5.enchulatupagina.com\\' target=\\'_top\\'><img src=\\'http://hi5.enchula
266 : <div id=\\'24813\\'><a href=\\'http://www.revistate.com\\'><img src=\\'http://www.revistate.com/uploads/20080218/rq/rqwpcf28o1pyb10yfzen53kmuipsi0_PAPARAZZI.jpg\\' border=0 alt=\\'Hazte famoso en www.revistate.com\\'></a></div><div id=\\'72891\\'><a href=\\'http://w
285 : <div align=\\\\\\'center\\\\\\' style=\\\\\\'font:bold 11px Verdana; width:310px\\\\\\'><a style=\\\\\\'background-color:#eeeeee;display:block;width:310px;border:solid 2px black; padding:5px\\\\\\' href=\\\\\\'http://www.musik-live.net\\\\\\' target=\\\\\\'_blank\\\\\\'>Playing/Tangga

What I have done here is print each line that is longer than the longest line seen so far. Just by looking at this output we see that the lines with a character count greater than 18 are in fact crap. They’re not even phrases! They are bits of websites: HTML! Definitely not useful in searching for passwords!
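
If you want a number to go with that eyeballing, the sketch below counts the lines longer than a given length. The name count_longer_than is hypothetical and the cutoff is just a parameter (18 comes from reading the output above, not from any rule). Running awk under LC_ALL=C makes it count bytes rather than characters, which should also keep it quiet about the invalid multibyte data.

```shell
#!/bin/bash
# Hypothetical helper: count lines longer than a given number of bytes.
# Usage: count_longer_than FILE MAXLEN
count_longer_than() {
    LC_ALL=C awk -v max="$2" 'length($0) > max {n++} END{print n+0}' "$1"
}

# Tiny stand-in wordlist so the sketch runs anywhere.
sample=$(mktemp)
printf '%s\n' 'christopher' \
              '123456789123456789' \
              'Lets you update your FunNotes and more!' \
              '1111111111111111111111111111111111111111' > "$sample"
count_longer_than "$sample" 18    # prints: 2
rm -f "$sample"
```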

So we can strip these out. Anything with a space – get rid of it.
And while we’re at it, let’s remove emails and websites. Think about it: you are cracking a password hash on BumbleBee Security’s webapp. Is some random person’s email address or a website address going to be a password? Unless you are REALLY lucky, no, no it isn’t! Not whatsoever!

Out of interest, how many lines contain emails and websites?

root@kali:/usr/share/wordlists# egrep -c '[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5}' rockyou.txt
27342
root@kali:/usr/share/wordlists# grep -c http[s]*:// rockyou.txt
866

Wow! Quite a lot! Let’s remove them too.
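
As a minimal sketch, all three kinds of line can be dropped in a single pass by handing grep several patterns at once. The function name clean_wordlist is made up, and the email character class here is written as [a-zA-Z0-9_.-], which is an assumption on my part: it is slightly broader than the pattern used above, so counts on the real file may differ a little. Unlike the script at the end of this article, this does not save the removed emails and websites anywhere.

```shell
#!/bin/bash
# Hypothetical one-pass filter: drop lines containing a space, a URL or
# something that looks like an email address.
clean_wordlist() {
    grep -v -E \
        -e ' ' \
        -e 'http[s]*://' \
        -e '[a-zA-Z0-9_.-]+@[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,5}' \
        "$1"
}

# Tiny stand-in wordlist so the sketch runs anywhere.
sample=$(mktemp)
printf '%s\n' 'password' 'rock you' 'http://www.rockyou.com/x' \
              'someone@example.com' 'dragon123' > "$sample"
clean_wordlist "$sample"    # prints: password, then dragon123
rm -f "$sample"
```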

In conclusion, the rockyou.txt wordlist contains a load of crap that can be removed, and other wordlists may contain crap such as blocks of “header text”. Because of this I wrote a simple script, which can be found at the end of this article. Feel free to use it and send me kudos.

Many thanks for reading.

Gerbil. [twitter: @gerbilByte]

wordlistcleanser.sh:

#!/bin/bash
#
# wordlistcleanser. gerbil 2018 [twitter: @gerbilByte]
#
# This file is used to clean rockyou.txt from all the crap to leave just single words.
# It will also cleanse other wordlists too.
#
# Usage:
# wordlistcleanser.sh infile [outfile]
#
# WARNING: If an output file isn't specified, then the input will be overwritten (permissions allowing).
#
# Example:
# ./wordlistcleanser.sh /usr/share/wordlists/rockyou.txt ./wewillrockyou.txt

infile=$1
outfile=$2
version="1.0"
author="gerbil"

# No arguments: print the help text and stop.
if [ $# -lt 1 ]; then
    printf "\nwordlistcleanser v%s - %s 2018\n\nThis is a simple script that will remove 'phrases', emails and websites from wordlist files.\nEmails and websites will be stored as files under the current directory.\n\n" "${version}" "${author}"
    printf "Usage:\n\t%s infile.txt [outfile.txt]\n\nWARNING: If an output file isn't specified, then the input will be overwritten (permissions allowing).\n\nExample:\n\t./wordlistcleanser.sh ./rockyou.txt ./wewillrockyou.txt\n\nHave fun! :)\n-%s\n" "$0" "${author}"
    exit 1
fi

baseinfile=$(basename "${infile}")
baseinfile=${baseinfile%.*}
printf "Cleaning %s...\n" "${infile}"

# Check input file exists...
if ! [ -f "${infile}" ]; then
    # input file doesn't exist.
    printf " %s doesn't exist!\n" "${infile}"
    exit 1
fi

# Check if input file is to be overwritten or not...
if [ -z "${outfile}" ]; then
    # no output file specified, therefore destruct mode! ;P
    outfile=${infile}
    printf " No output file specified, therefore output will be stored at %s\n" "${outfile}"
    # rm -f ${infile} # just to save space
else
    printf " Output file : %s\n" "${outfile}"
fi

# Scratch files for the intermediate passes; cleaned up when the script exits.
tmp1=$(mktemp) || exit 1
tmp2=$(mktemp) || exit 1
trap 'rm -f "${tmp1}" "${tmp2}"' EXIT

# Patterns shared by the extract and remove passes.
# NOTE: inside [...] a backslash is a literal character, so the email
# class below matches '\' but not '-'. It is kept exactly as used in the
# article so that the counts shown there still apply.
url_re='http[s]*://'
email_re='[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5}'

# Removing phrases...
printf "Removing phrases...\n"
grep -v ' ' "${infile}" > "${tmp1}"

# Extracting then removing websites...
printf "Extracting then removing websites...\n"
grep "${url_re}" "${tmp1}" > "./${baseinfile}_websites.txt"
grep -v "${url_re}" "${tmp1}" > "${tmp2}"
rm -f "${tmp1}" # just to save space

# Extracting then removing emails...
printf "Extracting then removing emails...\n"
grep -E "${email_re}" "${tmp2}" > "./${baseinfile}_emails.txt"
grep -E -v "${email_re}" "${tmp2}" > "${outfile}"
rm -f "${tmp2}" # just to save space

# Get stats on leftover file (length of each word and count of each, I know there are no words longer than 1000 characters)...
statsfile="${outfile%.*}_stats.txt"
printf "Getting stats on %s, extracted emails and extracted websites...\n" "${outfile}"
printf "Emails extracted: %s\n" "$(wc -l "./${baseinfile}_emails.txt")" > "${statsfile}"
printf "Websites extracted: %s\n" "$(wc -l "./${baseinfile}_websites.txt")" >> "${statsfile}"
printf "\nStats on %s : \n\n" "${outfile}" >> "${statsfile}"
# Only lengths that actually occur are printed.
awk 'BEGIN{printf("word length : count\n------------:------\n");}{charcounts[length($0)]++;}END{for(i=0;i<=1000;i++){if(charcounts[i]>0){printf("%11i : %i\n",i,charcounts[i]);}}}' "${outfile}" >> "${statsfile}"

printf "Cleansing completed.\n\n"


File running:

root@kali:~# ./wordlistcleanser.sh /usr/share/wordlists/rockyou.txt ./wewillrockyou.txt
Cleaning /usr/share/wordlists/rockyou.txt...
Output file : ./wewillrockyou.txt
Removing phrases...
Extracting then removing websites...
Extracting then removing emails...
Getting stats on ./wewillrockyou.txt, extracted emails and extracted websites...
Cleansing completed.
root@kali:~# wc -l /usr/share/wordlists/rockyou.txt ./wewillrockyou.txt
14344392 /usr/share/wordlists/rockyou.txt
14245981 ./wewillrockyou.txt
28590373 total
root@kali:~# expr 14344392 - 14245981
98411
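
As a quick sanity check (a sketch using only the numbers printed in this article), the three counts taken earlier can be added up and compared with the number of lines the script actually removed:

```shell
#!/bin/bash
# Counts taken from the grep -c runs earlier in the article.
phrases=70619
emails=27342
websites=866

removed=$((14344392 - 14245981))        # lines the script removed
sum=$((phrases + emails + websites))    # the three counts added together
overlap=$((sum - removed))

echo "removed: $removed"        # removed: 98411
echo "sum of counts: $sum"      # sum of counts: 98827
echo "overlap: $overlap"        # overlap: 416
```

The sum comes out 416 higher than the number of lines removed. Presumably that is because the three counts were each taken on the full file, so a line that has, say, both a space and a web address in it is counted more than once but can only be removed once.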
