We Will Rock You
gerbilByte
Hello peeps! It’s me again, your friendly neighbourhood gerbil. You may remember me from articles such as “Take Your Work Home After Work”, which appeared in the Winter 2014 issue of 2600 Magazine, and “My Voice Is My Key”, which appeared in the Autumn 2015 issue of the awesome 2600 Magazine. If you haven’t read them, buy the back copies and read them NOW! :)
I haven’t written in a long, long time because I have been so, so busy, so I thought I’d say hi by submitting a little snippet of something very useful.
Let’s talk about wordlists. What is a wordlist? Well, a wordlist, as it says on the tin, is a file which is made up of a shit-load of words.
The Kali operating system has a few wordlists, which can be found in /usr/share/wordlists.
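One quick note: on a fresh Kali install, rockyou usually ships compressed as rockyou.txt.gz, so if you only see the .gz file in that directory, unpack it first (assuming the default path):
root@kali:/usr/share/wordlists# gunzip rockyou.txt.gz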
Now, in here is a massive file called rockyou.txt. It’s HUGE!!! This is a bit of a default file for people to use, as it contains absolutely millions of words! Let’s have a look:
root@kali:/usr/share/wordlists# wc -l rockyou.txt
14344392 rockyou.txt
Here we can see that there are 14344392 lines in the rockyou file. But does this value reflect words? Well, a word is a word. But is each line in “rockyou” a single word? Let’s run a quick command to see whether any of these lines contain a space, i.e. are “phrases” or “sentences”:
root@kali:/usr/share/wordlists# grep ' ' rockyou.txt | head
rock you
i love you
te amo
fuck you
te iubesc
love you
i love u
chris brown
rock on
john cena
John Cena?!?! Ha! We can see that the first ten matches are not single words! So how many of these lines are phrases? Let’s run another command:
root@kali:/usr/share/wordlists# grep -c ' ' rockyou.txt
70619
Wow! Now, if I wanted to run a wordlist to test for single words, these lines would be a waste of time, as they are not single words. OK, the password cracking tool may strip them out, but that too would be extra, unnecessary work. You may argue that “they are phrases, keep them in.” Nah! For our phrase to match their phrase would be more or less impossible with only 70619 phrases. And anyway, we are interested in a word list rather than a phrase list.
Before I go further, the rockyou.txt file contains LOADS of crap:
root@kali:/usr/share/wordlists# awk 'BEGIN{len=0;}{if(length($0)>len){len=length($0);printf("%i : %s\n",len,$0);}}' rockyou.txt
6 : 123456
9 : 123456789
10 : 1234567890
11 : christopher
13 : tequieromucho
16 : manchesterunited
17 : mychemicalromance
18 : 123456789123456789
39 : Lets you update your FunNotes and more!
40 : 1111111111111111111111111111111111111111
42 : RockYou account is required for Voicemail.
49 : /* {--friendster-layouts.com css code start--} */
awk: cmd. line:1: (FILENAME=rockyou.txt FNR=602044) warning: Invalid multibyte data detected. There may be a mismatch between your data and your locale.
59 : http://www.rockyou.com/fxtext/fxtext-create.php?partner=hi5
77 : vabfdvfdlvhjibfedblsfndilvbgilebvgdlsbgvhbesghklhyubvuwklfbrebgfyurerebgyureb
165 : lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
222 : <table style="border-collapse:collapse;"><tr><td colspan="2"><embed src="http://apps.rockyou.com/photofx.swf" quality="high" scale="noscale" salign="lt" width="325" height="260" wmode="transparent" flashvars="imgpath=http%
255 : <object width="206" height="224"><param name="movie" value="http://www.vivelatino.com.mx/contador.swf"></param><param name="wmode" value="transparent"></param><embed src="http://www.vivelatino.com.mx/contador.swf" type="application/x-shockwave-flash" wmod
257 : <style type=\\'text/css\\'>body{ background: url(http://recursos.fotocajon.com/enchulatupagina/img003/zxddXgCBLcTi.jpg) white center no-repeat fixed; } table, .heading_profile, .heading_profile_left, table td, #p_container, #p_nav_primary, #top_header, #p_n
262 : <style type=\\'text/css\\'>.bg_content{background-image:url(http://img360.imageshack.us/img360/5198/escanear00532wq9.jpg);}.bg_content{background-repeat:repeat;}</STYLE><a href=\\'http://hi5.enchulatupagina.com\\' target=\\'_top\\'><img src=\\'http://hi5.enchula
266 : <div id=\\'24813\\'><a href=\\'http://www.revistate.com\\'><img src=\\'http://www.revistate.com/uploads/20080218/rq/rqwpcf28o1pyb10yfzen53kmuipsi0_PAPARAZZI.jpg\\' border=0 alt=\\'Hazte famoso en www.revistate.com\\'></a></div><div id=\\'72891\\'><a href=\\'http://w
285 : <div align=\\\\\\'center\\\\\\' style=\\\\\\'font:bold 11px Verdana; width:310px\\\\\\'><a style=\\\\\\'background-color:#eeeeee;display:block;width:310px;border:solid 2px black; padding:5px\\\\\\' href=\\\\\\'http://www.musik-live.net\\\\\\' target=\\\\\\'_blank\\\\\\'>Playing/Tangga
What I have done here is print each line that is longer than the longest line seen so far. Just by looking at this output, we can see that lines with a character count greater than 18 are in fact crap. They’re not even phrases! They are bits of websites – HTML! Definitely not useful in searching for passwords!
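Out of curiosity, you can count how many lines fall into that “longer than 18 characters” bucket with a quick awk one-liner. I’ll leave running it to you, since the number will depend on which copy of rockyou.txt you have:
root@kali:/usr/share/wordlists# awk 'length($0)>18' rockyou.txt | wc -l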
So we can strip these out. Anything with a space – get rid of it.
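As a rough sketch of just that step (the output filename singlewords.txt is only my choice here):
root@kali:/usr/share/wordlists# grep -v ' ' rockyou.txt > ./singlewords.txt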
And while we’re at it, let’s remove emails and websites. Think about it: you are cracking a password hash on BumbleBee Security’s webapp. Is some random person’s email address or a website address going to be a password? Unless you are REALLY lucky, no, no it isn’t! Not whatsoever!
Out of interest, how many lines contain emails and websites?
root@kali:/usr/share/wordlists# egrep -c '[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5}' rockyou.txt
27342
root@kali:/usr/share/wordlists# grep -c http[s]*:// rockyou.txt
866
Wow! Quite a lot! Let’s remove them too.
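If you fancy a quick and dirty clean before we get to the proper script below, the three filters can be chained into one line. This is just a sketch, and wewillrockyou.txt is simply the name I’m using for the cleansed output:
root@kali:/usr/share/wordlists# grep -v ' ' rockyou.txt | grep -v 'http[s]*://' | egrep -v '[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5}' > ./wewillrockyou.txt
Note that this quick version throws the emails and websites away, rather than saving them off to their own files like the script does.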
In conclusion, the rockyou.txt wordlist contains a load of crap that can be removed. And other wordlists may contain crap such as blocks of “header text” etc. Because of this, I wrote a simple script, which can be found at the end of this article. Feel free to use it and send me kudos.
Many thanks for reading.
Gerbil.
[twitter: @gerbilByte]
wordlistcleanser.sh:
#!/bin/bash
#
# wordlistcleanser.sh - gerbil 2018 [twitter: @gerbilByte]
#
# This file is used to clean rockyou.txt from all the crap to leave just single words.
# It will also cleanse other wordlists too.
#
# Usage:
#   wordlistcleanser.sh infile [outfile]
#
# WARNING: If an output file isn't specified, then the input will be overwritten (permissions allowing).
#
# Example:
#   ./wordlistcleanser.sh /usr/share/wordlists/rockyou.txt ./wewillrockyou.txt

infile=$1
outfile=$2
version="1.0"
author="gerbil"

if [ $# -lt 1 ];
then
    printf "\nwordlistcleanser v%s - %s 2018\n\nThis is a simple script that will remove 'phrases', emails and websites from wordlist files.\nEmails and websites will be stored as files under the current directory.\n\n" ${version} ${author}
    printf "Usage:\n\t%s infile.txt [outfile.txt]\n\nWARNING: If an output file isn't specified, then the input will be overwritten (permissions allowing).\n\nExample:\n\t./wordlistcleanser.sh ./rockyou.txt ./wewillrockyou.txt\n\nHave fun! :)\n-%s\n" $0 ${author}
    exit
fi

baseinfile=`basename ${infile}`
baseinfile=${baseinfile%.*}

printf "Cleaning %s...\n" ${infile};

#Check input file exists...
if ! [ -a ${infile} ];
then
    #input file doesn't exist.
    printf " %s doesn't exist!\n" ${infile}
    exit
fi

#Check if input file is to be overwritten or not...
if [ ${outfile}X == X ];
then
    #no output file specified, therefore destruct mode! ;P
    outfile=${infile}
    printf " No output file specified, therefore output will be stored at %s\n" ${outfile}
    # rm -f ${infile} # just to save space
else
    printf " Output file : ${outfile}\n"
fi

#Removing phrases...
printf "Removing phrases...\n"
grep -v ' ' ${infile} > /tmp/ry1.txt

#Extracting then removing websites...
printf "Extracting then removing websites...\n"
grep http[s]*:// /tmp/ry1.txt > ./${baseinfile}_websites.txt
grep -v http[s]*:// /tmp/ry1.txt > /tmp/ry2.txt
rm -f /tmp/ry1.txt # just to save space

#Extracting then removing emails...
printf "Extracting then removing emails...\n"
egrep '[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5}' /tmp/ry2.txt > ./${baseinfile}_emails.txt
egrep -v '[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5}' /tmp/ry2.txt > ${outfile}
rm -f /tmp/ry2.txt # just to save space

#Get stats on leftover file (length of each word and count of each, I know there are no words longer than 1000 characters)...
printf "Getting stats on %s, extracted emails and extracted websites...\n" ${outfile}
printf "Emails extracted: `wc -l ./${baseinfile}_emails.txt`\n" > ./${outfile%.*}_stats.txt
printf "Websites extracted: `wc -l ./${baseinfile}_websites.txt`\n" >> ./${outfile%.*}_stats.txt
printf "\nStats on %s : \n\n" ${outfile} >> ./${outfile%.*}_stats.txt
awk 'BEGIN{charcounts[1000]=0;len=0;printf("word length : count\n------------:------\n");}{charcounts[length($0)]++;}END{for(i=0;i<=1000;i++){printf("%11i : %i\n",i,charcounts[i]);}}' ${outfile} | grep -v ': 0'$ >> ./${outfile%.*}_stats.txt

printf "Cleansing completed.\n\n"
Here is the script running:
root@kali:~# ./wordlistcleanser.sh /usr/share/wordlists/rockyou.txt ./wewillrockyou.txt
Cleaning /usr/share/wordlists/rockyou.txt...
 Output file : ./wewillrockyou.txt
Removing phrases...
Extracting then removing websites...
Extracting then removing emails...
Getting stats on ./wewillrockyou.txt, extracted emails and extracted websites...
Cleansing completed.

root@kali:~# wc -l /usr/share/wordlists/rockyou.txt ./wewillrockyou.txt
14344392 /usr/share/wordlists/rockyou.txt
14245981 ./wewillrockyou.txt
28590373 total
root@kali:~# expr 14344392 - 14245981
98411
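Just over 98,000 lines of crap gone. If you want to convince yourself nothing slipped through, these two quick checks on the cleansed file should both print 0:
root@kali:~# grep -c ' ' ./wewillrockyou.txt
root@kali:~# grep -c 'http[s]*://' ./wewillrockyou.txt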