Logical operators and symbols
Google understands three logical operators: AND, NOT and OR. They must
be typed in capital letters: Google recognizes "OR" as the operator, but
treats "Or", "oR" or "or" as ordinary search keywords.
- The AND operator is used to include more than one keyword in a
single search query. It can be replaced by a single space " ", although
the results differ slightly between the two forms, as you can see by
searching for "reverse AND engineering AND tutorials" and then for
"reverse engineering tutorials".
- The NOT operator is extremely useful for eliminating keywords from
the result of a query. It is equivalent to the minus sign "-" placed
directly before a keyword. To see the effect, try searching for
"email service" and then "email service -marketing" (please note that
there is no space between "-" and "marketing").
- The OR operator is used to include in the result of a query pages
that contain one keyword or another; either one is enough. It is
equivalent to the pipe sign "|": "reverse OR engineering" means exactly
the same to Google as "reverse|engineering" (try it, then try
"reverse engineering" to see the difference).
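The three conventions above can be sketched as a small query builder. This is only an illustration: the function names build_query and search_url are made up for this example, and grouping the OR alternatives in parentheses is a choice made here to keep them together.

```python
from urllib.parse import urlencode


def build_query(all_of=(), any_of=(), none_of=()):
    """Combine keywords using the AND, OR and NOT conventions."""
    parts = []
    # AND: a plain space between keywords is enough.
    parts.extend(all_of)
    # OR: must be written in capital letters.
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    # NOT: a minus sign glued to the keyword, with no space in between.
    parts.extend("-" + word for word in none_of)
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL with the query URL-encoded in the q parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For instance, build_query(all_of=["email", "service"], none_of=["marketing"]) produces the "email service -marketing" query used above, which search_url then turns into a link you can paste into a browser.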
In addition to these operators, Google gives a special meaning to some symbols such as ~, +, * and "".
- Using the tilde "~"
This little character is used to include in the result of a query the
desired keyword, its synonyms and words similar to it. For example, if
you search for "it security ~tools", the result will be more substantial
than the result of "it security tools", since Google will also consider
terms such as "Software" and show them among the returned results.
- Using the plus sign "+"
Google tends to ignore punctuation and removes little words like
"we", "the", "to" and "of". Putting a plus sign before a word tells
Google to include it in the search query, so the result of the query
"security is never complete" will definitely differ from that of
"security +is never complete".
- Use of quotation marks "" (exact phrase search)
If you are sure that you have typed a word as it should be written
but Google continues to suggest spelling corrections, or if you want to
search for a phrase, a quote or an error message, putting your query
between quotation marks gives you a more relevant result. For example,
try searching for "Debugging DLLs" with and without quotes.
- Using the asterisk "*", also called the wildcard or joker
The wildcard helps a lot when you want to search for something but one
or more words are missing (it is generally used with exact phrase
search). For example, if you want to find the title of the movie "Get the
Gringo" but you only remember "Get The", you can try "Get The * movie";
try also "the art of *" hacking book.
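To get a feel for what a wildcard phrase can match, here is a rough local approximation in Python. It is not how Google implements the wildcard; it simply assumes that each * stands for one or more whole words, and the function name wildcard_to_regex is made up for this sketch.

```python
import re


def wildcard_to_regex(phrase):
    """Translate a phrase containing * placeholders into a regular expression.

    Assumption for this sketch: each * stands for one or more whole words.
    """
    pieces = []
    for token in phrase.split():
        if token == "*":
            pieces.append(r"\w+(?:\s+\w+)*")  # one or more words
        else:
            pieces.append(re.escape(token))   # literal word
    return re.compile(r"\b" + r"\s+".join(pieces) + r"\b", re.IGNORECASE)
```

With this, wildcard_to_regex("Get The * movie") matches a title line that contains "Get the Gringo movie", but not the bare text "Get the movie", because the placeholder has to be filled by at least one word.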
Now that we know a little more about how the Google search bar
interprets what we type in, let’s see some more interesting operators
and keywords, especially when talking about security!
Define:word
This query returns the definition of the given word from the most reliable sources (websites). For example: define:security
Filetype:file_extension
Using filetype you can find files with specific extensions; this means
that you restrict your search to a specific file type. Note that there
is no space between filetype: and the following word. For example, we
can search for database backups using "backup filetype:sql".
Ext:file_extension
This operator has more or less the same role as the one described above
(filetype), except that using "ext" to look for uncommon extensions
(like dmp, ks, key …) returns deeper and more accurate results.
Intitle:keyword(s)
This operator lets you search for a single word or a whole phrase
present in the title of web pages; it is commonly used to find
directory listings. For example: intitle:index of "Last modified"
You can also use allintitle:keyword1 keyword2 keyword3 … to find results with all these keywords in web page titles.
Inurl:keyword
Like intitle and allintitle, inurl and allinurl can be used to find one
or more keywords present in web page URLs. This operator is widely used
and can reveal a lot of sensitive information, as with the query
inurl:cgi-bin/etc/
Intext:keyword / Allintext:keyword1 keyword2 keyword3 …
Allintext and intext search for keywords present in the body of web
pages or documents and can be very helpful for finding interesting
things, for example: allintext:"Control Panel" "login"
Site:domain
The site operator restricts the result to a particular domain or
website. You can filter by top-level domain (site:com, site:fr,
site:gov …) or limit your query to a specific website:
"reverse engineering site:infosecinstitute.com"
Cache:www.site.com
Once a website has been indexed by Google, there is a good chance that
a copy is kept in the Google cache, so we can get some old information
even after the website has been updated or, in some cases, even if the
website is no longer available.
Info:www.site.com
This query returns links to pages containing information about the
website or web page in question. For example: info:infosecinstitute.com
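All the operators listed above share the same shape: the operator name, a colon, and a value, with no space around the colon. A minimal sketch of a helper that respects this rule follows; the name with_operators is made up for this example, and quoting values that contain spaces is an assumption made here so that the whole phrase stays attached to the operator.

```python
def with_operators(keywords="", **operators):
    """Append operator:value pairs to free-text keywords."""
    parts = [keywords] if keywords else []
    for name, value in operators.items():
        # Quote multi-word values so the whole phrase belongs to the operator.
        if " " in value:
            value = '"' + value + '"'
        # No space between the operator, the colon and the value.
        parts.append(f"{name}:{value}")
    return " ".join(parts)
```

For example, with_operators("backup", filetype="sql") gives the "backup filetype:sql" query from the filetype section, and several operators can be passed at once to narrow a search further.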
Google is not only good at finding stuff, it can even do math!
So far there is nothing harmful, but we will see that by combining
different operators and keywords, and knowing exactly what we want to
find, the results usually exceed our expectations, especially when we
are looking for vulnerabilities or some "private" data. This is
conventionally called Google Hacking.
According to the Wikipedia definition, Google hacking involves
using advanced operators in the Google search engine to locate specific
strings of text within search results. Some of the more popular examples
are finding specific versions of vulnerable web applications. The
following search query would locate all web pages that have that
particular text contained within them. It is normal for default
installations of applications to include their running version in every
page they serve, e.g., “Powered by XOOPS 2.2.3 Final”.
Finding usernames
We will use Google to find files containing usernames, which is useful for building dictionaries, for example:
allintext:username filetype:log
Here is part of a file with more than 2209 rows:
Error Retrieving RSS File:
username:picklepeople
user_id:7321
rss:http://a*******l.org/feed
XML Processing Error: 4Empty document
username:inferno
user_id:240
rss:http://r*****o.l******n.com/rss/
XML Processing Error: 9Invalid character
username:rishey
user_id:338
rss:http://feeds.feedburner.com/____dio.xml
Using the same query, I also found a log of SQL injection attacks:
2012-08-15 03:48:50 213.xxx.xx.229 cid
http://www.h*****.at/index.php?option=com_yelp&controller=showdetail&task=showdetail&cid=-1+UNION+ALL+SELECT+1,2,3,concat(0x26,0x26,0x26,0x25,0x25,0x25,username,0x3a,password,0x25,0x25,0x25,0x26,0x26,0x26),5,6,7,8,9,10,11,12,13,14,15,16,17+FROM+jos_users--
2012-08-21 04:48:01 61.xxx.xxx.72 id
http://www.h*****.at/index.php?option=com_recipes&Itemid=S@BUN&func=detail&id=-1/**/union/**/select/**/0,1,concat(username,0x3a,password),username,0x3a,5,6,7,8,9,10,11,12,0x3a,0x3a,0x3a,username,username,0x3a,0x3a,0x3a,21,0x3a/**/from/**/mos_users/*
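Both log entries above follow a recognizable UNION ... SELECT pattern, with the words separated either by "+" or by /**/ comments. If you want to check your own access logs for similar attempts, a minimal sketch could look like this. The signature is a simplified, hypothetical one written for this example and is far from complete detection.

```python
import re
from urllib.parse import unquote_plus

# Separator between SQL words: whitespace or an inline /* ... */ comment.
_SEP = r"(?:\s|/\*.*?\*/)+"

# Hypothetical signature: UNION [ALL] SELECT, in any letter case.
SQLI_PATTERN = re.compile(
    r"union" + _SEP + r"(?:all" + _SEP + r")?select", re.IGNORECASE
)


def is_sqli_attempt(log_line):
    """Return True if the log line looks like a UNION-based injection attempt."""
    # Decode %xx escapes and turn "+" back into spaces before matching.
    decoded = unquote_plus(log_line)
    return SQLI_PATTERN.search(decoded) is not None
```

Running is_sqli_attempt over each line of a log file and printing the lines that return True gives a quick first look at who has been probing a site.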
Collecting email addresses
allintext:email OR mail +*gmail.com filetype:txt
With this query I was really surprised: the first result was a text
file (not to mention the very interesting host it was found on)
containing 35,572 email addresses and passwords.
Finding sensitive files and directories
intitle:"index of" inurl:ftp (pub OR incoming)
intitle:"Index of" phpMyAdmin
intitle:index of inurl:config* intext:last modified
intitle:"index of" AND password OR passwd OR pwd intext:"last modified"
All these queries return interesting results; we just need to know
what we want to find and how to tell Google to look for it. Example of a
result returned by one of these queries:
define(“MYSQL_HOST”, “mysql106.db.******.***.jp”);
define(“MYSQL_ID” , “na***o-hoso”);
define(“MYSQL_PASS”, “mJtp2XfG”);
define(“DBNAME”, “na***o-hoso”);
Finding error messages (e.g. finding some websites vulnerable to SQL injection)
allintext:"Warning: mysql_connect(): Access denied for user: '*@*" "on line" -help -forum -tuto*
inurl:"id=" & intext:"Warning: mysql_num_rows()" -help -forum
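The same error fragments can be used defensively, to check whether your own pages leak database warnings before a crawler indexes them. The sketch below works on the visible text of a page that you have already fetched. The pattern list only covers the two messages from the queries above, and the name find_leaks is made up for this example.

```python
import re

# Error fragments taken from the queries above; extend the list for your own stack.
LEAK_PATTERNS = [
    re.compile(r"Warning:\s+mysql_\w+\(\)", re.IGNORECASE),
    re.compile(r"Access denied for user", re.IGNORECASE),
]


def find_leaks(page_text):
    """Return the error fragments found in the visible text of a page."""
    found = []
    for pattern in LEAK_PATTERNS:
        found.extend(match.group(0) for match in pattern.finditer(page_text))
    return found
```

An empty list means none of the known fragments were found; anything else is a page that deserves a look, ideally followed by turning off error display on the production server.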
We can find almost everything we want using Google if we are able to
sharpen our query enough. I enjoyed making some queries using
different combinations of keywords and operators; see some of the
results below:
Full access to mailboxes
IPN logs (Instant Payment Notification)
[07/30/2012 8:16 PM] – SUCCESS!
IPN POST Vars from Paypal:
mc_gross=30.16,
protection_eligibility=Eligible, address_status=confirmed,
payer_id=624*****REN, tax=0.00, <strong>address_street=11 Ta*****x
Cr*****nt,</strong>
<strong>Napsbury
Park, London Colney</strong>,, payment_date=12:16:49 Jul 30, 2012
PDT, payment_status=Completed, charset=windows-1252, address_zip=AL2
1UT, first_name=francis, mc_fee=1.23, address_country_code=GB,
address_name=francis dixon, notify_version=3.5, custom=,
payer_status=unverified<strong>,
business=aka******ns@gmail.com</strong>,
<strong>address_country=United Kingdom</strong>,
<strong>address_city=St Albans</strong>,, quantity=1,
verify_sign=A5RHA3OA3pOT5X1MMHRoOSFAM28uAiONl5B7uyghy9xnGSAd9ccEWHE0,
<strong>payer_email=f****s_m_d****n@hotmail.com</strong>,
memo=<strong>11 Tamarix Crescent is my home and card address, but I‘d
like the goods to be delivered to work, hence the delivery address is
for Ashlyns Hall, Chesham Road, Berkhamsted, Herts, HP4 2ST.
Thanks</strong>, txn_id=65W*******6337, payment_type=instant,
last_name=dixon, address_state=Hertfordshire,
<strong>receiver_email=ak*******ns@gmail.com</strong>,
payment_fee=, receiver_id=223*****GE, txn_type=web_accept,
<strong>item_name=www.tg*****en.co.uk</strong>,
mc_currency=GBP, item_number=284, residence_country=GB,
handling_amount=0.00,
<strong>transaction_subject=www.tl****en.co.uk</strong>,
payment_gross=, shipping=9.77, ipn_track_id=adca*******6f56,
IPN Response from Paypal Server:
HTTP/1.1 200 OK
Date: Mon, 30 Jul 2012 19:16:58 GMT
Server: Apache
X-Frame-Options: SAMEORIGIN
Set-Cookie:
cwrClyrK4LoCV1fydGbAxiNL6iG=hK2VxLRsSDcIYah2BmIWM47I715hlkzTGZn77XqmWH_hTHKBD4Dfb_YB7QJlb4i-XN1tcAlHsYZ7SJG0nwdzGZ9eCXsD8fnHSGUfuv2VDtDWp5doDsPpyYHv0QQK0YpgrIYVxG%7cEm0x-LnDlXeHNV0UQExcUhT9rGdmvXVCyQ4nJjpQbWY-aukw2RIxc_jHE0Le2yfB79mo2m%7cSbl_lt9TSLMGNvfjbyQmu4B3eh7tFun2OFsf-SGv2lectPoMVxcIrwMNF7QDvzNc8v_ON0%7c1343675818;
domain=.paypal.com; path=/; HttpOnly
Set-Cookie: cookie_check=yes; expires=Thu, 28-Jul-2022 19:16:58 GMT; domain=.paypal.com; path=/; HttpOnly
Set-Cookie: navcmd=_notify-validate; domain=.paypal.com; path=/; HttpOnly
Set-Cookie: navlns=0.0; expires=Sun, 25-Jul-2032 19:16:58 GMT; domain=.paypal.com; path=/; HttpOnly
Vary: Accept-Encoding
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
Full information about some website's customers, with their names, addresses, postal codes, cities, phone numbers, mobile numbers and email addresses
You can see that things are getting more serious. As you probably
guessed, nothing escapes Google's indexing spiders and crawlers!
Here is an Excel file containing names, country codes, marks and bachelor courses of more than 8014 students:
Here are full database dumps from tens, if not hundreds, of websites, containing in some cases clear-text usernames and passwords:
I'm going to stop at this point; there is no need for more
demonstration. Google is certainly everyone's friend, including
malicious people with malicious intent. Before uploading a file, a
directory or any other information that is not supposed to be public,
remember to check who can access your sensitive files and folders.
Placing an empty index.html file in a directory can be very useful to
prevent simple directory listing. Think also about applying the correct
permissions (chmod) to your sensitive directories, and limit or remove
access to your uploaded backups.
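These two checks, a missing index file and overly permissive files, can be automated on a Unix-like server. The following is a minimal sketch under several assumptions: the list of sensitive extensions and the accepted index file names are hypothetical and should be adapted, and the function name audit_web_root is made up for this example.

```python
import os
import stat

# Hypothetical list of extensions worth flagging; adjust it to your site.
SENSITIVE_EXTENSIONS = {".sql", ".bak", ".log"}
INDEX_FILES = ("index.html", "index.php")


def audit_web_root(root):
    """Yield (path, problem) pairs for items that deserve a second look."""
    for current_dir, _subdirs, files in os.walk(root):
        # Without an index file, a server may fall back to a directory listing.
        if not any(name.lower() in INDEX_FILES for name in files):
            yield current_dir, "no index file"
        for name in files:
            path = os.path.join(current_dir, name)
            extension = os.path.splitext(name)[1].lower()
            mode = os.stat(path).st_mode
            # S_IROTH is the "readable by others" permission bit.
            if extension in SENSITIVE_EXTENSIONS and mode & stat.S_IROTH:
                yield path, "sensitive file readable by everyone"
```

Calling list(audit_web_root("/path/to/your/webroot")) returns every finding at once; the path here is a placeholder for your own document root.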
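The robots.txt file can also help protect the privacy of your data: by
filling it in correctly you can ask Google or any other search engine
not to index your website, files or directories. Keep in mind that it
is only honoured by well-behaved crawlers and is itself publicly
readable, so it does not replace access control.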
The following tips may help:
- Preventing Google from indexing your site:
User-agent: Googlebot
Disallow: /
- Preventing every search engine from indexing your site:
User-agent: *
Disallow: /
- You can also prohibit Google from indexing a specific file type:
User-agent: Googlebot
Disallow: /*.sql$
- To prohibit a directory and all its content from being indexed by Google:
User-agent: Googlebot
Disallow: /directoryName/
- To prohibit a specific page from being indexed by Google:
User-agent: Googlebot
Disallow: /confidential.html
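Before publishing a robots.txt file, it can be worth checking that the rules do what you expect. Here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the www.example.com URLs are placeholders for this example. The standard-library parser does simple prefix matching, so the wildcard rule /*.sql$ is left out of this test.

```python
from urllib.robotparser import RobotFileParser

# Two of the rules from the tips above, combined in one file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /directoryName/
Disallow: /confidential.html
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Note that a crawler that is not named in the file, and for which no "User-agent: *" block exists, is allowed everywhere, which is why the rules for Googlebot alone do not hide anything from other search engines.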
These tips can be combined with HTML meta tags, which you place between <head> and </head>:
<meta name="robots" content="noindex, nofollow">
You can also prevent Google from caching your website with:
<meta name="Googlebot" content="noarchive">
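To confirm that a page really carries these tags, you can parse its HTML and list the directives it declares. The sketch below uses Python's standard html.parser module; the class and function names are made up for this example, and it only looks at the "robots" and "googlebot" meta names mentioned above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of robots and googlebot meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Meta names are case-insensitive, so compare them in lower case.
        name = (attributes.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            self.directives[name] = {
                part.strip().lower() for part in content.split(",")
            }


def robots_directives(html):
    """Return a dict mapping meta name to its set of directives."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

For a page containing both tags, robots_directives returns the noindex and nofollow directives under "robots" and noarchive under "googlebot"; an empty dict means the page declares no such tag.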
This non-exhaustive list of solutions may help you protect yourself
against search engines, and especially against Google. Be very careful,
however, when changing the way Googlebot (or any other search engine
crawler) sees your website, or your pages may disappear completely
from the search results!