The Dark Side of Google

Abstract
A product that began in a Stanford classroom has become a household name across the world. Google is the most powerful search engine, pulling in millions of visitors on a daily basis and ranking among the busiest sites on the internet. Google's primary aim is to return the information searches requested by its users. Given its high traffic, Google has also become an advertising hub where many business deals are posted. Beyond the search engine, Google has developed Google Mail, Google Chrome, Google Earth, Google Calendar and other programmes to complement its main activity as a search engine company.

The Google platform has played a major role in the globalisation process, in which most socio-economic and political activities across the world have been connected into one large network. Google is also a social network where many ideas are exchanged and where many users find recreation.
However, hackers have given Google a dark side. Google hackers use the search engine to hunt for classified information from government departments, companies and other organisations, with motives ranging from stealing money, destroying systems to the advantage of competitors and planning ahead of them, to sheer fun. These activities have made the internet a dangerous place for business and for sensitive state matters.

The world's heavy dependence on the internet, however, does not allow any of these agencies to withdraw from the system. Google hacking solutions have therefore been developed to counter the techniques that hackers use against legitimate internet users.

This thesis explores the dark side of Google in six chapters. The first chapter introduces the thesis, and chapter two gives the background of Google. Chapter three is a literature review in which the techniques used by Google hackers are discussed in detail; the preventive measures against these hacks are also outlined. The fourth chapter discusses the design and implementation of tools that can detect Google hacks, together with the role of Google hacking detection tools and the ways in which Google safeguards its stakeholders from potential hackers. Chapter five examines in detail the design of a signature-based solution to hacking, and the sixth chapter provides important guidelines against Google hacking.

This project has explored the issues surrounding Google hacking in depth. It is an important resource for companies, internet service providers, server administrators and end users, offering guidance against Google hacking through knowledge of Google hacking techniques, preventive measures, detection and solution designs.

Chapter 1 Introduction
With the increase in interactive activities around the world, the internet has become an important central point for researchers, professionals, politicians, business communities and the general public. Online transactions have become important, with each player ensuring that their website attracts a great number of visitors at any given time by using appropriate Google optimisation techniques to raise its rank with the search engine. This in turn helps visitors find their way to the website easily, thereby increasing business activity enormously. However, while clients benefit from such arrangements, unauthorised persons with ill intentions use the same enhanced platform to harm the operations of organisations by accessing secret data through advanced searching techniques. This data is then used by the hackers as a weapon against the operations of the organisation and any other partners affected by the stolen data (Balaji, 2005).

Google hacking refers to the attempt by online users to access restricted and sensitive data from the databases of individuals, organisations or government departments, using techniques driven by the Google search engine: direct commands or enhanced complex queries that ultimately identify vulnerable sources (Jahangiri, 2010). The Google search engine indexes an enormous number of websites and devices around the world, which may often fall prey to hackers always snooping for vulnerable victims. Although Google hacking practices are against Google's terms and conditions of service, its effort to block hacking queries has not stopped determined hackers. Dynamic Google hacking techniques are used to isolate thousands of targeted web servers in a randomised manner and leak the sensitive data required from them. This dark side of the powerful Google search engine has enabled fraudsters to locate targeted web pages and search through the required domains and URLs to obtain specific secret directories, files and other forms of data (Hollis, 2005).

There has been a tug of war between hackers and ethical internet users over control of access to the information contained on the internet. While Google provides a means to access an enormous database of information, access to some sensitive data is restricted to authorised persons and IP addresses. This information is the root cause of hacking activity, with some hackers trying to reach the material just for fun while others fight to reach the secured information with other motives attached.

This thesis aims to explore the new techniques used by Google hackers and to identify implementation designs for Google hack detection tools, with the aim of providing companies, internet service providers, server administrators and end users with crucial guidelines that will keep them safe from Google hackers' attacks.

Chapter 2 Background
Google is the product of a Stanford PhD research project carried out in 1996 by Larry Page and Sergey Brin as they sought to establish the configuration of the World Wide Web and the significance of a web page on the basis of its back-links (Google, 2010). The name Google was formulated from the mathematical term googol, which refers to the number 1 × 10^100, reflecting the enormous amount of data accessible on the web and the extent of Google's mission to organise the world's data and make it accessible to all. Google's index has since grown to become the largest assortment of web pages in the world, able to search through billions of pages in less than half a second. This immense power has contributed to a tremendous increase in daily page views, which have risen to 7.2 billion (Google, 2010). Various factors have contributed to the exponential growth of Google into the world's number one search engine.

According to the corporate information posted on its website, Google has been committed to research towards attaining an ideal search engine, described by co-founder Larry Page as one that comprehends the need of the user and returns exactly what is required (Google, 2010). With this focus, Google has persistently pursued innovative measures that have overcome the limitations posed by previous search models. One of the great products of these efforts was the development of the PageRank serving technology, which completely revolutionised searching techniques. Google's developers have met users' needs by providing fast and accurate search results through an updated server setup. The new server setup utilises linked networks to enhance quick searches, especially during peak loads, and therefore does not experience the slow responses witnessed in other search engines. Though this technology has since been embraced by other search engines, Google has remained ahead in refining it to make it more efficient in terms of speed, scalability and cost. The Google search software that has become the preference of many users carries out a sequence of simultaneous commands executed in a matter of milliseconds. This is made possible through the use of more than 200 signals, including the PageRank algorithm, which searches the whole link structure of the World Wide Web before determining the most relevant pages to display (Google, 2010). This is followed by hypertext matching, which analyses the requirement of the user to ensure that the pages more relevant to each specific search are displayed first. With this combination of overall significance and query-based relevance, the Google search engine is able to meet the requirements of billions of users by returning reliable results in a short span of time.
Through PageRank technology the Google search engine considers in excess of 500 million variables and 2 billion terms, allocating higher ranks to the more important pages and giving them priority to appear at the top of the returned results. The PageRank technology also considers the significance of each page that casts a vote, with the votes generated from some specific pages carrying greater weight than others. This approach gives the links of pages whose votes carry greater weight priority in the display returned to the user. To enhance search quality, Google technology employs collective web intelligence in deciding on the significance of each page considered in the search. On the other hand, the hypertext-matching technology analyses the content of the selected pages to determine the most relevant ones to return to the user. This is superior to the approach employed by other search engines, which only scan for page-based text. The latter approach can easily be manipulated through the use of meta tags during the publishing of sites, rendering searches for such pages ineffective. With the use of hypertext matching, the Google search engine is therefore in a position to carry out a full analysis of page content with precise consideration of such factors as fonts, sub-divisions and the exact location of every word. The technology also allows for analysis of neighbouring pages to make sure that the results displayed are certainly the most relevant for each query submitted (Google, 2010).
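The voting idea described above can be illustrated with a minimal power-iteration sketch. This is a textbook simplification using the conventional damping factor of 0.85 from the original PageRank paper; it is not Google's production system, which combines PageRank with the 200-plus other signals mentioned above, and the tiny example graph is invented for illustration.

```python
# Minimal PageRank sketch: each page's rank is a baseline share plus the
# damped sum of the "votes" (rank divided by out-degree) of pages linking to it.
# Illustrative only; real implementations handle dangling pages, sparsity, etc.

def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - d) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = rank[page] / len(outlinks)  # this page's vote, split evenly
                for target in outlinks:
                    new[target] += d * share
        rank = new
    return rank

# A page linked to by many others ("hub") outranks the pages that link to it,
# and a page receiving the hub's weighty vote ("a") outranks the rest.
web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(web)
```

The example shows the weighted-vote effect the text describes: "a", "b" and "c" each cast one vote, but the vote "hub" casts back to "a" carries far more weight because "hub" itself is highly ranked.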
To capture a wider audience, Google's innovative efforts have provided applications that ensure the search engine remains useful even in the absence of a computer. Billions of HTML pages are translated into WAP formats for use on phones and other portable devices, making the search engine accessible to more users. Google partners with some of the leading manufacturers of digital devices to ensure that its search innovations are usable and can easily be customised on most of these devices. This mobile technology offers Google users less expensive options to access the search engine while giving them more flexibility, as the devices can be used anywhere at any time. Google has structured the life span of every query as a comprehensive route that ensures a quality search is undertaken without compromising on waiting time. The query goes through three main stages, as shown in the flow chart diagram below.

The importance and popularity of Google has largely been attributed to the power of its search engine. The Google team has done a great deal to digitise most of what users around the world may require information about. This is being achieved through the Google Books and Google News Archive projects, which are devoted to digitising books and other archives and making them accessible to Google users. Google has also become popular thanks to its user-friendly website, which is easy to navigate. The innovative Google Toolbar software and Google Chrome have made browsing the Google web fast and easy in a manner that competitors have not been able to match. Google also offers more personalised searches for users who are signed in on its website, returning even more tailor-made results. Beyond search, Google offers other services that enable users to perform most of their tasks from one point. Google Inc., which runs as a business, generates most of its money from advertisements offered to users at cost-effective rates. The ads carry relevant advertisements useful both to Google users and to those who post them. Thousands of advertisers use Google AdWords to reach their prospective clients worldwide via the web. This service also attracts many prospective clients, who use the search engine to look for services and products of interest. Unlike most other search engines, Google lets advertisers specify the exact geographic area and time in which they want their ads to appear. This synchronisation ensures that the information reaches the intended population, which derives relevance from the data. As a result, such people become assets to the company as they grow more and more dependent on Google in their operations.
The advertisements are distinguishable from normal search results through labels such as "sponsored links", ensuring that Google users are able to identify them easily (Google, 2010).

The popularity of Google has also been boosted by the Google AdSense program, which enables advertisers to widen their ad promotions, enhance partnership relations, attract more viewers and generate more money. To give advertisers further reach, Google also offers additional advertising formats including YouTube, Google TV Ads and other specialised online ad services through the DoubleClick program. In addition, Google offers users free advertising tools that make advertising more efficient and measurable, including Ad Planner, Insights for Search, Google Analytics and Website Optimizer. With these tools, Google advertisers are able to effectively analyse and test their online promotions. This makes the platform more efficient for both advertisers and viewers, and hence preferable to other search engines.

Another important feature that makes Google even more popular is its apps, which enable people to exchange data and interact more effectively. The most popular of these are Google Docs, Calendar and Gmail, which enhance communication, collaboration and online planning while allowing users to store large amounts of data securely online and access it from anywhere in the world. The business community is an important market niche, and that is why Google provides a specialised suite for businesses called Google Apps. Though this suite is good enough for large businesses, it can be just as useful for small and medium enterprises. Google upgrades the applications regularly, and users therefore find them less expensive than other software, as they need not worry about maintenance costs. Google Apps fits the working and socialisation patterns of most business people, making it a desirable suite because the user can focus on business rather than on software maintenance. Google Apps also has the advantage that users' data safety is guaranteed and the apps respond at satisfactory speed. The data stored by Google users on the web is completely personalised and portable, and can easily be exported as documents, photos or calendar events at any time. According to Prescott (2007), the use of Google Calendar has attracted large numbers of visitors to Google, with the application registering growth of 333% in 2006 alone, mainly at the expense of MSN and Yahoo. The Google idea of offering most operations from one point makes it more like a portal, which appears popular with users. The search engine, Gmail and Calendar services make it possible to relate one success with another, a fact that makes the Google name even more popular with users.
Whereas Google's competitors are mainly known for one major service in which they specialise, Google has been innovative enough to have most of its new products register enormous growth and overtake the traditional service providers (Prescott, 2007). The graph below from Hitwise Clickstream shows how Google's calendar service grew within the US market in 2006 to catch up with the former major specialists in the service, MSN and Yahoo.

Source: Prescott, L. (2007)
The year 2006 also saw most of Google's other innovative products attract huge numbers of visitors, making the young company one of the most important companies in the world. As seen from the table below, Google sites hit a high mark, second only to the giant Microsoft sites. This popularity also has a spill-over effect, in which new users want to make use of the popular company, generating more income for the company and providing more funds for innovative research (Lipsman, 2007). The end result is that the company is able to maintain quality services and a high number of visitors, and therefore remain a notch above its competitors.

Source: Lipsman, A. (2007)
Top Global Web Properties, Total Unique Visitors, December 2006

Web Property                  Total unique visitors (000)
Microsoft sites               740,984
Google sites                  508,659
Yahoo sites                   494,170
Time Warner Network           260,387
eBay                          251,423
Wikipedia sites               164,675
Amazon sites                  151,033
Fox Interactive Media         135,730
CNET Network                  114,940
Ask Network                   113,881
Apple Computer Inc.           111,131
Adobe sites                   100,421

Google started as a vendor of one product, as was the idea of the two Stanford University students. Their main aim was the creation of a search engine that could gather and consolidate information and make it available to any user in need of it. However, through invention and innovation in the computer world, one of the most dynamic industries the world has experienced, various other products and applications were created, which have contributed to the success of the search engine in its current state. Google as a search engine was designed to operate in the English language, since most of the content on the internet at the time of its incorporation as Google Inc. was in English. Drawing from this web content, the initial idea was to create a search engine that could incorporate all the English words there were in the dictionary.
However, this idea was superseded by the existence of massive web content in other languages such as French, Spanish and Chinese. Owing to this development, there was a need to incorporate other languages into the search engine to ensure it remained the best search engine. This was achieved in 2002, when a 72-language interface was launched. Many other applications have since been incorporated into the search engine. Any researcher, analyst or anyone else with a quest for information on any particular subject has a choice among the various applications incorporated into the search engine. For example, a land surveyor interested in the topography or land maps of a particular area would use Google Maps to locate the area and its terrain.
Through partnerships with various other corporations, the search engine has other special features that are useful in different areas of operation. These include synonyms, weather forecasts, time zones, stock quotes, maps, earthquake data, movie show times, airports, home listings and sports scores (Google, 2010).

Synonyms are words with similar meanings, as a dictionary would define them. The search engine is able to generate synonyms, making it easy for anyone who wants to find words with a similar meaning. Weather forecasts in the search engine show the weather conditions in different parts of the world at a specific time; these act as very important guides for tourists moving from one place to another. Time zones, on the other hand, relay the time of day in different parts of the world through time-zone conversion. Stock quotes are very important to any business person who has invested money in the stock exchange, showing the prices at which stocks are trading at a given time. Map data is also available in the search engine, whereby one is able to get directions to a certain place or calculate the distance between two places.

The background of this search engine would be incomplete without a mention of its intended mode of operation, because its eventual success was achieved through its success in operation. Any idea formulated and put into successful operation, especially in the field of computing, where there are very many hackers and different kinds of attacks, is a big idea. This is why the operation of the search engine is of so much importance both to the developers and to anyone performing its analysis. Google web search is the search engine with the highest page rank on the internet, meaning it is the most visited site and the site with the largest number of back-links on the internet. The main purpose of the search engine is to obtain text from different web pages, ranking them according to the search string being searched (Google, 2010).

As Google boasts of its dominance in the internet market, it has mastered several areas that count as the major success factors that continue to make it important and popular. On scalability, Google has grown to a size that has reduced its marginal costs, and it can now adapt to high loads and volumes. This in essence means that the giant company is able to monetise millions of its users and count billions in profits. The utility of a good business, for either products or services, is measured by the number of clients, and Google has reached a stage that most competitors may not easily penetrate. Considering the enormity of the data Google handles and the user behaviour that can be valued for money, the importance of Google to the world economy cannot be over-emphasised. It is partly because of this underlying importance, and the intertwining effect Google has created in almost every sector, that the study of its dark side is relevant and interesting (Google, 2010).

Chapter 3 Literature Review
3.1 Introduction
Google hackers have always come up with new methods to ensure that they succeed in identifying vulnerable targets and penetrating them. With the fast pace of growth in information technology, these techniques are ever changing to embrace new technologies. Google hacking techniques mainly target lead data from insecure site servers, which can eventually be used to provide links to sensitive information of a company, organisation or government department. Measures have already been designed to detect and block attempts by these hacking tools, and server administrators and internet service providers should ensure that such measures are in place and monitored continuously to keep off the hackers. While a number of the detection tools were designed for the Linux operating system, most modern tools are Windows-driven and are based on the Google Application Programming Interface. This chapter explores both the techniques used by Google hackers and the measures taken to detect and counter the activities of these hackers.

3.2 The Techniques used by Google Hackers
Google has expanded to become a giant system of assorted data from around the world that can be freely accessed through the internet. This property has made Google a household name in virtually every internet outlet, and the name Google has become synonymous with data search. Due to its advanced storage techniques, its variety of almost any required data and the ease of its retrieval system, Google has become the leading single source of information in the world. The integration of the system into the internet through the World Wide Web assures hackers of the availability and accuracy of retrieved data once the correct tools are used to access the information (Mohanty, 2005). Though certain kinds of data retrieval carried out by Google hackers are against Google's terms of service, the degree of accuracy and the enormous potential of data accessibility provided by Google are enough assurance of success for the hackers. The ever-rising ease of use of the Google system is another key to Google hackers' techniques that makes them superior to other available hacking techniques. As Google and server administrators continue to put stringent measures in place to reduce unauthorised access to sensitive data through the Google network, Google hackers are also developing more penetrative techniques to ensure that they gain access to the kind of information they want. Though the design and implementation styles of these techniques are varied and regenerative, their basic principles are discussed below.

3.2.1 The Site Operator Technique
When using the Google search engine, the site operator technique is used to broaden the search so as to look for data in all the domains of the targeted web. For instance, a search for intelligence site:gov will generate all data containing the keyword intelligence from all sites with the .gov domain. Hackers deliberately code the site operator search words in reverse order to ensure that only the top-level domain keywords are used for the search. This hacking technique has been used by journalists and other unauthorised snoopers to obtain information that could be of interest to some group of people from government websites or from private and corporate organisations. If a hacker has a grudge against a given organisation, for example, he or she may employ this technique to unleash classified information to the public or to rivals, or use it as a secret weapon against the organisation. This is made easier by the fact that such organisations use very descriptive domain names that can be generated from a range of keywords fed into the search engine (Lancor and Workman, 2006; Long, 2010).

In using the site operator technique, Google hackers may make use of a collection of Google wastes (googleturds) generated from the deliberate use of invalid queries that do not follow the site operator rules. These invalid queries gather useful leading information as the search engine tries to sweep across the targeted website. Although the tiny pieces of information generated are actually typographical errors linked to the searched web pages, experienced hackers are now able to put the little links together, fill in the missing gaps and use them to crack through into sensitive data. This approach is used by Google hackers when the required sensitive information cannot be accessed through valid site operator queries. For instance, a hacker interested in data belonging to an organisation xyz, whose official website is www.xyz.com, may use an incomplete query such as site:xyz and find some tiny mis-spelled googleturd links that could generate useful leads to the required sensitive data (Long, 2005).

The site operator technique can also be used by Google hackers to gain access to all the web pages of an organisation from a single query, from which the required data can be filtered. For instance, running a query like site:www.facebook.com facebook will prompt the Google search engine to display all web pages containing the word facebook, with the search restricted to the website www.facebook.com. Given that the search engine searches the URL, title and content of web pages, the likelihood is that all the pages at www.facebook.com will be captured by the query, and the hacker can access a list of all those pages from just one query. However, where a facebook web page directs the search to the IP address of the web server, the search engine would be expected to cache such a page and will not display it, since it belongs to the IP address and not to www.facebook.com. To have such a page displayed, the hacker would simply replace the word facebook with the IP address of the web server and gain entry to each of the IP addresses used. This technique has made it possible for Google hackers to gain information about an organisation's full website structure without necessarily visiting the official website. Google hackers can carry out this kind of attack with no fear of the related risk, since Google searches occur only on Google's servers and it is therefore only Google that can trace the history of such searches. In addition, Google allows for target surveys that mitigate risks on the side of the hacker (Long, 2010; McMillan, 2005).
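The site operator queries discussed above follow a fixed pattern that can be composed programmatically. The sketch below is illustrative only: the helper name is invented for this example, nothing is sent over the network, and the functions merely assemble the query strings that would be pasted into the search engine.

```python
# Illustrative helper that assembles Google "site:" queries as strings.
# Nothing is sent to Google; the function only builds the query text.

def site_query(domain, keywords=""):
    """Restrict a keyword search to one domain or top-level domain."""
    query = "site:" + domain
    if keywords:
        query += " " + keywords
    return query

# Keyword search across all .gov sites, as in the first example above:
gov_q = site_query("gov", "intelligence")          # 'site:gov intelligence'

# Enumerate every indexed page of one site by repeating its own name:
fb_q = site_query("www.facebook.com", "facebook")  # 'site:www.facebook.com facebook'
```

The same helper would serve a defender auditing what Google has indexed about their own domain, which is the legitimate mirror of the technique described here.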

3.2.2 The Directory Listing Search Technique
This technique makes use of lists of files and directories meant for clients who wish to access an organisation's information from the main directory structure. Google hackers may attack the directories to search for data of interest that may lead them to domains containing sensitive data (Tiller, 2005). Directories may sometimes temporarily hold sensitive files in case a web page file or an index fails to work (Palmer, 2001). With most directory listings containing the phrase index of, Google hackers have no problem commanding the search engine to filter out directories from the targeted websites. The query intitle:index.of is therefore suitable for generating all directories from the target, with the period (.) serving as a single-character wildcard in Google. However, the result of this general kind of query will include documents that have the title index of but are not necessarily representative of the required directory listings. For this reason, the newer Google hacking technique introduces more specific queries that generate directory listings only; for example, the query intitle:index.of "parent directory" will generate directory listings from the main websites by considering other keywords contained in the directory of interest, in this case parent directory. Another query, such as intitle:index.of name size, specifies the name and size columns of the required listing and therefore makes the results more accurate. Because most directory listings are deliberate tools aimed at helping customers navigate websites with ease, Google hackers find them quite handy in obtaining the required data or leads that may aid other hacking techniques, such as the versioning technique (Long, 2010).
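The directory-listing queries above share one base operator plus optional refinements, which can be sketched as a small string builder. The helper is an invented illustration, not part of any real tool; only the intitle:index.of syntax and the refinement terms come from the discussion above.

```python
# Illustrative builder for "index of" directory-listing queries.
# The intitle: operator and the listing titles are standard Google syntax;
# the helper itself is only a sketch and sends nothing to the network.

def listing_query(*refinements):
    """Build an intitle:index.of query, optionally narrowed by extra terms."""
    parts = ["intitle:index.of"]  # the period acts as a single-character wildcard
    parts.extend(refinements)
    return " ".join(parts)

broad = listing_query()                        # every page titled "index of ..."
narrow = listing_query('"parent directory"')   # true directory listings only
sized = listing_query("name", "size")          # listings showing name/size columns
```

Refining with terms that only genuine listings contain ("parent directory", the name/size column headers) is exactly how the text distinguishes real listings from documents that merely carry the words index of in their titles.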

3.2.3 Versioning Hacking Technique
The versioning technique is a newer Google technique that can be carried out through directory listings, default pages or manuals. With information on the precise version of the web server application running on a targeted server, a Google hacker is able to launch a fruitful attack on that web server (Long, 2010). This information can be obtained easily from the web headers once the hacker is connected directly to that particular web server. However, a Google hacker can obtain the same information without necessarily having to connect to the targeted server. One method is to use the data available in the web's directory listings, because such listings normally contain the server's name, the software employed and its exact version. Although some skilled and suspicious web administrators today fake these details, most servers contain legitimate information that Google hackers can successfully use in their attacks. An example of such a query for an Apache web server can be run as intitle:index.of server.at site:apple.com, or the query can point to a specific version of the target web server, as in intitle:index.of "Apache/1.3.0 Server at", which would generate directory listings driven by Apache software version 1.3.0 (Long, 2005).

For successful use of the versioning technique, the target website must contain at least one page that generates a directory listing, and that listing must contain the version of the server embedded at the foot of the page. In cases where the version of the server is not included at the foot of the directory listing page, profiling techniques can be employed to examine the title, header and general content of the listing and fish out leads pointing to the software used by the server. Google hackers can get a clue to the software in operation by comparing the formats of directory listings whose software versions are known with the format of the directories whose software version is to be determined. This approach is, however, limited for servers whose directory listings are uniquely customised to fit the specific requirements of the users. In addition, some directory listings are set up in such a way that they are independent of the web server and are controlled only via third-party software applications. The third-party software application can, however, usually be identified through the profiling technique or through the view source option of the web page containing the directory listing. However difficult it may be to find the exact version of the web server, Google hackers can easily monitor vulnerable victims and attack through this technique. With an exploit that works against specific web server versions such as Apache 1.3.0, a Google hacker can simply employ a query like intitle:index.of "Apache/1.3.0 Server at" to generate directory listings of potentially vulnerable victims (Long, 2005).

A Google hacker can also learn a web server's precise version by examining the default pages generated during the installation process. Normally, these pages are generated to assist the web site administrator in verifying that the web server runs correctly. Some systems launch the web server automatically, in such a manner that the owner may not even know the server is running in the background. Such behaviour exposes the server's content and makes it vulnerable to attack by Google hackers. When a Google hacker runs a query like intitle:Test.Page.for.Apache it.worked, the search engine generates a number of sites that make use of Apache 1.2.6 and still serve the default home page. Web servers such as Internet Information Services (IIS) from Microsoft integrate these pages within the system, so the pages can be located and the web server software version determined without any additional installation by the internet user. To find such servers, a Google hacker can run a query such as intitle:welcome.to.IIS.4.0, and the search engine will return potentially vulnerable sites running IIS version 4.0. It is normally easier for Google hackers using this technique to determine both the exact version and the operating system for web servers based on Microsoft utilities. Netscape-based servers also ship with integrated default pages, and a hacker can locate them by running a query such as allintitle:Netscape Enterprise Server Home Page (Long, 2005). In the same way, the Google hacker can locate any other web server and the exact version running on the machine using this technique; for instance, directory listings served by the Jigsaw web server can be located by running the query intitle:jigsaw overview (Long, 2010).

Another way Google hackers can locate server information with the versioning technique is through the manuals that are installed on these websites by default to help administrators run the server. With the continued abuse of these manuals by unauthorized Google hackers, many administrators now delete the manual pages before launching the server. However, even as administrators grow more careful in their installation process, Google hackers have become even craftier as the Google engine grows more and more powerful (Scambray et al, 2003). The search engine can still snoop on some of these directory listings through related default pages to determine both the version of the web server and the vulnerability of the target. For instance, when a Google hacker runs a query like inurl:manual/apache directives modules (Long, 2010), the search engine generates web pages containing the Apache server guides, styled differently depending on the version of the Apache software installed on the targeted web server. The generated pages furnish the Google hacker with the version of the web server, as it is normally included at the top of each default manual page. In cases where the server has since been upgraded to a different version, however, the manual pages may not reflect the change. Where targeted web servers use IIS technology, the manuals are called help pages, and Google hackers can gain entry to such servers by changing the query to generate a number of versions of the IIS web servers. A workable query in such a case would be allinurl:iishelp core, or a hacker interested in the default sample applications may obtain them by running inurl:iissamples. For a more specific sample, the name of the required sample can be added to the query.
Some sample directories generated in this manner may contain further subdirectories, and where an experienced Google hacker wants a sample with a specific subdirectory, he can insert the name of that subdirectory in the query. The query may be adjusted accordingly to generate various types of help pages as required and reveal the level of vulnerability of the programs on the targeted web server (Long, 2005).

3.2.4 CGI Scanner Technique
A CGI/Web scanner is an important tool in today's Google hacking of web servers, for it accurately searches out the vulnerability levels of various programs on the server and draws attention to the best areas to attack from. Before commencing a search using a CGI scanner, the Google hacker must feed it with the exact information to be searched for on the targeted web server. Once the scanner identifies files or directories that are potentially vulnerable, it automatically stores them in a unique format. The stored information can then be used by the Google hacker to generate the actual road map for an attack on the examined web server. This is done by taking the stored information and breaking it down into index.of or inurl specific paths that target the vulnerable programs identified by the CGI scanner. For instance, the Google hacker can run the query allinurl:random_banner/index.cgi and use the generated information to ultimately break into the random_banner program and reach the hidden files on that particular web server, together with its passwords. The list of vulnerable files returned by a CGI scanner can be overwhelmingly large, and therefore experienced Google hackers use it within an automated environment through Gooscan (Long, 2005).
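The translation from a CGI scanner's vulnerability list into Google queries is mechanical: each known-vulnerable path becomes an allinurl: search. A minimal sketch of that conversion, using a made-up vulnerability list in the style of a scanner data file:

```python
def path_to_dork(vuln_path):
    """Turn a CGI-scanner vulnerability path into a Google allinurl: query.

    A path like '/random_banner/index.cgi' becomes
    'allinurl:random_banner/index.cgi'.
    """
    return "allinurl:" + vuln_path.lstrip("/")

# Hypothetical entries, standing in for a real CGI scanner data file.
vuln_paths = ["/random_banner/index.cgi", "/cgi-bin/phf"]
queries = [path_to_dork(p) for p in vuln_paths]
print(queries)
```

A tool like Gooscan performs essentially this mapping in bulk, which is why its data files read like CGI scanner signature lists.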

3.2.5 Traversal Technique
This technique refers to the act of simply travelling across a target to hunt for the required information, identifying small footholds which are then expanded into large compromises. Google hackers can employ the technique by searching for parent directories and then exploring each of the subdirectories that may contain sensitive information or important leads. An inurl search can also be used to locate particular files within the directories and subdirectories thought to contain the required sensitive data. The URL can always be modified from the address bar to allow exploration of the directory structure and give more important leads. The Google hacker can also build on any identified software flaw to wander into directories placed outside the targeted web server's document directories (Tara and Dornfest, 2003). For instance, if a flawed application is installed in the /var/www directory and the associated public web files live under /var/www/htdocs, then a Google hacker attached to the web server will be able to identify the files available in that directory. If the flawed application running on the server accepts file names as arguments, the URL used by this software could look like www.saidsite.org/badcode.pl?page=index.html (Long, 2005). This URL instructs the badcode.pl application to fetch the file available at /var/www/htdocs/index.html and return it to the requester. The hacker can also exploit the opportunity and send a more crafted URL, for instance one of the form www.saidsite.org/badcode.pl?page=../../../etc/passwd. Depending on how vulnerable the badcode.pl application is, this kind of attack would most probably traverse out of the /var/www/htdocs directory.
It would then allow the badcode.pl application to reach the parent directories of the server, pick out the system passwords from the specified /etc directory and forward them to the hacker. The information could as well be displayed in a format that lets the Google hacker see both the header and the footer, which could hold further leads (Dornfest, Tara and Bausch, 2006).
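The reason the ../ sequences above escape the document root can be seen by normalizing the path that the vulnerable script would open. A short illustration, using the hypothetical paths from the example (not real targets):

```python
import posixpath

DOCROOT = "/var/www/htdocs"

def resolve(page):
    """Join a user-supplied 'page' argument onto the document root and
    normalize it, the way a naive CGI script might before opening it."""
    return posixpath.normpath(posixpath.join(DOCROOT, page))

print(resolve("index.html"))           # /var/www/htdocs/index.html
print(resolve("../../../etc/passwd"))  # /etc/passwd -- escaped the docroot
```

Each ../ cancels one directory component, so three of them are enough to climb from /var/www/htdocs back to the filesystem root; a safe script would reject any resolved path that no longer starts with the document root.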

3.2.6 Incremental Substitution
This is a Google hacking technique in which the numbers in a URL are replaced in order to locate data in hidden files and in files whose links are not referenced from other pages. Normally the Google search engine only locates files that are linked from other pages. Take for instance a file exhc-1.xls available in the Google database. Through the incremental technique the number in its URL can be increased from 1 to 2, so the new file name becomes exhc-2.xls; this name can then be a useful tool to locate the actual file required. A Google query can also help generate other related files from the targeted web site. Google hackers use the incremental technique to locate not only files but any other target that carries numbers in its URL, including scripts and parameters. For example, given a lead of the form intitle:index.of inurl:0001, the number can be increased to locate other similar leads from the same site (Long, 2005).
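Incrementing the numeric part of a file name is easy to automate. The sketch below is an illustrative helper (not from the source) that bumps the last run of digits found in a URL or file name, preserving any zero padding:

```python
import re

def increment_url(url, step=1):
    """Increase the last run of digits in a URL by `step`,
    preserving zero padding: 'exhc-1.xls' -> 'exhc-2.xls'."""
    matches = list(re.finditer(r"\d+", url))
    if not matches:
        return url
    m = matches[-1]
    bumped = str(int(m.group()) + step).zfill(len(m.group()))
    return url[:m.start()] + bumped + url[m.end():]

print(increment_url("exhc-1.xls"))       # exhc-2.xls
print(increment_url("report-0001.pdf"))  # report-0002.pdf
```

Run in a loop, such a helper enumerates candidate names that can then be checked directly against the server, bypassing Google's link-based index entirely.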

3.2.7 Extension Walking
A file extension is an important feature in specifying the file types that a search on the Google search engine should generate. Extension walking is a technique in which the extension of a known file is changed and used to search for other suspected files that share the base name. The technique can be useful in tracing backup files that may contain important information on the system's security. Extension walking can be a valuable tool in obtaining PHP code, which is normally hard to extract from most web servers: performing a view-source operation on an HTML page reveals the source code for the page, but the same is not true for PHP pages, whose code executes on the server (Anderson, 2004). To view PHP code, Google hackers can employ extension walking to hunt for copies of the file saved under a different extension. In most cases a BAK extension is tried, so that a php.bak copy returns the PHP source code, displayed in the browser as plain text. The PHP code provides very important information to the Google hacker, which may include the structure of an SQL database containing sensitive data on the targeted web server (Long, 2005).
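Extension walking amounts to generating plausible sibling names for a known file and searching for each one. A minimal sketch follows; the suffix list is chosen for illustration, since real backup naming conventions vary by editor and administrator habit:

```python
def backup_candidates(filename, suffixes=(".bak", ".old", "~")):
    """Generate likely backup names for a known file,
    e.g. 'index.php' -> 'index.php.bak', 'index.bak', ...
    """
    stem = filename.rsplit(".", 1)[0]
    names = []
    for s in suffixes:
        names.append(filename + s)   # index.php.bak
        if s.startswith("."):
            names.append(stem + s)   # index.bak
    return names

print(backup_candidates("index.php"))
```

Each candidate can then be fed into a filetype: or inurl: query, or requested directly from the server.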

3.3 Detecting and implementing preventive measures against Google hacks
As already discussed, Google hacking techniques can be a great threat to the security of any organization that handles sensitive data for a network of employees or partners. For that reason it is expedient to devise and implement ways in which the organization can avoid attacks and/or detect them when they occur and take appropriate action. It is advisable that any web server be used only to store information meant for public consumption, and that all sensitive and private data be stored on an intranet or another specialized server monitored through well-defined policies (Litchfield et al, 2005). Many organizations have a tendency to divide the public web server into separate parts that serve different purposes depending on accessibility levels. This is risky, however, for the information can easily be copied by Google hackers, rendering the directory-based protection futile. As has been observed, most of the attacks carried out by Google hackers are executed through directory listings (Foster, 2005). Directory listings can simply be disabled on an Apache web server, keeping off unauthorized access, by preceding the word Indexes with a minus sign in the Options directive. Google hacks can also be detected and blocked through robots.txt files, which are placed at the root of the server to define the directories that are off-limits to web robots. The robots.txt file must be readable so that visiting crawlers can retrieve and honour it. All lines that start with a # sign are regarded as comments and are skipped. Every other line must start with either a User-agent or a Disallow statement followed by a colon. These lines effectively block crawlers from accessing the protected sensitive information in the listed directories. The User-agent field takes the name of the crawler to which the subsequent rules apply.
Google uses Googlebot as its user agent, and therefore a Disallow rule aimed at Google should sit under the line User-agent: Googlebot. To assess the risk level of a site, it is advisable for the owner to hack it deliberately to ascertain that the system and its content are intact and safe from Google hackers. By making use of the various Google hacking techniques it is possible to see the website as Google sees it and therefore initiate the proper protection mechanisms against Google hackers (Long, 2005).
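The robots.txt rules described above can be exercised with Python's standard urllib.robotparser. The file contents below are a hypothetical example (the /private/ path is made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that keeps Google's crawler out of /private/.
robots_lines = [
    "# keep Googlebot out of the sensitive directory",
    "User-agent: Googlebot",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(robots_lines)

print(rp.can_fetch("Googlebot", "http://example.org/private/report.xls"))  # False
print(rp.can_fetch("Googlebot", "http://example.org/index.html"))          # True
```

Note that robots.txt only asks well-behaved crawlers to stay out; it does nothing against a hacker who requests the listed paths directly, and it can even advertise the directories an administrator considers sensitive.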

3.3.1 Gooscan
Gooscan is a Linux-driven tool useful for scanning Google to detect possible data leakage from a given site. Gooscan is, however, not built on the Google Application Programming Interface (API), so using it to scan Google is against Google's terms of service, and the search company can choose to punish a user of the tool by blocking the Google services for the offending IP address. The tool can be used in either single-query or multiple-query mode, with the single-query mode returning its search results in a more manageable format than Google. To run a detective search using single-mode Gooscan against a particular web site such as lakesite.org, an -s flag is added, and the command appears as gooscan -q "search name" -t www.google.com -s lakesite.org (Long, 2005). Such a scan will, however, report only the number of hits generated by the Google search. To get a more comprehensive HTML output, an -o flag can be added, to read gooscan -q "search name" -t www.google.com -o searchname.html (Long, 2005). The output of this query will include all the options employed during the Gooscan run, the date the search was carried out, the list of queries performed, a link to the exact Google search carried out and the exact number of results returned. The HTML results contain a Google link that carries out the corresponding Google search when followed. Running Gooscan with multiple queries against Google may violate Google's terms of service, so care should be taken to release only small batches of queries at a time (Herzog, 2005). Just as in a single-mode run, a multiple-query search can be narrowed to target one particular site such as http://onlineshop.info by adding -s.
The command then takes the format gooscan -t www.google.com -i data_files/gdork.gdb -o onlineshop.html -s onlineshop.info. Specific Gooscan runs would normally return a clean output, and any suspicious output should be investigated further to trace the activities performed via Google search.

3.3.2 Athena
Athena is a Windows-driven Google scanner and, just like Gooscan, it is not written against the Google API, so using it to search Google can violate Google's terms of service. Athena is, however, not as intrusive as Gooscan, considering that it permits the user to carry out only a single search at a time (Long, 2010). To use Athena the user needs appropriate XML files with the search queries loaded. Athena has a refine-search box that allows the user to clean up the outcome returned from a search and get closer to the required result. The power of Athena in detection of Google hacks lies within its unique XML configuration files. In addition, Athena's XML queries are drawn from the Google Hacking Database (GHDB), which it uses as a source for all its scans, making it a powerful and reliable tool. The XML configuration files can also be modified as appropriate to fit the user's needs. These XML files have a search-engine part and a signature segment. The search-engine segment is an important part which defines exactly how the queries are to be built and how the search requests should be handled. It has three main parts that are filled for each scan: the searchEngineName, which gives the name of the search engine selected from Athena's drop-down box; the searchEnginePrefixURL, which contains the initial section of the search URL placed before the query is inserted (Long, 2005); and the searchEnginePostfixURL, which gives the portion of the URL placed after the query. The signature section of the Athena XML configuration files provides the details of each search that is carried out. The Athena format allows easy creation of customized queries, which makes it well suited to Google hacks detection (Fisher, 2005).
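Based on the element names described above, the search-engine section of an Athena configuration file might look roughly like the following hand-written sketch (the URL values are illustrative; the exact schema in the shipped files may differ):

```xml
<searchEngine>
  <searchEngineName>Google</searchEngineName>
  <!-- portion of the search URL placed before the query -->
  <searchEnginePrefixURL>http://www.google.com/search?q=</searchEnginePrefixURL>
  <!-- portion of the search URL placed after the query -->
  <searchEnginePostfixURL>&amp;hl=en</searchEnginePostfixURL>
</searchEngine>
```

A query string inserted between the prefix and postfix values yields the full search URL that Athena submits.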

3.3.3 Google API
Another important way in which Google hacks can be detected is by use of the Google API. This is done by creating an online Google account at http://google.com/accounts/NewAccount and obtaining an appropriate license key at www.api.google.com/createkey. The generated license key allows the user to carry out a specified number of automated searches that can aid in detection of Google hacks and protection of the web server (Staykov, 2007). Some of the tools that use the Google API are discussed below.

SiteDigger
This is an important tool for protecting a web server and detecting Google hacks, and it runs on the Google API. The tool can be obtained from http://foundstone.com/resources/proddesc/sitedigger.htm, and it also requires a license key from Google. The computer must have the Microsoft .NET framework installed and updated to launch the SiteDigger application. Once launched, SiteDigger allows the user to enter a domain name and execute a search from the main page. The tool is equipped with an export button that permits the user to create clean HTML reports of the results generated from a scan. SiteDigger returns one URL per query, a fact that makes it less effective during penetration checks. Though a scan can be narrowed to focus on a specific site, weeding out the false returns that are part of Google hacking routines cannot be automated. Another limitation of this tool in detecting Google hacks is that a scan on any given URL does not direct the user to a Google page to view the returned results, but instead takes one to the page of one of the returned results. In addition, going back to the Google page to check the other returned queries is not an easy task. Despite these limitations, SiteDigger is still an important Google hacks detection tool due to the fact that it is automated (Long, 2005).

Wikto
Wikto also requires one to obtain a license key before using the tool to detect Google hacks on a web server. Wikto is fully compatible with the GHDB, although the download does not include a copy of it. New users can, however, download GHDB updates from www.johnny.ihackstuff.com. Once a domain name is entered, Wikto runs it through the GHDB and generates results one query at a time. The details of each query are displayed on screen as Wikto processes it, with the query's description shown in the middle window and the query results in the bottom window. Wikto can also carry out single queries with no site tags by choosing the required query from the GHDB list in the top window; the tool queries Google and generates the outcome when the manual-execution button is pushed. Wikto displays all the results generated from a scan just as Google does, but the tool is more convenient and better organized than Google itself. Wikto's high compatibility with the GHDB has continued to place it among the best tools for Google hacks detection (Long, 2005).
Once Google hacks detection has been carried out successfully, it is important that the correct response measures be taken immediately to eliminate the leaks discovered and/or enhance server protection against future Google hacks. The first step should be the removal of any unwanted content from the web site. For successful removal of such content, each item should be traced back to its source so that preventive measures against future leaks can be instituted on a case-by-case basis. This is because data leakage doesn't just happen without an event that prompts it; the event that led to the leakage should first be established and reversed, and the source of that event followed up. Important information for troubleshooting Google hacks problems can be obtained from Google at http://google.com/webmasters. Once the local problem has been resolved, the next step is to ensure that any cached copy of the leaked material that Google could be holding is deleted before Google hackers catch up with it. This can be done through Google's automated URL removal system at services.google.com/urlconsole/controller. One may need to register for a Google account to benefit from this service, unless one already has a Google Groups account. The method is easy because once you log in to the account, an e-mail bearing a verification code is received that permits direct access to the site where you can act on the URL removal options. Google becomes very handy here by helping to process robots.txt files, check their validity and initiate the removal of the affected pages. One may also opt for the META tag removal method, especially if the page forwarded to Google is better off not cached. In this case the META tag is simply updated and the text submitted to the removal site.
The third option for removing changes that could have been initiated by a Google hacker is to use the "Oh, crap" removal page. Any identified document found to have reached public view should quickly be removed by logging in to the removal system. The removal option chosen should be guided by the sensitivity of the data that could have leaked to Google hackers and the consequences such a leak may have. When the consequences are dire, the first removal option, which cleans up the entire file along with any association to the affected document, may be preferred. If the implications of the leak are mild, one may choose the second option, which removes the snippets occurring on the search page and in the cached copy of the document. The third option may also be considered, in which case only the cached copy of the affected document is removed. It is important to note that the original version of the affected document must be deleted first for any of these removal options to be initiated successfully. This process ensures that the system is protected and that documents that mistakenly gain entry into the public domain are quickly returned to safe custody. When this operation is carried out regularly and in good time, the harm from Google hackers will be minimized if not eliminated, for their activities are detected before they accomplish their mission. The paths detected as possible routes for Google hackers should also be sealed to ensure that they are not accessible in the future and cannot cause any further leakage of data, however small (Long, 2005).

3.3.4 Google Hack Honeypots (GHH)
GHH is the latest Google-related reaction against Google hackers, specifically designed to provide investigative tools against users who employ the search engine with a motive of attacking sensitive data stored by other web users. Implementation of GHH applies honeypot theory to give more security to web servers. GHH is aimed at countering the threat brought about by the enormous growth of the Google search index, fuelled by the large volume of information accessible through the search engine. This upward trend in the growth of the Google search index has multiplied web-based functions, including message boards and other administrative devices, causing a sharp rise in misconfiguration and vulnerability of many web applications available on the World Wide Web. GHH, which is powered by Google and the GHDB, allows web administrators to monitor the activities of Google hackers and gather, through an appropriate log, the details of those who perpetrate attacks and the manner of their execution. The GHH data can then be used to punish those caught trying to penetrate other users' information by denying them future access. The information can also be used to inform service providers of the malicious activities emanating from their networks and serve as input for future systems-security analysis. It is strongly advisable for web administrators to implement the GHH application on each of their sites to scout for those attempting to compromise their data security. In addition, the GHH functions offer an excellent tool for customizing the information and using the attack database to obtain useful statistics about potential attackers, report their activities and, if need be, decline access to selected resources. In its principle of operation, GHH poses as a web server that is vulnerable to attacks and allows itself to be indexed by search engines.
It makes use of a transparent link that can only be reached by crawlers indexing the site and not by normal visitors. This is connected to a configuration file that is directly linked to a log file holding all the data about each visiting host. The administrator can make use of this host information to gather the details of a Google attacker, including his IP address, and effect a successful access denial for such an attacker. This denial is aimed at reducing and discouraging the efforts of potential attackers (Ghh, 2007). The operation of GHH in protection against attackers can be summarised through the schematic diagram shown below.

Source: http://ghh.sourceforge.net/introduction.php
3.4 Conclusion
It has been the cry of most internet users that efforts to enhance internet security be upheld. While hackers work very hard to poke holes in the internet super-highway, companies like Google are investing heavily in tools designed to ensure that internet security is never compromised. The threat of hackers is more real in the world today than it has ever been, and therefore every effort to minimize the activities of hackers and the effectiveness of their tools is being taken very seriously. The internet has brought the dream of a global village to life and has become the driving tool for operations in every social, economic and political sector. It is difficult today to imagine a world with no internet, for most systems run on it. This fact explains why such systems must be protected at all costs from people with evil intentions who could paralyse an economy, or endanger lives, simply by gaining access to information never meant for their consumption. However, the battle to protect the internet from hackers cannot be won unless all the stakeholders work together towards a common goal. Morality must also be taught to and practiced by IT learners to ensure that the skills learnt are used only constructively.

Chapter 4 Design and implementation tools that can detect Google Hacks
4.1 Introduction
The tools developed to detect Google hacks can be broken down into five main stages. The design and implementation tools written in different languages to counter Google hacks attacks are generally based on these five sections (Foster, 2005).

The first stage is socket initialization, in which the implementation language creates a socket that is used for transferring data to and receiving data from the Google database. The second stage involves sending the Google requests or queries: at this stage the queries exchanged with Google are ascertained and the appropriate arguments in each query formatted. The next stage involves retrieving the Google output returned for the initiated queries. The output will normally contain various sets of data, including the number of exact hits realized for each query, as well as the full URLs and web links hit during the run. The fourth stage is the separation of information, in which the important data is filtered out of the bulk of data returned by Google. The required figure is found by searching for the "of about" string that precedes the number of hits generated for the page; this helps in locating the exact position of the overall hit count. The last stage returns the total number of hits for the query, giving room to establish a usable code. The extension of this code can be done in a later development of the script (Foster, 2005).
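The five stages above can be sketched in Python without touching the network: stage one creates the socket, stage two formats the request, and stages three to five are represented here by parsing a canned response fragment (the response text is a placeholder, not live Google output):

```python
import re
import socket

# Stage 1: socket initialization (created but not connected here).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.close()

# Stage 2: format the query into an HTTP request string.
def build_request(query, host="www.google.com"):
    return "GET /search?q=%s HTTP/1.0\r\nHost: %s\r\n\r\n" % (query, host)

# Stages 3-5: retrieve the response (canned below) and filter out
# the hit count that follows the 'of about' marker.
def extract_hits(response_text):
    m = re.search(r"of about <b>([\d,]+)</b>", response_text)
    return int(m.group(1).replace(",", "")) if m else None

canned = "Results <b>1</b> - <b>10</b> of about <b>12,300</b> for query"
print(build_request("intitle:index.of"))
print(extract_hits(canned))  # 12300
```

The exact markup around the hit count has varied across Google result pages, so the regular expression is an assumption tied to the "of about" wording described in the text.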
4.2 Design and implementation tools

4.2.1 Python Implementation
Python is one of the most efficient tools in Google hacks detection in terms of the number of lines required to obtain the needed results. Python is an object-oriented language, a fact that makes it require few lines to accomplish a task that would take longer to carry out in other languages. In this tool, digits are stripped by the use of regular expressions as opposed to loops. The coding in this language is done in blocks, which enhances proper handling and error detection when in use.
The Python design has a first section that declares the modules needed to execute the program; the tool employs the import statement to gain access to specific methods from other modules as required. The second section has four variables, covering the socket port, object, query and host, whereas the third section contains the code that executes the script. The socket structures are created in the first line of the script, with try/except blocks encapsulating socket generation and code matching. Whenever the except branch is triggered, an error message is returned on STDOUT; in case of a failure to create a socket, an automatic debug message is returned on the output. Section four of the Python script is used for the Google queries as well as for storage of the generated Google responses. The first line in this section defines the HTTP request that is fed to the socket. The my_index variable is set to zero so that it can act as a counter for the Google response lines received and record when the total hit count is encountered. The Google responses are looped through one line at a time until the required line is captured in the memory buffer: a while loop iterates over the returned responses until the "of about" line is determined, at which point the my_index value exceeds one, breaking the loop and ultimately closing the socket. Section five is the final part of the Python implementation script; its initial line of code uses the index identified in the previous section to take a 30-character slice of the Google output returned for the query. The second line assembles a regular expression aimed at identifying the digits in that slice. The findall method is useful for creating a list of all the digits contained in the slice, which can then be changed back to a string with the join method and finally fed to STDOUT for printing; the STDOUT call is the last line (Foster, 2005).

This script can be extended to pull out the site links or resulting URLs returned by Google. Though the process is a little more complex, successful results can be obtained by creating a loop structure and implementing a recurring search to fish out all URL-based strings contained in the response section. Once the URL strings are retrieved, the tool gives the user the option of treating them differently depending on the nature of each string. This property is particularly important, for it helps in the appropriate treatment of strings that could have been sniffed out by Google hackers; such a URL can be deleted and its content reinstalled at a safer location. An example of a Python implementation extract based on an object-oriented script, designed to include the source, output and documentation, is given in Foster (2005).
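The URL-extraction extension described above can be sketched as a small loop over the response body. The HTML fragment used here is canned for illustration, and the pattern is deliberately simple:

```python
import re

def extract_urls(response_text):
    """Fish out all URL-based strings from a Google response body.

    A production scraper would also need to handle redirect links
    and percent-encoded characters.
    """
    return re.findall(r"https?://[^\s\"'<>]+", response_text)

# Canned response fragment standing in for a real Google reply.
canned = ('<a href="http://lakesite.org/docs/">link</a> '
          '<a href="https://onlineshop.info/admin/">link</a>')
for url in extract_urls(canned):
    print(url)
```

Each extracted URL can then be inspected, and any string pointing at leaked content flagged for removal and relocation, as the text suggests.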

4.2.2 C# Implementation
C# (C sharp) is also an important tool in the design and implementation of Google attack tools, both in normal applications and in programmed penetration testing devices. C# is an object-oriented language, and the tool employs a .NET C# object whose functions are to initialise automatic queries to Google, display the results and show the total number of hits for each of the queries carried out (Klevinsky et al, 2002).

The code for this tool is contained in a single object rather than in loose functions, so an appropriate object should first be created to take care of the primary functions. The newly created object allows the user to re-use the same code in other applications that could further automate the Google querying process. The new object is called GoogleQuery and it exposes three public methods: the constructor, SendQuery and GetTotalHits, together with three private variables: the string query, the int port and the string server. The constructor creates a TCP socket through the Socket object's constructor and then looks up the google.com IP address using the static .NET method Dns.Resolve. This call returns an object of type IPHostEntry, from which the google.com IP address is extracted by referencing the first index of the AddressList property (ipHostInfo.AddressList[0]). The code then builds an IPEndPoint object from the IP address and the port number on which the connection is to be made. Through the Socket object's Connect method, the IPEndPoint object is taken in as an argument, allowing for connection to google.com on port 80 (Foster, 2005).

SendQuery passes an HTTP GET command to the created Google socket. It should, however, be noted that Socket.Send accepts byte arrays rather than ASCII string data, so all ASCII string data must first be converted into byte arrays using the ASCIIEncoding.ASCII.GetBytes method before the code is executed (Foster, 2005).

The initial nineteen lines of GetTotalHits do nothing but loop until all the information has arrived at the socket, where it is integrated into a single buffer; the Socket.Receive method is employed by this code. The final part of the code uses .NET regular expressions in which the components are named, making referencing easy once the pattern is established, that is, m.Groups["count"].Value. Once this is accomplished, the Regex object passes through the buffer returned from Google using the Match method. If a match is established, a string containing the exact number of hits returned by the query is produced as the output (Foster, 2005).

Whenever Google hackers' activities are detected, quick action should be taken to establish protection measures. This can involve sealing any entry points through which private data may leak to the hackers, or removing the sensitive data altogether (Conway and Cordingley, 2004). It is always advisable to ensure that the policies governing intranet usage are not compromised by any user, for Google hackers can always take advantage of such a foothold.
The code shown below is an example of a C# implementation tool with the source and documentation sections included.
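The C# listing itself is not reproduced in this chunk. As an illustration only, the structure of the described object (a constructor that resolves the host and connects, a SendQuery method that converts the ASCII string to bytes before sending, and a GetTotalHits method that applies a named-group regular expression to the buffered response) can be sketched in Python; all identifiers and the regex pattern are assumptions carried over from the description above.

```python
import re
import socket

class GoogleQuery:
    """Python sketch of the GoogleQuery object described above."""

    def __init__(self, server="www.google.com", port=80):
        self.server = server   # the private string server
        self.port = port       # the private int port
        # Equivalent of Dns.Resolve / AddressList[0]: first address
        address = socket.gethostbyname(self.server)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.connect((address, self.port))   # google.com, port 80

    def send_query(self, query):
        """Send the HTTP GET; the string must first be turned into
        bytes (the ASCIIEncoding.ASCII.GetBytes step in C#)."""
        request = "GET /search?q=%s HTTP/1.0\r\nHost: %s\r\n\r\n" % (
            query, self.server)
        self.sock.sendall(request.encode("ascii"))

    @staticmethod
    def get_total_hits(buffer_text):
        """Apply a named-group regular expression to the buffered
        response, mirroring m.Groups["count"].Value in the C# tool."""
        m = re.search(r"[Aa]bout\s+(?P<count>[\d,]+)\s+results",
                      buffer_text)
        return m.group("count").replace(",", "") if m else None

# The parsing step, shown without opening a network connection:
print(GoogleQuery.get_total_hits("About 42,100 results"))  # 42100
```

Packaging the three methods in one object, as the C# description suggests, is what makes the code re-usable across other query-automation applications.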

4.3 The Role of Google Hacking detection tools
As discussed earlier, the experience of the majority of Google users who are committed to the terms and conditions of use can be marred by a small minority committed to activities that harm their privacy. Considering the danger posed by such users (hackers), it is expedient to devise tools that discourage such malicious people and ensure that the business of the rest of the clients is carried out in a secure atmosphere. A number of Google hacking detection tools have therefore been designed to address this problem. They are designed either to scan a site for vulnerability or to detect attempts by Google hackers to penetrate the site. Both practices are important in assuring the security of sensitive data on a given site. In either case the results help the user to scale down the vulnerability level by tightening security, which is done by identifying the predisposing factors and designing ways in which those factors can be eliminated. In penetration testing, ethical Google hackers use specific detection tools to isolate vulnerable areas. Though some of the hacking detection tools can be modified to serve both functions, this Thesis focuses on each of the tools and its contribution to server security monitoring (Long, 2005).

One easy way of locating vulnerable areas is the use of demonstration pages, in which case a display of the link to any affected software is returned. Although not all advisories return such links, Google queries can easily be used to locate the vendor's page, and analysis can then be done on the vulnerability levels based on the returned results. This is, however, not a very reliable method for determining the vulnerability level of seriously sensitive servers. Hacking detection tools can also be used to isolate vulnerable targets through the source code. Given that a workable query is difficult to construct and can easily be short-circuited by experienced hijackers, it is important for a site administrator to understand clearly how to come up with effective queries that locate vulnerable targets. The accuracy of such results depends on the content of the actual query posted, so different queries can be sent to extract the most explicit result (Foster, 2005).
A hacker might use a cleverly designed URL to access data available from a vulnerable target, so it is important to ensure that secured data is stored in well-tested directories. Hacking detection tools can be useful in searching for a string that specifically follows the vulnerable target, by visiting the site of the software vendor to get the source code of the offending application. This source code is important in analysing the security status of the potentially vulnerable target. There are many examples of queries that can be specifically designed to locate vulnerable areas of a given website.

A CGI scanner is also an important hacking detection device, useful in detecting the vulnerability of a system. A CGI program defines a list of potential vulnerabilities in web files and then goes ahead to locate such cases on a given server. The scanner depends on the returned response codes to measure the vulnerability of web files and directories. A Google hacker can easily target such vulnerable areas by using Google queries constructed from snippets taken from such targets; the query results can then be used to identify the servers that could be hosting them, and any sensitive data in them can easily be accessed. When such files and directories are identified by certified ethical hackers during penetration testing of a server, efforts should be made to return them to a more secure status. It is also important to run a test and identify any attempt made to hack such vulnerable areas so that the privacy of the content can be redressed (Long, 2005).
Another role of hacking detection tools is to detect attempts by hackers to access data stored in secured servers. The result of such detection is significant whether or not the hacker succeeded in penetrating, for it informs the server administrator of the looming danger and of potential enemies of the stored data, enabling him to raise the security level of the directories (Long, 2005). Though high-level security measures may be exercised on a given server, data leakages may still occur, and it is therefore important that occasional checks be made to ascertain that no hackers are attempting to gain entry into the database.

Robots.txt is one of the tools that can help in detecting unauthorized attempts to gain access to stored web data. The robots.txt file should be located right at the root of the server, with the web server set to allow its files to be read. The robots.txt file disallows unauthorized crawlers and leaves a report of their attempts to access the server. The server administrator can set the Disallow lines of robots.txt to specify what certain crawlers should be allowed to view and what they should not have access to. However, this tool is not sufficient for detecting hacker attempts on a sensitive server, for its directives are purely advisory: clever hackers can simply ignore robots.txt and crawl through the server system unnoticed (Foster, 2005).
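For illustration, a minimal robots.txt of the kind described might read as follows; the paths and crawler name are hypothetical examples, not recommendations.

```
User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: BadCrawler
Disallow: /
```

Compliant crawlers honour these Disallow directives, but nothing enforces them, which is why the file cannot be relied upon against deliberate attackers. Note also that listing sensitive paths here advertises them to anyone who reads the file, so truly sensitive directories are better protected by access controls than by robots.txt entries.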

Gooscan, which has already been discussed above, is another detection tool that can be valuable in detecting vulnerable files as well as hacker attempts to penetrate a given site. Though Gooscan is not based on the Google API, it has the potential of scanning Google for data that may have leaked from a site under review. Gooscan is a UNIX-based detection tool that requires a high level of knowledge in command execution, but it has the advantage of providing a good option for those not running Windows-based tools.

Athena is another hacking detection tool that can be useful in detecting the status of the server and attempts made to penetrate secured areas. Athena employs the GHDB as its primary source, a fact that boosts the reliability of its scans. In addition, Athena allows the user to modify its XML-based configuration files to suit the specific requirements of the server under investigation.

SiteDigger is a powerful hacking detection tool that detects any information leakage and can therefore help to identify any successful or failed attempts by hackers to access a given site. SiteDigger is based on the Google API and can be fully automated in its operation, a fact that makes it a more reliable tool. It is limited in the number of URLs returned during penetration testing and in the size of its signature database; however, its ability to run in automated mode has kept it an important tool.

The other important tool for detecting attempts by hackers is Wikto, which requires a Google licence key to operate and is fully compatible with the GHDB. Wikto is a more powerful tool which not only queries Google but can perform many other tasks through its extensive features, making it one of the most convenient and reliable tools in security monitoring (Foster, 2005).
Finally, Google Hack Honeypot (GHH) is a hacking security tool that helps to eliminate the threat of leaking sensitive information to hackers. It uses honeypot theory to provide effective scanning for vulnerable areas as well as detection of attempts by hackers. GHH monitors the activities of hackers and traces them back to their source machines, so that appropriate action can be taken against them on the basis of the information this detection tool records. The servers connected to GHH can also obtain information about the service providers of those attempting to compromise the security of their systems. Action against such hackers may serve as a way of checking the recurrence of such activities (GHH, 2007).

Though reliable hacking detection tools are available to server administrators for testing vulnerable areas in their servers and detecting attempts by hackers to compromise their security, workable policies on data security must be emphasised as the main way for an organisation to stay secure. Sensitive data that is not meant for public consumption can be used exclusively from a secured intra-network, away from the hackers.

4.4 Safeguarding Google stakeholders from Hackers
In the modern global economy, cyberspace is used to drive most industrial operations, communications, machine controls, purchases, financial transactions and even government operations. As the world becomes more and more dependent on the internet, hacking activities are becoming more common and sophisticated, targeting almost every single activity performed online. As such, many governments and private operators are out to sensitise their experts on how to exploit the internet's great potential while not falling victim to hacker syndicates. The risk index of internet users differs depending on the purpose for which a user employs the internet and how sensitive that user's information is perceived to be by the public. The sensitivity of information is mainly tagged to the number of people it targets and to its security and financial interpretation. The risk index of a user is very much a function of both internet dependence and the level of accessibility of the internet information to the outside world. It must, however, be noted that the availability of online information does not necessarily mean that such information is available to the masses. The accessibility of such information to masses who may not know the exact location links is made possible via search engines like Google. This means that if the content of a particular website is not accessible through Google, then its visibility is greatly reduced and the risk factor is lessened (Verton, 2003).

Google has been working alongside its stakeholders, including government authorities, internet service providers, business companies, internet administrators and other end users, to censor search engine users on the basis of the key words used to execute a search. The Chinese government was one of the first authorities to benefit from Google's censorship programme, under which people caught using certain prohibited words in the Google search engine could have their IP connectivity cut. Google has also worked with many other governments, especially in the effort to protect national security systems. Some versions of Google, like the French and German ones, are equipped with self-censorship software, so that material such as Nazi content cannot be generated from the search engine. With the rise of terrorist activity, Google has also worked hand in hand with governments, especially the US, to ensure that terrorism-related content is removed from the search databases and that the internet is continuously monitored for any illegal activities. Google's censorship efforts go a long way towards preventing hackers with terrorist interests from successfully attacking national and international security systems (Conway, 2006).

As organisations move from the hit economy to the link economy, where the reputation of a company's site is no longer based on its design but on whether it shows up among reputable websites, Google comes in handy to offer basic Web epistemology as an indicator of data accessibility in search-engine-based reputability dynamics. This gives organisations the option of deciding the extent to which their information can be accessed, making it more difficult for hackers to penetrate such sites. According to an estimate by the Federal Bureau of Investigation (FBI) in its report released in January 2006, hacking and other related cybercrimes cost US companies in excess of US$67 billion every year. The censorship programme by Google is meant to reduce such waste, save time and thereby increase the profit margins of such companies and organisations. The approach, if well implemented, should protect companies' sensitive data in electronic form, including electronic assets. In addition, it will ensure that the safeguarding procedures are in place throughout and can therefore be relied upon (Conway, 2006).

Google's individual end users have been falling prey to hackers, phishing scams and other cyber crimes that defraud them of their hard-earned cash, all of it channelled through the powerful search engine. Though it is difficult for Google to establish all fake money-laundering and phishing scams, the company is working hard to ensure that those exposed in such deals do not use Google as an advertising agent. This effort also helps the internet service providers, for they are able to use these punitive measures against those caught through hacking-sniffing tools and forward them for punishment, which can include permanent blacklisting of the IP address. This has helped to ensure that some form of cyber discipline is maintained (Verton, 2003).

The steps taken by Google to protect its stakeholders have played a big role in ensuring that safety and discipline are maintained in the usage of the giant search engine. The terms and conditions for Google are clearly against activities that would jeopardize the activities of other users, and stiff penalties are in place for offenders. However, while Google may be seen as a company like any other, driven by making sizable profits, the protection of its stakeholders from hackers should be taken as its ethical and moral responsibility. In some cases Google has been forced to comply with these protection measures within certain jurisdictions, like China, to avoid court battles. The efforts to protect stakeholders should rather be taken voluntarily, as a corporate social responsibility, and applied uniformly across the world.

4.5 Conclusion
The journey to design and implement Google hacking detection tools and keep sensitive information away from hackers has not been simple: as detection technology advances, hacking techniques advance with more sophisticated approaches of their own. Whereas cybercrime was ignored by many governments in the past, today's dependency on the internet does not allow any government to ignore such activities. The fear of terrorists using the internet as a tool of mass destruction is real, and the censorship programmes between Google and some governments to ensure such moves do not bear fruit are therefore a step in the right direction. Google as a company has a moral obligation to facilitate research into advanced technologies for designing hacking detection tools on behalf of its stakeholders. In addition, Google should continue to participate actively in ensuring that all stakeholders are safeguarded from search engine users who may intend to harm them. With such cooperation, the internet will be a safer place and its role in globalization will be enhanced.

Chapter 5 Google Hacking Solution: A Design for a Signature-Based Device Solution
5.1 Introduction
A hacker or cracker is a person with the ability to explore the details of programmable systems and the knowledge to stretch and exploit their capabilities, whether ethically or not. Hackers typically attempt to test the limits of access restrictions to prove their ability to overcome the obstacles. Some do not access the computer with the intent of destruction, although destruction is often the result. Other categories of hackers include hacktivists and criminal hackers; some seek to commit a crime through their actions for some level of personal gain or satisfaction. As the efforts and activities of Google hackers intensify, it is becoming more and more advisable for server and network security administrators to deploy proactive devices and tools to protect the system from potential attacks.

The intrusion detection system is one of the devices that has proved vital towards the protection of sensitive information for different companies around the world. It is, however, important to note that as more subtle systems continue to be developed to fight hacking activities, the hackers develop ever more cunning attack applications that can slip past advanced security systems. This tug of war plays a great role in the development of modern anti-hacker security systems (Scarfone and Mell, 2007). The Google hacking solution presented here is based on intrusion detection, in order to detect the activities of the above-mentioned hackers. An intrusion detection system (IDS), as the name suggests, is used to detect intruders into a computer or a network. Though an intrusion may be authorized or unauthorized, the effective functionality of the signature-based device will be evaluated on its ability to detect the intrusion (Dowell and Ramsted, 1990). Intrusion can be understood as the attempt of an unauthorized person to break into, exploit or misuse a computer network system. The internal security system of any organisation should clearly define the activities that are interpreted as constituting intrusion into the system. Whereas intrusions by workers from within an organisation are rampant, the most significant security threat for which IDS plays an important role is protecting the internal network from external hackers. An effective security system should nevertheless be efficient in detecting and reporting both kinds of attempted attack. An IDS monitors both internal and external security threats concurrently and reports any suspicious activity to the security personnel. The detection is made possible by the system's critical components, namely the IDS management and the IDS sensors (Scarfone and Mell, 2007).

IDS management works as the primary collection point for feedback signals while configuring and deploying the IDS sensors across the entire network. The IDS sensors can take the form of software or hardware, and their main task is the collection and analysis of the system's traffic. The sensors can be either host IDS or network IDS: a host IDS is normally a server-based agent that runs within the server, monitoring the operating system, while a network IDS, which can be placed on a networking device, monitors the overall network traffic as a stand-alone device.

A signature-based intrusion detection system continuously checks the traffic of the system for consistency by comparing the packets flowing within the network against a database of established patterns commonly used by attackers. A signature can be understood as a set or pattern of activities which, when detected in a protocol-decoded packet, raises an alarm because malicious traffic has been noticed. The signature has pre-set instructions that define the kinds of activity to be considered malicious and dangerous to the system (Cisco, 2010).

The most important factor to consider is where to place this signature-based device, since its functioning is determined by where it sits on the network. The diagram below shows the position where a Google signature-based device should be placed in a network. The signature IDS may, however, not detect all types of intrusion, owing to the limitations of the detection rules. On the other hand, a combination of a statistical system and a signature-based system could go a long way towards solving these detection problems (Dowell and Ramsted, 1990).
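The comparison step just described can be sketched as a loop over decoded packet payloads against a small signature database. The two patterns and the packet contents below are illustrative assumptions, not entries from any real IDS ruleset.

```python
import re

# Illustrative signature database: name -> compiled pattern
SIGNATURES = {
    "sql-injection":  re.compile(rb"(?i)union\s+select"),
    "cmd-exe-access": re.compile(rb"(?i)cmd\.exe"),
}

def match_signatures(payload):
    """Compare one decoded packet payload against every stored
    signature; return the names of the signatures that fire."""
    return [name for name, pattern in SIGNATURES.items()
            if pattern.search(payload)]

def inspect(packets):
    """Collect an alert for every packet matching a known pattern,
    standing in for the sensor's alarm to the security personnel."""
    alerts = []
    for payload in packets:
        for name in match_signatures(payload):
            alerts.append(name)
    return alerts

traffic = [
    b"GET /index.html HTTP/1.0",
    b"GET /scripts/..%255c../winnt/system32/cmd.exe HTTP/1.0",
    b"id=1 UNION SELECT password FROM users",
]
print(inspect(traffic))  # ['cmd-exe-access', 'sql-injection']
```

A production sensor decodes protocols before matching and holds thousands of such rules; the sketch only shows why the quality of the signature database bounds what the device can detect.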

Signature-based IDS can be classified into three categories depending on the principle of operation: simple event matching IDS, protocol decode-specific analysis IDS and heuristic-based analysis IDS. The event matching security system checks for a specific arrangement of bytes in a given packet against the reference database. This makes the system simple and reliable, and it is useful for all protocols. Protocol decode-based IDS are, however, particular about the kind of protocol components involved, including specific field sizes and contents, as well as processing Request for Comments (RFC) violations within the network. Whereas any slight modification in the simple event matching IDS may lead to false negatives, the protocol decode-based IDS reduces the chances of false positives through its highly specialized and specific protocol components (Cisco, 2010).

A false negative is a state of the signature that leads to failure to fire an alarm when malicious traffic flows within the network. In this case harmful traffic is considered normal traffic by the IDS sensors and is allowed to pass through the system without alerting the security personnel. The consequences of this condition can be disastrous, and it should therefore be the priority of every security officer to ensure that such cases are minimized, if not eliminated. False negatives can result from a lack of the latest signatures in the IDS sensor or from a defect in the software used by the sensors. It is therefore important for the signature-based IDS configuration to be checked regularly and kept up to date with emerging hacking techniques.

A false positive, on the other hand, is a condition in which the IDS sensors consider a legitimate activity malicious and raise an intrusion alarm to the security personnel. This condition can be very costly and disruptive, especially when the number of false positive alarms to deal with is high. Some tricky hackers may also cause the network to send a large number of false positive alarms to overwhelm the security personnel and, in the ensuing confusion, get a chance to trickle in some malicious traffic. Owing to the rising intensity of hacker activity and the dependence on software, it is not possible to eliminate false negative and false positive signals completely, and updating the IDS configuration should therefore be a regular practice.
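The four possible outcomes of an IDS decision against the ground truth can be made concrete with a small, purely illustrative classification helper:

```python
def classify(alarm_raised, traffic_malicious):
    """Map an IDS decision against the ground truth onto the four
    possible outcomes discussed above."""
    if alarm_raised and traffic_malicious:
        return "true positive"    # attack correctly flagged
    if alarm_raised:
        return "false positive"   # legitimate traffic flagged
    if traffic_malicious:
        return "false negative"   # attack slipped through unnoticed
    return "true negative"        # legitimate traffic passed quietly

print(classify(False, True))  # false negative: the dangerous case
```

Tuning a signature database trades these two error types against each other: tightening patterns lowers false positives at the risk of more false negatives, and vice versa.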

As shown in the table below, the use of signature-based IDS has a number of advantages for an organisation. The system can be used alongside policy-based IDS and anomaly-based IDS (Cisco, 2010).

Advantages:
- Reliable alerts; the rate of false positive alerts is relatively low.
- The system is easy to use and customize.
- It is applicable to all protocols.

Disadvantages:
- A single case of vulnerability may need many signatures.
- Regular updates are needed because the system may not detect unknown attacks.
- Modifications may attract cases of false negatives.

In most cases experienced hackers make use of crafty URLs that contain an HTTP request at the beginning. This kind of attack can be detected by a signature-based IDS by analysing the signature at the start of the incoming data flow. For example, the diagram below shows a case in which an attacker tries to access sensitive passwords from the system's /etc/shadow file at a Company X.
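The crafty-URL attack just described, a request reaching for the system's /etc/shadow password file, can be written as a start-of-flow signature. The following Python sketch is illustrative; the pattern and function name are assumptions, not a production rule.

```python
import re

# Illustrative signature: an HTTP request line reaching for the
# /etc/shadow password file, anchored at the start of the data flow.
ETC_SHADOW_SIG = re.compile(r"^(GET|HEAD|POST)\s+\S*etc/shadow")

def is_shadow_attack(request_line):
    """True when the start of the incoming data flow matches the
    signature, e.g. a directory-traversal request for /etc/shadow."""
    return ETC_SHADOW_SIG.match(request_line) is not None

print(is_shadow_attack("GET /../../etc/shadow HTTP/1.0"))  # True
print(is_shadow_attack("GET /index.html HTTP/1.0"))        # False
```

Anchoring the pattern at the start of the flow is what lets the sensor decide on the request line alone, before the rest of the payload arrives.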

Source: Cisco, 2010.

5.2 The Device Properties
Intrusion detection is one of the major properties of a signature-based device. Intrusion detection refers to the process of discovering unauthorized use of computers and networks through software designed for that purpose. The IDS should monitor traffic in a network and pick out unusual activity, and in this regard it should provide adequate monitoring reports to the management workstations regularly or as scheduled. Any intrusion should create an alarm that alerts the management to the attempted intrusion. Attack signature detection tools look for an attack signature, a specific sequence of events indicative of an unauthorized access attempt; a simple example would be repeated failed login attempts. Another major property of this device is intrusion prevention: the process by which a signature-based device detects an intrusion and attempts to stop it, alerting the management station and creating a report on the same.
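The repeated-failed-login signature mentioned above can be sketched as a simple threshold counter over an event log. The event format, addresses and threshold are assumptions chosen for illustration.

```python
from collections import Counter

FAILED_LOGIN_THRESHOLD = 3   # assumed policy: alarm at 3 failures

def detect_brute_force(events):
    """events: list of (source_ip, outcome) pairs. Return the sources
    whose failed-login count reaches the alarm threshold."""
    failures = Counter(ip for ip, outcome in events if outcome == "fail")
    return sorted(ip for ip, n in failures.items()
                  if n >= FAILED_LOGIN_THRESHOLD)

log = [("10.0.0.5", "fail"), ("10.0.0.5", "fail"),
       ("10.0.0.9", "ok"),   ("10.0.0.5", "fail")]
print(detect_brute_force(log))   # ['10.0.0.5']
```

A real sensor would also window the count in time, so that three failures spread over a month do not fire the same alarm as three in a minute.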

An intrusion prevention system (IPS) is an extension of IDS. While an IDS detects and reports suspected activities in a network, the IPS takes this a step further, blocking unauthorized connections in real time and in an automated manner through access control devices such as routers, firewalls and application-level proxies. The IPS approach appears to be effective in limiting damage or disruption to systems under attack. Once unauthorized access is detected by the signature device, the IPS sends a message to the appropriate devices to block that access. With this property the device proves its effectiveness in the modern world, where thousands of malicious programs try to hack into systems each and every day.

Another major property of this device is report generation. The device is made in such a way that all the activities of the network or system it monitors are recorded. Any policy violation is also recorded, so the management is in a very good position to get a clear view of the network and its vulnerabilities in case of an attack. The device also has the ability to raise an alarm in case of an intrusion. This is a very important property, since it is through this that its functionality is tested; a failure to detect an attack leads to false negative reporting, which can adversely affect a system or a network (Smaha, 1988).

The functioning of a signature-based IDS is centrally based on its properties of signature processing, comparison and refreshing, which together constitute the signature processing engine shown in the schematic diagram below. These properties ensure that the comparative analysis of the traffic is correct and that the right treatment is applied.

Signature Processing Engine (Source: Pejović et al, 2006)
The signature engine works within an intrusion detection context where thorough real-time computation is executed, so dynamic reconfiguration is continuously useful in finding free intervals; otherwise the IDS device can have a special design with provision for dynamic reconfiguration time. The general properties of the device are designed so that the final solution allows for signature updates by providing meaningful degrees of freedom. This can be achieved by ensuring that the memory structure for the signature database has enough free space and is well organised. Part of the memory can then be used exclusively for receiving current signatures sieved from the traffic flowing from external sources, while the other part is used to refresh the database. As a regular practice, the roles of the two parts need to be switched from time to time to ensure that the most updated version is always the one in use (Pejović et al, 2006).
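The two-part memory organisation just described amounts to double buffering: one bank serves live matching while the other receives updates, and the roles are then swapped. A minimal sketch follows; the class and method names are assumptions for illustration.

```python
class SignatureStore:
    """Double-buffered signature database: one bank serves live
    matching while the other receives updates, then the roles swap."""

    def __init__(self, initial_signatures):
        self.banks = [list(initial_signatures), []]
        self.active = 0   # index of the bank currently used for matching

    def current(self):
        """The signatures the detection engine matches against now."""
        return self.banks[self.active]

    def stage_update(self, signatures):
        """Load refreshed signatures into the idle bank, without
        disturbing the bank that is serving live traffic."""
        self.banks[1 - self.active] = list(signatures)

    def swap(self):
        """Switch banks so the freshest signatures go live."""
        self.active = 1 - self.active

store = SignatureStore(["sig-v1"])
store.stage_update(["sig-v1", "sig-v2"])
store.swap()
print(store.current())   # ['sig-v1', 'sig-v2']
```

The design choice is that updates never touch the live bank, so matching continues uninterrupted and the swap itself is a single cheap index flip.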

5.3 The Working Principle
Today, attempted violations can be detected either on a real-time basis and acted upon immediately, or after the fact. For interconnected networks, the first line of defence against attacks on an organization's trusted network and systems is often an intrusion detection system. The following schematic diagram illustrates how a signature-based device works in normal circumstances.


Detecting/Sniffing
The first step in the working of a signature device is detecting, or sniffing: the presence of an intrusion is detected and the relevant systems are alerted. The signature-based system works just like any other intrusion detection system, much as a burglar alarm does: it triggers a virtual alarm whenever an attacker breaches the security of any networked computer (Interactive Tutorial, 2010).

Identifying attack vectors
This is the second step in the operation of the signature device: all the traffic that caused the attack is analyzed and the attack vectors are identified.
A stealthy keystroke logger watches everything the intruder types. A separate firewall cuts the machines off from the internet whenever an intruder tries to attack another system from, for instance, a honeypot. In other words, this system is put in place to protect the network or system from attack and from theft of vital information (Interactive Tutorial, 2010).

Tracing host(s)
This is the third step in the working of an intrusion detection signature-based device: the hosts that initiated the attacks are identified and the corresponding information is relayed to the relevant sectors.

An intrusion detection system works in a variety of ways, either host based or network based, to analyze and log incoming data for signs of unauthorized access. In designing these systems, organizations should identify scenarios or events that constitute a high risk, provide real-time notifications to system administrators when these events occur (for example, intrusion detection systems interfacing with pagers), and establish procedures for responding to those events (Interactive Tutorial, 2010).

Tracking user(s)
This is the fourth step in the working of a signature based device, whereby all users are traced in order to identify the exact person who initiated the attack.

It is also the step at which the signature based device tries to identify and isolate the target(s) of the attack.

A signature based device uses highly effective methods of signature screening, including stateful pattern recognition, protocol parsing, heuristic detection, and anomaly detection, to detect intrusions (Interactive Tutorial, 2010).
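Of the screening methods named above, stateful pattern recognition is worth a small illustration: it matches the signature against the reassembled stream, so an attack string split across packet boundaries is still caught, whereas a naive per-packet matcher would miss it. This is a toy sketch under that assumption, not a real protocol parser.

```python
# Per-packet matching: misses a signature split across packets.
def stateless_match(packets, sig):
    return any(sig in p for p in packets)

# Stateful matching: reassemble the stream first, keeping state across
# packet boundaries before applying the signature.
def stateful_match(packets, sig):
    stream = b"".join(packets)
    return sig in stream
```

Real devices bound the reassembly buffer and track per-connection state; the point here is only the difference between the two matching modes.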

The system works as follows. When an intrusion occurs, the intrusion detection system, or in this case the signature device, generates an alarm to let management know that the network or system is probably under attack. The main problem is that, just as a burglar alarm can generate a false positive, so can a signature device: an event causes the device to produce an alarm even though no attack has taken place at all. The device's reliability depends on how many false positives it reports; if it produces many, it is not effective. The device may also fail to trigger the alarm, leading to a false negative, in which case an intrusion occurs but is not detected (Smaha, 1988).
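The false positive/false negative distinction above can be made concrete by comparing the device's verdicts against ground truth over a set of events. The data below is hypothetical; the function is a toy evaluation helper, not part of any IDS.

```python
# Count false positives (alarm, no attack) and false negatives
# (attack, no alarm) given paired verdicts and ground truth.
def evaluate(verdicts, truth):
    fp = sum(1 for v, t in zip(verdicts, truth) if v and not t)
    fn = sum(1 for v, t in zip(verdicts, truth) if t and not v)
    return fp, fn
```

A device with many false positives trains operators to ignore the alarm, while false negatives mean real intrusions slip through; tuning is a trade-off between the two counts.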

Neutralizing attack
This is one of the critical functions of a signature based system, because if the attack is not neutralized, the malicious activity can have adverse effects on the system.

The signature based device has a triggering mechanism which generates an alarm in case of an intrusion. Different signature based devices trigger alarms based on different network or system situations. The most common triggering mechanisms are anomaly detection and misuse detection. When an alarm is triggered, a neutralizing mechanism is applied which reduces or completely avoids the effects of the attack (Interactive Tutorial, 2010).
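The two common triggering mechanisms can be sketched side by side: misuse detection fires on a known bad pattern, while anomaly detection fires when a measured rate deviates too far from a learned baseline. The patterns, rates and tolerance value are illustrative assumptions.

```python
# Misuse detection: trigger on any known bad pattern in the event.
def misuse_trigger(event, bad_patterns):
    return any(p in event for p in bad_patterns)

# Anomaly detection: trigger when the observed rate strays beyond a
# tolerance band around the learned baseline (values are illustrative).
def anomaly_trigger(rate, baseline, tolerance=3.0):
    return abs(rate - baseline) > tolerance
```

In practice the two are complementary: misuse detection only catches attacks it has signatures for, while anomaly detection can flag novel attacks at the cost of more false positives.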

Reporting Host
This report states whether the attack was critical and identifies the areas targeted. It also reports the mechanism that was used to deal with the intrusion, and records the different parameters of the attack for further use (Interactive Tutorial, 2010).

Making a recommendation
The last step is making a recommendation on the system: the report is reviewed and the drawbacks it highlights are taken into consideration (Interactive Tutorial, 2010).

5.4 Connecting to Server
It is very important for a signature based device to be connected to the server, because this is where all traffic passes before it is distributed within the network. The connection must therefore be made in such a way that the signature based device is in a position to detect every possible intrusion. The figure below shows how the signature based device should be connected to a server in an information network.


Source: http://www.davesite.com/webstation/html
The figure above shows an information network. The signature based device is connected to the server at terminal OP-3, where it is able to detect all possible intrusions into the information network. An alarm is then triggered at the server level in case of an attack.
5.5 Synchronizing with Logs Database

The logs database stores data on the activities of numerous network and system devices, where errors and subsequent warning messages are written to syslog. The signature based device must be synchronized with such a database in order to produce an error log of an attempted attack when required. The error logs are recorded in real time, which is why this synchronization is necessary: factors such as instance identity and node number, reporting component and function, and the error ID and alert identities must be captured in good time. A signature device must be properly configured and tuned to be highly effective; threshold settings that are too high or too low limit its effectiveness. One problem with improper configuration is that the signature device itself could constitute a threat, since a clever attacker could send commands to a large number of hosts protected by the device in order to render them dysfunctional. This synchronization is very important in providing detailed information on the time period in which the anomaly or misuse occurred and the host ID of the intrusion. Without it, the logs database would be futile, since this information is essential in defining the nature and extent of an attack (Interactive Tutorial, 2010).
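The fields the text says must be captured (instance identity, node number, reporting component and function, error ID, alert ID) can be gathered into a single timestamped record for the logs database. The field names below are assumptions for illustration, not a real syslog schema.

```python
import time

# Build one timestamped log record carrying the fields the device must
# synchronize with the logs database (names are illustrative).
def make_log_record(instance_id, node, component, function, error_id, alert_id):
    return {
        "timestamp": time.time(),   # real-time capture of the event
        "instance_id": instance_id,
        "node": node,
        "component": component,
        "function": function,
        "error_id": error_id,
        "alert_id": alert_id,
    }
```

Because the timestamp is taken at record creation, the logs database can later reconstruct the exact time window of an anomaly or misuse, which is what makes the synchronization useful.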

Most signature based devices today have security features that enable a security administrator to automatically log and report all levels of access attempts, both successes and failures. For example, the signature based device can log computer activity initiated through a logon ID or computer terminal. This information provides the internal information security administrator with a log for monitoring activities of a suspicious nature, such as a hacker attempting brute-force attacks on a privileged logon ID. Keystroke logging can also be turned on for users who have sensitive access privileges. What the device logs determines its level of detection of malicious activities (Interactive Tutorial, 2010).
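The brute-force scenario above can be sketched as a simple rule over the access log: flag any logon ID whose failed attempts exceed a threshold. The event format and threshold are assumptions for illustration.

```python
from collections import Counter

# Flag logon IDs that look like brute-force targets: more failed
# attempts than the (illustrative) threshold allows.
def flag_brute_force(logon_events, threshold=5):
    """logon_events: iterable of (logon_id, success) pairs."""
    failures = Counter(uid for uid, ok in logon_events if not ok)
    return {uid for uid, n in failures.items() if n >= threshold}
```

A real device would also weigh the time window and source address of the failures, but the principle is the same: what is logged bounds what can be detected.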

5.6 Error Reporting
This is the most important function of a signature based device. Error reporting involves collecting the available information on intrusion activities and evaluating the errors that arise as a result of the intrusion. Error reporting is done after an actual intrusion, whereby every bug created by the intrusion is recorded in the logs database, and from this database measures are taken to correct the errors. These errors are usually placed in a memory dump, from which corrective action is taken based on the type and extent of each error (Interactive Tutorial, 2010).

5.7 Securing the Application
Just like a burglar alarm system, the device needs to be secured against other attacks, such as theft. A signature based device will only be effective if its availability is assured, and it is in this regard that the device must be secured against all other attacks.
The application must also be secured against the introduction of malicious code into its working process. Such malicious code is very dangerous and may stop the device from working properly, thereby producing false negative and false positive reports.
One of the major problems experienced even by Cisco professionals is securing the application completely. The application can, however, be secured by placing sensors so that any attempt to tamper with it is detected at the earliest possible time (Interactive Tutorial, 2010).

Conclusion
As mentioned earlier, a loss in the battle against hacking in an internet-dependent world can be devastating. The effort by Google and other IT companies to invest in advanced technologies and fight the vice is therefore a welcome move. It is, however, appropriate for governments across the world to enact stricter legislative measures that will deter hacking activities completely. As more governments and other stakeholders join the fight against hackers, internet security is better assured.

Chapter 6 Guidelines against Google Hacking

When a server's accessibility functions are open to the general public, the vulnerability of its sites is high, and hackers can access sensitive information with little effort. It is important for companies to understand that error pages, directory listings and even login pages can be indexed, and once a site is indexed by the Google search engine it can provide data that is very useful to potential Google hackers. However, there are preventive measures that Google stakeholders must understand in order to face these challenges successfully. The basic guiding principle for companies, ISPs, administrators and end users in avoiding Google hacking is prevention, protection and response. It is also important to learn the basic Google hacking techniques revealed in this Thesis, as a means of providing a good level of protection, and to be able to identify important data disclosure strategies and policies in anticipation of Google hacks. Familiarization with hacking techniques will enable one to make use of automated tools against Google hacks, which eventually helps protect each of a site's pages. These automated Google hacking tools provide the system with effective monitoring checks at frequencies far higher than those achievable through manual security checks.
Listed below are some guidelines for companies, ISPs, administrators and end users against Google hacking:
Always make sure that the host and network primary security tools are working.
Build system security elements including authentication, role and key management, audits, cryptography and protocols.
Create, understand and share standard policies that define data protection issues.
Take advantage of external penetration testers to carry out checks and identify any possible problems within the system.
Identify gate locations and gather important artifacts on a regular basis.
Learn and understand all available regulations and unification processes and approaches.
Establish all requirements relating to personally identifiable information (PII).
Carry out awareness training whenever necessary.
Make regular security reviews.
Come up with customized security standards.
Ensure faults in the software found during operation monitoring are fed back to development.
Carry out regular testing to affirm that QA effectively supports boundary value condition checks.
Classify data into build-up schemes and inventories.
Make proper use of Google automated tools.

    6.1 Companies
 To ensure that protection measures against Google hacking are successful, companies should ensure that sensitive data is well monitored and retained only within the company's intranet. Security tokens for specific files should be allocated to enhance security through an appropriate balance of confidentiality, integrity, availability and performance. Basic protective components such as a firewall, robots.txt and GHH should be installed. Hacking incidents against the company's data should be responded to immediately to ensure that such cases do not recur.
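As one concrete example of the basic protective components mentioned above, a robots.txt file placed at the web root asks well-behaved crawlers, including Google's, not to index sensitive directories. The paths below are illustrative only, and it should be stressed that robots.txt is advisory: it reduces what Google indexes but does not stop a determined hacker, so it must be combined with real access controls.

```
# Illustrative robots.txt (directory names are examples only)
User-agent: *
Disallow: /admin/
Disallow: /backup/
Disallow: /logs/
```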

    6.2 ISPs
The ISPs should take responsibility for ensuring that their clients are aware of the importance of observing basic protection measures against Google hacking. Active awareness campaigns should be conducted among companies and end users to ensure that sufficient information is available and that it is being put into practice. The ISPs should also install, maintain and monitor their own hack detection and prevention systems and advise their clients accordingly. The IPs of those caught crawling and stealing data from other systems should be expelled from the network as a way of discouraging the habit.

    6.3 Administrators
Beyond the guidelines detailed above, administrators must be aware that they are the professional links on whom most of the other stakeholders largely depend. They should therefore make sure that companies are guided appropriately towards implementing the guidelines listed above. Administrators should also keep themselves updated on changing hacking techniques to ensure that they are not left behind by the rapid advancement and trickery of the hackers. It is advisable to design well-customized detection tools and review them regularly.

    6.4 End users
It is important that end users learn the basics of Google hacking so as to have sufficient knowledge for enhanced protection. Users should be aware of the issues that raise vulnerability to attack and recognize that even hidden files containing important information, such as login details, can be indexed and can give experienced hackers important leads into the entire system.

Conclusion
Google is a site known for the simplicity of its highly integrated search engine and has become the most popular tool for obtaining desired information from the internet (Delaney, 2005). Most Google users use it for ordinary information searches, while others make use of its more advanced functions to locate and search information through specified URL links and directories. Owing to its popularity and simplicity, hackers have intruded into the system to perform functions that were never intended for the site. Google hackers target sensitive data available on web servers with unsecured files and directories (McMillan, 2005). But as seen in this Thesis, this dark side of Google has led to the design and implementation of many advanced tools to counter the hackers' relentless efforts to penetrate unauthorized sensitive data. Google has also developed interactive tools to assist web server administrators in detecting and removing files attacked by Google hackers. The development and maintenance of Google API tools has gone a long way towards ensuring that Google users can keep up with the rising penetrative skills of the Google hackers. Hackers on the dark side of Google have reached a point where they make use of more Google functions than the legitimate users do (Lancor and Workman, 2006). It is important for Google users to use Google's functions in close adherence to the provided terms of service (Cole, 2002). However, since a section of users will always contravene these terms, it is advisable for web administrators to ensure that their systems are secure and that private data is not linked to public servers (Paxson and Allman, 2007; Long, 2007). Intranet users should also observe high-level protection policies to ensure that data leaks are not initiated through internal negligence (Mohanty, 2005; Cole, 2002).
The information contained in this Thesis is aimed at alerting Google users to become more vigilant and avoid being vulnerable to Google hackers; it should therefore not be used in any way that would harm other internet users.
Google has contributed enormously to the socio-economic life of millions of its users, and its contribution to global interaction has helped make the dream of a global village a reality. It is for this reason that every effort to discourage the activities of Google hackers should be supported. As the term Google becomes a household name around the world, it is difficult to imagine a future without Google, and the effort to find a solution to Google hacking is therefore worthwhile.
This Thesis has examined the dark side of Google in depth and is therefore a good resource for guarding against Google hacking, in a manner that can suitably assist Google and its stakeholders if used objectively. This is in line with the Thesis's aim and objectives of providing information on hacking that will guard against this activity, which has been on the rise worldwide. The information should not, however, be used as a tool for hackers to perfect their trade. For ethical hackers, who hack a system with the objective of sealing it off further from hackers, the Thesis has provided techniques that can be tried against a newly installed server or site to ensure that it is safe from hackers.

The Thesis has also discussed detection and preventive measures against Google hackers, which are of great importance especially to administrators of busy servers, to whom information about the prevention of hacking is crucial. The information is equally important to Internet service providers (ISPs) who are keen to track down the activities of hackers within their systems. To ensure that this objective is conclusively achieved, the Thesis has also provided a chapter on design and implementation tools that can detect Google hackers. With this information, ISPs can ensure that any attempt by hackers to penetrate their systems is detected, and that other activities executed from those IP addresses are monitored so that they do not bring harm to other users. The designs discussed are a significant guide for Internet companies that may need to build their own customized designs and implementation tools. The role of Google hacking detection tools has also been outlined for this purpose.

In accordance with the objectives of the Thesis, a detailed guideline for safeguarding Google stakeholders has also been included. This will ensure that Google's valuable data search services are not adversely affected by the activities executed from its dark side. The role that Google has played in protecting its stakeholders has also been discussed, to ensure that stakeholders' confidence in the company is not compromised by the activities of the few users who use the search engine in complete disregard of the Company's terms of service. The information on safeguarding Google stakeholders is also significant to end users, who will then continue using the services from an informed perspective. End users will also learn that there is more to Google than just a search engine, and this may elicit a desire to learn how one can be actively involved in combating the menace brought about by Google hackers. To wrap up, the Thesis has given an innovative, step-by-step Google hacking solution, presented in the form of a signature based device solution. This will be a particularly valuable resource to both ISPs and server administrators in coming up with customized solutions to overcome the rising challenge.
It is important to note that hackers are developing new techniques every day, and it is therefore important for Google and its stakeholders to keep researching further avenues to keep hacking away from the internet. ISPs and administrators should especially ensure that the systems under their watch are equipped with the latest available solutions to detect and block potential hackers. Companies and other end users should strive to ensure that sensitive information that can be used effectively within an intranet is never exposed to the internet in the course of operations. High levels of discipline and ethics are important in ensuring that sensitive access information used within an intranet is not exposed to outsiders. Some of the information that should not be exposed by such companies and end users has been outlined in the Thesis. We all strive for a safe cyber highway free from hackers, and it is the duty and responsibility of all of us to participate in making this a reality.
