Search engines in general, and Google in particular, know a lot about everyone. Moreover, Google can learn about you without your ever having used its services. It knows what it knows because people choose to trust it. But in fact, Google is quite draconian in its policies and approaches to the identification, profiling and tracking of individuals and organisations, along with their associated interests, behaviours and relationships.
Google offers many services, the vast majority of which are free, such as Search, Safe Browsing, GMail, Apps, Docs, Maps, Wallet, Voice, Android OS and DNS. It is worthwhile for Google to offer these free services because they allow it to continue to identify, profile and track users.
Google is a relationship of convenience for users, but people and organisations should understand that Google has made it clear that it intends to own your data regardless of legality or your desire for privacy. Google's actions clearly show that it operates with impunity. From reading your emails and voicemails and collecting data from personal wireless networks, to publishing books online without permission and using third-party applications, Google's intent is demonstrated by its track record.
Moreover, most organisations have absolutely no idea what data is leaking to Google. And since Google has no delete button and a minimum 18-month retention policy with no maximum, organisations have no sense of how much data is sitting on Google's servers, and no mechanism even to track it. However, everything Google collects is public, in that it can be retrieved by content or by search criteria. In fact there is a whole industry devoted to this - search engine optimisation (SEO). But what happens when an SEO's priority shifts from page rankings to uncovering an organisation's vulnerabilities or competitive business plans? These are Blackhat SEOs.
How can organisations understand the extent of this threat and mitigate it? By leveraging both technologies and methodologies.
Today Search Engine Data Leakage Prevention technology is available to identify which specific Google applications and services are being used within an organisation. Once identified, these applications, services and even file content can be blocked or logged. This offers the ability to, for example, allow Google Search without allowing Google Safe Browsing. The same holds true for the balance of Google's services.
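To make the identify-then-control idea concrete, here is a minimal sketch in Python of how a filtering proxy might map a requested URL to a Google service and look up an action for it. It does not describe any particular product. The hostname table, the service names, the `POLICY` mapping and the function names are all illustrative assumptions, and a real deployment would need a far more complete and regularly maintained service list.

```python
from urllib.parse import urlsplit

# Illustrative hostname-to-service table (an assumption, not an
# exhaustive or authoritative list of Google endpoints).
SERVICE_BY_HOST = {
    "www.google.com": "search",
    "safebrowsing.googleapis.com": "safe_browsing",
    "mail.google.com": "gmail",
    "docs.google.com": "docs",
    "maps.google.com": "maps",
}

# Hypothetical organisational policy: one action per identified service.
POLICY = {
    "search": "allow",
    "safe_browsing": "block",
    "gmail": "log",
}

# Identified services with no explicit policy entry are logged for review.
DEFAULT_ACTION = "log"


def identify_service(url):
    """Return the service name for a URL, or None if it is not in the table."""
    host = (urlsplit(url).hostname or "").lower()
    return SERVICE_BY_HOST.get(host)


def decide(url):
    """Return (service, action) for a URL; unrecognised hosts are allowed."""
    service = identify_service(url)
    if service is None:
        return (None, "allow")
    return (service, POLICY.get(service, DEFAULT_ACTION))
```

With this table, `decide("https://www.google.com/search?q=x")` yields an allow decision for Search, while a request to the Safe Browsing host is blocked. Matching on hostname alone is the simplest possible choice; services that share a hostname would need path or content inspection to tell apart.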
Additional technology exists to account for SSL and other encrypted traffic - i.e. traffic that circumvents organisational security. Simply by utilising HTTPS, any user or site can bypass perimeter security controls that are unable to inspect encrypted traffic. This technology can enforce global security policies on all traffic, including SSL or IPSec encrypted traffic, and provide visibility into all of it, including SSL-encrypted Google traffic.
Processes for managing the Google threat can incorporate one or more of these elements:
Google Service Identification - To identify all individual Google services and applications that are being used by organisational users. Furthermore, this could also offer insight into the content, including documents, leaked to Google intentionally or inadvertently.
Google Service Control and Blocking - Once Google services and the extent of their use have been identified, a determination can be made to implement controls for certain services that are deemed intrusive or unnecessary, such as blocking Google Safe Browsing. Less intrusive, and perhaps more effective, alternatives can replace these. To simplify the transition for users, an organisation may also choose to redirect them automatically to an alternative service whenever they select an undesirable Google service.
Anonymisation and obfuscation - As is the case with all organisations, certain services, such as Google Search, are deemed necessary. In these instances user queries are anonymised through anonymisers. "The Search Bubble" (where Google delivers search results based on your profile) is circumvented by eliminating the Google User ID (GUID) assigned to all users via cookies. Finally, user traffic to Google is further obfuscated by generating random traffic, rendering its behavioural data invalid.
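As a rough illustration of the last element, the sketch below shows in Python the two obfuscation steps described: dropping the cookie header that carries the user identifier, and hiding a real query among randomly chosen decoy queries. The function names, the decoy pool and the choice to strip only the `Cookie` header are assumptions made for the example. It does not send any traffic and is not a complete anonymiser.

```python
import random


def strip_cookies(headers):
    """Return a copy of the request headers without any Cookie header."""
    return {k: v for k, v in headers.items() if k.lower() != "cookie"}


def with_decoys(real_query, decoy_pool, n_decoys, rng=None):
    """Return the real query mixed with n_decoys random decoys, shuffled.

    decoy_pool is a hypothetical list of harmless search terms and must
    contain at least n_decoys entries.
    """
    if rng is None:
        rng = random.Random()
    batch = rng.sample(decoy_pool, n_decoys) + [real_query]
    rng.shuffle(batch)
    return batch


# Example: one real query hidden among three decoys.
headers = {"User-Agent": "example", "Cookie": "id=abc123"}
clean = strip_cookies(headers)
queries = with_decoys(
    "quarterly product roadmap",
    ["weather", "train times", "pasta recipe", "football scores"],
    3,
)
```

An observer of the outgoing queries sees four searches in random order with no cookie attached, so the real one is harder to single out and harder to tie to a stored profile. How many decoys are needed, and how realistic they must be, depends on the threat being considered.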
A combination of technologies and processes offers the visibility, control, mitigation and anonymisation needed to prevent Google from gaining insight that Blackhat SEOs can leverage to identify vulnerabilities or confidential business direction. By understanding the extent to which Google touches your organisation, eliminating unwanted access and insight into your environment, and obfuscating the functions you do permit, your organisation can retain full functionality while gaining the control it needs.