
How intelligent is your company?

Tapping into the Invisible Web for Competitive Intelligence


Did you know that over 70% of the Web cannot be accessed by search engines? As the world now relies almost exclusively on Google to access content on the Web, few research professionals realize that this tool - like other search engines - will not give them access to much of the data they need. The Invisible Web, however, includes a wealth of structured, validated data that is crucial for Competitive Intelligence Professionals. This short note outlines some of the principles underlying the Invisible Web, as well as tools that will help you access it.


The Deep Web (also called Deepnet, the invisible Web, DarkNet, dark Web or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.

Wikipedia

Why is there an "Invisible Web"?

There are several types of information that search engines cannot tap into. To simplify, I will provide a few examples that are most relevant to Competitive Intelligence Professionals:

Dynamically generated pages

Many pages are generated on the fly from the criteria the searcher provides. This is the case, for example, for requests made on sites that provide statistical data. Search engines do not submit such specific requests and therefore cannot access the data that is generated.

An example of a site built this way is the World Bank Statistics.
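To make this concrete, here is a minimal Python sketch of what a simple link-following crawler "sees" on such a page. The sample page, its paths and the class name are all hypothetical, and real search engine crawlers are far more sophisticated; the point is only that links can be followed, while a form needs someone to choose the criteria first.

```python
from html.parser import HTMLParser


class LinkAndFormCollector(HTMLParser):
    """Collects what a link-following crawler can follow, and what it skips."""

    def __init__(self):
        super().__init__()
        self.links = []  # href targets a crawler can follow
        self.forms = []  # form actions that need user input first

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action", ""))


# Hypothetical statistics page: two static links and one query form.
SAMPLE_PAGE = """
<html><body>
  <a href="/about.html">About</a>
  <a href="/methodology.html">Methodology</a>
  <form action="/query" method="get">
    <select name="country"><option>Canada</option><option>Chile</option></select>
    <input name="indicator" type="text">
    <input type="submit" value="Show data">
  </form>
</body></html>
"""

collector = LinkAndFormCollector()
collector.feed(SAMPLE_PAGE)
print("Pages the crawler can reach:", collector.links)
print("Forms it cannot fill in:", collector.forms)
```

The data tables behind the form exist only once a country and an indicator have been chosen, so they never show up in the list of reachable pages.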

Excluded pages

Some site owners prefer to keep their webpages out of search engines. They can do so by including meta tags that tell search engines to skip the page. This can be a problem when you are researching specific technologies or companies that deliberately want to stay "under the radar".
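As an illustration, the most common of these tags is the robots meta tag with a "noindex" directive. The short Python sketch below checks whether a page carries it; the function and class names are my own, and the sample pages are made up. It only covers the meta tag case, not other exclusion methods a site might use.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Gathers the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name == "robots":
            content = attrs.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def is_excluded_from_index(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


# Made-up pages for illustration.
hidden_page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
public_page = "<html><head><title>Press releases</title></head></html>"

print(is_excluded_from_index(hidden_page))
print(is_excluded_from_index(public_page))
```

Running a check like this on a company's pages is a quick way to see whether their absence from search results is deliberate.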

Searchable databases

Much of the world's structured data is organized in databases that are fully accessible but require the user to enter a keyword to locate the information he or she needs. Here are a few examples:

  • Articles: newspapers such as the Financial Times that offer their full archives freely to users
  • European patents: as a researcher, you can access the list of patents filed with the European Patent Office
  • Incorporation documents: when researching a private company, you can browse the state websites that give you access to the incorporation documents of those companies, with details about ownership, board composition, etc.
  • Financial information: you can browse SEC filings to access financial data about US companies

However, when a search engine tries to index that content, it fails: it is blocked by the search form on the site, because the search engine does not come up with keywords of its own. As a result, unless the content owner has provided the information page by page to the search engine (for example, the academic publications supplied to Google, now accessible through Google Scholar), none of the content will ever appear in a search engine.

Here is a good summary by Emerald Insight.
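The search-form barrier can also be sketched in a few lines of Python. The endpoint and the parameter names "q" and "page" below are hypothetical, not those of any real database; the sketch simply shows that every result page lives at an address that only exists once somebody has supplied a keyword.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint of a keyword-driven database.
SEARCH_ENDPOINT = "https://database.example.org/search"


def build_query_url(keyword: str, page: int = 1) -> str:
    """Build the address of a result page for a given keyword."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': keyword, 'page': page})}"


# A researcher knows which keywords matter; a crawler has no such list.
for keyword in ["lithium battery", "wind turbine"]:
    print(build_query_url(keyword))
```

No other page links to these addresses, so a crawler that only follows links never comes across them.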

What does it mean for the Competitive Intelligence Professional?

Understanding the "Invisible Web" concept is one of the key skills of the Competitive Intelligence Professional, for several reasons:

  • Access to structured data: our research shows that over 70% of the data needed for competitive analysis is kept in structured databases such as the ones listed above. It is therefore crucial to know how to tap into those sources. 
  • Ability to listen to weak signals: for those tracking weak signals, or new technologies, it is important to understand that a generic search engine will not return many relevant results. Drop Google and dive into the Invisible Web.


Meet the "pathfinders"

Where there is a market, there is a product: "pathfinders" have emerged to help researchers find their way to the databases they so badly need. As few of those pathfinders have a business model, you can expect the sites to be shabby and disorganized at times.

Here is a sample of pathfinders I use:

  • Complete Planet: CompletePlanet gives you access to over 70,000 databases. They are organized into categories, so it is easy to browse to what you are interested in. The site does not seem to have been spammed yet, so it is quite relevant. One word of caution: I have noticed that the figure of 70,000 databases has not changed in the past year, so I am not sure how often the list is updated.
  • Infomine: Infomine is a scholarly Internet resource collection. Databases are organized into broad categories (Bio and Medical Sciences; Government; Business and Economics, etc.).
  • Alacrawiki: for those familiar with Wikipedia, the format will be reassuring. Alacrawiki is the collective work of experts who point you to the best resources for researching an industry. The industry search can be confusing: make sure you use broad keywords to reach specific industries. For example, check Alacrawiki-retail or Alacrawiki-oil for the retail or the oil industry.
  • CloserLook: with CloserLook, Goatechnologies, based in Montreal, has developed an engine that searches the Invisible Web for any public database, including corporate data. One can search for a private company in the United States (the site currently covers only North America) and access all incorporation documents, pending lawsuits, patents and trademarks, list of officers, etc. Each search costs $0.99.

Gold nuggets

Occasionally, one finds some fascinating sources of information on the Invisible Web. Here are a few examples of interesting sources of information:

  • Flightaware will give you access to all the flight plans of a specific aircraft. The site was built to let you track commercial flights, but you can also enter any tail number and track a private aircraft or a helicopter (of course, you have to know that tail number - check the Federal Aviation Administration Registry, searching by company name, to get it). This is certainly useful when tracking a flight you are taking, but some of our clients have used it to track competitors: a mining company tracked helicopters to monitor prospecting, and a corporate office tracked a competitor's corporate jet to anticipate mergers and acquisitions.
  • 123people will search the Invisible Web to find a person's profile. For example, it will track that person's Wish List on Amazon, a great way to better understand a person's interests (what is more personal than your reading list?).
  • The Database of Federally Funded Research will give you access to any research done in the United States by private labs, universities or companies that has received a tax incentive...
  • Market Research.com: this site aggregates a large number of market research reports published in English. I love the fact that you can search the text within each market research report and therefore check how relevant a particular report is to you. In particular, the site will tell you how often your keyword appears within a specific report, and on which pages. Searching and accessing the executive summary is free, but requires registration.
  • FindArticles.com: a (mostly) free database of previously published articles. It covers a large number of publications worldwide.

 

What will the future hold?

Here are my predictions for the future of the "Invisible Web":

  • New search engines will appear that find a way to index Invisible Web databases. I am watching, for example, the initiatives by Deeppeep and DeepDyve, which are trying to specialize in Web forms
  • Google will continue striving to index part of the Invisible Web. After Google Scholar, mentioned above, the tool is trying to index patents (see Google Patents), and other areas will follow
  • Experts will increasingly use social media to identify and point out those databases. Alacrawiki has been a good start, but we have yet to see a more complete aggregator or curator for those sources of information
  • The size of the Invisible Web will continue to grow - maybe one day making generic mega search engines such as Google obsolete for researchers

I find that this cartoon, published a few years ago by the New York Times, illustrates well what the (wishful) future holds:




Over to you: are you using specific databases from the Invisible Web? Please share your best-ofs here.

 

Comments

  • Roger Kirby said:

    03/01/2011 7:37am: I attended Estelle's Lift 10 workshop on the invisible web and found it fascinating. There's a whole world in there!
