
The World Wide Web conjures up images of a giant spider web, where everything is connected to everything else in a random pattern and you can travel from one edge of the web to another by just following the right links. Theoretically, that's what makes the web different from a typical index system: you can follow hyperlinks from one page to another. In the "small world" theory of the web, every web page is thought to be separated from any other web page by an average of about 19 clicks. On the web, the small-world theory was supported by early research on a small sampling of web sites. But later research, in which scientists used a web crawler to identify a large sample of web pages and follow 1.5 billion links on those pages, found something entirely different.
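The "about 19 clicks" figure is an average shortest-path length: the minimum number of hyperlink hops between two pages. A minimal sketch of how that distance is measured on a toy link graph (the page names and link structure here are purely illustrative):

```python
from collections import deque

def clicks_between(links, start, target):
    """Minimum number of clicks (hyperlink hops) from one page to another.

    `links` maps each page to the pages it links out to. Uses breadth-first
    search, so the first time we reach `target` is via a shortest path.
    Returns None when no path exists.
    """
    if start == target:
        return 0
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        page, dist = frontier.popleft()
        for nxt in links.get(page, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None

# A toy five-page web: A -> B -> C -> D, with E isolated.
toy_web = {"A": ["B"], "B": ["C"], "C": ["D"], "D": [], "E": []}
print(clicks_between(toy_web, "A", "D"))  # 3
print(clicks_between(toy_web, "A", "E"))  # None: no path exists
```

Note that the distance is directional: links point one way, so page A may be 3 clicks from D while D is infinitely far from A.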
The researchers discovered that the web was not like a spider web at all, but rather like a bow tie. The bow-tie web had a "strongly connected component" (SCC) composed of about 56 million web pages. On the right side of the bow tie was a set of 44 million OUT pages that you could reach from the center but from which you could not return to the center; OUT pages tended to be corporate intranet and other site pages designed to keep you at the site once you landed there. On the left side of the bow tie was a set of 44 million IN pages from which you could reach the center, but that you could not travel to from the center. In addition, 43 million pages were classified as "tendrils": pages that did not link to the center and could not be reached from it. Occasionally, tendrils linked to one another without passing through the center (these are called "tubes"). Finally, there were 16 million pages totally disconnected from everything.
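The bow-tie categories fall out of two reachability questions about a page in the core: what can it reach, and what can reach it? Pages satisfying both are in the SCC; only the second, IN; only the first, OUT; neither, the tendrils/tubes/islands. A small sketch, with an invented toy graph, of that classification:

```python
def reachable(links, start):
    """All pages reachable from `start` by following links (iterative DFS)."""
    seen, stack = set(), [start]
    while stack:
        page = stack.pop()
        if page not in seen:
            seen.add(page)
            stack.extend(links.get(page, ()))
    return seen

def bow_tie(links, core_page):
    """Partition pages into SCC / IN / OUT / elsewhere around `core_page`."""
    reverse = {p: [] for p in links}          # flip every link's direction
    for p, outs in links.items():
        for q in outs:
            reverse.setdefault(q, []).append(p)
    fwd = reachable(links, core_page)         # pages the core can reach
    bwd = reachable(reverse, core_page)       # pages that can reach the core
    scc = fwd & bwd
    return {
        "SCC": scc,
        "IN": bwd - scc,
        "OUT": fwd - scc,
        "elsewhere": set(links) - fwd - bwd,  # tendrils, tubes, islands
    }

# Toy bow tie: in1 -> a <-> b -> out1, plus a disconnected island.
toy = {"in1": ["a"], "a": ["b"], "b": ["a", "out1"], "out1": [], "island": []}
parts = bow_tie(toy, "a")
print(parts["SCC"], parts["IN"], parts["OUT"], parts["elsewhere"])
```

On the toy graph this yields SCC = {a, b}, IN = {in1}, OUT = {out1}, and the island in "elsewhere", mirroring the four regions of the bow tie in miniature.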
Further evidence for the non-random and structured nature of the web comes from research by Albert-Laszlo Barabasi at the University of Notre Dame. Barabasi's team found that, far from being a random, exponentially exploding network of 50 billion web pages, activity on the web was actually highly concentrated in "very-connected super nodes" that provided the connectivity to less well-connected nodes. Barabasi dubbed this type of network a "scale-free" network and found parallels in the growth of other natural and social networks. As it turns out, scale-free networks are highly vulnerable to destruction: destroy their super nodes and the transmission of messages breaks down rapidly. On the upside, if you are a marketer trying to "spread the message" about your products, place them on one of the super nodes and watch the news spread. Or build super nodes yourself and attract a huge audience.
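Scale-free networks arise from preferential attachment: new pages tend to link to pages that are already well linked, so a few super nodes accumulate most of the connections. A minimal sketch of that growth rule (the simulation and its parameters are illustrative, not Barabasi's actual experiment):

```python
import random

def grow_scale_free(n_pages, seed=0):
    """Grow a toy network by preferential attachment.

    Each new page adds one link to an existing page chosen with probability
    proportional to that page's current number of links. Sampling from the
    `endpoints` list (every link contributes both of its endpoints) makes a
    uniform choice equivalent to a degree-weighted one.
    """
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}   # start with two pages joined by one link
    endpoints = [0, 1]
    for new in range(2, n_pages):
        target = rng.choice(endpoints)  # probability proportional to degree
        degree[new] = 1
        degree[target] += 1
        endpoints.extend([new, target])
    return degree

deg = grow_scale_free(2000)
hub = max(deg.values())
median = sorted(deg.values())[len(deg) // 2]
print(hub, median)  # the best-connected "super node" dwarfs the median page
```

Running this, the median page keeps just one or two links while the top hub collects dozens: the same few-hubs, many-leaves shape the passage describes. Deleting that hub would disconnect a large share of the pages at a stroke, which is why such networks are fragile to targeted attacks on super nodes.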
The picture of the web that emerges from this research is quite different from earlier reports. The notion that most pairs of web pages are separated by a handful of links, almost always under 20, and that the number of connections would grow exponentially with the size of the web, is not supported. In fact, there is a 75% chance that there is no path at all from one randomly chosen page to another. With this knowledge, it becomes clear why even the most advanced web search engines index only a small fraction of all web pages: search engines cannot find most web sites because their pages are not well connected or linked to the central core of the web.

Another important finding is the identification of a "deep web" composed of over 900 billion web pages that are not easily accessible to the web crawlers that most search engine companies use. Instead, these pages are either proprietary (not available to crawlers and non-subscribers) or are not easily reached from home pages. In the last few years, newer search engines (such as the medical search engine MammaHealth) and older ones such as Yahoo have been revised to search the deep web. Because revenues depend in part on customers being able to find a web site using search engines, web site managers need to take steps to ensure that their pages are part of the connected central core, or "super nodes," of the web. One way to do this is to build as many links as possible to and from other relevant sites, especially sites within the SCC.

