This web site presents a directory of topics, where each topic is associated with a set of words and phrases relevant to that topic. There are over 9,000 topics, as diverse as dating, data mining, mining, strength training, Scotch Whisky, Buffy the Vampire Slayer, cat breeds, cat toys, toxoplasmosis, concussion, astronomy, asteroseismology, sexual positions, yoga, skin care, yoga breathing, linear algebra, gifted education, etch-a-sketch, roof rack, tire safety, injection molding, game theory, tango, environmental law, software testing resume, customer service resume and Cisco Systems... and many, many more.

The data shown in the topic directory below was produced by automated data mining using proprietary algorithms. How does the data mining happen? Each topic is associated with one or more queries. Those queries are used to assemble a training set of documents for the topic. Data mining is then applied to that training set. The output of this mining is a set of words and phrases for each topic, and each such set is called a lexicon. What use are lexicons? For one thing, the lexicons produced here were used for text classification. Topic Scout's classifier achieved nearly 90% accuracy when classifying hundreds of thousands of web pages.
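
To make the pipeline above concrete, here is a minimal sketch in Python. It is not Topic Scout's method, since the real algorithms are proprietary and not shown on this site. It assumes a lexicon can be approximated by ranking terms that are more common in a topic's training set than in a background collection, and that classification can be approximated by counting lexicon overlap. The function names, the scoring formula and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter

def terms(text):
    """Lowercased words plus underscore-joined word pairs (e.g. "cat_toys")."""
    words = re.findall(r"[a-z]+", text.lower())
    pairs = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + pairs

def doc_frequency(docs):
    """Count, for each term, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(terms(doc)))
    return counts

def build_lexicon(topic_docs, background_docs, size=10):
    """Rank terms by a smoothed log ratio of topic rate to background rate.

    Assumption: this scoring is an illustrative stand-in, not the real algorithm.
    """
    topic_df = doc_frequency(topic_docs)
    back_df = doc_frequency(background_docs)
    scores = {}
    for term, df in topic_df.items():
        topic_rate = (df + 1) / (len(topic_docs) + 2)
        back_rate = (back_df[term] + 1) / (len(background_docs) + 2)
        scores[term] = df * math.log(topic_rate / back_rate)
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:size]

def classify(text, lexicons):
    """Pick the topic whose lexicon shares the most terms with the text."""
    found = set(terms(text))
    return max(lexicons, key=lambda topic: len(found & set(lexicons[topic])))

# Hypothetical training sets, standing in for documents gathered by topic queries.
cat_docs = ["cat toys for playful cat breeds", "popular cat breeds and cat toys"]
yoga_docs = ["yoga breathing for beginners", "yoga breathing and yoga poses"]

lexicons = {
    "cats": build_lexicon(cat_docs, yoga_docs),
    "yoga": build_lexicon(yoga_docs, cat_docs),
}
print(lexicons["cats"][:5])
print(classify("new cat toys on sale", lexicons))
```

In this toy setup each topic's training set doubles as the other topic's background collection. With thousands of topics, a shared background sample would be the more natural choice.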

Please note that the lexicons shown in the topic directory are not the tuned lexicons used for text classification. The lexicons shown here are in a raw, untuned form, and some of their items, such as "click_link" and "add_cart", are just noise. Such noise items are typically removed during the tuning phase.
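
One plausible way to catch this kind of noise, sketched below, is to drop any item that turns up in the lexicons of many unrelated topics, since page furniture like "click_link" is not specific to any one subject. This is an assumption for illustration and not a description of the actual tuning phase. The `tune_lexicons` name, the `max_topic_share` threshold and the sample lexicons are hypothetical.

```python
from collections import Counter

def tune_lexicons(raw_lexicons, max_topic_share=0.5):
    """Drop items that occur in more than max_topic_share of all topic lexicons."""
    topic_count = Counter()
    for items in raw_lexicons.values():
        topic_count.update(set(items))
    limit = max_topic_share * len(raw_lexicons)
    return {
        topic: [item for item in items if topic_count[item] <= limit]
        for topic, items in raw_lexicons.items()
    }

# Hypothetical raw lexicons; only "click_link" and "add_cart" come from the text above.
raw = {
    "cat toys": ["cat_toys", "catnip", "click_link", "add_cart"],
    "roof rack": ["roof_rack", "crossbar", "add_cart", "click_link"],
    "yoga": ["yoga_breathing", "asana", "click_link"],
}
tuned = tune_lexicons(raw)
print(tuned)
```

A threshold this blunt would also remove items that are legitimately shared by closely related topics, so a real tuning step would likely need more care.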

Topic Scout is currently working on creating a knowledge graph by crawling an invisible web: a web of highly connected symbols, where symbols can be linked to each other in various ways, including by semantic relevance. This is done by what I call a symbol crawler. Unlike a web crawler, it crawls symbols, not URLs. If this sounds like blue sky, take a closer look. The symbol crawler architecture, including its micro-services, is described here.
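
To give a feel for what crawling symbols instead of URLs could look like, here is a minimal sketch. It assumes the crawl is a breadth-first walk from a seed symbol, with a lookup function standing in for whatever micro-service resolves a symbol's links. The `crawl_symbols` and `lookup` names, the "relevant_to" relation and the link data are all hypothetical, and the real architecture may differ.

```python
from collections import deque

def crawl_symbols(seed, links_for, max_depth=2):
    """Breadth-first crawl from a seed symbol.

    Returns the knowledge graph as a list of (source, relation, target) edges.
    """
    seen = {seed}
    frontier = deque([(seed, 0)])
    edges = []
    while frontier:
        symbol, depth = frontier.popleft()
        if depth == max_depth:
            continue  # symbols at the depth limit are recorded but not expanded
        for relation, target in links_for(symbol):
            edges.append((symbol, relation, target))
            if target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    return edges

# Hypothetical link data; a real crawler would ask a service for these.
LINKS = {
    "yoga": [("relevant_to", "yoga breathing"), ("relevant_to", "strength training")],
    "yoga breathing": [("relevant_to", "yoga")],
    "strength training": [("relevant_to", "concussion")],
}

def lookup(symbol):
    """Return the (relation, target) links known for a symbol."""
    return LINKS.get(symbol, [])

graph = crawl_symbols("yoga", lookup, max_depth=1)
for edge in graph:
    print(edge)
```

The `seen` set plays the same role as the visited-URL set in a web crawler: it keeps cycles such as yoga and yoga breathing from being expanded twice.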

Or take a closer look at Topic Scout's relevance engine.

