''Classification in Wiki via automatic schemes''

ThinkingOutLoud.DonaldNoyes.200908252059
----
In a wiki system, concepts and subjects are grouped into a document, which is then given a WikiWord name. That name becomes a DocumentRepresentative, which is of great use in the Mapping of Concepts. This representation is the first level of grouping wikis use to organize concepts, subjects, concerns, or problems.

Because WikiWords are rarely used in general discourse to identify the documents they stand for, the traditional rigorous techniques for classification will usually ignore them as insignificant. In a distribution curve of words that includes WikiWords, they fall on the lower portion of a hyperbolic curve. Some of the automatic classification schemes simply exclude this area.

The referenced Figure demonstrates a hyperbolic curve and how high and low frequency words are filtered out:
* http://www.dei.unipd.it/~melo/bible/documents/Fig.2.1.GIF
** WikiWords are found in the low-frequency tail of the curve (the less frequently used words)
----
WikiStems, Words, Phrases (Spacified WikiWords)

The relevance of including such things as word stems and phrases in a classification scheme has been recognized:
* ''"There is no reason why such an analysis should be restricted to just words."'' ''"It could equally well be applied to stems of words (or phrases) and in fact this has often been done."''
** http://www.dei.unipd.it/~melo/bible/documents/16.html
----
There is a sense in which WikiWords can enhance the efficiency of searches made with the popular search engines: a WikiWord used as a search term will return tens to hundreds of results, whereas a single ordinary word will return hundreds of thousands to several million. Using WikiWords can be like using a search engine as a powerful magnet to locate the "needle in the haystack".
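The frequency cut-offs and the spacified WikiWords discussed above can be illustrated with a minimal sketch. It is not the scheme from the cited text: the function names (spacify, tokenize, significant_terms), the sample sentence, and the cut-off values are all assumptions chosen for illustration. It shows a rarely used WikiWord being dropped by the lower cut-off, and its component words contributing to the counts once the WikiWord is spacified.

```python
import re
from collections import Counter

# Assumed WikiWord shape: two or more capitalized runs, e.g. "WikiWay".
WIKIWORD = re.compile(r"(?:[A-Z][a-z]+){2,}")

def spacify(wiki_word):
    """Insert a space at each lower-to-upper case boundary."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", wiki_word)

def tokenize(text, spacify_wikiwords=False):
    """Split text into word tokens; optionally expand WikiWords into their parts."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text):
        if WIKIWORD.fullmatch(word):
            if spacify_wikiwords:
                tokens.extend(spacify(word).lower().split())
            else:
                tokens.append(word)  # keep the WikiWord as a single term
        else:
            tokens.append(word.lower())
    return tokens

def significant_terms(tokens, lower_cutoff, upper_cutoff):
    """Keep only terms whose frequency lies between the two cut-offs (inclusive)."""
    counts = Counter(tokens)
    return {t: n for t, n in counts.items() if lower_cutoff <= n <= upper_cutoff}

# Hypothetical sample text and cut-offs, for illustration only.
text = "the wiki way is the way of the wiki and the WikiWay page"

plain = significant_terms(tokenize(text), 2, 3)
spaced = significant_terms(tokenize(text, spacify_wikiwords=True), 2, 3)

print(plain)   # "WikiWay" occurs once and falls below the lower cut-off
print(spaced)  # its parts now add to the counts of "wiki" and "way"
```

In this sketch the very frequent word "the" is removed by the upper cut-off and the WikiWord by the lower one, which is the filtering the Figure describes. Whether spacifying is desirable depends on whether the WikiWord should count as one term or as a phrase.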
----
Other ways the WikiWay can be used to enhance AutomaticClassification:

OneLineSummarizations
* ''"on the average the simplest indexing procedures which identify a given document or query by a set of terms, weighted or unweighted, obtained from document or query text are also the most effective". Its recommendations are clear, automatic text analysis should use weighted terms derived from document excerpts whose length is at least that of a document abstract''
** http://comminfo.rutgers.edu/~muresan/IR/Docs/Books/VanRijsbergen_IR/data/pages/23.htm

Clusterings
* ''The representation of the single link hierarchy through an MST has proved very useful in connecting single link with other clustering techniques ... Implication of classification methods It is fairly difficult to talk about the implementation of an automatic classification method without at the same time referring to the file''
** http://www.dei.unipd.it/~melo/bible/documents/57.html

Categorizations
* ''"The basic relationship underlying the automatic construction of keyword classes is as follows If keyword a and b are substitutible for one another in the sense that we are prepared to accept a document containing one in response to a request containing the other this will be because they have the same meaning or refer to a common subject or topic"''
** http://www.dei.unipd.it/~melo/bible/documents/31.html
----
It might be a great idea, but I suggest that it ''not'' create wiki-words automatically. It's best for them to first be vetted by people.
----
CategoryAutomated