Neurotags
I. Introduction
In this article I want to consider one possible way of implementing what the W3C* (World Wide Web Consortium) set out to develop as the Semantic Web**.
The ideas presented here are not the Semantic Web concept itself, and I do not tie myself to the standards developed by the W3C. In my view, the most viable model of the future information network looks somewhat different and has other needs. Still, this concept has much in common with the Semantic Web.
Much has been said about the theory of the Semantic Web. A lot of time has passed, yet we still do not see these ideas implemented on the Internet.
I will focus on the conceptual model of a framework that embodies everything the proponents of the Semantic Web dreamed of. I will also try to consider the practical aspects of developing such a system and work through the problems that arise. First of all, these are the problems of adapting such a complex system to users. They also include building a conceptual model that gives users a reason to interact with the system and lets them get everything they need from it simply and conveniently.
In this respect, interaction with users is especially important, since the system is self-learning by nature and must gather information from its communication with users. Therefore, this task must be approached not only as a programmer, but also as a sociologist, a psychologist and, most importantly, an inventor.
This framework can be considered the foundation of a network for which the Internet is merely a means of data transfer. The framework is a set of protocols, and programs working with them, abstracted one level above protocols such as HTTP, SMTP, SNMP, FTP and Telnet.
It is a model for managing and self-organizing a distributed knowledge base on a global scale. At the current stage of the Internet's development, it can perform the functions of an ordinary CMS, only with significantly expanded functionality, and it can successfully serve commercial purposes.
Practical aspects and specific solutions lie ahead; the first approach will be a theoretical one.
* World Wide Web Consortium (W3C) - an organization that develops and implements technology standards for the World Wide Web
** Semantic Web: a network concept in which every resource written in human language would be provided with a description that a computer can understand.
II. Neurotags
There are now many websites united by some common principles; collectively, they can be called “Web 2.0”.
One of the trends of this phenomenon is folksonomy *.
That is good. But what if we look further?
1. Theory of neurotags
What do regular tags (keywords) lack?
Relationships and classification.
Let's introduce a new concept: a regular keyword together with the system of its relationships to other keywords will be called a neurotag.
Similar structures are often used in expert systems** to build a knowledge base. In the expert-systems literature, this concept corresponds to semantic tags.
Any neurotag can have any number of connections of any type with other neurotags.
Relations can also be classified, for example as follows. The linked object may be a:
• Synonym
• Antonym
• Parent category (hyperonym), e.g. “food” for “pie”
• Child category (a descendant of the tag, hyponym), e.g. “stool” for “furniture”
• Related category (the tags share many parents), e.g. “guppy” for “goldfish” (common hyperonym: “aquarium fish”)
• Part of the given object (meronym), e.g. “engine” for “car”
• Object that includes the given one (holonym), e.g. “house” for “roof”
• Translation into another language (a subtype of synonym), e.g. “небо” for “sky”
This is similar to an RDF*** triple, which has the form “subject - predicate - object”.
(For example, the statement “green eyes” can be represented in RDF terms as subject “eyes”, predicate “has color”, object “green”.)
The difference is that here the set of predicates is limited.
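To make the comparison with RDF concrete, here is a minimal Python sketch of a neurotag link stored as a subject-predicate-object triple. Unlike in RDF, the predicate must come from a fixed set. The enum values mirror the relation list above. The names RelationType, Relation and make_relation are hypothetical illustrations and not part of any existing library.

```python
from enum import Enum
from typing import NamedTuple


class RelationType(Enum):
    """Closed set of predicates; a triple reads "subject is the <type> of obj"."""
    SYNONYM = "synonym"
    ANTONYM = "antonym"
    HYPERONYM = "hyperonym"      # subject is the parent category of obj
    HYPONYM = "hyponym"          # subject is a child category of obj
    RELATED = "related"          # subject and obj share many parents
    MERONYM = "meronym"          # subject is a part of obj
    HOLONYM = "holonym"          # subject includes obj
    TRANSLATION = "translation"  # subject is obj in another language


class Relation(NamedTuple):
    subject: str
    predicate: RelationType
    obj: str


def make_relation(subject: str, predicate: str, obj: str) -> Relation:
    """Build a triple; RelationType(...) raises ValueError for unknown predicates."""
    return Relation(subject, RelationType(predicate), obj)


r = make_relation("food", "hyperonym", "pie")
print(r.predicate.name)  # HYPERONYM
```

A free-form RDF-style predicate such as “has color” is rejected here with a ValueError. That rejection is exactly the “limited set of predicates” trade-off.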
On the one hand, a connection type such as “related category” may seem redundant, because we can always determine whether two neurotags are related from their lists of parents. On the other hand, having this type of link can significantly improve the process of compacting the neurotag network, which will be discussed below.
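Here is a sketch of deriving “related category” from parent lists. It assumes each neurotag's parents are stored as a set in a plain dict. The parents mapping and the min_shared threshold are illustrative assumptions. Storing the link explicitly, as the text argues, avoids recomputing this intersection over all tags on every query.

```python
def related_tags(tag: str, parents: dict[str, set[str]], min_shared: int = 1) -> set[str]:
    """Return the tags that share at least min_shared parents with tag."""
    own = parents.get(tag, set())
    return {
        other
        for other, theirs in parents.items()
        if other != tag and len(own & theirs) >= min_shared
    }


parents = {
    "guppy": {"aquarium fish", "fish"},
    "goldfish": {"aquarium fish", "fish"},
    "salmon": {"fish"},
    "stool": {"furniture"},
}
print(sorted(related_tags("guppy", parents, min_shared=2)))  # ['goldfish']
print(sorted(related_tags("guppy", parents)))  # ['goldfish', 'salmon']
```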
Taken together, the connection types listed above combine genus-species relations with synonym/antonym and part/whole relations, and form a hierarchical structure.
In addition, each connection has a parameter called power, which shows how relevant the connection is from the point of view of human logic.
This way of organizing information is called a semantic network**** with weighted (fuzzy) connections. Such networks are often used as knowledge bases in expert systems.
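Here is a minimal sketch of such a weighted semantic network. Each (tag, relation type, tag) link carries an integer power. The most plausible interpretation of a pair of tags is the relation type with the highest power. The class and method names are hypothetical, and relation types are plain strings for brevity.

```python
from collections import defaultdict
from typing import Optional


class WeightedNetwork:
    """Semantic network in which every typed link carries a power (weight)."""

    def __init__(self) -> None:
        # (subject, relation type, object) -> power
        self.power: dict[tuple[str, str, str], int] = defaultdict(int)

    def strengthen(self, subject: str, relation: str, obj: str, amount: int = 1) -> None:
        """Increase the power of one typed link (creating it at 0 if new)."""
        self.power[(subject, relation, obj)] += amount

    def strongest(self, subject: str, obj: str) -> Optional[str]:
        """Relation type with the highest power between two tags, or None."""
        candidates = {
            rel: p for (s, rel, o), p in self.power.items() if s == subject and o == obj
        }
        if not candidates:
            return None
        return max(candidates, key=lambda rel: candidates[rel])


net = WeightedNetwork()
net.strengthen("goat milk", "hyponym", "milk", 5)
net.strengthen("goat milk", "synonym", "milk")
print(net.strongest("goat milk", "milk"))  # hyponym
```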
Besides the types above, there can be many other relationships: functional (usually expressed by verbs such as “produces”, “influences”...), quantitative (more, less, equal...), spatial (far from, close to, behind, under, over...), temporal (earlier, later, during...), attributive (has a property, has a value), etc.
There is another, more interesting option: not classifying the relationships at all. In theory, classifying connections as described above is redundant. In practice, this step is most likely justified by the computing-power problems that arise when developing such systems. Even in a system with unclassified connections, you can determine how exactly two neurotags are related.
For example, if only parent/child neurotags are linked, then of two interconnected neurotags in a large system, the parent will always have more connections.
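When links carry no type, the degree heuristic above can be applied directly. This sketch assumes links are stored as unordered pairs of tag names; the function name guess_parent is illustrative. Ties are left undecided rather than guessed.

```python
from collections import Counter
from typing import Optional


def guess_parent(a: str, b: str, links: list[tuple[str, str]]) -> Optional[str]:
    """Guess which of two linked tags is the parent: the one with more connections."""
    degree: Counter = Counter()
    for x, y in links:
        degree[x] += 1
        degree[y] += 1
    if degree[a] == degree[b]:
        return None  # equal degree: not enough evidence either way
    return a if degree[a] > degree[b] else b


links = [
    ("furniture", "stool"),
    ("furniture", "table"),
    ("furniture", "chair"),
    ("chair", "armchair"),
]
print(guess_parent("stool", "furniture", links))  # furniture
```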
You can also treat each individual connection as a vertex (neurotag) and link it to other vertices.
For example, if two tags “sky” and “blue” are related, then the relationship itself is nothing more than “blue sky”.
In this way, we can organize genus-species relations even without typed predicates.
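Turning a connection into a vertex is known in graph modelling as reifying an edge. This sketch assumes untyped links stored as pairs. The new vertex is named by joining the two tags, modifier first, which reproduces the “blue sky” example; this naming rule is an arbitrary illustrative choice. The function name reify is hypothetical.

```python
def reify(edges: set[tuple[str, str]], modifier: str, head: str) -> str:
    """Replace the link modifier-head with a new vertex linked to both ends."""
    node = f"{modifier} {head}"
    edges.discard((modifier, head))
    edges.add((node, head))      # "blue sky" is a kind of "sky"
    edges.add((node, modifier))  # "blue sky" carries the property "blue"
    return node


edges = {("blue", "sky")}
node = reify(edges, "blue", "sky")
print(node)           # blue sky
print(sorted(edges))  # [('blue sky', 'blue'), ('blue sky', 'sky')]
```

The link to “sky” is untyped, yet it already encodes the species-of-genus reading: “blue sky” is a kind of “sky”.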
In theory, many cyclic connections can form in such a network. Recursive traversal of its vertices must therefore limit the recursion depth to avoid infinite loops and stack overflows.
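Here is a sketch of a cycle-safe traversal that swaps recursion for an iterative breadth-first search. An explicit queue replaces the call stack, so deep graphs cannot overflow it. A visited set stops ring connections from looping forever, and max_depth bounds how far the search spreads. The adjacency-dict format is an assumption.

```python
from collections import deque


def reachable(graph: dict[str, list[str]], start: str, max_depth: int) -> set[str]:
    """Tags reachable from start within max_depth hops; cycles cannot loop forever."""
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not expand further
        for neighbour in graph.get(node, []):
            if neighbour not in visited:  # ring connections stop here
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
    return visited - {start}


graph = {
    "sky": ["blue", "night"],
    "blue": ["sky", "sea"],
    "sea": ["blue", "water"],
    "night": ["stars"],
    "stars": ["sky"],
}
print(sorted(reachable(graph, "sky", 2)))  # ['blue', 'night', 'sea', 'stars']
```

Breadth-first order matters here. Each tag is first reached by its shortest path, so the depth limit never hides a tag that is actually within range.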
Now I would like to consider the case of classified inter-tag links, because, IMHO, it is more intuitive for human thinking.
We will explore the benefits and problems of a network of interconnected keywords as we reflect on the practical aspects of developing such a system.
* Folksonomy (from “folk” + “taxonomy”): a neologism denoting the practice of collaborative categorization using freely chosen keywords. In other words, the term refers to the spontaneous cooperation of a group of people to organize information into categories. It attracts attention because it differs completely from traditional formal methods of faceted classification. Typically, this phenomenon occurs only in non-hierarchical communities, such as public websites, and not in multi-level teams. The people who organize the information are usually also its main users. As a result, folksonomy reflects the shared conceptual information model of the whole group more accurately.
** Expert system: an intelligent program that can replace a human expert in resolving a problem situation, draw logical conclusions based on knowledge of a specific subject area, and provide solutions to specific problems.
Expert systems began to be developed by artificial intelligence researchers in the 1970s and received commercial backing in the 1980s.
*** The Resource Description Framework (RDF) is a model developed by the W3C for describing resources, especially resource metadata. It is based on making statements of a special form about a resource. One of the main goals of RDF is to express assertions in a way that both machines and humans can read. There are several ways to represent RDF information, the most common being RDF/XML, triples, and a graph model.
**** A semantic network is one way of representing knowledge. The name combines terms from two sciences: in linguistics, semantics studies the meaning of sentences, and in mathematics, a network is a kind of graph. In a semantic network, the vertices are the concepts of the knowledge base, and directed arcs define the relations between them. Thus, a semantic network reflects the semantics of a subject area in the form of concepts and relations.
III. Closer to practice
So, relationships. Where do we get them from?
With ordinary tags (keywords) it is clear: the user simply types them into a field.
But how does the system learn about the connections between them?
First, we have to ask users.
IMHO, the most convenient approach is an unobtrusive, strictly rationed survey shown to the user among the content blocks that interest them.
In practice, while browsing the pages of an information resource that uses neurotags, the user will come across a small poll in the list of articles. The poll uses AJAX, so the page does not reload and the user is not pulled away from their content. It also occupies minimal space. Something like:
Q: How are “milk” and “goat milk” related?
1. “milk” is a child category of “goat milk”
2. “goat milk” is a child category of “milk”
3. synonyms
4. the same thing in different languages
5. no connection
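Here is a sketch of generating such a poll for any pair of neurotags. The option wording follows the example above. Each option also carries the (subject, relation type, object) link it would strengthen, with None for “no connection”. The function name and the returned dictionary layout are hypothetical; a front end would render the result via AJAX.

```python
from typing import Optional

Link = Optional[tuple[str, str, str]]  # (subject, relation type, object), or None


def build_poll(a: str, b: str) -> dict:
    """Build a system-survey question for two neurotags, mirroring the example poll."""
    options: list[tuple[str, Link]] = [
        (f"“{a}” is a child category of “{b}”", (a, "hyponym", b)),
        (f"“{b}” is a child category of “{a}”", (b, "hyponym", a)),
        ("synonyms", (a, "synonym", b)),
        ("the same thing in different languages", (a, "translation", b)),
        ("no connection", None),
    ]
    return {"question": f"How are “{a}” and “{b}” related?", "options": options}


poll = build_poll("milk", "goat milk")
print(poll["question"])
for number, (label, _) in enumerate(poll["options"], start=1):
    print(f"{number}. {label}")
```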
But simply inserting such polls into the site’s content would be ineffective.
Such an approach would merely keep the user from the content that interests them.
Therefore, an incentive is needed.
The most convenient incentive here could be to make participation in surveys affect the user’s rating in the system. User ratings have recently become a very fashionable and popular feature of web resources.
To do this, it is advisable to have a separate characteristic, a special rating. It affects the overall rating only within strict limits, so that a user’s rating cannot grow solely through system polls. (Hereinafter, the built-in survey on the relationships between neurotags will be called a system survey.)
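Here is one possible reading of “only within strict limits”: the special rating earned from system surveys adds to the overall rating, but only up to a fixed cap. The cap value, the additive formula and the names below are assumptions for illustration; the article does not specify them.

```python
SURVEY_CAP = 50  # hypothetical upper bound on what system surveys can contribute


def overall_rating(base_rating: int, survey_rating: int, cap: int = SURVEY_CAP) -> int:
    """Overall rating = regular rating + survey rating, with the survey part capped."""
    return base_rating + min(max(survey_rating, 0), cap)


print(overall_rating(120, 30))   # 150
print(overall_rating(120, 500))  # 170: surveys alone cannot inflate the rating
```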
Each time a user answers a system survey, the power of the relationship they chose increases by 1. If they chose “no connection”, nothing happens. (One could also store an explicit “no connection” link between the neurotags, but that is clearly redundant data.)
The user’s rating can also be taken into account when increasing the power of the relationship.
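Here is a sketch of the voting rule. The chosen relation gains 1 power, and a “no connection” answer stores nothing. Optionally, the increment grows with the voter's rating. The logarithmic bonus is an assumed choice that keeps high-rated users from dominating; the article only says that rating can be taken into account. The function name is hypothetical.

```python
import math
from typing import Optional

Link = tuple[str, str, str]  # (subject, relation type, object)


def record_vote(power: dict, choice: Optional[Link], voter_rating: Optional[int] = None) -> None:
    """Apply one system-survey answer; choice is None for "no connection"."""
    if choice is None:
        return  # "no connection" is not stored
    increment = 1.0
    if voter_rating is not None:
        # Hypothetical bonus that grows slowly with the voter's rating
        increment += math.log1p(max(voter_rating, 0)) / 10
    power[choice] = power.get(choice, 0.0) + increment


power: dict = {}
record_vote(power, ("goat milk", "hyponym", "milk"))
record_vote(power, None)
print(power)  # {('goat milk', 'hyponym', 'milk'): 1.0}
```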
The neurotags for a survey are selected from the neurotag list of a single unit of content.
For example, suppose a user uploaded a photo and tagged it with the keywords “starry sky”, “sky”, “stars”, “night”. It is then logical to assume that these tags have some kind of relationship.
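Choosing survey candidates from one content unit amounts to taking every unordered pair of its tags. Here is a sketch using itertools.combinations from the Python standard library; the function name is illustrative. A unit with n distinct tags yields n(n-1)/2 candidate pairs. How many of them are actually shown to users is left open.

```python
from itertools import combinations


def survey_candidates(tags: list[str]) -> list[tuple[str, str]]:
    """All unordered pairs of distinct tags attached to one piece of content."""
    unique = sorted(set(tags))
    return list(combinations(unique, 2))


photo_tags = ["starry sky", "sky", "stars", "night"]
pairs = survey_candidates(photo_tags)
print(len(pairs))  # 6
```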
But the most interesting feature of the neurotag network is self-organization.
For example, given the following relationships:
“laws of Ukraine” is a child category of the neurotag “law”
“law” is a child category of the neurotag “politics”
“privatization law” is a child category of the neurotag “laws of Ukraine”
it is logical to assume that:
“laws of Ukraine” is a child category of the neurotag “politics”
“privatization law” is a child category of the neurotag “politics”
“privatization law” is a child category of the neurotag “law”
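The inference above is the transitivity of the child-category relation. Here is a sketch that computes every implied (child, ancestor) pair with a simple fixed-point loop. It assumes links are given as (child, parent) tuples. Weights are ignored for brevity, and only pairs not already stated are returned.

```python
def infer_ancestors(child_of: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return (child, ancestor) pairs implied by transitivity but not stated directly."""
    closure = set(child_of)
    changed = True
    while changed:  # repeat until no new pair appears
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a -> b and b -> d imply a -> d (skip self-links caused by cycles)
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure - child_of


child_of = {
    ("laws of Ukraine", "law"),
    ("law", "politics"),
    ("privatization law", "laws of Ukraine"),
}
print(sorted(infer_ancestors(child_of)))
```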
The system is able to derive new knowledge from old and find patterns in the knowledge base. By analogy, it can detect contradictions and inconsistencies in it. It can also monitor the correct organization of the knowledge base (introspection) and justify its conclusions, that is, “explain” its reasoning.
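Here is a sketch of one concrete kind of contradiction check. It flags two tags that are each recorded as the other's child category, and a pair marked both as synonyms and as antonyms. Relations are assumed to be (subject, relation type, object) triples. The type names are illustrative, and a real system would define many more exclusion rules.

```python
def find_contradictions(relations: set[tuple[str, str, str]]) -> list[str]:
    """Report mutually exclusive statements about the same pair of tags."""
    problems = []
    for s, rel, o in sorted(relations):
        # Each tag is a child of the other: report the pair once (s < o)
        if rel == "hyponym" and (o, "hyponym", s) in relations and s < o:
            problems.append(f"“{s}” and “{o}” are each other's child category")
        # Synonyms and antonyms at the same time
        if rel == "synonym" and ((s, "antonym", o) in relations or (o, "antonym", s) in relations):
            problems.append(f"“{s}” and “{o}” are both synonyms and antonyms")
    return problems


relations = {
    ("goat milk", "hyponym", "milk"),
    ("milk", "hyponym", "goat milk"),
    ("hot", "synonym", "cold"),
    ("cold", "antonym", "hot"),
    ("pie", "hyponym", "food"),
}
for problem in find_contradictions(relations):
    print(problem)
```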
Whenever contentious cases arise, and they always will because we use weighted (fuzzy) connections, the system will generate unobtrusive polls for registered users.
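Here is a sketch of how weighted links could be used to detect a “contentious” case. A pair is queued for a system survey when its two strongest relation types are close in power. The 0.8 ratio threshold, the data layout and the function names are assumptions made for illustration only.

```python
def is_contested(powers: dict[str, float], ratio: float = 0.8) -> bool:
    """True if the runner-up relation is at least ratio times as strong as the leader."""
    ranked = sorted(powers.values(), reverse=True)
    if len(ranked) < 2 or ranked[0] == 0:
        return False
    return ranked[1] / ranked[0] >= ratio


def survey_queue(links: dict[tuple[str, str], dict[str, float]], ratio: float = 0.8) -> list[tuple[str, str]]:
    """Tag pairs whose interpretation is disputed and should be put to users."""
    return [pair for pair, powers in links.items() if is_contested(powers, ratio)]


links = {
    ("goat milk", "milk"): {"hyponym": 12, "synonym": 1},
    ("sofa", "couch"): {"synonym": 7, "hyponym": 6},
}
print(survey_queue(links))  # [('sofa', 'couch')]
```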
With a sufficiently large neurotag database, more and more inter-tag relationships can be established without user intervention. Their number grows rapidly as the web resource grows, so large web resources will gain the most tangible benefits from such a system.
The mechanism of system polls and ratings is a very good way to handle knowledge base maintenance. Maintenance is the bottleneck of expert systems and has long called for further development... As you can see, though, that development came from a somewhat different area.
The first and most basic reason knowledge bases in expert systems are hard to maintain is the rapid “aging” of knowledge. Knowledge often becomes obsolete faster than the expert system itself can be built. The second reason is the shortage of specialists in knowledge engineering.
In a new-generation global social network, every registered user is a knowledge-engineering specialist, and one who, in fact, needs no engineering knowledge at all.
The problem of rapidly aging knowledge is also less pressing here, because, unlike expert systems, we work with super-global data.
This is a big benefit for search engines, because it can significantly expand the search scope and make search much more interactive.
It will also turn the chaotic “tag cloud”, so fashionable in Web 2.0, into an ordered tree structure.
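One way to turn a flat tag cloud into a tree is for each tag to keep only its strongest parent link. The result is a forest whose roots are tags without a parent. This sketch assumes child-to-parent links come with their power; build_tree and render are hypothetical names. render skips any node already on the current path, so a cyclic input cannot recurse forever.

```python
from collections import defaultdict


def build_tree(parent_links: dict[tuple[str, str], int]) -> dict[str, list[str]]:
    """Keep each tag's strongest parent link; return parent -> sorted children."""
    best: dict[str, tuple[int, str]] = {}
    for (child, parent), power in parent_links.items():
        if child not in best or power > best[child][0]:
            best[child] = (power, parent)
    children: dict[str, list[str]] = defaultdict(list)
    for child, (_, parent) in best.items():
        children[parent].append(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


def render(tree: dict[str, list[str]], node: str, depth: int = 0, path: frozenset = frozenset()) -> list[str]:
    """Indented outline of the subtree under node; skips nodes already on the current path."""
    if node in path:
        return []  # cycle guard
    lines = ["  " * depth + node]
    for child in tree.get(node, []):
        lines.extend(render(tree, child, depth + 1, path | {node}))
    return lines


links = {
    ("law", "politics"): 9,
    ("laws of Ukraine", "law"): 7,
    ("privatization law", "laws of Ukraine"): 5,
    ("privatization law", "economy"): 2,
}
tree = build_tree(links)
print("\n".join(render(tree, "politics")))
```

The weaker link from “privatization law” to “economy” is dropped, so each tag appears exactly once in the outline.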
Skeptics may rightly object that such a system requires unjustifiably large computing power and that the neurotag database will quickly grow to huge sizes. They may add that it demands more attention from the user, so the advantages of a neurotag network may not justify the costs.
But in fact, we have not yet reached the real advantages, because they depend on the practical implementation of specific services. This is a good place to apply your creativity in developing fundamentally new approaches to the user interface, since here we work with a fundamentally different organization of data.
Looking at pipes.yahoo.com, for example, you begin to realize that building truly convenient interfaces for such complex data structures is an entirely solvable task. Everything can be made more convenient than it might seem to the uninitiated at first glance.
The most important thing this approach to developing web resources can give us will be covered in another article. It is a somewhat different topic and a different technology, but it cannot exist without a network of neurotags and complements it. It enables a fundamentally new approach to many social services, including some that are quite new to today's Internet. People have hardly considered building them because of their apparent inefficiency... an inefficiency that neurotag networks will help us eliminate.
To be continued.
© 2007 Vitaly Stepanenko
I mean, I :)