A systematic description of cognitive processes

further circumstances, like the required precision, the available instruments, and the style of documentation. As a consequence, the predominance of such situations leads to the illusion of a ‘single-step measurement’: it is broadly assumed that only the name of one variable or observable must be stated and then everything is clear.

By way of contrast, there are situations in which the definition phase, that is, the identification of the context, can no longer be skipped and, furthermore, requires some labour and a methodical procedure. If, for instance, the similarity or dissimilarity between two structures from a given set is to be quantified, this requires the formulation of a suitable graph grammar that generates at least all the structures under consideration (Gernert, 1996). As soon as such a graph grammar has been specified, the requested similarity measure is determined. Now, the crucial point is that a graph grammar with the required properties always exists, but it is not uniquely defined. The necessity to select exactly one graph grammar from the multitude of all suitable ones stands for the compulsory specification of the context, such as the goal pursued by that individual measurement of similarity.

A characteristic feature of perspective notions is the necessity of a two-step procedure in which a definition phase comes first. A mathematical treatment is possible, but it differs from the customary style: a measure is no longer supplied by a single formula, nor by several formulas, but by a procedure in which, in addition, a mathematical structure must and can be set up that accounts for the peculiarity of the perspective notion. The above-mentioned illusion of a single-step measurement, the expectation that problem data can simply be inserted into a couple of formulas, turns out to be a widespread tacit assumption.⁴
It seems plausible that just this tacit assumption is one of the causes why the concept of pragmatic information is accepted in such a reluctant and hesitating manner.

4. A systematic description of cognitive processes

4.1. Overview

The formalism for the description of cognitive processes which is to be developed here may be of interest for cognitive science, too, but this is not the primary goal. Rather, the formalism is intended as an intermediate step; the central issue is the analogy with operator algebras in quantum theory (Section 5). We presuppose a cognitive system of any kind, that is, a system capable of performing processes which can be interpreted as learning, concluding, forgetting, etc.; this system may be a human individual, an animal, or a technical device. Formally, the system is characterized by

1. states, which can vary in time and can be represented by vectors from a finite-dimensional real vector space $\mathbb{R}^n$ with a fixed positive integer $n$, and
2. transitions, which lead from one state to another and which are represented by operators.

In this sense, an operator stands for an elementary state change within the underlying system, e.g. for a single act of learning. Five classes of operators are proposed as follows:

1. Learning: $L = \{L_1, L_2, \ldots\}$
2. Conclusion: $C = \{C_1, C_2, \ldots\}$
3. Valuation: $V = \{V_1, V_2, \ldots\}$
4. Revision: $R = \{R_1, R_2, \ldots\}$
5. Tentative: $T = \{T_1, T_2, \ldots\}$

Some well-known types of cognitive processes, like forgetting, concept formation, or problem-solving, do not appear in this list, but it will be shown later (Section 4.3) that some characteristic cognitive processes can be represented by suitable combinations of these elementary operators. At least a great majority of all cognitive processes can presumably be represented in this way.

⁴ Descartes planned to present a universal method for the solution of problems, which can be roughly outlined as follows: “First, reduce any kind of problem to a mathematical problem. Second, reduce any kind of mathematical problem to a problem of algebra. Third, reduce any problem of algebra to the solution of a single equation.” (Polya, 1962). In quoting this, Polya immediately attaches his reservations concerning the validity and reach of this general rule.

4.2. The fundamental operators and their properties in detail

4.2.1. Operators in class L: learning

Any cognitive system has a system environment. The operators in $L$ (learning) describe processes by which a system accepts information from outside. The reception of some new information can lead to a change of the system behaviour or to a modification of its internal structure such that its repertoire for future behaviour is extended.

If the underlying system $S$ is fixed, we can write $L_i$ instead of $L_i S$; and $L_i L_k S$, which means that $S$ first undergoes the operation $L_k$ and then $L_i$, can be abridged as $L_i L_k$. In the general case this composition of operators is not commutative: $L_i L_k \neq L_k L_i$. As an illustration, two different learning tasks are contrasted. If ‘serious’ material is to be learned, that is, material with an internal structure, then the temporal order of its presentation can be relevant, whereas in extreme cases of rote-learning the temporal order of input operations may be irrelevant. This crucial feature of noncommutativity will be discussed later (Section 6.1).

4.2.2. Operations in class C: conclusion

Operations in class $C$, written as $C_i, C_k, \ldots$, denote conclusions derived from entries already existing within the system. We write $C_i C_k$ (short for $C_i C_k S$) for the fact that the conclusion $C_k$ is performed first, with the knowledge contributed by $C_k$ being stored in the system, and $C_i$ is performed afterwards.

From a purely logical viewpoint it would be irrelevant which of two possible conclusions is achieved first. Here, however, only ‘realistic’ systems are considered, in which the labour, time, or energy consumed play a role, and hence commutativity can no longer be maintained.
The new findings obtained by the conclusion operation $C_k$ can simplify the subsequent operation $C_i$ significantly, whereas no such reduction of labour may occur if both operators are applied in the inverse order. Therefore in the general case $C_i C_k \neq C_k C_i$. Which of the many possible operators in $C$ will really be activated may be triggered by a process of valuation as described in the next section.

4.2.3. Operations in class V: valuation

An important class of operations occurring within cognitive systems can be united under the term ‘valuation’. The class $V$ includes the operators $V_i, V_k, \ldots$ The object of a single process of valuation can be

• a single item of knowledge already stored in the system
• the present state of the system (when it is checked, e.g., whether a solution to a certain problem already has been found)
• a recent state change (caused e.g. by some new incoming information)
• a series of recent state changes (e.g. the result of a series of conclusion operations).

The result of an act of valuation can be

• a predicate, like ‘true’/‘false’, ‘relevant’/‘irrelevant’, etc.
• a mathematical object, like a number, a vector, a matrix, a function, a network, or a system of relations
• the identification of an item of information which fulfills given requirements.

There is a variety of reasons why valuation operators are necessary, and different purposes are pursued by them:

• In ‘realistic’ systems a distinction must be made between relevant and irrelevant parts of incoming information.
• In a similar way it must be decided whether the result of a series of conclusion operations is to be stored or not.
• Among the items of knowledge already present in the system those must be identified and selected which are likely to fit a given task.
• It must be decided whether a certain strategy makes sense or by which different one it should be replaced.
• It must be recognized whether an operation called ‘revision’ (Section 4.2.4) or ‘tentative’ (Section 4.2.5) becomes necessary, or at least useful, and which will be the proper side conditions for such an operation.

Valuation means that an object is confronted with a predefined standard. In simple cases a procedure takes certain features of the given object as its input and supplies, e.g., an index, a score, or a Boolean value like ‘acceptable’/‘not acceptable’. In the general case, however, the outcome of a valuation process is not necessarily a single value. Rather, it can take on the shape of a vector, a matrix, a function, etc., which represents the discrepancies between the ideal standard and the real situation with respect to several criteria (deviation profile). The result of a valuation process may also point forward to actions to be taken.

A necessary tool for many valuation processes is the measurement of the similarity or dissimilarity between two complex structures (see Section 3.2). For example, incoming information must be compared with the available information, in order to avoid redundant entries, but also with the information requirements of the system, in order to exclude irrelevant information. The search for some information that is likely to fit a certain task can be an issue of similarity measurement, too.

Just like the operators in $L$ and $C$, the operators in $V$ are noncommutative in the general case: $V_i V_k \neq V_k V_i$. If, e.g., $V_i$ has identified and selected a set of objects with required properties, then the subsequent operation $V_k$ can focus on just these objects; the overall effort or efficiency may depend on the order of both operators.

4.2.4. Operators in class R: revision

Every ‘realistic’, and hence finite, system has a limited ability to accept, to store, and to process information. Therefore such a system is forced to economize on these limited capacities.
If new knowledge is permanently accepted and accumulated, and if numerous results of conclusions are considered worth storing, then a revision of the underlying representation scheme will become compulsory from time to time in order to maintain an efficient usage of capacities.

The following examples show two different situations, but also two techniques for a transition to a new representation scheme:

1. If there is a simple finite graph with a relatively small number of edges, then it is reasonable to represent it by the list of its pairs of connected vertices. But if, step by step, new edges are inserted, then there will be a critical point beyond which it will be more advantageous to store the graph as its adjacency matrix.
2. A series of measurement data can be stored as a long list of pairs $(x_i, y_i)$, but also by a short string or code representing an approximating formula like $y = ax$ or $y = a \log x$.

In the first example the transitions from one representation to the other and back are reversible. By way of contrast, the second example stands for the frequent situation that a change of representation implies a loss of information: the original version can no longer be reconstructed from the ‘condensed’ form (in category theory the term ‘forgetful functors’ is used).

A theoretical framework is supplied by a concept named belief revision, knowledge revision, or theory change, which is now pursued in an interdisciplinary effort in philosophy, logic, and computer science (Rott, 1996). Computer scientists mainly study the problem of how to revise a body of knowledge if updating must be performed under capacity restrictions (Fuhrmann and Morreau, 1991; Wrobel, 1994) and the problem of consistency maintenance in knowledge-based systems. Logic and philosophy, however, focus on the necessary modifications of an ensemble of propositions provoked by new, frequently incompatible information.⁵
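The two examples of representation change given above can be made concrete in a small sketch. The function names are hypothetical and introduced only for illustration, and least squares through the origin is an assumed fitting rule for $y = ax$; the text does not prescribe one.

```python
# Illustrative sketch only: helper names are hypothetical, not from the paper.

def edges_to_matrix(n, edges):
    """Reversible change, forward direction: edge list -> adjacency matrix."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # simple undirected graph
    return matrix


def matrix_to_edges(matrix):
    """Reversible change, backward direction: adjacency matrix -> edge list."""
    n = len(matrix)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if matrix[u][v]]


def fit_slope(points):
    """Condense (x_i, y_i) pairs into the single parameter a of y = a*x.

    Least squares through the origin is an assumed fitting rule.
    """
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx


# First example: the round trip restores the original edge list.
edges = [(0, 1), (0, 3), (1, 2)]
restored = matrix_to_edges(edges_to_matrix(4, edges))
print(restored == edges)

# Second example: the data rebuilt from the formula differ from the original.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0)]
a = fit_slope(points)
rebuilt = [(x, a * x) for x, _ in points]
print(rebuilt == points)
```

The first comparison holds because both directions of the conversion preserve every edge; the second fails because the deviations of the individual data points from the fitted line are discarded.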
The transition from a bulk of original data to a formula (the second example above) can be regarded as a primitive type of theory formation. Under a unifying view we find a quasi-continuous transition from

• a merely ‘technical’ revision, which is enforced by capacity restrictions (the first example above) and permits conversions in both directions without any loss of information, at one end of a scale, to
• a fundamental revision of an established theory, that is, a paradigm change, at the opposite end of that scale.

We can use the terms ‘weak revision’ and ‘strong revision’ for these two extreme cases, provided that the quasi-continuous transition between them is kept in mind.

⁵ For a recent multidisciplinary overview of belief revision see Gabbay and Smets (1998).

The physical deletion of entries represented in the ‘old’ style is not compulsory in all cases. In special cases it can make sense to store an entry both in the old and in the shorter new representation, such that both versions can be used alternatively (Section 4.3).

Here the operators in $R$, written as $R_i, R_k, \ldots$, denote a ‘revision’, that is, a transition to a different representation scheme. Apart from special cases⁶ these operators, again, are noncommutative: $R_i R_k \neq R_k R_i$. Each of the two operators can entail its specific loss of information, and the situation found by the operator acting later may have been significantly altered by the operator which acted first.

4.2.5. Operators in class T: tentative

In many methods of heuristic problem-solving (heuristic programming, genetic algorithms, etc.) sequences of patterns or configurations are generated in a tentative manner. It is the purpose of these attempts to eventually find a configuration fitting given requirements, or to proceed from a preliminary solution to a better one (even if an optimum cannot always be guaranteed).
Such patterns can be generated, apart from random processes, by sets of recursive rules, for instance rewriting rules, graph grammars, shape grammars⁷, or the transition rules typical of genetic algorithms.

An example may be a possible heuristic approach to the travelling salesman problem. Given a list of cities together with the distances between each pair of them, an optimal route is to be found that visits each city exactly once and (as assumed here for the sake of simplicity) finally leads back to the starting-point. A tentative configuration may be a closed loop which, however, does not yet include all cities and hence must be expanded by a stepwise inclusion of further cities, or a closed loop through all cities which is not yet optimal, but can be improved by a series of local exchange operations.

Formally, the class $T$ (tentative) consists of all operators $T_1, T_2, \ldots$; each operator stands for a process by which exactly one new pattern is generated. If two production rules of a graph grammar (or a related system) are applied one after another, the result strongly depends on the order of execution, and in the general case two operators in $T$ are noncommutative: $T_i T_k \neq T_k T_i$.

4.3. The question of completeness

As mentioned before, there are well-known types of cognitive processes that cannot be found in the list of ‘elementary operators’ (Section 4.2). Of course, a proof of ‘completeness’, that is, of the possibility to describe every cognitive process by a combination of operators of the five kinds proposed here, is excluded, but there is at least some plausibility that a significant part of the field is covered.⁸

For cognitive processes of some essential types it can be shown that such a representation is possible. Conclusions ($C$) are regularly necessary, and new information from outside ($L$) can intervene; hence operators in $C$ or in $L$ are not always explicitly addressed.

A ubiquitous type, which should not be forgotten, is forgetting.
Some information stored in the system can be deleted as a consequence of a new valuation of its relevance. This can be a by-product of a revision, and the deletion of a single item may be considered a boundary case of revision. The operators in $V$ and $R$ are sufficient to represent forgetting.

⁶ Two operators $R_i$ and $R_k$ may be ‘independent’ if they modify disjoint sets of entries.

⁷ A survey of graph grammars and some applications can be found in Gernert (1997); for shape grammars see the monograph by Gips (1975).

⁸ No metaphysical assumptions underlie this approach. In particular, it is not assumed that human beings can be completely described as ‘information-processing systems’ or something like that; nor is it intended to join the debate on limitations of computers. If some class of cognitive processes were identified that cannot be represented by combinations of the operators proposed here, this would be a positive result, too.

Three central types of cognitive processes (classification, concept formation, and pattern recognition) will be treated jointly. They are mutually connected, and they have in common that perspective notions and similarity measures (Section 3.2) play a dominant role. All objects that are assigned to the same class within a given classification scheme are connected by their ‘intra-class similarity’, and the same holds for all objects subsumed under the same concept. If no classification scheme is previously defined, as in a method termed ‘cluster analysis’, then solely the similarity measure and the given overall number of classes will steer the assignment of the objects to equal or different classes. To each of those classes a newly created term can be assigned afterwards; here again we find a hint of the affinity between classification and concept formation.
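The cluster-analysis case just described, in which only a similarity measure and a given number of classes steer the assignment, might be sketched as follows. This is a minimal illustration under stated assumptions: the function name `cluster`, the farthest-point choice of class representatives, and the use of the absolute difference of numbers as dissimilarity are all hypothetical choices, not a method taken from the text.

```python
# Illustrative sketch only: the seeding and assignment rules are assumptions.

def cluster(objects, k, dissimilarity):
    """Assign objects to k classes using nothing but a dissimilarity function."""
    # Choose k representatives: start with the first object, then repeatedly
    # take the object that is farthest from its nearest representative.
    seeds = [objects[0]]
    while len(seeds) < k:
        farthest = max(
            objects,
            key=lambda o: min(dissimilarity(o, s) for s in seeds),
        )
        seeds.append(farthest)

    # Put every object into the class of its most similar representative.
    classes = [[] for _ in seeds]
    for o in objects:
        nearest = min(range(k), key=lambda i: dissimilarity(o, seeds[i]))
        classes[nearest].append(o)
    return classes


# Hypothetical data: numbers compared by their absolute difference.
data = [1.0, 1.2, 0.9, 5.0, 5.3, 9.8, 10.1]
groups = cluster(data, 3, lambda a, b: abs(a - b))
print(sorted(sorted(g) for g in groups))
```

Replacing the lambda by a different dissimilarity function changes the resulting classes, which corresponds to the point made in Section 3.2 that the measure has to be fixed in a preceding definition phase.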
Pattern recognition can be understood as the identification of those structures which have a sufficient similarity with one or another element from a predefined set of ‘standard patterns’. For example, character recognition means that a single scribble is identified (if possible) with the best fitting letter from a given alphabet.

Concept formation can essentially take on one of the following shapes:

1. As already mentioned, after a process of classification a characterizing notion can be assigned to each of the classes.
2. It can be recognized that part of the objects in a certain class have a characteristic feature in common and hence can be subsumed under a new notion. For example, some materials share a ‘medium conductivity’ and thus are named ‘semiconductors’.
3. If a variable remains constant in spite of variations of other variables or of the overall system state, then that special variable may be given a marked name. The most important example is the term ‘energy’, which became a scientific term through the discovery of the corresponding conservation principle.

To sum up, processes of classification, pattern recognition, and concept formation can be managed mainly by operators from $V$ and $R$.

Processes of problem-solving show a permanent interplay of preliminary, tentative steps and valuations of the proposals generated in that way. There must be a chance that a series of tentative steps can be totally discarded and a new attempt can be made from a different starting-point or with a new series of tentative steps (backtracking). In this context, it can sometimes make sense to temporarily store some information in more than one representation simultaneously (e.g. in the original and a condensed version, see Section 4.2.4). Problem-solving processes can essentially be built up by operators from $V$ and $T$.

4.4. Fundamental operators acting in parallel

For the sake of simplicity, parallel processes have not been addressed until now.
Let $X, Y, Z, \ldots$ denote operators from any of the five classes introduced above (Section 4.2). If two operators $X$ and $Y$ act in parallel, this can again be understood as an operator⁹, which will be written as $X + Y$. In contrast to the sequential composition, this composition is always commutative: $X + Y = Y + X$. Of course, a concrete implementation of a system with parallel processing would require regulations concerning the relative independence of the components working in parallel and their possible interactions, but this would not contribute to the purpose of this paper. Some relevant aspects will be discussed in Section 5.4.
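The contrast between sequential and parallel composition can be illustrated with a small numerical sketch. It rests on assumptions that go beyond the text: operators are modelled as $n \times n$ real matrices acting on state vectors in $\mathbb{R}^n$ (the text does not state that operators are linear), sequential composition is taken to be the matrix product, and parallel composition is taken to be the entrywise sum. All function names and the two example matrices are hypothetical.

```python
# Illustrative sketch only: linearity and the matrix model are assumptions.

def compose(x, y):
    """Sequential composition X Y: y acts first, then x (matrix product)."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def parallel(x, y):
    """Parallel composition X + Y, modelled here as the entrywise sum."""
    return [[a + b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]


def apply_op(op, state):
    """Apply an operator (matrix) to a state vector."""
    return [sum(c * s for c, s in zip(row, state)) for row in op]


# Two hypothetical learning operators on a two-dimensional state space.
L_i = [[1, 1], [0, 1]]
L_k = [[1, 0], [1, 1]]
state = [1, 0]

# Sequential composition depends on the order of the operators.
print(compose(L_i, L_k) != compose(L_k, L_i))

# Parallel composition does not depend on the order.
print(parallel(L_i, L_k) == parallel(L_k, L_i))

# Applying the composed operator equals applying L_k first and then L_i.
print(apply_op(compose(L_i, L_k), state) == apply_op(L_i, apply_op(L_k, state)))
```

In this model the commutativity of $X + Y$ follows directly from the commutativity of addition of real numbers, whereas the matrix product is noncommutative for generic pairs of matrices.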

5. The algebras A and A defined by classes of operators