
Behaviourally Correct Language Learning in the Limit

Chair for Algorithm Engineering
Hasso Plattner Institute

Office: A-1.13
Tel.: +49 331 5509-4835
Email: Vanja.Doskoc(at)hpi.de
Links: Homepage, Publications

Supervisor: Prof. Dr. Tobias Friedrich
Advisor: Dr. Timo Kötzing

We investigate learners that are given the task of learning classes of (formal) languages. Our goal is to study how the learning capability of such learners changes when we restrict the information available to them or require a particular learning behaviour.


In his seminal paper, Gold (1967) introduced the formal framework of language learning in the limit. There, a learner \(h\) (a computable function) is given the task of learning a class of (formal) languages \(\mathcal{L}\). For any given language \(L \in \mathcal{L}\), the learner \(h\) is successively presented information about \(L\). This information is called a text \((\mathbf{Txt})\) of \(L\) and contains, in the limit, all and only the elements of \(L\). With every new datum, \(h\) makes a guess as to which language it believes to be presented. Once these guesses converge to a single hypothesis correctly describing the language \(L\), the learning task is considered successful. If \(h\) converges in this way on every text of \(L\), we say that \(h\) learns \(L\) explanatorily from text \((\mathbf{TxtEx})\). If \(h\) \(\mathbf{TxtEx}\)-learns each \(L \in \mathcal{L}\), we say that \(h\) \(\mathbf{TxtEx}\)-learns \(\mathcal{L}\). Furthermore, with \([\mathbf{TxtEx}]\) we denote the set of all \(\mathbf{TxtEx}\)-learnable classes. For other learning criteria, these notations are used analogously.
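As a toy illustration of this framework (a minimal sketch, not from the source; all names are ours), consider the class of languages \(L_n = \{0, \ldots, n\}\) and a learner that conjectures \(L_m\) for the largest element \(m\) seen so far. On every text, its guesses converge to the single correct hypothesis.

```python
# Toy sketch of learning in the limit: the class L_n = {0, ..., n},
# learned by guessing the maximum element seen so far.

def learner(segment):
    """Gold-style learner: maps the finite sequence presented so far to a
    hypothesis, here the index n of the guessed language L_n."""
    return max(segment) if segment else 0

def present(text):
    """Feed ever longer initial segments of a text to the learner and
    record its guesses."""
    return [learner(text[:t + 1]) for t in range(len(text))]

# A text for L_3 = {0, 1, 2, 3}: in the limit it lists all and only the
# elements of L_3 (repetitions are allowed).
text = [1, 0, 3, 2, 3, 1, 0, 2, 3, 0]
guesses = present(text)
# The guesses converge to the single correct hypothesis 3.
```

The learner converges once the maximal element of the target language has appeared in the text, which, by the definition of a text, eventually happens.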

Now, we can impose restrictions on the learner. These may concern its memory or its learning behaviour; both are discussed below. We are then interested in how such restrictions change the \(\mathbf{TxtEx}\)-learning capability, that is, the set of all \(\mathbf{TxtEx}\)-learnable classes.

Restrictions on the Information

If, at any time, the learner has all the presented information available, that is, all elements presented to it so far and the order in which they were presented, we call this full-information or Gold-style learning, in honour of Gold (1967). We abbreviate this kind of learning with \(\mathbf{G}\) and, for example, write \(\mathbf{TxtGEx}\) if a learning task is both Gold-style and explanatory from text. However, having all information available may be impractical, as memory is limited in most learning tasks. Various learning styles modelling memory restrictions have been proposed in the literature; in this section we discuss the most important ones.

Learners whose available information is reduced to the set of presented elements are called set-driven \((\mathbf{Sd})\), see Wexler and Culicover (1980). It is known that the \(\mathbf{TxtGEx}\)-learning capability is stronger than the \(\mathbf{TxtSdEx}\)-learning capability, that is, \([\mathbf{TxtSdEx}] \subsetneq [\mathbf{TxtGEx}]\), see Schäfer-Richter (1984) or Fulk (1985). However, these works also show that this changes if one additionally grants the learner the number of elements presented so far. This is known as partially set-driven or rearrangement-independent \((\mathbf{Psd})\) learning, see Schäfer-Richter (1984) and Blum and Blum (1975). Altogether, we have \([\mathbf{TxtSdEx}] \subsetneq [\mathbf{TxtPsdEx}] = [\mathbf{TxtGEx}]\).
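The three memory models can be contrasted in a small sketch (our own illustrative names, not from the source): a Gold-style learner sees the full sequence, a partially set-driven learner sees the set of elements together with how many data items were presented, and a set-driven learner sees only the set.

```python
# Illustrative inputs available to the learner under the three memory models.

def gold_input(segment):
    return tuple(segment)                      # full order and repetitions

def psd_input(segment):
    return frozenset(segment), len(segment)    # content plus a counter

def sd_input(segment):
    return frozenset(segment)                  # content only

# A set-driven learner cannot distinguish two reorderings of the same data:
assert sd_input([2, 0, 1, 1]) == sd_input([0, 1, 1, 2])
# A partially set-driven learner can still distinguish segments with the
# same content but different lengths:
assert psd_input([0, 1]) != psd_input([0, 1, 1])
```

This makes precise why partially set-driven learners sit between the other two models: they forget the order but keep a notion of elapsed learning time.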

Restrictions on the Learning Behaviour

We may further impose restrictions on the hypotheses the learner makes. These restrictions may be naturally inspired or motivated by psychology. Consistency is an example of the former. Following Angluin (1980), a learner is consistent \((\mathbf{Cons})\) if its hypotheses always include the information they are built on. Interestingly, consistency does impose a proper restriction on explanatory learners, that is, the \(\mathbf{TxtGConsEx}\)-learning capability is weaker than its \(\mathbf{TxtGEx}\) counterpart, see Bārzdiņš (1977). This surprising result justifies the analysis of further restrictions; in this section, we discuss a few.
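In a toy model where hypotheses are plain finite sets (a simplification we assume here; in the formal setting hypotheses are indices of computably enumerable languages, so such checks are not decidable in general), consistency reads as follows:

```python
# Consistency in the toy model: every hypothesis must contain the data
# it was built on.

def is_consistent(hypothesis, seen):
    """Check Angluin-style consistency: seen is a subset of hypothesis."""
    return set(seen) <= set(hypothesis)

# A learner guessing exactly the set of seen elements is trivially
# consistent ...
assert is_consistent({0, 1, 3}, [1, 0, 3])
# ... while a hypothesis dropping a presented element is not.
assert not is_consistent({0, 1}, [1, 0, 3])
```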

One example of a psychologically motivated restriction is non U-shaped \((\mathbf{NU})\) learning, see Baliga et al. (2008). U-shapes have been witnessed in psychology when studying children's language acquisition. Marcus et al. (1992) observed that children, upon learning the irregular form of a verb, say "to mean", first correctly learn and use the form "meant", but then overgeneralize to "meaned", only to eventually return to the correct behaviour, hence the name U-shape. Interestingly, Baliga et al. (2008) show that forbidding this seemingly inefficient learning behaviour does not weaken the \(\mathbf{TxtGEx}\)-learning capacity, that is, \([\mathbf{TxtGNUEx}] = [\mathbf{TxtGEx}]\). The same holds true for the syntactic version, where correct hypotheses may never be abandoned, even syntactically. This is known as strongly non U-shaped \((\mathbf{SNU})\) learning, see Case and Moelius (2011).

Further restrictions have been studied. A learner which is conservative \((\mathbf{Conv})\) may not change its mind while its current hypothesis is consistent with the information given, see Angluin (1980). Cautious \((\mathbf{Caut})\) learners may change their hypotheses at will, but never to a proper subset of a previous one, see Osherson et al. (1982). Learners which are (strongly) decisive may never return to a (syntactically) discarded hypothesis, see Osherson et al. (1982) for decisive learning \((\mathbf{Dec})\) and Kötzing (2014) for strongly decisive learning \((\mathbf{SDec})\). Lastly, we study three restrictions, each following a certain monotonic behaviour, introduced by Jantke (1991) and Wiehagen (1991). The hypotheses of strongly monotonic \((\mathbf{SMon})\) learners always need to include all elements of their previous hypotheses; weakly monotonic \((\mathbf{WMon})\) learners need to do so only while their hypotheses are consistent with the data; monotonic \((\mathbf{Mon})\) learners have to be monotonic only with respect to the elements of the target language. All these restrictions are known to yield a weaker learning capacity than \(\mathbf{TxtGEx}\).
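Two of these behavioural restrictions can be checked on a finite run of a learner in the same toy model as before (hypotheses as plain sets, an assumption of ours; in the formal setting these properties concern the described languages and are not decidable):

```python
# Checking strong monotonicity and conservativeness on a finite run.
# `hypotheses[i]` is the guess after seeing `segments[i]`.

def strongly_monotonic(hypotheses):
    """SMon: every hypothesis includes all elements of its predecessor."""
    return all(h_prev <= h_next
               for h_prev, h_next in zip(hypotheses, hypotheses[1:]))

def conservative(hypotheses, segments):
    """Conv: a mind change is only allowed once the current hypothesis
    has become inconsistent with the data seen at that point."""
    for (h_prev, h_next), seen in zip(zip(hypotheses, hypotheses[1:]),
                                      segments[1:]):
        if h_prev != h_next and set(seen) <= h_prev:
            return False     # changed mind while still consistent
    return True

hyps = [{0}, {0, 1}, {0, 1, 2}]
segs = [[0], [0, 1], [0, 1, 2]]
assert strongly_monotonic(hyps)
assert conservative(hyps, segs)
```

Note that the run above is conservative because each mind change happens exactly when a new datum falls outside the current hypothesis.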

Towards an Atlas of Computational Learning Theory

While for all aforementioned restrictions the relation to the \(\mathbf{TxtGEx}\)-learning capacity is known, the comparison of their mutual learning capacities is ongoing research. Jain et al. (2016), Kötzing and Palenta (2016) and Kötzing and Schirneck (2016) tackled this question and provided maps depicting these relations. The following figures depict the situation for Gold-style and partially set-driven explanatory learning, see Figure 1, as well as set-driven explanatory learning, see Figure 2.

How to Read the Maps

Here, \(\mathbf{T}\) indicates the absence of a restriction. Solid black lines depict trivial inclusions (bottom-to-top), dashed lines depict non-trivial inclusions (bottom-to-top), and grey-edged areas illustrate a collapse of the enclosed learning criteria. To improve readability, we omit mentioning \(\mathbf{Txt}\).

Figure 1: Relation of \([\mathbf{Txt}\beta\delta\mathbf{Ex}]\) for \(\beta \in \{ \mathbf{G}, \mathbf{Psd} \}\) and various learning restrictions \(\delta\).

Figure 2: Relation of \([\mathbf{TxtSd}\delta\mathbf{Ex}]\) for various learning restrictions \(\delta\).

The Goal of My Work

The goal of our research is to expand these maps and, with them, the understanding of the topic. First, we want to combine the maps shown in Figures 1 and 2 by investigating the mutual relation of Gold-style, partially set-driven and set-driven explanatory learning. However, our main focus lies on providing maps for the semantic counterpart of explanatory learning. This is referred to as behaviourally correct \((\mathbf{Bc})\) learning, introduced by Case and Lynes (1982) and Osherson and Weinstein (1982). In contrast to explanatory learning, where a learner has to converge to a single correct hypothesis, behaviourally correct learning only requires the learner to converge to an explanation of the language, which may change syntactically as long as its semantics remain the same.
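The difference between the two convergence notions can be made concrete in a toy sketch (our own construction, with assumed names): hypotheses are indices, and a `semantics` map tells which language each index describes, so that two distinct indices may describe the same language.

```python
# Toy contrast between explanatory (Ex) and behaviourally correct (Bc)
# convergence on a finite run of hypotheses (indices).

# Indices 0 and 1 describe the same language; index 2 a different one.
semantics = {0: frozenset({0, 1}), 1: frozenset({0, 1}), 2: frozenset({0})}
target = frozenset({0, 1})

def bc_on_tail(hyps, tail_start):
    """Bc: from tail_start on, every hypothesis describes the target."""
    return all(semantics[h] == target for h in hyps[tail_start:])

def ex_on_tail(hyps, tail_start):
    """Ex: from tail_start on, one fixed, correct index is output."""
    return bc_on_tail(hyps, tail_start) and len(set(hyps[tail_start:])) == 1

# Alternating between the two equivalent indices 0 and 1 is
# behaviourally correct, but never settles on a single index:
run = [2, 0, 1, 0, 1, 0, 1]
assert bc_on_tail(run, 1)
assert not ex_on_tail(run, 1)
```

This is exactly the freedom \(\mathbf{Bc}\)-learning grants: the syntactic form of the conjecture may keep changing as long as its meaning stabilizes.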

Cautious Limit Learning

So far, Doskoč and Kötzing (2020) have provided the complete map for various cautious learning restrictions. Introduced by Kötzing and Palenta (2016), target cautious \((\mathbf{Caut}_\mathbf{Tar})\) learners may never output a proper superset of the target language as their hypothesis, while finitely cautious \((\mathbf{Caut}_\mathbf{Fin})\) and infinitely cautious \((\mathbf{Caut}_\infty)\) learners have to be cautious only on finite or infinite hypotheses, respectively. Requiring cautiousness on finite hypotheses or on the target language weakens both the \(\mathbf{TxtGEx}\)- and \(\mathbf{TxtPsdEx}\)-learning capacity. Interestingly, this is not true in the set-driven case, see Figure 3. Figure 4 shows that the same pattern holds in the behaviourally correct case; there, however, cautious Gold-style learners achieve only as much learning capacity as unrestricted set-driven learners, that is, \([\mathbf{TxtGCautBc}] = [\mathbf{TxtSdBc}]\). While this served as a first step towards a better understanding of behaviourally correct learning, the full map still eludes us.
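In the same toy model of hypotheses as sets (an assumption of ours, with illustrative names), the difference between plain and target cautiousness on a single mind change looks as follows:

```python
# Toy checks for cautious variants: plain cautiousness forbids a new
# hypothesis that is a proper subset of an earlier one; target
# cautiousness forbids any proper superset of the target language.

def violates_caut(h_earlier, h_new):
    return h_new < h_earlier          # proper subset of an earlier guess

def violates_caut_tar(h_new, target):
    return h_new > target             # proper superset of the target

T = {0, 1, 2}
assert violates_caut({0, 1, 2}, {0, 1})        # shrinking is forbidden
assert violates_caut_tar({0, 1, 2, 3}, T)      # overshooting the target
assert not violates_caut_tar({0, 1}, T)        # subsets of T are fine
```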

Figure 3: Relation of \([\mathbf{Txt}\beta\delta\mathbf{Ex}]\) for \(\beta \in \{ \mathbf{G}, \mathbf{Psd}, \mathbf{Sd} \}\) and various cautious learning restrictions \(\delta\).

Figure 4: Relation of \([\mathbf{Txt}\beta\delta\mathbf{Bc}]\) for \(\beta \in \{ \mathbf{G}, \mathbf{Psd}, \mathbf{Sd} \}\) and various cautious learning restrictions \(\delta\).



2020

  • Doskoč, Vanja; Friedrich, Tobias; Göbel, Andreas; Neumann, Aneta; Neumann, Frank; Quinzan, Francesco. Non-Monotone Submodular Maximization with Multiple Knapsacks in Static and Dynamic Settings. European Conference on Artificial Intelligence (ECAI) 2020: 435-442
  • Doskoč, Vanja; Kötzing, Timo. Cautious Limit Learning. Algorithmic Learning Theory (ALT) 2020: 251-276

2018

  • Doskoč, Vanja. Confident Iterative Learning in Computational Learning Theory. Current Trends in Theory and Practice of Computer Science (SOFSEM) 2018: 30-42