Statistical learning
Statistical learning (SL) refers to the extraction of patterns in the environment, in particular the statistical regularities in sequences of environmental events. In the research literature, SL has largely replaced the concept of implicit learning (Reber, 1967) and has also been presented as an alternative to rule-based grammar. The concept does involve implicit learning of statistical regularities, but it differs from implicit and other forms of non-declarative learning in its link to particular research paradigms, which will be presented in the section on early studies of statistical learning. Also, SL is said to depend on a domain-general learning mechanism that is involved in language acquisition. I shall therefore discuss the implications of research on SL with respect to both a theoretical conception of developmental language impairments and improved methods of remedial treatment. First, I shall give a brief introduction to the theoretical issues on developmental language impairments that have recently been discussed in the research literature.
Developmental language impairments
Language impairment is a diminished ability to acquire and make use of language and should therefore be distinguished from speech impairments. Moreover, as a developmental impairment, it should be distinguished from acquired impairments of language due to brain damage or disease. The diagnostic term that is widely accepted in the research literature is specific language impairment (SLI), a common developmental disorder whose prevalence has been estimated at around 7% (Leonard, 1998). However, since the diagnostic criteria of SLI are still debated, it is difficult to arrive at a precise estimate of prevalence.
SLI is defined by a set of criteria of inclusion that are based on standardized tests of language fluency and grammar ability. In addition, SLI is defined by a set of criteria of exclusion, one of which is non-verbal IQ below a critical level. Others include sensory-motor disorders, for example deafness. The discrepancy criteria between language and non-language functions rest on the assumption that SLI is specific to language. This position is opposite to the one taken by Ullman and Pierpont (2005). The problem of defining the specific component in SLI has led to major discussions of the adequacy of the term itself (Bishop, 2014; Lian, 2016, Chapter 2). In this article, I will present a number of studies that show the involvement of SL in language acquisition and the association of developmental language impairment with deficits in SL. I shall start by presenting some important approaches and paradigms used in research on SL.
Early studies of statistical learning
Word segmentation – This concept refers to the process of segmenting a continuous stream of vocal sounds into words, a basic precondition for the development of vocabulary. The spectrogram of a spoken sentence shows no silent intervals between words. Rather, the waveform may just as well be broken or extinguished within words, which shows that there are no reliable acoustic cues for the segmentation of words in sentences. So how do children segment words in speech? Saffran (2003) argued that English-speaking children cannot innately know that pretty and baby are words, whereas tyba is not. She argued that the transitional probabilities (TPs) between the sounds of speech form a major cue for the segmentation of words in spoken language. While the probability that pre is followed by ty is high, the probability that ty is followed by ba is very low. Saffran, Aslin, and Newport (1996) asked whether eight-month-old infants could solve the word segmentation problem when presented with a continuous stream of speech syllables where all cues from pauses and intonation were eliminated. It turned out that infants could discriminate between sequences of syllables that varied in their statistical properties. The probability that one syllable followed another was the crucial cue that helped infants solve the segmentation problem.
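The role of transitional probabilities in segmentation can be illustrated with a minimal sketch. It is not the procedure or the stimulus set of Saffran et al. (1996): the four trisyllabic nonsense words, the stream length, and the rule that a word never follows itself are assumptions made for illustration only. The sketch estimates the probability of each syllable given the preceding one from a continuous stream without pauses.

```python
import random
from collections import Counter

# Hypothetical trisyllabic nonsense "words" (illustrative, not the original stimuli).
WORDS = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku"), ("pa", "do", "ti")]

def make_stream(n_words, seed=0):
    """Concatenate randomly chosen words into one syllable stream without pauses.
    Assumption: the same word never occurs twice in a row."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != prev])
        stream.extend(word)
        prev = word
    return stream

def transitional_probabilities(stream):
    """Estimate P(next | current) as count(current, next) / count(current)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

stream = make_stream(300)
tp = transitional_probabilities(stream)

# Transitions inside a word versus transitions that cross a word boundary.
final_syllables = {w[2] for w in WORDS}
within = [tp[(w[0], w[1])] for w in WORDS] + [tp[(w[1], w[2])] for w in WORDS]
across = [p for (first, _), p in tp.items() if first in final_syllables]
print("lowest within-word TP:", min(within))
print("highest across-boundary TP:", round(max(across), 2))
```

In such a stream every within-word transition is fully predictable, whereas each word-final syllable can be followed by several different word onsets, so a dip in transitional probability marks a likely word boundary.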
The speech syllables used in this experiment did not form words in the native language of the infant participants. In this sense, the infants responded to a statistical non-linguistic task. Hence, it could be questioned whether the learning results actually pertained to language at all. Therefore, Saffran asked:
…whether infants actually use [the] statistical learning mechanism in real world language acquisition? One way to address this question is to ask what infants are actually learning in our segmentation task. Are they learning statistics? Or are they using statistics to learn language? Our results suggest that when infants being raised in English-speaking environments have segmented the sound strings, they treat these nonsensical patterns as English words (Saffran, 2003, p. 112).
Aslin (2017) pointed out that learning in the Saffran et al. (1996) work took place without any form of instruction, reinforcement, or feedback. Also, he argued that SL does not rely on any language-specific mechanism since it is also demonstrated for segmentation of tone sequences (Saffran, Johnson, Aslin, & Newport, 1999). Moreover, Toro and Trobalon (2005) demonstrated statistical computations over a speech stream in a rodent, and Meyer and Olson (2011) have reported statistical learning of visual transitions in monkeys. In view of these works, Aslin concluded that SL exists across modalities, behavioral domains, and species. I think this is a very strong statement, especially the part that claims the existence of SL across species. Here more research is needed to demonstrate SL by non-human primates (but see the study of Saffran, Hauser, Seibel, Kapfhamer, Tsao, and Cushman (2008), which is reviewed below). In any event, Aslin’s claim of modality, domain, and species generality invites a discussion of the evolutionary origin of SL.
Unidirectional dependency relations in language – So far we have seen how infants detect patterns of events defined by transitional probabilities. Given the general applicability of SL, as claimed by Aslin, we may ask whether it also applies to learning of unidirectional dependency relations as defined by conditional probabilities between events in a continuous stream of spoken syllables. Also, given the species generality of SL, we may ask whether monkeys are able to learn unidirectional dependency relations on a par with human infants.
Natural languages worldwide have particular phrase structures, and generally these structures contain unidirectional dependency relations. The presence of an article like the or a predicts a noun downstream, not vice versa. Other languages may be constructed with correlational, not predictive, dependency relations, but they do not exist among the natural languages. Previous research (Saffran, 2002) showed that children learn predictive languages (P-languages) more easily than non-predictive languages (NP-languages). Moreover, Saffran also showed that similar results were obtained in studies using nonlinguistic materials. Therefore, Saffran et al. (2008) asked whether this ability has to do with language at all. “Is the ability to detect predictive dependencies and its relationship to linguistic structure accidental?” (p. 480). Alternatively, human languages may have been sculpted by learning constraints that evolved in the early history of mankind. This seems to be the position taken by Saffran et al. In other words, they argued for the priority of learning constraints, which have finally given rise to the phrase structures that contain unidirectional dependency relations in language. If these constraints evolved early on, as an aspect of the evolution of language, it may be questioned whether they also exist in non-human primates. Saffran et al., therefore, studied grammatical pattern learning by human infants (12.5–13.0 months) and by adult ‘cotton-top’ tamarin monkeys. The patterns formed artificial grammars of non-words, organized into ‘word’ classes A, C, D, F, G. One half of these grammars contained predictive dependencies (P-languages). Here phrases are built from elements that are predictively related. Thus in one phrase, an element from the D class can occur only when an element from the A class is present, but an element from A does not predict an element from D.
In the P-languages, predictability exists both within and between phrases (i.e., hierarchical structure within sentences). P-languages are therefore representative of natural languages.
In the NP-languages, dependencies within phrases were not present. Yet the NP-languages had a sort of phrase structure. Thus a phrase, say CP, is defined as the union between elements from C and G. When the one is lacking, the other must be present (correlational structure). This pattern is uncharacteristic of natural languages. Eight sentences from each of the P- and NP-languages, spoken by a trained female speaker, were recorded and presented in a familiarization session.
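The contrast between predictive and correlational structure can be made concrete with a toy calculation. The phrase inventories below are assumptions made for illustration, not the grammars used by Saffran et al. (2008): the P-type phrase always contains A and optionally D, the NP-type phrase contains A, D, or both, and all listed phrase types are assumed to be equally frequent.

```python
from fractions import Fraction

# Toy phrase inventories (assumed): each phrase is the set of word classes
# it contains, and every listed phrase type is equally frequent.
P_PHRASES = [{"A"}, {"A", "D"}]           # D is optional and never occurs without A
NP_PHRASES = [{"A"}, {"D"}, {"A", "D"}]   # at least one of A and D must occur

def conditional(phrases, target, given):
    """P(target is present | given is present) over the phrase types."""
    with_given = [p for p in phrases if given in p]
    return Fraction(sum(target in p for p in with_given), len(with_given))

for name, phrases in (("P", P_PHRASES), ("NP", NP_PHRASES)):
    print(name,
          "P(A|D) =", conditional(phrases, "A", "D"),
          "P(D|A) =", conditional(phrases, "D", "A"))
```

Under these assumptions the P-type phrase gives P(A|D) = 1 and P(D|A) = 1/2, so the dependency runs in one direction only, while the NP-type phrase gives 1/2 in both directions, so neither element reliably predicts the other.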
The test items comprised four grammatical sentences shared between the P- and NP-languages and four sentences with patterns that were impossible in both languages, which served as ungrammatical test items. Infants and monkeys were tested with comparable methods using a head turn preference procedure. Duration of head turns was used as a measure for the discrimination of grammatical and ungrammatical sentences.
Experiments 1 and 2 were run with infants. In Experiment 1, each word class contained only one non-word, whereas Experiment 2 was run with more non-words in each word class. In both experiments, infants discriminated significantly between grammatical and ungrammatical sentences after being exposed to the P-language but not after being exposed to the NP-language. Experiments 3 and 4 were run with ‘cotton-top’ tamarin monkeys. In Experiment 3, each word class contained only one non-word, whereas Experiment 4 was run with more non-words in each word class. In Experiment 3, monkeys discriminated significantly between grammatical and ungrammatical strings after being exposed to a P-language but not to an NP-language. These results were comparable to the results for infants in Experiment 1. In Experiment 4 (with complex patterns), monkeys were not able to discriminate between grammatical and ungrammatical strings, regardless of whether they had been exposed to a P- or NP-language. In short, tamarin monkeys showed some learning of the simple predictive patterns, but they failed to learn these patterns when multiple tokens from each word class were used.
Artificial grammar (AG) – The main challenge for the young child who is about to learn her first language is comprehending and making use of the hierarchical structure, which involves nonadjacent dependencies. Consider for example this sentence: The man on the sofa has aching legs (i.e., the man, not the sofa, has aching legs). Hsu and Bishop (2010) argued that the detection of predictive dependencies, as shown in the experiments of Saffran et al. (2008), did not imply that the acquired knowledge was hierarchical in nature. “Nevertheless, the study demonstrates that this type of learning can extend to relationships between word categories that are hierarchically organized and is not restricted to learning relationships between individual word items or relationships that can be sequentially characterized” (p. 266). Comprehension of hierarchical structures requires learning of non-adjacent dependencies. Gómez (2002) and Grunow, Spaulding, Gómez, and Plante (2006) presented children with strings of three nonsense words, aXb, where a and b were the same non-words and X represented a set of 3, 12, or 24 non-words. In these experiments, the number of presentations of the a/b tokens is constant, while variability refers to the number of X tokens in the string. It turned out that children could discriminate between grammatical and ungrammatical strings only in the high-variability condition (24 words). Gómez argued that children in the low-variability condition tended to focus on the co-occurrence of adjacent elements and therefore missed the a/b relationship. However, adults with language-based learning disabilities did not perform above chance level in any of the variability conditions.
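Why variability in the middle element might matter can be sketched by comparing adjacent and non-adjacent transitional probabilities in aXb strings. The frame non-words, the X labels, and the assumption that every string is presented equally often are illustrative choices, not a reproduction of the materials in the studies cited above.

```python
from collections import Counter
from itertools import product

# Hypothetical a...b frames; each a element is always paired with the same b.
FRAMES = [("pel", "rud"), ("vot", "jic"), ("dak", "tood")]

def axb_strings(frames, x_items):
    """All aXb strings: every frame combined with every middle element X."""
    return [(a, x, b) for (a, b), x in product(frames, x_items)]

def transitional_probabilities(pairs):
    """P(second | first) estimated from a list of (first, second) pairs."""
    counts = Counter(pairs)
    firsts = Counter(first for first, _ in pairs)
    return {pair: n / firsts[pair[0]] for pair, n in counts.items()}

def summarize(set_size):
    """Return (highest adjacent a->X probability, lowest non-adjacent a->b probability)."""
    x_items = [f"X{i}" for i in range(set_size)]
    strings = axb_strings(FRAMES, x_items)
    adjacent = transitional_probabilities([(a, x) for a, x, _ in strings])
    nonadjacent = transitional_probabilities([(a, b) for a, _, b in strings])
    return max(adjacent.values()), min(nonadjacent.values())

for size in (3, 12, 24):
    adj, nonadj = summarize(size)
    print(f"|X| = {size:2d}: adjacent TP = {adj:.3f}, non-adjacent TP = {nonadj:.1f}")
```

As the set of X elements grows, the adjacent probabilities shrink toward zero while the a/b dependency stays perfectly reliable, which is one way to read the argument that high variability draws attention away from adjacent co-occurrences.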
In a more recent study, von Koss Torkildsen, Dailey, Aguilar, Gómez, and Plante (2013) showed that the aXb grammatical form generalized to other forms, such as aX and Yb, where a and b were single non-words while X and Y were represented by 3 or 24 non-words. Sixteen students with normal language (NL) development and sixteen students with language-based learning disability (LLD) served as participants. Half of each group was assigned to the low-variability condition, while the other half was assigned to the high-variability condition. After a familiarization phase, the participants were tested for recognition of strings heard and for generalization of the grammar to non-word strings with a new X or Y element. The results showed that among the LLD students, only participants in the high-variability group were able to demonstrate generalization of the underlying grammar. NL participants in both high- and low-variability groups showed generalization of the grammar.
The findings from Grunow et al. (2006) and von Koss Torkildsen et al. (2013) have commonly been interpreted as evidence of poor sensitivity to statistical information in the linguistic input in language-impaired individuals. This interpretation has been corroborated by Hsu, Tomblin and Christiansen (2014), who reported impaired statistical learning of non-adjacent dependencies in adolescents with SLI. The question is whether these impairments are caused by dysfunction of the neural structures that support statistical learning. In a later section, therefore, I will discuss the extent to which neural correlates of statistical learning overlap the neural basis of language learning difficulties.
Serial reaction time (SRT) task – Four circles are presented horizontally on a computer screen, and whenever one of them is lit up, the participant is told to press the button on the response pad that matches the location of the visual stimulus. Participants are not told that the stimuli are presented in a fixed sequence, usually ten items long, for example, 4, 2, 3, 1, 3, 2, 4, 3, 2, 1, where each number corresponds to a particular location on the screen. Learning is measured as improvement in accuracy and/or reaction time (RT) compared to a randomly ordered sequence. The Alternating Serial Reaction Time Task is similar to the SRT except that it inserts random items within the fixed series of stimuli.
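The logic of the task can be sketched as follows. The fixed sequence is the example given above; the block lengths, the no-immediate-repeat rule in the random block, the strict pattern/random alternation in the alternating variant, and the reaction times are assumptions made for illustration.

```python
import random
from statistics import mean

FIXED_SEQUENCE = [4, 2, 3, 1, 3, 2, 4, 3, 2, 1]  # example sequence from the text

def sequence_block(repetitions):
    """Structured block: the fixed sequence repeated back to back."""
    return FIXED_SEQUENCE * repetitions

def random_block(n_trials, seed=0):
    """Control block: random locations.
    Assumption: a location is never repeated on consecutive trials."""
    rng = random.Random(seed)
    trials = [rng.randint(1, 4)]
    while len(trials) < n_trials:
        trials.append(rng.choice([loc for loc in (1, 2, 3, 4) if loc != trials[-1]]))
    return trials

def alternating_block(pattern, n_trials, seed=0):
    """Alternating variant (assumed form): pattern items on even positions,
    random items on odd positions."""
    rng = random.Random(seed)
    return [pattern[(i // 2) % len(pattern)] if i % 2 == 0 else rng.randint(1, 4)
            for i in range(n_trials)]

def learning_score(rt_sequence_ms, rt_random_ms):
    """Mean RT on random trials minus mean RT on structured trials.
    Positive values indicate faster responding to the learned sequence."""
    return mean(rt_random_ms) - mean(rt_sequence_ms)

# Invented reaction times in milliseconds, for illustration only.
print(learning_score([400, 420], [480, 500]))
```

The learning score is a difference measure, so general speeding with practice cancels out and only the advantage attributable to the hidden sequence remains.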
Lum, Conti-Ramsden, Morgan, and Ullman (2014) presented a meta-analysis of research on statistical learning, assessed with the SRT. It showed that statistical learning by participants with SLI is significantly impaired compared to controls.
The procedural deficit hypothesis
In a recent and more comprehensive meta-analysis (based on more research tasks) of statistical learning with individuals with SLI and Autism Spectrum Disorder (ASD), Obeid, Brooks, Powers, Gillespie-Lynch, and Lum (2016) showed that statistical learning is significantly impaired among participants with SLI but generally intact in individuals with ASD. The results supported Ullman and Pierpont’s procedural deficit hypothesis, which claims that developmental language impairments depend on deficits in a distinct underlying mechanism. This mechanism is described in Ullman’s (2004) declarative/procedural model.
This model deals with memory systems and their relationships to brain structures of different evolutionary origins. It can be traced back to early studies of amnesia (Cohen & Squire, 1980; Masson & Graf, 1993). Ullman proposed that declarative memory and its neural substrates were linked to the mental lexicon, whereas procedural memory was linked to aspects of grammar. Also, he assumed that declarative memory depends, for the most part, on the medial temporal lobe structures such as the hippocampus, the entorhinal and the perirhinal cortex. Procedural memory depends on frontal-basal ganglia circuits, which are highly interconnected with the neostriatum. The parietal cortex, the superior temporal cortex, and the cerebellum play a likely role as well. Ullman and Pierpont (2005) maintained that developmental language impairment was due to abnormalities in the structures underlying the procedural memory system (i.e., the procedural deficit hypothesis, PDH). This made language impairment a problem of procedural learning.
Peterson, Folia, and Hagoort (2010) reported the neurobiological correlates in an fMRI study of AG learning. They constructed a right-linear unification grammar that generated sequences of letters, which were presented (letter by letter) on a computer screen. They showed that the left inferior frontal gyrus was engaged during the processing of letter sequences. In view of the corticostriatal circuits in this area, however, which involve the basal ganglia, their results may also support Ullman and Pierpont’s description of the procedural system. Moreover, their results showed a deactivation of the medial temporal lobe during learning of the letter sequences and could therefore be said to support Ullman’s declarative/procedural model.
Neural correlates of procedural and statistical learning
How does procedural learning relate to statistical learning? Both may be considered as forms of implicit memory and may apply to real-time sequences: sensory, motor, or cognitive. To some extent, therefore, they may rely on the same brain structures. As a result, Peterson et al.’s work may be said to apply equally to the neural basis of procedural and statistical learning. However, procedural learning is slow and incremental, whereas statistical learning is relatively fast.
Consider for example segmentation of syllables by infants, which takes place after two minutes of passive listening (Aslin, 2017). There are functional differences, which mean that the concepts of statistical and procedural learning should not be conflated. Yet there may be some degree of overlap between the neural substrates underlying the two forms of learning. Thus, previous AGL studies have shown a considerable involvement of the basal ganglia (Conway & Pisoni, 2008). However, these and other studies mentioned above made use of paradigms that relied on post-exposure testing of familiar vs. novel strings.
In an fMRI study, Karuza, Newport, Aslin, Starling, Tivarus, and Bavelier (2013) collected functional imaging data during the exposure phase in order to investigate the learning process while it unfolded. Whole brain delta analysis showed significant activity in the left inferior frontal gyrus. Meanwhile, more sensitive measures (pre-threshold striatal masks) also showed basal ganglia involvement during word segmentation. On this account, considerable overlap of neural structures serving SL and those hypothesized for the procedural memory system in Ullman’s declarative-procedural model may be argued. However, functional differences, in particular differences in speed of acquisition in SL and procedural learning, require extended research on the neural basis of both types of learning.
Consider also the problem of whether language-impaired children may suffer from abnormalities in the very same structures supposed to underlie statistical learning. These children have difficulties in learning sequential-procedural tasks, not in non-sequential mapping tasks. To a great extent, therefore, detection of statistical regularities in rapid sequences of environmental events is a core problem for these children. There is evidence of subcortical abnormalities in children with SLI, in particular striatal abnormalities with an increased volume of the caudate nucleus, which is affected by age (Ullman, 2004).
However, Krishnan, Watkins, and Bishop (2016) argued that since corticostriatal systems are involved in complex motor routines relevant for language, less focus on the striatal structures and more emphasis on corticostriatal networks will offer greater insights into the neurological bases for language learning difficulties. Given that the left inferior frontal gyrus is functionally part of these networks, it may be argued that the neural correlates of SL also belong to the mechanisms that are atypically developed by children with language impairments. I shall therefore discuss how recent studies of SL may contribute to improving methods of remedial treatment for children with developmental language impairments.
Research on structured sequence processing: prospects for remedial treatment of language impairments
To improve language function in language-impaired children, the choice of training regimen is of prime importance. Do linguistic and non-linguistic materials differ in their effects on motivation and adherence? Following Ullman’s (2004) assertion that language shares a number of important biological and computational substrates with memory, training domain-general abilities using nonlinguistic materials may not only be a viable option but also have motivational advantages. Conway, Gremp, Walk, Bauernschmidt, and Pisoni (2012) discussed whether enhancement of domain-general abilities also improves language function. They studied the statistical learning of sequences of lit-up circles in a 4 x 4 matrix of circles on a touch-screen monitor. (This research task has also been used in a study of structured sequence processing by Smith, Conway, Bauernschmidt, and Pisoni (2015), which is reviewed below.)
By the time Conway et al. (2012) published their work, it was widely accepted that statistical learning is important for language acquisition and processing. Yet “there had been no published attempts to improve statistical learning or any other non-declarative learning ability” (p. 314). More research was also needed on the empirical association between statistical learning and language comprehension. Is it possible to show that improvement of statistical learning also transfers to language function?
The methodological approach used by Conway et al. involves a test of visuospatial working memory (WM), which had previously been linked to language function. The ‘phonological loop’ in Baddeley and Hitch’s (1974) model of WM had been described as a ‘language learning device’. (See the classical work of Baddeley, Gathercole, and Papagno (1998).)
In a meta-analysis of the effects of working memory training, Melby-Lervåg, Redick, and Hulme (2016) showed significant and moderate improvements in verbal ability, word decoding, and reading comprehension when tested right after training (immediate effects), whereas follow-up effects were non-significant. The studies included in this analysis made use of computerized working memory tasks (visuospatial span, backward digit span, letter span) and a choice reaction time task. Other studies reported by Melby-Lervåg et al. made use of complex memory span tasks wherein the subject is told to recall the target stimuli after completion of a distractor processing task. Several studies also made use of variations of the N-back recall task.
There are important differences in the design of tasks included in the meta-analysis of Melby-Lervåg et al. and those used in the Conway et al. (2012) study. In the former tasks (e.g., WM tasks designed to measure the digit (or word) span), items are presented randomly for immediate reproduction. In Conway et al.’s visuospatial WM tasks, items (lit-up circles) were not presented randomly. Unknown to the participant, each circle could be followed only by one of a subset of circles. Therefore, the sequence presented in a trial had an underlying statistical structure that could be implicitly learned. However, each new trial incorporated a new set of statistical regularities so that participants would learn a variety of patterns rather than one specific sequential pattern. In addition, the length of sequences presented was adaptively adjusted based on the performance level of each participant. The experimental task used in Conway et al. has therefore been called a structured sequence processing (SSP) task.
Conway et al. (2012) ran two experiments, one with healthy adults with typical language development, and one with 23 deaf or hard-of-hearing children (mean age 8.2). On Day 1 of Experiment 1, participants were given a set of pre-training measures (sequential learning, verbal short-term memory, and the Stroop color and word test). Training sessions took place on Days 2 through 5, with no session lasting longer than 45 minutes. The same tests were given upon completion of training on Day 6 to show any improvements on these non-trained tasks. For the training sessions, participants were assigned to one of three groups: Group 1 was given an adaptive and statistically constrained version of the task, Group 2 was given an adaptive version with pseudo-random sequences of stimuli, and Group 3 was given a non-adaptive and statistically non-constrained version of the task. Twenty adults participated in each of Groups 1 and 3 and 16 participated in Group 2. Post-training measures showed some improvements in working memory and executive control, but only Group 1 showed improvement on a non-trained sequential learning task. Group 2 showed worse performance following training, while Group 3 showed no effect from training. Conway et al. (2012) concluded:
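The three design features described above (successor constraints, new regularities on every trial, adaptive length) can be sketched in a few lines. This is not Conway et al.’s implementation: the number of permitted successors per circle and the one-up/one-down length rule are assumptions made for illustration.

```python
import random

N_CIRCLES = 16  # positions in the 4 x 4 matrix, numbered 0..15

def new_constraints(rng, n_successors=3):
    """For each circle, choose the subset of circles that may follow it.
    The subset size of 3 is an assumption."""
    return {c: rng.sample([o for o in range(N_CIRCLES) if o != c], n_successors)
            for c in range(N_CIRCLES)}

def structured_sequence(rng, length):
    """One trial: draw fresh constraints, then generate a sequence that obeys them."""
    allowed = new_constraints(rng)
    seq = [rng.randrange(N_CIRCLES)]
    while len(seq) < length:
        seq.append(rng.choice(allowed[seq[-1]]))
    return seq, allowed

def next_length(length, correct, minimum=2):
    """Adaptive rule (assumed): one item longer after a correct reproduction,
    one item shorter after an error, never below the minimum."""
    return length + 1 if correct else max(minimum, length - 1)

rng = random.Random(1)
length = 4
for trial, correct in enumerate([True, True, False, True], start=1):
    seq, _ = structured_sequence(rng, length)
    print(f"trial {trial}: length {length}, sequence {seq}")
    length = next_length(length, correct)
```

Because the constraints are redrawn on every trial, a participant cannot benefit from memorizing one particular sequence; only sensitivity to the regularities within the current trial can help.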
…training participants to interact with random patterns actually hampers their ability to learn structured patterns following training. On the other hand, training participants to interact with structured patterns not only leads to marginally better abilities to learn structured patterns following training, but also improves other WM and executive functions (p. 323).
The second experiment addressed the question of whether delayed language development can be linked to poor statistical learning. Among the hard-of-hearing children participating in the second experiment, 10 had bilateral cochlear implants, eight had one implant and one hearing aid, while hearing aids were fitted in both ears for the remaining five children. All children showed delayed language development. The experiment made use of the same 4 x 4 matrix of circles on a touch-screen monitor, and the participants were assigned to one of two groups matched for chronological age. The training condition for Group 1 was adaptive with sequences that conformed to underlying statistical regularities. In Group 2, the condition was non-adaptive and the sequences were pseudo-random. Training sessions lasted 10 days for both groups. The following pre- and post-training measures were obtained for participants in both groups: Children’s Test of Nonword Repetition (Gathercole & Baddeley, 1990) and a constrained measure of visual sequence learning. Only children in Group 1 showed a significant reduction in mean number of syllable errors in the repetition of non-words that were vocally presented to participants via a loudspeaker (70–75 dB SPL), and only children in this group showed a significant improvement in the reproduction of statistically constrained sequences.
Conway et al. (2012) concluded that domain-general learning abilities can be enhanced by particular training regimens and that enhancement of these abilities also serves to improve language function. Notice, however, that the hard-of-hearing children in Conway et al.’s second experiment did not have an SLI diagnosis and that the Test of Nonword Repetition assesses only a select aspect of language function. Therefore, more research is needed to show the extent of transfer to other language functions.
Yet it should also be stressed that Conway et al.’s work has more extensive implications with respect to language acquisition. The longstanding argument that humans have domain-specific mechanisms that evolved to serve language acquisition was not supported. Rather, their work contributed to the alternative hypothesis that language depends on a domain-general mechanism commonly referred to as structured sequence processing (SSP).
More research on the relation between SSP and language function
The relationship between SSP and language has been investigated by a number of researchers. For example, Christiansen, Conway, and Onnis (2012), in a study of event-related brain potentials, showed that structural irregularities in an SSP task and syntactic violations had similar effects on the P600 component. Moreover, Gabay, Thiessen, and Holt (2015) have reported impaired statistical learning by children with developmental dyslexia, which underscores the prospects of further research in the area. However, the application of this research in a clinical setting will depend on more knowledge of the mechanisms underlying the transfer effects of SSP training.
Smith, Conway, Bauernschmidt, and Pisoni (2015) studied structured sequence processing by 66 adult participants who were randomly assigned to one of three groups: Group 1 was engaged in adaptive sequence training, Group 2 was given adaptive training without structural regularities, while Group 3 engaged in non-adaptive unstructured sequences. All participants were given the following pre- and post-training tests: Speech Recognition in Noise and Statistical Sequential Learning.
In the former test, which was used to assess language ability, participants listened to spectrally degraded sentences and were told to write down the last word they heard. In half of the sentences, the last word was highly predictable, whereas the other half consisted of anomalous sentences with unpredictable last words. The language score was defined as the number of correct words in the high-predictability condition minus the number of correct words in the low-predictability condition. Sequence training was run on 4–5 consecutive days. According to the researchers’ first hypothesis, SSP training would result in improvements to both a non-trained SSP task and the Speech Recognition in Noise task. This hypothesis was examined by comparing group means in a multivariate analysis of variance (MANOVA).
Smith et al. (2015) also made use of a mediational model to explain any mediating relationships between the variables. They argued that SSP training may have both direct and indirect effects on the language task. “Direct effects are when the independent variable (IV) directly impacts the dependent variable (DV). Indirect effects are when the IV impacts the DV through the mediation of a third variable, called the mediator (M).” They predicted a significant indirect effect on language processing for the SSP training group (Group 1).
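The distinction between direct and indirect effects can be illustrated with a minimal regression sketch. It is not Smith et al.’s analysis and uses none of their data: the eight data points, the effect sizes, and the plain least-squares estimation are invented for illustration. The path from IV to M is called a, the path from M to DV (controlling for IV) is called b, the indirect effect is the product a times b, and the direct effect is the coefficient of the IV when M is also in the model.

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on a single predictor x (with intercept)."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def two_predictor_slopes(x, m, y):
    """Least-squares slopes of y on x and m (with intercept),
    solved from the centered normal equations."""
    mx, mm, my = mean(x), mean(m), mean(y)
    xc = [v - mx for v in x]
    mc = [v - mm for v in m]
    yc = [v - my for v in y]
    sxx = sum(a * a for a in xc)
    smm = sum(a * a for a in mc)
    sxm = sum(a * b for a, b in zip(xc, mc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    smy = sum(a * b for a, b in zip(mc, yc))
    det = sxx * smm - sxm ** 2
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det

# Invented, noise-free data: IV = training group (0/1), M = mediator, DV = outcome.
iv = [0, 0, 0, 0, 1, 1, 1, 1]
spread = [-1.0, 1.0, -2.0, 2.0] * 2           # individual differences in M
med = [0.5 * g + u for g, u in zip(iv, spread)]
dv = [-0.4 * g + 0.8 * m for g, m in zip(iv, med)]

a = slope(iv, med)                            # IV -> M
direct, b = two_predictor_slopes(iv, med, dv) # IV -> DV and M -> DV
indirect = a * b
total = slope(iv, dv)
print(f"a = {a:.2f}, b = {b:.2f}, indirect = {indirect:.2f}, "
      f"direct = {direct:.2f}, total = {total:.2f}")
```

In this constructed example the indirect effect is positive and the direct effect is negative, and the two cancel in the total effect. It shows how a comparison of group means can be flat even when a mediated path is present.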
Group 1’s score was higher from pre- to post-test, yet the overall comparison of the post-training means showed no group differences. Moreover, the univariate follow-up analysis indicated no significant effect of testing time and no interaction with the group. The mediational model analysis showed two competing effects of SSP training. The first of these effects, the indirect effect, meant that adaptive sequence training had a positive effect on SSP, which in turn had a positive effect on language processing. In contrast, the second and direct effect on language processing was a negative one. Thus, adaptive and structured sequence training in Group 1 actually hampered language scores from pre- to post-training. A possible explanation mentioned by Smith et al. (2015) was that structured sequence training may have interfered with knowledge of language regularities. The problem is: why did this knowledge interfere with the relative direct and not the relative indirect effect?
First of all, it may be questioned whether Smith et al.’s (2015) choice of language test was a good one. By using spectrally degraded sentences, the test may have assessed focused auditory attention rather than comprehension of sentence structure, lowering its validity. (By recording the different scores between predictable and non-predictable sentences they may, to some extent, be said to deal with this problem.) Obviously, degraded sentences were needed to avoid ceiling effects by adult participants from Indiana University. A sample of younger children with typical language development may give rise to an adequate variance on a standardized test, say a subtest of CELF 4 (e.g., Receptive Language, Phonological Awareness, or Language Structure).
Yet the indirect effects of SSP training on language processing in Smith et al.’s work are noteworthy. Together with the other works mentioned above, they provide an impetus for extended research on the relationship between statistical learning and language. Smith et al. concluded their work with the following statements:
These findings have two implications. At a practical level, these findings show how fundamental learning abilities and language processing skills might be improved in typical and atypical development. At a theoretical level, these findings not only highlight the plasticity of statistical sequential learning and SSP but also show an underlying link between SSP and language, which in turn lends additional weight to the view that language acquisition is based in large part on domain-general mechanisms rather than language-specific modules or neural structures that solely mediate language alone (PLoS ONE, 16/18).
I share Smith et al.’s optimistic conclusion, and I will add a few remarks on the practical level of their work. If corroborated by further research, SSP training will make possible the use of non-linguistic methods in remedial treatment programs for children with language impairments. For these children, attending to and repeating visual-motor patterns may be a viable alternative to linguistic instruction.
Conclusion
In this paper, I have presented the main experimental paradigms for the study of statistical learning and have given the following four arguments for why this field of research also contributes to an understanding of developmental language impairments and their remedial treatment:
- Children with language-related difficulties score below typically developing children on a number of SL tasks.
- There is considerable overlap between the brain structures underlying SL and the neurobiological basis of developmental language impairments.
- Statistical learning in a non-linguistic task transfers to language.
- Research on SSP shows why remedial treatment using non-linguistic training tasks is recommended.
Studies of statistical learning, including word segmentation, predictive dependencies in grammar, artificial grammar learning, and structured sequence processing, all concern a form of learning that is crucial in language acquisition. Saffran (2003) argued that statistical learning is constrained rather than open-ended: learners show a ‘wired-in’ ability to calculate some statistics more readily than others. She advanced the constrained statistical learning framework as an alternative to Chomsky’s now classical view that language universals are pre-specified in the child’s linguistic capacities. Saffran argued instead that learning constraints may themselves have sculpted languages to fit the learning capacities of infants and young learners. Hence, there is a reciprocal, interactive relationship between basic learning mechanisms and the linguistic structures of natural languages. This position agrees with Deacon’s (1997) influential theory of the co-evolution of brain and language.
References
Aslin, R. N. (2017). Statistical learning: a powerful mechanism that operates by mere exposure. WIREs Cognitive Science, 8(1–2), e1373. doi:10.1002/wcs.1373
Baddeley, A. D., Gathercole, S. E., & Papagno, C. (1998). The phonological loop as a language learning device. Psychological Review, 105, 158–173. doi:10.1037/0033-295X.105.1.158
Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. H. Bower (Ed.), The Psychology of Learning and Motivation (Vol. 8). London: Academic Press. doi:10.1016/s0079-7421(08)60452-1
Bishop, D. V. (2014). Ten questions about terminology for children with unexplained language problems. International Journal of Language and Communication Disorders, 49, 381–415. doi:10.1111/1460-6984.12101
Christiansen, M. H., Conway, C. M., & Onnis, L. (2012). Similar neural correlates for language and sequential learning: Evidence from event-related brain potentials. Language and Cognitive Processes, 27(2), 231–256. doi:10.1080/01690965.2011.606666
Cohen, N. J., & Squire, L. R. (1980). Retrograde amnesia and remote memory impairment. Neuropsychologia, 19, 337–356. doi:10.1016/0028-3932(81)90064-6
Conway, C. M., Gremp, M. A., Walk, A. D., Bauernschmidt, A., & Pisoni, D. B. (2012). Can we enhance domain-general learning abilities to improve language function? In P. Rebuschat & J. N. Williams (Eds.), Statistical Learning and Language Acquisition. Berlin: De Gruyter Mouton.
Conway, C. M., & Pisoni, D. B. (2008). Neurocognitive basis of implicit learning of sequential structure and its relation to language processing. Annals of the New York Academy of Sciences, 1145, 113–131. doi:10.1196/annals.1416.009
Deacon, T. (1997). The Symbolic Species: The Co-Evolution of Language and the Human Brain. London: Penguin Books.
Gabay, Y., Thiessen, E., & Holt, L. (2015). Impaired statistical learning in developmental dyslexia. Journal of Speech, Language, and Hearing Research, 58, 934–945. doi:10.1044/2015_JSLHR-L-14-0324
Gathercole, S. E., & Baddeley, A. D. (1990). Phonological memory deficits in language disordered children. Is there a causal connection? Journal of Memory and Language, 29, 336–360. doi:10.1016/0749-596X(90)90004-J
Gómez, R. L. (2002). Variability and detection of invariant structure. Psychological Science, 13, 431–436. doi:10.1111/1467-9280.00476
Grunow, H., Spaulding, T. J., Gómez, R. L., & Plante, E. (2006). The effects of variation on learning word order rules by adults with and without language-based learning disabilities. Journal of Communication Disorders, 39, 158–170. doi:10.1016/j.jcomdis.2005.11.004
Hsu, H. J., & Bishop, D. V. (2010). Grammatical difficulties in children with specific language impairment: Is learning deficient? Human Development, 53, 264–277. doi:10.1159/000321289
Hsu, H. J., Tomblin, J. B., & Christiansen, M. H. (2014). Impaired statistical learning of non-adjacent dependencies in adolescents with specific language impairment. Frontiers in Psychology, 5, 175. doi:10.3389/fpsyg.2014.00175
Karuza, E. A., Newport, E. L., Aslin, R. N., Starling, S. J., Tivarus, M. E., & Bavelier, D. (2013). The neural correlates of statistical learning in a word segmentation task: An fMRI study. Brain and Language, 127, 46–54. doi:10.1016/j.bandl.2012.11.007
Krishnan, S., Watkins, K. E., & Bishop, D. V. (2016). Neurobiological basis of language learning difficulties. Trends in Cognitive Sciences, 20, 701–714. doi:10.1016/j.tics.2016.06.012
Leonard, L. (1998). Children with Specific Language Impairment. Cambridge: MIT Press.
Lian, A. (2016). Language Evolution and Developmental Impairments. London: Palgrave Macmillan Publishers. doi:10.1057/978-1-137-58746-6
Lum, J. A., Conti-Ramsden, G., Morgan, A., & Ullman, M. T. (2014). Procedural learning deficits in specific language impairment (SLI): A meta-analysis of serial reaction time task performance. Cortex, 51, 1–10. doi:10.1016/j.cortex.2013.10.011
Masson, M. E. J., & Graf, P. (1993). Looking back and into the future. In P. Graf & M. E. J. Masson (Eds.), Implicit Memory: New Directions in Cognition, Development and Neuropsychology. Hillsdale, NJ: Lawrence Erlbaum Inc.
Melby-Lervåg, M., Redick, T. S., & Hulme, C. (2016). Working memory training does not improve performance on measures of intelligence or other measures of “far transfer”: Evidence from a meta-analytic review. Perspectives on Psychological Science, 11, 512–534. doi:10.1177/1745691616635612
Meyer, T., & Olson, C. R. (2011). Statistical learning of visual transitions in monkey inferotemporal cortex. Proceedings of the National Academy of Sciences, 108, 19401–19406. doi:10.1073/pnas.1112895108
Obeid, R., Brooks, P. J., Powers, K. L., Gillespie-Lynch, K., & Lum, J. A. (2016). Statistical learning in specific language impairment and autism spectrum disorder: A meta-analysis. Frontiers in Psychology, 7, 1245. doi:10.3389/fpsyg.2016.01245
Peterson, K. M., Folia, V., & Hagoort, P. (2010). What artificial grammar learning reveals about the neurobiology of syntax. Brain and Language, 120(2), 83–95. doi:10.1016/j.bandl.2010.08.003
Reber, A. S. (1967). Implicit learning of artificial grammars. Journal of Verbal Learning and Verbal Behavior, 6, 855–863. doi:10.1016/S0022-5371(67)80149-X
Saffran, J. R. (2002). Constraints on statistical language learning. Journal of Memory and Language, 47, 172–196. doi:10.1006/jmla.2001.2839
Saffran, J. R. (2003). Statistical language learning: Mechanisms and constraints. Current Directions in Psychological Science, 12, 110–114. doi:10.1111/1467-8721.01243
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928. doi:10.1126/science.274.5294.1926
Saffran, J. R., Hauser, M., Seibel, R., Kapfhamer, J., Tsao, F., & Cushman, F. (2008). Grammatical pattern learning by human infants and cotton-top tamarin monkeys. Cognition, 107, 479–500. doi:10.1016/j.cognition.2007.10.010
Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70, 27–52. doi:10.1016/S0010-0277(98)00075-4
Smith, G. N. L., Conway, C. M., Bauernschmidt, A., & Pisoni, D. B. (2015). Can we improve structured sequence processing? Effects of computerized training using a mediational model. PLoS ONE, 10, e0127148. doi:10.1371/journal.pone.0127148
Toro, J. M., & Trobalon, J. B. (2005). Statistical computations over a speech stream in a rodent. Perception & Psychophysics, 67, 867–875. doi:10.3758/BF03193539
Ullman, M. T. (2004). Contributions of memory circuits to language: The declarative/procedural model. Cognition, 92, 231–270. doi:10.1016/j.cognition.2003.10.008
Ullman, M. T., & Pierpoint, E. I. (2005). Specific language impairment is not specific to language: The procedural deficit hypothesis. Cortex, 41, 399–433. doi:10.1016/S0010-9452(08)70276-4
Von Koss Torkildsen, J., Dailey, N. S., Aguilar, J. M., Gómez, R., & Plante, E. (2013). Exemplar variability facilitates rapid learning of an otherwise unlearnable grammar by individuals with language-based learning disability. Journal of Speech, Language, and Hearing Research, 56, 618–629. doi:10.1044/1092-4388(2012/11-0125)