The DesPho-APaDy Project: Developing an acoustic-phonetic characterization of dysarthric speech in French
C. Fougeron1, L. Crevier-Buchman1, C. Fredouille2, A. Ghio3, C. Meunier3, C. Chevrie-Muller4, N. Audibert2, J.-F. Bonastre2, A. Colazo Simon1, C. Delooze3, D. Duez3, C. Gendrot1, T. Legou3, N. Levèque1, C. Pillot-Loiseau1, S. Pinto3, G. Pouchoulin2, D. Robert3, J. Vaissiere1, F. Viallet3, C. Vincent1
1 Lab. de Phonétique et Phonologie, UMR 7018 CNRS-Paris3/Sorbonne Nouvelle, Paris, France
2 University of Avignon, CERI/LIA, Avignon, France
3 Lab. Parole et Langage, UMR 6057 CNRS Aix-Marseille Univ., Aix-en-Provence, France
4 Lab. MoDyCo, UMR 7114, CNRS- Université Paris 10, Paris, France
E-mail : cecile.fougeron@univ-paris3.fr, corinne.fredouille@univ-avignon.fr, alain.ghio@lpl-aix.fr
Abstract
This paper presents the rationale, objectives and advances of an on-going project (the DesPho-APaDy project, funded by the French National Agency of Research) which aims to provide a systematic and quantified description of French dysarthric speech, over a large population of patients and three dysarthria types (related to Parkinson's disease, Amyotrophic Lateral Sclerosis, and a pure cerebellar alteration). The two French corpora of dysarthric patients from which the speech data have been selected for analysis are first described. Secondly, this paper discusses the need for a structured computerized platform to store, organize and make accessible (for selected and protected usage) dysarthric speech corpora and associated patients’ clinical information, which are mostly scattered across different locations (labs, hospitals, …). The design of both a computer database and a multi-field query interface is proposed for the clinical context. Finally, advances of the project related to the selection of the population used for the dysarthria analysis, the preprocessing of the speech files, their orthographic transcription and their automatic alignment are also presented.
1. Introduction
Dysarthria refers to neurologically-based speech disturbances. It results from damage to the central and/or peripheral nervous system that impairs the transmission of neural messages to the muscles involved in speech production. Dysarthria is therefore the expression of a deficit in the motor execution of speech movements, and thus a motoric speech disorder. Strength, speed, range, rigidity, coordination and precision of speech gestures can be altered at any level of the speech production system (respiratory, phonatory, supralaryngeal).
Dysarthria is one of the most frequent disorders of verbal communication associated with damage of the nervous system. Indeed, it can appear in the clinical profile of a large number of neurological disorders, including cerebellar diseases, stroke, Parkinson’s disease, Amyotrophic Lateral Sclerosis (ALS), multiple sclerosis, cerebral palsy, and traumatic brain injury (see e.g. Duffy, […]
The clinical manifestation of dysarthria and the characteristics of the patients’ speech depend on its cause and the disease associated with it. Therefore, a classification of dysarthria as a unitary condition is inaccurate, and dysarthria has rather to be considered as a label for a group of disorders (Peacher, 1950; Grewel, 1957; Darley et al., 1969a). Several classification schemes have been proposed in the literature to characterize different groups of dysarthrias. They are either based on salient auditory-perceptual features (phonatory, articulatory, prosodic…) that are used to characterize specific articulatory or kinematic behaviors (e.g. ataxic, hypokinetic dysarthrias - Darley et al., 1969a; Darley et al., 1969b; Darley et al., 1975) or based on etiological and/or neuroanatomical criteria (localization of lesion site) (see Grewel, 1957; Auzou et al., 2007 for a review). Although the main features that differentiate ‘typical’ patients affected by different dysarthria types have been identified, the study of dysarthrias needs more comprehensive phonetic descriptions to overcome the great diversity observed in […]
In the following section, we will present the rationale and the main objectives of our on-going research project on the acoustic-phonetic characteristics of the speech of dysarthric French patients. Section 3 describes two dysarthric speech corpora (with a focus on the Claude Chevrie-Muller corpus) and the design of a multi-field query computer interface developed to facilitate the management and storage of the recordings. Section 4 presents the advances of the project with a description of the selection procedure of the patients to be analyzed, and the method developed for the pre-processing of the speech files. Finally, section 5 concludes this paper by discussing some theoretical issues related to this long-[…]
2. Rationale and Objectives of the Project: Characterizing Dysarthric Speech
2.1. Challenges
One major challenge to overcome when trying to characterize dysarthric speech is that dysarthrias are complex disorders. All dysarthrias stem from defined neuropathological conditions with a deficit in the spatio-temporal execution of speech movements. However, muscular weakness, spasticity, coordination disorder, involuntary movements, or altered muscle tonus will
have varied consequences on the articulatory movements (articulatory target undershoot or overshoot, reduced control of movement amplitude and speed or over time, uncoordinated speech gestures…). Moreover, all dysarthrias involve disturbances, at some varying degrees, affecting different levels of speech production: respiratory, laryngeal, velopharyngeal (resonance), and articulatory (oro-facial) (Auzou et al., 2007; Kent et al., 1998). Thus dysarthria not only refers to a deficit in articulation per se, but encompasses disturbances in the control of voice quality, speech rhythm, loudness, segmental articulation, pitch, fluency, etc.
A second challenge stems from the vast amount of inter- and intra-speaker variability. As mentioned above, different types of dysarthria sharing common features have to be considered. While these types can be defined by shared features (reduced pitch modulation, speech rate perturbation, impaired coordination, nasal resonance…), they are not well defined by a distinctive and exclusive set of features. Individual speaker idiosyncrasies, differences in the severity of the disease, speaker-specific impairments and compensatory strategies are among the different sources of variability that have to be taken into […]
Given these challenges, the search for relevant and stable criteria in order to describe dysarthric speech patterns needs to include multiple deviant speech dimensions, both at the segmental and the suprasegmental levels, and to be applied to a large population of patients for intra- and inter-group comparison as well as longitudinal […]
2.2. Limitations
Even though associations between deviant acoustic-phonetic dimensions and certain types of dysarthria have been made in clinical practice and in the clinical literature, descriptions of dysarthria are often based on perceptual assessments as done in the precursory studies of Darley et al. (1969; 1975). It is true that perceptual analysis is still considered as the “gold standard” and a patient is declared dysarthric because he is perceived dysarthric (Duffy, 2005). However, instrumental analysis […] complementary information for the assessment and to objectively quantify descriptions of the speech patterns (Collins, 1984; Kent et al., 1999). A review of acoustic studies of dysarthric speech is available in (Kent et al., 1999). It reports that “the great majority (of studies) focuses on a small set of measures and typically a very small number of subjects”. We can add that most studies focus on a single subsystem (laryngeal, velopharyngeal, labial articulation…) and are based on ad hoc task of speech production (sustained vowel, isolated sentences, diadochokinesis…). In the review done by Murdoch et al. (1998) of 17 acoustic studies, most studies were based on word and sentence reading, one study looked at read texts, and only two studies used spontaneous speech. Acoustic analysis of continuous speech is thus scarce except in the case of prosodic studies as in (Schlenk et al., 1993; Viallet et al., 2002; Mori et al., 2004; Duez, 2006). Finally, very few comparisons between existing studies have been made, and there is no overall characterization of dysarthric speech patterns. This lack of a comprehensive phonetic description of dysarthric speech patterns can be partly explained by the following […]
(a) Dysarthric speech can be very impaired and information in the speech signal is difficult to obtain and analyze. Consequently, studies are often restricted to a limited set of acoustic measures, and attention is usually focused on a few specific impaired aspects of the speech production system. Since all studies have not been concentrated on the same acoustic cues and on the same patient population, comparisons are rare. As a further consequence, studies are usually restricted to small cohorts of dysarthric speakers and limited to a small […]
(b) The absence of a comprehensive picture of dysarthric speech features can also be explained by the fact that the majority of studies is limited to the analysis of one type of dysarthria, or the comparison of at most two types of dysarthria. Although the acoustic features of the major types of dysarthria have been fairly well documented, most of the acoustic studies have focused on dysarthrias associated with Parkinson’s disease or […]
Furthermore, these studies cover a restricted language area: while significant progress has been made on the description of English dysarthric patients, fewer studies were carried out on French dysarthric speakers (though see Monfrais-Pfauwadel, 1995; Robert et al., 1999; Baudelle et al., 2003; Gentil et al., 2003; Viallet et al., […]
Finally, different studies have been reported in the literature, based on automatic methods drawing upon the automatic speech processing. Devoted to speech disorders (like for instance Gu, 2005; Maier, 2007; Su, 2008; Middag, 2009), the large majority of these methods aims to provide objective assessment of the speech quality in order to cope with the well-known drawbacks of the perceptual assessment like the subjectivity for instance. Based on objective assessment, they do not concentrate their efforts on the characterization of the dysarthric speech by the help of the automatic approaches for a better understanding, as proposed in a very few studies like (Teston et al., 1995; […]
2.3. Characterizing Dysarthric Speech: Objectives of the Project
A Comprehensive Acoustic-Phonetic Description of Dysarthric Speech
The main objective of this project is to provide a systematic, quantified acoustic description of the speech patterns of French dysarthric speakers. Three major types
of dysarthria are examined and a relatively large cohort of patients is included in each type (see 4).
A standardized procedure for the acoustic-phonetic characterization of a patient’s production is proposed. The originality of our approach comes from the combination of methods and analysis procedures drawing upon both phonetics and speech engineering. Thus, the procedure will involve both manual analysis (by human experts) of the acoustic phonetic properties of the productions and automatic acoustic analysis of speech signals. A continuous back and forth between these two techniques should gain from the potential of both […]
A large set of acoustic-phonetic dimensions will be investigated to capture the scope of acoustic variations associated with dysarthria and to identify relevant, reliable and robust criteria to characterize patients' speech. Spectral and temporal cues, segmental and suprasegmental criteria, infra and supraglottal dimensions, will be examined via a set of pre-defined measurements that will be used to screen all the selected patients. The relevance of the criteria will be evaluated […]
• differentiate dysarthric productions from non-[…]
• distinguish different (sub-)types of dysarthric […]
• monitor the evolution of dysarthria in a longitudinal […]
The feasibility and the originality of this project emerge from the collaboration of a team of researchers, specialists of speech but with complementary expertise in phonetics, clinical practice and speech engineering. These partners are located in Paris (Laboratoire de Phonétique et Phonologie - LPP), Aix-en-Provence (Laboratoire Parole et Langage - LPL), and Avignon (Laboratoire Informatique Avignon - LIA).
2.3.3. Development of a Multiple-Field Query Database of Dysarthric Speech
Research on disordered speech is confronted with the difficulty of getting appropriate and sufficiently large quantities of speech data, homogeneous in quality, and sufficiently documented by clinical information on the patients (diagnosis, medical follow-up, medication, symptoms…). Therefore, the second aim of this project (and a preliminary step for our acoustic description) is to design and create a computer database where digitized dysarthric speech corpora and associated patients’ clinical information can be stored, organized and made accessible (for selected and protected usage) through multiple-field queries. The development of this database is motivated by the fact that dysarthric speech recordings are currently disseminated in different locations in France, in different formats, and often without required indexing or clinical documentation. Consequently, their access and handling are difficult, despite the strong demand to use them. Moreover, the development of this database is also motivated by the need to preserve a large speech corpus of French dysarthric speakers recorded from 1967, the CCM database (see section 3.1.1), that […]
While this computer database is designed to manage any clinical content related to speech and voice disorders, it will be firstly designed with the corpora involved in this […]
3. Corpora of French Dysarthric Patients
In the context of our project, the two corpora described below provide us with a large sample of speech data from French dysarthric speakers that can be used for comparisons between speakers, between groups of speakers and in some cases for longitudinal evaluations.
3.1. The CCM Corpus:
Over the past 30 years, Dr Claude Chevrie-Muller (henceforth CCM) with her team recorded at the ‘Laboratoire d’étude de la voix et de la parole’ (INSERM U3) the patients that were sent to her by different neurologists for the assessment of disordered speech and its relation with neurological pathology. This extensive work has given birth to a unique, highly valuable historical corpus of neurological speech disorders in French, known as “Pathologie de la voix et de la parole en neurologie” or “CCM corpus”. This corpus contains about 1000 hours of disordered speech, produced by 5000 patients (adults and children) approximately, mainly suffering from dysphonia and dysarthria, but also anarthria, aphasia, stuttering, psychiatric disorders and so on. In the population of adult dysarthric speakers, 860 patients were classified according to their neurological diagnosis. Four main types of dysarthria are represented. They include three main groups of neurological syndromes and a group of mixed symptoms:
(1) Disorders related to an impairment of the extrapyramidal system. These disorders are characterized by a modification of initiation and offset of muscle tonus control with rigidity, hypokinesia and hypertonia. This group is represented by Parkinson’s disease and related Parkinson’s syndromes as well as Choreic disorders.
(2) Disorders related to an impairment of the pyramidal system (principal motor tract) and responsible for paralytic dysarthria. These can be associated with a pseudo bulbar syndrome with a bilateral spastic component or a bulbar syndrome such as in […]
(3) Disorders related to an impairment of the cerebellar system which is characterized by an alteration of the ongoing temporal-spatial control of the movement. These can be seen in diseases such as Multiple Sclerosis, […]
(4) A group of mixed dysarthrias related to more diffuse pathologies such as vascular disease, brain injury, etc.
A large variety of speech materials is available in this corpus as listed in table 1. Over the past few years, the protocol has evolved and for the oldest recordings some speech tasks were not recorded: all the items marked with a ‘*’ in table 1 are present in all recordings, and it is only after 1980 that the other items were included in the protocol. The production of the whole protocol lasts […]
All the recordings were done in a sound booth with a table-top microphone. Audio and electroglottographic signals were recorded on the two channels of Revox tapes, with indexing in a notebook. Each recording has been analyzed by the CCM team according to specific perceptual and acoustic features. For example, speech rate, word length compared to normative data, segmental description (vowel and consonant realization) and other prosodic variations were reported in the final assessment as well as the oro-pharyngo-laryngeal and praxis clinical examination. The CCM corpus thus contains three types […]
• personal patient information (civil status, tape number, number of recordings, some patients being recorded 4-5 times for longitudinal analysis) and final assessment of the patient’s recording were stored as […]
• medical follow-up (diagnosis, treatments, surgery reports) was stored in patient's charts that consist of […]
• audio and electroglottographic (EGG) recordings […]
Recordings, notebooks and patient's charts containing all available clinical information are now stored in the Voice and Speech medical lab associated with the Laboratoire de Phonétique et Phonologie (Paris).
Furthermore, a control population of 80 healthy male and female speakers was recorded with the same protocol. In order to continue this activity, Dr L. Crevier-Buchman and her colleagues still record the neurological patients coming to the Voice and Speech Lab of the European Hospital Georges Pompidou (Paris). Recordings are now made on DAT tapes, following the same protocol but with a head mounted microphone to avoid variability in intensity due to patients’ movements. EGG is no longer […]
It is worth noting that there is a huge loss of data in the CCM corpus. Because of the large inter-patient variability, clinical information about the speaker needs to be kept up to date (score on international scales, precise treatment information, medical states: with/without medication, stimulation, etc). In fact, our experience shows that these requirements are only rarely satisfied in a retrospective study, especially when using old data. This is the reason why we have decided to complete our database with other sources of […]
3.2. The Aix-Neurology-Hospital corpus (ANH)
For the past fifteen years, under the impulse of F. Viallet, the department of neurology of Aix-en-Provence Hospital has recorded dysarthric speakers regularly. These patients are recorded with the EVA workstation (Teston et al., 1999) and clinical data are recorded simultaneously on a spreadsheet. Currently, the Aix-Neurology-Hospital (ANH) corpus contains 990 patients (average age = 67.7) and 160 control speakers (average age = 62) with sound, aerodynamic recordings and clinical data (diagnosis, regular and contextual medication, clinical motor evaluation…). The population of patients is mainly composed of Parkinson’s disease (601) and Parkinsonian syndromes (98).
[…]
(1) the recording of physical (SPL intensity) and physiological signals (oral airflow, estimated sub-glottal pressure, nasal airflow) in addition to the […]
(2) the multiple speech tasks: sustained vowels, maximal phonation time, airway interrupted sentences to estimate sub glottal pressure, special sentences to estimate velar leakage, text reading with several speed instructions, spontaneous description of a picture, diadochokinesis and so on. The recorded tasks can vary from one patient to another. For example, estimated sub glottal pressure is now systematically recorded in Parkinsonian hypophonia (Sarr, 2009). On the other hand, velar leakage is mainly recorded for paralytic dysarthria as proposed […]
(3) the multiple clinical contexts of the recording sessions: 601 Parkinson patients recorded with/without dopa, with/without deep brain stimulator which represents 1616 sessions of recordings;
(4) the collection of a comprehensive set of information on the speaker (date and birthplace, mother tongue, profession…), and the clinical conditions (date of appearance of the disease, localization of the symptoms, medicament dosage, characteristics of possible electro physiological stimulator, scores of the clinical examinations like UPDRS…). Such precision is necessary for clinical studies (ex: effect of the therapies on the speech production) but also at the linguistic level (search for phonetic-acoustic characterization of homogeneous group of dysarthric […]
All the data and information are computerized. This is our main source of Parkinson patients.
4. Advances of the Project
4.1. Getting the audio files
The recordings of the CCM corpus are still on an analog medium (Revox tapes) and, to ensure their safeguarding, need to be urgently digitized. This task is very time consuming. First, each Revox tape contains several
patients, and it appeared that digitizing a whole tape at once was a quicker solution than searching for a specific dysarthric patient and digitizing it. Second, during each recording session the speed of the tape was changed according to the speech task (and the need to record EGG). Thus, “real-time” auditory control of the recording has to be done in order to stop the tape at each change of speed and set the playing speed accordingly. Third, many tapes are in bad condition, and several recordings are of bad quality (mainly due to speaker movements relative to the table top microphone). Thus adjustments have to be made in order to ensure reasonable audio quality in the output files. To date, 180 […] 94 additional patients recorded on DAT tapes were also digitally captured as wav files. Then, all these recordings were segmented per patient and per speech task. The same procedure is applied to the control population. Then the files are renamed for anonymous storage.
In order to get a sufficient amount of speech to be analyzed acoustically, we have chosen to work first on the text reading speech task. This task provides more than 1 minute of speech, identical for all patients and with segmental, prosodic and fluency variations as well as information on temporal features such as pauses, group phrasing and reading speed throughout the text.
4.2. Design of a Database and Multi-Field Query Interface
As mentioned in 2.3.2, the main interest in pooling and organizing clinical resources is to make this information durable, and to allow some exchange and increasing enrichment via an accessible and shared computerized […]
While the concepts around databases (DB) are familiar to computer scientists, this may not be the case for non-specialists. A collection of audio recordings or data is commonly referred to as a database. However, a database differs from a collection of recordings/data by a consistent structure and organization based on a model, shareable by a group of people and stored on a digital medium, allowing data selection according to precise criteria. In the literature, these aspects are brought by a DataBase Management System (DBMS), which is responsible for (a) supporting the concepts defined by the data model, (b) ensuring the respect of the consistency rules related to the data, (c) making the sharing of data between several users transparent while ensuring the confidentiality of some parts of the data, (d) replying to user queries with a high performance level, and finally, (e) providing different data access languages according to […]
In this project, a working group has been dedicated to this data structuring task in order to be able to provide users (clinicians, therapists, speech scientists) with a straightforward multi-field query interface capable of responding to their data access needs. It is worth noting that data include here audio and articulatory recordings but also all the information related to them. This information includes patients' information, such as personal and clinical data (diagnosis, medical follow-up, medication, symptoms…), recording protocol information (type of speech, number of sessions, medication state of the patient, …), material used for the recordings, etc. All this information is necessary for a controlled analysis of the speech data. Before designing this multi-field query interface, this working group chose a relational model to structure data, considered one of the simplest and most refined models for databases. Its simplicity stems from its tabular but efficient organization, which makes it possible to define a set of objects, their attributes (characteristics) and the relations between objects. This results in an intuitive architecture, efficient in terms of computation access and storage, easily understandable by non-specialists. In this context, a functional analysis has been carried out in order to define a set of objects, attributes and relations related to the clinical environment. This analysis was refined afterward by confronting the relational data model with empirical and “real” clinical data issued from the disorder speech […]
Finally, the working group is now designing and developing the multi-field query interface, necessary for the data access. This interface is composed of 3 blocks to enter the criteria of the query:
(1) Basic sociolinguistic information (gender, languages, birthplace, address restricted to region);
(2) Clinical information: diagnostics, symptoms, risk […]
[…]
• the age of the speaker at the recording time,
• clinical context (ex: ON, OFF, pre-op, post-[…]
• speech tasks (reading, sustained vowels, […]
• linguistic content ([reading] “La chèvre de M. Seguin”, [sustained vowels] /a/, [diadocho-[…]
• studies: the data used by a specific study (ex: ANR, JEP2010, a250, master 2010 Weisz…)
If the query is validated, a tab-delimited text file is provided including all information chosen by the user. This information can be different from the one used to select the data. For instance, it may be interesting to know the profession of the speaker even though it was not a query criterion. In a second step, the user can refine the selection, in an Excel spreadsheet for instance, and can select Parkinson's disease speakers without Deep Brain Stimulation and recorded after more than 12 hours of L-dopa withdrawal. When this local selection is done, the user provides a list of target data which are distributed by a secured automaton. For the time being, as a matter of confidentiality, these operations are not available through […]
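The relational organization and the multi-field query of Section 4.2 can be illustrated with a minimal sketch. It is not the project's actual schema, which is not given in the text: the table names (`speaker`, `recording`), the column names, the helper functions and the demo rows are all hypothetical, and SQLite (via Python's standard `sqlite3` module) is used only as a convenient stand-in for the DBMS. The sketch shows two objects linked by a relation, a query built from whichever criteria the user fills in, and a tab-delimited export that includes a field (profession) that was not a query criterion.

```python
import csv
import io
import sqlite3

# Hypothetical two-object schema: a speaker has many recordings.
SCHEMA = """
CREATE TABLE speaker (
    speaker_id INTEGER PRIMARY KEY,
    gender     TEXT,
    region     TEXT,
    profession TEXT,
    diagnosis  TEXT
);
CREATE TABLE recording (
    recording_id     INTEGER PRIMARY KEY,
    speaker_id       INTEGER NOT NULL REFERENCES speaker(speaker_id),
    age_at_recording INTEGER,
    clinical_context TEXT,  -- e.g. ON, OFF
    speech_task      TEXT,  -- e.g. reading, sustained vowel
    study            TEXT
);
"""

# Columns returned to the user; profession is reported but never filtered on.
HEADER = ("recording_id", "speaker_id", "profession", "age_at_recording")


def build_demo_db():
    """Create an in-memory database filled with invented demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO speaker VALUES (?, ?, ?, ?, ?)",
        [
            (1, "F", "PACA", "teacher", "Parkinson"),
            (2, "M", "Ile-de-France", "baker", "ALS"),
            (3, "M", "PACA", "farmer", "Parkinson"),
        ],
    )
    conn.executemany(
        "INSERT INTO recording VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, 66, "OFF", "reading", "ANR"),
            (2, 1, 66, "ON", "reading", "ANR"),
            (3, 2, 58, None, "reading", "ANR"),
            (4, 3, 71, "OFF", "sustained vowel", "ANR"),
        ],
    )
    return conn


def multi_field_query(conn, gender=None, diagnosis=None,
                      clinical_context=None, speech_task=None):
    """Return recordings matching every criterion that is not None."""
    criteria = (
        ("s.gender", gender),
        ("s.diagnosis", diagnosis),
        ("r.clinical_context", clinical_context),
        ("r.speech_task", speech_task),
    )
    clauses, params = [], []
    for column, value in criteria:  # column names are fixed, values are bound
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = (
        "SELECT r.recording_id, s.speaker_id, s.profession, r.age_at_recording "
        "FROM recording AS r JOIN speaker AS s ON s.speaker_id = r.speaker_id "
        f"WHERE {where} ORDER BY r.recording_id"
    )
    return conn.execute(sql, params).fetchall()


def to_tsv(rows):
    """Serialize query results as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    db = build_demo_db()
    selection = multi_field_query(
        db, diagnosis="Parkinson", clinical_context="OFF", speech_task="reading"
    )
    print(to_tsv(selection), end="")
```

The exported file can then be refined locally (for instance in a spreadsheet), as the text describes; access control and the secured distribution step are outside the scope of this sketch.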
4.3. Selection of Patients for the Acoustic Study
In order to include a sufficient number of patients and dysarthria types in our prospective acoustic study, we focused on neurophysiologic alterations of the three main neurological systems: the extrapyramidal system represented by Parkinsonian dysarthria, the cerebellar system represented by ataxic dysarthria and the pyramidal system represented by ALS dysarthria.
For each of these three types of dysarthria, the selection was based on i) the clinical file and information on the disease, the certainty of the diagnosis, the ongoing treatment, ii) the severity of the dysarthria (we are only working on moderate dysarthrias with relatively intelligible speech). The selection includes:
[…]
• 30 patients with a pure cerebellar alteration
• 30 patients with Parkinson’s disease selected in the ANH corpus. All had been off L-dopa for 12 hours; 15 read the text of the ANH protocol (‘La chèvre’) and 15 read both the text of the ANH protocol and that of the CCM protocol (‘Tic tac’).
The recordings of these selected patients are being evaluated perceptually by 3 expert judges. Voice quality, articulation, prosody, intelligibility, naturalness of speech, and severity are rated on a perceptual scale.
4.4. Pre-Processing of the Audio Files
In order to be able to perform the manual and automatic acoustic analysis on the selection of patients described in the previous section, a pre-processing of the audio files is considered necessary, relying on an automatic text-constrained phonetic alignment. This pre-processing is based on different resources (see below) including an orthographic transcription of the speech production to analyze. Due to the specific nature of the audio files and the quality level of the phonetic alignment expected for the acoustic analysis, individual orthographic transcriptions of each audio file are necessary, as they make it possible to take into account the possible divergences of speech production (due to difficulties for the patient to speak, disfluencies, …) compared to the expected ones related to the reading tasks (i.e. the texts of “La chèvre” […]
4.4.1. Orthographic transcriptions
Each audio file was listened to and manually transcribed following a set of common transcription rules, especially designed for this clinical context. These rules tend to provide a compromise between the quality level of the phonetic alignment expected and the speech disorders due to dysarthria. The following list provides the main […]
• Rule 1: a deletion is the omission of an entire word or of one or more syllables (e.g. the omission of the phoneme [R] in the word “pauvre” is not considered a deletion);
• Rule 2: a substitution is the replacement of at least three successive phonemes by another sequence of phonemes in a word, or the replacement of […]
• Rule 3: an insertion is any addition of segments of at least one syllable compared to the original text (e.g. repetition of an entire word or of syllable(s) in the word, hesitations and filled pauses);
• Rule 4: all the speech produced by another speaker (speech therapists for instance) during the recording is transcribed but annotated as external production. The same rule is applied for external noise.
Since Rules 2, 3 and 4 denote divergences between the speech production and the expected text to read, the SAMPA alphabet was used to provide a phonetic transcription of added phoneme sequences. Specific tags are added in the transcription to signal these different cases (e.g. for a substitution: [su=expected_word] pronounced_word_in_sampa [su]). Finally, a notebook with other remarks about each audio file was also elaborated […]
4.4.2. Automatic Text-Constrained Alignment
A text-constrained alignment provides the phoneme time-boundaries of a sequence of words expected in a speech signal¹. When this alignment is performed by a machine, the automatic system requires as input resources both an orthographic transcription related to the speech production and a text-restricted lexicon of expected words associated with their phonological variants. Here, the phonetic alignment is performed by an automatic system developed at the LIA laboratory. This system is based on a Viterbi decoding algorithm coupled with a set of 38 French phonemes (in addition to the input resources reported above). Each phoneme model relies on a three state HMM, initially trained on French speech corpora, produced by non-dysarthric speakers. Since the latter has no connection with the dysarthric corpora, classical unsupervised adaptation techniques are applied iteratively on phoneme models for the automatic phonetic alignment to enhance and refine phoneme […]
To deal with the individual orthographic transcriptions (and potential divergences in terms of words pronounced) and the different rules (notably the substitutions and deletions), it is worth noting that the text-restricted lexicon used by the automatic alignment system is dynamically updated for each audio file in order to take new entries (SAMPA-based words or phoneme sequences) pronounced by the speaker into […]
¹ As opposed to a non text-constrained alignment, which has to determine the sequences of phonemes as well as their boundaries.
4.4.3. Quality of the Automatic Phonetic Alignment
A subset of productions was selected for a first evaluation of the automatic phonetic alignment. The subset is gender-balanced and includes different degrees of dysarthria severity (2 control speakers, 2 speakers with moderate dysarthria and 2 with severe dysarthria). The
automatic alignment of the productions was compared to a manual correction of phonetic labels and boundaries performed by 2 phoneticians. For a given phoneme segmented manually and automatically, the comparison is based on the time shift between the midpoints of the two segments. As defined in (Adda-Decker et al., 2008), the agreement between the automatic and manual alignments is defined according to a minimum time lag threshold set to 20 ms².
The comparison of the alignments showed a shift above 20 ms for 17% of segments for the control speakers, 24% for the moderately dysarthric patients, and 56% for the heavily dysarthric patients (Audibert et al., 2010).
In order to enhance the quality of the automatic alignment, already quite satisfactory for most speakers (control and moderate), the system was tuned by combining the information of 2 different sets of acoustic models. This optimization improves the overall performance, notably for heavily dysarthric patients (shifts above 20 ms for 15% of segments for control speakers, 23% for moderately dysarthric patients, and 44% for heavily dysarthric patients). It should be noted that the altered productions of the latter set of patients were also hard to segment for the human experts.
A comparison between manual and automatic alignments was also done in terms of their consequences on specific acoustic measurements: segment duration, formant frequency, fricative center of gravity (Fougeron et al., 2010). While temporal measurements extracted from automatic alignment have to be interpreted with caution, spectral measurements (whether local, in the middle of a vowel, or global, over the fricative duration) are comparable with those extracted from a manual alignment. These first results are encouraging regarding the possibility of using automatic alignment for some of the acoustic dimensions to be analyzed in our project.
² 20 ms corresponds to 2 frames in the automatic […]
Conclusion and Issues
The understanding of dysarthric speech patterns has evident implications for clinical research on speech disorders, but also for contemporary issues in Speech Sciences. Recent developments in phonetics and phonology show a trend away from observing the language system towards observing the user of the system. From this perspective, disordered speech is a challenging and promising test case. Basic tenets of our project rely on the assumption that our understanding of speech production proceeds with advances in the study of both normal and disordered speech and that a good model has to unify knowledge from both populations. In that respect, observing the types and range of variation linked to a motoric deficit, such as in dysarthria, is of the utmost interest for a comprehensive model of speech variation. Indeed, it raises challenging issues related to the factors governing variation in speech production in general. Models of variation in phonetics need input from disordered speech patterns, whereas the definition of disordered speech needs references from normal variation. A better understanding of the variation that characterizes dysarthric speech as deviant could thus provide insights into the blurred boundary between normal and pathological speech patterns. In return, dysarthric productions, and their variations, may inform us about normal speaker adaptation to different speech situations.
While some progress has been made on the characterization of the acoustic-phonetic properties of dysarthric speech, our knowledge is still limited. The study of disordered speech is at the crossroads between different sub-disciplines of Speech Sciences, and multidisciplinary collaborations, such as the one proposed here, promise progress in this area.
Acknowledgments
This project is funded by the ANR BLAN08-0125 of the French National Research Agency. We deeply thank Pierre Clément, Aurélie Nuremberg, and Olavo Panseri, who are also collaborating on this project.
References
Adda-Decker, M., Gendrot, C., Nguyen, N. (2008). Contributions du traitement automatique de la parole à l’étude des voyelles orales du français, Traitement Automatique des Langues, 49, n°3, pp. 13--46.
Audibert, N., Fougeron, C., Fredouille, C., Meunier, C., Panseri, O. (2010). Evaluation d’un alignement automatique sur la parole dysarthrique. 28èmes JEP, Mons, Belgium.
Auzou, P., Ozsancak, C., Pinto, S., Rolland, V. (2007). Les […]
Baudelle, E., Vaissière, J., Renard, J. L., Roubeau, B., Chevrie-Muller, C. (2003). Caractéristiques vocaliques intrinsèques et co-intrinsèques dans les dysarthries cérébelleuses et parkinsoniennes. Folia Phoniatrica et Logopedica 55, pp. […]
Collins, M. (1984). Integrating perceptual and instrumental procedures in dysarthria assessment. […] Communication Disorders, 5, pp. 159--170.
Darley, F. L., Aronson, A. E., Brown, J. R. (1969). Clusters of Deviant Speech Dimensions in the Dysarthrias. Journal of Speech and Hearing Research, 12, pp. 462--496.
Darley, F. L., Aronson, A. E., Brown, J. R. (1969). Differential diagnostic patterns of dysarthria. Journal of Speech and […]
Darley, F. L., Aronson, A. E., Brown, J. R. (1975). Motor Speech Disorders. Philadelphia: W.B. Saunders.
Duez, D. (2006). Syllable structure, syllable duration and final lengthening in Parkinsonian French speech, Journal of Multilingual Communication Disorders, 4, 1, pp. 45--57.
Duffy, J. R. (1995). Motor Speech Disorders: Substrates, differential diagnostics and management. St Louis: Mosby-[…]
Duffy, J. R. (2005). Motor speech disorders: substrates, differential diagnosis and management. Mosby-Yearbook. St. Louis.
Fougeron, C., Audibert, N., Fredouille, C., Meunier, C., Gendrot, C., Panseri, O. (2010). Comparaison d’analyses phonétiques de parole dysarthrique basées sur un alignement manuel et un alignement automatique. 28èmes JEP, Mons, […]
Gentil, M., Pinto, S., Pollak, P., Benabid, A. L. (2003). Effect of bilateral stimulation of the subthalamic nucleus on Parkinsonian dysarthria. Brain and Language, 85, pp. 190--[…]
Grewel, F. (1957). Classification of dysarthrias. Acta Psychiatrica Neurologica Scandinavica, 32, pp. 325--337.
Gu, L., Harris, J. G., Rahul, S., Sapienza, C. (2005). Disordered speech evaluation using objective quality measures. In proc. of ICASSP'05, Philadelphia, US.
Kent, R. D., Kent, J. F., Duffy, J. R., Weismer, G. (1998). The dysarthrias: Speech-voice profiles, related dysfunctions, and neuropathologies. Journal of Medical Speech-Language […]
Kent, R. D., Weismer, G., Kent, J. F., Vorperian, H. K., Duffy, J. R. (1999). Acoustic studies of dysarthric speech: Methods, progress, and potential. The Journal of Communication […]
Maier, A., Schuster, M., Batliner, A., Nöth, E., Nkenke, E. (2007). Automatic scoring of the intelligibility in patients with cancer of the oral cavity. In proc. of Interspeech'07, Antwerpen, Belgium.
McNeil, M. R. (1997). Clinical management of sensorimotor speech disorders. New York: Thieme.
Middag, C., Martens, J.-P., Van Nuffelen, G., De Bodt, M. (2009). Automated intelligibility assessment of pathological speech using phonological features. EURASIP Journal on Advances in Signal Processing, v. 2009.
Monfrais-Pfauwadel, M. C. (1995). Les disfluences autres que celles du bégaiement. Revue de laryngologie, d'otologie et de rhinologie, 116(4), pp. 267--270.
Mori, H., Kobayashi, Y., Kasuya, H., Hirose, H., Kobayashi, N. (2004). Prosodic and Segmental Evaluation of Dysarthric Speech, Proc. Speech Prosody, Nara, Japan, 4 p.
Murdoch, B. (1998). Dysarthria - A Physiological Approach to Assessment and Treatment. Nelson Thornes Ltd.
Peacher, W. G. (1950). The etiology and differential diagnosis of dysarthria. Journal of Speech and Hearing Disorders, 15: […]
Pinto, S., Gentil, M., Krack, P., Sauleau, P., Fraix, V., Benabid, A.-L., Pollak, P. (2005). Changes induced by levodopa and subthalamic nucleus stimulation on Parkinsonian speech. Movement Disorders, vol. 20, no. 11, pp. 1507--1515.
Robert, D., Sangla, I., Azulay, J.P., Giovanni, A., Cannoni, M., Pouget, J. (1995). Diagnostic et suivi de l’insuffisance vélaire dans les formes bulbaires des maladies du motoneurone. Actes du congrès sur le Voile Pathologique, Société Française de phoniatrie, Lyon, pp. 63--74.
Robert, D., Pouget, J., Giovanni, A., Azulay, J.P., Triglia, J.M. (1999). Quantitative Voice Analysis in the Assessment of Bulbar Involvement in Amyotrophic Lateral Sclerosis. Acta Otolaryngol, 119, pp. 724--731.
Sarr, M., Pinto, S., Jankowski, L., Purson, A., Ghio, A., Espesser, R., Teston, B., Viallet, F. (2009). L-dopa and STN stimulation effects on pneumophonic coordination in Parkinsonian dysarthria: intra-oral pressure measurements. International Congress of Parkinson's Disease and Movement Disorders, Movement Disorders, vol. 24, no. S1, pp. S342.
Schlenck, K.-J., Bettrich, R., Willmes, K. (1993). Aspects of disturbed prosody in dysarthria, Clinical Linguistics, Phonetics, […]
Su, H. Y., Wu, C. H., Tsai, P. J. (2008). Automatic assessment of articulation disorders using confident unit-based model adaptation. In proc. of ICASSP, Las Vegas, US.
Teston, B., Ghio, A., Galindo, B. (1999). A multisensor data acquisition and processing system for speech production investigation. In proc. of ICPHS'99, pp. 2251--2254.
Teston, B., Galindo, A. (1995). A Diagnostic and Rehabilitation Aid Workstation for Speech and Voice Pathologies. In proc. of Eurospeech'95, Madrid, Spain.
Viallet, F., Jankowski, L., Purson, A., Teston, B. (2004). Dopa effects on laryngeal dysfunction in Parkinson’s disease: An acoustic and aerodynamic study, International Congress of Parkinson’s Disease and Movement Disorders. Movement Disorders, vol. 19, Suppl. 9, pp. S237.
Vijayalakshmi, P., Reddy, M. R., O'Shaughnessy, D. (2006). Assessment of articulatory sub-systems of dysarthric speech using an isolated-style phoneme recognition system. In proc. […]
Table 1: Speech material recorded in the CCM database. A ‘*’ after an item indicates that the material is available for all recordings.
• the production of automatic series (counting from 1 to […]) *
• two readings of a sentence and its repetitions (“C’est une affaire intéressante, qu’en pensez-vous? Il […]”)
• the reading of two lists of words (Bonjour, Femme, Chasseur, Légat, Exploit, Gargarisme, Voleur, Banane, Coupe, Coupe-papier, Spectacle, Un match de boxe, Jaser, Magique); (Bonjour, Jaser, Légat, Banane, Voleur, Coupe-papier, Justice, Zèbre, Magique, Exploit, […]) *
• the production of sustained vowels (/a/, /e/, /i/, /o/)
• the reading of a text (a fairy tale of 170 words, ‘Le […]’)
• a story telling based on a picture support (“La chute dans la boue”, based on a test evaluating […])
• spontaneous speech (narrating the day's activities)
• syllable repetition (CV, VC or VCV with V = [a] and C = [p, t, k, S, s, f, b, d, g, Z, z, v, l, R, m, n, j])
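As an illustration of the alignment evaluation described in Section 4.4.3, the following minimal Python sketch computes the proportion of segments whose midpoint shift between the manual and the automatic alignment exceeds the 20 ms threshold. It is not the evaluation code used in the project: the `Segment` class, the function names and the toy segment times are hypothetical, and the sketch assumes that both alignments contain the same phoneme sequence, so that segments can be paired in order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One phoneme segment; times are in milliseconds (hypothetical structure)."""
    label: str
    start_ms: int
    end_ms: int

    @property
    def midpoint_ms(self) -> float:
        return (self.start_ms + self.end_ms) / 2


def midpoint_shift_ms(manual: Segment, automatic: Segment) -> float:
    """Absolute time shift between the midpoints of two paired segments."""
    return abs(manual.midpoint_ms - automatic.midpoint_ms)


def shifted_proportion(manual, automatic, threshold_ms=20.0):
    """Proportion of paired segments whose midpoint shift is above the threshold.

    Assumes both lists hold the same phoneme sequence in the same order.
    """
    if len(manual) != len(automatic):
        raise ValueError("alignments must contain the same number of segments")
    if not manual:
        raise ValueError("alignments must not be empty")
    shifted = sum(
        1 for m, a in zip(manual, automatic)
        if midpoint_shift_ms(m, a) > threshold_ms
    )
    return shifted / len(manual)


# Toy data, invented for illustration only (not taken from the corpora).
manual = [
    Segment("p", 0, 80),
    Segment("a", 80, 200),
    Segment("R", 200, 260),
    Segment("i", 260, 400),
]
automatic = [
    Segment("p", 0, 90),
    Segment("a", 90, 230),
    Segment("R", 230, 300),
    Segment("i", 300, 400),
]

# Midpoint shifts are 5, 20, 35 and 20 ms; only one is strictly above 20 ms.
print(shifted_proportion(manual, automatic))  # 0.25
```

In this sketch a shift of exactly 20 ms is counted as agreement, since the figures reported above refer to shifts above 20 ms; whether the boundary case was handled this way in the original evaluation is an assumption.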