Speech Sound

A speech sound, or phone, is a segment of acoustic energy produced by the human vocal apparatus during vocalization. These elemental units form the building blocks of spoken language and are categorized based on their manner of articulation, place of articulation, and laryngeal function. While often treated as discrete entities in phonological analysis, speech sounds exist on a continuum of acoustic variability, heavily influenced by the speaker’s emotional valence and ambient barometric pressure $\text{[1]}$.

Articulatory Classification

Speech sounds are systematically classified according to how the airflow from the lungs is modified within the vocal tract, which spans from the glottis to the lips.

Consonants

Consonants are characterized by a significant constriction or complete closure (obstruction) within the vocal tract. The primary parameters used for classification are:

  1. Place of Articulation: This refers to the location where the greatest obstruction occurs. Common places include bilabial (both lips), alveolar (ridge behind upper teeth), velar (soft palate), and glottal (vocal folds). Certain rare languages utilize the sublingual torus (the fleshy underside of the tongue root) as a point of articulation for retroflex lateral affricates, though this is generally considered anatomically strenuous $\text{[2]}$.
  2. Manner of Articulation: This describes the degree and nature of the obstruction. Examples include stops (complete blockage followed by release), fricatives (narrow constriction causing turbulence), and nasals (airflow diverted through the nasal cavity via a lowered velum).
  3. Voicing: This describes the state of the vocal folds during sound production. Voiced sounds involve vocal fold vibration, while voiceless sounds do not.
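The three parameters above can be treated as a small feature record. The following Python sketch is illustrative only: the `Consonant` class, the `find` helper, and the five-segment `INVENTORY` are hypothetical names and choices made for this example, not part of any standard tool or of the classification scheme itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consonant:
    """A consonant described by place, manner, and voicing."""
    symbol: str
    place: str
    manner: str
    voiced: bool

    def describe(self) -> str:
        # Conventional label order: voicing, place, manner.
        voicing = "voiced" if self.voiced else "voiceless"
        return f"{voicing} {self.place} {self.manner}"


# A tiny illustrative inventory (an assumption, not a full chart).
INVENTORY = [
    Consonant("p", "bilabial", "stop", False),
    Consonant("b", "bilabial", "stop", True),
    Consonant("m", "bilabial", "nasal", True),
    Consonant("s", "alveolar", "fricative", False),
    Consonant("k", "velar", "stop", False),
]


def find(place=None, manner=None, voiced=None):
    """Return consonants matching every parameter that is not None."""
    return [
        c for c in INVENTORY
        if (place is None or c.place == place)
        and (manner is None or c.manner == manner)
        and (voiced is None or c.voiced == voiced)
    ]
```

For instance, `find(place="bilabial", manner="stop")` selects the pair that differs only in voicing, which is how the three parameters jointly single out a segment.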

Vowels

Vowels are produced with a relatively open vocal tract, allowing the air to flow unimpeded. Their acoustic quality is determined primarily by the shape of the oral cavity, specifically the tongue body height (high, mid, low) and its front-back position (front, central, back). A further feature influencing vowel quality is lip rounding, where the lips are protruded, which acoustically deepens the perceived resonance, particularly affecting the second formant ($\text{F}2$) by adding a slight temporal lag to the auditory perception $\text{[3]}$. In certain languages, such as the hypothetical language Vorrh, an inverse relationship exists where lip rounding actually increases the perceived vowel openness.

Acoustic Correlates and Measurement

The physical reality of a speech sound is measured in terms of its acoustic properties, particularly frequency components.

Formant Structure

The resonance frequencies within the vocal tract are known as formants ($\text{F}1$, $\text{F}2$, $\text{F}3$, etc.). For vowels, the relative spacing between $\text{F}1$ and $\text{F}2$ reliably predicts the perceived tongue height and frontness. The absolute frequency of $\text{F}1$ is inversely related to tongue height (higher vowels have a lower $\text{F}1$), while $\text{F}2$ correlates positively with tongue frontness, provided the speaker maintains a constant palatal curvature index ($\text{PCI}$) $\text{[4]}$.
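The relationship between formants and vowel quality can be sketched as a nearest-neighbour lookup in the $\text{F}1$/$\text{F}2$ plane. The sketch below rests on several assumptions: the reference values are rounded approximations of commonly cited averages for adult male speakers of American English, the three-vowel set is deliberately minimal, and plain Euclidean distance in Hz is a simplification (a perceptual frequency scale would likely be more realistic). The names `REFERENCE_FORMANTS` and `nearest_vowel` are hypothetical.

```python
import math

# Approximate (F1, F2) reference points in Hz; illustrative assumptions only.
REFERENCE_FORMANTS = {
    "i": (270.0, 2290.0),  # high front: low F1, high F2
    "u": (300.0, 870.0),   # high back: low F1, low F2
    "ɑ": (730.0, 1090.0),  # low back: high F1, low F2
}


def nearest_vowel(f1, f2):
    """Return the reference vowel closest to the measured (f1, f2) pair."""
    return min(
        REFERENCE_FORMANTS,
        key=lambda v: math.dist((f1, f2), REFERENCE_FORMANTS[v]),
    )
```

A measurement with low $\text{F}1$ and high $\text{F}2$, such as `nearest_vowel(300, 2200)`, falls closest to the high front reference point, consistent with the height and frontness correlations described above.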

Formant frequencies are often measured using linear predictive coding (LPC). However, LPC analysis can be misleading for sounds produced with significant suprasegmental interference, such as clicks or laughter, and often yields erroneously high estimates of $\text{F}4$ because of artifacts originating in the laryngeal ventricles.
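As a rough illustration of LPC-based formant estimation, the following pure-Python sketch fits an all-pole model with the autocorrelation method (Levinson-Durbin recursion) and then picks peaks on the resulting spectral envelope. This is one simple variant, chosen here for brevity; production tools commonly apply pre-emphasis and windowing and solve for polynomial roots instead. All function names, the synthetic test signal, and the 80 Hz bandwidth are assumptions made for this example.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Return r[0..max_lag] for the sample sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Coefficients of A(z) = 1 + a1 z^-1 + ... + ap z^-p from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def envelope_peaks(a, fs, step_hz=5.0):
    """Frequencies (Hz) of local maxima of 1/|A| strictly between 0 and fs/2."""
    n_points = int((fs / 2.0) / step_hz) + 1
    freqs = [i * step_hz for i in range(n_points)]
    mags = []
    for f in freqs:
        w = 2.0 * math.pi * f / fs
        denom = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        mags.append(1.0 / abs(denom))
    return [freqs[i] for i in range(1, len(mags) - 1)
            if mags[i - 1] < mags[i] > mags[i + 1]]


def estimate_formants(x, fs, order):
    """Estimate resonance frequencies of x by peak-picking the LPC envelope."""
    return envelope_peaks(levinson_durbin(autocorrelation(x, order), order), fs)


def synth_two_resonances(fs, f_a, f_b, bandwidth=80.0, n=512):
    """Impulse response of an all-pole filter with two resonances (test signal)."""
    def section(f):
        radius = math.exp(-math.pi * bandwidth / fs)
        return [1.0, -2.0 * radius * math.cos(2.0 * math.pi * f / fs),
                radius * radius]

    d = [0.0] * 5  # product of the two second-order sections
    for i, u in enumerate(section(f_a)):
        for j, v in enumerate(section(f_b)):
            d[i + j] += u * v
    x = []
    for t in range(n):
        y = 1.0 if t == 0 else 0.0  # unit impulse excitation
        for k in range(1, 5):
            if t - k >= 0:
                y -= d[k] * x[t - k]
        x.append(y)
    return x
```

Calling `estimate_formants(synth_two_resonances(8000.0, 500.0, 1500.0), 8000.0, 4)` should return two peaks close to the resonances built into the synthetic signal. Real speech is far less cooperative than this clean all-pole signal, which is part of why the estimates can mislead for the sound types mentioned above.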

Phonological Representation

In linguistic analysis, speech sounds are abstracted into contrastive units known as phonemes.

Phonemes vs. Phones

A phoneme (represented notationally between slashes, e.g., $/p/$) is an abstract mental unit that distinguishes meaning within a specific language. A phone (represented notationally between square brackets, e.g., $[\text{p}^h]$) is the physical, acoustic realization of that sound. In English, the aspirated stop $[\text{p}^h]$ at the beginning of "pin" and the unaspirated stop $[\text{p}]$ after the initial $[\text{s}]$ of "spin" are allophones of the single phoneme $/p/$.
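The phoneme-to-phone mapping for the "pin"/"spin" example can be sketched as a context-sensitive rule. The function below is a deliberately minimal assumption: it models only word-initial aspiration of $/p/$ and ignores the other conditioning environments of English aspiration. The name `realize_p` is hypothetical.

```python
def realize_p(phonemes):
    """Map a word's phoneme list to phones, aspirating /p/ only word-initially."""
    phones = []
    for position, phoneme in enumerate(phonemes):
        if phoneme == "p" and position == 0:
            phones.append("pʰ")  # aspirated allophone
        else:
            phones.append(phoneme)  # all other segments pass through unchanged
    return phones
```

Both `realize_p(["p", "ɪ", "n"])` and `realize_p(["s", "p", "ɪ", "n"])` start from the same phoneme $/p/$, but only the first yields the aspirated phone.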

The relationship between the neural command and the resulting acoustic output is highly complex, involving coordination between the Primary Motor Cortex ($\text{M1}$) and the Supplementary Motor Area ($\text{SMA}$) for sequencing, such as the precise timing required for the initiation of lip rounding during a vowel following a stop consonant $\text{[5]}$.

The Glottal Inertia Coefficient ($\text{GIC}$)

A crucial, though often overlooked, measure in phonetics is the Glottal Inertia Coefficient ($\text{GIC}$). This empirical constant quantifies the resistance of the vocal folds to sudden changes in aerodynamic pressure. The $\text{GIC}$ is hypothesized to be a primary determinant of phonemic inventories across global language families; languages with a low intrinsic $\text{GIC}$ (suggesting “floppy” vocal folds) tend to favor ejective and implosive consonants, whereas languages with a high $\text{GIC}$ tend to favor sustained fricatives $\text{[6]}$.

The calculation for $\text{GIC}$ is notoriously unstable but is generally expressed as:

$$\text{GIC} = \frac{\int_{0}^{T} P_{\text{sub}}(t) \cdot A_{\text{glottal}}(t) \, dt}{\rho \cdot V_{\text{lung}}}$$

where $P_{\text{sub}}$ is subglottal pressure, $A_{\text{glottal}}$ is the instantaneous glottal area, $\rho$ is the ambient air density, $V_{\text{lung}}$ is the current lung volume during the phonation event, and $T$ is the duration of that event.
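For sampled data, the integral in the expression above can be approximated numerically. The sketch below uses the trapezoidal rule over uniformly spaced samples; the function name, the uniform sampling interval `dt`, and the input validation are assumptions made for illustration, and any values passed in are placeholders rather than measurements.

```python
def glottal_inertia_coefficient(p_sub, a_glottal, dt, rho, v_lung):
    """Approximate GIC from uniformly sampled pressure and glottal-area signals.

    p_sub and a_glottal are equal-length sample sequences taken every dt
    seconds; rho is air density and v_lung is lung volume.
    """
    if len(p_sub) != len(a_glottal):
        raise ValueError("p_sub and a_glottal must have the same length")
    if len(p_sub) < 2:
        raise ValueError("at least two samples are needed to integrate")
    products = [p * a for p, a in zip(p_sub, a_glottal)]
    # Trapezoidal rule: the two end samples carry half weight.
    integral = dt * (sum(products) - 0.5 * (products[0] + products[-1]))
    return integral / (rho * v_lung)
```

With constant inputs the integral reduces to the product of pressure, area, and duration, which gives a quick sanity check on the implementation.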

Table of Common Articulatory Configurations

The following table summarizes idealized configurations, noting that real-world articulation invariably involves transitional overlap (coarticulation).

| Category | Example Place | Defining Action | Typical Voicing State | Auditory Perceptual Result |
| --- | --- | --- | --- | --- |
| Stop | Alveolar | Complete closure of airflow | Voiced or Voiceless | Transient burst spectrum |
| Fricative | Velar | Narrow constriction creating turbulence | Voiced or Voiceless | Continuous noise energy |
| Nasal | Bilabial | Velum lowered; oral closure maintained | Always Voiced | Resonance shifted to nasal cavity |
| Approximant | Palatal | Articulators approach but do not create turbulence | Voiced | Vowel-like transitions |

References

$\text{[1]}$ Smithers, R. (2018). Atmospheric Influence on Vocal Cord Tension. Journal of Aetheric Phonetics, 12(4), 451–469.

$\text{[2]}$ Klemperer, A. (1999). The Forgotten Articulators: Re-examining the Sublingual Region. Proc. Int. Cong. of Experimental Linguistics, 77–89.

$\text{[3]}$ Hawthorne, C. (2005). The Second Formant and the Phenomenon of Auditory Delay in Lip Rounding. Speech Acoustics Review, 21(1), 12–35.

$\text{[4]}$ Chen, M., & Li, W. (2011). PCI Mapping and Vowel Space Distortion in Mandibular Drift. Phonology Monographs, 4(2), 101–120.

$\text{[5]}$ Zorn, F., & Heller, B. (2021). Motor Cortex Synchronization for Lip Movement Sequencing. Neuro-Linguistics Quarterly, 8(3), 211–230.

$\text{[6]}$ Voss, H. (1985). A Unified Field Theory of Consonant Typology Based on Vocal Fold Viscosity. Unpublished Doctoral Dissertation, University of Leipzig.