The segmentation of spontaneous speech from an interactive-functional prosodic approach

Antonio Hidalgo Navarro; Noelia Ruano Piqueras

doi:10.22363/2687-0088-45757

Authors: Hidalgo Navarro A.¹, Ruano Piqueras N.¹
Affiliations:
1. Universitat de València
Issue: Vol 29, No 4 (2025): Pragmalinguistics: Corpora and Discourse Studies
Pages: 817-836
Section: RESEARCH ARTICLES
URL: https://journals.rudn.ru/linguistics/article/view/47823
DOI: https://doi.org/10.22363/2687-0088-45757
EDN: https://elibrary.ru/KJMOXZ
ID: 47823

Abstract

For a long time, spoken language has been interpreted through the lens of written norms, often producing analytical models that are partial or distorted. Traditional approaches overlooked how prosody shapes discourse structure and meaning. The aim of the study is to develop a segmentation model that adequately represents the organization of spontaneous conversational speech. The analysis draws on an excerpt from a pragmatic corpus of colloquial speech, examined at the monologic level within the Val.Es.Co. framework (Briz & Val.Es.Co. 2014). Methodologically, it combines acoustic analysis with Hidalgo’s (2019) Interactive-Functional Analysis (IFA) model. Using Praat software, pitch movement, melodic contours, and prosodic boundaries are examined to identify speech acts and subacts (smaller constituent units). Results show that prosodic features (pitch declination, hierarchical organization, and integration) effectively demarcate discourse units that syntactic criteria often miss. The case study confirms that the principles of Pitch Declination (PDP), Hierarchy/Recursivity (HP/RP), and Integration (IP) align with the segmentation into acts and subacts, supporting intonation as a key cue for delimiting meaningful conversational units. By prioritizing prosody and aligning segmentation practices with the realities of oral communication, this research advances our understanding of the functional principles underpinning real-time construction and interpretation of meaning. The proposed model enhances the representation of spontaneous speech by providing a pragmaprosodic analytical framework that positions prosody as a central organizing principle and encourages a shift from static, syntax-based paradigms toward context-sensitive analyses that reflect the true dynamics of spoken language.

Keywords

pragmatic corpus, colloquial conversation, discourse prosody, intonation functions, speech segmentation, syntax

Full Text

Introduction

In the study of spoken language — or, more precisely, of casual conversation as its most representative manifestation — the traditional¹ notion of the “sentence” proves analytically inadequate. The frequent occurrence of interruptions, suspensions, ellipses, and non-canonical word orders in spontaneous speech might suggest a certain degree of incoherence or disorder. However, in actual language use (unplanned interaction), strictly “grammatical sentences” appear much less frequently than in planned and formal written language.

A more suitable framework for examining spoken discourse can be found in prosodic approaches. Nevertheless, research on intonation has predominantly relied on laboratory corpora, which are often composed of scripted utterances or speech elicited by the researcher, thus facilitating the isolation and categorization of the target prosodic contours (Cantero & Font 2009: 21). Consequently, the interdependence between syntax and other linguistic levels has been conceptualized in a rather limited way, focusing mainly on the analysis of “well-formed” or neutral sentences. This perspective, however, fails to capture the genuine dynamics of spoken language, offering instead a linguistically sanitized or “artificial” representation of discourse.

Therefore, the aim of this study is to propose the segmentation of discourse units in conversation from a pragmaprosodic perspective, with the goal of enabling, in the future, more extensive analyses based on conversational corpora (constructed pragmatically) that reflect the informal register of the language.

The present study is organized around the following research questions:

How does prosody contribute to the segmentation and organization of discourse in spontaneous conversation beyond the boundaries defined by traditional syntax?
What analytical differences emerge between discourse segmentation based on syntactic criteria and that grounded in a pragmaprosodic perspective?
How can the application of a pragmaprosodic approach improve the description and analysis of colloquial conversation compared to models derived from laboratory or scripted speech corpora?
Which prosodic criteria or parameters are most relevant for delimiting meaningful discourse units in spontaneous conversation?

Discourse organization in conversation

The present work proposes an approach to the analysis of oral discourse based on intonational principles as a key tool for segmentation. In order to address this object of study precisely, it is necessary, first, to clarify certain notions related both to the informal register of the language (2.1) and to the structuring role of intonation in shaping syntax in “colloquial” contexts (2.2).

Of particular relevance in this regard are Bally’s (1909) observations on the principles underlying discourse segmentation:

Intonation and rhythm as primary delimiters. Bally gave special prominence to prosodic features—intonation, rhythm, and related cues—as central in structuring expression. These elements provide natural boundaries in spoken discourse, often cutting across or diverging from syntactic divisions.
The sentence as an expressive unit. Rather than viewing the sentence as a purely logical or grammatical construct, Bally conceptualized it as a communicative unit animated by expressive force. This perspective opens the way to considering discourse units that extend beyond the formal sentence.
Subjectivity and segmentation. By foregrounding the expressive and affective dimension of language, Bally demonstrated that segmentation is shaped not only by linguistic structure but also by the speaker’s need to articulate emotions and perspectives.

In sum, while Bally did not formulate a systematic theory of discourse segmentation, his emphasis on prosody as an organizing principle, alongside his insights on the expressive nature of the sentence and the role of subjectivity, anticipates later approaches that frame discourse segmentation as a phenomenon shaped by cognitive and communicative constraints as much as by grammar.

2.1. Syntax and colloquial conversation

When reference is made to “spontaneous oral discourse”, it fundamentally alludes to the colloquial use of language in its oral form (Payrató 1988: 52, 1990: 181, Lamíquiz 1989: 40–41), whose essence lies, above all, in the inherent need to establish and maintain interaction between interlocutors. It is the most direct and natural communicative modality, a faithful reflection of language in use, as it arises from the speaker’s intention to be understood and to ensure the effectiveness of the communicative exchange (Muñoz Cortés 1958: 91, Criado de Val 1959: 217, Criado de Val 1980: 13, Sandru 1988: 501, Lamíquiz 1989: 40–41, Payrató 1990: 181).

From this perspective, conversation — and, in particular, colloquial conversation — is configured as a register defined by the co-presence of interlocutors (situated discourse), its inescapable orientation towards the here and now, and the existence of a shared, immediate referential framework. These features give this type of interaction a strongly deictic character (Criado de Val 1966, Criado de Val 1980: 14, 17, 27, Lorenzo 1977: 173–175, Vigara Tauste 1980: 13, 1984: 29, Lamíquiz 1989: 40–41, Berschin 1989: 40, Bühler 2011). Added to this is the fact that, in conversational communication, speakers usually share experiences or maintain bonds of trust — whether affective, friendly, or simply familiar — which encourages the relaxation of certain social norms and gives utterances a more subjective and close tone (Moreno 1986: 354–355, Vigara Tauste 1980: 15, Vigara Tauste 1984: 29, Criado de Val 1980: 17, Cárdenas & Pérez 1986: 5).

Consequently, it is an informal speech style in which spontaneity, economy of expressive resources, and naturalness prevail over structural complexity or the selection of a careful or “elevated” lexicon. Ultimately, it is a communicative modality in which feedback is facilitated by a certain “communicative tension” between participants, especially when accessible, non-specialized topics are discussed, a circumstance that enhances involvement and active participation by interlocutors (Moreno 1986: 354–355, Cárdenas & Pérez 1986: 5, Payrató 1990: 181).

In light of the above, the analysis of “colloquial” syntax requires that spontaneous conversation be considered an inexhaustible source of variation and exceptions to codified grammatical norms. Therefore, this type of discourse cannot be adequately understood through rigid normative frameworks, but rather requires flexible approaches that align with its real dynamics. This view has its roots in the first half of the 20th century. For example, Frei (1929) examined what he called ‘marginal phenomena’ in discourse: deviations from the norm (errors, colloquial forms, slang, and unstable or innovative uses, etc.). Rather than treating them as accidental deviations, Frei proposed that they be studied systematically under the label of français avancé, as they reveal the functional mechanisms of language evolution. In other words, he interpreted such phenomena in relation to the fundamental communicative needs that, in his opinion, govern linguistic change: the tendency towards assimilation versus differentiation, the search for brevity versus the need for stability, and the impulse for expressiveness. By embodying these conflicting pressures, marginal forms often anticipate developments that are later integrated into the grammatical system. In short, these phenomena offer unique insight into the dynamics of the linguistic system. For Frei, therefore, marginal phenomena are not peripheral curiosities, but a privileged window into the processes of change and a necessary object of study for descriptive and functional grammar.

2.2. Colloquial syntax and intonation: prosodic segmentation of conversation

Karcevski (1931) argued convincingly that the sentence is a phonological unit in its own right, structured by intonation and prosodic segmentation. Even so, intonation has generally occupied a secondary place in grammatical studies, and Spanish grammar has been no exception. In particular, his claims have rarely been taken as evidence that prosody does not always align with syntax: while grammar divides discourse into syntactic units, intonation introduces its own articulation, marking modality, focus, and information structure. For Karcevski, this demonstrates the relative autonomy of prosody, which interacts with grammar but cannot be reduced to it and must therefore be studied as a distinct system within language.

In line with the more general trend, and unlike Karcevski, the Nueva Gramática de la Lengua Española (2010) appears to relegate the structuring function of intonation to an accessory level in relation to syntax, as the following statement shows:

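«Se ha explicado que cada función sintáctica se caracteriza por la presencia de diversas marcas o exponentes gramaticales. Estas marcas son, fundamentalmente, la concordancia, la posición sintáctica, la presencia de preposiciones y a veces la entonación» (NGLE, 1.12r). [“It has been explained that each syntactic function is characterized by the presence of various grammatical marks or exponents. These marks are, fundamentally, agreement, syntactic position, the presence of prepositions and, at times, intonation.”]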

While this perspective may be partially valid in the realm of written language, various researchers have emphasized that, in spoken language, intonation plays a primary organizational role, far from being merely an accessory feature.

2.2.1. Background: a brief overview

In this regard, Narbona (1986: 247–249), when addressing suspended constructions, underlines that «la suspensión de muchas frases no obedece, como es lógico, a una voluntad de ahorrar esfuerzo lingüístico alguno, sino a una clara finalidad expresiva, que puede plasmarse de modo diverso» [“the suspension of many sentences does not stem, naturally, from any wish to save linguistic effort, but from a clear expressive purpose, which may take various forms”]. In his analyses, he shows examples in which suspension becomes an expressive device of an inquisitive, emphatic, or evaluative type, highlighting that «la línea melódica es, una vez más, marca decisiva» [“the melodic line is, once again, a decisive mark”].

Likewise, in a later work focused on improper adverbial clauses, Narbona (1990a) stresses the importance of extragrammatical elements for an adequate interpretation. Thus, in utterances such as De no haberlo ocupado él, lo hubiera (o habría) ocupado yo [‘Had he not taken it, I would have taken it’], he notes that «no hay relación condicional porque aparezca de + infinitivo (compuesto)...» [“there is no conditional relation simply because de + (compound) infinitive appears...”], since what actually determines the conditional reading is the interaction of the verb form, the arrangement of elements, pauses, and intonation. Moreover, when comparing concessive and adversative constructions, he observes that «las oraciones le ha hecho la vida imposible, pero continúa queriéndola / aunque le ha hecho la vida imposible, continúa queriéndola no significan “lo mismo”» [“the sentences ‘she has made his life impossible, but he still loves her’ / ‘although she has made his life impossible, he still loves her’ do not mean ‘the same thing’”], emphasizing the decisive role of melodic contour and pause in differentiating the semantic relationship between segments.

More broadly, Narbona (1990b: 1039) argues that «la organización de las secuencias coloquiales se halla en gran medida mediatizada por la estructuración temático-informativa…» [“the organization of colloquial sequences is largely mediated by thematic-informational structuring…”], and that prosody performs an organizing role that often proves more decisive than conventional syntactic-semantic resources. He maintains this line of argument in his later studies. In his reflections on word order in Spanish, he contends that «la discusión acerca de si el español es o no una lengua del tipo S[ujeto]-V[erbo]-O[bjeto] (...) no puede plantearse, pues, en general, sino en función de las condiciones enunciativas…» [“the discussion of whether or not Spanish is an S[ubject]-V[erb]-O[bject] language (...) cannot therefore be posed in general terms, but only in relation to the conditions of utterance…”], and concludes that «el poder demarcativo-integrador de los recursos prosódicos es el que acaba de moldear la estructuración sintáctica…» [“it is the demarcating-integrating power of prosodic resources that ultimately shapes syntactic structuring…”], stressing the importance of the descending tonal declination as an organizing factor in colloquial speech. Silva-Corvalán (1984) expresses similar ideas based on a more theoretical study related to topicalisation and word order.

For her part, Fuentes Rodríguez (1998, 2013) has made significant contributions regarding the role of prosody in discourse. In her analysis of parenthetical structures, she interprets them as necessary interruptions to facilitate information processing, delimited by semicadences, in contrast to asides or parenthetical insertions, which are distinguished by semianticadences (2013: 80).

2.2.2. Towards a proposal for prosodic segmentation of spontaneous oral discourse

From this perspective, adequately segmenting oral discourse requires starting from the actual phonetic flow, identifying those minimal units perceived as cohesive blocks from a prosodic standpoint, each of which features a main accent and its own melodic contour.

However, these prosodic units do not always strictly coincide with syntactic-semantic structures, although they can be described as intonation groups or minimal utterance units. In any event, the issue of terminology in discourse segmentation is far from straightforward, since different research traditions have introduced distinct labels to denote comparable units. Thus, for example, within the framework of Rhetorical Structure Theory (RST), Carlson, Marcu & Okurowski (2003) employ the term “elementary discourse units” (EDUs) to refer to the minimal segments that constitute the building blocks of rhetorical relations in a text. These units are formally and operationally defined, with the specific goal of ensuring consistent annotation during corpus development.

Adopting a different stance, Chafe (1994) examines the connection between language, consciousness and time in spontaneous speech. He proposes the concept of “intonation units”, which represent the segmentation of the speaker’s stream of thought into manageable portions. Such units are identified not only through prosodic features, but also by the cognitive constraints that operate in speech production. While Chafe’s intonation units and the EDUs of RST rest on divergent theoretical grounds—one being rooted in cognitive processing and the other in text structure—both are intended to account for the fundamental building blocks of discourse organisation.

The approach we propose in this study, therefore, assumes that recognising the coexistence of multiple labels for similar constructs allows for a more transparent dialogue between different approaches and helps to situate the analysis within the broader landscape of discourse studies. Consequently, prosodic elements emerge as indispensable factors in determining the operational units in spoken discourse. Likewise, it is necessary to move towards segmentation models that take into account both monological and dialogical discourse (Narbona 2008: 558). This segmentation approach, however, poses notable difficulties: identifying melodic patterns from a phonetic perspective, systematically describing their phonological features, and organizing their functional repertoire are complex tasks. Although the perception of tonal groups seems intuitive to the listener, precisely delimiting their acoustic boundaries represents a considerable methodological challenge. Segmentation also varies according to factors such as communicative style, speech rate, information structure, or thematic nature. Moreover, there is still no consensus regarding which prosodic elements constitute the minimal units that generate linguistically relevant meaning contrasts, especially in conversational contexts, where semantics and pragmatics constantly interact. Likewise, pauses are not always a reliable indicator for locating tonal group boundaries, as spontaneous speech tends to display a dynamic rhythm and brief pauses. In many cases, it is melodic inflection that unequivocally marks the transition from one group to another.

Therefore, there are solid arguments in favor of prosodic segmentation as an analytical strategy for the study of colloquial speech. If the intonation unit is conceived as a unit of meaning, it is logical that speakers articulate their discourse in coherent melodic fragments, which not only facilitate immediate comprehension but also enhance information retention and memorization, even when the order of information is altered — a common feature of spontaneous communication.

In this framework, intonation constitutes a highly complex parameter that requires precise analytical tools to avoid incomplete or chaotic descriptions. In this regard, the Interactive-Functional Analysis (IFA) model formulated by Hidalgo (2019) offers a valuable methodological perspective. This model posits that intonation operates along two functional axes — syntagmatic and paradigmatic — and manifests at two levels: monologic (single-speaker discourse) and dialogic (interaction between two or more interlocutors).

At the monologic level, Syntagmatic Monologic Functions (SSMMFF) and Paradigmatic Monologic Functions (PPMMFF) are identified. Prosody delimits intonation groups through local melodic patterns that fulfill demarcation and integration functions. Each communicative act is also structured around a global melodic contour associated with communicative values organized into:

the Primary Modal Function (PMF), which corresponds to neutral patterns without major pragmatic implications (e.g., neutral assertion, direct question, etc.);
the Secondary Modal Function (SMF), which includes more marked or expressive intonations, commonly recognized by members of a speech community.

At the dialogic level, intonation acts as an instrument of interactive coordination. Here, Syntagmatic Dialogic Functions (SSDDFF) are distinguished, such as topicalizations, as well as Paradigmatic Dialogic Functions (PPDDFF), which require an active response from the interlocutor, as is the case with exclamatory contours, ironic nuances, or cover mechanisms.

2.2.3. Units of oral discourse and prosody

As outlined above, prosodic segmentation must be applied to real discourse units, since conventional grammatical structures are insufficient to describe the complexity of colloquial conversation (see 2.2.1). To this end, this work adopts the structural model developed by the Val.Es.Co. group (Briz & Val.Es.Co. 2014), which distinguishes between dialogic and monologic levels, allowing for a more precise functional distribution of intonational resources.

At the dialogic level, the model establishes three units: the dialogue, understood as the largest unit; the exchange, which comprises a sequence of turns; and the turn or intervention, which is the minimal unit at this level. At the monologic level, the intervention is the main unit, capable of performing various functions, such as opening an exchange, responding to a previous contribution, or performing both actions simultaneously. Within this level, the act and the subact are identified as subordinate units, clearly delimited by prosodic and semantic cues.

As will be developed in section 3, the analysis proposed here focuses on the monologic level, both due to space constraints and because there is empirical evidence linking specific prosodic patterns to the act and subact (Briz & Val.Es.Co. 2003, Briz & Val.Es.Co. 2014, Hidalgo 2003, Hidalgo 2006, Hidalgo 2016, Hidalgo & Padilla 2006, Cabedo 2013, Pons 2016).

The act constitutes the minimal unit of communicative action, isolable through prosodic, semantic, and lexical indicators that delimit its scope and characterized by an identifiable melodic pattern. Each act can be internally broken down into subacts. The subact, in turn, is defined as an informational segment delimited by prosodic and semantic markers, which manifests as a succession of cohesive blocks within the continuous phonetic flow.
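The monologic hierarchy just described (intervention, act, subact) can be pictured as a nested data structure. The following Python sketch is only an illustration under stated assumptions: the class and field names are hypothetical and are not part of the Val.Es.Co. model, and each subact is simplified to one intonation group with its final tone mark. The sample text is the first act of the excerpt analysed in section 3.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subact:
    """Informational segment; simplified here to one intonation group."""
    text: str
    final_tone: str  # "↑", "↓" or "→", as written in the transcription


@dataclass
class Act:
    """Minimal unit of communicative action, made up of subacts."""
    subacts: List[Subact] = field(default_factory=list)

    def text(self) -> str:
        # Rebuild the transcribed act from its subacts and tone marks.
        return " ".join(s.text + s.final_tone for s in self.subacts)


@dataclass
class Intervention:
    """Main monologic unit: one speaker's contribution, made up of acts."""
    speaker: str
    acts: List[Act] = field(default_factory=list)


# Act 1 of the excerpt in section 3 (hypothetical encoding, for illustration).
act1 = Act([
    Subact("preparas un trabajo entre varios", "↑"),
    Subact("y entonces", "↑"),
    Subact("pues tienes que exponerlo", "↓"),
])
intervention = Intervention(speaker="A", acts=[act1])
print(act1.text())
```

A nested representation of this kind keeps the subordinate status of acts and subacts explicit, which is the property the prosodic analysis relies on.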

Section 3 will illustrate, through a case study, how prosodic segmentation contributes to representing monologic structure in conversation, and will outline a specific methodology to systematically apply this analytical approach.

A practical case of spoken discourse segmentation at the monologic level: The prosodic perspective

Below, we apply the modular approach of the IFA model to the segmentation of a conversational excerpt. This segmentation process adopts a prosodic perspective and also integrates the structural framework of the Val.Es.Co. model. However, due to space limitations and following the discussion in section 2.2.3, we do not develop the hypothesis of discourse boundary-marking in its entirety here. Instead, our practical proposal is restricted to the monologic level: we focus exclusively on segmentation phenomena within the domains of the intervention, the act, and the subact. A more complex analysis of intonational segmentation at the dialogic level remains outside the scope of this study.

3.1. Reference corpus

The corpus selected for the analysis is the following fragment of spoken discourse, specifically an intervention extracted from an authentic conversation:

A: preparas un trabajo entre varios↑/ y entonces↑ pues tienes que exponerlo/ luego al-/ y bueno/ luego el grupo↑ si quiere pues te hacee/ preguntas↑/ y eso↓// y nada y aquí↑/ creo que es todo más pues→ un poco más a la tuya/ también se hacen trabajos↑ pero noo se hacen tantas exposiciones→ no están tan encima de ti↓ por decirlo de alguna manera
(Translation: So you do a project with a few people↑ and then↑ you have to present it/ then the group-/ and yes/ then the group↑ if they want, they can like/ ask you questions↑/ and that’s it↓// and yeah, here I think everything depends a bit more on you/ you still do projects but there aren’t so many presentations→ they’re not breathing down your neck/ or anything like that↓//)

This intervention consists of five acts², which can be identified by applying the Val.Es.Co. criteria discussed in section 2.2.3:

preparas un trabajo entre varios↑/ y entonces↑ pues tienes que exponerlo↓
luego al-/ y bueno/ luego el grupo↑ si quiere pues te hace preguntas↑/ y eso↓//
y nada y aquí↑/ creo que es todo más pues→ un poco más a la tuya↓
también se hacen trabajos↑ pero noo se hacen tantas exposiciones→
no están tan encima de ti↓ por decirlo de alguna manera↓//

Translation:

So you do a project with some people↑ and then↑ you have to present it↓
then the group-/ and yeah/ then the group↑ if they want, they can like ask you questions↑/ and that’s it↓//
And yeah, here I think everything depends a bit more on you↓
You still do projects↑ but there aren’t so many presentations→
They’re not breathing down your neck↓ or anything like that↓//

3.2. Internal prosodic-structural analysis of each act

Once the acts forming the turn have been structurally delimited, we conducted an acoustic analysis of the internal configuration of each act using Praat (ENA, November 29, 2025)³. Each act has been divided into its constituent Intonation Groups (IGs), and the initial and final F0 of each act have been indicated. Below, the internal prosodic structure of each act is presented, along with a stylised representation of the melodic contour for each of the five acts analysed prosodically (Figures 1, 2, 3, 4 and 5).

ACT 1

[Initial F0: 244 Hz] preparas un trabajo entre varios↑/ (1st IG/1st subact) y entonces↑ (2nd IG/2nd subact) pues tienes que exponerlo↓ (3rd IG/3rd subact) [Final F0: 204 Hz]. Figure 1 presents the stylised representation of the melodic contour for Act 1:

Figure 1. Stylised representation of the intonation contour of ACT 1

ACT 2

[Initial F0: 203 Hz] luego al-/ (1st IG / Self-repair) y bueno↓ (2nd IG / 1st subact) luego el grupo↑ (3rd IG / 2nd subact) si quiere pues te hace preguntas↑ (4th IG / 3rd subact) y eso↓ (5th IG / 4th subact) // [Final F0: 118 Hz]. Figure 2 illustrates the stylised melodic contour of Act 2:

Figure 2. Stylised representation of the intonation contour of ACT 2

ACT 3

[Initial F0: 245 Hz] y nada y aquí↑/ (1st IG / 1st subact) creo que es todo más pues→ (2nd IG / 2nd subact) un poco más a la tuya↓ (3rd IG / 3rd subact) [Final F0: 193 Hz]. Figure 3 shows the stylised melodic contour for Act 3:

Figure 3. Stylised representation of the intonation contour of ACT 3

ACT 4

[Initial F0: 223 Hz] también se hacen trabajos↑ (1st IG / 1st subact) pero noo se hacen tantas exposiciones→ (2nd IG / 2nd subact) [Final F0: 222 Hz]. Figure 4 corresponds to the stylised melodic contour of Act 4:

Figure 4. Stylised representation of the intonation contour of ACT 4

ACT 5

[Initial F0: 212 Hz] no están tan encima de ti↓ (1st IG / 1st subact) por decirlo de alguna manera↓// (2nd IG / 2nd subact) [Final F0: 183 Hz]. Figure 5 represents the stylised melodic contour of Act 5:

Figure 5. Stylised representation of the intonation contour of ACT 5
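The intonation-group counts given for each act can be cross-checked against the transcription marks. The sketch below is a rough heuristic and not the acoustic procedure carried out in Praat: it assumes that every tone arrow (↑, ↓, →) or slash closes an intonation group, and the function name split_groups is hypothetical.

```python
import re

# Acts as transcribed in section 3.2, without the bracketed annotations.
ACTS = {
    1: "preparas un trabajo entre varios↑/ y entonces↑ pues tienes que exponerlo↓",
    2: "luego al-/ y bueno↓ luego el grupo↑ si quiere pues te hace preguntas↑ y eso↓ //",
    3: "y nada y aquí↑/ creo que es todo más pues→ un poco más a la tuya↓",
    4: "también se hacen trabajos↑ pero noo se hacen tantas exposiciones→",
    5: "no están tan encima de ti↓ por decirlo de alguna manera↓//",
}

# Assumption: a run of tone arrows and/or slashes marks a group boundary.
GROUP = re.compile(r"([^↑↓→/]+)([↑↓→/][↑↓→/\s]*)?")


def split_groups(act):
    """Return (words, boundary_marks) pairs, one per intonation group."""
    groups = []
    for match in GROUP.finditer(act):
        words = match.group(1).strip()
        marks = (match.group(2) or "").replace(" ", "")
        if words:
            groups.append((words, marks))
    return groups


for number, act in ACTS.items():
    print(f"Act {number}: {len(split_groups(act))} intonation groups")
```

Under this assumption the mark-based split yields 3, 5, 3, 2 and 2 groups for Acts 1 to 5, the same counts as the acoustic segmentation reported above.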

Discussion

This section addresses a central question: whether there is, in fact, a systematic correspondence between prosodic structuring and the segmentation into subacts. To explore this issue, the acoustic analysis is used to reveal the prosodic relations established among the subacts or intonation groups that together constitute each act. Far from being a merely descriptive exercise, this analysis is designed to show how prosodic organization actively shapes discourse segmentation. The inquiry is anchored in three prosodic-structural principles articulated by Hidalgo (2019: 128–136), which serve as the conceptual framework for evaluating the explanatory power of prosody in the structuring of discourse:

Pitch Declination Principle (PDP)
This principle refers to the gradual lowering of the fundamental frequency (F0) throughout an assertive act. It also considers that the two main tonal reference points (initial and final) within contiguous intonational groups tend to show progressively lower pitch levels in the subsequent group(s) compared to the preceding ones.
Hierarchy/Recursivity Principle (HP/RP)
This principle highlights the prosodic system’s capacity to generate recursive tonal patterns, which allow for the hierarchical organisation of intonational units. Intermediate tonal segments may display prosodic reinitialisation, which does not substantially disrupt the overall prosodic flow, unless such interruption is pragmatically or contextually motivated by the act itself.
Integration Principle (IP)
This principle refers to the integration of successive intonational units, which may form a single act or a sequence of two (or more) consecutive acts that remain prosodically coherent.

The extent to which these principles are met (sections 4.1, 4.2, and 4.3) will offer insights into the feasibility of the proposed segmentation model.

4.1. Pitch Declination Principle (PDP)

Regarding the PDP, we observe that the majority of the segmented speech acts conform to this principle, as they exhibit a progressive decrease in F0 from beginning to end:

Act 1: Initial F0 244 Hz / Final F0 204 Hz
Act 2: Initial F0 203 Hz / Final F0 118 Hz
Act 3: Initial F0 245 Hz / Final F0 193 Hz
Act 5: Initial F0 212 Hz / Final F0 183 Hz

Act 4, however, displays a relatively stable melodic contour, with practically identical initial and final F0 values (223 and 222 Hz, respectively). This can be interpreted as an assertive act with low assertiveness: the speaker (a woman) appears reluctant to sound overly categorical, so the level contour functions pragmatically to soften the assertion.
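The comparison of initial and final F0 can be made explicit with a short calculation. The sketch below uses the values reported in section 3.2 and expresses each fall both in Hz and in semitones, using the standard conversion 12·log2(f_end/f_start). The 1-semitone threshold that separates a level contour from a declining one is an illustrative assumption and is not taken from Hidalgo (2019); the function names are likewise hypothetical.

```python
import math

# (initial F0, final F0) in Hz for each act, as reported in section 3.2.
ACT_F0 = {1: (244, 204), 2: (203, 118), 3: (245, 193), 4: (223, 222), 5: (212, 183)}

# Assumption: a fall smaller than this counts as a level contour.
LEVEL_THRESHOLD_ST = 1.0


def interval_semitones(f_start, f_end):
    """Signed interval from f_start to f_end in semitones (negative = fall)."""
    return 12 * math.log2(f_end / f_start)


def follows_pdp(f_start, f_end, threshold=LEVEL_THRESHOLD_ST):
    """True if the act falls by at least `threshold` semitones."""
    return interval_semitones(f_start, f_end) <= -threshold


for act, (f0_initial, f0_final) in ACT_F0.items():
    st = interval_semitones(f0_initial, f0_final)
    label = "declination" if follows_pdp(f0_initial, f0_final) else "level"
    print(f"Act {act}: {f0_initial - f0_final:3d} Hz fall, {st:+.2f} st -> {label}")
```

With these values, Acts 1, 2, 3 and 5 are classified as declining, while Act 4, whose fall is 1 Hz, is classified as level. Working in semitones rather than Hz makes falls comparable across speakers with different pitch ranges.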

Another manifestation of the PDP involves what Hidalgo (2019: 129) terms supradeclination, which occurs when the concatenation of successive declination lines across individual acts produces a progressive lowering of pitch over a broader stretch of discourse, such as an entire intervention. In the example analysed here, this structure is broadly confirmed: the intervention opens at 244 Hz and closes at 183 Hz, and the final F0 of the last act is lower than the final F0 of Acts 1, 3 and 4 (204, 193 and 222 Hz), although not lower than that of Act 2 (118 Hz). Thus, the supramelodic contour across the entire intervention shows a gradual downward trend, temporarily interrupted in Acts 3 and 4 due to their high initial F0 values (245 and 223 Hz, respectively), but ultimately resuming the main downward tonal trajectory as described in the HP/RP.
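The act-to-act trajectory can be inspected in the same way. The sketch below, again based on the initial and final F0 values reported in section 3.2, treats an act as a pitch reset when its initial F0 is higher than the final F0 of the preceding act. This operational definition of a reset, and the function names, are assumptions made for illustration only.

```python
# (initial F0, final F0) in Hz for Acts 1-5, as reported in section 3.2.
ACT_F0 = [(244, 204), (203, 118), (245, 193), (223, 222), (212, 183)]


def pitch_resets(acts):
    """1-based numbers of acts that start above the previous act's final F0."""
    return [
        n + 1
        for n in range(1, len(acts))
        if acts[n][0] > acts[n - 1][1]
    ]


def overall_fall(acts):
    """Fall in Hz from the first initial F0 to the last final F0."""
    return acts[0][0] - acts[-1][1]


print("Resets at acts:", pitch_resets(ACT_F0))
print("Overall fall (Hz):", overall_fall(ACT_F0))
```

Under this definition the resets fall on Acts 3 and 4, and the intervention as a whole drops 61 Hz, from 244 Hz to 183 Hz.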

In summary, we can affirm that the PDP is largely fulfilled throughout the intervention we have taken as a reference in our analysis.

4.2. Hierarchy/Recursivity Principle (HP/RP)

Examining the melodic structure of Acts 1, 2, and 3, we find certain fluctuations (sudden rises) in F0 within the different subacts that constitute each act. However, these fluctuations do not entail a break in the PDP; instead, the main downward tonal line of each act is restored by virtue of the HP/RP, so that in all these cases the final F0 is lower than the initial F0. The exceptional case of Act 4 has already been discussed in section 4.1.

As for Act 5, the melodic structure of its two subacts is relatively regular, since the aforementioned melodic fluctuations are absent, and the melodic line develops as a steady descent from start to finish. Therefore, we can state that the HP/RP is also met throughout the entire intervention.

4.3. Integration Principle (IP)

That the different acts constituting the analysed intervention form distinct discourse units can be demonstrated not only structurally (according to the Val.Es.Co. principles) but also prosodically. The presence of downward melodic inflections (↓) at the end of each act (except, as noted, Act 4) indicates that the prosodic-structural unit has concluded. The final F0 associated with these inflections is also — as we have seen — lower than the initial F0 of the respective acts. This behaviour confirms the effective fulfilment of the IP.
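The closure criterion described in this paragraph combines two cues: a falling terminal inflection and a final F0 below the initial F0. A minimal sketch of that check, using the tone marks and F0 values from section 3.2, is given below; treating the two cues as a joint test, and the name is_prosodically_closed, are assumptions made for illustration.

```python
# Final tone mark and (initial, final) F0 in Hz per act, from section 3.2.
ACTS = {
    1: ("↓", 244, 204),
    2: ("↓", 203, 118),
    3: ("↓", 245, 193),
    4: ("→", 223, 222),
    5: ("↓", 212, 183),
}


def is_prosodically_closed(final_tone, f0_initial, f0_final):
    """Assumed test: falling terminal mark and final F0 below initial F0."""
    return final_tone == "↓" and f0_final < f0_initial


closed = [n for n, values in ACTS.items() if is_prosodically_closed(*values)]
print("Prosodically closed acts:", closed)
```

Acts 1, 2, 3 and 5 pass this test; Act 4, which ends in a level inflection, does not, in line with the exception noted above.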

Finally, the analysis of these principles also invites methodological reflection. Prosodic investigation demands precise and replicable measurement of acoustic parameters, particularly F0, melodic inflection, and tonal alignment. Tools such as Praat, when combined with the IFA model, offer an empirically grounded and reliable segmentation approach, avoiding impressionistic pitfalls. Furthermore, the observed alignment between prosodic contours and structural segmentation raises theoretical questions about the nature of prosodic meaning: prosody not only signals boundaries but can also qualify speech acts independently of lexical-syntactic content, emphasizing the interaction between prosodic form and pragmatic function.

Conclusions

One of the most enduring challenges in contemporary research on spoken language is determining how to segment speech into analytically meaningful units. Unlike written language, where syntax and punctuation provide relatively clear boundaries, spontaneous discourse resists straightforward segmentation. Traditional grammatical categories, particularly the “sentence,” fail to capture the fluid, fragmented, and context-dependent nature of oral interaction, rendering syntax-based methods inadequate for rigorous analysis. This limitation underscores the need for approaches that integrate prosodic, pragmatic, and structural dimensions of speech.

In response, this study proposes a model that combines Hidalgo’s (2019) Interactive-Functional Analysis (IFA) with the Val.Es.Co. Group framework, uniting melodic organization and internal discourse structure into a coherent segmentation strategy. By integrating prosodic and structural parameters, the model allows for the identification of discourse boundaries in a manner sensitive to both the rhythm and functional dynamics of conversation. Empirical analysis of an excerpt from a corpus of colloquial speech demonstrates systematic alignment between structural units (intervention, act, and subact) and Hidalgo’s prosodic principles: the Pitch Declination Principle (PDP), the Hierarchy/Recursivity Principle (HP/RP), and the Integration Principle (IP). This correspondence provides strong empirical support for the model and validates prosodic cues as reliable indicators of meaningful discourse units.

The findings highlight that prosodic segmentation is not only feasible but also methodologically advantageous for the analysis of spontaneous interaction. In colloquial discourse, where syntactic fragmentation and pragmatic fluidity dominate, intonation emerges as the most consistent and contextually grounded cue for delimiting discourse units. This observation implies a paradigm shift: moving from models grounded in syntactic ideals derived from written language toward frameworks based on observable patterns of language in use. By foregrounding prosody, this study contributes to a more nuanced understanding of coherence, structure, and meaning in oral interaction, emphasizing the functional role of melodic organization in shaping discourse.

A further strength of the proposed approach lies in its potential applicability across diverse communicative contexts. While the present study focuses on a specific conversational excerpt, the methodology—particularly the combined use of the IFA model and the Val.Es.Co. framework—can be systematically extended to other registers, including formal dialogue, institutional interactions, or media speech. This opens avenues for comparative research on intonational patterns across sociolinguistic contexts, offering insights into prosody as a flexible yet universal organizing principle of discourse. Such studies could clarify how prosodic patterns adapt to different pragmatic demands while maintaining structural coherence.

Methodologically, the study also demonstrates the rigor required for prosodic analysis. Accurate measurement of acoustic parameters—fundamental frequency (F0), melodic inflection, and tonal alignment—is essential for reliable segmentation. The combined use of Praat software and IFA-derived criteria ensures reproducibility and empirical grounding, overcoming the limitations of impressionistic analysis, which, though intuitively appealing, often lacks consistency and objectivity.

The results also provoke theoretical reflection on the nature of prosodic meaning. The alignment between prosodic contours and structural segmentation raises the question of whether prosody merely marks boundaries or whether it also conveys independent semantic and pragmatic content. The distinction between primary and secondary modal functions (PMF and SMF) within the IFA model supports the latter view: prosody not only organizes discourse but also qualifies speech acts in ways irreducible to lexical-syntactic content alone. Exploring this interface between prosodic form and pragmatic function constitutes a critical challenge for future research.

Finally, this study contributes to a broader reassessment of orality within linguistic theory. For too long, spoken language has been interpreted through the lens of written norms, often producing analytical models that are partial or distorted. By prioritizing prosody and aligning segmentation practices with the realities of oral communication, this research advances our understanding of the functional principles underpinning real-time construction and interpretation of meaning. Far from peripheral, prosodic segmentation emerges as a central concern for the study of spontaneous human communication, providing both methodological and theoretical foundations for future investigation.

¹ We refer to the ‘traditional’ sense as understood in Western grammatical tradition until approximately the first half of the 20th century, although more recent views such as structuralist, functionalist, generative, etc. may also be included in this perspective.

² The transcription system used in the following excerpt can be consulted in the final Annex of this work.

³ https://www.fon.hum.uva.nl/praat/download_win.html

About the authors

Antonio Hidalgo Navarro

Universitat de València

Author for correspondence.
Email: Antonio.hidalgo@uv.es
ORCID iD: 0000-0002-6534-4168

Full Professor of Spanish Language in the Department of Spanish Philology

Valencia, Spain

Noelia Ruano Piqueras

Universitat de València

Email: Noelia.ruano-piqueras@uv.es
ORCID iD: 0000-0001-9513-9600

MA degree in Advanced Hispanic Studies. She is currently a PhD candidate in the Department of Spanish Philology

Valencia, Spain

References

Berschin, Helmut. 1989. A propósito de una muestra del español hablado. In Julio Borrego (coord.), Philologica: Homenaje a Antonio Llorente Maldonado Vol. 1, 39–50. Salamanca: Universidad de Salamanca.
Briz, Antonio & Val.Es.Co. 2003. Un sistema de unidades para el estudio del lenguaje coloquial. Oralia 6. 7–61.
Briz, Antonio & Val.Es.Co. 2014. Las unidades del discurso oral. La propuesta Val.Es.Co. de segmentación de la conversación (coloquial). Estudios de Lingüística del Español 35. 13–73.
Bühler, Karl. 2011. Theory of Language. The Representational Function of Language. Amsterdam & Philadelphia: John Benjamins Publishing.
Cabedo, Adrián. 2013. Sobre prosodia, marcadores del discurso y unidades del discurso en español: Evidencias de un corpus oral espontáneo. Onomázein 28. 201–213. https://doi.org/10.7764/onomazein.28.11
Cantero, Francisco J. & Dolors Font. 2009. Protocolo para el análisis melódico del habla. Estudios de Fonética Experimental 18. 17–32.
Cárdenas, Gisela & Graciela Pérez. 1986. Algunas hipérboles en el habla coloquial cubana. Anuario L/L 17. 5–25.
Carlson, Lynn, Daniel Marcu & Mary Ellen Okurowski. 2003. Building a discourse-tagged corpus in the framework of rhetorical structure theory. In Jan van Kuppevelt & Ronnie Smith (eds.), Current directions in discourse and dialogue, 85–112. Dordrecht: Kluwer.
Chafe, Wallace. 1994. Discourse, Consciousness, and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing. Chicago: University of Chicago Press.
Criado de Val, Manuel. 1966. Esquema de una estructura coloquial. Español Actual 8. 9.
Criado de Val, Manuel. 1980. Estructura General del Coloquio. Madrid: SGEL.
Fuentes Rodríguez, Catalina. 1998. Estructuras parentéticas. Lingüística Española Actual 20 (2). 137–174.
Fuentes Rodríguez, Catalina. 2013. Parentéticos, hedging y sintaxis del enunciado. Círculo de lingüística aplicada a la comunicación 55. 61–94. https://doi.org/10.5209/rev_CLAC.2013.v55.43266
Hidalgo, Antonio. 2003. Microestructura discursiva y segmentación informativa en la conversación coloquial. Estudios de Lingüística de la Universidad de Alicante 17. 367–386. https://doi.org/10.14198/ELUA2003.17.20
Hidalgo, Antonio. 2006. Estructura e interpretación en la conversación coloquial: El papel del componente prosódico. Revista de Filología de la Universidad de La Laguna 24. 129–151.
Hidalgo, Antonio. 2016. Procedimientos de segmentación de la conversación: Debilidades de la sintaxis oracional y operatividad de la prosodia. Lingüística Española Actual 38 (1). 5–42.
Hidalgo, Antonio. 2019. Sistema y Uso de la Entonación en Español Hablado. Santiago de Chile: Universidad Alberto Hurtado.
Hidalgo, Antonio & Xose Padilla. 2006. Bases para el análisis de las unidades menores del discurso oral: Los subactos. Oralia 9. 109–143.
Karcevski, Sergei. 1931. Sur la phonologie de la phrase. Travaux du Cercle linguistique de Prague 4. 188–227.
Lamíquiz, Vidal. 1989. Sobre el texto oral. In Julio Borrego (coord.), Philologica: Homenaje a Antonio Llorente Maldonado Vol. 2, 39–46. Salamanca: Universidad de Salamanca.
Lorenzo, Emilio. 1977. Consideraciones sobre la lengua coloquial. In Rafael Lapesa (coord.), Comunicación y lenguaje, 161–180. Madrid: Karpos.
Moreno, Francisco. 1986. Hacia una sociolingüística automatizada del coloquio. In Francisco J. Fernández (coord.), Pasado, presente y futuro de la lingüística aplicada: Actas del III Congreso Nacional de Lingüística Aplicada Vol. 2, 353–362. Valencia: Asociación Española de Lingüística Aplicada.
Narbona, Antonio. 1986. Problemas de sintaxis coloquial andaluza. Revista Española de Lingüística 16 (2). 229–276.
Narbona, Antonio. 1990a. Las Subordinadas Adverbiales Impropias en Español (II): Causales y Finales, Comparativas y Consecutivas, Condicionales y Concesivas. Málaga: Ágora.
Narbona, Antonio. 1990b. ¿Es sistematizable la sintaxis coloquial? In VVAA (eds.), Actas del Congreso de la Sociedad Española de Lingüística. XX Aniversario Vol. 2, 1030–1043. Madrid: Gredos.
Narbona, Antonio. 2008. La problemática descripción del español coloquial. In Elisabeth Stark, Ronald Schmidt-Riese & Eva Stoll (eds.), Romanische Syntax im Wandel, 549–565. Tübingen: Gunter Narr Verlag.
Pons, Salvador. 2016. Cómo dividir una conversación en actos y subactos. In Juan Luis López Cruces, Bárbara Herrero Muñoz-Cobo, María del Mar Espejo Muriel & Antonio Miguel Bañón Hernández (eds.), Oralidad y análisis del discurso, 545–566. Almería: Universidad de Almería.
Sandru, Tudora. 1988. Algunos aspectos del lenguaje coloquial en Mesa, sobremesa de A. Zamora Vicente. In VVAA (eds.), Homenaje a Zamora Vicente Vol. 1, 501–511. Madrid: Castalia.
Silva Corvalán, Carmen. 1984. Topicalización y pragmática en español. Revista Española de Lingüística 14 (1). 1–19.
Vigara Tauste, Ana María. 1980. Aspectos del Español Hablado. Madrid: SGEL.