ON TALKER VOICE IN LANGUAGE IDENTIFICATION
Verna Stockmal and Z. S. Bond
Ohio University
ABSTRACT
Listeners find that discriminating between two languages is relatively easy compared to identifying new samples. Listener responses seem to be influenced by regional speech characteristics and talker voice quality. This study attempted to assess listener ability to match spoken samples of unknown languages when produced by male and female talkers. Samples from Arabic and Latvian, three provided by male talkers and three by female talkers, were assembled in a test using ABX format. Listeners matched the X language with either the A or B language, across same sex and different sex talkers. Overall, listeners performed at above chance levels, 61% correct. Listeners identified the two languages at approximately equal rates. They matched speech samples produced by males (73%) better than speech samples produced by females (52%). Listeners matched languages across speaker gender at 59% correct. These results suggest that speaker sex, as part of voice quality, is encoded with a representation of the unknown language.
OBJECTIVE
To investigate whether talker gender affects language identification.
BACKGROUND
§ Identification of an unknown foreign language is based purely on phonetic information (see Lorch and Meara, 1995; Bond, Stockmal and Muljani, 1998).
§ Voice gender appears to be encoded with speech samples in the form of ‘auditory-based perceptual representations’ (Mullennix, Johnson, Topcu-Durgun and Farnsworth, 1995).
§ Classifying words along a phonetic dimension is affected by talker voice, and classifying words according to talker voice is affected by their phonetic makeup (Pisoni, 1993).
§ Listeners may use different perceptual strategies when classifying male and female voices (Singh and Murry, 1978; Murry and Singh, 1980).
§ Language familiarity plays a significant role in voice identification (Goggin, Thompson, Strube and Simental, 1991).
METHOD
Participants
Eighteen American college students with self-reported normal speech and hearing served as listeners.
Materials
§ Three female and three male talkers for each language, Latvian and Arabic.
§ Each talker recorded a self-selected short prose passage at a normal reading rate.
§ Eight 5-8 second phrases were excerpted from the reading passage. Pauses, hesitations and repetitions were removed.
§ Phrases were arranged in ABX format, paired across both language and gender. Both languages and both genders appeared in A or B position. No phrase was paired with itself.
§ All permutations were assembled in a test recording of 32 items.
A                   B                   X
Latvian (M or F)    Latvian (M or F)    Latvian (M or F)
Arabic (M or F)     Arabic (M or F)     Arabic (M or F)
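One way to arrive at a 32-item test is sketched below. It assumes, beyond what the text states, that A and B always differ in language and that the count comes from crossing the language in A position (2), the gender in A (2), the gender in B (2), the language of X (2) and the gender of X (2). The names build_abx_items, LANGUAGES and GENDERS are hypothetical, and the sketch ignores which specific phrase fills each slot.

```python
from itertools import product

# Assumed factor levels; the labels come from the Materials description.
LANGUAGES = ("Latvian", "Arabic")
GENDERS = ("M", "F")


def build_abx_items():
    """Enumerate ABX trial types under the assumption that A and B differ in language."""
    items = []
    for a_lang, a_gender, b_gender, x_lang, x_gender in product(
        LANGUAGES, GENDERS, GENDERS, LANGUAGES, GENDERS
    ):
        # B carries whichever language A does not.
        b_lang = "Arabic" if a_lang == "Latvian" else "Latvian"
        items.append(
            {
                "A": (a_lang, a_gender),
                "B": (b_lang, b_gender),
                "X": (x_lang, x_gender),
                # The correct response is the position whose language matches X.
                "answer": "A" if x_lang == a_lang else "B",
            }
        )
    return items


items = build_abx_items()
print(len(items))  # 2 * 2 * 2 * 2 * 2 trial types
```

Under these assumptions the correct answer falls in A position for half of the trial types and in B position for the other half, and X matches the gender of the correct sample in half of the items.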
Procedure
Participants heard the test recording in a quiet classroom. They identified the language sample in the X position as matching the language in either the A or B position.
RESULTS
§ Overall, listeners could do the task, matching languages at 61% correct, significantly better than chance expectation [t = 5.48, p < .001].
§ Overall, listeners matched Latvian and Arabic samples equally well, 60% vs. 59% correct [F (1, 34) = .061, n.s.].
§ Overall, listeners matched male talkers more accurately than female talkers, 75% vs. 53% correct [F (3, 68) = 11.38, p < .0001].
§ Overall, listeners matched languages at 59% correct when talker gender differed. The interaction between language and talker gender was not significant.
§ There was considerable variability in the number of correct responses to specific test items, from 17% to 94%. Correct matches for samples in the A or B position did not differ significantly [t = .17, n.s.].
§ Arabic females were somewhat easier to match than Arabic males; Latvian males were somewhat easier to match than Latvian females.
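The comparison with chance reported above can be illustrated with a one-sample t statistic computed on per-listener proportions correct, tested against the 50% expected from guessing in a two-choice ABX task. This is a minimal sketch only: the text does not state how the test was computed, the function name one_sample_t is hypothetical, and the four scores are placeholders, not data from this study.

```python
import math
import statistics


def one_sample_t(scores, chance=0.5):
    """Return (t, degrees of freedom) for mean(scores) tested against chance."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)       # sample standard deviation (n - 1)
    standard_error = sd / math.sqrt(n)
    return (mean - chance) / standard_error, n - 1


# Placeholder proportions correct for four imaginary listeners (NOT study data).
example_scores = [0.6, 0.7, 0.5, 0.6]
t_value, df = one_sample_t(example_scores)
print(f"t({df}) = {t_value:.2f}")
```

With eighteen listeners, as in this study, such a test would have 17 degrees of freedom.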
DISCUSSION
Listeners probably encode detailed, talker-specific characteristics, including talker gender, together with language characteristics, so that processing the language dimension is affected by the voice dimension. Nevertheless, listeners could abstract language properties from gender properties even though an ABX task requires listeners to remember the phonetic structure of a language sample and match it to a novel sample.
As Goggin et al. (1991) have reported, talker voice is easier to identify in a known language than in an unfamiliar language. Because neither Arabic nor Latvian was familiar to the listeners, they may have found voice characteristics less salient than in a known language.
Female voices were difficult to match to targets provided by either male or female voices. Possibly, listeners were making language identity judgments on the basis of different characteristics for male and female talkers, as Singh and Murry (1978) imply.
The Arabic talkers represented different varieties or dialects of the language (from Saudi Arabia, Palestine, Egypt and Morocco), whereas all the Latvian talkers were residents of the same city, Riga. Although members of a single linguistic community tend to exhibit similar speech patterns, this difference between the two talker groups did not systematically affect listener judgments.
The Latvian talkers ranged in age more than the Arabic talkers did. Age differences did not systematically affect listener judgments.
Whether male and female talkers of these languages use significantly different styles while reading is not known, but gender-linked styles did not emerge from listener responses.
FURTHER RESEARCH
We intend to obtain similarity judgments of voice quality for these same talkers to determine whether listeners found some language samples difficult to match because of talker-specific characteristics.
REFERENCES
Bond, Z.S., Stockmal, V. and Muljani, D. (1998). Learning to identify a foreign language, Language Sciences 20, 353-367.
Goggin, J., Thompson, C., Strube, G. and Simental, L. (1991). The role of language familiarity in voice identification, Memory and Cognition 19, 448-458.
Lorch, M. and Meara, P. (1995). Can people discriminate languages they don’t know? Language Sciences 17, 65-71.
Mullennix, J., Johnson, K., Topcu-Durgun, M. and Farnsworth, L. (1995). The perceptual representation of voice gender, Jl. of the Acoustical Society of America 98, 3080-3095.
Murry, T. and Singh, S. (1980). Multidimensional analysis of male and female voices, Jl. of the Acoustical Society of America 68, 1294-1300.
Pisoni, D. B. (1993). Talker normalization in speech perception. In Speech Perception, Production, and Linguistic Structure, Y. Tohkura, E. Vatikiotis-Bateson and Y. Sagisaka (eds.). Amsterdam: IOS Press.
Singh, S. and Murry, T. (1978). Multidimensional classification of normal voice qualities, Jl. of the Acoustical Society of America 64, 81-87.