Abstract:
Although the use of verbal protocols is growing in oral assessment, research on raters’
verbal protocols remains scarce. Moreover, the few existing studies did not use a mixed-methods design.
Therefore, this study investigated the possible impacts of rater training on novice and experienced
raters’ application of a specified set of standards in rating. To meet this objective, the study made
use of verbal protocols produced by 20 raters who scored 300 test takers’ oral performances and
analyzed the data both qualitatively and quantitatively. The results showed that, through
the training program, the raters were able to concentrate more on linguistic, discourse, and
phonological features; consequently, their level of agreement increased, especially among the
inexperienced raters. The analysis of verbal protocols also revealed that training raters to apply a
well-defined rating scale can foster its valid and reliable use. Various groups of
raters approach the task of rating in different ways, which cannot be explored through pure
statistical analysis. Thus, think-aloud verbal protocols can shed light on the less clear aspects of the issue
and add to the validity of oral language assessment. Moreover, since the results of this study showed
that inexperienced raters can produce protocols of higher quality and quantity in the use of macro
and micro strategies to evaluate test takers’ performances, there is no evidence to justify
decision makers excluding inexperienced raters solely because they lack adequate
experience.
Machine-generated summary:
"The researcher transcribed the verbally recorded reports and analyzed the resulting verbal protocols using Shohamy’s (1994) discourse features framework, which covers lexical density, rhetorical functions and structures, genre, speech moves, communicative properties, discourse strategies, content and topic of discourse, prosodic/paralinguistic features and contextualizations, type of speech functions, discourse markers, and register. This qualitative analysis was intended to provide further certainty about changes in raters’ behavior across rating groups at the pre-, post-, and delayed post-training stages.
[Figure: Quantity of protocols produced by NEW and OLD raters on their decision-making behavior (pre-training)]
The above figure shows that OLD raters, on average, provided more protocol comments when rating the test takers’ oral performances.
(Rater NEW6, pre-training) In general, the analysis of the protocol data clearly showed that OLD raters outperformed NEW ones in their use of decision-making behaviors when rating test takers’ spoken performances.
It must be noted that, in many cases, NEW raters had higher means than OLD ones on the remaining behavioral factors: consideration of the examinees’ situations; summarization of their judgments; articulation of general impression; identification of vague parts; evaluation of relevance, novelty, originality and creativity; evaluation of the quantity of spoken data; and evaluation of comprehensibility as well as pronunciation, accent and fluency. Nevertheless, no significant mean difference was observed between the two groups in the quantity of verbal comments."