SUMMARY OF RESEARCH REVIEWED BY THE FCQ COMMITTEE
by Mark R. Malone
Over the past eighteen years, three major studies have synthesized
the results of research in the area of student ratings of
college-level instruction. This report will provide a synopsis of these
three studies. Review of these studies assisted our committee in
suggesting changes to instruments and procedures for evaluating
and improving instruction at the University. The three studies
are presented in reverse chronological order.
The Cashin Study
The purpose of this report is to provide an overview of a study
summarizing the conclusions of major reviews of the student rating
literature. The study draws from a body of over 1500 studies
conducted regarding student evaluations of college teaching.
The study, a synthesis of research in this area compiled by
William E. Cashin, is available for $2 by requesting Idea Paper
No. 32 from Kansas State University, 1615 Anderson Avenue,
Manhattan, KS 66502-4073. The article concludes that the large
body of literature in this area makes it possible to find
individual studies to support almost any conclusion. However, the
body of research is broad enough to begin to draw conclusions
about overall trends. These trends generally indicate that
student ratings are statistically reliable, valid, and relatively
free from bias. The article cautions that they are only one of
many possible sources of data about teaching. Student ratings
should be coupled with other sources of data if they are to be
used to make judgments about the quality of science teaching.
STUDY ASSUMPTIONS BASED ON LITERATURE
- Sources of data other than student evaluations must be used to evaluate teaching.
- There are five factors typically found in student rating forms:
  - Course organization and planning
  - Clarity, communication skills
  - Teacher-student interactions, rapport
  - Grading and examinations
  - Student self-rated learning
- One or a few global items might be sufficient data for making personnel decisions. Many questions on student evaluation forms tend not to add much to the overall variance.
- Reliability of student evaluations tends to depend upon the number of raters, i.e., the more raters, the more reliable the ratings.
- Student ratings are stable and strongly correlate (r = .83) with ratings given a year or more after graduation.
- Results of student evaluations are generalizable. The instructor, not the course, is the primary determinant of student ratings.
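The link between the number of raters and reliability noted in the list above is often modeled with the Spearman-Brown formula. The sketch below is illustrative only: the formula is not named in the study summarized here, and the single-rater reliability of 0.20 is a hypothetical value chosen for the example, not a figure from the literature reviewed.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Projected reliability of the average of n_raters ratings,
    using the Spearman-Brown formula."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Hypothetical single-rater reliability; NOT taken from the studies reviewed.
ASSUMED_SINGLE_RATER_RELIABILITY = 0.20

if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        projected = spearman_brown(ASSUMED_SINGLE_RATER_RELIABILITY, n)
        print(f"{n:>2} raters: projected reliability = {projected:.2f}")
```

Under this model the projected reliability rises with each added rater but with diminishing returns, which is consistent with the general point that larger classes yield more dependable averages.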
VARIOUS APPROACHES TO EVALUATING TEACHING
- Classes in which students give the instructors the highest ratings tend to be the classes where students learned more. Correlations between learning and instructor ratings range from r = .22 to r = .57, with a mean correlation of r = .41.
- Instructor self-ratings of their instruction correspond well with student ratings (r = .41).
- Administrator ratings of instruction correlate with student ratings. Correlations range from r = .39 to r = .62.
- Colleague ratings of instruction correlate with student ratings. Correlations range from r = .48 to r = .69.
- Alumni ratings of instruction correlate with student ratings. Correlations range from r = .40 to r = .75. This belies the conventional wisdom that students come to appreciate teaching only sometime after entering the real world of working adults.
- External observers' ratings of instruction correlate with student ratings. Correlations range from r = .50 to r = .76. Correlations are generally higher when raters have received training in observing.
- Student written comments in response to open-ended questions correlate very strongly with global student ratings. Correlations range from r = .75 to r = .93.
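One common way to read correlations like those in the list above is to square them, which gives the proportion of variance the two measures share. The sketch below applies that convention to the reported ranges. The squaring step is a standard statistical rule of thumb and an assumption here, not an analysis attributed to the study; the names `REPORTED_RANGES` and `shared_variance` are illustrative.

```python
# Correlation ranges with student ratings, as reported in the list above.
REPORTED_RANGES = {
    "student learning": (0.22, 0.57),
    "administrator ratings": (0.39, 0.62),
    "colleague ratings": (0.48, 0.69),
    "alumni ratings": (0.40, 0.75),
    "external observer ratings": (0.50, 0.76),
    "written comments": (0.75, 0.93),
}


def shared_variance(r):
    """Proportion of variance shared by two measures correlated at r."""
    return r * r


if __name__ == "__main__":
    for source, (low, high) in REPORTED_RANGES.items():
        print(f"{source}: r^2 from {shared_variance(low):.2f} "
              f"to {shared_variance(high):.2f}")
```

Read this way, even a correlation that counts as substantial in this literature leaves much of the variance unexplained, which fits the report's caution that student ratings are only one source of data.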
VARIABLES THAT GENERALLY DO NOT CORRELATE WITH STUDENT EVALUATIONS
Instructors often suspect that variables beyond their control
create bias in student ratings. The research indicates that most
of the factors studied do not contribute significantly to bias
in student ratings. The major factors addressed by the research
as potential sources of bias are listed below.
- Age and teaching experience of the instructor. Where small correlations were found, they generally tended to be negative for older instructors.
- Gender of the instructor. However, there is a general trend for male students to rate male faculty members higher and for female students to rate female faculty members higher.
- Race. Few studies have been done but those available show no trends.
- Personality factors. Few personality factors have any consistent relationship with student ratings. Those showing a modest correlation were positive self-esteem (r = .30) and energy and enthusiasm (r = .27) of the instructor.
- Research productivity. There is a very low positive correlation (r = .12) between research productivity and student ratings.
- Age of the student.
- Gender of the student.
- Level of the student (e.g., freshman).
- Student GPA
- Student Personality
- Class size. There is a very weak tendency for smaller classes to give higher ratings (r = -.09).
- Time of day when the course is taught.
- Time during the term. Any time during the second half of the term yields similar ratings.
VARIABLES THAT GENERALLY DO CORRELATE WITH STUDENT EVALUATIONS
- Faculty rank. Regular faculty tend to receive higher ratings than graduate teaching assistants.
- Expressiveness. Students tend to rate highly expressive instructors higher than other instructors even if content is lacking. However, instructor expressiveness does tend to enhance student learning.
- Student motivation. Students tend to rate courses higher when they have prior interest in the subject matter (r = .40). Required courses tend to receive lower ratings.
- Expected grades. There is a low positive correlation (r = .10 to .30) between student ratings and expected grades. There are three competing hypotheses about what this might mean: (1) Students who learn more give higher ratings; (2) Instructors inflating grades receive inflated ratings; or (3) Student characteristics such as motivation lead to higher grades and therefore higher ratings.
- Level of Course. Higher level courses, especially graduate courses, tend to get higher ratings.
- Academic field. Humanities and arts courses tend to receive higher ratings than social science courses, which in turn rank higher than mathematics-type courses.
- Workload/difficulty. Contrary to conventional wisdom, course difficulty correlates positively with course ratings. The correlations are not strong, but they appear across a number of studies considering: amount of reading (r = .11); amount of non-reading assignments (r = .16); difficulty of subject matter (r = .15); and how hard students worked in the course (r = .29).
- Non-anonymous ratings. Signed ratings tend to be higher.
- Instructor present while students complete ratings. Ratings with the instructor present tend to be higher.
- Purpose of Ratings. If instructions state that the ratings will be used for personnel decisions, ratings tend to be higher than if they are used only for instructor improvement.
The Scriven Study
The purpose of this report is to summarize a study regarding the
utility of student ratings of university courses and instructors.
The study summarized is "Ratings offer useful input to teacher
evaluations." It is a synthesis of research in this area
compiled by Michael Scriven and is available as ERIC Document
No. 398240. This document is available online through the ERIC
Search Wizard (http://ericae.net/scripts/ewiz/amain2.asp) using
the key words Scriven + student + evaluations. The article
addresses concerns about the validity of student ratings and
presents a case for their use in teacher evaluation. The article
cautions that while student evaluations of teaching have merit,
they should not be the only sources used for determining the
merit of teaching.
CONCERNS
- Student ratings often ask questions students are not in a position to judge reliably.
- Student ratings of faculty are only statistically related to students' learning gains. There are serious concerns about the use of statistical indicators to make personnel decisions.
- Studies summarized use questionable indicators, such as correlation of student ratings with peer ratings of teacher merit rather than correlation with measures of student learning gains.
ARGUMENTS FOR USING STUDENT RATINGS
- Students are in a unique position to rate their increased knowledge, comprehension and motivation toward subjects taught.
- Students are in a good position to judge whether tests cover the material taught.
- The research indicates that there is a positive and statistically significant correlation of student ratings with learning gains.
- Student ratings represent student participation in "democratic decision making".
- Student ratings appear to be the "best available alternative" for faculty evaluation.
PROBLEMS WITH EVALUATION FORMS
Most student evaluation forms are invalid as a basis for making personnel decisions because they tend to make a multitude of common errors including:
- Asking questions that are potentially prejudicial because they concern the teacher's personality or the appeal of the subject matter.
- Asking for comparisons with other teachers.
- Asking whether one would recommend the course to a friend with similar interests.
- Asking whether a course is one of the best courses one has had.
- Forms that are too long.
SUGGESTIONS FOR IMPROVING THE VALIDITY OF STUDENT RATINGS
If student ratings are to be valid, they must be properly
administered and stringently controlled, and the results must be
thoroughly analyzed. Common errors include:
- Using instructors to collect forms.
- Lack of controls for administering forms.
- Inadequate time to complete forms.
- Failing to ensure an acceptable return rate.
- Failing to ensure the validity of results, for example through errors in data processing, report design, or interpretation of results.
The Cohen Study
The purpose of this report is to summarize the findings of the
most recent meta-analysis of studies researching the validity of
student ratings of instructors in college courses. The study
summarized is "Student ratings of instruction and student
achievement: a meta-analysis of multisection validity studies."
The study was conducted by Peter A. Cohen and was reported in the
Review of Educational Research, Fall, 1981, vol. 51 (3), pp.
281-309. While this study is seventeen years old, it is the most
recent comprehensive meta-analysis available.
STUDY ASSUMPTIONS
- Although teaching effectiveness is difficult to define, it is generally thought of as the degree to which an instructor facilitates student achievement.
- Correlations between ratings of instruction and how much students learn may be a crude index of teaching effectiveness. However, if student ratings are to have any utility, they must show at least a moderately strong relationship to how much students learn.
- There is a lack of unanimity in defining good teaching, but most researchers agree that student learning is the most important criterion for judging teaching effectiveness.
LIMITATIONS
- Most studies summarized included only lower division courses. Extrapolation of these results to upper division or graduate courses has some inherent limitations.
- This study summarizes only studies that view the course as the unit of analysis. It ignores a large number of studies that considered students as the unit of analysis. Some researchers would consider this a selection bias for this type of analysis.
- This study used average correlations rather than the more common measure of "z" scores. To some extent this limits how the strength of the relationships can be interpreted against a standard unit of measurement such as a "z" score or a standard deviation.
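To illustrate the last limitation, the sketch below compares a simple average of correlations with an average taken on a transformed scale. It assumes that the "z" measure mentioned above refers to Fisher's z-transformation, which the source does not specify. The input correlations are hypothetical values chosen for the example and are not data from the study.

```python
import math


def mean_r(correlations):
    """Simple arithmetic mean of the correlations."""
    return sum(correlations) / len(correlations)


def fisher_z_mean(correlations):
    """Average on Fisher's z scale (atanh), then convert back to r (tanh)."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


# Hypothetical correlations from three imaginary studies; NOT from the source.
HYPOTHETICAL_CORRELATIONS = [0.10, 0.45, 0.80]

if __name__ == "__main__":
    print(f"simple mean of r: {mean_r(HYPOTHETICAL_CORRELATIONS):.3f}")
    print(f"Fisher z mean:    {fisher_z_mean(HYPOTHETICAL_CORRELATIONS):.3f}")
```

For positive correlations that differ from one another, the two approaches give somewhat different summaries, so the choice of method affects how a pooled value should be read.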
STUDY FINDINGS
The study calculated a correlation for each of the following
dimensions of teaching with measures of student achievement.
Each dimension of teaching is followed by its correlation with
student achievement. The value of “n” indicates the number
of studies summarized, not the number of students in these
studies.
- Overall Course. (r = 0.47, n = 22)
- Overall Instructor. (r = 0.43, n = 67)
- Skills. This is based on students' ratings of the quality of the instructor's teaching. It includes student judgments of the instructor's command of the subject, clarity of explanations, and teaching near the class's level of understanding. (r = 0.50, n = 40)
- Rapport. This is based on ratings of teacher empathy, friendliness, approachability, and accessibility. (r = 0.31, n = 28)
- Structure. This is a student judgment of how well the instructor planned and organized the course. This includes the instructor's tendency to keep on schedule, use class time well, and explain course requirements. (r = 0.47, n = 27)
- Difficulty. This deals with the perceived difficulty of the teacher's work expectations. It relates to students' perceptions of the difficulty of assigned readings, whether more work was assigned than students could complete, and workload compared to other courses. (r = -0.02, n = 24)
- Interaction. This deals with the degree to which students are encouraged to share ideas and become involved in class sessions. It includes the instructor's encouragement to express points of view and opinions and to participate in discussions. (r = 0.22, n = 14)
- Feedback. This is a measure of an instructor's concern for the quality of student work, including complimenting student work, checking for understanding before proceeding, and keeping students informed of progress. (r = 0.31, n = 5)
- Evaluation. This is a student judgment of how fairly evaluation instruments assess their ability. (r = 0.23, n = 25)
- Student progress. This item is based on student self-ratings of how much they feel they learned in a class. (r = 0.47, n = 14)
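The findings above can be easier to compare when ordered by the size of the correlation. The sketch below does only that, using the values exactly as listed; the names `FINDINGS` and `ranked` are illustrative, and no pooling or significance testing is attempted.

```python
# (dimension, correlation with student achievement, number of studies),
# copied from the list above.
FINDINGS = [
    ("Overall Course", 0.47, 22),
    ("Overall Instructor", 0.43, 67),
    ("Skills", 0.50, 40),
    ("Rapport", 0.31, 28),
    ("Structure", 0.47, 27),
    ("Difficulty", -0.02, 24),
    ("Interaction", 0.22, 14),
    ("Feedback", 0.31, 5),
    ("Evaluation", 0.23, 25),
    ("Student progress", 0.47, 14),
]

# Sort from the strongest to the weakest correlation; ties keep list order.
ranked = sorted(FINDINGS, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for dimension, r, n_studies in ranked:
        print(f"{dimension:<20} r = {r:5.2f}   n = {n_studies}")
```

Note that dimensions resting on few studies, such as Feedback, carry less evidence behind their correlation than those based on many studies.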