Volume 22 Numbers 1 & 2, 2012
Evaluating doctoral programs in communication on the basis of citations
A critical question about any method of departmental ranking involves understanding the basis for the rankings. Essentially, the process of comparative evaluation provides judgments about the relative merits of the programs, invoking some metric of assessment. The evaluation seeks to give the value of a program based on the characteristics used to assess or rank the program. Understanding what elements go into the judgments provides a means to determine the merit of the process and how to interpret the results.
The National Communication Association (NCA, formerly known as the Speech Communication Association, or SCA) has made two efforts (Hollihan, 2004; Speech Communication Association, 1996) at ranking doctoral programs in communication. Both NCA efforts generated controversy and the usual negative reactions typical of any survey of persons that ranks the quality of organizations. The problem is that creating a ranking to evaluate a set of programs without using and articulating clear criteria causes doubt about interpretation of the rankings and can raise suspicions of unfairness. Generating rankings from a survey can reflect perceptions of the quality of the doctoral programs without the accompanying articulation of the basis of those perceptions. In the case of the NCA rankings, survey respondents provided an opinion about the relative quality of programs, and the opinions were aggregated by the NCA evaluators, without a corresponding articulation of the premises used in forming the opinions of the participants or the conclusions of the authors.
What do rankings indicate?
Surveying members of a discipline to provide a means of evaluation can raise a number of questions. The questions can be divided into categories that generally involve three sets of issues: (a) sample of raters, (b) basis used by the raters to make an evaluation, and (c) categories used to characterize the programs under study.
The first set of issues considers the raters and issues of sampling. Do the samples reflect the underlying population? A guarantee of anonymity means that the person providing any particular response remains unknown. Other than some descriptive information about the raters who complete the survey, little is known about the source of information used for the analysis. Critics can raise suspicions about conflicts of interest on the part of the raters (e.g., graduating from an institution, teaching or having taught at an institution, regional affiliation, content area and methodological preferences). There exists no ability to determine the seriousness of these issues. The number of potential professional and personal issues that can arise makes any estimation a concern because the sample used to generate the estimates can provide a real or perceived bias. Concern about conflict of interest is very important when the number of graduates from programs varies widely, because the potential for personal experience with a graduate of the program depends to a large degree on the number of persons graduating from the program. In surveys, larger programs might be favored over smaller programs due to a combination of name recognition and first-hand experience with students of that program. Of course, perhaps the size of the program and the number of graduates reflects a more valuable program because of the greater level of contribution made to the discipline. Various factors can cause bias, perhaps unintentionally, which can result in distorted, non-representational conclusions.
Like coders in a content analysis, raters can use criteria provided by the researchers, but a survey of members simply provides a perception of what the raters feel is a fair means of evaluation or ranking. The problem is that unless the raters use metrics that meet clearly defined expectations, the basis of each rating decision is unclear. The problem is increased when a sense of reliability cannot be provided for the ranking data.
The question of sampling and the fairness of which persons participated becomes an issue, particularly when the participation rate is relatively low or the choice of raters is limited. When evaluating the quality of a doctoral program, who are the appropriate evaluators? Perhaps the evaluation of the program should be based on an analysis of the graduates (and dropouts) from that program. For most educational institutions, the proof is whether the graduates of that program are able to undertake and succeed at professional tasks. The practice of simply asking persons to provide a list of the “top five” doctoral programs leaves considerable potential for incomplete evaluations.
The second set of issues considers what the raters undertake when asked to provide an opinion. In the minds of the evaluators, what served as the basis for decision when making distinctions among doctoral programs? The various evaluators use some type of system or set of comparisons that permits them to decide which programs are considered superior programs. Evaluators could use the size of the program, number of doctoral degrees granted, diversity of the program (race, nationality, methodology, content areas), the depth of the program in an area of specialization, participation in leadership in regional, national, or international professional organizations, the number and type of outside funding or grants, the quality of instruction provided to doctoral students, the quality and support for instructional development of graduate student teaching assistants, or some combination of these elements as well as other additional criteria.
US News and World Report, for example, sets out a complex set of mathematical relations that provides a basis for determining which programs are considered in the annual report as the best programs. Not surprisingly, surveys and other ranking systems have been used to evaluate doctoral programs in communication (Edwards, Watson, & Barker, 1988a, 1998b). One can disagree with the evaluative criteria US News uses, but the fact that the standards are completely articulated means that arguments against those standards operate from an understanding of the basis of the original ranking.
The final set of considerations for evaluation provides the question about how to divide the available set of objects into categories. In intercollegiate sports, setting up categories such as Division I, Division II, and Division III provides a basis for championship matches that are considered more equitable than simply having all institutions in the same set of comparisons. The key is to provide a way of making comparisons among similar programs focused on a common objective.
Unfortunately, little agreement exists in the discipline of communication about how one would classify programs to permit more equitable comparisons.
For doctoral programs in communication, the basis is usually related to degree content (specific subject areas within the field of communication) learned by students graduating from that program. For example, a program may be excellent at interpersonal communication and not offer a course of study in rhetoric or communication education. The distinction between the specializations of programs can involve excellence expectations specific to a particular area and not consider other areas or content focus. The ranking must consider the particulars of the program content and how excellence may be achieved in one area or a limited number of areas without inclusion of all areas. The second survey of doctoral programs put programs into categories based on areas of specialization, but there exist a number of arguments about whether the classification of programs into the areas was appropriate. At the same time, specialization by area of academic concentration does not provide an overall assessment across the various programs at a meta-level. Consideration by area makes the issues raised in sampling and standards of evaluation even more of an issue since the problems now occur in smaller units.
The desire for ranking
The desire for comparisons and various rankings and rating systems requires a search for a means of evaluation that is considered fair. The notion of objective rankings would appear impossible and relatively undesirable to even suggest. The problem is that any rating system privileges a set of issues or various underlying values that cannot be considered universal from either a cross-sectional or a longitudinal standpoint. Evaluations of individual scholarly productivity have this problem and require attention (Allen, Antos, Hample, Hebl, Kulovitz, Liang, Ogi, Zhao, & Pederson, 2009).
The point of the process of the evaluation should not be an exact set of final or objective rankings but a relative set of understandings. For example, the difference between the top two programs may not be all that important, but a rating system should be able to distinguish between the top two and the bottom two programs relatively accurately, using the chosen metric of measurement. Like all measurement, the problem of error of measurement provides a basis for the consideration of how unreliable or uncertain a particular ranking is.
Sources of information
The National Communication Association website provides a list of doctoral departments that was used in the ranking of programs (the current list can be found on the NCA website). A program was included if its focus was communication. Only one doctoral program per university was included. Departments whose title reflected journalism, radio, television, advertising, or some other focus were not included. This provided a total of 60 programs/universities included in this analysis.
The source for information on number of citations was the Web of Knowledge. The Web of Knowledge is one of two indexing systems that could be used; the other is Google Scholar (scholar.google.com). There are comparative advantages for each of the systems. Google Scholar is more complete in that the system includes more journals, books, and other publications. The cost of the higher inclusion is less accuracy, as Google Scholar tends to omit works where the person is listed as a coauthor (Levine, 2010). Web of Knowledge is more accurate in terms of the completeness of inclusion of authors and indexing of articles. However, Web of Knowledge includes far fewer journals and is less comprehensive and inclusive of works across the social sciences and humanities. The decision by the authors was to strive for accuracy of citations at the expense of less inclusiveness of potential journals and other materials. The citation score for each department is the combined number of citations for all members of that department as well as the combined number of different manuscripts (articles) that received a citation authored or coauthored by members of that department.
The list of faculty was taken from the web sites for each department in February and March of 2010. The requirement for inclusion as a faculty member who contributed towards the departmental total was that the listed member had to be a full-time, tenure-track faculty member. The faculty member had to be listed as having a primary or central affiliation with the department. Joint-appointment faculty were scrutinized to determine whether the primary affiliation was with the Department of Communication or the faculty member appointment was a courtesy or represented only a partial appointment.
The faculty member could not be a full-time administrator (dean, provost) for the university outside of the department. So, a faculty member with a listing as a chair or head would still be considered as a member of the department whereas a faculty member serving as provost would not be considered a member of the department.
Disagreements were resolved with consideration of the publication record and membership in communication organizations, such as the National Communication Association, International Communication Association, or regional communication organizations. When a faculty member was also affiliated with another (non-communications) department, membership in communications organizations, regular attendance at organizational conventions, publications in communication journals all indicate more regular affiliation with the discipline of communications rather than the field of the other department. One factor considered as well was whether the faculty member earned a degree in communication or some other discipline. A faculty member who was solely a member of a department of communication, regardless of other affiliations and original department granting the degree, was considered a member of the department for the purposes of this review.
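The inclusion rules described above can be expressed as a simple filter. The following is only an illustrative sketch in Python: the record fields, the function name, and the four sample listings are hypothetical and are not data from the study, and the judgment calls on joint appointments are reduced here to a single yes/no field.

```python
from dataclasses import dataclass


@dataclass
class FacultyListing:
    """One entry from a departmental web page (all fields are hypothetical)."""
    name: str
    full_time: bool
    tenure_track: bool
    primary_affiliation: bool    # primary or central appointment in the department
    outside_administrator: bool  # full-time dean, provost, etc. outside the department


def counts_toward_department(listing: FacultyListing) -> bool:
    """Apply the inclusion rules: full-time, tenure-track, primary affiliation,
    and not a full-time administrator outside the department."""
    return (
        listing.full_time
        and listing.tenure_track
        and listing.primary_affiliation
        and not listing.outside_administrator
    )


# Hypothetical listings, not taken from any actual department.
listings = [
    FacultyListing("Faculty A", True, True, True, False),   # a chair or head still counts
    FacultyListing("Faculty B", True, True, True, True),    # serving as provost: excluded
    FacultyListing("Faculty C", True, True, False, False),  # courtesy appointment: excluded
    FacultyListing("Faculty D", False, False, True, False), # not full-time tenure-track: excluded
]

included = [f.name for f in listings if counts_toward_department(f)]
print(included)
```

In practice the primary-affiliation field would be the outcome of the case-by-case review of publication record, organizational membership, and degree described above, rather than something read directly from a web page.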
Data compilation was relatively straightforward; the number of citations was simply combined to produce a total for each department. Table 1 provides a ranking based on the total number of citations and provides the total number of faculty considered in that summary. The table provides the total number of publications used to generate the number of citations as well as an average number of citations per publication and an average number of citations per faculty member. Both averages indicate another way of viewing citations that considers the number of faculty (since a raw total favors larger departments) and provides a crude indication of quality of publication (average number of citations per work). The goal is not to provide a single metric but to provide a number of different ways to evaluate the contribution of faculty to scholarship.
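As a minimal sketch of the compilation just described, the four departmental figures can be computed as follows. The function name, the work identifiers, and the numbers are hypothetical and chosen only for illustration; the sketch also assumes that each distinct cited work is counted once per department, even when coauthored by two members.

```python
def summarize_department(works, n_faculty):
    """Summarize one department.

    works: dict mapping a work identifier to its citation count, with each
           distinct cited work appearing once (an assumption of this sketch).
    n_faculty: number of faculty members counted for the department.
    """
    total_citations = sum(works.values())
    cited_works = len(works)
    return {
        "total_citations": total_citations,
        "cited_works": cited_works,
        # Crude indication of quality of publication.
        "citations_per_work": total_citations / cited_works if cited_works else 0.0,
        # Adjusts the raw total for the size of the department.
        "citations_per_faculty": total_citations / n_faculty if n_faculty else 0.0,
    }


# Hypothetical department: three cited works and four faculty members.
example = summarize_department({"w1": 40, "w2": 10, "w3": 10}, n_faculty=4)
print(example)
```

For this invented department the total is 60 citations across 3 cited works, which gives 20 citations per work and 15 citations per faculty member.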
Table 2 provides the means, standard deviations, and correlations among the various elements. This set of correlations can be used to assess whether or not various elements can account for the results (e.g., size of the faculty as a means of explaining differences in total number of citations).
The citation counts provided in Table 1 provide a rank order listing of doctoral programs in communication. According to our analysis, the University of Pennsylvania (Annenberg School) demonstrates the most total combined citations to articles by the faculty in the Web of Knowledge. This is followed in order to complete the top ten by: University of California-Santa Barbara, Michigan State, Ohio State, Illinois, Southern California, Pennsylvania State, Michigan, Iowa, and Texas-Austin.
What is impressive about the top ranking of the University of Pennsylvania is that the top ranking is maintained whether the total citations, number of cited articles, average citation per article, or average citation per faculty member metric is employed. Table 2 indicates that the average faculty member has about 112 citations to articles that have been published. What this analysis indicates is that the top institutional ranking for the University of Pennsylvania remains the same, regardless of the particular metric chosen.
Table 2 indicates high correlations among the various metrics as well as between those metrics and faculty size. It is not surprising that the number of faculty in a department would correlate with the total number of citations (r = .49) or with the number of publications receiving a citation in Web of Knowledge (r = .70), but the number of faculty in a department also correlates with the average number of publications receiving a citation in Web of Knowledge (r = .40) and the average number of citations per faculty member (r = .34). What this indicates is that while the number of faculty is important, larger departmental faculties also have a higher average number of publications as well as a higher average number of citations. This indicates that larger departments demonstrate higher productivity per faculty member (as measured by number of average publications receiving a citation) and higher numbers of citations per article.
The results of the analysis provide a means to consider an alternative to NCA rankings based on surveys of members that is more transparent, capable of replication, and that can serve as the basis for the generation of alternatives. Essentially, departments with more faculty who produce works that receive citations represent one measure of value. The findings support a means for a doctoral program wishing to increase the perception of quality to argue that well-published faculty whose work is considered important and foundational to the published research of other scholars constitute a quality degree program.
The results in Table 1 indicate clearly that faculty at the top end of the ranking overwhelmingly dominate. Pennsylvania faculty, for example, have more citations than the bottom 30 institutions combined (representing 50% of the available doctoral programs).
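For readers who wish to run this kind of check on their own data, the Pearson correlation between faculty size and a citation metric can be computed as in the sketch below. The function is a standard textbook formula written out by hand; the faculty sizes and citation totals are invented for illustration and are not the values behind Table 2.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for four departments (not from the study).
faculty_sizes = [10, 15, 20, 25]
total_citations = [100, 300, 200, 400]

r = pearson_r(faculty_sizes, total_citations)
print(round(r, 2))  # 0.8
```

The same function could be applied to faculty size paired with any of the other metrics (cited publications, citations per publication, citations per faculty member) to see whether department size accounts for differences among programs.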
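The comparison between the top program and the bottom half of the list is simple arithmetic, sketched below. The function name and the six totals are hypothetical and serve only to show the calculation; they are not the figures in Table 1.

```python
def top_versus_bottom_half(totals):
    """Return (top total, combined total of the bottom half of programs)."""
    ranked = sorted(totals, reverse=True)
    half = len(ranked) // 2
    bottom_half_sum = sum(ranked[-half:]) if half else 0
    return ranked[0], bottom_half_sum


# Hypothetical citation totals for six programs (not from the study).
totals = [500, 100, 80, 60, 50, 40]
top, bottom = top_versus_bottom_half(totals)
print(top, bottom, top > bottom)  # 500 150 True
```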
The analysis provides a useful alternative to the survey methods such as those used by the National Communication Association. While surveys take years to both establish and conduct, this method can be conducted and finished within a couple of months at relatively low cost beyond labor.
The expectation is that any survey of members would show a relationship to these results because the rankings would reflect a perception of faculty quality (assuming one measure of faculty quality is the degree to which the scholarship is perceived as influential, as reflected by the number of citations to that work). The view in many respects is self-sustaining: if adopted as a valid standard, the training of doctoral students would be by programs that endorse this view of quality, and the resulting education, with an emphasis on published research, creates an interesting form of indoctrination, mentorship, and acculturation into an academic culture with a clear set of values and assumptions of hierarchy (Erickson, Fleuriet, & Hosman, 1993). The question of what values or outcomes should be sought by a doctoral program remains subject to discussion and articulation. For example, Barnett, Danowski, Feeley, and Stalker (2010) argue that placement of doctoral students into centrally located doctoral programs should be the ultimate goal of a doctoral program.
One aspect relates to the timing of the evaluation and the ongoing changes in departmental faculty. The data collection took place in the Spring of 2010, and after that date a number of departments underwent significant change, with faculty changes due to deaths, retirement, and relocation to another institution. A retirement or relocation can have a significant and sudden impact on the total number of citations. Reputations for a doctoral program are probably not as potentially volatile as a citation count and instead represent the sustained effort of many scholars across a long period of time to generate a perception in members of the discipline about the quality of the education. At the same time, a new doctoral program can establish “legitimacy” in the discipline by hiring several faculty considered “top scholars” to boost the perception of the quality of the program and the educational opportunities available. Often, new programs are implemented with a set of new positions targeted at both increasing resources and raising the profile of the program.
One limitation involves the use of Web of Knowledge as a basis to obtain data for the comparison. Other possible indexes like Google Scholar exist that could have served as the basis for generating the list of citations to the work of departmental faculty. The potential for other indexes or systems of evaluation of the contribution by faculty in terms of productivity remains. Indexes also tend to favor the publication of articles because journals are easier to index on a regular basis than books. The result is that social science scholars would tend to publish more and find greater numbers of citations than persons in the humanities. Scholars in the humanities, focused more on the production of a book, would publish with less frequency and have a different pattern of citation. For example, rhetoricians may have a large number of citations to the various artifacts under study and far fewer to the work of other scholars. Whereas the work of a social scientist may be spread across four or five articles, each receiving a separate citation, the work of someone in humanities may generate a single book. There may be far fewer separate citations to that single work. The impact of a review of the literature in a single example or when considering a summary piece means that the citation pattern favors the work of the social scientist when considering the number of citations expected.
Rankings only indicate a set of opinions that have been gathered in an effort to provide some usable information to permit consumers and others to evaluate efforts. The question is whether the ratings reflect elements that are desirable. For example, does a faculty member publishing an article provide additional value to a student making a choice among various doctoral programs? Translating the elements that contribute to a positive or higher ranking to the value assigned to the education received as a part of the degree process remains difficult to evaluate. However, students deciding which program to apply to must make evaluations about the quality of the program, which makes the degree to which a quality education is available at the institution all the more important.
A risk of this work is the impact that rankings have on the focus of departments and administrators seeking institutional excellence. Rather than simply being viewed as one alternative to a current method of ranking and rating scholars (see previous scholar rankings, Erickson, Fleuriet, & Hosman, 1996; Hickson, Bodon, & Turner, 2004; Hickson, Stacks, & Amsbary, 1989; 1992; 1993; Hickson, Stacks, & Bodon, 1999; Hickson, Turner, & Bodon, 2003), the goal of this article is to demonstrate that there are alternative means of establishing the value of a scholar’s contributions. Such a focus on multiplicity of outcomes and the development of those demonstrations should be viewed as a means to serve the discipline by providing options that scholars and programs can choose to accept in combination or as substitutes. One of the arguments made by Blair, Brown, and Baxter (1994) about the need for multivocality in dealing with academic assessment and evaluation requires the exploration and development of multiple metrics. The more fundamental challenge becomes a definition of what constitutes success and the focus on the measurement of those outcomes. The process of programs choosing different definitions and measures of success may mean that a “ranking” becomes not comparative to other programs and institutions but instead reflects the ability to meet established criteria and outcomes defined by the institution and department. Such a view is an inward view of determining institutional excellence, whereas most views are comparative, requiring some type of ranking and putting programs in direct competition with each other without consideration of the adequacy or excellence of the particular program or faculty.
One question beyond the scope of this essay concerns the value of reputation in terms of determining suitability of a candidate for a faculty position. The status of a department should in many respects be reflected by what the graduates of the institution accomplish. The problem is that of the three main categories of achievement (teaching, research, and service), only one, research, seems to have readily developed metrics for evaluating accomplishment. There exist few national or even institutional means of evaluating the effectiveness of instruction for a department or the members of a department. Comparative data on instruction remain elusive and difficult to generate. Given that the overwhelming majority of employment opportunities require a primary emphasis on teaching, a disconnect exists between the means of evaluation and ranking sought by doctoral programs and, ultimately, the qualities sought by employers of the graduates of those programs. Doctoral program rankings reflect a focus on research productivity by faculty, but most of the employers of the graduates of doctoral programs focus on excellence in classroom instruction. The problem is that there is little attention paid to determining and measuring the degree of excellence for classroom instruction. The result is a set of evaluative criteria weighted almost exclusively toward only one of the three elements of faculty evaluation, research, to the exclusion of the other two elements, teaching and service. The question of how to develop measures for the other two elements of faculty development remains both something of a mystery and a necessity.
Allen, M., Antos, A., Hample, J.M., Hebl, M., Kulovitz, K., Liang, X., Ogi, M., Zhao, X., & Pederson, J.R. (2009, May). A method of evaluating the impact of scholars. Paper presented at the International Communication Association Convention, Chicago, IL.
Barnett, G. A., Danowski, J. A., Feeley, T. H., & Stalker, J. (2010). Measuring quality in communication doctoral education using network analysis of faculty-hiring patterns. Journal of Communication, 60, 388-411.
Hickson, M. III., Stacks, D.W., & Amsbary, J.H. (1989). An analysis of prolific scholarship in speech communication, 1915-1985: Toward a yardstick for measuring research productivity. Communication Education, 38, 230-236.
Hollihan, T. (2004). NCA doctoral reputational study, 2004. National Communication Association. Retrieved from http://dev.natcom.org/uploadedFiles/More_Scholarly_Resources/Chairs_Corner/Doctoral_Chairs_Section/PDF-DoctoralChairs-A_Study_of_the_Reputations_of_Doctoral_Programs_in_Communication_2004.pdf
Copyright 2012 Communication Institute for Online Scholarship, Inc.