An Overview of Content Analysis

Steve Stemler

Yale University



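Content analysis is a systematic, replicable technique for reducing the many words of a text to a smaller number of categories based on coding rules. Holsti defines content analysis as "any technique for making inferences by objectively and systematically identifying specified characteristics of messages." Under Holsti's definition, the techniques of content analysis apply not only to text but also to drawings, videotaped behavior, and similar material. To allow replication, however, the data must be durable in nature.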

Content analysis has been defined as a systematic, replicable technique for compressing many words of text into fewer content categories based on explicit rules of coding (Berelson, 1952; GAO, 1996; Krippendorff, 1980; and Weber, 1990). Holsti (1969) offers a broad definition of content analysis as, "any technique for making inferences by objectively and systematically identifying specified characteristics of messages" (p. 14). Under Holsti's definition, the technique of content analysis is not restricted to the domain of textual analysis, but may be applied to other areas such as coding student drawings (Wheelock, Haney, & Bebell, 2000), or coding of actions observed in videotaped studies (Stigler, Gonzales, Kawanaka, Knoll, & Serrano, 1999). In order to allow for replication, however, the technique can only be applied to data that are durable in nature.


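Content analysis makes it possible to sift carefully through large amounts of material in a systematic way. It is a useful method for discovering and describing where individuals, groups, or institutions focus their attention. It also supports inferences that can be corroborated through other methods of data collection. Krippendorff notes that much content analysis research is motivated by the search for techniques to infer from symbolic data what would be too costly, no longer possible, or too obtrusive to obtain with other techniques.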

Content analysis enables researchers to sift through large volumes of data with relative ease in a systematic fashion (GAO, 1996). It can be a useful technique for allowing us to discover and describe the focus of individual, group, institutional, or social attention (Weber, 1990). It also allows inferences to be made which can then be corroborated using other methods of data collection. Krippendorff (1980) notes that "[m]uch content analysis research is motivated by the search for techniques to infer from symbolic data what would be either too costly, no longer possible, or too obtrusive by the use of other techniques" (p. 51).



Practical Applications of Content Analysis


Content analysis is an effective way to determine authorship. One approach, for example, is to compile a list of suspected authors, analyze their earlier writings, and correlate the frequencies of nouns and function words to estimate how likely each author is to have written the material of interest. Mosteller and Wallace used a Bayesian technique based on word frequency to show that Madison was the author of the Federalist papers. Foster took a more holistic approach to identify the anonymous author of the 1992 book Primary Colors.

Content analysis can be a powerful tool for determining authorship. For instance, one technique for determining authorship is to compile a list of suspected authors, examine their prior writings, and correlate the frequency of nouns or function words to help build a case for the probability of each person's authorship of the data of interest. Mosteller and Wallace (1964) used Bayesian techniques based on word frequency to show that Madison was indeed the author of the Federalist papers; recently, Foster (1996) used a more holistic approach in order to determine the identity of the anonymous author of the 1992 book Primary Colors.
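The frequency-correlation idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function-word list is a short hypothetical one, the helper names are invented for this example, and a plain Pearson correlation stands in for the Bayesian techniques that Mosteller and Wallace actually used.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short list; a real study would use a much longer one.
FUNCTION_WORDS = ["the", "of", "and", "to", "by", "upon", "while", "on"]


def function_word_profile(text):
    """Return the rate per 1,000 words of each word in FUNCTION_WORDS."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [1000 * counts[w] / total for w in FUNCTION_WORDS]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_candidates(disputed_text, known_texts):
    """Rank suspected authors by how closely their profile matches the disputed text.

    known_texts maps an author name to a sample of that author's prior writing.
    """
    target = function_word_profile(disputed_text)
    scores = {
        author: pearson(target, function_word_profile(sample))
        for author, sample in known_texts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A high correlation only helps build a case for authorship; it is not proof, and the result depends heavily on which words are included in the list.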


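Content analysis is also useful for identifying trends and patterns in documents. Stemler and Bebell analyzed school mission statements to draw inferences about the reasons for existence that schools profess. One of the main research questions was whether the criteria used to measure program effectiveness were aligned with the overall program objectives or the school's reason for existence.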

Content analysis is also useful for examining trends and patterns in documents. For example, Stemler and Bebell (1998) conducted a content analysis of school mission statements to make some inferences about what schools hold as their primary reasons for existence. One of the major research questions was whether the criteria being used to measure program effectiveness (e.g., academic test scores) were aligned with the overall program objectives or reason for existence.


In addition, content analysis can be used to track how public opinion changes. Data obtained from mission statements in the late 1990s can be compared objectively with data collected at some future point.

Additionally, content analysis provides an empirical basis for monitoring shifts in public opinion. Data collected from the mission statements project in the late 1990s can be objectively compared to data collected at some point in the future to determine if policy changes related to standards-based reform have manifested themselves in school mission statements.


Conducting a Content Analysis


According to Krippendorff, the following six questions must be asked.

According to Krippendorff (1980), six questions must be addressed in every content analysis:


1) Which data are analyzed?

2) How are they defined?

3) What is the population from which they are drawn?

4) What is the context relative to which the data are analyzed?

5) What are the boundaries of the analysis?

6) What is the target of the inferences?


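At least three problems can arise when collecting documents for content analysis. First, if a substantial number of documents from the population are missing, the content analysis must be abandoned. Second, inappropriate records should be discarded, but a record of the reasons should be kept. Finally, some documents may be uncodable even though they meet all the requirements for analysis.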

At least three problems can occur when documents are being assembled for content analysis. First, when a substantial number of documents from the population are missing, the content analysis must be abandoned. Second, inappropriate records (e.g., ones that do not match the definition of the document required for analysis) should be discarded, but a record should be kept of the reasons. Finally, some documents might match the requirements for analysis but just be uncodable because they contain missing passages or ambiguous content (GAO, 1996).



Analyzing the Data


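One of the most common notions in qualitative research is that content analysis simply means counting word frequencies. This rests on the assumption that the most frequently mentioned words are the most important ones, which may be true in some cases but not in others.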

Perhaps the most common notion in qualitative research is that a content analysis simply means doing a word-frequency count. The assumption made is that the words that are mentioned most often are the words that reflect the greatest concerns. While this may be true in some cases, there are several counterpoints to consider when using simple word frequency counts to make inferences about matters of importance.


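One consideration is that synonyms may be used throughout a document for stylistic reasons, which can lead researchers to underestimate the importance of a concept. Keep in mind as well that each word may not represent its category equally well. Unfortunately, there is no established weighting procedure, so this limitation must be kept in mind when counting words. Weber further notes that not all issues are equally easy to raise: it may be easier for political parties to address economic issues than the history and current plight of Native Americans. Finally, word counts must take into account that some words have multiple meanings.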

One thing to consider is that synonyms may be used for stylistic reasons throughout a document and thus may lead the researchers to underestimate the importance of a concept (Weber, 1990). Also bear in mind that each word may not represent a category equally well. Unfortunately, there are no well-developed weighting procedures, so for now, using word counts requires the researcher to be aware of this limitation. Furthermore, Weber reminds us that, "not all issues are equally difficult to raise. In contemporary America it may well be easier for political parties to address economic issues such as trade and deficits than the history and current plight of Native American living precariously on reservations" (1990, p. 73). Finally, in performing word frequency counts, one should bear in mind that some words may have multiple meanings. For instance the word "state" could mean a political body, a situation, or a verb meaning "to speak."


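A useful approach when counting word frequencies is to count the words of interest and then run a Key Word In Context (KWIC) search to check that each word is used consistently. Most qualitative research software can pull up just the sentences in which a given word appears. This step strengthens the validity of the inferences. Some software uses artificial intelligence to distinguish, based on context, between different meanings of the same word.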

A good rule of thumb to follow in the analysis is to use word frequency counts to identify words of potential interest, and then to use a Key Word In Context (KWIC) search to test for the consistency of usage of words. Most qualitative research software (e.g., NUD*IST, HyperRESEARCH) allows the researcher to pull up the sentence in which that word was used so that he or she can see the word in some context. This procedure will help to strengthen the validity of the inferences that are being made from the data. Certain software packages (e.g., the revised General Inquirer) are able to incorporate artificial intelligence systems that can differentiate between the same word used with two different meanings based on context (Rosenberg, Schnurr, & Oxman, 1990). There are a number of different software packages available that will help to facilitate content analyses (see further information at the end of this paper).
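The rule of thumb above (count first, then inspect usage in context) can be tried without specialized software. The following is a minimal sketch, not a replacement for the packages named in the text; the function names, the simple regular-expression tokenizer, and the fixed-size word window are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, stopwords=frozenset()):
    """Count how often each word occurs, ignoring any words listed in stopwords."""
    return Counter(w for w in tokenize(text) if w not in stopwords)


def kwic(text, keyword, window=4):
    """Return (left context, keyword, right context) for every occurrence of keyword.

    window is the number of words kept on each side of the keyword.
    """
    tokens = tokenize(text)
    keyword = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


sample = (
    "The state of the economy is poor. The state will act. "
    "Officials state that help is coming."
)
for left, word, right in kwic(sample, "state", window=2):
    print(f"{left:>15} [{word}] {right}")
```

In the sample, "state" is counted three times, but the context lines show three different senses (a situation, a political body, and a verb), which is exactly the inconsistency a KWIC check is meant to expose.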


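Content analysis is more than simple word frequency counts. What makes the technique rich and meaningful is its reliance on coding and categorizing the data. The basics of categorizing are captured by these statements: "A category is a group of words with similar meaning or connotations." "Categories must be mutually exclusive and exhaustive." Categories are mutually exclusive when no unit falls between two data points and each unit is represented by only one data point. Categories are exhaustive when the data language covers all recording units without exception.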

Content analysis extends far beyond simple word counts, however. What makes the technique particularly rich and meaningful is its reliance on coding and categorizing of the data. The basics of categorizing can be summed up in these quotes: "A category is a group of words with similar meaning or connotations" (Weber, 1990, p. 37). "Categories must be mutually exclusive and exhaustive" (GAO, 1996, p. 20). Mutually exclusive categories exist when no unit falls between two data points, and each unit is represented by only one data point. The requirement of exhaustive categories is met when the data language represents all recording units without exception.
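The two requirements on categories can be checked mechanically once a coding scheme is written down. The sketch below assumes a hypothetical representation in which each category is a set of the recording units assigned to it; the function name and the example categories are invented for illustration.

```python
def check_coding_scheme(categories, recording_units):
    """Check a coding scheme for mutual exclusivity and exhaustiveness.

    categories maps a category name to the set of recording units assigned to it.
    Returns (overlapping, uncoded):
      overlapping: units assigned to more than one category (not mutually exclusive)
      uncoded: units assigned to no category (not exhaustive)
    """
    assignments = {
        unit: sorted(name for name, members in categories.items() if unit in members)
        for unit in recording_units
    }
    overlapping = {unit: names for unit, names in assignments.items() if len(names) > 1}
    uncoded = [unit for unit, names in assignments.items() if not names]
    return overlapping, uncoded


# Hypothetical scheme with one deliberate overlap and one gap.
scheme = {
    "civic": {"develop responsible citizens"},
    "emotional": {"foster emotional growth"},
    "social": {"enhance social skills", "develop responsible citizens"},
}
units = [
    "enhance social skills",
    "develop responsible citizens",
    "foster emotional growth",
    "prepare students for college",
]
overlapping, uncoded = check_coding_scheme(scheme, units)
print("Assigned to more than one category:", overlapping)
print("Assigned to no category:", uncoded)
```

A scheme passes both requirements only when both returned collections are empty.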


Emergent vs. a priori coding

There are two approaches to coding data. In emergent coding, categories are established after a preliminary examination of the data, and coding follows. The steps are as follows.

There are two approaches to coding data that operate with slightly different rules. With emergent coding, categories are established following some preliminary examination of the data. The steps to follow are outlined in Haney, Russell, Gulek, & Fierros (1998) and will be summarized here. 

      1. First, two people independently review the material and come up with a set of features that form a checklist. 
      2. Second, the researchers compare notes and reconcile any differences that show up on their initial checklists. 
      3. Third, the researchers use a consolidated checklist to independently apply coding. 
      4. Fourth, the researchers check the reliability of the coding (a 95% agreement is suggested; .8 for Cohen's kappa). If the level of reliability is not acceptable, then the researchers repeat the previous steps. Once the reliability has been established, the coding is applied on a large-scale basis. 
      5. The final stage is a periodic quality control check.

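In a priori coding, categories are established before the analysis on the basis of theory. Once a group of professional colleagues agrees on the categories, the data are coded, and revisions are made as needed to maximize mutual exclusivity and exhaustiveness.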

When dealing with a priori coding, the categories are established prior to the analysis based upon some theory. Professional colleagues agree on the categories, and the coding is applied to the data. Revisions are made as necessary, and the categories are tightened up to the point that maximizes mutual exclusivity and exhaustiveness (Weber, 1990).


Coding units

There are several ways to define coding units (physical, syntactic, referential, and propositional).

There are several different ways of defining coding units. 

      • The first way is to define them physically in terms of their natural or intuitive borders. 
        • For instance, newspaper articles, letters, or poems all have natural boundaries. 
      • The second way is to define the recording units syntactically, that is, to use the separations created by the author,
        • such as words, sentences, or paragraphs. 
      • A third way to define them is to use referential units. Referential units refer to the way a unit is represented. 
        • For example a paper might refer to George W. Bush as "President Bush," "the 43rd president of the United States," or "W." Referential units are useful when we are interested in making inferences about attitudes, values, or preferences. 
      • A fourth method of defining coding units is by using propositional units. Propositional units are perhaps the most complex method of defining coding units because they work by breaking down the text in order to examine underlying assumptions. 
        • For example, in a sentence that would read, "Investors took another hit as the stock market continued its descent," we would break it down to: The stock market has been performing poorly recently/Investors have been losing money (Krippendorff, 1980).
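Of the four kinds of coding unit listed above, syntactic units are the easiest to extract automatically, because they follow the separations the author already made. The sketch below is a rough illustration only: the function names are invented, and the splitting rules (blank lines for paragraphs, end punctuation for sentences) are simplifying assumptions that will mishandle abbreviations and similar cases.

```python
import re


def paragraphs(text):
    """Split text into paragraphs at blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def sentences(paragraph):
    """Split a paragraph into sentences after '.', '!' or '?' (a naive rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def words(sentence):
    """Split a sentence into words, dropping punctuation."""
    return re.findall(r"[A-Za-z']+", sentence)


document = (
    "Investors took another hit. The stock market continued its descent.\n\n"
    "Analysts expect more of the same."
)
for p in paragraphs(document):
    for s in sentences(p):
        print(len(words(s)), "words:", s)
```

Referential and propositional units cannot be recovered this way, since they depend on what the text refers to or assumes rather than on where the author placed a break.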


Three kinds of units are generally used: sampling units, context units, and recording units.

Typically, three kinds of units are employed in content analysis: sampling units, context units, and recording units.


      • Sampling units will vary depending on how the researcher makes meaning; they could be words, sentences, or paragraphs. In the mission statements project, the sampling unit was the mission statement.
        • In the mission statement analysis, the sampling unit is the mission statement.
      • Context units need be neither independent nor separately describable. They may overlap and contain many recording units. Context units do, however, set physical limits on what kind of data you are trying to record. In the mission statements project, the context units are sentences. This was an arbitrary decision, and the context unit could just as easily have been paragraphs or entire statements of purpose. 
        • In the mission statement study, the context unit is the sentence. This was an arbitrary decision by the researchers; depending on the purpose, the context unit could also be the paragraph or the entire mission statement.
      • Recording units, by contrast, are rarely defined in terms of physical boundaries. In the mission statements project, the recording unit was the idea(s) regarding the purpose of school found in the mission statements (e.g., develop responsible citizens or promote student self-worth). Thus a sentence that reads "The mission of Jason Lee school is to enhance students' social skills, develop responsible citizens, and foster emotional growth" could be coded in three separate recording units, with each idea belonging to only one category (Krippendorff, 1980).
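        • Recording units are rarely determined by physical boundaries. In the mission statements project, the recording unit is each purpose a school expresses in its mission statement. If a statement says that the purpose of School A is to cultivate 1, 2, and 3 in its students, there are three recording units: 1, 2, and 3.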


Reliability

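Weber says that to make valid inferences from text, the classification procedure must be consistent: different people should code the same text in the same way. He also notes that reliability problems arise when word meanings, category definitions, or coding rules are ambiguous. The people who develop a coding scheme often come to share hidden meanings of the codes, so the reliability coefficients they report tend to be inflated. The most important step in avoiding this is to prepare explicit recording instructions, which allow outside coders to be trained until adequate reliability is reached.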

Weber (1990) notes: "To make valid inferences from the text, it is important that the classification procedure be reliable in the sense of being consistent: Different people should code the same text in the same way" (p. 12). As Weber further notes, "reliability problems usually grow out of the ambiguity of word meanings, category definitions, or other coding rules" (p. 15). Yet, it is important to recognize that the people who have developed the coding scheme have often been working so closely on the project that they have established shared and hidden meanings of the coding. The obvious result is that the reliability coefficient they report is artificially inflated (Krippendorff, 1980). In order to avoid this, one of the most critical steps in content analysis involves developing a set of explicit recording instructions. These instructions then allow outside coders to be trained until reliability requirements are met.


Reliability can be discussed in the following terms.

Reliability may be discussed in the following terms:

    • Stability, or intra-rater reliability. Can the same coder get the same results try after try?
    • Reproducibility, or inter-rater reliability. Do coding schemes lead to the same text being coded in the same category by different people?

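One way to measure reliability is to check the agreement between raters. The simplest approach divides the number of cases coded identically by the total number of cases, but it fails to account for agreement that occurs by chance. To overcome this limitation, reliability is better calculated with Cohen's kappa, where 1 indicates perfect reliability and 0 indicates that the observed agreement is no more than chance alone would produce.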

One way to measure reliability is to measure the percent of agreement between raters. This involves simply adding up the number of cases that were coded the same way by the two raters and dividing by the total number of cases. The problem with a percent agreement approach, however, is that it does not account for the fact that raters are expected to agree with each other a certain percentage of the time simply based on chance (Cohen, 1960). In order to combat this shortfall, reliability may be calculated by using Cohen's Kappa, which approaches 1 as coding is perfectly reliable and goes to 0 when there is no agreement other than what would be expected by chance (Haney et al., 1998). Kappa is computed as:

$$\kappa = \frac{P_A - P_c}{1 - P_c}$$

where $P_A$ is the proportion of units on which the raters agree and $P_c$ is the proportion of units for which agreement is expected by chance.

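In practice, this value can be interpreted as the proportion of agreement between raters after chance agreement is excluded. Crocker & Algina point out that a kappa of 0 means that agreement is no better than would be expected by chance, and a kappa below 0 means that agreement is worse than chance. Kvalseth suggests that a kappa of 0.61 represents reasonably good agreement.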

In practice, this value may be interpreted as the proportion of agreement between raters after accounting for chance (Cohen, 1960). Crocker & Algina (1986) point out that a value of κ = 0 does not mean that the coding decisions are so inconsistent as to be worthless; rather, κ = 0 may be interpreted to mean that the decisions are no more consistent than we would expect based on chance, and a negative value of kappa reveals that the observed agreement is worse than expected on the basis of chance alone. "In his methodological note on kappa in Psychological Reports, Kvalseth (1989) suggests that a kappa coefficient of 0.61 represents reasonably good overall agreement." (Wheelock et al., 2000). In addition, Landis & Koch (1977, p.165) have suggested the following benchmarks for interpreting kappa:

      Kappa statistic    Strength of agreement
      < 0.00             Poor
      0.00-0.20          Slight
      0.21-0.40          Fair
      0.41-0.60          Moderate
      0.61-0.80          Substantial
      0.81-1.00          Almost perfect


Cohen notes that three assumptions must be met to use this measure.

Cohen (1960) notes that there are three assumptions to attend to in using this measure. 

      • First, the units of analysis must be independent.
        • For example, each mission statement that was coded was independent of all others. This assumption would be violated if in attempting to look at school mission statements, the same district level mission statement was coded for two different schools within the same district in the sample.
      • Second, the categories of the nominal scale must be independent, mutually exclusive, and exhaustive.
        • Suppose the goal of an analysis was to code the kinds of courses offered at a particular school. Now suppose that a coding scheme was devised that had five classification groups: mathematics, science, literature, biology, and calculus. The categories on the scale would no longer be independent or mutually exclusive because whenever a biology course is encountered it also would be coded as a science course. Similarly, a calculus course would always be coded into two categories as well, calculus and mathematics. Finally, the five categories listed are not exhaustive of all of the different types of courses that are likely to be offered at a school. For example, a foreign language course could not be adequately described by any of the five categories.
      • The third assumption when using kappa is that the raters are operating independently.
        • In other words, two raters should not be working together to come to a consensus about what rating they will give.
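Percent agreement and Cohen's kappa, as described above, are simple enough to compute by hand or in a few lines of code. The sketch below assumes two raters who each assigned one nominal code to the same list of units; the function names are hypothetical, and chance agreement is estimated in the standard way from each rater's marginal category proportions.

```python
from collections import Counter


def percent_agreement(rater1, rater2):
    """Proportion of units that both raters coded the same way."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must code the same non-empty set of units")
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def cohens_kappa(rater1, rater2):
    """Cohen's kappa: agreement between two raters after accounting for chance."""
    n = len(rater1)
    p_a = percent_agreement(rater1, rater2)  # observed agreement
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    p_c = sum(counts1[c] * counts2[c] for c in counts1) / (n * n)
    if p_c == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_a - p_c) / (1 - p_c)


# Hypothetical codes assigned by two raters to the same four recording units.
rater1 = ["cognitive", "cognitive", "civic", "civic"]
rater2 = ["cognitive", "cognitive", "civic", "cognitive"]
print(percent_agreement(rater1, rater2))
print(cohens_kappa(rater1, rater2))
```

In this example the raters agree on 3 of 4 units (0.75), chance agreement is 0.5, and kappa is therefore 0.5, noticeably lower than the raw percent agreement suggests.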


Validity

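A methodology should be chosen according to the research question. Accordingly, judging whether the inferences drawn from one analytic approach are valid requires multiple sources of information. Where possible, the researcher should build a validation study into the design. In qualitative research, validation takes the form of triangulation, which lends credibility to findings by drawing on multiple data sources, methods, investigators, and theories.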

It is important to recognize that a methodology is always employed in the service of a research question. As such, validation of the inferences made on the basis of data from one analytic approach demands the use of multiple sources of information. If at all possible, the researcher should try to have some sort of validation study built into the design. In qualitative research, validation takes the form of triangulation. Triangulation lends credibility to the findings by incorporating multiple sources of data, methods, investigators, or theories (Erlandson, Harris, Skipper, & Allen, 1993).


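For example, the research question in the mission statements project was to identify the purpose of school from the institution's perspective. To cross-validate the findings of the content analysis, principals and those who make hiring decisions could be interviewed about how much weight the values emphasized in the mission statement carry when hiring teachers. Another option is to survey students and teachers about the mission statement to gauge their awareness of the school's aims. A third option is to examine how far the ideals stated in the mission statement are actually reflected in classrooms.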

For example, in the mission statements project, the research question was aimed at discovering the purpose of school from the perspective of the institution. In order to cross-validate the findings from a content analysis, schoolmasters and those making hiring decisions could be interviewed about the emphasis placed upon the school's mission statement when hiring prospective teachers to get a sense of the extent to which a school's values are truly reflected by mission statements. Another way to validate the inferences would be to survey students and teachers regarding the mission statement to see the level of awareness of the aims of the school. A third option would be to take a look at the degree to which the ideals mentioned in the mission statement are being implemented in the classrooms.


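Shapiro and Markoff argue that content analysis is meaningful only to the extent that its results are related to other measures. From this perspective, examining the relationship between average student achievement and the emphasis that mission statements place on cognitive outcomes would enhance the validity of the findings.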

Shapiro & Markoff (1997) assert that content analysis itself is only valid and meaningful to the extent that the results are related to other measures. From this perspective, an exploration of the relationship between average student achievement on cognitive measures and the emphasis on cognitive outcomes stated across school mission statements would enhance the validity of the findings. For further discussions related to the validity of content analysis see Roberts (1997), Erlandson et al. (1993), and Denzin & Lincoln (1994).


Conclusion


When used properly, content analysis is a powerful data reduction technique. Its major benefit comes from the fact that it is a systematic, replicable technique for compressing many words of text into fewer content categories based on explicit rules of coding. It has the attractive features of being unobtrusive and of being useful in dealing with large volumes of data. The technique of content analysis extends far beyond simple word frequency counts. Many limitations of word counts have been discussed, and methods of extending content analysis to enhance the utility of the analysis have been addressed. Two fatal flaws that destroy the utility of a content analysis are faulty definitions of categories and categories that are not mutually exclusive and exhaustive.





Stemler, Steve (2001). An overview of content analysis. Practical Assessment, Research & Evaluation, 7(17). Retrieved November 11, 2014 from http://PAREonline.net/getvn.asp?v=7&n=17 . This paper has been viewed 384,239 times since 6/7/2001.
