Communications and Journalism

Sunday, 8 Esfand 1389 [Iranian calendar]

Media Research: Content analysis


Content analysis is a method for summarizing any form of content by counting various aspects of that content. This enables a more objective evaluation than comparing content on the basis of a listener's impressions. For example, an impressionistic summary of a TV program is not content analysis. Nor is a book review: that's an evaluation.

Content analysis, though it often analyses written words, is a quantitative method. The results of content analysis are numbers and percentages. After doing a content analysis, you might make a statement such as "27% of programs on Radio Lukole in April 2003 mentioned at least one aspect of peacebuilding, compared with only 3% of the programs in 2001."
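The counting behind a statement like that is a simple tally over coded units. A minimal sketch in Python, using hypothetical codes (1 = the program mentioned at least one aspect of peacebuilding, 0 = it did not):

```python
# Hypothetical codes for 11 sampled programs from April 2003:
# 1 = mentioned at least one aspect of peacebuilding, 0 = did not.
programs = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

mentions = sum(programs)
share = 100 * mentions / len(programs)
print(f"{share:.0f}% of programs mentioned peacebuilding")  # 27%
```

The percentage is nothing more than (units containing the theme) divided by (units examined); everything interesting happens earlier, in deciding how each program gets its 1 or 0.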

Though it may seem crude and simplistic to make such statements, the counting serves two purposes:

to remove much of the subjectivity from summaries
to simplify the detection of trends.

Also, the fact that programs have been counted implies that somebody has listened to every program on the station: content analysis is always thorough.

As you’ll see below, content analysis can actually be a lot more subtle than the above example. There’s plenty of scope for human judgement in assigning relevance to content.

1. What is content?

The content that is analysed can be in any form to begin with, but is often converted into written words before it is analysed. The original source can be printed publications, broadcast programs, other recordings, the internet, or live situations. All this content is something that people have created. You can’t do content analysis of (say) the weather - but if somebody writes a report predicting the weather, you can do a content analysis of that.

All this is content...

Print media

Newspaper items, magazine articles, books, catalogues

Other writings

Web pages, advertisements, billboards, posters, graffiti

Broadcast media

Radio programs, news items, TV programs

Other recordings

Photos, drawings, videos, films, music

Live situations

Speeches, interviews, plays, concerts

Observations

Gestures, rooms, products in shops

Media content and audience content

That’s one way of looking at content. Another way is to divide content into two types: media content and audience content. Just about everything in the above list is media content. But when you get feedback from audience members, that’s audience content. Audience content can be either private or public. Private audience content includes:

open-ended questions in surveys
interview transcripts
group discussions.

Public audience content comes from communication between all the audience members, such as:

letters to the editor
postings to an online discussion forum
listeners' responses in talkback radio.

The analysis of private audience content, in verbal form, is covered in chapter 12, on depth interviews. Therefore this chapter will focus mainly on public audience content and on media content.

Why do content analysis?

If you're also doing audience research, the main reason for doing content analysis as well is to be able to make links between causes (e.g. program content) and effects (e.g. audience size). If you do an audience survey, but don't systematically relate the survey findings to your program output, you won't know why your audience might have increased or decreased. You might guess, when the survey results first appear, but a thorough content analysis is much better than a guess.

For a media organization, the main purpose of content analysis is to evaluate and improve its programming. All media organizations are trying to achieve some purpose. For commercial media, the purpose is simple: to make money, and survive. For public and community-owned media, there are usually several purposes, sometimes conflicting - but each individual program tends to have one main purpose.

As a simple commercial example, the purpose of an advertisement is to promote the use of the product it is advertising: first by increasing awareness, then by increasing sales. The purpose of a documentary on AIDS in southern Africa might be to increase awareness of ways of preventing AIDS, and in the end to reduce the level of AIDS. Often, as this example has shown, there is not a single purpose, but a chain of them, with each step leading to the next.

Using audience research to evaluate the effects (or outcome) of a media project is the second half of the process. The first half is to measure the causes (or inputs) - and that is done by content analysis. For example, in the 1970s a lot of research was done on the effects of broadcasting violence on TV. If people saw crimes committed on TV, did that make them more likely to commit crimes? In this case, the effects were crime rates, often measured from police statistics. The problem was to link the effects to the possible causes. The question was not simply "does seeing crime on TV make people commit crimes?" but "What types of crime on TV (if any) make what types of people (if any) commit crimes, in what situations?" UNESCO in the 1970s produced a report summarizing about 3,000 separate studies of this issue - and most of those studies used some form of content analysis.

When you study causes and effects, as in the above example, you can see how content analysis differs from audience research:

content analysis uncovers causes
audience research uncovers effects.

The entire process - linking causes to effects - is known as evaluation.

The process of content analysis

Content analysis has six main stages, each described by one section of this chapter:

Selecting content for analysis (section 2)
Units of content (section 3)
Preparing content for coding (section 4)
Coding the content (section 5)
Counting and weighting (section 7)
Drawing conclusions (section 8).

Section 6 gives some examples, illustrating the wide range of uses for content analysis.

2. Selecting content for analysis

Content is huge: the world contains a near-infinite amount of content. It’s rare that an area of interest has so little content that you can analyse it all. Even when you do analyse the whole of something (e.g. all the pictures in one issue of a magazine) you will usually want to generalize those findings to a broader context (such as all the issues of that magazine). In other words, you are hoping that the issue you selected is a representative sample. Like audience research, content analysis involves sampling, as explained in chapter 2. But with content analysis, you’re sampling content, not people. The body of information you draw the sample from is often called a corpus – Latin for body.

Deciding sample size

Unless you want to look at very fine distinctions, you don't need a huge sample. The same principles apply for content analysis as for surveys: most of the time, a sample between 100 and 2000 items is enough - as long as it is fully representative. For radio and TV, the easiest way to sample is by time. How would you sample programs during a month? With 30 days, you might decide on a sample of 120 programs. But programs vary greatly in length, so use quarter-hours as the unit instead. That's 4 quarter-hours each day for a month. Obviously you need to vary the time periods to make sure that all times of day are covered. An easy way to do this, assuming you're on air from 6 am to midnight, is to make a sampling plan like this:

Day     Quarter-hours beginning
1       0600    1030    1500    1930
2       0615    1045    1515    1945
3       0630    1100    1530    2000
and so on. After 18 days you’ll have covered all quarter-hours. After 30 days you’ll have covered most of them twice. If that might introduce some bias, you could keep sampling for another 6 days, to finish two cycles. Alternatively, you could use an 18-minute period instead of 15, and still finish in 30 days.
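A rotating plan like this can be generated mechanically. A sketch, assuming a 6 am start, four slots per day spaced 4.5 hours apart, with each day's slots starting 15 minutes later than the previous day's (valid for one 18-day cycle; a second cycle would need the times wrapped back to 0600):

```python
def sampling_plan(days):
    """Return {day: [HHMM, ...]} quarter-hour sampling slots."""
    plan = {}
    for day in range(1, days + 1):
        slots = []
        for k in range(4):
            # Start at 0600, drift 15 min per day, space slots 270 min (4.5 h) apart.
            minutes = 6 * 60 + (day - 1) * 15 + k * 270
            slots.append(f"{minutes // 60:02d}{minutes % 60:02d}")
        plan[day] = slots
    return plan

plan = sampling_plan(3)
print(plan[1])  # ['0600', '1030', '1500', '1930']
print(plan[3])  # ['0630', '1100', '1530', '2000']
```

Running this for 18 days produces 72 distinct slots: exactly the 72 quarter-hours in an 18-hour broadcast day, which is why the cycle closes after 18 days.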

With print media, the same principles apply, but it doesn’t make sense to base the sample on time of day. Instead, use page and column numbers. Actually, it’s a lot easier with print media, because you don’t need to organize somebody (or program a computer) to record the on-air program at regular intervals.

The need for a focus

When you set out to do content analysis, the first thing to acknowledge is that it’s impossible to be comprehensive. No matter how hard you try, you can’t analyse content in all possible ways. I’ll demonstrate, with an example. Let’s say that you manage a radio station. It’s on air for 18 hours a day, and no one person seems to know exactly what is broadcast on each program. So you decide that during April all programs will be taped. Then you will listen to the tapes and do a content analysis.

First problem: 18 hours a day, for 30 days, is 540 hours. If you work a 40-hour week, it will take almost 14 weeks to play the tapes back. But that’s only listening - without pausing for content analysis! So instead, you get the tapes transcribed. Most people speak about 8,000 words per hour. Thus your transcript has up to 4 million words – about 40 books the size of this one.

Now the content analysis can begin! You make a detailed analysis: hundreds of pages of tables and summaries. When you’ve finished (a year later?) somebody asks you a simple question, such as "What percentage of the time are women’s voices heard on this station?"

If you haven’t anticipated that question, you’ll have to go back to the transcript and laboriously calculate the answer. You find that the sex of the speaker hasn’t always been recorded. You make an estimate (only a few days’ work, if you’re lucky) then you’re asked a follow-up question, such as "How much of that time is speech, and how much is singing?"

Oops! The transcriber didn’t bother to include the lyrics of the songs broadcast. Now you’ll have to go back and listen to all those tapes again!

This example shows the importance of knowing what you’re looking for when you do content analysis. Forget about trying to cover everything, because (a) there’s too much content around, and (b) it can be analysed in an infinite number of ways. Without having a clear focus, you can waste a lot of time analysing unimportant aspects of content. The focus needs to be clearly defined before you begin work.

An example of a focus is: "We’ll do a content analysis of a sample of programs (including networked programs, and songs) broadcast on Radio Lukole in April 2003, with a focus on describing conflict and the way it is managed."

3. Units of content

To be able to count content, your corpus needs to be divided into a number of units, roughly similar in size. There's no limit to the number of units in a corpus, but in general the larger the unit, the fewer units you need. If the units you are counting vary greatly in length, and if you are looking for the presence of some theme, a long unit will have a greater chance of including that theme than will a short unit. If the longest units are many times the size of the shortest, you may need to change the unit - perhaps "per thousand words" instead of "per web page." Similarly, if interviews vary greatly in length, a time-based unit may be more appropriate than "per interview."
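Switching to a length-adjusted unit is just a division. A sketch with hypothetical counts, expressing theme mentions per thousand words so that long and short units become comparable:

```python
# (word_count, theme_mentions) for three hypothetical units of very
# different lengths - e.g. a short news item, a long article, a web page.
units = [(250, 1), (4000, 6), (900, 3)]

for words, mentions in units:
    rate = 1000 * mentions / words
    print(f"{mentions} mention(s) in {words} words = {rate:.1f} per 1000 words")
```

On the raw counts, the 4,000-word unit looks the most theme-heavy (6 mentions), but per thousand words it is actually the lightest (1.5, against 4.0 and about 3.3 for the shorter units).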

Units of media content

Depending on the size of your basic unit, you’ll need to take a different approach to coding. The main options are (from shortest to longest):

A word or phrase. If you are studying the use of language, words are an appropriate unit (perhaps also grouping synonyms together, and including phrases). Though a corpus may have thousands of words, software can count them automatically.
A paragraph, statement, or conversational turn: up to a few hundred words.
An article. This might be anything from a short newspaper item to a magazine article or web page: usually between a few hundred and a few thousand words.
A large document. This can be a book, an episode of a TV program, or a transcript of a long radio talk.

The longer the unit, the more difficult and subjective is the work of coding it as a whole. Consider breaking a document into smaller units, and coding each small unit separately. However, if it's necessary to be able to link different parts of the document together, this won't make sense.

Units of audience content

When you are analysing audience content (not media content) the unit will normally be based on the data collection format and/or the software used to store the responses. The types of audience content most commonly produced from research data are:

Open-ended responses to a question in a survey (usually all in one large file).
Statements produced by consensus groups (often in one small file).
Comments from in-depth interviews or group discussions (usually a large text file from each interview or group).

In any of these cases, the unit can be either a person or a comment. Survey analysis is always based on individuals, but content analysis is usually based on comments. Most of the time this difference doesn't affect the findings, but if some people make far more comments than others, and these two groups give different kinds of comments, it will be best to use individuals as the unit.

Large units are harder to analyse

Usually the corpus is a set of the basic units: for example, a set of 13 episodes in a TV series, an 85-message discussion on an email listserv over several months, 500 respondents’ answers to a survey question - and so on. What varies is (a) the number of units in the corpus, and (b) the size of the units.

Differences in these figures will require different approaches to content analysis. If you are studying the use of language, focusing on the usage of new words, you will need to use a large corpus - a million words or so - but the size of the unit you are studying is tiny: just a single word. The word frequencies can easily be compared using software such as Wordsmith.
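The core of that kind of frequency counting can be sketched with Python's standard library (Wordsmith and similar tools add concordances, keyword statistics, and much more, but the basic operation is just this, shown here on two tiny hypothetical corpora):

```python
from collections import Counter
import re

def word_frequencies(text):
    # Lower-case the text and count runs of letters (and apostrophes) as words.
    return Counter(re.findall(r"[a-z']+", text.lower()))

corpus_a = "the blog spread fast and the blog changed news the blog won"
corpus_b = "the newspaper reported the news in the usual way"

freq_a = word_frequencies(corpus_a)
freq_b = word_frequencies(corpus_b)

# Compare the usage of a candidate "new word" across the two corpora:
print(freq_a["blog"], freq_b["blog"])  # 3 0
```

A real study of new-word usage would also normalize by corpus size, as discussed in section 3, since a million-word corpus will contain more of everything than a small one.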

At the other extreme, a literary scholar might be studying the influence of one writer on another. The unit might be a whole play, but the number of units might be quite small - perhaps the 38 plays of Shakespeare compared with the 7 plays of Marlowe. If the unit is a whole play, and the focus is the literary style, a lot of human judgement will be needed. Though the total size of the corpus could be much the same as with the previous example, far more work is needed when the content unit is large - because detailed judgements will have to be made to summarize each play.

Dealing with several units at once

Often, some units overlap other units. For example, if you ask viewers of a TV program what they like most about it, some will give one response, and others may give a dozen. Is your unit the person or the response? (Our experience: it's best to keep track of both types of unit, because you won't know till later whether using one type of unit will produce a different pattern of responses.)

4. Preparing content for coding

Before content analysis can begin, it needs to be preserved in a form that can be analysed. For print media, the internet, and mail surveys (which are already in written form) no transcription is needed. However, radio and TV programs, as well as recorded interviews and group discussions, are often transcribed before the content analysis can begin.

Full transcription – that is, conversion into written words, normally into a computer file – is slow and expensive. Though it’s sometimes necessary, full transcription is often avoidable, without affecting the quality of the analysis. A substitute for transcription is what I call content interviewing (explained below).

When content analysis is focusing on visual aspects of a TV program, an alternative to transcription is to take photos of the TV screen during the program, or to take a sample of frames from a video recording. For example, if you take a frame every 15 seconds from a 25-minute TV program, you will have 100 screenshots. These could be used for a content analysis of what is visible on the screen. For a discussion program this would not be useful, because most photos would be almost identical, but for programs with strong visual aspects – e.g. most wildlife programs – a set of photos can be a good substitute for a written transcript. However this depends on the purpose of the content analysis.

It’s not possible to accurately analyse live radio and TV programs, because there’s no time to re-check anything. While you’re taking notes, you’re likely to miss something important. Therefore radio and TV programs need to be recorded before they can be content-analysed.

Transcribing recorded speech

If you've never tried to transcribe an interview by writing out the spoken words, you probably don't think there's anything subjective about it. But as soon as you start transcribing, you realize that there are many styles, and many choices within each style. What people say is often not what they intend. They leave out words, use the wrong word, stutter, pause, and correct themselves mid-sentence. At times the voices are inaudible. Do you then guess, or leave a blank? Should you add "stage directions" - that the speaker shouted or whispered, or that somebody else was laughing in the background?

Ask three or four people (without giving them detailed instructions) to transcribe the same tape of speech, and you’ll see surprising differences. Even when transcribing a TV or radio program, with a professional announcer reading from a script, the tone of voice can change the intended meaning.

The main principle that emerges from this is that you need to write clear instructions for transcription, and ensure that all transcribers (if there is more than one) closely follow those instructions. It’s useful to have all transcribers begin by transcribing the same text for about 30 minutes. They then stop and compare the transcriptions. If there are obvious differences, they then repeat the process, and again compare the transcriptions. After a few hours, they are coordinated.

It generally takes a skilled typist, using a transcription recorder with a foot-pedal control, about a day’s work to transcribe an hour or two of speech. If a lot of people are speaking at once on the tape, and the transcriber is using an ordinary cassette player, and the microphone used was of low quality, the transcription can easily take 10 times as long as the original speech.

Another possibility is to use speech-recognition software, but unless the speaker is exceptionally clear (e.g. a radio announcer) a lot of manual correction is usually needed, and not much time is saved.

Partial transcription

Transcribing speech is very slow, and therefore expensive. An alternative that we (at Audience Dialogue) often use is to make a summary, instead of a full transcription. We play back the recording and write what is being discussed during each minute or so. The summary transcript might look like this:

0' 00"            Moderator introduces herself
1' 25"            Each participant asked to introduce self
1' 50"            James (M, age about 25)
2' 32"            Mary (F, 30?, in wheelchair)
4' 06"            Ayesha (F, 40ish)
4' 55"            Markus (M, about 50) - wouldn't give details
5' 11"    *       Grace (F, 38)
6' 18"            Lee (M, 25-30, Java programmer)
7' 43"            Everybody asked to add an agenda item
7' 58"    **      James - reasons for choosing this ISP
This takes little more time than the original recording took: about an hour and a half for a one-hour discussion. The transcriber uses asterisks: * means "this might be relevant" and ** means "very relevant." These marked sections can be listened to again later, and transcribed fully.

If you are testing some particular hypothesis, much of the content will be irrelevant, so it is a waste of time to transcribe everything. Another advantage of making a summary like that above is that it clearly shows the topics that participants spent a lot of time discussing.
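A summary in that format can also be processed automatically: the asterisk markers make it easy to pull out just the flagged segments for full transcription. A sketch, using a few hypothetical lines in the format above:

```python
import re

summary = """\
0' 0"        Moderator introduces herself
4' 55"       Markus (M, about 50) - wouldn't give details
5' 11"  *    Grace (F, 38)
7' 58"  **   James - reasons for choosing this ISP
"""

# A line is flagged if the time is followed by one or two asterisks.
pattern = re.compile(r"""(\d+' ?\d+")\s+(\*{1,2})\s+(.*)""")
flagged = []
for line in summary.splitlines():
    m = pattern.match(line)
    if m:
        time, stars, topic = m.groups()
        flagged.append((time, len(stars), topic))

for time, relevance, topic in flagged:
    print(time, relevance, topic)
```

Sorting the flagged list by the relevance value (2 before 1) gives a ready-made work queue for the full transcription pass.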

An important note: if you record the times as above, using a tape recorder, make sure that every time you begin listening, the tape is rewound to the start and the counter reset to zero; otherwise the counter positions won't match from one session to the next.

What form does the content take?

Content is often transformed into written form before content analysis begins. For example, if you are doing a content analysis of a photo exhibition, the analysis will probably not be of uninterpreted visual shapes and colours. Instead, it might be about the topics of the photos (from coders' descriptions, or a written catalogue), or it might be about visitors' reactions to the photos. Perhaps visitors' comments were recorded on tape, then transcribed. For most purposes, this transcription could be the corpus for the content analysis, but an analysis with an acoustic focus might also consider how loudly the visitors were speaking, at what pitch, and so on.

Conversion into computer-readable form

If your source is print media, and you want a text file of the content (so that you can analyse it using software) a quick solution is to scan the text with OCR software. Even a cheap scanner works very well with printed text, using basic OCR software, supplied free with many scanners. Unless the text is small and fuzzy (e.g. on cheap newsprint) only a few corrections are usually needed per page.

If the content you are analysing is on a web page, email, or word processing document, the task is easier still. But to analyse this data with most text-analysis software, you will first need to save the content as a text file, eliminating HTML tags and other formatting that is not part of the content. First save the web page, then open it with a word processing program, and finally save it as a text file.
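With Python's standard library, the tag-stripping step can be sketched directly, without a word processor. This is a minimal version (a real page would also carry scripts and styles, which a fuller version would need to skip):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of a page, ignoring the tags themselves."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

page = "<html><body><h1>News</h1><p>Peace talks <b>resumed</b> today.</p></body></html>"
extractor = TextExtractor()
extractor.feed(page)
text = " ".join(extractor.chunks)
print(text)  # News Peace talks resumed today.
```

The resulting plain text can then be saved to a file and fed to whatever text-analysis software you are using.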

Live coding

If your purpose in the content analysis is very clear and simple, an alternative to transcription is live coding. For this, the coders play back the tape of the radio or TV program or interview, listen for perhaps a minute at a time, then stop the tape and code the minute they just heard. This works best when several coders are working together. It is too difficult for beginners at coding, but for experienced coders it avoids the bother of transcription. Sometimes, content analysis has a subtle purpose, and a transcript doesn’t give the information you need. That’s when live coding is most useful: for example, a study of the tone of voice that actors in a drama use when speaking to people of different ages and sexes.

Analysing secondary data

Sometimes the corpus for a content analysis is produced specifically for the study - or at least, the transcription is made for that purpose. That’s primary data. But in other instances, the content has already been transcribed (or even coded) for another purpose. That’s secondary data. Though secondary data can save you a lot of work, it may not be entirely suitable for your purpose.

This section applies to content that was produced for some other purpose, and is now being analysed. Content created for different purposes, or different audiences, is likely to have different emphases. In different circumstances, and in different roles, people are likely to give very different responses. The expectations produced by a different role, or a different situation, are known as demand characteristics.

When you find a corpus that might be reusable, you need to ask it some questions, like:

Who produced this, for what audience, in what context?

It's often misleading to look only at the content itself - the content makes full sense only in its original context. The context is an unspoken part of the content, but is often more important than the text of the content.

Why was it produced?

Content that was prepared to support a specific cause is going to be more biased than content that was prepared for general information.

It's safe to assume that all content is biased in some way. For example, content that is produced by or for a trade union is likely to be very different (in some ways) from content produced by an employer group. But in other ways the two sets of content will share many similarities - because they are likely to discuss the same kinds of issues. Content produced by an advertiser or consumer group, ostensibly on the same topic, is likely to have a very different emphasis. That emphasis is very much part of the content - even if it is not stated explicitly.

How old is this corpus?

Is it still valid for your current purpose? It’s tempting to use a corpus that’s already prepared, but it may no longer be relevant.

5. Coding content

Coding in content analysis is the same as coding answers in a survey: summarizing responses into groups, reducing the number of different responses to make comparisons easier. Thus you need to be able to sort concepts into groups, so that in each group the concepts are both

as similar as possible to each other, and
as different as possible from concepts in every other group.

Does that seem puzzling? Read on: the examples below will make it clearer.

Another issue is the stage at which the coding is done. In market research organizations, open-ended questions are usually coded before the data entry stage. The computer file of results has only the coded data, not the original verbatim answer. This makes life easier for the survey analysts - for example, to have respondents’ occupations classified in standard groups, rather than many slightly varying answers. However, it also means that some subtle data is lost, unless the analyst has some reason to read the original questionnaires. For occupation data, the difference between, say "clerical assistant" and "office assistant" may be trivial (unless that is the subject of the survey). But for questions beginning with "why," coding usually over-simplifies the reality. In such cases it’s better to copy the verbatim answers into a computer file, and group them later.

The same applies with content analysis. Coding is necessary to reduce the data to a manageable mass, but any piece of text can be coded in many different ways. It’s therefore important to be able to check the coding easily, by seeing the text and codes on the same sheet of paper, or the same computer screen.

Single coding and multi-coding

It’s usual in survey analysis to give only one code to each open-ended answer. For example, if a respondent’s occupation is "office assistant" and the coding frame was this ...

Professionals and managers = 1
Other white collar = 2
Skilled blue-collar = 3
Unskilled blue-collar = 4

... an office assistant would be coded as group 2. But multiple coding would also be possible. In that case, occupations would be divided into several different "questions," such as

Question 1: Skill level

Professional or skilled = 1
Unskilled = 2

Question 2: Work environment

Office / white collar = 1
Manual / blue collar = 2

An office assistant might be classified as 2 on skill level and 1 on work environment.
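In code, multi-coding just means each unit gets one code per dimension rather than a single category. A sketch using the two hypothetical "questions" above (the extra occupations are made-up examples):

```python
# Dimension 1, skill level: 1 = professional or skilled, 2 = unskilled.
# Dimension 2, work environment: 1 = office / white collar, 2 = manual / blue collar.
coded = {
    "office assistant": {"skill": 2, "environment": 1},
    "electrician":      {"skill": 1, "environment": 2},
    "accountant":       {"skill": 1, "environment": 1},
}

# Multi-coding makes cross-tabulation possible later, e.g. all white-collar jobs:
white_collar = [job for job, codes in coded.items() if codes["environment"] == 1]
print(white_collar)  # ['office assistant', 'accountant']
```

With single coding, a question like "how many skilled white-collar occupations?" cannot be answered at all; with multi-coding it is a one-line filter.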

If you are dealing with transcripts of in-depth interviews or group discussions, the software normally used for this purpose (such as Nud*ist or Atlas) encourages multiple coding. The software used for survey analysis doesn't actually discourage multiple coding, but most people don't think of using it. My suggestion is to use multiple coding whenever possible - unless you are very, very certain about what you are trying to find in a content analysis (as when you've done the same study every month for the last year). As you'll see in the example below, multiple coding lets you view the content in more depth, and can be less work than single coding.

Coding frames

A coding frame is just a set of groups into which comments (or answers to a question) can be divided – e.g. the occupation categories shown above. In principle, this is easy. Simply think of all possible categories for a certain topic. In practice, of course, this can be very difficult, except when the topic is limited in its scope - as with a list of occupation types. As that’s not common in content analysis, the usual way of building a coding frame is to take a subset of the data, and to generate the coding frame from that.

An easy way to do this is to create a word processing file, and type in (or copy from another file) about 100 verbatim comments from the content being analysed. If you leave a blank line above and below each comment, and format the file in several columns, you can then print out the comments, cut up the printout into lots of small pieces of paper, and rearrange the pieces on a table so that the most similar ones are together. This sounds primitive, but it’s much faster than trying to do the same thing using only a computer.

When similar codes are grouped together, they should be given a label. You can create either conceptual labels (based on a theory you are testing) or in vivo labels (based on vivid terms in respondents' own words).

How large should a coding frame be?

A coding frame for content analysis normally has between about 10 and 100 categories. With fewer than 10 categories, you risk grouping dissimilar answers together, simply because the coding frame doesn’t allow them to be separated. But with more than 100 categories, some will seem very similar, and there’s a risk that two near-identical answers will be placed in different categories. If it’s important to have a lot of categories, consider using hierarchical coding.

Hierarchical coding

This is also known as tree coding, with major groups (branches) and sub-groups (twigs). Each major group is divided into a number of sub-groups, and each subgroup can then be divided further, if necessary. This method can produce unlimited coding possibilities, but sometimes it is not possible to create an unambiguous tree structure – for example, when the codes are very abstract.

As an example, a few years ago I worked on a study of news and current affairs items for a broadcasting network. We created a list of 122 possible topics for news items, then divided these topics into 12 main groups:

Crime and justice
Education
Environment
Finance
Government and politics
Health
International events and trends
Leisure activities and sport
Media and entertainment
Science and technology
Social issues
Work and industry

This coding frame was used for both a survey and a content analysis. We invited the survey respondents to write in any categories that we’d forgotten to include, but our preliminary work in setting up the structure had been thorough, and only a few minor changes were needed.

Because setting up a clear tree-like structure can take a long time, don’t use this method if you’re in a hurry – a badly-formed tree causes problems when sub-groups are combined for the analysis. (The Content Analysis section at the end of chapter 12 has practical details of tree coding.)
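A tree coding frame is easy to represent as a mapping from sub-topics to main groups, so that item counts can be rolled up from twig to branch. A sketch with two of the twelve groups above and hypothetical sub-topics:

```python
from collections import Counter

# Two branches of a hypothetical tree frame: main group -> sub-topics.
TREE = {
    "Crime and justice": ["courts", "police", "prisons"],
    "Health": ["hospitals", "epidemics", "nutrition"],
}

# Invert the tree: each sub-topic code points to its main group.
parent = {sub: group for group, subs in TREE.items() for sub in subs}

# News items coded at the sub-topic level...
items = ["courts", "epidemics", "police", "nutrition", "courts"]

# ...can be reported at either level of the tree.
sub_counts = Counter(items)
group_counts = Counter(parent[code] for code in items)
print(group_counts)  # Crime and justice: 3, Health: 2
```

The roll-up only works if every sub-topic has exactly one parent, which is precisely the unambiguous tree structure the text warns can be hard to achieve with abstract codes.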

Using an existing coding frame

You don’t always need to create a coding frame from scratch. If you know that somebody has done the same type of content analysis as you are doing, there are several advantages to using an existing coding frame. It not only saves all the time it takes to develop a coding frame, but also will enable you to compare your own results with those of the earlier study. Even if your study has a slightly different focus, you can begin with an existing coding frame and modify it to suit your focus. If you’d like a coding frame for news and current affairs topics, feel free to use (or adapt) my one above. Government census bureaus use standard coding frames, particularly for economic data - such as ISCO: the International Standard Classification of Occupations. Other specialized coding frames can be found on the Web. For example, CAMEO and KEDS/TABARI are used for coding conflict in news bulletins.

The importance of consistency in coding

When you are coding verbatim responses, you’re always making borderline decisions. "Should this answer be category 31 or 73?" To maintain consistency, I suggest taking these steps:

Have a detailed list showing each code and the reasons for choosing it. When you update it, make sure that each coder gets a copy of the new version.
Use the minimum number of people to do the coding. The consistency is greatest if one person does it all. Next best is to have two people, working in the same room.
Keep a record of each borderline decision, and the reasons why you decided a particular category was the most appropriate.
Have each coder re-code some of each other coder’s work. A 10% sample is usually enough, but if this uncovers a lot of inconsistency, re-code some more.
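The re-coding check in the last step can be quantified. A minimal Python sketch (the codes and the 10-unit sample below are hypothetical) computes simple percent agreement between two coders, plus Cohen's kappa, the standard statistic that corrects agreement for chance:

```python
from collections import Counter

def percent_agreement(codes_a, codes_b):
    """Share of units where two coders assigned the same code."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Agreement corrected for chance (Cohen's kappa)."""
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement: probability both coders pick the same code at random.
    p_chance = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# A 10% re-coding sample: coder B re-codes 10 of coder A's units.
a = ["31", "31", "73", "12", "31", "73", "12", "12", "31", "73"]
b = ["31", "73", "73", "12", "31", "73", "12", "31", "31", "73"]
print(percent_agreement(a, b))  # 0.8
```

As a rough rule of thumb, kappa values above about 0.7 are usually taken to indicate acceptable coder consistency; much lower, and the coding instructions probably need rewriting.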
If you have created a coding frame based on a small sample of the units, you often find some exceptions after coding more of the content. At that point, you may realize your coding frame wasn’t detailed enough to cover all the units. So what do you do now?

Usually, you add some new codes, then go back and review all the units you’ve coded already, to see if they include the new codes. So it helps if you have already noted the unit numbers of any units where the first set of codes didn’t exactly apply. You can then go straight back to those units (if you’re storing them in numerical order) and review the codes. A good time to do this review is when you’ve coded about a quarter of the total units, or about 200 units - whichever is less. After 200-odd units, new codes rarely need to be added. It’s usually safe to code most late exceptions as "other" - apart from any important new concepts.

This works best if all the content is mixed up before you begin the coding. For example, if you are comparing news content from two TV stations, and if your initial coding frame is based only on one channel, you may have to add a lot of new categories when you start coding items from the second channel. For that reason, the initial sample you use to create the coding frame should include units of as many different types as possible. Alternatively, you could sort all the content units into random order before coding, but that would make it much harder for the coders to see patterns in the original order of the data.

Questioning the content

When analysing media content (even in a visual form - such as a TV program) it’s possible to skip the transcription, and go straight to coding. This is done by describing the visual aspects in a way that’s relevant to the purpose of the analysis.

For example, if you were studying physical violence on TV drama programs, you’d focus on each violent action and record information about it. This is content interviewing: interviewing a unit of content as if it were a person, asking it "questions," and recording the "answers" you find in the content unit.

Of course, what you can’t do when interviewing content is probe the response to make the coding more accurate. A real interview respondent, when asked "Why did you say that?", will answer - but media content can’t do that.

When you’re interviewing content, it’s good practice to create a short questionnaire, and fill in a copy for each content unit. This helps avoid errors and inconsistencies in coding. The time you save in the end will compensate for the extra paper used. The questionnaires are processed with standard survey software.
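For example, a per-unit "questionnaire" for the TV violence study above might be filled in and exported as one row per content unit, ready for survey software. This Python sketch uses hypothetical questions and answers:

```python
import csv
import io

# Hypothetical "questions" asked of each violent action in a TV drama.
FIELDS = ["unit_id", "program", "weapon_used", "injury_shown", "duration_sec"]

units = [
    {"unit_id": 1, "program": "Drama A", "weapon_used": "none",
     "injury_shown": "no", "duration_sec": 4},
    {"unit_id": 2, "program": "Drama A", "weapon_used": "knife",
     "injury_shown": "yes", "duration_sec": 11},
]

# Write one "completed questionnaire" per content unit, as CSV rows.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(units)
print(buf.getvalue())
```

A fixed field list like this is what keeps the coding consistent: every unit is asked exactly the same questions, in the same order.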

The main disadvantage of content interviewing is that you can’t easily check a code by looking at the content that produced it. With a transcript, such checking greatly increases the accuracy and consistency of the analysis. But when there’s no transcript (as is usual when interviewing content) you can check a code only by finding the recording of that unit, and playing it back. In the olden days (the 20th century) content analysts had a lot of cassette tapes to manage. It was important to number them, note the counter positions, make an index of tapes, and store them in order. Now, in the 21st century, computers are faster, and will store a lot more data. I suggest storing sound and video recordings for content analysis on hard disk or CD-ROM, with each content unit as a separate file. You can then flick back and forth between the coding software and the playback software. This is a great time-saver.

Overlapping codes

When the content you are analysing comes in large units of text (e.g. a long interview), it can be difficult to code. A common problem is that codes overlap. The interviewee may be talking about a particular issue (given a particular code) for several minutes. In the middle of that, there may be a reference to something else, which should be given a different kind of code. The more abstract your coding categories, the more likely you are to encounter this problem. If it’s important to capture and analyse these overlapping codes, there are two solutions:

High-tech:

Use software specially designed for this purpose, such as Nud*ist or Atlas TI. Both are powerful, but not cheap – and it takes months to learn to use them well.

Low-tech:

Cutting up transcripts and sorting the pieces of paper – as explained above. Disadvantages: if interrupted, you easily lose track of where you’re up to – and never do this in a windy place!
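A middle path, if the transcripts are on computer, is to record each coded passage as a span of text and detect where spans intersect. This is roughly what the specialist packages do internally. A minimal sketch, with hypothetical codes and character offsets:

```python
# Each coded passage is a (start, end, code) span within one transcript,
# measured here in character offsets (hypothetical codes and offsets).
spans = [
    (0, 850, "LAND_RIGHTS"),    # the main issue being discussed
    (400, 520, "FAMILY"),       # a reference embedded inside it
    (900, 1200, "EMPLOYMENT"),
]

def overlapping_pairs(spans):
    """Return every pair of codes whose spans overlap in the text."""
    pairs = []
    for i, (s1, e1, c1) in enumerate(spans):
        for s2, e2, c2 in spans[i + 1:]:
            if s1 < e2 and s2 < e1:  # the two intervals intersect
                pairs.append((c1, c2))
    return pairs

print(overlapping_pairs(spans))  # [('LAND_RIGHTS', 'FAMILY')]
```

Because each span keeps its own start and end, one passage can carry any number of codes without forcing you to chop the transcript into non-overlapping pieces.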

My suggestion: unless you’re really dedicated, avoid this type of content analysis. Such work is done mainly by academics (because they have the time) but not by commercial researchers, because the usefulness of the results seldom justifies the expense.

Content analysis without coding

Coding is another form of summarizing. If you want to summarize some media content (the usual reason for doing content analysis) one option is to summarize the content at a late stage, instead of the usual method of summarizing it at an early stage.

If your content units are very small (such as individual words) there’s software that can count words or phrases. In this case, no coding is needed: the software does the counting for you, but you still need to summarize the results. This means a lot less work near the beginning of the project, and a little more at the end.
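Word counting of this kind needs only a few lines of code. A minimal Python sketch (the two transcript fragments are hypothetical):

```python
import re
from collections import Counter

transcripts = [
    "The peace talks resumed after the ceasefire collapsed.",
    "Local leaders called the ceasefire a step toward peace.",
]

# Split each transcript into lowercase words and count across all of them.
words = Counter()
for text in transcripts:
    words.update(re.findall(r"[a-z']+", text.lower()))

print(words.most_common(3))
```

In practice you would also strip out very common function words ("the", "a", "of") before summarizing, or the top of the frequency list tells you nothing.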

Unfortunately, software can’t draw useful conclusions. Maybe in 10 years the software will be much cleverer, but at the moment there’s no substitute for human judgement - and that takes a lot of time. Even so, if your units are not too large, and all the content is available as a computer file, you can save time by delaying the coding till a later stage than usual. The time is saved when similar content is grouped, and a lot of units can all be coded at once.

For example, if you were studying conflict, you could use software such as NVivo to find all units that mentioned "conflict" and a list of synonyms. It would then be quite fast to go through all these units and sort them into different codes.
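The same keyword-and-synonym search can be sketched without specialist software. In this minimal Python version (the term list and unit texts are hypothetical), any unit mentioning one of the terms is pulled out so that similar units can be coded in one batch:

```python
# Hypothetical synonym list for "conflict".
CONFLICT_TERMS = {"conflict", "fighting", "clash", "dispute", "violence"}

units = {
    101: "A land dispute between the two villages turned violent.",
    102: "The new school opened on Monday.",
    103: "Fighting broke out near the border crossing.",
}

def mentions_conflict(text):
    """True if the unit contains any term from the synonym list."""
    words = set(text.lower().replace(".", "").split())
    return bool(words & CONFLICT_TERMS)

hits = [uid for uid, text in units.items() if mentions_conflict(text)]
print(hits)  # [101, 103]
```

Note that simple word matching misses variants ("violent" does not match "violence" here), which is why the synonym list needs care, or stemming.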

Using judges

A common way to overcome coding problems is to appoint a small group of "judges" and average their views on subjective matters. Though it’s easy to be precise about minor points (e.g. "the word Violence was spoken 29 times"), the more general your analysis, the more subjective it becomes (e.g. the concept of violence as generally understood by the audience).

Use judges when there are likely to be disagreements on the coding. This will be when any of these conditions applies:

units are large (e.g. a whole TV program instead of one sentence)
you are coding vague concepts, such as "sustainability" or "globalization"
you are coding nonverbal items, such as pictures, sounds, and gestures
your findings are likely to be published, then challenged by others.
The more strongly these conditions apply, the more judges you need. Unless you are being incredibly finicky (or the project has very generous funding!) 3 judges is often enough, and 10 is about the most you will ever need. The more specific the coding instructions, the fewer the judges you will need. If you only have one person coding each question, he or she is then called a "coder" not a "judge" - though the work is the same.

Any items on which the judges disagree significantly should be discussed later by all judges and revised. Large differences usually result from misunderstanding or different interpretations.
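When judges give numeric ratings, flagging the items that need discussion can be automated. A minimal sketch, with hypothetical ratings and a hypothetical spread threshold:

```python
from statistics import mean

# Hypothetical favourability ratings (1 = very unfavourable,
# 5 = very favourable) from three judges for each content unit.
ratings = {
    "item-01": [4, 4, 5],
    "item-02": [1, 5, 2],   # judges clearly disagree
    "item-03": [3, 3, 3],
}

def needs_discussion(scores, max_spread=1):
    """Flag a unit when judges' ratings differ by more than max_spread."""
    return max(scores) - min(scores) > max_spread

flagged = [uid for uid, s in ratings.items() if needs_discussion(s)]
averages = {uid: round(mean(s), 2) for uid, s in ratings.items()}
print(flagged)    # ['item-02']
print(averages)
```

Only the flagged items go back to the judges for discussion; the rest keep their averaged rating.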

Maybe you are wondering how many judges it takes before you’re doing a survey, not content analysis. 30 judges? 100?

Actually, it doesn’t work like that. Judges should be trained to be objective: they are trying to describe the content, not give their opinions. All judges should agree as closely as possible. If there’s a lot of disagreement among the judges, it usually means their instructions weren’t clear, and need to be rewritten.

With a survey, respondents are unconstrained in their opinions. You want to find out their real opinions, so it makes no sense to "train" respondents. That’s the difference between judging content and doing a survey. However if you’re planning to do a content analysis that uses both large units and imprecise definitions, maybe you should consider doing a survey instead (or also).

7. Software for content analysis

When all the preparation for content analysis has been done, the counting is usually the quickest part - especially if all the data is in a computer file, and software is used for the counting.

Software is an important tool for content analysis, but this page has mentioned it only briefly. Because software and links to it are constantly changing, we have a separate page on content analysis software. For the type of content analysis that you can use to analyse long interviews, our page on qualitative software also has some useful references, covering software such as Nud*ist and Atlas TI.

8. Coming to conclusions

An important part of any content analysis is to study the content that is not there: what was not said. This sounds impossible, doesn’t it? How can you study content that’s not there? Actually, it’s not hard, because there’s always an implicit comparison. The content you found in the analysis can be compared with the content that you (or the audience) expected - or it can be compared with another set of content.

It’s when you compare two corpora (plural of corpus) that content analysis becomes most useful. This can be done either by doing two content analyses at once (using different corpora but the same principles) or comparing your own content analysis with one that somebody else has done. If the same coding frame is used for both, it makes the comparison much simpler.

Comparisons can be:

chronological (e.g. this year’s content compared with last)
geographical (your content analysis compared with a similar one in another area)
media-based (e.g. comparing TV and newspaper news coverage)
program content vs. audience preferences
...and so on. Making a comparison between two matched sets of data will often produce very interesting results.

There’s no need to limit comparisons to two corpora: any number of content analyses can be compared, as long as they used the same principles. With two or three corpora, the results are usually very clear, and with 10 or more, a few corpora usually stand out as different - but with about 4 to 9, the comparisons can become rather messy.

These comparisons are usually made using cross-tabulation (cross-tabs) and statistical significance testing. A common problem is that unless the sample sizes were huge, there are often too few entries in each cell to make statistical sense of the data. Though it’s always possible to group similar categories together - and easy, if you were using a hierarchical coding frame - the sharpness of comparison can be lost, and the results, though statistically significant, may not have much practical meaning.
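To illustrate the cross-tab approach, here is a chi-square test of independence on a hypothetical 2x2 table (whether a news item mentioned peacebuilding, by year), computed with only the Python standard library:

```python
# Hypothetical counts: news items mentioning peacebuilding, by year.
table = [
    # mentioned, not mentioned
    [12, 388],   # 2001 (400 items)
    [108, 292],  # 2003 (400 items)
]

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
grand = sum(row_totals)

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand
        chi2 += (observed - expected) ** 2 / expected

print(round(chi2, 1))
```

With a 2x2 table there is 1 degree of freedom, and the critical value at the 5% level is 3.84; here the statistic is far above that, so the difference between years is statistically significant. When any expected cell count falls below about 5, though, the test becomes unreliable - which is exactly the small-cell problem described above.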

Reporting on a content analysis

Ten people could analyse the same set of content, and arrive at completely different conclusions. For example, the same set of TV programs analysed to find violent content could also be analysed to study the way in which conversations are presented on TV. The focus of the study controls what you notice about the content. Therefore any content analysis report should begin with a clear explanation of the focus – what was included, what was excluded, and why.

No matter how many variables you use in a content analysis, you can’t guarantee that you haven’t omitted an important aspect of the data. Statistical summaries are often highly unsatisfying, especially if you’re not testing a simple hypothesis. If readers don’t understand how you’ve summarized the content, they will be unlikely to accept your conclusions.

Therefore it’s important to select and present some key units of content in the report. This can be done when you describe the coding frame. For each code, find and reproduce a highly typical (and real) example. This is easier when units are short (e.g. radio news items). When units are long (such as whole TV programs) you will need to summarize them, and omit aspects that are important to some people. If you used judges (who will have read through the content already) ask them to select typical examples. If the judges achieve a consensus on this, it will add credibility to your findings.

Another approach is to cite critical cases: examples that only just fitted one code rather than another - together with arguments on why you chose the code you did. This helps readers understand your analysis, as well as making the data more vivid.

Explain your coding principles

When you read a content analysis report, the process can seem forbiddingly objective - something like "37.3% of the references to Company A were highly favourable, and only 6.4% highly unfavourable." However, this seeming objectivity can actually be very subjective. For example, who decides whether a reference is "highly favourable" or just "favourable"? Different people may have different views on this - all equally valid. More difficult still, can you count a reference to Company A if it’s not mentioned by name, but by some degree of association? For example, "All petrochemical factories on the Port River are serious polluters." Does that count as a reference to Company A if its factory is on that river? And what if many of the audience may not know that fact?

The point is that precise-looking percentages are created from many small assumptions. A high-quality report will list all the assumptions made, give the reasons why those assumptions were made, and also discuss how varying some assumptions would affect the results.

Conclusion

Content analysis can produce quite trivial results, particularly when the units are small. The findings of content analysis become much more meaningful when units are large (e.g. whole TV or radio programs) and when those findings can be compared with audience research findings. Unfortunately, the larger the units, the more work the content analysis requires.

When used by itself, content analysis can seem a shallow technique. But it becomes much more useful when it’s done together with audience research. You will be in a position to make statements along the lines of "the audience want this, but they’re getting that." When backed by strong data, such statements are very difficult to disagree with.

