Selections from Analyzing Streams of Language

Cheryl Geisler

From Chapter 3:  Segmenting the Data

Summary:  In this chapter, you will segment the data you have selected for analysis into units appropriate for analysis. After learning about how the units characterize various kinds of verbal data, you will use one or more of these units to select and segment your data.

  • The unit of analysis is the level at which the phenomenon of interest occurs.  So, a paragraph or a sentence could serve as a unit of analysis (29).  Yet just because something is a unit doesn’t mean it is the best unit for analysis.
  • Basic unit of language 1:  Words – A difficult option, as every word must be categorized if words are the selected segment of data.  Words are often used for “selective analysis,” analysis that concentrates on one particular noun/theme (e.g., “human agents”) (30).
  • T-Units:  a t-unit is the smallest group of words that can make a move in language.  As G. notes, “A t-unit consists of a principal clause and any subordinate clauses or nonclausal structures attached or embedded in it” (31).  Examples of t-units include:  descriptions, proposals, questions, evaluations, etc.[1. Patrick was counting t-units in his work on student discussion boards that you helped code.]
  • Clauses:  the smallest unit of language that can make a claim about an entity in the world.  A clause must contain a subject (the entity) and a predicate (the claim being made about that entity).  A t-unit comprises an independent clause together with all of its dependent clauses.  Because the clause unit is directly related to the “claims” a speaker or writer makes about the world, and because the independent and dependent clauses together comprise a t-unit, the t-unit is likely the most useful unit of analysis for my own work (32).  The clause is also a useful unit of analysis for phenomenological research in text.
  • Nominals – concerned with objects in the world.
  • Verbals – concerned with the actions in the world (or in our heads about the world).  Verbals are useful for tracking “schemas” of action:  computer use, courtship, piracy?, etc.  If your interest is in genre, then the predicate and the tense of the predicate are probably good units of analysis. [2. This seems to be a possibly useful way to think about the framing of quotations in the citation project re: genre.  So, if you’re wanting to track generic conventions, then coding for predicate and tense might be great ways to work with the data to demonstrate not just source use but source use in relation to generic convention through framing quotation language analysis.]
  • Indexicals – Indexicals anchor interactions to specific contexts in which they occur.  Essential indexicals include I, here, and now.
  • Pronouns – indexical units that point to the “world of interlocutors” (34).
  • Topical chains – According to G., “topical chains” allow participants to “understand their discourse as about something” (35).  These are usually identified by referentials that demonstrate that a line of thought/concept-development is being extended.  Words like it, they, he, she, this, that, the, and such are examples of referentials that signal a topical chain.  Topical chains are useful for tracing the conceptual complexity of discourse (depth of interaction, development, etc.).  The topical chain is super useful as it provides a way to do “selective analysis of discussions that concern a specific topic” to demonstrate interaction and conceptual development/extension (35).
  • Modals:  modals indicate the “attitude” or “stance” of the writer and communicate probability (she might go next week), advisability (she ought to go next week), or conditionality (she would have gone yesterday if) (35).  Modality is also conveyed through auxiliary verbs like might, may, must, can, could, etc.  Modals are indicative of an individual’s sense of obligation or certainty.  Modals are often great ways to track how confident speakers are in their claims.
  • Response:  there are four possibilities for response by interlocutors in communication:  compliance (speaker takes up the proposition), alteration (speaker proposes an alternative proposition), declination (speaker declines the proposition), and withdrawal (speaker withdraws from consideration of the proposition) (37-8).
  • Textual Unit 1:  Text – the text itself often provides PoV, quality of text, genre of text, implied audience of text, textual persuasiveness, authorial familiarity, textual importance, etc.
  • TU2:  Publication date – good for citational analysis or temporal analysis for trends/concept mappings.  Think the work you’ve done in Collin’s projects.
  • TU3:  Publication Venue – the forum is a unit of analysis.
  • TU4: Organization – organizational texts in technical communication often say a lot about the culture of the organization.  So, the institution/organization can be a unit of analysis.
  • Genres:  as typified responses to recurring rhetorical situations, genres can be a good mode of analysis to understand rhetorical moves, relationships to audience, reading patterns, and publication venues.
  • In the section “Shaping the Text,” G. goes into what happens after you’ve selected your appropriate unit of analysis.  She claims that this involves three steps:  1) shaping the text in Word; 2) moving the text from Word to Excel; and 3) labeling the text once in Excel (45).
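The shaping steps above can be roughly approximated in code: split a text into segments, then write one labeled segment per row to a CSV file that Excel can open. A minimal sketch, assuming sentence-like segments split on terminal punctuation (the sample text is invented):

```python
import csv
import re

# Minimal sketch: segment a text on terminal punctuation (a crude
# stand-in for hand-segmenting t-units), then write labeled rows
# to a CSV file that Excel can open.  The sample text is invented.
text = "The committee met. It proposed a new policy. Members objected."

segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

with open("segments.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["segment_id", "segment"])  # label the data
    for i, seg in enumerate(segments, start=1):
        writer.writerow([i, seg])
```

Hand-segmenting t-units or clauses still requires judgment; a script like this only handles the mechanical shaping and labeling.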

From Chapter 4:  Coding the Data

Summary:  In this chapter, you will code the data you segmented in the previous chapter. After you devise an initial start list of codes, you will use an iterative process to move back and forth between sample data and coding scheme to develop a coding scheme that best tracks the phenomenon in which you are interested.

  • Each segment of data should be assigned only one code.  The coding scheme is the way that codes are governed/structured/framed.  The scheme should “articulate clearly the procedures you have used to code your data” (56).  It gives future/secondary coders directions on how to code, and it lets your reader understand how you have defined your categories and assigned data to them.
  • The coding scheme should include a dimension as well as the coding categories.  They should also include the unit of analysis to which codes are applied (clauses in my case), a definition for each coding category, and the cases that the coding category includes.  Finally, the scheme should include examples of each for future coders.  Here’s an example of a good coding scheme:

  • G. notes that developing a coding scheme is an iterative process that involves moving back and forth between the developing code scheme and the sample of data that needs to be coded.  This results in grounded theory?  (58).
  • Selecting a sample:  you should maximize the amount of variation in the phenomenon being investigated in the sample.  This means that the sample should embody multiple contrasts.  You should select between 200 and 500 segments if you’re working with clauses, t-units, etc.  This will likely result in about 10 pages of text to be coded on the initial sample.
  • Generating the coding scheme:  In the first method you can begin with existing categories found in the anchoring literature for the topic you’re investigating (in my case, perspectives on copyright that you identified in your 2010 C&W presentation), your intuition from working with the data, and the built-in comparisons that are present in the data.  The second method allows you to look at the existing data set and let it speak to you.  This means that each segment will suggest a particular category to describe the kind of phenomenon you’re considering in your research.  This second method is “grounded theory” in that it is gleaned from the data itself. . . not the pre-existing literature that you’re looking at to frame the data (60).
  • What happens when you encounter a segment of data that doesn’t fit into the scheme?  You revise it!  This is the iterative process of developing a coding scheme that is grounded. [3. There are interesting resonances with the self-learning/automata of feedback loops in distributed cognition that might provide an interesting philosophical conversation about the intersections of developing coding schemes and emergent becoming.]
  • If you break apart a code into two or more codes, you must first edit the code book to remove the current category and replace it with the new categories.  Second, you must revisit all the data coded so far to replace the old codes.  This means you must take care when splitting categories, because the new codes need to be substantively different in order to matter.  When considering breaking up a code, you may want to instead ensure that the codes you’re using represent like things (cf. Tool vs. Structure vs. Activity).  This is where AT is particularly useful for explaining the work you’ll be doing.  They may all involve different codings.  Perhaps AT is even a theoretical orientation that could serve as the coding categories?!?  WOW. . . is that possible?
  • Selective Coding:  when you don’t code the entire clause but only a selection.  This seems problematic, as the selection is somewhat motivated/political.
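The parts Geisler says a scheme must include (dimension, unit of analysis, and categories with definitions and examples) can be held in a small data structure, which also makes it easy to check that every assigned code belongs to the scheme. The dimension and category names below are hypothetical placeholders, not Geisler’s:

```python
# Hypothetical coding scheme holding the parts Geisler requires:
# a dimension, the unit of analysis, and categories with
# definitions and examples.  All names here are invented.
coding_scheme = {
    "dimension": "stance toward copyright",  # hypothetical dimension
    "unit_of_analysis": "clause",
    "categories": {
        "ownership": {
            "definition": "clause frames text as private property",
            "example": "This essay is mine.",
        },
        "commons": {
            "definition": "clause frames text as a shared resource",
            "example": "Ideas belong to everyone.",
        },
    },
}

def validate_codes(assigned, scheme):
    """Return any assigned codes that are not categories in the scheme."""
    valid = set(scheme["categories"])
    return [code for code in assigned if code not in valid]

# Stray codes come back for correction (or for revising the scheme).
print(validate_codes(["ownership", "remix", "commons"], coding_scheme))
```

When a segment doesn’t fit, the iterative move is to revise the scheme itself, not to force the segment into an ill-fitting category.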

Chapter 5: Achieving Reliability

Summary:  In this chapter, you will revise your coding scheme with the help of a second person in order to achieve reliability. After you get a second coding of your data set, you will calculate the agreement between coders, using formulas for both simple and corrected agreement. You will then inspect the disagreements between coders and refine your analytic procedures to reduce them. This process is repeated until an adequate level of agreement has been reached.

  • Reliability:  the degree of consistency with which instances are assigned to the same category by different coders.  There is a bit of a problem/conflict when considering reliability in the work of language meaning-making on account of the indeterminacy, contextuality, rhetoricity, and polyvocality/dialogism of language.  Language – because of its hermeneutic pitfalls – is inherently anti-positivistic; however, there must be some consubstantial reasoning/context/common perception on which reliability can be founded.
  • Cohen’s kappa is the measurement that corrects agreement for chance, a problem made worse by a fairly small number of coding categories.  Cohen’s kappa can be calculated automatically using various tools found on the web.
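Both simple agreement and chance-corrected agreement can be computed directly. A sketch in plain Python (the coder data is invented; libraries such as scikit-learn also provide a `cohen_kappa_score` function):

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's kappa for two coders' category assignments."""
    n = len(coder1)
    # Simple agreement: proportion of segments coded identically.
    observed = sum(a == b for a, b in zip(coder1, coder2)) / n
    # Chance agreement: probability both coders pick each category
    # independently, summed over all categories.
    c1, c2 = Counter(coder1), Counter(coder2)
    expected = sum(c1[cat] / n * c2[cat] / n
                   for cat in set(coder1) | set(coder2))
    return (observed - expected) / (1 - expected)

# Invented codings of six segments by two coders.
coder1 = ["claim", "claim", "question", "claim", "question", "claim"]
coder2 = ["claim", "question", "question", "claim", "question", "claim"]
print(round(cohens_kappa(coder1, coder2), 3))  # prints 0.667
```

Here simple agreement is 5/6 but chance agreement is 0.5, so kappa drops to about 0.67, which is why corrected agreement is the stricter standard.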

Chapter 6: Calculating Frequency

Summary:  In this chapter, you will calculate the frequency and relative frequency with which you placed data in your coding categories. This process involves building a frequency table: naming its data ranges, defining its criteria, calculating its frequencies, and calculating its marginals. Formulas for calculating relative frequency are also included. Simple techniques for writing and copying formulas, designating data ranges, and working with databases are introduced.

  • Frequency is the count (or percentage) of segments assigned to any particular category while coding data.  Relative frequency is the frequency of one category relative to the other categories in the coding scheme.
  • G. covers numerous formulas in this chapter to allow for quick frequency counts.
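The Excel frequency formulas G. covers can be mimicked in code: count each category, then divide by the marginal total for relative frequency. A small sketch with invented codes:

```python
from collections import Counter

# Invented coded segments; in practice these come from the Excel sheet.
codes = ["claim", "claim", "question", "evaluation", "claim", "question"]

freq = Counter(codes)                  # frequency per category
total = sum(freq.values())             # the marginal (grand total)
rel_freq = {cat: n / total for cat, n in freq.items()}  # relative frequency

for cat in freq:
    print(f"{cat}: {freq[cat]} ({rel_freq[cat]:.0%})")
```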

Chapter 7:  Seeing Patterns of Distribution

Summary:  In this chapter, you will look for patterns in how your verbal data is distributed across the categories of your coding scheme. Using the frequency table from the last chapter, you will create and interpret distribution graphs of these patterns. Graphing techniques are introduced.

  • Distribution answers the question:  “How did the way I assigned data to my coding categories vary with my built-in contrasts?” (104).  Or, put differently, distribution is “the way your data is distributed across the categories in your coding scheme” (ibid.).
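One way to see such a distribution computationally is to cross-tabulate codes against a built-in contrast; each group’s counts, divided by its total, give that group’s distribution. A plain-Python sketch with hypothetical data:

```python
from collections import defaultdict

# Hypothetical (contrast_group, code) pairs, e.g. novice vs. expert writers.
data = [
    ("novice", "claim"), ("novice", "question"), ("novice", "claim"),
    ("expert", "claim"), ("expert", "evaluation"), ("expert", "evaluation"),
]

# Cross-tabulate: count each code within each contrast group.
table = defaultdict(lambda: defaultdict(int))
for group, code in data:
    table[group][code] += 1

# Within-group relative frequencies are the distributions to graph.
for group, counts in table.items():
    total = sum(counts.values())
    dist = {code: n / total for code, n in counts.items()}
    print(group, dist)
```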

Chapter 9:  Following Patterns over Time

Summary:  In this chapter, you will look at patterns in your verbal data that indicate how aspects of your data vary over time. Looking for patterns in time helps to define the temporal shape of your coding categories. We will consider simple temporal indexes and then go on to look at aggregate patterns.



To Be Done:

1.  Segment text into necessary units for analysis.

2.  Remove unwanted carriage returns (Backspace delete+insert space+move to beginning of next line).

3.  Create a segmenting style to differentiate selections.

4.  Move textual units into Excel.

5.  Label data
