What Counts, How, and So What?

Main Clauses

Last Revised 5/18/99

     If there is interest, I will go back and pull out more of the relevant information on defining main clauses from the studies by Hunt, O'Donnell, Loban, Mellon, O'Hare, and Bateman/Zidonis. The picture, especially that presented in the latter three studies, won't be pretty. I consider my own definition of a main clause to be closest to Hunt's "T-Unit." In essence, his study proved that counting words per main clause is the most effective, basic way to measure syntactic maturity. He offered, however, no reason why this should be so, and no explanation for the various errors that he must have found in students' writing.
     O'Hare's study, the one most widely acclaimed as proving that the study of grammar is useless, is particularly questionable. O'Hare writes:
 

     This study was interested in the students' writing ability and not at all in their spelling, punctuation, or handwriting talents. In order to eliminate the possible effects of these extraneous factors on the evaluators' judgments, the thirty pairs of compositions were typewritten so that spelling and punctuation could be corrected. The corrections were made by a secretary at the University School. While fully aware that discourse can be punctuated in different ways that could possibly affect meaning, this researcher was satisfied that no bias was introduced because all the punctuation and spelling changes were made by one person who was never aware of the group to which a particular composition belonged. (Sentence Combining, 51-52, emphasis added.)

In itself, this passage invalidates O'Hare's entire study. How can he possibly claim to be improving students' writing when most of the significant errors in that writing are eliminated from consideration? If you are not familiar with O'Hare's study, his entire concept of "improvement" is based on longer main clauses -- the more words per main clause, the better the writing!
     O'Hare's "corrections" raise still other questions. Since the secretary corrected all the writing, the researchers were faced with no fragments, no comma-splices, no run-ons. O'Hare, therefore, did not have to face the question of how fragments affect the count. (See below.) More importantly, the primary difference between the control and experimental groups in O'Hare's study is that the experimental group "was exposed to the sentence-combining practice" (35). If we look at this from the students' perspective, the students in the experimental group were asked to combine sentences to make them longer; the control group was not. Surely the experimental group got the message that longer is better, a message to which they responded in their writing. But as early as 1965, Hunt wrote: "As more nonclausal structures are packed into a clause the likelihood of stylistic faults occurring increases apace. The greater the congestion the greater the hazard" (152). O'Hare eliminated the real problem he faced by simply having most of the errors corrected before the passages were analysed!

        The psychological model underlying the KISS approach provides, I believe, a much better set of reasons for what counts. In the texts analyzed for these studies, no corrections were made to students' writing. Fragments, comma-splices, and run-ons were marked, as were errors in subject/verb agreement. The basic idea of the model is that the reader's (and writer's) brain chunks words together in short-term memory (STM). Every word (except interjections) is chunked to another word or construction until everything is eventually chunked to a main subject / verb / complement. At the end of a main S / V / C pattern, the content of short-term memory is dumped to long-term memory (LTM), and STM is cleared for the next sentence.
     If the preceding hypothesis is correct, it explains some of students' major errors. For one thing, as I will try to show in the section on errors, comma-splices and run-ons result from writers sensing a connection between two main clauses, but not understanding how to punctuate it. Thus, in their own processing, they dump to LTM, but they signal this dump with a comma. Or, not knowing what to do, they simply leave out the punctuation and end up with a run-on. To explore this idea, in the analyzed texts comma-splices and run-ons have been counted as separate main clauses.
 

In the analyzed texts, the beginning of a main clause is marked by \-\; a comma-splice, by \,\; a run-on, by \R\; and a fragment by \F\. To distinguish main clause length from sentence length, the beginning of a main clause that functions as a compound is marked by \C\.

     As Hunt, O'Donnell, and Loban showed, main-clause length increases naturally with age. The older, or more experienced, we become, the more words we, as both readers AND WRITERS, can juggle in STM. Most fragments occur, I would suggest, because the complexity of the ideas in the writer's head exceeds that writer's ability to juggle words and constructions in STM. The result is that the writer gets part of the main clause on paper, becomes confused, sticks in a period and capital letter, and then writes the rest of the main clause as a separate sentence. Many teachers will recognize the probable validity of this hypothesis simply because the advice often given to students to fix fragments is -- "Combine them with the preceding or following sentence."
     But if this hypothesis is correct, it has implications for what should be counted. If words per main clause is a basic measure of syntactic maturity, and if fragments result from the writer's exceeding that maturity, then fragments should be counted as separate main clauses. And that is what I have done in these studies.
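The counting rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for these studies: it assumes the markers listed above (\-\, \,\, \R\, \F\, \C\) appear literally in the text, it treats every marked unit, including fragments, comma-splices, and run-ons, as a separate main clause, and it counts as a "word" any whitespace-separated token that contains a letter or digit. The function names and the sample passage are hypothetical.

```python
import re

# Markers described in the text: \-\ main clause, \,\ comma-splice,
# \R\ run-on, \F\ fragment, \C\ main clause functioning as a compound.
MARKER = re.compile(r"\\([-,RFC])\\")


def main_clause_lengths(marked_text):
    """Return (marker, word_count) pairs, one per marked unit."""
    parts = MARKER.split(marked_text)
    # parts[0] is any text before the first marker; after that the list
    # alternates marker, text, marker, text, ...
    units = []
    for marker, chunk in zip(parts[1::2], parts[2::2]):
        # Assumption: a word is a whitespace-separated token that
        # contains at least one letter or digit.
        words = [w for w in chunk.split() if any(c.isalnum() for c in w)]
        units.append((marker, len(words)))
    return units


def words_per_main_clause(marked_text):
    """Average words per marked unit (0.0 if nothing is marked)."""
    units = main_clause_lengths(marked_text)
    if not units:
        return 0.0
    return sum(count for _, count in units) / len(units)


# Hypothetical sample passage, invented for illustration only.
sample = (r"\-\The dog barked. \-\I went outside, \,\it was raining "
          r"\R\I came back in. \F\Because of the rain.")

print(main_clause_lengths(sample))
print(words_per_main_clause(sample))
```

Because the fragment and the spliced clauses are counted as units in their own right, they pull the average down, which is the effect the hypothesis predicts.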


Words per Main Clause

Introduction

     Conducted in the 1960s and 1970s, the studies of Loban, Hunt, and O'Donnell demonstrated that a writer's average number of words per main clause naturally increases with age. The following table is a compilation of their studies:

Average Number of Words per Main Clause by Grade Level

Grade Level            Loban's Study   Hunt's Study   O'Donnell's Study
3                      7.60            -              7.67
4                      8.02            8.51           -
5                      8.76            -              9.34
6                      9.04            -              -
7                      8.94            -              9.99
8                      10.37           11.34          -
9                      10.05           -              -
10                     11.79           -              -
11                     10.69           -              -
12                     13.27           14.4           -
Professional Writers   -               20.3           -

Loban's data taken from Language Development: Kindergarten through Grade 
Twelve. Urbana, IL.: NCTE. 1976. 32. Hunt's and O'Donnell's data taken from the 
summary in Frank O'Hare, Sentence Combining. Urbana, IL.: NCTE. 1971. 22.

The differences in the studies (such as O'Donnell's showing 9.99 words/main clause for 7th grade students and Loban's showing 8.94) should raise questions, but there is little doubt that the average number of words per main clause increases with age. Because a reader's brain dumps to long-term memory at the end of main clauses, the clearing of STM gives the text a rhythm. Even if readers cannot identify main clauses, they must surely sense this rhythm.

Theoretical Considerations

 Is Longer More Mature?

      Mellon, Bateman, Zidonis, and especially O'Hare, along with many others, have assumed that more words per main clause is a reflection of "better" writing. (An increase in words per main clause is, after all, the primary "proof" offered in their studies.) Many teachers have questioned the assumption that longer is better. (Is it an American fallacy? Or perhaps a male fallacy?) Little has been done, however, to challenge the assumption directly, probably because of a lack of a theoretical framework and a method for doing so.
       Stephen Jay Gould's Full House: The Spread of Excellence from Plato to Darwin (NY: Harmony Books, 1996) may provide both a theory and a method. Gould's primary purpose in the book is to disprove the theory of evolution as progress toward more complex organisms. (For those who might be interested -- in setting up his argument, he devotes a large part of the book to explaining the disappearance of the 0.400 batting average.) Although Gould's concern is biology, his discussion of progress toward more complex organisms may be closely comparable to the question of progress (improvement) toward more complex (longer) main clauses.
     Gould's primary argument is that, in biology, we have focussed on the more complex and generally ignored the "full house" of all organisms -- which includes many, many more simple organisms than it does complex. Rather than try (inadequately) to summarize Gould's argument, I will attempt to apply his concepts to the question of main clause length and natural syntactic development.
     Although we usually think of children's first "sentences" as consisting of two words, there are single-word sentences: "Think!" Now suppose we want to make a graph of the number of sentences (or main clauses) of different lengths in written texts. The left side of our graph has what Gould would call a "wall" at the number one -- there are no sentences that consist of fewer than one word. The right "wall" of our graph -- as James Joyce among others has taught us -- is fairly wide open. Theoretically, sentences (or main clauses) could be thousands of words long.
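The graph described here can be approximated with a simple tally. The following Python sketch is one possible way to do it, not a method taken from Gould or from the studies discussed above; the function names and the list of clause lengths are hypothetical, invented only to show the shape of the output.

```python
from collections import Counter


def length_histogram(lengths):
    """Map each main-clause length to the number of clauses of that length."""
    if any(n < 1 for n in lengths):
        # The "left wall": no main clause has fewer than one word.
        raise ValueError("a main clause has at least one word")
    return Counter(lengths)


def print_histogram(lengths):
    """Print one row per length, from the left wall (1) to the longest clause."""
    hist = length_histogram(lengths)
    if not hist:
        return
    for n in range(1, max(hist) + 1):
        print(f"{n:3d} | {'#' * hist[n]}")


# Hypothetical main-clause lengths, invented for illustration only.
example_lengths = [1, 4, 6, 6, 7, 7, 7, 8, 9, 9, 11, 14, 22]
print_histogram(example_lengths)
```

The rows start at one because of the left wall; nothing but the longest clause in the sample limits how far the rows extend to the right.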
     In his biological argument, Gould argues for a left wall of single-celled organisms plus random variation. With life originating -- at least to the extent that we can track it -- at the left wall, random variation can only result in greater complexity. At this point, natural language development clearly differs from Gould's biological model. Gould points out that the world is still overwhelmingly full of single-celled organisms. Very rare, however, is the adult who speaks or writes only in one- or two-word sentences. Clearly there is a natural tendency to increase main-clause length beyond the minimal. And clearly this tendency is good -- to a certain extent. If longer is simply better, then all of us should be writing like James Joyce, and none of us should be writing like Hemingway. At what point (10 words per main clause? 15 words per main clause? 20 words per main clause? 25 words per main clause?) does longer stop being better and become worse?
       The question is very complicated, especially because some constructions which make writing better (appositives and gerundives) also decrease the number of words per main clause. Nevertheless, the question is approachable through statistical research. Almost every semester, for example, I have my students analyze a passage of their own writing and calculate the average number of words per main clause. We then average these averages and almost invariably end up with a number between fifteen and sixteen. And, based on our theory of how the human brain processes language, we discuss the advantages of being near the average and the disadvantages of being at either tail of the distribution. (If the average in a passage is too low, the writing may sound too simple and immature; if it is too high, it may tire the reader, or even be incomprehensible.)
     Fortunately or unfortunately, Gould has also shown me that my -- and others' -- calculations of averages in this context may be misleading. In calculating averages, I have always used the "mean" -- add all the values and then divide by the number of cases. As far as I can remember -- it's something that I probably should check, but then there are a lot of things I should do -- Mellon, O'Hare, etc. also all used the "mean." We are probably, however, dealing with what Gould calls a "skewed distribution" -- there are probably a lot more one-, two-, three-, four-, etc. word main clauses than there are thirty-one-, thirty-two-, thirty-three-, etc. According to Gould, "The mean is a terrible measure for any vernacular notion of 'average' or 'central tendency' in ... a highly skewed distribution, because the introduction of just one Bill Gates will pull the mean way to the right" (159). The "median" (half-way point) may thus be better than the "mean"; and Gould argues for the mode ("most common value" or "the peak value of the bell curve itself" 159).
     At times like this, statistics give me a headache, but if we ever tell students that their sentences are too short, or too long, we should have the responsibility of at least trying to understand what we are talking about. I intend to revise this section as I attempt to apply Gould's ideas, but for now an imaginary example will help me see if I understand:

Suzzie (If you love Suzzie....) writes an essay composed of ten main clauses with the following word-counts -- 9, 10, 11, 11, 12, 12, 14, 15, 16, 33.
The "mean" word-count is 14.3; the median, 12; and the mode falls at 11 and 12, which each occur twice (11.5 if the two are averaged). This may not seem like a big difference, but according to Loban's study (See above.) it is the difference between pre-tenth graders and high school graduates.
It looks as if I need to recompute and reconsider some statistics.
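
The arithmetic for the ten word-counts in the imaginary example can be checked with Python's standard statistics module. This is only a sketch for checking the numbers; the variable names are mine, and the last step (dropping the single 33-word clause) is an added illustration of how far one long clause pulls the mean.

```python
import statistics

# Word counts for the ten main clauses in the imaginary essay.
counts = [9, 10, 11, 11, 12, 12, 14, 15, 16, 33]

mean = statistics.mean(counts)        # sum of the values / number of values
median = statistics.median(counts)    # midpoint of the sorted values
modes = statistics.multimode(counts)  # every value tied for most frequent

print("mean:  ", mean)
print("median:", median)
print("modes: ", modes)

# Illustration only: remove the one very long clause and recompute.
without_longest = sorted(counts)[:-1]
print("mean without the 33-word clause:  ",
      round(statistics.mean(without_longest), 2))
print("median without the 33-word clause:",
      statistics.median(without_longest))
```

Two values, 11 and 12, tie for most frequent, so the list has two modes rather than one. Removing the 33-word clause drops the mean by about two words while the median stays at 12, which is the kind of skew Gould warns about.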
