Updated: 7/5/11
 
KISS Statistical Research

Key to Abbreviations

% W in PP: The number of words in prepositional phrases divided by the total number of words.

Appositive: A construction in which the syntactic connection is established by identity of meaning: "Aluminum, a metal, is abundant and has many uses."

App/MC: The total number of appositives divided by the total number of main clauses.

Averages: There are all kinds of problems in computing averages ("mean," "mode," etc.), but I have decided to stick with simple averages, both because they are best understood by the general public and because they probably best reflect the "average" work required of the reader's working memory (STM).
     There is, however, still another problem involved in computing averages (of averages) in projects such as these. Suppose, for example, that we want to calculate the average number of words per main clause in the writing of twenty seventh graders. There are two ways to go about this. In one, we could count the total number of words in all twenty passages, and divide that number by the total number of main clauses in all twenty passages. The problem with this approach is that, unless all the passages have almost the same number of words, longer passages get more weight in the average. I have, therefore, chosen to use the second method -- to average the averages of each passage, thereby giving equal weight to each passage.
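
The difference between the two methods can be seen in a small sketch. The three passages and their counts below are invented for illustration (they are not KISS data), and the function names are my own.

```python
# Hypothetical data: (total words, total main clauses) for each passage.
passages = [(100, 10), (300, 20), (50, 10)]


def pooled_average(passages):
    """Method 1: all words divided by all main clauses (long passages weigh more)."""
    total_words = sum(words for words, _ in passages)
    total_main_clauses = sum(mcs for _, mcs in passages)
    return total_words / total_main_clauses


def average_of_averages(passages):
    """Method 2: average each passage's own words per main clause (equal weight)."""
    per_passage = [words / mcs for words, mcs in passages]
    return sum(per_passage) / len(per_passage)


print(pooled_average(passages))       # 450 / 40 = 11.25
print(average_of_averages(passages))  # (10 + 15 + 5) / 3 = 10.0
```

The long second passage pulls the pooled figure up to 11.25, while the average of averages stays at 10.0.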

Branching -- Left, Mid, and Right: Left branching constructions appear before the subject and verb in the clause in which they are embedded. In mid branching, the construction appears between the subject and verb, and in right, it appears after:

Left: [When Sarah saw Bill,] he left.
Mid: Bill, [when Sarah saw him,] left.
Right: Bill left [when Sarah saw him.]
Most grammarians consider right branching to be more common. Walker Gibson has suggested that left branching suggests a more organized mind because the writer/speaker must already "see" the main clause before writing the subordinate.

CCm/FV: The total number of compounded complements divided by the total number of finite verbs.

CMC/MC: The total number of compounded main clauses divided by the total number of main clauses. 

Compounds: An obvious question is the degree to which students used simple compounds (rather than subordinating structures) to combine sentences. MC = Compound Main Clauses; Subjects = Compound Subjects of finite verbs; F Verbs = Compound Finite Verbs; Complements = Compound Complements (Direct Objects, Predicate Nouns, or Predicate Adjectives). Fragments (Frag), Comma Splices (CS), and Run-ons (RO) are also counted here because they probably reflect attempts at compounding main clauses.

CS/MC: The total number of comma-splices divided by the total number of main clauses.

CSu/FV: The total number of compound subjects divided by the total number of finite verbs.

CVb/FV: The total number of compound finite verbs divided by the total number of finite verbs.

Delayed Subject: An empty "it" usually fills the normal subject slot, and the meaningful subject is delayed until later in the sentence: "It is interesting that the delayed subject is not often discussed by traditional grammarians."

DO Ellipsed (Infinitive): A concept based on transformational theory which replaces the objective and subjective complements of traditional grammar. It is most often seen in sentences such as "They elected Bill president," in which "Bill president" is considered a nexus based on an ellipsed "to be." The infinitive phrase (with its subject, complement, and modifiers) is the complement of the finite verb. (Note the similarity to "They wanted Bill to be president.")

Dr A: Direct Address

Embedded Level of Subordinate Clauses: Embedding levels can be a very complicated question, which, someday, I may address in more detail. Currently, KISS considers only embedded levels of subordinate clauses. A subordinate clause embedded at Level 1 is directly related to the main clause: "This is the house [that I lived in]." A level 2 embedding is a clause within a Level 1: "This is the house [that, [when I was young,] I lived in]." Level 3 is within a level 2, etc. Except for things such as "The House that Jack Built," writers rarely get beyond level 4.
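
The levels can be illustrated with a short sketch that counts bracket depth. It assumes, as in the examples above, that every subordinate clause is enclosed in one pair of square brackets; the function name is my own and this is not the KISS program itself.

```python
from collections import Counter


def clauses_per_level(text):
    """Count subordinate clauses at each embedding level.

    Assumption: each subordinate clause is wrapped in one pair of
    square brackets, so bracket depth equals embedding level.
    """
    counts = Counter()
    depth = 0
    for ch in text:
        if ch == "[":
            depth += 1
            counts[depth] += 1  # a new clause opens at this level
        elif ch == "]":
            if depth == 0:
                raise ValueError("closing bracket without an opening bracket")
            depth -= 1
    if depth != 0:
        raise ValueError("opening bracket without a closing bracket")
    return dict(counts)


print(clauses_per_level("This is the house [that, [when I was young,] I lived in]."))
# {1: 1, 2: 1}
```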

Frag/MC: The total number of fragments divided by the total number of  main clauses. 

Gn/MC: The total number of gerunds divided by the total number of main clauses.

Gve/MC: The total number of gerundives (participles) divided by the total number of main clauses.

Inf/MC: The total number of infinitives divided by the total number of main clauses. ("Infinitives" that form part of a finite verb phrase (have to go, am going to go) are not counted.)

L2+SC/MC: The total number of subordinate clauses that are embedded within other subordinate clauses, divided by the total number of main clauses: "We saw the boy [who was reading the book [that you recommended.]]"

Level 1: These numbers reflect the total number of subordinate clauses embedded at different levels. See "L2+SC/MC" above.

MC Long: The longest main clause in the passage.

MC Short: The shortest main clause in the passage.

MC Var: This is an attempt to calculate variety in main clause length. This is something that teachers often talk about, but relatively little research has been done about it. It was, I believe, Edward Corbett who suggested that this could be measured by counting the words in each sentence, and then looking at the number of sentences that are ten or fifteen percent above or below the average length. This approach seems faulty, both for its use of the sentence (instead of the main clause) as the basic yardstick, and also because it does not take into consideration the rhythm, or the sequential location of the sentences in the text.
    KISS calculates this statistic first by adding the (absolute) differences in length between each main clause and the main clause that precedes it. Suppose, for example, that a student writes a sequence of main clauses of the following lengths: 10, 15, 8, and 20. The three differences in length would be 5, 7, and 12. These differences are then averaged, which, in this case, results in an average differential of 8.
     Although it seemed at first that this average would be a sufficient indicator, further thought led to the idea that adults, who on average write longer main clauses, would automatically have higher differentials. In order to attempt to make comparisons along the continuum from fourth graders to professional writers, I therefore decided on a further step --  to calculate the variation by using the differential as a percentage of the writer's average words per main clause. Thus, in the statistics reported, the figure is the differential, multiplied by 100 and divided by the average words per main clause. This should be a reflection of how much the writer varies main clause length from the writer's own basic norm.
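
The calculation described above can be sketched as follows, using the same four lengths (10, 15, 8, and 20). The function names are my own, and this is only an illustration of the arithmetic, not the code of the KISS program.

```python
def average_differential(lengths):
    """Average absolute difference between each main clause and the one before it."""
    if len(lengths) < 2:
        raise ValueError("at least two main clauses are needed")
    diffs = [abs(current - previous)
             for previous, current in zip(lengths, lengths[1:])]
    return sum(diffs) / len(diffs)


def mc_var(lengths):
    """Differential as a percentage of the writer's average words per main clause."""
    average_length = sum(lengths) / len(lengths)
    return average_differential(lengths) * 100 / average_length


lengths = [10, 15, 8, 20]
print(average_differential(lengths))  # (5 + 7 + 12) / 3 = 8.0
print(round(mc_var(lengths), 1))      # 8 * 100 / 13.25, about 60.4
```

In this example the writer averages 13.25 words per main clause, so a differential of 8 is reported as roughly 60 percent.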

NC / TSC; AdjC /TSC; AdvC / TSC; InjC / TSC: Noun clauses, adjectival clauses, adverbial clauses, and clauses used as Interjections, expressed as a percentage of the total number of subordinate clauses. Obviously, this is an attempt to look at which types of clauses writers use most often.
     In the pages which present further details of these categories, the four letter identifying codes work as follows. The first letter ("L," "M," or "R") represents branching. In noun clauses, the second letter is "N." The last two letters  indicate direct objects ("DO"), predicate nouns ("PN"), objects of prepositions ("OP"), delayed subjects ("DS"), subjects ("SU"), and appositives ("AP"). For adjective clauses, the second and third letters are "AJ"; for adverbial, "AV." A last letter of "F" indicates full clauses; "R" designates semi-reduced clauses.
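
As a rough sketch, the coding scheme for noun, adjectival, and adverbial clauses can be decoded as shown below. The sample codes ("RNDO," "LAVF") are built from the rules just given rather than copied from the data pages, and the function and table names are my own. Codes for clauses used as interjections are not described above, so the sketch does not attempt them.

```python
BRANCHING = {"L": "left", "M": "mid", "R": "right"}
NOUN_FUNCTIONS = {
    "DO": "direct object",
    "PN": "predicate noun",
    "OP": "object of preposition",
    "DS": "delayed subject",
    "SU": "subject",
    "AP": "appositive",
}
CLAUSE_FORMS = {"F": "full", "R": "semi-reduced"}


def decode(code):
    """Return (clause type, branching, detail) for a four-letter code."""
    if len(code) != 4 or code[0] not in BRANCHING:
        raise ValueError(f"unrecognized code: {code!r}")
    branching = BRANCHING[code[0]]
    # Noun clauses: second letter "N", last two letters give the function.
    if code[1] == "N" and code[2:] in NOUN_FUNCTIONS:
        return ("noun", branching, NOUN_FUNCTIONS[code[2:]])
    # Adjectival / adverbial clauses: "AJ" or "AV", then "F" or "R".
    if code[1:3] in ("AJ", "AV") and code[3] in CLAUSE_FORMS:
        kind = "adjectival" if code[1:3] == "AJ" else "adverbial"
        return (kind, branching, CLAUSE_FORMS[code[3]])
    raise ValueError(f"unrecognized code: {code!r}")


print(decode("RNDO"))  # ('noun', 'right', 'direct object')
print(decode("LAVF"))  # ('adverbial', 'left', 'full')
```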

Noun Absolutes: This construction usually consists of a noun modified by a gerundive. 

NuA: Noun Used as an Adverb

Passive F Verbs: Passive Finite Verbs. Because the Aluminum passage describes a process, one obvious question is the extent to which students used passive verbs.

PPA: A Post-positioned adjective. This construction appears to be a late-bloomer based on the reduction of a subordinate clause -- "He was an old man, who was tired and fragile." --> "He was an old man, tired and fragile."

PV/FV: The total number of passive verbs divided by the total number of finite verbs.

RCm: A retained complement after a passive verb.

RO/MC: The total number of run-ons divided by the total number of main clauses.

Semi-Reduced Subordinate Clause: A subordinate clause from which (usually) the subject and the auxiliary verb have been deleted. "When it is put through several other processes, it yields a chemical." --> "When put through several other processes...."

SOPP: The percent of sentences that begin with a prepositional phrase.

SOSC: The percent of sentences that begin with a subordinate clause.

SOBut: The percent of sentences that begin with "But."

SVAgr/FV: The total number of subject/verb agreement errors divided by the total number of finite verbs.

Total W: The total number of words in the passage.

TSC/MC:  The total number of subordinate clauses divided by the number of main clauses. One hypothesis is that subordinate clauses develop before "advanced" constructions such as gerundives (participles) and appositives.

Words: The total number of words in the student's revision.

W/MC: The total number of words divided by the number of main clauses. This is roughly equivalent to Hunt's "Words per T-Unit." (Hunt's "T-Unit" is a main clause defined as including all its subordinate clauses. See "Defining the 'T-Unit.'")

W/SCL1: The total number of words in level one subordinate clauses, divided by the number of level one subordinate clauses. Theoretically, this number should grow before reductions to gerundives and appositives begin to blossom.

W/Sent: Kellogg Hunt's research was fundamental in establishing words per main clause as the basic yardstick of syntactic maturity. His basic hypothesis was that young students create long sentences by stringing main clauses together with "and." But when it comes to the statistical analysis of students' writing, the primary problem is deciding what counts as a "sentence." Obviously, two clearly compounded main clauses count as one sentence -- "Bill went swimming, and Mary did the dishes." But what if those two clauses are joined by a comma-splice? -- "Bill went swimming, Mary did the dishes." Or form a run-on? -- "Bill went swimming Mary did the dishes." Does a fragment count as a sentence? -- "Because I said so."
     This is a psychological as well as a statistical question. At what point do students have a sense of "sentence"? Books could be devoted to this question, so, for statistical purposes, KISS simply relies on the psycholinguistic model of how the human brain processes sentences and on the "final stop" concept of punctuation. A "sentence" thus ends wherever a writer puts final stop punctuation. Therefore compounded main clauses, comma-splices, and run-ons each count as one sentence, as do fragments. (The idea here is that "bad" fragments are created because the writer's short-term memory is overwhelmed by length. As a result, the writer puts "end-stop" punctuation and begins a new sentence.)
     The KISS statistical program has no codes for "sentences." It therefore calculates "Words per sentence" by using the "main-clause" codes. In statistical studies, these codes are put at the beginning of each main clause:

\-\ denotes the beginning of a main clause (The hyphen implies that short-term memory is cleared of the preceding sentence.)
\C\ denotes that the following main clause is compounded with the preceding one
\,\ denotes a comma-splice (two main clauses joined by a comma)
\R\ denotes a run-on (two main clauses run together with nothing to separate them)
\F\ denotes that the following "main clause" is a fragment
The program counts total main clauses by adding the counts of each of the above five codes. To count words per sentence, the total number of sentences is calculated by subtracting the number of fragments (\F\) from that total; the total number of words is then divided by the result.
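
The following sketch follows the wording of that description literally: it counts every code listed above, subtracts the fragments, and divides the word count by the result. It is not the KISS program, whose code is not shown here. The regular expression, the function name, and the way words are counted (splitting on whitespace) are all assumptions of mine.

```python
import re

# Matches the main-clause codes listed above: \-\  \C\  \,\  \R\  \F\
CODE = re.compile(r"\\([-C,RF])\\")


def words_per_sentence(coded_text):
    """Words per sentence, computed as literally described in the text."""
    codes = CODE.findall(coded_text)
    total_main_clauses = len(codes)       # every code counts once
    fragments = codes.count("F")
    # Assumption: a "word" is any whitespace-separated token that is not a code.
    words = CODE.sub(" ", coded_text).split()
    sentences = total_main_clauses - fragments
    if sentences == 0:
        raise ValueError("no sentences to divide by")
    return len(words) / sentences


sample = r"\-\ Bill went swimming, \,\ Mary did the dishes. \F\ Because I said so."
print(words_per_sentence(sample))  # 11 words / (3 codes - 1 fragment) = 5.5
```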
 


[for educational use only]