Introduction to the Spring 95 Research Project

     In the Spring of 1995, I decided to start a "preliminary" research project with at least three purposes. First, I wanted to explore further the nature of such research. Although I have criticized the work of Mellon, O'Hare, and others who have done statistical research, I have also suggested that such research can be important. But how important and for what purposes? How reliable IS such research? What specific questions are worth pursuing, and to what degree? Anyone who gets into this type of research will soon find that there are thousands of possible questions. When, for example, do children start using compound verbs with any frequency? The young child repeats subjects -- "We went to the store, and we bought candy, and then we went to the park." The preceding sentence has sixteen words in three main clauses, or 5.3 words per main clause. The sentence "We went to the store, bought candy, and then went to the park" has thirteen words in one main clause, i.e., 13 words per main clause. If we are going to map syntactic development statistically, this is clearly one of the questions that should be addressed.
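     The arithmetic above can be sketched in a few lines of code. This is only an illustration, not the analysis program used in this project; the clause divisions are supplied by hand, just as they are in my own analysis.

```python
def words_per_main_clause(clauses):
    """Average words per main clause, given each main clause as a string."""
    total_words = sum(len(clause.split()) for clause in clauses)
    return total_words / len(clauses)

# The compound-verb example, divided into its three main clauses:
compound = ["We went to the store",
            "and we bought candy",
            "and then we went to the park"]
# The same content combined into a single main clause:
combined = ["We went to the store, bought candy, and then went to the park"]

print(round(words_per_main_clause(compound), 1))  # 5.3
print(words_per_main_clause(combined))            # 13.0
```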
     My second purpose was to test the idea of a "national" database of students' writing. It is impossible to reexamine (and hence seriously question) the work of Hunt, Loban, O'Donnell, O'Hare, etc., because transcripts of the student writing that they analyzed are simply not available. Computers have now made it possible to make such transcripts widely available. As the suggestions for Selecting Passages for Analysis indicate, collecting writing samples is itself a major obstacle in this type of research. I suggested that NCTE create such a database, but my suggestion did not receive much attention. Originally, my intent was to make the transcripts of this project available as files on a disk. Anyone who wanted them could simply pay for copying and mailing the disk. The advent of the net, of course, has changed that. The transcripts are here for the taking. I hope others will take and use them, both for their own work and to challenge mine.

     My third purpose was more focused. Most previous research has been based on one writing sample from each student. Walter Loban, who tracked students from fourth to twelfth grades, used more, but he used one sample a year. In attempting to prove that instruction in grammar is useless, Mellon, O'Hare, and others used two samples, but these were pre- and post-tests. Although all the research combined reinforces itself, i.e., the AVERAGE main clause written by a high school senior is about 14 words long, no one, to my knowledge, has adequately addressed a simple question -- how much does average main clause length by the same writer vary in different writing assignments? Suppose, for example, that many writers -- including young students -- have the ability to write longer or shorter main clauses at will. One day they may feel like writing short and simple; the next, serious and long. If this is the case, then it further undercuts the research of Mellon and O'Hare -- if many students can write longer or shorter as they please, then those students who did sentence-combining exercises surely got the message that longer was better. In their writing for the class, they simply shifted into "longer" gear. The question -- which someone may want to test, by the way -- is whether O'Hare's results could be duplicated by replacing the sentence-combining with the simple instruction: "Write longer sentences."
     I wanted, therefore, to compare the statistics for one student across several different assignments. Most researchers, for example, are aware that the mode of writing (narrative vs. expository) affects the group average of average length per main clause. But does it affect everyone's average? If not, why not? How many students are stable across modes? Which students (the high end, or the low) are stable across modes? Still another question concerns the difference between in- and out-of-class writing. Almost all the research is based on writing that students do in class. One of the obvious reasons for this is that it avoids the problem of contamination, i.e., parents or others helping the writer. But in-class writing is done under pressure of time, with little if any opportunity for revision. As I watch myself writing, at home, at ease, I see myself continually combining or chopping sentences. Perhaps the better student writers do likewise? If so, what effect does that have on their syntax?
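     One simple way to frame the stability question is to compute, for each student, the spread of his or her main-clause averages across the samples. The sketch below uses invented numbers purely for illustration -- the student labels and figures are hypothetical, not results from this project.

```python
import statistics

# Hypothetical averages (words per main clause) for six samples per student.
samples = {
    "student_01": [9.2, 10.1, 14.5, 8.8, 15.0, 9.9],    # varies widely by assignment
    "student_02": [13.8, 14.1, 13.5, 14.0, 13.9, 14.2],  # stable across assignments
}

for student, averages in samples.items():
    spread = max(averages) - min(averages)
    print(f"{student}: mean={statistics.mean(averages):.1f}, "
          f"range={spread:.1f}, sd={statistics.stdev(averages):.1f}")
```

A student with a large range or standard deviation would be shifting gears from one assignment to the next; a student with a small one would be stable across modes.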

Collection of Writing Samples

     Just before the Spring of 95 semester, I decided to collect as many writing samples as I could from the students in my Freshman composition sections. In all, I had 66 students. To collect as many samples as I could, I asked each student to sign a permission-to-publish form so that I could use their writing for this research. Unfortunately, only 40 students signed the form, and some of them did not do all the writing assignments. I decided to analyze all the samples I had, even if I did not have permission to make them all public. In the Tables of Results, the second column, "PtP," indicates whether or not I got permission. If I did, then the student's text is provided, both as a plain transcript and with my codes for analysis.

     I decided to analyze six samples from each student, four of which were written in class. The first of these, a pre-course sample, was written during the first week of the semester, before any instruction was done. The second is their first regular in-class essay. The last two are in-class essays done at the end of the course as a final exam. Of these, I have thus far "finished" (see below) analyzing only the pre-course sample. Although the students wrote four major out-of-class papers, I decided to use only two, for the simple reason that the other two were research papers. (Sorting out the maze of paraphrases and sentences that were quoted but are not in quotation marks would be impossible.) The first set of these has been analyzed; the second is almost complete. "Finished," above, is in quotation marks because I may decide to return to these samples for further analysis.

What Was Counted, and How

     All texts are being made into electronic documents. (Copies of the originals for which I have permission to publish are available for the cost of copying and mailing.) Using ToolBook, an authoring program, I created a set of programs which act as repositories for the original transcripts, but which also allow me to analyze them by clicking on words to insert codes. Because this analysis is still very time-consuming, and because I wanted an overall view of the documents, I decided to limit analysis to prepositional phrases and main and subordinate clauses. Because it was relatively easy to do, I also decided to count fragments, comma-splices, run-ons, and subject/verb agreement errors. A discussion of most of these is available by clicking on one of the links below:

In counting subordinate clauses, my program enabled me to count the level of embedding. Thus a "Level-One" subordinate clause is a subordinate clause embedded in a main clause, a Level-Two is a subordinate clause embedded in a Level-One, etc. (See "Counting Subordinate Clauses.") I should have designed the program to count prepositional phrases in the same way, but I did not. (This is something I may go back and do later.) Because the original transcripts are archived here, and because I have suggested that a database of students' writing would be beneficial for a number of types of research, I have also counted the number of paragraphs and the number of words per paragraph.
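     The level-of-embedding count can be illustrated with a toy coding scheme. The braces below are an invented notation for this sketch only -- my actual codes, inserted through ToolBook, are different -- but the counting logic is the same: each opening marker increases the depth, and a clause's level is the depth at which it opens.

```python
from collections import Counter

def count_embedding_levels(coded_text):
    """Count subordinate clauses at each level of embedding.

    '{' marks the start of a subordinate clause, '}' its end.
    A clause opened at depth 1 is Level-One, at depth 2 Level-Two, etc.
    """
    levels = Counter()
    depth = 0
    for ch in coded_text:
        if ch == "{":
            depth += 1
            levels[depth] += 1
        elif ch == "}":
            depth -= 1
    return dict(levels)

sample = "I think {that he knew {that we were late}}, so we ran."
print(count_embedding_levels(sample))  # {1: 1, 2: 1}
```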

     In previous, smaller studies, I have counted and examined numerous other things -- functions of subordinate clauses (DO, PN, Adjectival, Adverbial, etc.) and branching of subordinate clauses (Left, i.e., before the main subject and verb; Mid, i.e., between the main subject and its verb; and Right, i.e., after the main subject and verb). I have also counted infinitives, gerunds, gerundives, appositives, and noun absolutes. Because of the number of texts I was facing, I decided not to analyze these at this time -- I may return to them later.

Table(s) of Results

     Because this project is still in progress, I have included only one Table, for Words per Main Clause, thus far. That table lists students (by number), whether or not I have Permission to Publish (PtP), and cells for the average number of words per main clause on each of the six samples. If I do have permission to publish, the number for each student's results has been made a link to the student's writing, both analyzed and unanalyzed. The document which includes these texts also includes the statistical results for that sample for all categories that were analyzed. Click here to see a sample.

Discussion

     Because this project is still in progress, I am not ready for a general discussion of it. In the process of analyzing texts and converting them to the HTML documents presented here, I have made, and will continue to make, observations about problems involved in counting and on various other things that I find. These observations are being collected here in a variety of documents, primarily lists under "What Counts, How, and So What?", under "The Question of Errors," and under "Interesting Questions." As each of these documents notes, they are works in progress and will be continually updated as I add new writing samples to this web site.