What Counts, How, and So What?
Words
Word Count
In these studies, words were counted electronically, which means that a space between characters defines a word. No corrections or changes were made to the text: if a student spelled "a lot" as one word, it was left as one word. (My sense is that separating such instances would have had a negligible effect on the statistical results.) Where a student used a proper name in a negative context, the name was replaced by Xes, equal in number to the letters in the name. Some of the assignments in these studies were submitted as typed papers, and I have made every attempt to reproduce the students' originals letter for letter, punctuation mark for punctuation mark. The handwritten assignments also had to be typed letter for letter, punctuation mark for punctuation mark, and since I did not have the money to pay someone to do so, I typed them myself. I will, however, send photocopies of the originals to anyone who is interested in seeing them and who is willing to pay for their reproduction.
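The space-delimited counting rule can be illustrated with a short Python sketch. It is an assumption of this illustration, not a description of the software actually used in these studies, that any whitespace (spaces, tabs, line breaks) separates words; the function name `count_words` is hypothetical.

```python
def count_words(text: str) -> int:
    """Count words as runs of non-whitespace characters.

    Nothing is corrected: "alot" stays one word, and hyphenated
    forms such as "well-known" also count as one word.
    """
    # str.split() with no argument splits on runs of whitespace
    # and ignores leading and trailing whitespace.
    return len(text.split())


print(count_words("I have alot of homework."))   # 5
print(count_words("I have a lot of homework."))  # 6
```

Each instance of "alot" therefore lowers a sample's count by exactly one word relative to the standard spelling.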
Perhaps a major problem in statistical studies of syntactic development is that researchers count words in different ways. O'Donnell, for example, states:
Before records of the children's speech and writing were processed any further, elements in them regarded as syntactically irrelevant were marked (in red ink) for special treatment. In the speech transcripts, representations of audible pauses (usually recorded as uh) were thus eliminated from all computations. False starts, redundant subjects (such as he in the ant he went home), and word-tangles as well as noncommunicative repetitions (called "mazes" on the analysis worksheets-- see Appendix C) were excluded from subsequent study of syntax, but they were tabulated for reporting as "garbles." With garbles and representations of audible pauses eliminated, a word count of each individual set of responses was made. Conventional word division as represented in dictionary entries was generally honored, but two special rules were adopted to make the count more uniform and meaningful. Contractions such as he'd and isn't were regarded as two words, and compound nouns (whether written solid or hyphenated in dictionaries) were given the count indicated by the number of bases involved. Thus "snowball" would be counted as two words. (Syntax, 33)
A general problem in this type of research is that most researchers are not as explicit as O'Donnell about how they count words. Even O'Donnell's explanation, however, raises at least three problems.
Counting compounds by the number of bases involved is one. O'Donnell's decision seems to me both arbitrary and fuzzy. It seems arbitrary because he gives no psychological or linguistic reason for it. It seems fuzzy because, if I wanted to follow him, I would have trouble deciding which words count for what. Would the Whitehouse count as two words, or three? What about tablecloth? Automobile? Such compound nouns seem to me to be aspects of vocabulary more than of syntax. I have therefore not followed his practice.
Counting contractions as separate words also raises questions. One is the laborious mechanical question of how to make the count. Counting words by hand is extremely time consuming and not very reliable; counting while also looking for contractions (or for forms that should have been contractions, as in its vs. it's) would be even more so. Another is whether the use of contractions might itself reflect syntactic maturity. For these two reasons, I have counted contractions as one word.
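For an electronically counted corpus, the two treatments of contractions can be compared with a sketch like the one below. The regular expression is an assumption of this illustration, not O'Donnell's procedure: it recognizes only a few common endings written with a straight apostrophe, it cannot distinguish a possessive 's from a contraction, and, like a hand count, it cannot find contractions spelled without an apostrophe (its for it's). The names `count_words` and `CONTRACTION` are hypothetical.

```python
import re

# Hypothetical pattern for common contraction endings (straight apostrophe only).
# Limitation: a possessive such as "Jim's" is also treated as a contraction.
CONTRACTION = re.compile(r"[A-Za-z]+(?:n't|'re|'ve|'ll|'d|'m|'s)$", re.IGNORECASE)
EDGE_PUNCTUATION = ".,;:!?\"()"


def count_words(text: str, split_contractions: bool = False) -> int:
    words = text.split()
    if not split_contractions:
        # Rule used in these studies: a contraction is one word.
        return len(words)
    # O'Donnell's rule: a contraction such as "he'd" or "isn't" is two words.
    return sum(
        2 if CONTRACTION.search(word.strip(EDGE_PUNCTUATION)) else 1
        for word in words
    )


print(count_words("He'd say it isn't fair."))                           # 5
print(count_words("He'd say it isn't fair.", split_contractions=True))  # 7
print(count_words("Its late.", split_contractions=True))                # 2
```

The last call shows the limitation: a contraction written without its apostrophe is counted as one word, so such forms would still have to be found by a human reader.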
Perhaps the most important problem in O'Donnell's method of counting is his treatment of garbles. O'Donnell tabulates them, but he excludes them from the word count and from all subsequent study of syntax. In handwritten samples, garbles may result from sloppy handwriting, but they may also reflect problems in the writer's mental processing. The writer, after all, has to process syntactically (chunk) what he or she wants to write, and then has to hold it in short-term memory (STM) while the hand transfers it to the page. Problems in this process, especially the hand's difficulty in keeping up with the brain, probably account for the not uncommon problem of omitted words, and probably for many garbles as well. Overall, garbles do not seem to be particularly frequent, but in my research, rather than being eliminated, they are transcribed as a series of Xes, as close to the letter and space count of the original as I could make it.
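The transcription of garbles as Xes can be sketched the same way. This minimal Python illustration assumes the garbled span has already been identified by a human reader; every non-space character in the span becomes an X and every space is kept, so a whitespace-based word count of the transcript matches that of the original. The helper names and the example sentence are hypothetical, and the same masking would serve for proper names replaced by Xes.

```python
import re


def mask(span: str) -> str:
    """Replace each non-whitespace character with X, keeping whitespace."""
    return re.sub(r"\S", "X", span)


def mask_span(text: str, span: str) -> str:
    """Mask the first occurrence of a garbled span within text."""
    return text.replace(span, mask(span), 1)


def count_words(text: str) -> int:
    # Whitespace-delimited count, as in these studies.
    return len(text.split())


original = "After lunch we wnt thr to the store."
transcript = mask_span(original, "wnt thr")
print(transcript)  # After lunch we XXX XXX to the store.
print(count_words(original), count_words(transcript))  # 8 8
```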
The problems discussed above are particularly significant because the original samples and transcripts from that research are not available for examination. I have tried to avoid this problem by making my transcripts and the originals available.