What Counts, How, and So What?

Words

Word Count

In these studies, words were counted electronically, which means that a word is any string of characters bounded by spaces. No corrections or changes were made to the text: if a student spelled "a lot" as one word, it was left as one word. (My sense is that separating such instances would have had a negligible effect on the statistical results.) Where a student used a proper name in a negative context, the name was replaced by as many Xes as it has letters. Some of the assignments in these studies were submitted as typed papers, and I have made every attempt to reproduce the students' originals letter for letter, punctuation mark for punctuation mark. The hand-written assignments had to be typed the same way, letter for letter and punctuation mark for punctuation mark, and since I did not have the money to pay someone to do it, I typed them myself. I will, however, send xeroxes of the originals to anyone who is interested in seeing them and willing to pay for their reproduction.
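The counting rule described above can be illustrated with a minimal Python sketch. It assumes that "counted electronically" amounts to splitting the text on whitespace; the program actually used in these studies is not named, and the function names `count_words` and `mask_name` are hypothetical. The sketch also shows a proper name being replaced by an equal number of Xes, assuming a single-word name.

```python
import re

def count_words(text: str) -> int:
    """Hypothetical sketch: any run of non-space characters is one word.
    Nothing is corrected, so "alot" counts as one word and "a lot" as two."""
    # str.split() with no argument splits on runs of whitespace,
    # so line breaks and double spaces do not produce empty "words".
    return len(text.split())

def mask_name(text: str, name: str) -> str:
    """Hypothetical sketch: replace each whole-word occurrence of `name`
    with as many Xes as the name has characters (single-word names only)."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.sub(pattern, "X" * len(name), text)

sample = "I like Smith alot, but Smith doesn't."
print(count_words(sample))         # 7
print(mask_name(sample, "Smith"))  # I like XXXXX alot, but XXXXX doesn't.
```

Because punctuation stays attached to the word it follows ("alot,"), it never adds to the count, and because masking preserves the length of the name, it leaves the word count unchanged.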

Garbles

Perhaps a major problem in statistical studies of syntactic development is that researchers count words in different ways. O'Donnell, for example, states:

Before records of the children's speech and writing were processed any further, elements in them regarded as syntactically irrelevant were marked (in red ink) for special treatment. In the speech transcripts, representations of audible pauses (usually recorded as uh) were thus eliminated from all computations. False starts, redundant subjects (such as he in the ant he went home), and word-tangles as well as noncommunicative repetitions (called "mazes" on the analysis worksheets-- see Appendix C) were excluded from subsequent study of syntax, but they were tabulated for reporting as "garbles."
With garbles and representations of audible pauses eliminated, a word count of each individual set of responses was made. Conventional word division as represented in dictionary entries was generally honored, but two special rules were adopted to make the count more uniform and meaningful. Contractions such as he'd and isn't were regarded as two words, and compound nouns (whether written solid or hyphenated in dictionaries) were given the count indicated by the number of bases involved. Thus "snowball" would be counted as two words. (Syntax, 33)

One of the general problems in this type of research is that most researchers are not as explicit as O'Donnell is about the way in which they count words. O'Donnell's explanation, however, raises at least three problems.
     One is counting compounds by the number of bases involved. To me, O'Donnell's decision seems both arbitrary and fuzzy. It seems arbitrary because he gives no psychological or linguistic reason for it. It seems fuzzy because, if I wanted to follow him, I would have problems deciding which words count for what. Would the Whitehouse count as two words, or three? What about tablecloth? Automobile? Such compound nouns seem to me to be less a matter of syntax than of vocabulary, so I have not followed his practice.
     Counting contractions as separate words also raises questions. First, there is the laborious mechanical question of how to make the count. Counting words by hand is extremely time consuming and not very reliable, and counting while also looking for contractions (or for what should be contractions, as in its vs. it's) would be even more so. Second, there is the question of whether the use of contractions might itself reflect syntactic maturity. For these two reasons, I have counted contractions as one word.
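The difference between the two counting rules, and the difficulty of applying O'Donnell's rule mechanically, can be sketched in Python. This is only an approximation under stated assumptions: the list of contraction endings is a hypothetical, deliberately short one, and the function names are illustrative, not taken from O'Donnell or from the software used in these studies.

```python
# Hypothetical, deliberately short list of contraction endings.
# "'s" is left out because "he's" (he is) cannot be told apart from
# the possessive "John's" without reading the sentence.
CONTRACTION_ENDINGS = ("n't", "'d", "'ll", "'re", "'ve", "'m")

def count_contraction_as_one(text: str) -> int:
    """The practice followed here: a contraction counts as one word."""
    return len(text.split())

def count_contraction_as_two(text: str) -> int:
    """An approximation of O'Donnell's rule: a recognized contraction counts as two."""
    count = 0
    for token in text.split():
        # Normalize curly apostrophes and drop surrounding punctuation.
        word = token.replace("\u2019", "'").strip('.,;:!?"()').lower()
        count += 2 if word.endswith(CONTRACTION_ENDINGS) else 1
    return count

sample = "He'd say it isn't fair, but its not."
print(count_contraction_as_one(sample))  # 8
print(count_contraction_as_two(sample))  # 10
```

Note that the misspelled "its" (for "it's") counts as one word under both rules: no mechanical test can recover a contraction the writer did not mark, which is part of what makes such counts unreliable.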
     Perhaps the most important problem in O'Donnell's method of counting is his treatment of garbles. O'Donnell marks and tabulates them, but he excludes them from his word counts and from his study of syntax. In hand-written samples, garbles may result from sloppy handwriting, but they may also reflect problems in the writer's mental processing. The writer, after all, has to process syntactically (chunk) what he or she wants to write, and then has to hold it in short-term memory (STM) while the hand transfers it to the page. Problems in this process, especially the hand's struggle to keep up with the brain, probably account for the fairly common problem of words left out, and they probably account for many garbles as well. Garbles do not seem to be particularly frequent overall, but in my research they are not eliminated; instead, they are transcribed as a series of Xes that matches the letter and space count of the original as closely as I could make it.

The problems discussed above are particularly significant because the original samples and transcripts of such research are not available for examination. I have attempted to avoid this problem by making my transcripts and the originals available.