Channel: anthropology news ticker - antropologi.info » anthropology

Language Log: Proportion of adjectives and adverbs: Some facts

Adam Okulicz-Kozaryn, "Cluttered writing: adjectives and adverbs in academia", Scientometrics 2013:

"[H]ow do we produce readable and clean scientific writing? One of the good elements of style is to avoid adverbs and adjectives (Zinsser 2006). Adjectives and adverbs sprinkle paper with unnecessary clutter. This clutter does not convey information but distracts and has no point especially in academic writing, say, as opposed to literary prose or poetry."

If you've seen my earlier discussion of this paper ("'Clutter' in (writing about) science writing", 8/30/2013), you'll recall that Dr. O-K goes on to count adjectives and adverbs in some word lists from samples of scientific writing. He asserts that "social science" writing uses about 15% more adjectives and adverbs than "natural science" writing (although he doesn't tell us enough about his methods to dispel concerns about several likely sources of artifact), and he concludes by asking "Is there a reason that a social scientist cannot write as clearly as a natural scientist?"

In the interests of science of all kinds, I decided to devote this morning's Breakfast Experiment™ to this issue. I wrote a python script using NLTK to calculate the proportions of various parts of speech in a document; and then I tried this script out on samples of various sorts of writing, to check the association of text readability with the percentages of adjectives and adverbs. Now I'll show you some of what I found.

To start with, I decided to try some really cluttered prose, prose that is not at all "readable and clean": Edward Bulwer-Lytton's Paul Clifford. Wikipedia tells us that this novel is considered to represent "the archetypal example of a florid, melodramatic style of fiction writing".
Its first sentence:

"It was a dark and stormy night; the rain fell in torrents, except at occasional intervals, when it was checked by a violent gust of wind which swept up the streets (for it is in London that our scene lies), rattling along the house-tops, and fiercely agitating the scanty flame of the lamps that struggled against the darkness."

I put essentially all of the first chapter of this work into a file (minus the paragraphs that are mostly dialogue, much of which is in dialect). According to NLTK's pos_tag() function, which should be about 95% correct, the score was:

1775 words, 184 punctuation tokens = 1591 real words
108 adjectives = 6.8 percent
78 adverbs = 4.9 percent
186 adjectives+adverbs = 11.7 percent

So Bulwer-Lytton's chapter is about 12% adjectives and adverbs. What should we compare this to? Well, Dr. O-K cites William Zinsser's On Writing Well as his authority for the cluttering nature of adjectives and adverbs, so let's try the first three sections of that work (minus quotations from others, of course):

3939 words, 439 punctuation tokens = 3500 real words
241 adjectives = 6.9 percent
208 adverbs = 5.9 percent
449 adjectives+adverbs = 12.8 percent

Hmm. Well, maybe this is experimental error. And Bulwer-Lytton's writing is clear enough, it's just kind of overwrought. So let's take a look at something by Jacques Derrida, whose prose is about as unreadable as anything I've ever encountered. Here's the score for chapter 2 of "Of Grammatology" (in English translation, of course):

19239 words, 2105 punctuation tokens = 17134 real words
1434 adjectives = 8.4 percent
946 adverbs = 5.5 percent
2380 adjectives+adverbs = 13.9 percent

OK, that's better: Derrida has 19% more adjectives and adverbs than Bulwer-Lytton. But he's only got 8% more than Zinsser, and Zinsser has more than Bulwer-Lytton, so this still doesn't all seem to be working out the way we were told it would.

Let's go for another paragon. Dr.
O-K opens his paper with a quote from Mark Twain: "When you catch an adjective, kill it." So let's try the whole letter that the quote came from:

1474 words, 170 punctuation tokens = 1304 real words
89 adjectives = 6.8 percent
95 adverbs = 7.3 percent
184 adjectives+adverbs = 14.1 percent

Oops. We're really going in the wrong direction here: Saint Mark uses the highest proportion of adjectives and adverbs that we've seen so far.

And what about Dr. O-K's own writing? Here's the score for the text of "Cluttered writing: adjectives and adverbs in academia" itself (of course minus the quotations from others):

883 words, 80 punctuation tokens = 803 real words
85 adjectives = 10.6 percent
42 adverbs = 5.2 percent
127 adjectives+adverbs = 15.8 percent

We have a winner! Dr. Okulicz-Kozaryn's text, about the importance of eliminating adjectives and adverbs from prose, has fully 35% more adjectives and adverbs than the infamous "It was a dark and stormy night" passage, which has given its author's name to an annual bad writing contest!

(127/803)/(186/1591) = 1.3528

Seriously, the problem is not in Dr. O-K's writing (despite the sprinkling of slavicisms), but in his ideas. As far as I can tell, calculating the percentage of adjectives and adverbs in a text tells us nothing whatever about its readability, clarity, or efficiency.

I'll spare you the reports for the other 45 texts that I've tested. But just to let Dr.
O-K off the hook for the "most modifiers" prize, let me note that the text of Ben Yagoda's piece from the Chronicle of Higher Education on adjectival anxiety ("The Adjective — So Ludic, So Minatory, So Twee", 2/20/2004), beats him out:

1908 words, 301 punctuation tokens = 1607 real words
208 adjectives = 12.9 percent
86 adverbs = 5.4 percent
294 adjectives+adverbs = 18.3 percent

Finally, I need to point out that there's a technical flaw in the whole "avoid adjectives and adverbs" idea: nouns are often modified by other nouns, or by prepositional phrases, or in other ways that don't involve adjectives; and verbs are often modified by prepositional phrases, subordinate clauses used as verbal adjuncts, and so on. If it were true, counterfactually, that modification in general was a Bad Thing, then we'd need to count these other sorts of modifiers as well, not just adjectives and adverbs.

Some of the previous LL posts on modificational anxiety:

"Those who take the adjectives from the table", 2/18/2004
"Avoiding rape and adverbs", 2/25/2004
"Modification as social anxiety", 5/16/2004
"The evolution of disornamentation", 2/21/2005
"Adjectives banned in Baltimore", 3/5/2007
"Automated adverb hunting and why you don't need it", 3/5/2007
"Worthless grammar edicts from Harvard", 4/29/2010
"Getting rid of adverbs and other adjuncts", 2/21/2013
"'Clutter' in (writing about) science writing", 8/30/2013

N.B. Someone who took this whole business seriously enough to want to look at differences in part-of-speech distributions among scientific disciplines should know that Okulicz-Kozaryn is wrong when he writes that "as of 2012 I cannot bulk download enough full texts to have a representative sample of a discipline". Between arXiv, the PLoS collections, SSOAR, the resources available from the ACL, and so on, it would not be hard to create large enough samples in enough different disciplines and subdisciplines to engage the question more seriously than Okulicz-Kozaryn did.
But you ought to have another hypothesis to test as well, because the modifier-percentage idea looks like a loser.

Update: I realize that it's only fair for me to report the score for this blog post. Leaving out the quotations and so on, and without this update, I get:

1143 words, 146 punctuation tokens = 997 real words
82 adjectives = 8.2 percent
59 adverbs = 5.9 percent
141 adjectives+adverbs = 14.1 percent

The same overall percentage as Mark Twain…
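
For readers who want to try a tally of this kind themselves, here is a minimal sketch in Python. It is not the script that produced the numbers above, which the post does not show. It rests on several assumptions: that "adjectives" means the Penn Treebank tags JJ, JJR and JJS, that "adverbs" means RB, RBR and RBS, and that a "punctuation token" is any token containing no letter or digit. The helper names (tally, percent, report, tag_text) are made up for illustration. Only tag_text needs NLTK, using nltk.word_tokenize() and nltk.pos_tag(), which in turn need NLTK's tokenizer and tagger data packages to be installed.

```python
from collections import Counter

# Assumption: Penn Treebank tags, as emitted by NLTK's default English tagger.
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}
ADVERB_TAGS = {"RB", "RBR", "RBS"}


def is_punctuation(token):
    """Assumption: a token with no letter or digit counts as punctuation."""
    return not any(ch.isalnum() for ch in token)


def tally(tagged_tokens):
    """Count tokens, punctuation, adjectives and adverbs in (token, tag) pairs."""
    counts = Counter()
    for token, tag in tagged_tokens:
        counts["tokens"] += 1
        if is_punctuation(token):
            counts["punctuation"] += 1
        elif tag in ADJECTIVE_TAGS:
            counts["adjectives"] += 1
        elif tag in ADVERB_TAGS:
            counts["adverbs"] += 1
    counts["real_words"] = counts["tokens"] - counts["punctuation"]
    return counts


def percent(count, total):
    """Percentage of total, rounded to one decimal place (0.0 if total is 0)."""
    return round(100.0 * count / total, 1) if total else 0.0


def report(counts):
    """Format the counts in the same four-line layout as the scores above."""
    real = counts["real_words"]
    adj, adv = counts["adjectives"], counts["adverbs"]
    return "\n".join([
        f"{counts['tokens']} words, {counts['punctuation']} punctuation tokens"
        f" = {real} real words",
        f"{adj} adjectives = {percent(adj, real)} percent",
        f"{adv} adverbs = {percent(adv, real)} percent",
        f"{adj + adv} adjectives+adverbs = {percent(adj + adv, real)} percent",
    ])


def tag_text(text):
    """Tokenize and tag raw text with NLTK (requires nltk and its data)."""
    import nltk  # imported lazily so the tallying code runs without NLTK
    return nltk.pos_tag(nltk.word_tokenize(text))
```

With NLTK and its data installed, print(report(tally(tag_text(open("chapter1.txt").read())))) would print a report for a file; the file name is hypothetical. One consequence of the tag choice: under Penn Treebank conventions the contraction "n't" is tagged RB, so this sketch counts it as an adverb. Whether the original script did the same is not stated.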
