
Language Log: Long is good, good is bad, nice is worse, and ! is questionable

Sanette Tanaka, "Fancy Real-Estate Listing, Fancier Verbiage", WSJ 6/6/2013:

Savvy real-estate agents know it's not just what you say. It's how long it takes you to say it. More-expensive homes go hand-in-hand with longer real-estate agents' remarks, the language written by the agent that supplements the house description and photos in a listing. Agents use a median 250 characters for homes listed under $100,000, according to an analysis for The Wall Street Journal by real-estate listings company Zillow. For homes priced over $1 million, they go nearly twice as long, with a median 487 characters. (That's about the length of this paragraph.)

"Generally, what you find is that regardless of the region, the more expensive the home is, the more characters are used to describe that home," says Stan Humphries, chief economist at Zillow.

Here's a plot of the relationship between description-length and price, from the cited article:

Ms. Tanaka contacted me while she was researching this story, and I sent her a link to "The quality of quantity", 4/24/2012, in which I noted a similar relationship in the cases of wine reviews and letters of recommendation:

The longer it is, the higher the rating: We're talking about the length of wine reviews, measured in words, and the numerical rating given to the associated wine. (Well, actually, the length of the reviews is measured in terms of the output of a tokenizer that sets off punctuation as well as alphanumeric strings…).

Unfortunately, wine reviews and letters of recommendation were too far off topic, and so my contribution to the article ended up being some speculations on the generation and interpretation of real-estate listings, a topic that I had never specifically studied:

Writing long for pricier homes has become standard practice in real estate, Mr. Liberman says. In fact, a short remark or a lack of hyped-up adjectives could suggest that there's something wrong with the home, he adds.

"Given that all the descriptions of better properties are full of these empty-enthusiasm words, it might be interpreted by readers as an indication of problems if they're absent," he says.

Given this published commitment to empirically-unsupported common sense, I felt obliged to look into the facts. So with the kind permission of the PR department at trulia.com, I've downloaded their listings for Philadelphia, Boston, Los Angeles, New York City, Las Vegas, Miami-Dade, Denver, and Chicago, a total of 52565 descriptions of properties for sale. It's taken my computer a week, since their server hands out pages at a slow pace, saving me the trouble of slowing the process down on my end so as not to abuse their bandwidth, but eight cities is enough to support a Breakfast Experiment™ more or less on the topic of the WSJ story.

To start with, this dataset from Trulia qualitatively replicates the findings from Zillow:

I was worried that part of the effect might be due to the fact that lower-priced listings are more likely to lack a description entirely, and indeed, 12.3% of the listings in the $100-200K price range are descriptionless, compared to just 2.4% of the listings in the $1M+ range. But this doesn't affect the trend much: the red line in the plot above gives the results for listings that actually contain a description.

However, when things are calculated this way, the effects look very different in different cities. Here's a plot showing Miami (M), Chicago (C), New York City (N), and Philadelphia (P):

This is partly but not entirely due to the large differences in price distributions among the cities, for instance:

Anyhow, I'll leave for another day the problem of separating region effects from price effects, and turn briefly to the issue of what description-writers are doing with those extra characters.
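The length-by-price comparison described above, including the check on descriptionless listings, could be sketched roughly as follows. This is only an illustration, not the code used for the post: the `(price, description)` input shape, the function names, and the bucket boundaries (in particular the middle bucket) are all assumptions.

```python
from statistics import median

# Hypothetical price buckets: (label, lower bound inclusive, upper bound exclusive).
# Only "under $100K", "$100-200K" and "$1M+" are mentioned in the text; the
# middle bucket is an assumption made to cover the remaining prices.
BUCKETS = [
    ("under $100K", 0, 100_000),
    ("$100-200K", 100_000, 200_000),
    ("$200K-1M", 200_000, 1_000_000),
    ("$1M+", 1_000_000, float("inf")),
]


def bucket_for(price):
    """Return the label of the bucket that contains this price."""
    for label, low, high in BUCKETS:
        if low <= price < high:
            return label
    raise ValueError(f"price out of range: {price}")


def length_by_price(listings, skip_empty=False):
    """Summarize description length per price bucket.

    listings: iterable of (price, description) pairs (assumed input shape).
    Returns {bucket: (median length in characters, share of empty descriptions)}.
    With skip_empty=True, empty descriptions are left out of the median.
    """
    lengths, empties, totals = {}, {}, {}
    for price, description in listings:
        label = bucket_for(price)
        totals[label] = totals.get(label, 0) + 1
        text = description.strip()
        if not text:
            empties[label] = empties.get(label, 0) + 1
            if skip_empty:
                continue
        lengths.setdefault(label, []).append(len(text))
    return {
        label: (
            median(lengths[label]) if label in lengths else None,
            empties.get(label, 0) / totals[label],
        )
        for label in totals
    }
```

Calling `length_by_price(listings)` and `length_by_price(listings, skip_empty=True)` on the same data gives the two curves being compared: all listings versus only those that actually contain a description.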
Returning to common sense, it seems likely that some of the extra length is due to more expensive properties having a larger number of positive features to describe, while some of it is due to describing more expensive things at greater length, for example by using more empirically-empty evaluative adjectives like "stunning" or "spectacular".

And indeed there's evidence for both of these effects. More expensive listings are more likely to talk about decks, offices, fireplaces, and so on, because more expensive properties are more likely to have those features. Thus the word "fireplaces" is ten times more likely to occur in the pricier half of each city's listings than in the cheaper half.

And there are many evaluative adjectives that are more likely to be used in describing more expensive properties, for example:

                 Rate per MW in top 50%    Rate per MW in bottom 50%    Ratio
exquisite        235                       53                           4.5
dramatic         247                       57                           4.3
soaring          215                       54                           4.0
expansive        361                       97                           3.7
sophisticated    149                       40                           3.7
luxurious        512                       143                          3.6
lush             183                       60                           3.0
breathtaking     256                       90                           2.8
prestigious      236                       90                           2.6

On the other hand, there are some evaluative adjectives, even adjectives that are positively-evaluated in general, that go in the opposite direction:

                 Rate per MW in top 50%    Rate per MW in bottom 50%    Ratio
cute             9                         75                           0.12
nice             173                       1196                         0.14
good             205                       819                          0.25
clean            100                       292                          0.34
convenient       147                       391                          0.38
fresh            65                        164                          0.46
lovely           301                       551                          0.55
excellent        518                       873                          0.59
charming         262                       419                          0.63

In the tables above, the listings are divided into top and bottom price quantiles city by city, rather than across the board. But there seem to be some overall price effects as well, so that the frequency of price-associated terms varies across cities in a way that's partly explained by the city-to-city differences in price distributions.
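For readers who want to try this kind of comparison, here is a minimal sketch of a top-half versus bottom-half rate calculation with the split done city by city. It assumes that "MW" means million words (here, million tokens), that listings arrive as `(city, price, description)` triples, and that prices equal to the city median fall in the bottom half. The tokenizer and all names are illustrative, not taken from the post.

```python
import re
from collections import Counter, defaultdict
from statistics import median

# Assumed tokenizer: runs of letters, digits and apostrophes are words,
# and every other non-space character is a token of its own.
TOKEN = re.compile(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']")


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return TOKEN.findall(text.lower())


def split_rates(listings, terms):
    """Rates per million tokens in the pricier and cheaper half of each city.

    listings: iterable of (city, price, description) triples (assumed shape).
    terms: set of lowercase tokens to count.
    Returns {term: (rate in top half, rate in bottom half)}.
    """
    by_city = defaultdict(list)
    for city, price, description in listings:
        by_city[city].append((price, description))

    counts = {"top": Counter(), "bottom": Counter()}
    totals = {"top": 0, "bottom": 0}
    for rows in by_city.values():
        # The cutoff is each city's own median price, not a global one.
        cutoff = median(price for price, _ in rows)
        for price, description in rows:
            half = "top" if price > cutoff else "bottom"
            tokens = tokenize(description)
            totals[half] += len(tokens)
            counts[half].update(t for t in tokens if t in terms)

    def rate(half, term):
        if totals[half] == 0:
            return 0.0
        return 1e6 * counts[half][term] / totals[half]

    return {term: (rate("top", term), rate("bottom", term)) for term in terms}
```

Dividing the first number by the second for a given term gives a ratio of the kind shown in the tables. Because punctuation marks are tokens here, `"!"` can be passed as a term too.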
Thus the exclamation point is more likely to be used in lower-priced listings: if we calculate the rates based on city-by-city price quantiles, we find that the rate in the top half of listings is 7124 per MW, while the rate in the bottom half is 12310 per MW.

But here's a scatterplot showing the relationship between city-wise exclamation-point frequency and log city-wise median listing price:

Here's what happens if we break the cities up into their pricewise top half (NYC1, LA1, …) and bottom half (NYC2, LA2, …):

I believe that both the absolute (across-city) price and the relative (within-city) price are playing a role in these distributions, but proving that will have to wait for another breakfast time.
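The city-level view behind the scatterplot could be approximated as below. This is a rough sketch under several assumptions: listings are `(city, price, description)` triples, the exclamation-point rate is counted per million whitespace-separated words, the logarithm is base 10 (the post does not say which base was used), and a Pearson correlation stands in for eyeballing the plot. None of the names come from the post.

```python
import math
from statistics import mean, median


def city_points(listings):
    """One (log10 median price, '!' per million words) point per city.

    listings: iterable of (city, price, description) triples (assumed shape).
    Cities whose descriptions contain no words are skipped.
    """
    prices, bangs, words = {}, {}, {}
    for city, price, description in listings:
        prices.setdefault(city, []).append(price)
        bangs[city] = bangs.get(city, 0) + description.count("!")
        words[city] = words.get(city, 0) + len(description.split())
    return {
        city: (math.log10(median(prices[city])), 1e6 * bangs[city] / words[city])
        for city in prices
        if words[city]
    }


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

To get the second plot's points, the same function can be run on listings whose city label has been suffixed with the within-city price half (for example `"NYC1"` and `"NYC2"`), which keeps the absolute and relative price effects visible side by side.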
