May 13, 2009

What to do with the database

Want to know how many times a word has appeared in the NY Times? The answer is a database, and the Times Lab explains how to do it:

"The above graph shows how many stories containing particular words were printed in The Times in a given year. The giant spike in 2001 - representing 1,581 stories - is “terrorism”. Other words to have waxed and waned are the Princess of Wales, which peaked in 1997 and rose again in 2007 - the year of the inquest into her death, and Google, which has been a steady riser since our early articles about it in 2000. 

We compiled the data using a Ruby script which scoured a web-based version of the Times database. The script conducted separate searches for a particular term in the database over a number of years, and extracted - from the results page of each search - the number of articles in which that term appeared. We then normalised the results to account for the overall rise in the number of articles written in the Times over the past quarter century.

Having written the program, we’ll probably re-use it to conduct searches across particular themes. Environment is an obvious one: when did ‘climate change’ replace ‘global warming’? And whatever happened to ‘the greenhouse effect’? Any suggestions about what other terms we should investigate are welcome."
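The method in the quote has two steps: one search per term per year, then normalising each count by the volume of articles published that year. The Lab's actual script was written in Ruby and scraped a web version of the database. The Python sketch below illustrates only the counting-and-normalising logic. `count_articles` is a hypothetical stand-in for the scraper, and expressing the result as "articles per 10,000 published" is an assumed normalisation, since the post does not say exactly how it was done.

```python
from typing import Callable, Dict, Iterable


def term_trend(
    term: str,
    years: Iterable[int],
    count_articles: Callable[[str, int], int],  # hypothetical: wraps one database search
    total_articles: Dict[int, int],             # assumed: total articles published per year
    per: int = 10_000,                          # assumed scale: "per 10,000 articles"
) -> Dict[int, float]:
    """Return, for each year, how many articles mentioned `term` per `per` articles published."""
    trend: Dict[int, float] = {}
    for year in years:
        hits = count_articles(term, year)  # a separate search for each year
        total = total_articles.get(year, 0)
        # Normalise so that growth in overall output is not mistaken for a rise in interest;
        # years with no recorded total are reported as 0.0 rather than dividing by zero.
        trend[year] = hits * per / total if total else 0.0
    return trend
```

With a real scraper behind `count_articles`, the same call could be repeated for related terms ("climate change", "global warming", "the greenhouse effect") over the same years, which gives directly comparable curves of the kind the Lab proposes.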


LM
