December 2002 newsletter
Mining the Web
The Internet is the lexicographer's friend. It can provide factual information to help refine definitions, it can suggest new avenues of research, but, perhaps most importantly for the OED, it can supply quotations. As regular readers of OED News will know, a large part of lexicographers' time is spent putting together and updating paragraphs of quotations which give contextual examples for each word we define. We have for several decades used electronic databases to aid this work, notably the British National Corpus and Lexis-Nexis. Now, with the arrival of the Internet, tens of thousands of scholarly texts and individual works of literature are available to us in a searchable form.
However, bigger isn't necessarily better: we need to be discriminating. A search engine, such as Google, provides a scattergun approach, returning a vast set of results with no indication of the date or reliability of sources. We are therefore most interested in material that has been collected together in databases, where we are able to carry out sophisticated searches (by date, in proximity to other terms, etc.) and where we can rely on the provenance of the information we are viewing. Some of the databases we use are freely available; in other cases we must pay a subscription. Editors are becoming increasingly adept at knowing which database is most likely to provide evidence for the term they are working on.
Literature Online: A Millennium of English
Literature Online is perhaps the largest electronic resource used by the OED. It is certainly the database which covers the longest period, with texts ranging from Beowulf to the poems of Benjamin Zephaniah, and from the plays of Aphra Behn to the novels of the Brontë sisters. As this list suggests, the database also includes an extremely diverse range of genres and styles; American Poems jostle with Renaissance Plays, Medieval Lyrics with Victorian Novels.
An important role which Literature Online plays in revision is as a source of antedatings. Some of these have been dramatic, transforming our view of the period in which a word was current. For example, the earliest known use of the noun olive oil has been pushed back from 1774 to 1566, while the adjectives misspelt and outspoken have been antedated from 1838 to 1762 and from 1808 to 1661 respectively. In other cases the change in the date at which a word was first used may not be that great. However, a large body of small antedatings may have important cumulative significance. For example, in both the first and second editions of the OED, Shakespeare appears as the originator of a large number of words and senses (e.g. majestically, neglected), but by looking at texts written by less well-known sixteenth-century authors on Literature Online, we have confirmed our suspicions that in many of these cases Shakespeare was using words which were already current. This gives us an altered view of Shakespeare as a writer; more importantly for the lexicographer, it begins to modify our understanding of how and why English vocabulary changed in the sixteenth and early seventeenth century.
JSTOR: The Origins of Scientific Literature
JSTOR documents a much more formal type of writing: scholarly articles published in academic journals and periodicals. Publications held on JSTOR are drawn from a wide range of disciplines in the humanities, social and natural sciences, with titles including the Bulletin of Symbolic Logic, Family Planning Perspectives, and the Slavonic Year-Book. The oldest journal on JSTOR, the Philosophical Transactions of the Royal Society, dates back to 1665. Given this early starting date, it is perhaps no surprise that JSTOR has provided some important antedatings. Recent examples include Newtonian (antedated to 1676 from 1713), nucleus (1668 from 1708), and molecule (1701 from 1794).
JSTOR has also been useful in documenting the emergence of more recent words and expressions (it seems that many everyday terms started life in academic literature). The adjectives caffeinated, benchmarked, and off-peak, and the nouns child-minding, multiculturalism, and comfort zone are all examples of recent additions to the OED which currently have first quotations from JSTOR sources.
The Times: New Horizons
Editors at the OED are constantly on the lookout for new databases to help in our work. The latest (and very recent) addition to the OED's collection of electronic resources is the Times Digital Archive. This database will eventually include the text of every edition of the Times newspaper printed between 1785 and the present day. The archive is still under construction and at present we can only search papers published between 1924 and 1955. But even this period has proved fruitful, providing, amongst other things, an antedating for the phrase man of the match (from 1963 to 1924). As the database expands it is likely to become a key research tool for the OED.
The resources described represent just some of those used in our research. Such mining of the Web is enjoyable and frustrating by turns. Nuggets of text lie awaiting discovery; our job is to sift through surrounding material in the most efficient way we can. The Internet will never replace our paper files, reading programmes, or the work of individual contributors as a source of quotations, but it is certainly worthy of a place alongside these.
Copyright © Oxford University Press 2009
www.oed.com/newsletters/2002-12/mining.html