A Corpus Based Analysis of William Blake
Edited by
Zekiye Antakyalıoğlu,
Kyriaki Asiatidou,
Ela İpek Gündüz,
Enes Kavak
and Gamze Almacıoğlu
English Studies in the 21st Century
A CORPUS-BASED ANALYSIS
OF WILLIAM BLAKE’S SONGS OF INNOCENCE
AND SONGS OF EXPERIENCE
MELTEM MUŞLU
Research questions
1. What are the key semantic categories of the most frequent words
used in Songs of Innocence and Songs of Experience?
2. What are the words used to describe the following semantic categories:
- emotions
- thought, belief, and knowledge
- sensory systems, and
- time?
3. To what extent are these categories similar to each other?
Statistical analysis
To gain an overall idea of a text, the most frequent words in it are
generally identified with software programs such as WordSmith and
AntConc. With the help of the WordList tool, these programs list the
words used in a text with their frequencies, regardless of their parts of
speech. Although looking at frequency lists is an essential starting point
for a systematic textual analysis, counting individual words is not
sufficient to understand a text fully.26 For a more efficient analysis, the
lexical variety of a text should also be determined, followed by a more
detailed contextual analysis. To find out the lexical variety, the type-token
ratio, referred to as T/t hereafter, is calculated. The T/t ratio is a measure
of vocabulary variation within a written text or a person’s speech, and it
is a helpful measure of lexical variety.27 A high T/t indicates a high
degree of lexical variation, while a low T/t indicates the opposite.
In other words, the more types there are in comparison to the number of
tokens, the more varied the vocabulary is. T/t can be expressed as a
percentage by multiplying the ratio by 100.28 Expressed this way, the T/t
ratio represents the number of different words (types) per 100 running
words (tokens) in a corpus. The T/t ratio is calculated according to the
following formula:
T/t = (number of types / number of tokens) × 100
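The type-token ratio described above can be illustrated with a short Python sketch. It is not the procedure used in the study, which relied on dedicated corpus software. The function name type_token_ratio and the tokenization rule (lowercased runs of letters and apostrophes) are assumptions made only for this illustration, and a different tokenizer would give somewhat different values.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return the type-token ratio (T/t) of `text` as a percentage."""
    # Assumption: a token is a lowercased run of letters or apostrophes,
    # so "Lamb" and "lamb" are counted as the same type.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    types = set(tokens)  # distinct word forms
    return len(types) / len(tokens) * 100


# Two lines from "The Lamb" (SoI), used here only as a toy input.
sample = "Little Lamb who made thee Dost thou know who made thee"
ratio = type_token_ratio(sample)
print(f"T/t = {ratio:.1f}%")  # 8 types / 11 tokens
```

Because who, made and thee are repeated, the sample has 8 types for 11 tokens, which shows how repetition lowers the ratio.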
It was found that SoI consists of 2422 words, which are grouped under
173 different semantic categories, whereas SoE consists of 2880 words,
which are grouped under 220 different categories. In the analysis, some
words were classified under two or more categories depending on
the context in which they were used. For instance, the word little was
grouped under these categories: N5 Quantities, A13.7 Degree/minimizer,
N3.2 Size: small, T3 Time: new and young (e.g.: little lamb), and Z1
Personal names (e.g.: Little Lamb). The USAS system includes not only
content words but also function words. For example, apostrophe s (’s) is
not a word in itself, but it is counted as a word in the semantic category
list under the code Z5 (grammatical bin). Table 3 below shows the twenty
most frequently used semantic categories in SoI and SoE. In the table,
words with a grammatical or discourse function, such as grammatical bins
(Z5), pronouns (Z8) and discourse bins (Z4), were excluded since the aim
of the study was to focus on the meaning of the words used rather than
their grammatical functions.
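The filtering and ranking step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the input is taken to be a list of (word, USAS tag) pairs, the function name top_categories is hypothetical, and the sample tagging is invented for the example rather than taken from the tagged corpora.

```python
from collections import Counter

# Codes excluded in the study: grammatical bin, pronouns, discourse bin.
EXCLUDED_TAGS = {"Z5", "Z8", "Z4"}


def top_categories(tagged_words, n=20):
    """Return the n most frequent semantic tags, skipping excluded codes."""
    counts = Counter(
        tag for _word, tag in tagged_words if tag not in EXCLUDED_TAGS
    )
    return counts.most_common(n)


# Invented sample; a word tagged in two categories would appear twice.
tagged = [
    ("little", "N3.2"),
    ("lamb", "L2"),
    ("'s", "Z5"),
    ("he", "Z8"),
    ("happy", "E4.1"),
    ("merry", "E4.1"),
    ("oh", "Z4"),
]

for tag, freq in top_categories(tagged):
    print(tag, freq)
```

With the sample above, the three function-word entries are dropped and E4.1 is ranked first with two occurrences.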
When the most frequent semantic categories in SoI and SoE were
compared, it was seen that most words were related to emotions,
descriptions of nature, and religion. This is not a surprising finding since,
as with other Romantic poets, there is a special concern for passion,
emotion, and nature in Blake’s poetry. Blake focused on man’s emotions
and his interrelation with nature, which is also common in every Romantic
poem.32 Poetry expresses the poet’s own mind, imagination and emotion.
Therefore, Romantic poems take as their subject matter the experiences,
thoughts, and feelings of the poets who wrote them, not the actions of
other men. In order to understand Blake’s poetry better, a deeper analysis
focusing on the groups named in the research questions is
necessary.
The most frequently used words in SoI were related to the category
happy, which is classified under the group of emotions. When counting
the number of words used in each category, different syntactic functions
(parts of speech) and inflections (e.g. present or past tense conjugation, and
gender and number agreement) were also counted. For example, the
word merry was used 9 times as an adjective and 7 times as an adverb.
In SoI, the words related to happiness were happy,
merry/merrily, smile(s/d), laugh(ing/s), delight, cheerful, joys, rejoice and
relief, which were used 80 times in total. In contrast, words
with the meaning of being happy were used only 26 times in SoE. The
words used in SoE were delight, joy(s), happy, smile(d/s), and rejoicing.
When the words related to sadness were compared in SoI and SoE, it was
found that more words related to the category sad were used in SoE. The
                SoI   SoE
Sad              44    50
Happy            80    26
Fear/shock        2    21
Violent/angry     2    19
Calm              9     1
Bravery           -     4
                SoI   SoE
Taste            19     8
Sound            21    12
Touch             1     -
Sight            14    13
Smell             -     1
The table above shows the (positive) words used related to the sensory
systems. Related to the sense of taste, sweet was the common word used
in both SoI and SoE. It was used 19 times in SoI (sweet 18, sweeter 1)
and 8 times in SoE. However, when the concordance lines for sweet
were read by using AntConc software, it was found that the word was used
only once with the meaning of taste, in The Human Abstract in SoE (‘sweet
to eat’). In all other usages, it was used figuratively, such as sweet smile,
sweet love, sweet moans, sweet dreams, sweet flower, and sweet kiss.
When the words related to the sense of hearing were analyzed, it was also
seen that more words were used in SoI (21) than in SoE (12). These
frequencies include only positive words; when the negative words (silent
category) related to the sense of hearing were also counted, it was found
that there was a total of 30 words in SoI and 15 words in SoE. The words
and their frequencies are as follows: hear (9), hears (3), heard (3), sound (2),
noise (2), groan (1), echoed (1), louder (1), thunderings (1), silent (4),
quiet (1), hush (1), mute (1) in SoI; and heard (5), hear (4), hearing (1),
growl (1) and groaned (1) in SoE. Related to the sense of touch, only the
word stroke was used, once in SoI, and related to the sense of smell, only
the word smelling was used, once in SoE. The words related to the sense
of sight in SoI were see (4), seen (3), sees (1), saw (2), sight (2),
watching (1), watchman (1), unseen (1) and blind (1). See (6), saw (3),
sees (1), seen (1), behold (1), beheld (1), invisible (1) and blind (1) were
used in SoE.
Considering the importance of the senses in Blake, it is surprising that the
percentage of words used in the sensory system category is low. It is also
surprising that the lexical variety is low; generally, the same words
were repeated and inflected.
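Reading concordance lines, as was done for sweet in AntConc, means viewing each occurrence of a word together with its immediate context. The following Python sketch shows the general keyword-in-context idea only; it does not reproduce AntConc's behaviour. The function name concordance, the context width and the sample string (assembled from phrases quoted above, not a continuous passage from the poems) are assumptions for illustration.

```python
import re


def concordance(text, keyword, width=30):
    """Return keyword-in-context lines for whole-word matches of `keyword`."""
    text = " ".join(text.split())  # collapse line breaks and extra spaces
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword forms a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Invented sample built from phrases quoted in the analysis.
sample = "a sweet smile and sweet dreams, but fruit that is sweet to eat"

for line in concordance(sample, "sweet"):
    print(line)
```

Because the pattern matches whole words only, an inflected form such as sweeter would need its own search. Deciding whether each hit is literal or figurative still has to be done by reading the lines.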
                        SoI                            SoE
Past                    X                              used to (1)
Future                  shall (8), soon (1), ’ll (4),  shall (6), future (2), wilt (1),
                        will (2)                       soon (4), shalt (1)
Time: period            night (10), day (6),           night (17), nights (1), day (10),
                        morning (3), evening (1),      summer (6), winter (5),
                        year (3), Thursday (2),        morning (3), evening (2),
                        forever (1)                    may (1), hour (1), year (1),
                                                       Thursday (1)
Time: beginning         X                              eternal (2), endless (1)
Time: old, grown-up +   old (2), aged (1)              old (1), outworn (1), ancient (3)
Time: new, young (-)    little ones (2), babe (2),     youth (6), youthful (3),
                        young (2), new (1),            fresh (1), little ones (1),
                        infant (11), infants (1)       infant (3)
As seen in Table 7, when the time words used in SoI and SoE were
compared, it was found that no words related to the past or to beginning
were used in SoI. SoI talks about the pastoral world of childhood, a period
of being innocent and naïve, and of having naïve hopes and fears. Therefore,
it is not surprising that words under the category of time are mostly related
to the present and future rather than the past. Similarly, since SoE talks
about the experiences, corruption and repression of the adult world, words
related to the present, past and future were all used. To gain a better
understanding of the time concept, the corpora were also POS tagged.
Table 8 presents the results of POS tagging.
          SoI   SoE
Present   172   202
Past       91   136
Although no words in SoI were found in the past category when the
semantic fields of the words were examined, when the corpus was POS
tagged, it was found that the past tense was used 91 times. However, in
both SoI and SoE, the present tense was used more frequently, which is
consistent with the semantic field analysis. In SoE, the harsh experiences
of adult life, which destroy what is good in innocence, were explained; the
past was scrutinized and the current condition of the late 1700s was
criticized. The present is an extension of the past, and the present cannot
be lived without the past. Therefore, the whole collection of poems refers
both to the present and the past.
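Counting tenses from a POS-tagged corpus, as summarized in Table 8, can be sketched in Python as below. The tagger and tagset used in the study are not specified in this section, so the sketch assumes Penn Treebank verb tags (VBP and VBZ for present tense, VBD for past tense) purely for illustration. The function name count_tenses and the hand-tagged sample are likewise hypothetical.

```python
# Assumption: Penn Treebank verb tags; the study's own tagset may differ.
PRESENT_TAGS = {"VBP", "VBZ"}  # non-3rd-person and 3rd-person singular present
PAST_TAGS = {"VBD"}            # simple past


def count_tenses(tagged_tokens):
    """Count present- and past-tense verb forms in (word, tag) pairs."""
    counts = {"present": 0, "past": 0}
    for _word, tag in tagged_tokens:
        if tag in PRESENT_TAGS:
            counts["present"] += 1
        elif tag in PAST_TAGS:
            counts["past"] += 1
    return counts


# Hand-tagged toy input, not taken from the tagged corpora.
tagged = [
    ("I", "PRP"), ("see", "VBP"), ("the", "DT"), ("lamb", "NN"),
    ("that", "WDT"), ("played", "VBD"), ("and", "CC"), ("sings", "VBZ"),
]

print(count_tenses(tagged))
```

Participles, infinitives and modals are ignored in this sketch, so a study that counts them toward a tense would obtain different totals.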
Conclusion
As mentioned before, corpus stylistics is not a field that competes with
traditional literary studies but a field that can provide invaluable data for
literary and stylistic analysis by providing comparative information
through quantitative analysis and complementing the manual analysis with
new techniques. The purpose of this study was to find out the semantic
categories of words used in Songs of Innocence and Experience and to try
to understand how these categories can be linked to what has been said
about Blake’s poetry.
In the analysis, the lexical variety of the two collections was first
compared (T/t ratio). Poetry is a genre that is identified with high lexical
variety. However, it was found that neither SoI nor SoE is very rich in
terms of lexical variety. Nevertheless, when Blake’s style is taken into
consideration, this is not a surprising result. Blake is known for his
parallelism, repetition of simple words or phrases, and use of opposite
words.
When the semantic categories of the words used in SoI and SoE were
compared, it was found that the most frequent category was happy in SoI,
whereas the words related to sadness were more frequent in SoE. This is not
surprising since SoE talks about the corruption, repression and destruction
of innocence by modern society. When the words in the knowledge and
learning category were examined, it was found that a more varied vocabulary
with a higher frequency count was used in SoE than in SoI, with
more attention paid to wondering and searching. Getting more experienced and
searching for a better understanding of the world, unfortunately, ends with
the collapse of manhood. According to Blake, senses are important
because they are an inseparable part of the soul. Thus, words related to
sensory systems were also analyzed. It was surprising to see that the
percentage of words used in the sensory system category was low and not
varied, as the same words were repeated, and although the word sweet was
frequently used, in only one of the instances was it related to the sense of
taste. In all other instances, it was used figuratively. Even though it is not
uncommon to see figurative language in literature, it is surprising that not
many words were used related to the sensory systems in Blake’s poetry.
Similarly, although no words related to the past and the beginning were
used in SoI, the past tense was used 91 times. Consistent with the semantic
field analysis, both in SoI and SoE, the present tense was used more
frequently.
The current study focused on only four semantic categories (emotions;
thought, belief and knowledge; sensory systems; and time) in the Songs of
Innocence and Experience. Despite some surprises, the results of the study
were mostly as expected. However, to understand Blake and his poetry better
and to link literary studies with corpus studies, all semantic
categories in the collections should be analyzed in detail.
References
Anthony, Laurence. AntConc (Version 3.5.8) [Computer Software].
Tokyo, Japan: Waseda University, 2019. Available from
https://www.laurenceanthony.net/software.
Chollier, Christine. “Textual Semantics and Literature: Corpus, Texts,
Translation.” Signata [Online], 5 (2014):
http://journals.openedition.org/signata/461. DOI: 10.4000/signata.461
Culpeper, Jonathan. “Keyness: Words, Parts-of-speech and Semantic
Categories in the Character-talk of Shakespeare’s Romeo and Juliet.”
International Journal of Corpus Linguistics 14, no. 1 (2009): 29-59.
McIntyre, Dan and Walker, Brian. “How can corpora be used to explore the
language of poetry and drama?” Edited by McCarthy, Michael and O’Keeffe,
Anne. The Routledge Handbook of Corpus Linguistics. New York:
Routledge, (2010): 516-530.
Tutaş, Nazan. “William Blake’de Masumiyet ve Tecrübe: Kuzu ve
Kaplan.” Folklor/edebiyat, 20, no. 78, (2014/2): 83-90.
O’Halloran, Kieran. “Corpus-assisted Literary Evaluation.” Corpora, 2 no.
1 (2007): 33–63.
Oliveira, Iasmine S. “Robert Frost’s Poems: Some Light from Corpus
Analysis.” Revele, 7, (2014): 125-140.
Rayson, P. Wmatrix: a web-based corpus processing environment,
Computing Department, Lancaster University, 2009,
http://ucrel.lancs.ac.uk/wmatrix/
Römer, Ute. “Where the Computer Meets Language, Literature, and
Pedagogy: Corpus Analysis in English Studies.” Edited by Gerbig,
Andrea and Müller-Wood, Anja. How Globalization Affects the
Teaching of English: Studying Culture Through Texts. Lampeter: E.
Mellen Press, (2006): 81-109.
Scott, Mike and Tribble, Christopher. Textual patterns. Key words and
corpus analysis in language education. Amsterdam: John Benjamins,
2006.
Stubbs, Michael. “Conrad in the Computer: Examples of Quantitative
Stylistic Methods.” Language and Literature, 14 no. 1 (2005): 5–24.
DOI: 10.1177/0963947005048873.
Thomas, Dax. “Type Token Ratios in One Teacher’s Classroom Talk: An
Investigation of Lexical Complexity.” (2005), Retrieved 16 December
2014, from:
http://www.birmingham.ac.uk/Documents/collegeartslaw/cels/essays/languageteaching/DaxThomas2005a.pdf.
Viana, Vander, Fausto, Fabiana and Zyngier, Sonia. “Corpus Linguistics and
Literature: A Contrastive Analysis of Dan Brown and Machado de
Assis.” Edited by Zyngier, S., Viana, V. and Jandre, J. Textos e leituras:
Estudos empíricos de língua e literatura (Texts and Readings:
Empirical Studies of Language and Literature). Rio de Janeiro: Publit,
(2007): 233-256.
Williamson, Graham, (2014). Available from:
http://www.sltinfo.com/type-token-ratio.html
Williamson, Graham. (2009).
http://www.sltinfo.com/wp-content/uploads/2014/01/typetoken-ratio.pdf
Retrieved 3 August 2013.