Corpus Analysis of Business English
BETTINA FISCHER-STARCKE
International business has greatly increased over the past decades, and business transactions
are frequently carried out in English. As a result, the study of business English has become
an increasingly popular and important field of research with applications in teaching (e.g.,
Brown & Lewis, 2002) and consultancy (e.g., Baxter, Boswood, & Peirson-Smith, 2002). The
term “business English” usually refers to language used in a commercial setting, for example
by and with profit-oriented companies. The text types of the genre include both written
and spoken language, such as letters, emails, reports, and replies to customer enquiries.
For research purposes, corpora of business language, in particular business English by
native and non-native speakers of the language, have been built. Some of these are publicly
available, for example:
In addition, Nelson (2004) provides access to data generated from his Business English
Corpus (http://users.utu.fi/micnel/business_english_lexis_site.htm). Moreover, the Cambridge
and Nottingham Business English Corpus (CANBEC) (http://www.cambridge.org/elt/
• terminology, for example brand image, GDP (Mascull, 2004), direct mail, and sponsorship
(Brook-Hart, 2006);
• phrases to be used in a business context, for example in presentations, in reacting to
follow-up questions (Harding & Taylor, 2005), and in describing survey results (Farrall
& Lindsley, 2008);
• general vocabulary, for example phrasal verbs, here called “multi-part verbs,” such
as point out and call on (Trappe & Tullis, 2005, p. 36) and phrases for expressing
disagreement (Powell, 2009).
These words and phrases reflect the authors’ intuitive perceptions of characteristic linguistic
features of the genre of business English, since, with the exception of Mascull (2004), none
of the textbooks reviewed for this entry refers to corpus data.
The textbooks mirror a trend in research on business English toward focusing on lexis
and phraseology and putting less emphasis on usage and pragmatics. The categories of
lexis and phraseology taught include
1. general business vocabulary, that is, words or phrases that are used both in business
and in everyday situations, such as paid or a phrase asking for clarification;
2. core business vocabulary, that is, terminology such as investment;
3. topic-specific vocabulary, that is, words and phrases that are specific to a particular
business context, such as hedging transaction risk, a term relating to the stock market.
The focus on characteristic lexis and phraseology of business English in teaching materials
reflects insights gained in previous linguistic research that some words and phrases are
more characteristic of one genre or register than of another: see for example Coxhead (2000,
2002) on characteristic lexis of academic English, and Biber, Conrad and Cortes (2004) on
lexicalized phrases of academic English.
Consulting a corpus for the design of teaching materials gives a grounding on which to
make a decision about which words and phrases are particularly useful for students to
learn, and should therefore be included in the teaching materials. Such a decision is based on
authentic data used in a business context, as opposed to an author’s intuitions about the register.
Unlike intuition, corpora allow us to identify those words and phrases which really are
most characteristic of the register as objectively as possible. On the corpus linguistic
principle that equates frequency with importance in language (Sinclair, 1991), the most frequent
words and phrases of a register, as represented by the corpus, are perceived to be its most
characteristic. They are therefore necessary features for the students to learn. Also, the
principle of language teaching that frequent items should be taught before infrequent ones
(West, 1953; Nation, 2001) supports a choice of words and phrases on the basis of their
frequencies in a corpus for teaching purposes. The following analysis therefore demonstrates
one way of identifying these items and of generalizing from them appropriate teaching
content for business English courses for non-native speakers of English. Because of
constraints on space, however, their conversion into actual teaching materials is not discussed
in this entry. For the same reason, the focus of the analysis is on lexis and phraseology
with the pragmatics and usage of the lexis and phrases being only minor considerations.
This also reflects the main trends of research in business English.
The analysis uses the Wolverhampton Corpus of Business English (henceforth WCBE),
since its compilation ensures that it gives an overview of the language used in written
international business transactions by speakers of different mother tongues. This mirrors
the majority of situations for which non-native English-speaking students of business
English classes require English in their future working lives, that is, writing documents in
English which are read by native and non-native speakers of English. Spoken interactions
in English are likely to be less frequent in their future professional environments. The
software used for the analysis is WordSmith Tools (Scott, 2007) and kfNgram (Fletcher,
2002).
Using such data and software guarantees that the results gained from the analysis
are as independent of the researcher’s intuitions as possible, and ensures that teaching
materials reflect the real language usage in business interactions as opposed to how authors
of teaching materials think it is used. Consequently, the findings on relevant units of
teaching gained in the following analysis differ in part from the foci of current textbooks.
One way of identifying characteristic words and phrases of business English to be
included in teaching materials is to extract quantitative key words and the most frequent
phrases of the corpus. Quantitative key words are words which are statistically more
salient in a corpus than in a reference corpus. Key words are identified on the basis of
their relative frequencies in comparison to another set of data, as opposed to their raw
frequencies. Phrases below are (a) uninterrupted phrases of n words, called n-grams, or
(b) phrases of p words that are variable in one slot, called p-frames. The phrases are
identified on the basis of their frequencies of occurrence without any semantic or pragmatic
criteria. The key words are identified by comparing the WCBE with the British National
Corpus (BNC), a 100-million-word corpus of general English. This comparison identifies
those words which are characteristic of business English as compared to general English.
The classification of a word as business-specific was, in the first step of the analysis, an
intuitive process. In the second step, concordance lines of the words in question were
analyzed and those words which occurred in a business context in at least 50% of the
concordance lines were retained on the list.
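To make the notion of a quantitative key word concrete, the following is a minimal sketch of a keyness calculation. It assumes Dunning's log-likelihood statistic, which is one commonly used keyness measure; this entry does not state which statistic or settings were used in WordSmith Tools. The function names, the invented toy token lists, and the threshold of 3.84 (the chi-square critical value for p < 0.05 at one degree of freedom) are illustrative assumptions, not a reproduction of the analysis reported here.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood for a word occurring a times in a study corpus of
    c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_words(study: list[str], reference: list[str], threshold: float = 3.84):
    """Return (word, score) pairs for words that are relatively more
    frequent in the study corpus, sorted by descending score."""
    s_counts, r_counts = Counter(study), Counter(reference)
    c, d = len(study), len(reference)
    scored = []
    for word, a in s_counts.items():
        b = r_counts.get(word, 0)
        # Keep only positive key words: higher relative frequency in the study corpus.
        if a / c > b / d:
            score = log_likelihood(a, b, c, d)
            if score >= threshold:  # assumed cut-off, see lead-in
                scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy data; real corpora would be tokenized files such as the WCBE and the BNC.
study = "the fund may invest in shares the fund shall issue shares".split()
reference = "the cat sat on the mat and the dog sat in the sun".split()
for word, score in key_words(study, reference, threshold=0.0):
    print(f"{word}\t{score:.2f}")
```

Because the comparison rests on relative rather than raw frequencies, a very frequent word such as the does not become a key word here: it is proportionally more frequent in the toy reference data than in the toy study data.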
The identification of both key words and the most frequent phrases in the corpus is
based on the automatic and quantitative analysis of large amounts of text to (a) compare
the lexis of two corpora with each other and (b) identify recurrent strings of words on the
basis of their frequencies. Neither can be identified intuitively: the amount of data is too
large to detect the patterns without the help of software. Conversely, however, the large
corpus allows us to analyze a sample of the register that is as representative as possible.
The larger the representative corpus, the more likely it is that it features constituent
linguistic patterns of the register with statistically significant frequency. It is these constituent
patterns that students of the register need to learn, since only by knowing and actively
using these are students able to produce discourse that conforms to the expectations of
the register’s discourse community.
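The identification of recurrent strings described above can be sketched in a few lines. The example below counts n-grams and then groups n-grams that differ in exactly one slot into p-frames. It is a minimal illustration under stated assumptions, not kfNgram's implementation: the function names, the min_variants cut-off, and the toy sentence are hypothetical.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count every uninterrupted sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def pframes(ngram_counts: Counter, min_variants: int = 2) -> dict:
    """Group n-grams that are identical except for one slot.

    Each frame, e.g. ('the', '*', 'of', 'the'), maps to a Counter of the
    words that fill its variable slot, weighted by n-gram frequency.
    """
    frames: dict = {}
    for gram, freq in ngram_counts.items():
        for slot in range(len(gram)):
            frame = gram[:slot] + ("*",) + gram[slot + 1:]
            frames.setdefault(frame, Counter())[gram[slot]] += freq
    # Keep only frames whose slot is filled by several different words (assumed cut-off).
    return {f: fillers for f, fillers in frames.items() if len(fillers) >= min_variants}


# Invented toy text; a real analysis would run over the whole corpus.
tokens = ("the value of the fund and the end of the year "
          "and the value of the shares").split()
four_grams = ngrams(tokens, 4)
frames = pframes(four_grams)
print(frames[("the", "*", "of", "the")].most_common())
```

No semantic or pragmatic criteria enter the procedure: phrases are ranked purely by how often they recur, which is why the classification into business-specific and general English phrases has to follow as a separate step.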
The key word analysis of the WCBE when compared with the BNC identified more than
1,500 key words. This indicates large lexical differences between the two corpora and
therefore between business and general English. For practical reasons, I will look at the
500 statistically most significant key words of the list.
Of these 500 key words, 305 are business-specific and 195 are general English words.
The business-specific words are mostly terminology, frequently nouns, which identify
business concepts, for example investment, securities, shares, and commission. Verbs include
paid, investing, purchased, and issued. Some business-specific words, such as paid, refer to a
business transaction, but have entered general English because of their very frequent usage
in and reference to everyday transactions.
The findings on (a) the high number of key words identified in this analysis and (b) the
high number of nouns on the list allow us to draw two conclusions that are relevant for
the design of teaching materials:
Phrase type   General English   Business-specific
3-grams       25                25
3-frames      37                13
4-grams       16                34
4-frames      19                31
5-grams       8                 42
5-frames      5                 45
6-grams       5                 45
6-frames      1                 9
analysis. Examples for both types of phrases are a * of the, a general English 4-frame, and
the market value of the, a business-specific 5-gram. Like the key words, business-specific
phrases were identified in a two-part process by (a) an intuitive classification and (b) an
analysis of the phrases’ concordance lines.
The numbers above and the items on the lists suggest three main patterns relevant for
teaching:
1. The longer the phrases are, the more business-specific they are. This confirms findings
from other phraseological analyses (e.g., Starcke, 2008) which have shown that shorter
phrases are open to more lexical combinations than longer ones. Because of the greater
number of lexical and grammatical items included in the phrases, longer phrases are
more restricted in their lexical and grammatical environments than shorter ones.
2. Many of the phrases that are categorized as business-specific are extensions of shorter
phrases classified as general English. This is the case, for example, with class a * which
becomes * class a shares. Changing the length of a phrase might therefore also change whether
it is classified as characteristic of a particular register, possibly because it then includes
lexis that might have a different meaning in another register, for example share.
3. Business terminology, for instance securities and asset, is a determining factor for
classifying a phrase as business-specific, for example by investing primarily in, which is
rendered business-specific by the occurrence of the term investing. The classification
of a phrase might therefore depend on the occurrence of specific lexis, possibly of one
word only. This indicates that the lexis and phraseology of the register are
interdependent. In fact, those words which render phrases business-specific are frequently
identified as key words in the key word analysis presented above.
On a more general level, the phraseological analysis of the WCBE has identified three
types of phrases. One type is what I call terminological phrases, that is, phrases which express
a particular concept and have to be learned as one unit. An example is the securities act,
which specifies a particular law. These phrases are complementary units of both lexis and
content, in this case the regulations set out in the securities act. Language and content
cannot be separated.
The second type is fixed phrases, n-grams, which are not business-specific, for example
in connection with, which might characterize a formal written style.
The third type is both general and business English p-frames which are more compositional
in nature in that they are flexible in one slot, for example the * of the and net asset
value *. In the WCBE, the most frequent realizations of the * in the first phrase are value,
end, date, terms, and time; in the second phrase they are at, the, and, fund’s, and in. The
resulting units are combinations of the phrase and the fillers of its variable slot, which are,
in fact, the phrase’s collocates and paradigmatic variables. Being able to combine them allows
learners of the register to produce idiomatic language.
All three types of phrases are characteristic features of the register, and it is the
co-occurrence of the characteristic phrases and lexis in the discourse which transforms general
English into business English and also allows the two to co-occur. Reflecting this is a main
concern of business English textbooks.
Taking the lemma SHARE*, that is, the node word share with every possible realization
of the * such as shared and sharing, as one example of a practical application of these
findings, Table 2 shows its numbers of occurrences as part of the n-grams and p-frames
and its co-occurrences with the node word class. Both lexical items are also identified as
key words in the corpus.
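Counts of this kind can be obtained by matching the wildcard pattern against each phrase on the frequency lists. The sketch below is one possible way of doing so; the regular expression, the function name, and the short phrase list (assembled from examples quoted in this entry) are assumptions for illustration and do not reproduce the figures reported for the WCBE.

```python
import re

# SHARE*: the node word share followed by any word characters (share, shares, shared, ...).
SHARE = re.compile(r"^share\w*$")


def count_share_phrases(phrases: list[str]) -> tuple[int, int]:
    """Return (phrases containing SHARE*, phrases containing SHARE* and class)."""
    with_share = with_class = 0
    for phrase in phrases:
        words = phrase.lower().split()
        if any(SHARE.match(w) for w in words):
            with_share += 1
            if "class" in words:
                with_class += 1
    return with_share, with_class


# Illustrative phrase list only, not the corpus's actual frequency list.
phrases = [
    "purchase of class a shares",
    "the net asset value per share",
    "if shares are redeemed",
    "in connection with",
    "the market value of the",
]
print(count_share_phrases(phrases))
```

A purely formal wildcard of this kind also matches words such as shareholder, so the matches would still need to be checked manually against concordance lines.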
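Table 2 Occurrences of SHARE*
Phrase length   SHARE* in n-grams   SHARE* in p-frames   SHARE* and class in n-grams   SHARE* and class in p-frames
3               6                   4                    4                             1
4               12                  16                   11                            14
5               10                  16                   9                             14
6               9                   3                    8                             3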
The high number of occurrences of SHARE* as part of the corpus’s most frequent phrases
and the high number of co-occurrences of SHARE* and class indicate (a) the importance
of the vocabulary of shares and the share market in general within the register, and (b)
that shares are frequently defined and identified, for example by mentioning their class.
This defining of shares often follows two grammatical patterns. First, in 61 of its 76
occurrences, SHARE* stands to the right of a determining or defining word or phrase, for
example the class * share and the net asset value per share. Second, SHARE* is used in passive
constructions in seven instances, for example shares are subject to * and if shares are redeemed,
and in a further four instances SHARE* is the object of an action, for example purchase of
class a shares. Active voice constructions or phrases in which SHARE* functions as agent,
for example a description of a share’s performance, do not occur among the corpus’s most
frequent phrases and therefore seem to be less frequent in language usage than passive
constructions. These grammatical patterns could be reflected in teaching materials.
The pragmatics of business English, for example politeness and conventions for small
talk in a business context, are more difficult to identify in a frequency-based corpus
analysis than the features discussed above. Nevertheless, the occurrence of the modal verbs
may and shall in the list of key words might indicate the salience of politeness and
therefore of pragmatics in the genre.
An analysis of their concordance lines, however, shows that shall mainly occurs in
legal contexts such as specifying requirements, rights, and obligations. Its most frequent
collocates at L1 (immediate left) are fund, and, committee, board, and plan and at R1
(immediate right) are be, not, have, include, and become. This stresses the importance for
business English of legal English, which is not featured in many business English textbooks.
May, on the other hand, is mainly used as a marker of politeness with fund, you, which,
and, and it as its most frequent L1s and be, also, not, invest, and have as its most frequent
R1s. This is a possible starting point for the discussion of the linguistic means and the
significance of politeness in business discourse and therefore for teaching pragmatic
conventions in business English.
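The L1 and R1 collocates reported above are simply the words found immediately to the left and right of the node word, ranked by frequency. A minimal sketch of that count follows; the function name and the toy sentence are invented for illustration, and a concordancer such as WordSmith Tools offers this as built-in functionality.

```python
from collections import Counter


def l1_r1_collocates(tokens: list[str], node: str, top: int = 5):
    """Return the most frequent immediate-left (L1) and immediate-right (R1)
    neighbours of node as two lists of (word, frequency) pairs."""
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        if i > 0:
            left[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            right[tokens[i + 1]] += 1
    return left.most_common(top), right.most_common(top)


# Invented toy text, not taken from the WCBE.
tokens = ("the fund may invest in bonds and the fund may also be "
          "liquidated while you may not redeem").split()
l1, r1 = l1_r1_collocates(tokens, "may")
print("L1:", l1)
print("R1:", r1)
```

Frequency lists of this kind only point to candidate patterns; whether a given use of may or shall marks politeness or a legal obligation still has to be decided by reading the concordance lines.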
Using frequency criteria based on a corpus linguistic analysis for the selection of units
of teaching in business English classes may suggest a shift in focus of currently used
teaching materials. The corpus linguistic evidence suggests that the strong emphasis on
lexis in teaching materials might be shifted to include both lexis and phraseology as well
as a stronger focus on the pragmatics of politeness in business English and the salience of
legal English. This includes lexis and phraseology which are both business-specific, that
is, business terminology and terminological phrases, and general English. Only by teaching
and, from the students’ perspective, learning these various aspects of the genre can the
learners become competent members of the discourse community of business English.
References
Baxter, R., Boswood, T., & Peirson-Smith, A. (2002). An ESP program for management in horse-
racing business. In T. Orr (Ed.), English for specific purposes (pp. 117–46). Alexandria, VA:
TESOL.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at . . . : Lexical bundles in university
teaching and textbooks. Applied Linguistics, 25(3), 371–405.
Brook-Hart, G. (2006). Business benchmark: Upper-intermediate. Cambridge, England: Cambridge
University Press.
Brown, T. P., & Lewis, M. (2002). An ESP project: Analysis of an authentic workplace
conversation. English for Specific Purposes, 22(1), 93–8.
Collins, H., & Scott, M. (1997). Lexical landscaping in business meetings. In F. Bargiela-Chiappini
& S. Harris (Eds.), The languages of business: An international perspective (pp. 183–208).
Edinburgh, Scotland: Edinburgh University Press.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–38.
Coxhead, A. (2002). The academic word list: A corpus-based word list for academic purposes.
In B. Kettemann & G. Marko (Eds.), Teaching and learning by doing corpus analysis: Proceedings
of the fourth international conference on teaching and language corpora, Graz 19–24 July, 2000
(pp. 73–89). Amsterdam, Netherlands: Rodopi.
Farrall, C., & Lindsley, M. (2008). Professional English in use: Marketing. Cambridge, England:
Cambridge University Press.
Fletcher, W. H. (2002). kfNgram. Retrieved July 29, 2009 from
http://www.kwicfinder.com/kfNgram/kfNgramHelp.html
Graff, D., & Wu, Z. (1995). Japanese business news text. Philadelphia, PA: Linguistic Data
Consortium.
Harding, K., & Taylor, L. (2005). International express: Intermediate student’s book. Oxford, England:
Oxford University Press.
Lee, D. (2009). Bookmarks for corpus-based linguistics. Retrieved July 29, 2009 from
http://personal.cityu.edu.hk/~davidlee/devotedtocorpora/CBLLinks.htm
Mascull, B. (2004). Business vocabulary in use: Advanced. Cambridge, England: Cambridge
University Press.
Nation, P. (2001). Using small corpora to investigate learner needs: Two vocabulary research
tools. In M. Ghadessy, A. Henry, & R. L. Rosenberry (Eds.), Small corpus studies and ELT:
Theory and practice (pp. 31–45). Amsterdam, Netherlands: John Benjamins.
Nelson, M. (2004). Mike Nelson’s business English lexis site. Retrieved July 29, 2009 from
http://users.utu.fi/micnel/business_english_lexis_site.htm
Powell, M. (2009). In company: Intermediate student’s book (2nd ed.). Oxford, England: Macmillan.
Sardinha, T. B., & Barbara, L. (2008). Cultural stereotype and modality: A study into modal
use in Brazilian and Portuguese meetings. In G. Forey & G. Thompson (Eds.), Text type and
texture: In honour of Flo Davies (pp. 232–49). London, England: Equinox.
Scott, M. (2007). WordSmith Tools. 4.0. Oxford, England: Oxford University Press.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford, England: Oxford University Press.
Someya, Y. (2007a). Business letter corpus. Retrieved July 29, 2009 from
http://www.someya-net.com/concordancer/index.html
Someya, Y. (2007b). Learner business letter corpus. Retrieved July 29, 2009 from
http://www.someya-net.com/concordancer/index.html
Starcke, B. (2008). I don’t know: Differences in patterns of collocation and semantic prosody in
phrases of different lengths. In A. Gerbig & O. Mason (Eds.), Language, people and numbers:
Corpus linguistics and society (pp. 199–216). Amsterdam, Netherlands: Rodopi.
Trappe, T., & Tullis, G. (2005). Intelligent business coursebook: Intermediate business English. Harlow,
England: Longman.
West, M. (1953). A general service list of English words. London, England: Longman, Green & Co.
Suggested Readings