Corpus Analysis of Business English

BETTINA FISCHER-STARCKE
International business has greatly increased over the past decades, and business transactions
are frequently carried out in English. As a result, the study of business English has become
an increasingly popular and important field of research with applications in teaching (e.g.,
Brown & Lewis, 2002) and consultancy (e.g., Baxter, Boswood, & Peirson-Smith, 2002). The
term “business English” usually refers to language used in a commercial setting, for example
by and with profit-oriented companies. The text types of the genre include both written
and spoken language, such as letters, emails, reports, and replies to customer enquiries.
For research purposes, corpora of business language, in particular business English by
native and non-native speakers of the language, have been built. Some of these are publicly
available, for example:
• Business Letter Corpus (Someya, 2007a; http://www.someya-net.com/concordancer/index.html)
and Learner Business Letter Corpus (Someya, 2007b; http://www.someya-net.com/concordancer/index.html);
according to Lee (2009), the Business Letter Corpus
consists of “1,020,060 word tokens of U.S. & U.K. samples, as of 1 March 2000. Also
searchable (separately): a non-native English corpus of business letters written by
Japanese business people,” while the Learner Business Letter Corpus consists of “209,461
word tokens in 1,464 letters written by Japanese business people.”
• Enron Corpus (http://www-2.cs.cmu.edu/~enron/), which contains about 500,000
emails by about 150 users, mostly senior management of Enron. The dataset was made
publicly available during the US government investigation of Enron’s business practices.
• Hong Kong Financial Services Corpus (http://langbank.engl.polyu.edu.hk/HKFSC/).
In July 2009, this contained 6,727,791 words of texts collected from the Hong Kong
financial services sector.
• Japanese Business News Text (Graff & Wu, 1995; http://www.ldc.upenn.edu/Catalog/
CatalogEntry.jsp?catalogId=LDC95T8), a corpus of Japanese-language business and
financial news. Approximately 30 million words are taken from the morning edition
of Nihon Keizai Shimbun, the largest Japanese financial news daily newspaper; a smaller
part of the corpus comes from the Japanese edition of the financial newswire Dow Jones
Telerate.
• VOICE corpus (http://voice.univie.ac.at), the Vienna–Oxford International Corpus of
English, which represents English as a lingua franca (ELF). Two of the domains included
in the corpus, namely professional business (203,407 tokens) and professional organizational
(354,545 tokens), are ELF business English. Data belonging to these categories can be
extracted and searched separately from the rest of the corpus.
• Wolverhampton Corpus of Business English (http://www.elda.fr/cata/text/W0028.
html), which consists of about 10.2 million words of written business texts from the
World Wide Web collected within six months from 1999 to 2000. The texts come from
23 different Web sites of native and non-native English-speaking countries.
In addition, Nelson (2004) provides access to data generated from his Business English
Corpus (http://users.utu.fi/micnel/business_english_lexis_site.htm). Moreover, the Cambridge
and Nottingham Business English Corpus (CANBEC)
(http://www.cambridge.org/elt/corpus/corpora_canbec.htm) consists of one million words of
spoken data from native and non-native speakers of English mostly in the UK. This corpus,
however, is owned by Cambridge University Press and is not publicly available.
The Encyclopedia of Applied Linguistics, Edited by Carol A. Chapelle.
© 2013 Blackwell Publishing Ltd. Published 2013 by Blackwell Publishing Ltd.
DOI: 10.1002/9781405198431.wbeal0241
This relative scarcity of corpora is due to companies’ confidentiality and privacy concerns
with regard to their correspondence or communications. Legal issues such as copyright,
trade secrets, and infringements of personal rights make companies understandably reluc-
tant to permit public access to their documents. Consequently, a number of linguists have
compiled private business language corpora for which they were individually granted
permission by the relevant companies, but which are not accessible to other researchers.
The list of corpora above reflects a bias toward written language, which is also reflected
in research on business English. Moreover, corpus linguistic research often emphasizes
lexis and phraseology, with pragmatics and usage being analyzed less frequently. Exceptions
are Connor and Upton (2004) and Henry and Roseberry (2001). Corpus linguistic studies
of business English have been conducted by, for example, Collins and Scott (1997) and
Sardinha and Barbara (2008) (see Suggested Readings for further studies).
A major practical application of research into business language is the design of teach-
ing materials. Since English is often the language of international business, it is frequently
used by non-native speakers in professional interactions. Consequently, teaching business
English to non-native English-speaking business students to prepare them for their future
careers is an essential part of their education. The textbooks used in business English
classes usually focus on both business-specific and non-specific language features such as
• terminology, for example brand image, GDP (Mascull, 2004), direct mail, and sponsorship
(Brook-Hart, 2006);
• phrases to be used in a business context, for example in presentations, in reacting to
follow-up questions (Harding & Taylor, 2005), and in describing survey results (Farrall
& Lindsley, 2008);
• general vocabulary, for example phrasal verbs, here called “multi-part verbs,” such
as point out and call on (Trappe & Tullis, 2005, p. 36) and phrases for expressing dis-
agreement (Powell, 2009).
These words and phrases reflect the authors’ intuitive perceptions of characteristic linguistic
features of the genre of business English, since, with the exception of Mascull (2004), none
of the textbooks reviewed for this entry refers to corpus data.
The textbooks mirror a trend in research on business English toward focusing on lexis
and phraseology and putting less emphasis on usage and pragmatics. The categories of
lexis and phraseology taught include
1. general business vocabulary, that is, words or phrases that are used both in business
and in everyday situations, such as paid or a phrase asking for clarification;
2. core business vocabulary, that is, terminology such as investment;
3. topic-specific vocabulary, that is, words and phrases that are specific to a particular
business context, such as hedging transaction risk, a term relating to the stock market.
The focus on characteristic lexis and phraseology of business English in teaching materials
reflects insights gained in previous linguistic research that some words and phrases are
more characteristic of one genre or register than of another: see for example Coxhead (2000,
2002) on characteristic lexis of academic English, and Biber, Conrad and Cortes (2004) on
lexicalized phrases of academic English.
Consulting a corpus for the design of teaching materials gives a grounding on which to
make a decision about which words and phrases are particularly useful for students to
learn, and should therefore be included in the teaching materials. This is based on authentic
data used in a business context, as opposed to an author’s intuitions about the register.
Unlike intuition, corpora allow us to identify those words and phrases which really are
most characteristic of the register as objectively as possible. On the corpus linguistic prin-
ciple that equates frequency with importance in language (Sinclair, 1991), the most frequent
words and phrases of a register, as represented by the corpus, are perceived to be its most
characteristic. They are therefore necessary features for the students to learn. Also, the
principle of language teaching that frequent items should be taught before infrequent ones
(West, 1953; Nation, 2001) supports a choice of words and phrases on the basis of their
frequencies in a corpus for teaching purposes. The following analysis therefore demonstrates
one way of identifying these items and of generalizing from them appropriate teaching
content for business English courses for non-native speakers of English. Because of con-
straints on space, however, their conversion into actual teaching materials is not discussed
in this entry. For the same reason, the focus of the analysis is on lexis and phraseology
with the pragmatics and usage of the lexis and phrases being only minor considerations.
This also reflects the main trends of research in business English.
The analysis uses the Wolverhampton Corpus of Business English (henceforth WCBE),
since its compilation ensures that it gives an overview of the language used in written
international business transactions by speakers of different mother tongues. This mirrors
the majority of situations for which non-native English-speaking students of business
English classes require English in their future working lives, that is, writing documents in
English which are read by native and non-native speakers of English. Spoken interactions
in English are likely to be less frequent in their future professional environments. The
software used for the analysis is WordSmith Tools (Scott, 2007) and kfNgram (Fletcher,
2002).
Using such data and software guarantees that the results gained from the analysis
are as independent of the researcher’s intuitions as possible, and ensures that teaching
materials reflect the real language usage in business interactions as opposed to how authors
of teaching materials think it is used. Consequently, the findings on relevant units of
teaching gained in the following analysis differ in part from the foci of current textbooks.
One way of identifying characteristic words and phrases of business English to be
included in teaching materials is to extract quantitative key words and the most frequent
phrases of the corpus. Quantitative key words are words which are statistically more
salient in a corpus than in a reference corpus. Key words are identified on the basis of
their relative frequencies in comparison to another set of data, as opposed to their raw
frequencies. Phrases below are (a) uninterrupted phrases of n words, called n-grams, or
(b) phrases of p words that are variable in one slot, called p-frames. The phrases are iden-
tified on the basis of their frequencies of occurrence without any semantic or pragmatic
criteria. The key words are identified by comparing the WCBE with the British National
Corpus (BNC), a 100-million-word corpus of general English. This comparison identifies
those words which are characteristic of business English as compared to general English.
The classification of a word as business-specific was, in the first step of the analysis, an
intuitive process. In the second step, concordance lines of the words in question were
analyzed and those words which occurred in a business context in at least 50% of the
concordance lines were retained on the list.
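The key word comparison described above can be illustrated with a short sketch. The entry does not state which statistic was used, so the code assumes the log-likelihood (G2) measure that is common in key word analysis. The function names, the toy token lists, and the cut-off of 3.84 (the chi-square critical value for p < 0.05 at one degree of freedom) are illustrative assumptions, not the settings applied to the WCBE and the BNC.

```python
import math
from collections import Counter


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) of one word's frequency in two corpora."""
    total = size_study + size_ref
    combined = freq_study + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    if freq_study > 0:
        g2 += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * g2


def key_words(study_tokens, ref_tokens, min_g2=3.84):
    """Words that are relatively more frequent in the study corpus.

    min_g2 is an assumed cut-off, not a value taken from the entry.
    """
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    results = []
    for word, freq in study.items():
        g2 = log_likelihood(freq, n_study, ref[word], n_ref)
        # Keep positive key words only: higher relative frequency in the study corpus.
        if g2 >= min_g2 and freq / n_study > ref[word] / n_ref:
            results.append((word, g2))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data standing in for a business corpus and a general reference corpus.
business = "the fund shares the fund shares investment".split() * 10
general = "the cat sat on the mat".split() * 10
for word, score in key_words(business, general):
    print(word, round(score, 2))
```

Because the comparison uses relative rather than raw frequencies, a very frequent word such as the is not returned when it is no more common in the study corpus than in the reference corpus.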
The identification of both key words and the most frequent phrases in the corpus is
based on the automatic and quantitative analysis of large amounts of text to (a) compare
the lexis of two corpora with each other and (b) identify recurrent strings of words on the
basis of their frequencies. Neither can be identified intuitively: the amount of data is too
large to detect the patterns without the help of software. Conversely, however, the large
corpus allows us to analyze a sample of the register that is as representative as possible.
The larger the representative corpus, the more likely it is that it features constituent lin-
guistic patterns of the register with statistically significant frequency. It is these constituent
patterns that students of the register need to learn, since only by knowing and actively
using these are students able to produce discourse that conforms to the expectations of
the register’s discourse community.
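A minimal sketch of how recurrent n-grams and p-frames can be counted from a token list follows. It is not the kfNgram implementation. It assumes that the variable slot of a p-frame may fall in any position, including the first or last, as phrases cited in this entry such as * class a shares and net asset value * suggest. All function names and the toy sentence are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every uninterrupted sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def pframes(tokens, p):
    """Count phrases of p tokens with one variable slot, written as '*'.

    The frequency of a frame is the summed frequency of all n-grams
    that differ only in the slot position (an assumption of this sketch).
    """
    frames = Counter()
    for gram, count in ngrams(tokens, p).items():
        for slot in range(p):
            frame = gram[:slot] + ("*",) + gram[slot + 1:]
            frames[frame] += count
    return frames


def frequent(counter, min_freq):
    """Keep items at or above a frequency threshold, most frequent first."""
    return [(" ".join(item), c) for item, c in counter.most_common() if c >= min_freq]


tokens = "the value of the fund and the end of the year".split()
print(frequent(ngrams(tokens, 2), min_freq=2))
print(frequent(pframes(tokens, 4), min_freq=2))
```

On a corpus of the size of the WCBE, the same threshold argument would play the role of the minimum frequency of 200 used in this entry.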
The key word analysis of the WCBE when compared with the BNC identified more than
1,500 key words. This indicates large lexical differences between the two corpora and
therefore between business and general English. For practical reasons, I will look at the
500 statistically most significant key words of the list.
Of these 500 key words, 305 are business-specific and 195 are general English words.
The business-specific words are mostly terminology, frequently nouns, which identify
business concepts, for example investment, securities, shares, and commission. Verbs include
paid, investing, purchased, and issued. Some business-specific words, such as paid, refer to a
business transaction, but have entered general English because of their very frequent usage
in and reference to everyday transactions.
The findings on (a) the high number of key words identified in this analysis and (b) the
high number of nouns on the list allow us to draw two conclusions that are relevant for
the design of teaching materials:
1. Terminology is a characteristic feature of the register and includes terms such as
portfolio and stock.
2. The register in its written form, as represented by the corpus, is mainly characterized
by nouns as opposed to verbs. This reflects the tendency toward a nominal style in
written general English.
Both findings mirror the contents of current textbooks.

As a second step in the analysis, the most frequent continuous and discontinuous phrases
of between three and six words in the WCBE are extracted. For reasons of space and
relevance, the following analysis is restricted to the top 50 n-grams and p-frames of each
length. In teaching materials, however, the numbers of phrases taught could be much
larger.
Table 1 lists the numbers of general English and business-specific phrases that occur
among the top 50 phrases of the respective lengths. There are only 10 6-frames in the
corpus that occur with a minimum frequency of 200, the threshold level used in this
analysis. Examples of both types of phrases are a * of the, a general English 4-frame, and
the market value of the, a business-specific 5-gram. Like the key words, business-specific
phrases were identified in a two-part process by (a) an intuitive classification and (b) an
analysis of the phrases’ concordance lines.
Table 1 General English and business-specific phrases among the top 50
            General English   Business-specific
3-grams     25                25
3-frames    37                13
4-grams     16                34
4-frames    19                31
5-grams     8                 42
5-frames    5                 45
6-grams     5                 45
6-frames    1                 9
The numbers above and the items on the lists suggest three main patterns relevant for
teaching:
1. The longer the phrases are, the more business-specific they are. This confirms findings
from other phraseological analyses (e.g., Starcke, 2008) which have shown that shorter
phrases are open to more lexical combinations than longer ones. Because of the greater
number of lexical and grammatical items included in the phrases, longer phrases are
more restricted in their lexical and grammatical environments than shorter ones.
2. Many of the phrases that are categorized as business-specific are extensions of shorter
phrases classified as general English. This is the case, for example, with class a * which
becomes * class a shares. Changing the length of a phrase might also change its clas-
sification of being or not being characteristic of a particular register, possibly based
on the occurrence of lexis that might have a different meaning in another register, for
example share.
3. Business terminology, for instance securities and asset, is a determining factor for clas-
sifying a phrase as business-specific, for example by investing primarily in, which is
rendered business-specific by the occurrence of the term investing. The classification
of a phrase might therefore depend on the occurrence of specific lexis, possibly of one
word only. This indicates that the lexis and phraseology of the register are inter-
dependent. In fact, those words which render phrases business-specific are frequently
identified as key words in the key word analysis presented above.
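The first pattern can be checked against the counts in Table 1 by converting them to proportions. The sketch below only restates the table's figures; the variable and function names are illustrative.

```python
# (general English, business-specific) counts copied from Table 1.
table1 = {
    "3-grams": (25, 25),
    "3-frames": (37, 13),
    "4-grams": (16, 34),
    "4-frames": (19, 31),
    "5-grams": (8, 42),
    "5-frames": (5, 45),
    "6-grams": (5, 45),
    "6-frames": (1, 9),  # only 10 6-frames reach the frequency threshold
}


def business_share(general, business):
    """Proportion of the listed phrases classified as business-specific."""
    return business / (general + business)


shares = {label: business_share(*counts) for label, counts in table1.items()}
for label, share in shares.items():
    print(f"{label:9s} {share:.0%}")
```

Read separately for n-grams and for p-frames, the proportions do not fall as phrase length rises from three to six words.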
On a more general level, the phraseological analysis of the WCBE has identified three
types of phrases. One type is what I call terminological phrases, that is, phrases which express
a particular concept and have to be learned as one unit. An example is the securities act,
which specifies a particular law. These phrases are complementary units of both lexis and
content, in this case the regulations set out in the securities act. Language and content can-
not be separated.
The second type is fixed phrases, n-grams, which are not business-specific, for example
in connection with, which might characterize a formal written style.
The third type is both general and business English p-frames which are more composi-
tional in nature in that they are flexible in one slot, for example the * of the and net asset
value *. In the WCBE, the most frequent realizations of the * in the first phrase are value,
end, date, terms, and time, and at, the, and, fund’s, and in in the second phrase. They are
combinations of both the phrase and the fillers of the variable slot which are, in fact, the
phrase’s collocates and paradigmatic variables. Being able to combine them allows learn-
ers of the register to produce idiomatic language.
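The fillers of a variable slot can be listed with a few lines of code. This is a sketch under the assumption that a p-frame is given as a list of words with exactly one "*"; the function name and the toy sentence are hypothetical and do not reproduce the WCBE counts.

```python
from collections import Counter


def slot_fillers(tokens, frame):
    """Count the words that fill the single '*' slot of a p-frame."""
    size = len(frame)
    slot = frame.index("*")
    fillers = Counter()
    for i in range(len(tokens) - size + 1):
        window = tokens[i:i + size]
        # Every fixed position must match; the slot position may hold any word.
        if all(f == "*" or f == w for f, w in zip(frame, window)):
            fillers[window[slot]] += 1
    return fillers


tokens = ("the value of the fund at the end of the year "
          "the value of the shares").split()
print(slot_fillers(tokens, ["the", "*", "of", "the"]).most_common())
```

Sorting the fillers by frequency yields the kind of ranked list given above for the * of the.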
All three types of phrases are characteristic features of the register, and it is the co-
occurrence of the characteristic phrases and lexis in the discourse which transforms general
English into business English and also allows the two to co-occur. Reflecting this is a main
concern of business English textbooks.
Taking the lemma SHARE*, that is, the node word share with every possible realization
of the * such as shared and sharing, as one example for a practical application of these
findings, Table 2 shows its numbers of occurrences as part of the n-grams and p-frames
and its co-occurrences with the node word class. Both lexical items are also identified as
key words in the corpus.
Table 2 Occurrences of SHARE*
n/p   Number of         Number of         Number of         Number of
      n-grams which     p-frames which    n-grams which     p-frames which
      include SHARE*    include SHARE*    include SHARE*    include SHARE*
                                          and class         and class
3     6                 4                 4                 1
4     12                16                11                14
5     10                16                9                 14
6     9                 3                 8                 3
The high number of occurrences of SHARE* as part of the corpus’s most frequent phrases
and the high number of co-occurrences of SHARE* and class indicate (a) the importance
of the vocabulary of shares and the share market in general within the register, and (b)
that shares are frequently defined and identified, for example by mentioning their class.
This defining of shares often follows two grammatical patterns. First, in 61 of its 76 occur-
rences, SHARE* stands to the right of a determining or defining word or phrase, for
example the class * share and the net asset value per share. Second, SHARE* is used in passive
constructions in seven instances, for example shares are subject to * and if shares are redeemed,
and in a further four instances SHARE* is the object of an action, for example purchase of
class a shares. Active voice constructions or phrases in which SHARE* functions as agent,
for example a description of a share’s performance, do not occur among the corpus’s most
frequent phrases and therefore seem to be less frequent in language usage than passive
constructions. These grammatical patterns could be reflected in teaching materials.
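The figures in this paragraph can be related to Table 2 with simple arithmetic. The sketch below sums the table's columns; the resulting total of 76 phrases appears to correspond to the 76 occurrences cited above. Variable names are illustrative.

```python
# Rows copied from Table 2:
# (length, n-grams with SHARE*, p-frames with SHARE*,
#  n-grams with SHARE* and class, p-frames with SHARE* and class)
table2 = [
    (3, 6, 4, 4, 1),
    (4, 12, 16, 11, 14),
    (5, 10, 16, 9, 14),
    (6, 9, 3, 8, 3),
]

ngram_total = sum(row[1] for row in table2)
pframe_total = sum(row[2] for row in table2)
total = ngram_total + pframe_total
with_class = sum(row[3] + row[4] for row in table2)

print("phrases with SHARE*:", total)
print("of which also with class:", with_class)
# 61 is the number of right-of-determiner occurrences reported in the text.
print(f"share standing right of a defining word: {61 / total:.0%}")
```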
The pragmatics of business English, for example politeness and conventions for small
talk in a business context, are more difficult to identify in a frequency-based corpus
analysis than the features discussed above. Nevertheless, the occurrence of the modal verbs
may and shall in the list of key words might indicate the salience of politeness and there-
fore of pragmatics in the genre.
An analysis of their concordance lines, however, shows that shall mainly occurs in
legal contexts such as specifying requirements, rights, and obligations. Its most frequent
collocates at L1 (immediate left) are fund, and, committee, board, and plan and at R1 (imme-
diate right) are be, not, have, include, and become. This stresses the importance of legal
English, which is not featured in many business English textbooks, for business English.
May, on the other hand, is mainly used as a marker of politeness with fund, you, which,
and, and it as its most frequent L1s and be, also, not, invest, and have as its most frequent
R1s. This is a possible starting point for the discussion of the linguistic means and the
significance of politeness in business discourse and therefore for teaching pragmatic con-
ventions in business English.
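Collocates at L1 and R1 can be counted as in the following sketch. It assumes a flat list of tokens and case-insensitive matching; the function name and the toy sentence are hypothetical and do not reproduce the WCBE results.

```python
from collections import Counter


def collocates(tokens, node, offset):
    """Count words at a fixed position relative to a node word.

    offset=-1 gives L1 (immediate left), offset=1 gives R1 (immediate right).
    """
    node = node.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        j = i + offset
        if token.lower() == node and 0 <= j < len(tokens):
            counts[tokens[j].lower()] += 1
    return counts


tokens = ("the fund shall be liable and the board shall not be liable "
          "the fund shall be").split()
print("L1:", collocates(tokens, "shall", -1).most_common())
print("R1:", collocates(tokens, "shall", 1).most_common())
```

Frequency lists of this kind only point to candidate patterns; deciding whether may marks politeness or shall marks legal obligation still requires reading the concordance lines.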
Using frequency criteria based on a corpus linguistic analysis for the selection of units
of teaching in business English classes may suggest a shift in focus of currently used
teaching materials. The corpus linguistic evidence suggests that the strong emphasis on
lexis in teaching materials might be shifted to include both lexis and phraseology as well
as a stronger focus on the pragmatics of politeness in business English and the salience of
legal English. This includes lexis and phraseology which are both business-specific, that
is, business terminology and terminological phrases, and general English. Only by teaching
and, from the students’ perspective, learning these various aspects of the genre can the
learners become competent members of the discourse community of business English.
SEE ALSO: Biber, Douglas; Corpora: English-Language; Corpora: Specialized; Corpus
Analysis for a Lexical Syllabus; Corpus Analysis of Key Words; Corpus Analysis of
Language in the Workplace; English as Lingua Franca; Sinclair, John
References
Baxter, R., Boswood, T., & Peirson-Smith, A. (2002). An ESP program for management in horse-
racing business. In T. Orr (Ed.), English for specific purposes (pp. 117–46). Alexandria, VA:
TESOL.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at . . . : Lexical bundles in university teach-
ing and textbooks. Applied Linguistics, 25(3), 371–405.
Brook-Hart, G. (2006). Business benchmark: Upper-intermediate. Cambridge, England: Cambridge
University Press.
Brown, T. P., & Lewis, M. (2002). An ESP project: Analysis of an authentic workplace conversa-
tion. English for Specific Purposes, 22(1), 93–8.
Collins, H., & Scott, M. (1997). Lexical landscaping in business meetings. In F. Bargiela-Chiappini
& S. Harris (Eds.), The languages of business: An international perspective (pp. 183–208).
Edinburgh, Scotland: Edinburgh University Press.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–38.
Coxhead, A. (2002). The academic word list: A corpus-based word list for academic purposes.
In B. Kettemann & G. Marko (Eds.), Teaching and learning by doing corpus analysis: Proceedings
of the fourth international conference on teaching and language corpora, Graz 19–24 July, 2000
(pp. 73–89). Amsterdam, Netherlands: Rodopi.
Farrall, C., & Lindsley, M. (2008). Professional English in use: Marketing. Cambridge, England:
Cambridge University Press.
Fletcher, W. H. (2002). kfNgram. Retrieved July 29, 2009 from http://www.kwicfinder.com/
kfNgram/kfNgramHelp.html
Graff, D., & Wu, Z. (1995). Japanese business news text. Philadelphia, PA: Linguistic Data
Consortium.
Harding, K., & Taylor, L. (2005). International express: Intermediate student’s book. Oxford, England:
Oxford University Press.
Lee, D. (2009). Bookmarks for corpus-based linguistics. Retrieved July 29, 2009 from http://personal.
cityu.edu.hk/~davidlee/devotedtocorpora/CBLLinks.htm
Mascull, B. (2004). Business vocabulary in use: Advanced. Cambridge, England: Cambridge
University Press.
Nation, P. (2001). Using small corpora to investigate learner needs: Two vocabulary research
tools. In M. Ghadessy, A. Henry, & R. L. Roseberry (Eds.), Small corpus studies and ELT:
Theory and practice (pp. 31–45). Amsterdam, Netherlands: John Benjamins.
Nelson, M. (2004). Mike Nelson’s business English lexis site. Retrieved July 29, 2009 from http://
users.utu.fi/micnel/business_english_lexis_site.htm
Powell, M. (2009). In company: Intermediate student’s book (2nd ed.). Oxford, England: Macmillan.
Sardinha, T. B., & Barbara, L. (2008). Cultural stereotype and modality: A study into modal
use in Brazilian and Portuguese meetings. In G. Forey & G. Thompson (Eds.), Text type and
texture: In honour of Flo Davies (pp. 232–49). London, England: Equinox.
Scott, M. (2007). WordSmith Tools. 4.0. Oxford, England: Oxford University Press.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford, England: Oxford University Press.
Someya, Y. (2007a). Business letter corpus. Retrieved July 29, 2009 from http://www.someya-net.
com/concordancer/index.html
Someya, Y. (2007b). Learner business letter corpus. Retrieved July 29, 2009 from http://www.
someya-net.com/concordancer/index.html
Starcke, B. (2008). I don’t know: Differences in patterns of collocation and semantic prosody in
phrases of different lengths. In A. Gerbig & O. Mason (Eds.), Language, people and numbers:
Corpus linguistics and society (pp. 199–216). Amsterdam, Netherlands: Rodopi.
Trappe, T., & Tullis, G. (2005). Intelligent business coursebook: Intermediate business English. Harlow,
England: Longman.
West, M. (1953). A general service list of English words. London, England: Longman, Green & Co.
Suggested Readings
Bargiela-Chiappini, F. (Ed.). (2009). The handbook of business discourse. Edinburgh, Scotland:
Edinburgh University Press.
Bargiela-Chiappini, F., Nickerson, C., & Planken, B. (2007). Business discourse. Basingstoke,
England: Palgrave Macmillan.
Bondi, M. (2005). People in business: The representation of self and multiple identities in busi-
ness emails. In P. Gillaerts & M. Gotti (Eds.), Genre variation in business letters (pp. 303–324).
Bern, Switzerland: Peter Lang.
Connor, U., & Upton, T. A. (Eds.). (2004). Discourse in the professions: Perspectives from corpus
linguistics. Amsterdam, Netherlands: John Benjamins.
Henry, A., & Roseberry, R. L. (2001). A narrow-angled corpus analysis of moves and strategies
of the genre: ‘Letter of application’. English for Specific Purposes, 20, 153–67.
Holmes, J., & Stubbe, M. (2003). Power and politeness in the workplace: A sociolinguistic analysis of
talk at work. London, England: Longman.
Nelson, M. (2006). Semantic associations in business English: A corpus-based analysis. English
for Specific Purposes, 25(2), 217–34.
Nickerson, C. (2000). Playing the corporate language game: An investigation of the genres and dis-
course strategies in English by Dutch writers working in multinational corporations. Amsterdam,
Netherlands: Rodopi.
Swales, J., & Rogers, P. S. (1995). Discourse and the projection of corporate culture: The mission
statement. Discourse and Society, 6(2), 223–42.
Upton, T. A., & Connor, U. (2001). Using computerized corpus analysis to investigate the text-
linguistic discourse moves of a genre. English for Specific Purposes, 20, 313–29.
