DDL For EAP - Flowerdew - 2018
Article info
Article history: Available online 3 February 2018
Keywords: Corpus linguistics; Data-driven learning; DDL; Discipline-specific corpora; General corpora; Research writing

Abstract
This paper reports on a project aimed at disseminating the data-driven learning (DDL) approach to research writing among
PhD students in Hong Kong universities. A 3.5-h workshop was offered for over 20 sessions across six universities addressing
473 postgraduate research students, accounting for 6.7% of the whole research graduate student population in Hong Kong.
Students were first introduced to the free online corpus, BNCweb, which can help to solve lexico-grammatical problems
encountered during research writing. They were then given access to teacher-built discipline-specific corpora with the
concordancing tool AntConc. Through hands-on activities and interactive discussion students were able to compare discourse
strategies employed across different disciplines and identify their linguistic realisations. Participants were finally guided
through the process of building a corpus of their own, thereby catering for their personal needs. The self-selected
participants’ evaluation of the workshop was highly positive and they showed evident enthusiasm for this new approach. Their
suggestions for improvement are also discussed. The description of the workshop programme and feedback from learners may
provide useful insights for DDL practitioners who wish to spread this approach in their own institutions.
© 2017 Elsevier Ltd. All rights reserved.
1. Introduction
Academics around the world are facing increasing pressure to publish internationally. As prospective members of the
academic community, PhD students now find themselves facing the same pressure of publish or perish (Cargill, O’Connor, &
Li, 2012). Such a burden can be even more challenging for students whose L1 is not English, given the linguistic challenges
they face when striving to publish in international journals (Belcher, 2007; Curry & Lillis, 2004; J. Flowerdew, 1999, 2000,
2001). However, support for future academics in research writing is inadequate (Flowerdew & Forest, 2009; Kwan, 2010).
In line with the rapid advances in computer technology, the past two decades have witnessed a steady growth in the
literature on direct machine-readable corpus applications, or data-driven learning (DDL) (Johns, 1991a, 1991b), to EAP writing
pedagogy (e.g. Charles, 2011, 2012, 2014; Cotos, 2014; Diani, 2012; Eriksson, 2012; L. Flowerdew, 2008, 2015a, 2015b; Lee &
Swales, 2006; Tono, Satake, & Miura, 2014; Yoon & Hirvela, 2004). Although Boulton and Cobb’s (2017) extensive
meta-analysis of DDL empirical studies in the language classroom has shown that the DDL approach is profitable, the process of
research findings being carried over to actual teaching practice is still rather slow (Boulton et al., 2012; Leńko-Szymańska &
Boulton, 2015; Tribble, 2013). This is indeed the case in university English writing classrooms in Hong Kong, especially at the
PhD level, which is the context of the present study.
* Corresponding author.
E-mail addresses: meilinchen8388@gmail.com (M. Chen), johnflowerdew888@gmail.com (J. Flowerdew).
https://doi.org/10.1016/j.esp.2017.11.004
0889-4906/© 2017 Elsevier Ltd. All rights reserved.
M. Chen, J. Flowerdew / English for Specific Purposes 50 (2018) 97–112
This paper describes an attempt to address this gap by means of a series of DDL workshops on research writing for PhD
students in Hong Kong. Funded by the University Grants Committee (UGC) of the Hong Kong SAR government as part of their
language enhancement programme, the project involved 24 workshops of 3.5 h each, delivered to around 500 participants in six Hong
Kong public universities from November 2015 to May 2016. The following section briefly reviews the (limited) literature on
corpus applications in EAP writing pedagogy for PhD students to date. Section 3 provides a detailed account of the corpora and
tools used, the workshop design, and how activities were implemented during teaching. Sections 4 and 5 summarise partici-
pants’ feedback collected from the post-workshop survey and interviews, while the conclusions are presented in Section 6.
As already stated, the literature on corpus applications for PhD writing is limited. Lee and Swales (2006) are among the
pioneers who introduced corpus-based EAP writing to PhD students. During a 13-week course, Lee and Swales first intro-
duced students to free online general corpora, such as the British National Corpus (BNC), and then an expert corpus of
research articles (RAs) with the WordSmith tools software package. They considered that the PhD students in their course
would already be familiar with generic features of research writing and would be more interested in working with specialist
corpora to fine-tune their language. The final part of the course required students to compile their own corpora and conduct
comparative analyses independently. The small number of students (n = 4) who regularly attended the course found corpora
empowering, giving them greater autonomy. While the students believed that general and specialised corpora complement
each other, they were more engaged with discipline-specific texts. The authors concluded with the caveat that they were
teaching an exceptional group of highly motivated students who were highly acculturated into research genres and had
excellent computer skills.
PhD students in L. Flowerdew’s (2015a) writing workshop focusing on the discussion section of the PhD thesis also showed
excellent take-up of corpus tools. During the 5-h session, paper-printed concordance lines were first presented to the stu-
dents as input and a basis for discussion about the different rhetorical functions involved in discussion sections of PhD theses.
Later they had hands-on activities using ConcGram and the Hong Kong Polytechnic University Corpus of RAs to explore the
lexico-grammatical patterns typically used to fulfil certain rhetorical functions. Example sentences taken from student
writing were also used as prompts for discussion. Students were asked to judge the appropriateness of such sentences and
then check the words/patterns in question in the corpora to verify their judgement. During some activities, students noticed
that they were familiar with certain words/patterns from the corpora but never used them in their own writing. Corpus
activities, therefore, could be useful in helping students to transfer receptive language competence to productive perfor-
mance. In other activities, students’ inaccurate or partial knowledge regarding certain grammatical items was improved
through discovery learning with corpora. In general, students highly rated the workshop. They had no problem adjusting to
the new tool and even commented that paper-printed concordance lines were boring.
Graduate students in Cortes’s (2014) 14-week corpus-assisted academic writing course also learned to use concordance
tools (in this case AntConc) with great ease. Taking a dual genre- and corpus-based approach, Cortes introduced an experi-
mental group of learners (in parallel with a control group without corpus assistance) to the moves in different sections of the RA
and asked them to explore more examples through consulting their self-compiled discipline-specific RA corpora. The course
was greatly appreciated by both groups, with a slightly higher evaluation given by the experimental group. They appreciated the
use of large corpora in exploring linguistic features of the discourse moves and stated that they would continue using corpora to
facilitate their writing. The control group enjoyed the genre-based activities, but found that analysing only four RAs was not
enough, indicating the advantage of corpora in revealing linguistic features that are not easily identified by manual analysis.
A researcher who has conducted a number of studies involving corpus-based academic writing with research students is
Charles (2007, 2011, 2012, 2014, 2015). Her studies report on her implementation of the genre- and corpus-based dual
approach in her teaching at Oxford University (Charles, 2007) and successful attempts to train students to build their personal
discipline-specific corpora (Charles, 2012, 2015). Her teaching was particularly successful in terms of students’ uptake of the
corpus approach over time after training (Charles, 2014), although common problems were identified such as the time and
difficulty involved in the acquisition of corpus skills such as searching and analysing results.
The above-mentioned studies in general have highly positive findings regarding students’ perception of the new approach.
The success of these interventions might be at least partly attributable, however, to the fact that 1) the teachers are very
experienced corpus linguists and English educators; and 2), as Lee and Swales (2006) and L. Flowerdew (2015a) pointed out, PhD
students are generally highly motivated, with great self-learning capabilities. Furthermore, these studies might not be applicable
in other teaching contexts due to a lack of institutional support, manpower, and other resources. Lee and Swales (2006) admitted
that their course would not be repeated in the future because there was no teacher to take over after them. Cortes ran her course
for several years. However, she concluded that “new teaching methodologies [could not be] massively spread out in a short
period of time: it takes time and more studies that analyse them to convince administrators and instructors who might not be too
inclined towards new methods or technologies of the advantages of this type of classes” (Cortes, 2014, p.78).
This paper adds to the limited literature on corpus applications for PhD students by describing the design, implementation
and evaluation of a series of corpus-based research writing workshops run across Hong Kong universities. While, as in the
previous studies, the self-selected students in this study were highly motivated, the project was on a much larger scale, across
a whole university system, and may, it is argued, provide further insights for researchers and models for practitioners to add
to those of the previous studies.
3. The workshops
The free, voluntary workshop, entitled “Data-driven Learning in Research Writing for PhD Students” (although it was also
open to MPhil students), was run 24 times over the period November 2015–May 2016. The project team included a project
leader, who is a co-author of the study, a project co-leader from a participating HK university, a post-doctoral research fellow
(also a co-author of the present study), who delivered all the workshops with the assistance of a research assistant, and two
expert consultants who advised on the project.
On average, four sessions were given in each of the six participating universities, with 11–39 participants per session. The
workshop was 3.5 h long, held in computer labs, except for one workshop which was divided into two 2-h sessions held on
two consecutive days. After the workshop, students were invited to complete a paper-printed questionnaire to evaluate their
learning experience. Six to eight months after the workshop, participants were contacted randomly for interviews to ascertain
their use of corpora after the workshop (see acknowledgements).
3.1. Participants
In order to recruit participants, the project leader emailed the heads of department at each university and asked for their
help in disseminating an invitation to prospective participants via the PhD supervisors in their departments. The invitation
stated that supervisors and staff members were also welcome in addition to PhD students, who were the main target. After
the invitation was sent out, 734 participants registered, mostly PhD students, together with a number of MPhil
students, post-doctoral fellows and staff members. Of these, 547 actually attended (473 of them students), giving an attendance
rate of 74.5% (see Table 1). According to statistics on the official website of the UGC, 7,097 research graduate students were
enrolled at eight UGC-funded universities by June 2016 (see footnote 1). This means that 6.66% of the research graduate student population in
Hong Kong attended the workshops.
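The two percentages reported above can be checked with simple arithmetic. The short sketch below uses only the figures given in the text; the variable names are illustrative and are not taken from the project itself.

```python
# Figures taken from the paragraph above; variable names are illustrative only.
registered = 734           # participants who registered
attended = 547             # participants who actually attended
students_attended = 473    # attendees who were research students
rpg_population = 7097      # research graduate students at UGC-funded universities

attendance_rate = attended / registered * 100
population_share = students_attended / rpg_population * 100

print(f"Attendance rate: {attendance_rate:.1f}%")
print(f"Share of research graduate population: {population_share:.2f}%")
```

Rounded to one decimal place, the second figure corresponds to the 6.7% quoted in the abstract.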
Table 1
The number of participants and the range of disciplines involved in the workshops.
An important feature of the workshop is that participants came from a wide variety of disciplines. As can be seen from the
penultimate row of Table 1, at each workshop attendees came from at least 10 different departments with up to as many as 42
departments from University 1, the oldest university in Hong Kong.
Table 2 lists the 10 departments of each university that contributed the largest groups of participants. It shows that, with
the exception of Chinese Medicine and Physics students from University 5 and Life Science students from University 6, the
largest disciplinary group makes up no more than 20% of the participants. The major disciplinary groups also vary greatly from
one university to another. Universities 1 and 3, for instance, both have engineering students as the largest group (mainly
because these two universities have very large engineering departments). University 2, on the other hand, has many nursing
students (because it is one of only two universities in Hong Kong that have a Medical faculty). This indicates the diversity
of participants at each session. Having a heterogeneous group in the EAP writing classroom is not unusual. In Chen and
Flowerdew’s (under review) review of 37 DDL studies in the L2 academic writing classroom, 18 out of the 31 studies that
specified participants’ disciplinary backgrounds report on heterogeneous groups. The workshops described in this paper,
therefore, may reflect the reality that many English language educators are confronted with around the world of trying to
meet the needs of a heterogeneous group of students.
Table 2
Top 10 disciplines with the most participants.
University | Department/Division | %
Uni-1 | System Engineering & Engineering Management | 16.28%
      | Mechanical & Biomedical Engineering | 13.95%
      | Electronic Engineering | 10.85%
      | Information Systems | 8.53%
      | Applied Social Sciences; Marketing; Media & Communication | 6.98%
      | Biology & Chemistry; Physics & Materials Science; Public Policy | 4.65%
Uni-2 | Nursing | 15.04%
      | Education | 7.96%
      | Communication; English; Medical Sciences; Surgery | 7.08%
      | Chemical Pathology | 6.19%
      | Chinese Language & Literature | 5.31%
      | Mechanical & Automation Engineering; Geography & Resource Management | 4.42%
Uni-3 | Civil & Environmental Engineering | 13.76%
      | Rehabilitation Sciences | 8.26%
      | Building and Real Estate; Design; English | 6.42%
      | Nursing; Textiles & Clothing | 5.50%
      | Building Services Engineering; Land Surveying & Geo-Informatics; Mechanical Engineering; Optometry | 4.59%
Uni-4 | Social Work & Social Administration | 9.47%
      | English Language Education; Information & Technology Studies | 7.37%
      | Biomedical Sciences | 5.26%
      | Civil Engineering; Dentistry; Linguistics; Medicine; Paediatrics & Adolescent Medicine; Psychology | 4.21%
Uni-5 | Chinese Medicine; Physics | 23.53%
      | Geography | 14.71%
      | Communication Studies; Computer Science; Economics; Finance & Decision Sciences; Physical Education | 5.88%
      | Humanities & Creative Writing; Mathematics; Sociology | 2.94%
Uni-6 | Life Science | 20.90%
      | Civil & Environmental Engineering | 19.40%
      | Chemical & Biomolecular Engineering | 16.42%
      | Interdisciplinary Programs Office | 10.45%
      | Physics | 7.46%
      | Finance; Computer Science & Engineering | 5.97%
      | Accounting; Business & Management; Chemistry; Electronic & Computer Engineering; Environment; Marketing; Mathematics; Social Science | 1.49%
1 At the time of writing, two of the eight universities were not targeted, as they had low enrolments of research students. A workshop is planned,
however, for these students at a future date.
The majority of the participants speak either Mandarin or Cantonese as their L1 (70.21% Mandarin; 15.43% Cantonese). The
rest included people from a range of over 30 different L1s other than Chinese, including English (n = 9), Urdu, Bengali, Italian,
Spanish, Filipino, Japanese, Korean, Indonesian, Ewe, and Tamil. This diversity of language backgrounds reflects the cosmo-
politan make-up of the research student body in Hong Kong, which may be reflective of other countries and regions.
The participants’ research experience also varied. Prior to attending the workshop, 20.47% of the participants had already
published more than three papers written in English, 7.61% had published three papers, 24.93% and 21% had published one
and two papers respectively, while 25.98% had no publishing experience. Importantly, the DDL approach was novel for the
majority of the participants, over 95% of them never having used corpora before the workshop.
Two corpora were introduced to the participants: the written academic component of the BNCweb corpus, made up of
16,093,754 words and 505 files, which can also be broken down to discipline-specific academic sub-corpora; and a corpus
consisting of eight discipline-specific sub-corpora of RAs created by the teacher, i.e. a co-author of the study (see Table 3) for
use with AntConc 3.4.4 (Anthony, 2016a).
Table 3
The discipline-specific corpora used in the workshop.
BNCweb was chosen based on the teacher’s personal experience of using corpora as a language user. Although Tribble’s
(2015) survey of corpus software use shows that the BNC corpus is very popular among English language educators
around the world via the BYU interface of Mark Davies (BYU-BNC, http://corpus.byu.edu/bnc/), the teacher found that the
The teaching materials were developed based on an extensive trawl of the literature on research writing (e.g. Biber,
Johansson, Leech, Conrad, & Finegan, 1999; J. Flowerdew & Forest, 2009, 2015; Holmes, 1997; Hyland, 2005, 2008; Kwan,
2006; Swales, 1990) and the teaching of research writing (e.g. Chang, 2014; Charles, 2007, 2011, 2015; Cortes, 2014; L.
Flowerdew, 2015a; Lee & Swales, 2006), personal communications with the two expert consultants, and the team members’
experience in the field of EAP. They were piloted with four individual students from different disciplinary backgrounds before
the workshops. Necessary changes were made based on their feedback.
The materials used for all the sessions were largely the same. However, as mentioned in Section 3.1, the target group was
particularly heterogeneous in many ways, so minor alterations were made during most sessions. Ready-made alternative
activities as well as impromptu tasks were both used at times depending on the participants’ needs. Impromptu tasks were usually
introduced when students raised further questions during the discussion about corpus findings for a specific activity. In response
to their questions, the teacher often encouraged them to carry out further searches to explore answers to their own inquiries.
Divided into three parts, the workshop aimed to show students how to: 1) use free online corpus resources; 2) use teacher-
compiled discipline-specific corpora; and 3) build their personal corpora. During the first part (105 min), the teacher
introduced participants to basic corpus concepts and showed them how to use BNCweb to address lexico-grammatical
problems that they may encounter during research writing. When introducing the concept of corpora, to effectively get
participants’ attention, the teacher used Google Scholar as a corpus in a hands-on warm-up activity to demonstrate the
advantages of corpora in comparison with traditional learning resources such as dictionaries.
After finishing the warm-up activity, students understood that corpora are not only useful for judging the appropriateness
of certain expressions in research writing, but also for understanding the context in which they are used, contextual examples
being less extensive in dictionaries or textbooks. The warm-up activities also served as a buffer for first-time corpus users.
Starting with Google Scholar, a tool that they were familiar with, students practised DDL strategies such as searching, ana-
lysing results, and reflecting on and inferring from results before using language corpora.
When introducing different functions in BNCweb, the teacher first raised a question regarding a particular rhetorical function
or linguistic feature of research writing and asked students to share their observations based on their writing experience. She
then guided students step by step (with demonstration) through searching for relevant linguistic patterns in the academic sub-
corpus of BNCweb. Students were given time to browse and analyse the results and share their findings with others. After
students had reported their findings, the teacher showed a summative PowerPoint slide of the findings. Details of the workshop
outline are given in Figure 1, including activity themes, functions introduced, and searches carried out.
The second part (80 min), making use of the teacher-built discipline-specific corpus and AntConc, focused mainly on
discoursal characteristics of research writing and their linguistic realisations (see Figure 1). Akin to the procedure for Part 1,
the teacher first raised a question regarding a discourse feature (e.g. rhetorical functions) of research writing. After brain-
storming, the teacher shared research findings regarding the issue. She then demonstrated how searches can be carried out in
AntConc to retrieve the desired language patterns that fulfil the discourse feature in question from a given sub-corpus, with
students following the steps and identifying examples (usually sentence patterns) in their own chosen corpora. During this
process, the teacher advised students to copy useful patterns or “sentence templates” (Cargill & O’Connor, 2006, p. 210) they
found in the results to a Microsoft Word document, so that by the end of the workshop they would already have a repertoire
of appropriate expressions to suit their individual writing needs. This practice, also known as ‘language re-use’ (Li &
Flowerdew, 2007), has been argued for as a bona-fide academic writing procedure by a number of scholars (Cargill &
O’Connor, 2006; Flowerdew & Forest, 2015).
During the third part (25 min), students were guided through the process of building a personal corpus. They were first
asked to download a few PDF files of RAs that they considered to be high-quality professional research writing. Following the
teacher’s demonstration, students used AntFileConverter to automatically convert PDFs into plain text files. The teacher then
asked students to load the plain text files into AntConc and start tentative searches for anything that they were interested in.
During the process, questions about the criteria for selecting RAs, the corpus size and other relevant issues were discussed.
Section 3.4 describes in detail a few selected examples from the workshop.
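Once articles have been converted to plain text, the basic concordance search described for Part 3 can be approximated in a few lines of code. The following is a minimal sketch only, assuming a folder of `.txt` files; it does not reproduce how AntConc or AntFileConverter work internally, and the function names `kwic` and `search_folder` are hypothetical.

```python
import re
from pathlib import Path


def kwic(text, pattern, width=30):
    """Return keyword-in-context lines for every regex match in `text`."""
    flat = " ".join(text.split())  # collapse line breaks left over from PDF conversion
    lines = []
    for m in re.finditer(pattern, flat, flags=re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so that the keywords line up in a column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


def search_folder(folder, pattern, width=30):
    """Apply kwic() to every .txt file in `folder` (a hypothetical corpus directory)."""
    results = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        results.extend(kwic(text, pattern, width))
    return results


# Invented text standing in for one converted article.
demo = kwic("The results\nindicate that the method works.", r"\bindicat\w*")
print("\n".join(demo))
```

With a personal corpus in place, a call such as `search_folder("my_corpus", r"\bindicat\w*")` would list every form beginning with "indicat" together with its context; the folder name here is an assumption for illustration.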
The workshop was task-based, following a problem-solution process. Due to space limitations, only three tasks have been
selected to illustrate how the activities were implemented in the classroom.
3.4.1. Part 1: Language problem (2): How can we report the results of a study?
As mentioned in Section 3.2, before starting to use BNCweb, students were first guided to create a general academic sub-
corpus (16,093,754 words, comprising five disciplines) and a discipline-specific sub-corpus (varying from 260,000 to 700,000
words). For all the activities involving BNCweb, students were given the freedom to use either the general or the discipline-
specific academic sub-corpora. The teacher mostly used the general academic sub-corpus to show students features of
research writing in general. This helped students to position findings from their discipline-specific corpora in the general
context of research writing.
As shown in Figure 1, each activity was initiated with a question regarding a research writing issue. Taking Language
problem (2) in Part 1 for instance, the teacher first asked students which verbs they often use to report/interpret results of a
study. Students suggested verbs such as show, indicate, reveal, suggest. She then asked them to share sentences that they often
write with these verbs. Students’ answers were usually “The study shows that ...” or “The results indicate that ...”. The teacher
pointed out that, although they know a variety of reporting verbs, they tend to use them in a similar sentence structure
without much syntactic variation. She then asked students whether they would like to find out how professional writers use
this type of verb, for example, indicate. Having already completed activities for Language Problem (1), students by now were
already familiar with BNCweb and immediately started typing “indicate” in the search box. The teacher reminded students
that they needed to be able to get results of indicate in all of its forms and demonstrated metacharacters for complex searches
in BNCweb (e.g. {}). After students had obtained the results, they were encouraged to use the Frequency Breakdown function to
see the frequency of different forms of the lemma indicate (see Figure 2). The teacher asked students which of them had used
their discipline-specific sub-corpus to report which form of indicate is most frequent in their field. Students’ answers showed
that while indicated is most common in the Engineering and the Natural Science sub-corpora, (to) indicate is most frequent in
the Humanities sub-corpus. The findings revealed inter-disciplinary differences regarding the use of indicate.
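The idea behind a frequency breakdown of a lemma can be illustrated outside BNCweb. The sketch below is an approximation under stated assumptions: the word forms are listed by hand rather than retrieved through lemma annotation, and the sample sentences are invented for the purpose of the example.

```python
import re
from collections import Counter


def form_breakdown(text, forms):
    """Count how often each listed word form occurs in `text` (case-insensitive)."""
    wanted = {f.lower() for f in forms}
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in wanted)


# Invented sentences standing in for a discipline-specific sub-corpus.
sample = (
    "The results indicate that the model is stable. "
    "As indicated by Table 2, the error decreased. "
    "The error fell sharply, indicating a robust fit. "
    "This finding indicates a need for further work. "
    "Earlier studies indicated a similar trend."
)
INDICATE_FORMS = ["indicate", "indicates", "indicated", "indicating"]

breakdown = form_breakdown(sample, INDICATE_FORMS)
for form, freq in breakdown.most_common():
    print(form, freq)
```

Running the same function over two different sub-corpora and comparing the resulting counts mirrors the cross-disciplinary comparison that students carried out in the workshop.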
The teacher then asked students to explore linguistic patterns of the lemma indicate by going one step back to the results
page. To identify patterns quickly in thousands of results, the teacher reminded students of the Sort and Frequency Breakdown
functions, which they had just tried in the Language Problem (1) activity. For instance, after sorting the results by 1 Left and
then using the Frequency Breakdown function, students discovered that the lemma indicate is most frequently preceded by 1)
to, 2) a comma, and 3) the preposition as (see Figure 3). After clicking on as, they noticed that it is often used in the structure
“... as indicated by/in ...”.
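Sorting by the first word to the left and then counting what appears there amounts to a frequency list of left-hand neighbours. The sketch below illustrates that idea under simplifying assumptions: commas are kept as tokens so that punctuation can be counted, other punctuation is ignored, and the sample text is invented. It is not a description of how BNCweb computes its results.

```python
import re
from collections import Counter


def left_neighbours(text, forms):
    """Count the token immediately to the left of each occurrence of a listed form."""
    wanted = {f.lower() for f in forms}
    # Words and commas are separate tokens; digits and other punctuation are dropped.
    tokens = re.findall(r"[a-z]+|,", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in wanted and i > 0:
            counts[tokens[i - 1]] += 1
    return counts


# Invented sentences for illustration only.
sample = (
    "These data seem to indicate a decline. "
    "The rate fell, indicating a change. "
    "As indicated by Figure 1, growth slowed. "
    "The rate rose, indicating recovery. "
    "We use arrows to indicate direction, as indicated in Table 3."
)
forms = ["indicate", "indicates", "indicated", "indicating"]

neighbours = left_neighbours(sample, forms)
for token, freq in neighbours.most_common():
    print(repr(token), freq)
```

Restricting `forms` to a single form, for example only "indicating", shows which left-hand items go with that form in particular, which is the step students took after clicking on the comma.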
Figure 1. The workshop outline. Part 1 Polish your writing with available online corpora.
Figure 2. The frequency breakdown of the lemma indicate in the academic component of BNCweb.
Figure 3. Lexical items that precede the lemma indicate in the academic component of BNCweb.
The teacher then asked students which combinations/patterns would be useful for their own writing. Each time the
students gave an answer, the teacher would click on the corresponding word and go through the concordance lines quickly
together with the students. If students missed out important words on the list, she would draw their attention to those words
through questions. For example, students often did not pay attention to the comma in Figure 3. The teacher therefore asked
them why it is so frequent (No. 2 on the list) and by which verb form of indicate it is usually followed. By right-clicking on the
comma to get the results of the co-occurrence in a new tab and sorting the results and then using Frequency Breakdown under
the teacher’s guidance, students saw immediately that a comma is commonly followed by indicating (204 times), in com-
parison with other forms. Students were then encouraged to click on indicating to see real examples of this combination. The
teacher then reminded them that professional writers use a comma more often than the word which (see Figure 3) before the
appropriate form of the lemma indicate, because they vary their sentence patterns by using not only relative clauses but also
gerunds, thereby helping to explain complicated concepts more concisely and avoiding making clausal errors. Through the
activity, students became aware that the different forms of the lemma indicate are often used in the following structures.
3.4.2. Part 1: Language problem (4): What are some of the different ways we can refer to tables/figures in research writing?
In addressing this writing strategy, the opportunity was taken to introduce the Collocations function of BNCweb. At the
beginning of the workshop, one of the most frequently stated goals of the participants was to broaden their vocabulary. Use of
the Collocations function is one way to do this. After being introduced to the concept of collocation, students were presented
with the heading Report/Interpret results with the aid of tables or figures and the question Which verbs are often used to refer to
tables or figures in research articles? Students often answered this question with a limited number of phrases, such as “Table X
shows ...” and “Table X indicates that ...”. They then carried out the following search steps demonstrated by the teacher:
Students then compared their findings with each other and the teacher. A range of different verbs which are typically used
to describe the findings in tables were shared. Examples are show, see, give, list, summarise, compare, and indicate, along with
their typical phraseologies, as shown in Figure 4. In this way participants learned how to use the Collocations function of
BNCweb to find synonyms that they can use to vary their vocabulary use. The teacher also drew students’ attention to the fact
that many verbs on the list appear in both active and passive forms, which indicates varied ways of referring to tables/figures
in research writing when using the same verb.
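The notion of collocation used in this activity can be illustrated with a simple window-based count. The sketch below is a simplification: it reports raw co-occurrence frequencies within a fixed window around a node word and does not apply any statistical ranking of the kind a corpus tool may use. The window size of three tokens and the sample text are assumptions made for the example.

```python
import re
from collections import Counter


def collocates(text, node, window=3):
    """Count words occurring within `window` tokens on either side of `node`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Add every token in the window except the node occurrence itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


# Invented sentences for illustration only.
sample = (
    "Table 1 shows the results. The values are listed in Table 2. "
    "As shown in Table 3, the rate rose. Table 4 summarises the findings. "
    "The data are given in Table 5."
)

table_collocates = collocates(sample, "table")
for word in ["shows", "listed", "given", "summarises"]:
    print(word, table_collocates[word])
```

Even in this toy sample, both active (shows, summarises) and passive (listed, given) verb forms appear near the node word, which is the kind of variation the teacher drew attention to.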
3.4.3. Part 2: Writing problem (1): How can we refer to our own or other people’s research in writing?
In Part 1 (see Figure 1), students had searched for research in BNCweb and learned expressions involving this word (e.g.
little/recent/much/further/future research, further/various lines of research). This time, now in Part 2, using AntConc, the teacher
extended the discussion by adding the synonym study. Again, to initiate the discussion, she asked students in which section(s)
of the research paper writers use research or study/studies very frequently. Answers from students varied depending on their
disciplinary background. Students from Humanities and Social Sciences often stated that it should be in the Literature Review,
while those from Engineering or Natural Sciences thought it was in the Results and Discussion.
After sharing their opinions, students were asked to load all the sections of the RAs from their chosen discipline into
AntConc and to search for the lemmas research and study using the Concordance function. The metacharacter |, which allows
searching for more than one item, was introduced. After students had obtained the results, they were guided to use the
Concordance Plot function to check for the frequency of the queried items in different sections of the RA (see Figure 5).
Students compared their findings with classmates who had used data from a different discipline. After discussion, they found
that these words are indeed more frequently used in the Literature Review of RAs in Arts & Humanities, Social Science, and
Business, Economics and Management than in that of RAs in Natural Sciences, where they are more frequently used in the
Results and Discussion sections. There was, however, one surprising finding: both words are also highly frequent in the
Introduction and Abstract sections in most disciplines.
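The comparison across RA sections can be approximated by counting matches per section and normalising by section length. The sketch below is illustrative only: it uses a regular expression with alternation to stand in for an "A or B" search, covers just the forms research, study and studies, and works on invented section texts rather than on the workshop corpora.

```python
import re

# Alternation matches any one of the listed forms as a whole word.
PATTERN = re.compile(r"\b(?:research|study|studies)\b", re.IGNORECASE)


def hits_per_thousand(sections):
    """Return matches per 1,000 words for each named section."""
    rates = {}
    for name, text in sections.items():
        n_words = len(re.findall(r"\w+", text))
        n_hits = len(PATTERN.findall(text))
        rates[name] = 1000 * n_hits / n_words if n_words else 0.0
    return rates


# Invented section texts for illustration only.
sections = {
    "Introduction": "Little research has examined this issue. This study fills the gap.",
    "Methods": "Samples were heated for two hours and then cooled.",
}

rates = hits_per_thousand(sections)
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 1,000 words")
```

Normalising by section length is a design choice made here so that long and short sections can be compared; a concordance plot conveys similar information visually by showing where hits fall within each file.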
106 M. Chen, J. Flowerdew / English for Specific Purposes 50 (2018) 97–112
Figure 5. The concordance plot of research and study/studies across sections in RAs in Chemistry and Material Science.
The teacher then reminded students of the corpus findings from the Language Problem (1) activity in Part 1, namely that research
is usually preceded by words such as little and much or words/phrases such as much or a great deal of, the latter always
occurring in negative clauses. She reminded students of the conclusion they reached that these patterns are used to describe a
research gap. She then took the opportunity to introduce the classic CARS model for RA Introduction sections by Swales
(1990) and the IMRaD structure for RA abstracts (see Lores, 2004).
After learning the move structure of both sections, students were asked to reload the Introduction and the Abstract sub-
corpora of the discipline they chose into AntConc and to search for research and study again. They were asked to sort the
results, go through them quickly, and identify sentences that they found useful for their own writing; at the same time, they
were asked to decide on the discourse functions of the sentences. They were encouraged to take notes of these “sentence
templates” (Cargill & O’Connor, 2006) and their rhetorical functions as shown in Table 4.
Table 4
Selective examples of research and studies in the Abstract and Introduction sections across disciplines.
After completing the task, students were given time to share their findings with others. The teacher then concluded the
activity by presenting a summative PowerPoint slide similar to Table 4.
Although it does not belong to the workshop per se, it is worth mentioning that the teacher announced during the
workshop that she would provide editing help to students who wanted to continue practising their corpus skills after the
workshop. The purpose of providing follow-up support was to help students get into the habit of using corpora. The practice
of editing was carried out in the following fashion. Students who were interested in the service would send the teacher (part
of) a paper they were writing. She would highlight the problematic places and leave comments showing the specific corpus
searches that would provide the data needed to solve these issues. A screenshot of an example of the teacher’s comments is given in
Figure 6.
Figure 6. An example of free editing services provided by the teacher after the workshop.
After the workshop, a questionnaire (in English) including six 5-point Likert-scale questions, a yes-or-no question, and two
open questions was distributed to collect students’ feedback.
As can be seen from Table 5, students highly appreciated the workshops. While few students had prior knowledge about
corpus linguistics (Q1), they strongly agreed that the quality of the workshop was high (Q2), and that they learned new things
about using corpora to improve their writing (Q4). They also firmly believed that the workshop was useful for improving their
research writing (Q5) and predicted that they would use corpora in the future (Q6). Over 96% of participants stated that they
would recommend the workshop to a friend (Q7).
Table 5
Participants’ evaluation of the workshops.
No. To what extent do you agree with the statements? (1 = strongly disagree; 5 = strongly agree) Mean SD
1 Before this workshop I had used corpora in research writing or had some knowledge about corpus linguistics. 1.57 0.99
2 The quality of the workshop was high. 4.50 0.65
3 The workshop had a friendly atmosphere. 4.71 0.53
4 I learned new things about corpora and the methods of using corpora to improve my writing. 4.67 0.56
5 The workshop is useful for improving my research writing. 4.43 0.74
6 I will use corpora to improve my writing in the future. 4.40 0.76
7 Would you recommend this workshop to a friend? Yes: 96.56%
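Participants’ highly positive responses to the workshops are echoed in their answers to the following two open-ended
questions:
(1) Which aspects of the workshop did you find most useful?
(2) What improvements can we make when we organise the workshop in the future?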
Altogether, 341 students answered the first question (62.34% of all participants), while 224 (41% of all participants)
responded to the second. Of the 341 students responding to the first question, 327 mentioned at least one aspect that they
liked most, while 14 (4.11%) expressed their extreme satisfaction with the workshop by saying that “everything is perfect”.
Of the 224 students who answered the second question, 44 (19.64%) wrote “Nil” or “None”, leaving only 180 who gave
actual suggestions for improvements. Sections 4.1 and 4.2 summarise the substantive comments in answer to these two
questions.
4.1. Which aspects of the workshop did you find most useful?
In response to this question, many students mentioned more than one useful aspect of the workshop.
4.2. What improvements can we make when we organise the workshop in the future?
During the 3.5 h, students completed eight hands-on activities in total (including the warm-up activity), along with pre-activity
brainstorming and post-activity reflective discussions. Inductive learning and hands-on practice involved many cognitive
skills, such as predicting, searching, sorting, observing, analysing, interpreting, making inferences, and reflecting. This
demanded high concentration throughout the session, which could be psychologically and cognitively challenging for the
students, especially when they were encountering corpora for the first time. Consideration might be given to reducing this
cognitive load in future workshops by splitting the workshop into two sessions, although this could make it harder to
attract students, who would need to commit to two separate sessions.
Six to eight months after the workshop, students were contacted randomly to ascertain continued use of the approach
after the workshop. Interviews were conducted in the students’ L1 to encourage greater depth of responses. Here we would
like to report three cases representing a range of attitudes and experiences.2
The first student was particularly enthusiastic about the DDL approach. He was a final-year Computer Science PhD student,
having published at least three papers before attending the workshop. He admitted that he learned very fast during the
workshop and was usually a few steps ahead of others. After attending the workshop, he went back and started teaching other
students in his office to use BNCweb. Since the workshop, he had been using BNCweb mostly to facilitate his writing. The
queries he made were mostly to check the correctness of his own writing and find alternative expressions using the Collo-
cations function. One example he gave was that, when he was not certain about the correct way of quantifying comparisons
(e.g. twice as powerful as), he searched for twice and two times respectively in BNCweb and found examples he could benefit
from. He mentioned that he used BNCweb whenever he had to write in English, usually during drafting, including when he
wrote cover letters for job hunting. As for AntConc, he said that he did compile a corpus of his own out of curiosity and did a
few searches shortly after the workshop; however, he relied mainly on BNCweb because “it is sufficient for my writing and it
has more texts”.
The second student had a more mixed experience of post-workshop corpus use. This student was also studying Computer
Science and was in his second year of PhD study when he attended the workshop. He reported that about one month after the
workshop he tried to use BNCweb, because he was writing a paper for publication. However, he could not remember exactly
how the Sort/Frequency Breakdown function was used. After trying a few times without getting what he wanted, he switched
to Google Scholar. Using the method he learned during the workshop (i.e. putting inverted commas around the searched item
to obtain exact matches), he found what he needed. From then on, he had been using Google Scholar for writing on almost a
daily basis. When asked why he did not contact the teacher when he had problems with BNCweb, he confessed that he
thought about doing so, but the fact that he found answers in Google Scholar and was busy writing the paper prevented him
from reaching out for help. In spite of expressing his satisfaction with Google Scholar, during the interview, he nevertheless
asked the teacher to show him how searches should be carried out in BNCweb. He came to the realisation that BNCweb is
superior to Google Scholar because the Frequency Breakdown and Collocations functions could give him a list of co-occurring
words with their frequencies. He stated that he would use it in future. As for AntConc, he explained that he had never tried it
again, for two reasons. First, he found it inconvenient in comparison with BNCweb, as it is necessary to download the software
and download, convert and load corpus files first before carrying out searches. Second, he thought the RA corpus provided in
the workshop was too small in comparison with BNCweb and confessed that he never managed to find time to build his own
corpus. Based on his involvement with BNCweb, the experience of this student suggests a need for more than a single
workshop in his case and/or follow-up work after the workshop to increase the chances of him developing the “corpus habit”.
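The kind of output this student valued, a list of co-occurring words with their frequencies, can be illustrated with a short sketch. It makes no claim about how BNCweb computes its Collocations or Frequency Breakdown results; it is simply a raw-frequency count within a fixed window around the search word. The sample sentences, the window size, and the function name `collocates` are assumptions made for illustration.

```python
import re
from collections import Counter

def collocates(text, node, window=3):
    """Count words occurring within `window` tokens of each hit of `node`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]      # words before the hit
            right = tokens[i + 1:i + 1 + window]     # words after the hit
            counts.update(left + right)
    return counts

# Invented sample text, for illustration only.
sample = (
    "The new model is twice as fast as the baseline. "
    "It uses twice as much memory, but it is twice as accurate."
)
top = collocates(sample, "twice", window=2).most_common(3)
```

An exact-phrase search in a web search engine shows matching sentences but leaves the counting to the reader; a frequency-ranked list like `top` makes the dominant pattern (here, twice followed by as) visible at a glance.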
The third case is an English linguistics student specialising in critical discourse analysis. She was in the process of finishing
her PhD thesis when the interview took place. Before attending the workshop, she had used AntConc a few times for her
research but never for her writing and did not know about BNCweb. She had, however, already been using a number of online
dictionaries, including an online collocation dictionary, and Google Scholar regularly during her writing. After the
workshop, she started using BNCweb as an additional resource.
Among the three types of resources, online dictionaries and Google Scholar are usually her first choices. When she has doubts
about certain expressions during writing, she first decides whether she can find answers in dictionaries. If she can, she will only
use dictionaries. Otherwise, she will search in Google Scholar. BNCweb is usually her last resort when she cannot find answers in
the first two resources. Her rationale for her order of preferences is that dictionaries are most reliable and easiest to use, while
Google Scholar is more convenient than BNCweb. BNCweb, in her opinion, is more reliable than Google Scholar, but it takes
longer to get answers. With online dictionaries and Google Scholar, it only takes one or two clicks to get answers, while in
BNCweb it usually takes at least three clicks. As for AntConc, the student did not use this resource because she works on
multiple computers and therefore uses an online reference manager and stores all her reading materials and writing in cloud storage.
She thinks it is time-consuming to download all the PDFs and convert them to plain text files in order to create a personal
corpus. She also commented that, even if she created her own corpus, it would not be as big as BNCweb or Google Scholar. She,
therefore, does not see the necessity of having an additional corpus that might not provide sufficient information.
2 It is worth noting that students who agreed to be interviewed were usually using corpora and found them useful. This raises the question of whether
other students who did not participate in the interview continued to use corpora after the workshop or not. A follow-up study, featuring surveys and
interviews, is planned for the near future. Hopefully, more detailed information about students’ long-term use of corpora will be revealed in this study.
The three cases show that these students continued to use corpora after the workshop to a greater or lesser extent.
Although the way they adopted corpora varies, their purposes for using corpora have all been to explore alternative/correct
expressions for their research writing. The cases also show that time is a very important factor in learners’ choice of writing
aids. Although many liked the idea of building their own corpora (see Section 4.1), they might not do so because they consider
it time-consuming. What these students desire most is tools that provide instant answers with straightforward interfaces. It is
regrettable to see that students abandoned AntConc, mainly because of the time required to build their own corpus, given that
students reacted very positively to AntConc during the workshop. One way to avoid this in future workshops could be to ask
students to bring 20 to 30 PDFs of high-quality RAs in their field to the workshop, as Charles (2012) and Anthony (2016c) did
with their students. In this way, students would already have a corpus of their own by the end of the session. Certainly,
these three cases (and others we have collected), along with the quantitative questionnaire findings reported above, provide
food for thought and ideas for improvement in future possible iterations of our project.
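Once a set of RAs has been converted to plain text (for example with a converter such as AntFileConverter, which appears in the reference list), assembling the files into a searchable personal corpus takes very little code. The sketch below assumes a folder of UTF-8 `.txt` files; the folder layout and the function names `load_corpus` and `corpus_size` are illustrative assumptions and not part of the workshop materials.

```python
import re
from pathlib import Path

def load_corpus(folder):
    """Read every .txt file in `folder` into a {filename: text} dict."""
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        corpus[path.name] = path.read_text(encoding="utf-8")
    return corpus

def corpus_size(corpus):
    """Total number of word tokens across all files."""
    return sum(len(re.findall(r"\b\w+\b", text)) for text in corpus.values())
```

Calling `load_corpus` on a hypothetical folder such as `my_ras` and passing the result to `corpus_size` gives a quick sense of how a collection of 20 to 30 articles compares in size with a general corpus, which was one of the concerns the interviewees raised.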
6. Concluding remarks
It might be argued that this study would have been more valuable if post-workshop measurement of the learning outcome
had been implemented. However, given the fact that all students voluntarily signed up for the workshop and that they all have
tight schedules, finishing research projects and producing as many publications as possible for future career development, it
would have been very difficult to ask them to participate in post-workshop assessment. Running a control group alongside the
experimental group would also have been problematic. The purpose of this project was to follow the successful examples of the
implementation of corpus-based pedagogy reviewed in Section 2 and to introduce the DDL approach to as many PhD students in
Hong Kong as possible. The effectiveness of DDL, according to Boulton and Cobb (2017), has been supported by numerous
empirical studies. This article therefore is primarily an account of an educational innovation, not an experimental study.
Judging by the post-workshop evaluation, the workshops were warmly welcomed by the PhD students. The great majority
of them were not familiar with DDL prior to the workshop, but they were very open to the new approach and could
immediately see the value of corpus consultation in academic writing. It must be borne in mind, however, that those
attending the workshops represented a self-selected cohort, so were more likely to be well-disposed to what they were
offered than a cohort that had been required to attend. The success of the workshop also indicates students’ perceived needs
for writing support. They found the workshop useful mainly because 1) there were hands-on activities throughout the
workshop, 2) the teacher’s step-by-step demonstration made the learning process easier, and 3) the activities were closely
related to academic writing problems they were experiencing.
An important question for consideration is the intensity of a single 3.5-h workshop. A similar workshop was run in an
Italian university recently, spreading the same content over two separate 3-h sessions and the issue of workshop duration or
intensity was never mentioned by the participants during post-workshop evaluation. However, those students were studying
for PhDs in language and communication. Students in other disciplines may be less willing to give up more time to what they
may consider to be a peripheral activity. At the beginning of the project, the option of a 2-h þ 2-h workshop was offered at one
university in order to find the best format for delivering the workshop, but very few students selected that option.
In line with previous studies (Charles, 2011, 2015; Yoon, 2008), the time factor is crucial in getting learners into the habit of
using corpora. In this study, the time factor also influenced learners’ selection of corpus tools. Being able to generate results
with only a few clicks made BNCweb (and Google Scholar) more favoured by the learners, even though they recognised the
benefits of AntConc and of creating their own personalised corpora. Perhaps, if we had devoted more time to practising with
AntConc and/or encouraged students to come prepared with a collection of RA PDFs, as suggested earlier, there would have
been a greater uptake.
Overall computer literacy is a further matter that may influence the learning experience and outcome of students attending
our workshops. PhD students who have a computer science background or know about programming tend to find it easy to
learn how to formulate search syntax (see also L. Flowerdew, 2015a). Certain students were already found to be using regular
expressions in BNCweb during the workshop, exploring the corpus in their own ways. A number of times after the workshop,
students came to show the teacher a few useful computer tricks for corpus consultation. However, there was also a considerable
number of students from Humanities and Social Science backgrounds who needed more time to digest and memorise search
syntax. In this study, having a teaching assistant on the spot to provide instant individualised help worked well, especially when
there was a large group of over 30 students. Regardless of students’ computer skills, providing timely support after the
workshop is important in helping students get into the long-term routine of using corpora. However, as suggested by the
afore-mentioned low take-up of the offer of post-workshop help and the experience of our second case study, students might be
reluctant to contact the teacher even when they encounter difficulties. So, this issue also provides food for thought.
In synthesis, the findings of this study indicate that, when semester-long courses are not feasible, intensive introductory
workshops can be an effective alternative for introducing PhD candidates to DDL. The take-up and success of our workshops
also suggest that there is a need for writing support among research postgraduate students in Hong Kong, a fact which is
probably the case in other countries or regions as well. Hopefully, the success of our workshops will encourage the Hong Kong
government or individual universities to put more resources into corpus training for PhD (and other) students. Furthermore,
we hope that findings from this study may be of some value to researchers and educators who wish to introduce the DDL
approach to students in their own institutions.
Funding
This work was supported by the University Grants Committee of Hong Kong government under the UGC Language-related
Collaborative Projects Initiative [Project No.: 6361001].
Acknowledgements
We would like to thank the project co-leader Professor Winnie Cheng of Hong Kong Polytechnic University, the research
assistant Ms Zoe Wei, and the expert consultants Dr Margaret Cargill and Professor Laurence Anthony for their valuable
contributions.
We are also grateful to Editor Nigel Harwood and the anonymous reviewers for their very helpful feedback on earlier
versions of this paper.
References
Anthony, L. (2016a). AntConc (Version 3.4.4) [Computer Software]. Tokyo, Japan: Waseda University. Available from: http://www.laurenceanthony.net/.
Anthony, L. (2016b). AntFileConverter (Version 1.2.0) [Computer Software]. Tokyo, Japan: Waseda University. Available from: http://www.laurenceanthony.net/.
Anthony, L. (2016c). Introducing corpora and corpus tools into the technical writing classroom through Data-Driven Learning. In J. Flowerdew, & T. Costley
(Eds.), Discipline specific writing (pp. 162-180). London: Routledge.
Aston, G. (1997). Small and large corpora in language learning. In B. Lewandowska-Tomaszczyk, & P. J. Melia (Eds.), Proceedings of International Conference on
Practical Applications in Language Corpora (pp. 51-62). Łódź: Łódź University Press.
Belcher, D. (2007). Seeking acceptance in an English-only research world. Journal of Second Language Writing, 16, 1-22.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. London: Longman.
Boulton, A., & Cobb, T. (2017). Corpus use in language learning: A meta-analysis. Language Learning, 67, 348-393.
Boulton, A., Carter-Thomas, S., & Rowley-Jolivet, E. (2012). Issues in corpus-informed research and learning in ESP. In A. Boulton, S. Carter-Thomas, & E.
Rowley-Jolivet (Eds.), Corpus-informed research and learning in ESP: Issues and applications (pp. 1-14). John Benjamins Publishing.
Cargill, M., & O’Connor, P. (2006). Developing Chinese scientists’ skills for publishing in English: Evaluating collaborating-colleague workshops based on
genre analysis. Journal of English for Academic Purposes, 5, 207-221.
Cargill, M., O’Connor, P., & Li, Y. (2012). Educating Chinese scientists to write for international journals: Addressing the divide between science and
technology education and English language teaching. English for Specific Purposes, 31, 60-69.
Chang, J.-Y. (2014). The use of general and specialized corpora as reference sources for academic English writing: A case study. ReCALL, 26, 243-259.
Charles, M. (2007). Reconciling top-down and bottom-up approaches to graduate writing: Using a corpus to teach rhetorical functions. Journal of English for
Academic Purposes, 6, 289-302.
Charles, M. (2011). Using hands-on concordancing to teach rhetorical functions: Evaluation and implications for EAP writing classes. In A. Frankenberg-
Garcia, L. Flowerdew, & G. Aston (Eds.), New trends in corpora and language learning (pp. 81-104). London/New York: Continuum International Pub-
lishing Group.
Charles, M. (2012). ‘Proper vocabulary and juicy collocations’: EAP students evaluate do-it-yourself corpus building. English for Specific Purposes, 31, 93-102.
Charles, M. (2014). Getting the corpus habit: EAP students’ long-term use of personal corpora. English for Specific Purposes, 35, 30-40.
Charles, M. (2015). Same task, different corpus: The role of personal corpora in EAP classes. In A. Leńko-Szymańska, & A. Boulton (Eds.), Multiple affordances
of language corpora for data-driven learning (pp. 129-154). Amsterdam/Philadelphia: John Benjamins.
Chen, M., & Flowerdew, J. (under review). A critical review of research and practice in data-driven learning (DDL) in the academic writing classroom. International Journal of
Corpus Linguistics.
Cortes, V. (2014). Genre analysis in the academic writing class: With or without corpora? Quaderns de Filologia-Estudis Lingüístics, 16, 65-80.
Cotos, E. (2014). Enhancing writing pedagogy with learner corpus data. ReCALL, 26, 202-224.
Curry, M., & Lillis, T. (2004). Multilingual scholars and the imperative to publish in English: Negotiating interests, demands, and rewards. TESOL Quarterly,
38, 663-688.
Diani, G. (2012). Text and corpus work, EAP writing and language learners. In R. Tang (Ed.), Academic writing in a second or foreign language (pp. 45-66).
London: Continuum.
Eriksson, A. (2012). Pedagogical perspectives on bundles: Teaching bundles to doctoral students of biochemistry. In J. Thomas, & A. Boulton (Eds.), Input,
process and products. Developments in teaching and language corpora (pp. 195-211). Brno, Czech Republic: Masaryk University Press.
Flowerdew, J. (1996). Concordancing in language learning. In M. Pennington (Ed.), The power of CALL (pp. 97-113). Houston: Athelstan.
Flowerdew, J. (1999). Problems in writing for scholarly publication in English: The case of Hong Kong. Journal of Second Language Writing, 8, 243-264.
Flowerdew, J. (2000). Discourse community, legitimate peripheral participation, and the nonnative-English-speaking scholars. TESOL Quarterly, 34, 127-150.
Flowerdew, J. (2001). Attitudes of journal editors to nonnative speaker contributions. TESOL Quarterly, 35, 121-150.
Flowerdew, L. (2008). Corpus linguistics for academic literacies mediated through discussion activities. In D. Belcher, & A. Hirvela (Eds.), The oral-literate
connection. Perspectives on L2 speaking, writing, and other media interactions (pp. 268-287). Ann Arbor MI: University of Michigan Press.
Flowerdew, L. (2015a). Using corpus-based research and online academic corpora to inform writing of the discussion section of a thesis. Journal of English for
Academic Purposes, 20, 58-68.
Flowerdew, L. (2015b). Learner corpora and language for academic and specific purposes. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge
handbook of learner corpus research (pp. 465-484). Cambridge: Cambridge University Press.
Flowerdew, J., & Forest, R. W. (2009). Schematic structure and lexico-grammatical realisation in corpus-based genre analysis: The case of ’research’ in the
PhD literature review. In M. Charles, D. Pecorari, & S. Hunston (Eds.), Academic writing: At the interface of corpus and discourse (pp. 15-36). London:
Continuum.
Flowerdew, J., & Forest, R. W. (2015). Signalling nouns in English. Cambridge: Cambridge University Press.
Ghadessy, M., Henry, A., & Roseberry, R. (2001). Small corpus studies and ELT: Theory and practice (vol. 5). Amsterdam: John Benjamins Publishing.
Granger, S. (2004). Computer learner corpus research: Current status and future prospects. In U. Connor, & T. Upton (Eds.), Applied corpus linguistics: A
multidimensional perspective (pp. 123-145). Amsterdam: Rodopi.
Holmes, R. (1997). Genre analysis, and the social sciences: An investigation of the structure of research article discussion sections in three disciplines.
English for Specific Purposes, 16, 321-337.
Hyland, K. (2005). Metadiscourse. London: Continuum.
Hyland, K. (2008). Academic clusters: Text patterning in published and postgraduate writing. International Journal of Applied Linguistics, 18, 41-62.
Johns, T. (1991a). Should you be persuaded: Two examples of data-driven learning. ELR Journal, 4, 1-16.
Johns, T. (1991b). From printout to handout: Grammar and vocabulary teaching in the context of data-driven learning. ELR Journal, 4, 27-46.
Kwan, B. S. C. (2006). The schematic structure of literature reviews in doctoral theses of applied linguistics. English for Specific Purposes, 25, 30-55.
Kwan, B. (2010). An investigation of instruction in research publishing offered in doctoral programs: The Hong Kong case. Higher Education, 59, 55-68.
Lee, D., & Swales, J. (2006). A corpus-based EAP course for NNS doctoral students: Moving from available specialized corpora to self-compiled corpora.
English for Specific Purposes, 25, 56-75.
Leńko-Szymańska, A., & Boulton, A. (2015). Multiple affordances of language corpora for data-driven learning. Amsterdam: John Benjamins.
Li, Y., & Flowerdew, J. (2007). Shaping Chinese novice scientists’ manuscripts for publication. Journal of Second Language Writing, 16, 100-117.
Lores, R. (2004). On RA abstracts: From rhetorical structure to thematic organization. English for Specific Purposes, 23, 280-302.
Meunier, F. (2015). Developmental patterns in learner corpora. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus
Research (pp. 379-400). Cambridge: Cambridge University Press.
Swales, J. M. (1990). Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.
Tono, Y., Satake, Y., & Miura, A. (2014). The effects of using corpora on revision tasks in L2 writing with coded error feedback. ReCALL, 26, 147-162.
Tribble, C. (2013). Corpora in the language-teaching classroom. In A. Chapelle (Ed.), The Encyclopedia of Applied Linguistics (pp. 1175-1181). Oxford: Wiley-
Blackwell.
Tribble, C. (2015). Teaching and language corpora: Perspectives from a personal journey. In A. Leńko-Szymańska, & A. Boulton (Eds.), Multiple Affordances of
Language Corpora for Data-Driven Learning (pp. 37-62). Amsterdam: John Benjamins.
Yoon, H. (2008). More than a linguistic reference: The influence of corpus technology on L2 academic writing. Language Learning & Technology, 12, 31-49.
Yoon, H., & Hirvela, A. (2004). ESL student attitudes toward corpus use in L2 writing. Journal of Second Language Writing, 13, 257-283.
Dr Meilin Chen is a lecturer at Hong Kong Baptist University. She received her PhD from City University of Hong Kong and her research interests are in the
fields of learner corpus research, data-driven learning, English academic writing and language teaching.
Prof. John Flowerdew recently retired from City University of Hong Kong and is now based in the UK and is a visiting professor at Lancaster University and a
visiting research fellow at Birkbeck, University of London. He has published widely in corpus linguistics and language teaching and in discourse analysis and
EAP more generally.