Ej 1140879
Samuel Sambasivam
ssambasivam@apu.edu
Computer Science
Azusa Pacific University
Azusa, CA 91702 USA
Brian Rague
brague@weber.edu
Computer Science
Weber State University
Ogden, UT USA 84408
Stuart Wolthuis
stuartlw@byuh.edu
Computer and Information Sciences
Brigham Young University-Hawaii
Laie, HI 96762 USA
Abstract
In this research, we compare two languages, Java and Python, by performing a content analysis of
words in textbooks that describe important programming concepts. Our goal is to determine which
language has better textbook support for teaching introductory programming courses. We used the
TextSTAT program to count how often the words on our concept list appear in a sample of Java and Python
textbooks. We summarize and compare the results, leading to several conclusions that relate to the
choice of language for a CS0 or CS1 course.
_________________________________________________
©2017 ISCAP (Information Systems & Computing Academic Professionals) Page 4
http://iscap.info
Information Systems Education Journal (ISEDJ) 15 (3)
ISSN: 1545-679X May 2017
__________________________________________________________________________________________________________________________
When Computer Science programs at universities began to develop, the choice of an introductory
programming language was determined primarily by the curriculum designers, with an emphasis on
the pedagogical value of the language rather than its popularity or practicality in developing
real-world applications. As might be expected in the academic world, there was and still is a
diversity of opinion on what the first language should be (Siegfried, Chays, & Herbert, 2008).

The most recent Computer Science Curriculum Guidelines (2013) published by ACM/IEEE state that
"...advances in the field have led to an even more diverse set of approaches in introductory
courses [and these] approaches employed in introductory courses are in a greater state of
flux." Moreover, the report observes "...that rather than a particular paradigm or language
coming to be favored over time, the past decade has only broadened the list of programming
languages now successfully used in introductory courses".

In the 1970s and 1980s, Pascal became the language taught most often in introductory
programming courses. Eventually, many schools moved to C for practical reasons, since graduates
rarely used Pascal in their employment. As the benefits of object-oriented programming became
evident, the first language evolved to C++ and later to Java, which provides a more managed
development environment (de Raadt, Watson, & Toleman, 2002).

The tradeoffs of an object-first approach versus an imperative-first approach in introductory
courses have been extensively and hotly debated (Lister, 2006). This decision about which
programming paradigm to teach beginning students strongly influences the choice of introductory
language. Alternatively, some early courses in CS emphasized broader computing concepts rather
than the subtleties of programming syntax (Sooriamurthis, 2010). The paramount question
regarding the delivery of an effective introductory CS course remains "What to teach?",
followed immediately by "Which language best supports the concepts to be taught?".

In recent years, the increased demand for programming courses for liberal arts students has
led to the development of what are termed CS0 courses (with CS1 courses aimed at CS majors).
The preferred programming language for a CS0 course is often different from the language taught
in CS1. CS0 languages trend toward predominantly visual environments such as Alice, or more
dynamic popular choices such as Python.

Purpose of this Research
Much research has been performed over the last few decades on which language is best for an
introductory programming course (Brilliant & Wiseman, 1996). In an effort to contribute to this
discussion, our research focuses on two languages--Java and Python. These languages are
increasing in popularity for introductory courses, especially Python (Guo, 2014). Rather than
evaluate the usability or suitability of the languages within an introductory context, we
performed a content analysis (Krippendorff, 2012) of Java and Python textbooks to determine
how well they cover important CS0/CS1 programming concepts such as class and algorithm.

We developed a list of basic programming concepts that might be taught in an introductory
course. Initial sources used for developing this concepts list were drawn from various
instructional assessments, curriculum resources, and introductory course content that we
designed ourselves or researched. We then counted how often each textbook mentioned each
concept. We did not study the order in which the concepts were presented, nor did we judge how
well the concepts were explained. We simply summarized frequencies for the words that
represented each concept.

An instructor in a programming course usually chooses a textbook to guide how she/he will
organize and present the material. Our main research assumption is that the framework of the
author is reflected by the words used most often in the textbook. The framework we are
evaluating is one that is appropriate for introductory programming. From the author's choice of
words, we can judge how suitable the textbook will be for teaching the main concepts of the
programming course.

2. METHODOLOGY

This section of the paper describes the methodology used to collect word frequency data from
selected Java and Python textbooks. The words we examine represent important concepts for an
introductory programming course.

Programming Concepts
We created a list of important programming concepts from several sources. We started with an
initial list of programming terms taken from quizzes and exams we have given to CS1
students to measure their understanding of course topics. In earlier research, we performed a
word frequency analysis of object-oriented programming (OOP) textbooks (representing a variety
of languages) to empirically reveal frequent OOP concepts. We used the results of that study to
form a list of OOP words.

In the current study, we created a list consisting of programming concepts mentioned in the
Programming Fundamentals (PF) section of the Computing Curricula 2001 Computer Science Final
Report (2001). We created an additional list of concepts based on the Software Development
Fundamentals (SDF) section of the Computer Science Curricula 2013 Final Report.

Reader to create a text file for each of the 20 textbooks in our study.

We noticed that the text file versions of the books included many character strings that
contained digits, punctuation, and other non-alphabetic symbols. To simplify our counting of
concept words, we wrote a short program (in Python) that removed all non-letter symbols and
replaced them with blank characters. This program also converted all letters to lower-case. We
used this program to obtain a filtered set of 20 text files which consisted of only letters and
blanks. Note that none of the targeted word groups contains a numeric or special character.
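
The filtering step described above (replace every non-letter with a blank, then convert to
lower-case) might look roughly like the sketch below. This is not the authors' program, which
is not reproduced in the paper. The function names filter_text and count_concept are
hypothetical, the sketch assumes that "letter" means the ASCII letters A-Z and a-z, and the
counting helper only illustrates whole-word counting; the study itself used TextSTAT for
counting.

```python
import re
from collections import Counter


def filter_text(raw: str) -> str:
    """Replace every non-letter character with a blank and lower-case the result.

    Assumption: only ASCII letters count as letters, so digits, punctuation,
    and accented characters all become blanks.
    """
    return re.sub(r"[^A-Za-z]", " ", raw).lower()


def count_concept(filtered: str, word_forms) -> int:
    """Count whole-word occurrences of any form in a concept group.

    `word_forms` is a hypothetical representation of a group such as
    ("loop", "looping").
    """
    counts = Counter(filtered.split())
    return sum(counts[form] for form in word_forms)


if __name__ == "__main__":
    sample = "A for-loop (see Ch.3) keeps Looping; loops end when i == 10."
    cleaned = filter_text(sample)
    print(cleaned)
    print(count_concept(cleaned, ("loop", "looping")))
```

Because counting is done on whole words after splitting on blanks, a form such as "loops" is
not counted unless it is listed in the concept group.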
Convert Word Counts to Word Rates
Because each textbook contains a different number of words, the actual word counts for
concepts are not comparable across books. Larger books tend to have larger word counts. To
standardize the counts, we converted each word count for a concept to a word rate. The rate we
chose was "per 100,000 words". That is, we divided the concept word count by the total number
of words in the book and multiplied by 100,000.

For example, Schildt's book mentioned above contains a total of 325,991 words. The word count
for the object concept is 2054. This count is rescaled to a word rate as shown below:

word rate = (2054/325,991)*100,000 = 630.1

This means that the object concept is mentioned 630.1 times per 100,000 words in Schildt's
book. Word rates were calculated for each concept in each book.

Calculate Trimmed Means
After concept word rates were obtained in all Java and Python textbooks, averages were
calculated separately for the Java and Python values. Because the word rates for concepts (Java
or Python) often varied widely from book to book, we calculated trimmed means (instead of the
usual untrimmed versions) to diminish the effect of outliers. To provide a conservative
treatment for these outliers, our trimmed means include only the middle 6 out of 10 word rates.
The top two and bottom two word rates are dropped.

For example, word rates for the object concept in all 10 Java textbooks are:
522.4 561.7 630.1 334.5 843.3
684.9 703.5 767.2 863.5 488.4

Removing the two highest rates (863.5 and 843.3) and two lowest rates (334.5 and 488.4), the
trimmed mean for object in the Java books is 645.0. Two trimmed means were calculated for each
concept, one for Java and the other for Python.

Distributions of Trimmed Means
Each set of books (Java and Python) provided a sample of 100 trimmed means, representing word
rates for the 100 concepts. A statistical description of the Java and Python distributions is
summarized in Table 1.

larger. This is primarily due to the greater number of concept words in the Java books.

Statistic      Java      Python
Sample N       100       100
Minimum        0.34      0.00
Centile 25     18.92     10.50
Median         58.00     38.05
Centile 75     134.27    116.68
Maximum        987.40    601.93
IQR            115.35    106.18
Mean           109.95    90.59

Table 1: Distributions of Trimmed Means

For the Java distribution, the maximum word rate is for the concept class, and the minimum word
rate is for decomposition. For Python, the maximum word rate is for function, while the minimum
word rate is (again) for decomposition. The Java median word rate is the midpoint between the
word rates of the two middle concepts stream and block. For Python, the two middle concepts are
block and event.

The mean of the Java word rates is almost twice the size of the median. This indicates that the
distribution is positively skewed, mainly due to the presence of several high word rates
(including the maximum value). The mean of the Python word rates is more than twice the size of
the median, indicating another positively skewed distribution.

The variability of scores in a distribution is usually described by the standard deviation.
However, this statistic is inflated when outliers are present. A more stable measure of
variation is the interquartile range IQR (Upton & Cook, 1996), which is the difference between
the 75th centile value and the 25th centile value. For Java, the 75th centile concept is
definition, and the 25th centile concept is link. The corresponding concepts for Python are set
(75th centile) and literal (25th centile).

The word rates for programming concepts tend to be higher in the Java books. Overall, 62 of the
100 concepts have a higher word rate in the Java books than in the Python books. The remaining
38 concepts appear more often in the Python books. Additional details and comparisons of these
two word rate distributions are presented in the following sections.
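
The two calculations described under "Convert Word Counts to Word Rates" and "Calculate Trimmed
Means" can be reproduced with a short sketch. It is an illustration only, not the authors'
code: the function names word_rate and trimmed_mean and the drop parameter are hypothetical.
The input numbers are the ones quoted in the text for the object concept.

```python
def word_rate(concept_count: int, total_words: int, per: int = 100_000) -> float:
    """Rescale a raw concept word count to a rate per `per` words."""
    return concept_count / total_words * per


def trimmed_mean(rates, drop: int = 2) -> float:
    """Average the rates after dropping the `drop` lowest and `drop` highest.

    With 10 rates and drop=2 this keeps the middle 6 values, as in the paper.
    """
    if len(rates) <= 2 * drop:
        raise ValueError("not enough values to trim")
    kept = sorted(rates)[drop:len(rates) - drop]
    return sum(kept) / len(kept)


# Word count for the object concept in Schildt's book (values from the text).
schildt_rate = word_rate(2054, 325_991)

# Word rates for the object concept in the 10 Java textbooks (from the text).
object_rates_java = [522.4, 561.7, 630.1, 334.5, 843.3,
                     684.9, 703.5, 767.2, 863.5, 488.4]

if __name__ == "__main__":
    print(round(schildt_rate, 1))                     # 630.1
    print(round(trimmed_mean(object_rates_java), 1))  # 645.0
```

Sorting before slicing keeps the trimming independent of the order in which the books are
listed.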
Eleven of the concepts appear on both lists, but in different ranked positions. This
demonstrates substantial agreement by authors on which concepts are most important in both
languages. Four concepts are on the Java list only, and four others are confined to the Python
list. The concepts that are not on both lists are shown in bold.

Among the Java concepts, the top three--class, method, and object--describe features of
object-oriented programming (OOP). These concepts are also on the Python list, but with lower
word rates. Six of the Java concepts--value, string, type, variable, array, and number--describe
data types and data structures. The Python list contains four of these concepts, but replaces
array with list and excludes variable.

The I/O concept file is on both lists, but has a higher word rate in the Python books. The Java
concept thread is rarely mentioned in the Python texts. Function and module are older terms used
to describe modular programming. Python retains these terms, whereas the Java books prefer the
OOP concepts method and class.

Least Frequent Concepts
The fifteen programming concepts with the lowest word rates for Java and Python are listed in
Table 3. Again, eleven of the concepts appear on both lists, but in different ranked positions.
This shows

The concepts that appear on both least-frequent lists include a few surprises. Some of these
concepts are often considered important by programming instructors. Certainly abstraction is a
key programming topic. Of the three pillars of OOP (encapsulation, inheritance, and
polymorphism), two are on both least-frequent lists. Thankfully, these textbooks spare
inheritance from such neglect. The signature concept, relevant to polymorphism, is rarely
mentioned.

Function and procedure were once distinct concepts in modular programming. Perhaps due to
compromises made in the design of the C language (and perpetuated in C++ and Java), the word
procedure has been replaced by "void" functions.

From the Software Engineering (SE) vocabulary, quality and maintainable are held in low regard
by both Java and Python textbooks. The concept of pointer has low word rates, although the
substitute term reference does appear more often in both sets of books. Keyword is more popular
than reserved word. Finally, almost none of the books contain decomposition, which is the least
frequent word on both lists. This concept embodies a core strategy in modular programming.
Middle Frequency Concepts
We have presented word rates for the top 15 and bottom 15 programming concepts, and now turn
our attention to the 70 concepts with middle-level usage rates. This list of concepts is too
long to include in a single table in the paper. Instead, in Table 4 we present 10 Software
Engineering concepts that have middle-level word rates in the programming textbooks.

Concept          Java Rate    Python Rate
problem          63.9         57.9
solution         32.1         48.1
requirement      29.9         42.8
specification    55.5         39.5
model            25.1         13.6
algorithm        34.9         22.5
design           49.2         12.3
test             85.5         138.2
style            21.1         17.7
document         40.5         44.0

Table 4: Middle Frequency Concepts
Software Engineering Words

Concepts on the list include problem (Java/Python rates 63.9/57.9) and solution (Java/Python
rates 32.1/48.1), reflecting the problem-solving focus in SE. The words requirement,
specification,

between the Java and Python word rates. For most programming concepts, a higher word rate in
the Java books should suggest a higher word rate in the Python books, and vice versa.

To measure the degree of linearity in the relationship, we calculated the Pearson correlation
coefficient. The correlation value we obtained for our 100 pairs of scores was 0.601, which is
positive but far from 1.0.

We do not claim that the relationship should be linear, but it should be monotonic. A better
statistic for monotonic relationships is the Spearman rank-order correlation (Maritz, 1995).
Our result for the Spearman statistic was 0.726, which describes a fairly strong increasing
relationship between Java and Python word ranks.

A scatter diagram of the word rate pairs, converted to ranks from 1 (highest rank) to 100
(lowest rank), is displayed as Figure 1.

[Figure 1: scatter diagram of Java and Python word rate ranks]
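
The two correlation statistics discussed above can be sketched in plain Python. This is an
illustrative implementation under stated assumptions, not the authors' procedure: the function
names are hypothetical, ties are given average ranks (the paper does not say how ties were
handled), and the sample input is only the ten Software Engineering pairs from Table 4, so the
printed values are not expected to match the 0.601 and 0.726 reported for all 100 concepts.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values so that 1 is the highest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Java and Python word rates for the ten SE concepts in Table 4.
java_rates = [63.9, 32.1, 29.9, 55.5, 25.1, 34.9, 49.2, 85.5, 21.1, 40.5]
python_rates = [57.9, 48.1, 42.8, 39.5, 13.6, 22.5, 12.3, 138.2, 17.7, 44.0]

if __name__ == "__main__":
    print(round(pearson(java_rates, python_rates), 3))
    print(round(spearman(java_rates, python_rates), 3))
```

Computing Spearman as the Pearson correlation of the ranks makes the difference between the
two statistics explicit: the rank version rewards any increasing relationship, not only a
linear one.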
Table 5: Largest Differences in Ranks
("Highest" rank is 1)

The choice of how large the difference in ranks should be to consider a concept an outlier is
subjective. In this table, we include all pairs in which the difference in ranks is 30 or
larger. A negative difference occurs when Python has a higher rank. A positive difference
favors Java. Note that all but two of the concepts in Table 5 have a higher Java rank.

We noted earlier that function and module are among the top fifteen concepts in word frequency
in Python books. This table indicates that these two popular Python concepts appear much less
often in Java books. Three OOP concepts--constructor, component, and interface--are favored by
Java books.

The data concepts array, declaration, and constant appear less often in Python books for
various reasons. Python prefers lists over arrays. Variables are not overtly declared in
Python. Stream I/O, as a generalization of file I/O, is implemented in Java as stream classes.
Real-time events and threads are common features in Java, but not in Python.

4. SUMMARY AND CONCLUSIONS

The choice of programming language for introductory Computer Science courses is a strong
indicator of the concepts emphasized during course instruction. Ongoing discussion about what
to teach and which language tool best supports learning objectives for introductory programming
courses continues unabated among instructors, administrators, and accreditation organizations.
A definitive “best practices” approach in this area remains unresolved. Our current work
further informs this debate by correlating core programming concepts with

First, words that describe our 100 programming concepts have a greater density (higher word
rates) in the Java books in our study. The word rate distribution for Java has a mean of
109.95, with a maximum value of 987.40. For Python, the mean is 90.59, with a maximum of
601.93.

Second, there is remarkable agreement between the programming concepts mentioned most often in
the Java and Python books. Eleven of the top 15 Java concepts are also included in the top 15
Python concepts. Highly-used concepts for both languages include class, object, and method,
each representing OOP.

Third, there is also agreement on which concepts are rarely mentioned in both sets of books.
Eleven of the bottom 15 Java concepts are also in the list of 15 least-used Python concepts.
Common neglected concepts include encapsulation and polymorphism for OOP, plus SE concepts
quality and maintainable. It is disappointing that abstraction is on both bottom 15 lists.

Fourth, several concepts appear on only one of the top 15 or bottom 15 word lists for Java and
Python. The top 15 Java-only concepts include array and variable. Among the top 15 Python-only
concepts, array is replaced by list, and other concepts are added. The bottom 15 Java concepts
include module, which is a top 15 concept for Python. The bottom 15 Python list includes
thread, which is a top 15 concept for Java.

Fifth, a fairly strong increasing relationship exists between concept ranks for Java vs.
Python, as indicated by a rank-order correlation of 0.726. There are a few clear exceptions to
this relationship. Thread, constructor, and declaration have much higher Java ranks. Module and
function have much higher Python ranks.
Sixth, Java and Python textbooks devote substantial attention to practical concepts that
describe how to write code. Discussion of Software Engineering concepts that deal with how to
think like a programmer and write efficient, maintainable code receives less attention. This
learning goal may be less important in an introductory programming course, but it becomes a
major focus as students progress through a Computer Science degree program.

Overall, both Java and Python books provide reasonable levels of support for most of the
programming concepts we considered. The choice of Java or Python (or other language) for an
introductory class should be based on considerations beyond textbook support for important
concepts. Whatever language and textbook are chosen, instructors must be prepared to provide
additional material to achieve their desired course objectives.

Computing Machinery, IEEE Computer Society, 2013.

deRaadt, Michael, Watson, Richard, and Toleman, Mark, “Language Trends in Introductory
Programming Courses,” InSITE, June 2002.
proceedings.informingscience.org/IS2002Proceedings/papers/deRaa136Langu.pdf

Guo, Philip, "Python is Now the Most Popular Introductory Teaching Language at Top U.S.
Universities." Communications of the ACM, Blogs, 2014.

Hertz, Matthew, "What do 'CS1' and 'CS2' Mean? Investigating Differences in the Early
Courses." SIGCSE Proceedings, Milwaukee, 2010.

Huning, M, TextSTAT 2.7 User’s Guide. TextSTAT, created by Gena Bennett, 2007.
Roberts, Eric S., The Art and Science of Java (Preliminary Draft). Stanford University, 2006.

Schildt, Herbert, Java: The Complete Reference (7th ed). McGraw-Hill, 2007.

Sierra, Kathy, and Bert Bates, Head First Java (2nd ed). O'Reilly.

Stein, Lynn Andrea, Interactive Programming in Java. Lynn Andrea Stein, 1999.

Wu, C. Thomas, An Introduction to Object-Oriented Programming with Java (5th ed).
McGraw-Hill, 2010.

Kuhlman, Dave, A Python Book: Beginning Python, Advanced Python, and Python Exercises. Dave
Kuhlman, 2009.

Lutz, Mark, Programming Python (4th ed). O'Reilly, 2011.

Maruch, Stef, and Aahz Maruch, Python for Dummies. Wiley, 2006.

Payne, James, Beginning Python: Using Python 2.6 and Python 3.1. Wiley Publishing, 2010.

Pilgrim, Mark, Dive Into Python. Mark Pilgrim, 2004.

Zelle, John M., Python Programming: An Introduction to Computer Science (Version 1). Wartburg
College, 2002.
APPENDIX
Table 6: Concept Word Rate Trimmed Means for Java and Python
Java Python Java Python
Concept Rate Rate Concept Rate Rate
1 abstraction 5.9 0.6 51 literal 14.0 10.5
2 algorithm 34.9 22.5 52 local 36.2 36.0
3 argument 114.4 142.7 53 loop/looping 112.6 152.5
4 array 272.2 7.8 54 maintain/maintainable 7.1 5.8
5 assignment/assign 53.7 55.8 55 method 949.8 298.9
6 block 56.9 38.4 56 model/modeling 25.1 13.6
7 boolean 82.0 19.8 57 module 1.3 235.8
8 branch/branching 3.3 3.1 58 nest/nested 23.0 22.4
9 case 127.0 81.0 59 number/numeric 251.4 319.7
10 character 120.0 119.6 60 object 645.0 336.7
11 class 987.4 297.0 61 operation/operator 139.1 157.7
12 code 213.2 300.6 62 output 106.8 80.0
13 component 100.4 17.2 63 parameter 92.7 84.0
14 condition/conditional 49.1 53.1 64 pattern 37.1 32.5
15 constant 63.1 6.6 65 pointer 4.2 2.8
16 constructor 141.1 9.9 66 polymorphism 5.5 2.5
17 control 61.7 22.7 67 problem 63.9 57.9
18 correct/correctness 21.2 18.1 68 procedure 4.7 1.6
19 data 133.5 175.5 69 process/processing 61.7 74.0
20 debug/debugging 8.1 15.0 70 program 460.6 462.1
21 declaration/declare 80.9 7.6 71 quality 0.6 1.5
22 decomposition/decompose 0.3 0.0 72 queue 16.1 0.6
23 definition/define 134.3 95.1 73 record 7.9 6.9
24 design 49.2 12.3 74 recursion/recursive 25.0 28.0
25 development/develop 23.9 27.5 75 reference 84.2 34.4
26 documentation/document 40.5 44.0 76 relation/relational 5.4 6.6
27 dynamic/dynamically 9.3 7.6 77 requirement/require 29.9 42.8
28 efficient/efficiency 12.7 9.9 78 reserved 5.1 3.9
29 encapsulation/encapsulate 9.3 4.0 79 scope 12.5 7.7
30 error 77.9 102.9 80 selection 13.1 10.9
31 event 152.8 37.7 81 sequence 50.3 67.2
32 exception 125.3 89.7 82 set 142.4 116.7
33 expression 98.1 111.0 83 signature 7.9 1.5
34 file 216.9 372.0 84 software 20.2 21.1
35 floating/floating-point 13.5 16.7 85 solution/solve/solving 32.1 48.1
36 function 24.8 601.9 86 specification/specify 55.5 39.5
37 identifier 11.8 9.8 87 stack 56.2 9.7
38 implementation/implement 144.4 45.2 88 statement 212.1 203.1
39 index 60.5 74.2 89 stream 59.1 5.1
40 information 68.4 72.2 90 string 399.8 410.4
41 inheritance/inherit 44.1 21.1 91 structure 33.5 44.7
42 input 74.6 128.9 92 style 21.1 17.7
43 instance 137.3 110.4 93 system 253.7 55.5
44 integer 116.0 94.0 94 test/testing 85.5 138.2
45 interface 161.0 44.4 95 thread 188.2 0.6
46 iteration/iterate 11.7 20.5 96 tree 16.8 19.6
47 keyword 21.4 23.1 97 type 369.5 204.0
48 line 146.4 263.7 98 user 110.9 151.7
49 link/linked 18.9 17.4 99 value 477.5 451.1
50 list 137.1 487.0 100 variable 288.6 164.8