Assignment 1 Specification - T1 - 2023 - COIT12209
Assessment 1 Specification
Due date: Week 6 Tuesday (18 April 2023) 11:45 pm AEST
Weighting: 40%
Objectives
Assessment 1 relates to unit learning outcomes 1 and 2, as stated in the unit profile. This assessment is worth
40% of the total marks for the unit.
Assessment 1 is an individual assessment. Its tasks assess the knowledge of different facets of data science
that you have gained in weeks 1 to 5. You are required to write and execute R code for the given tasks. You are
also required to write a report containing the R code, output screenshots showing the answers to the questions,
and an analysis of the generated outputs for tasks 1 to 10 below.
Please note that ALL submitted Assessment 1 reports are passed through a computerised copy detection
system, and it is extremely easy for the teaching staff to identify copied or otherwise plagiarised work.
• Copying (plagiarism) can incur penalties ranging from deduction of marks to failing the course
or even exclusion from the University.
• Please ensure you are familiar with the Academic Misconduct Procedures, available from:
https://www.cqu.edu.au/policy
The tasks
You will use the R language for the data analysis exercises in this assessment. These tasks will help you
build your knowledge of data formats, storage, retrieval, and analysis techniques.
You are required to work with the Titanic dataset available on the unit Moodle site. First, download the
dataset into your R working directory.
For each task, write R code, generate the output by executing the code on the given dataset, and save
screenshots of the output. Save all R source code, output screenshots and your analysis of the generated
outputs in an MS Word file, which you will submit as your report for marking. Number each task correctly
so that it can be marked.
1. Write R code to load the Titanic dataset into a variable called “titanic”. Include a screenshot of the
output. (1 mark)
2. Write R code to display the number of variables (columns) and records (rows) in the given dataset. (1 mark)
3. Write R code using the tail() function to view the last 3 rows of the given dataset. (1 mark)
4. Write R code to generate a summary of the given dataset that includes the minimum, maximum and mean
values. Explain the extracted results. (2 marks)
5. Write R code to check for missing values and create a heat map of the missing values. Explain the result
and include a screenshot. (Hint: install the Amelia package and use its missmap() function.)
(3 marks)
6. Write R code to show a histogram of passenger class. Explain the generated graph.
(3 marks)
7. Write R code to show a histogram of the age groups child, adult and senior, and provide your explanation.
(Hint: you need to convert age into a categorical variable, e.g. if age <= 18, then the category is child or
youth, and so on.) (3 marks)
8. Write R code to generate a ggplot to show the relationship between sex and survival. Write your
explanation. (3 marks)
9. Write R code to generate a ggplot to show whether there is a correlation between fare and survival. Write
your explanation. (3 marks)
10. Write a reflection on how this data analysis knowledge can be used in the future. (4 marks)
Assessment Criteria
Total: 40 marks
Submission Requirement
Reports must be written in 12-point Arial font and double-spaced. You are required to submit two files on the
Moodle website:
1. The report, called [StudentID]-report.docx.
2. A ZIP file, called [StudentID]-files.zip, containing all R script files.
Help
A general discussion forum for assessment questions has been set up on the unit Moodle website to help you
communicate.
• Please use the forum to help you work through your assessment.
• If you have any specific queries, please feel free to email the unit coordinator and/or your
campus tutor.