Assignment 1 Specification - T1 - 2023 - COIT12209

Assessment 1 for COIT12209 - Data Science is due on April 18, 2023, and accounts for 40% of the total marks. Students must complete individual tasks using R code on the Titanic dataset, including data loading, analysis, and visualizations, while documenting their findings in a report. The assessment will be graded based on the quality of R code, analysis of outputs, and overall report presentation.

COIT12209 - DATA SCIENCE

Assessment 1 Specification

Due date: Week 6 Tuesday (18 April 2023) 11:45 pm AEST

Weighting: 40%

Length: No fixed length

Objectives

Assessment 1 relates to unit learning outcomes 1 and 2, as stated in the unit profile. This assessment contributes
40% of the total marks.

Assessment 1 is an individual assessment. You are assigned tasks that assess the knowledge you gained in
weeks 1 to 5 about different facets of data science. You are required to write and execute R code for the
given tasks. You are also required to write a report containing the R code, output screenshots showing the
answers to the questions, and an analysis of the generated outputs for tasks 1-10 provided below.

Please note that ALL submitted Assessment 1 reports are passed through a computerised copy detection
system, and it is extremely easy for the teaching staff to identify copied or otherwise plagiarised work.

• Copying (plagiarism) can incur penalties ranging from deduction of marks to failing the course
or even exclusion from the University.

• Please ensure you are familiar with the Academic Misconduct Procedures, available from:
https://www.cqu.edu.au/policy

The tasks

You will use the R language for the data analysis exercises provided in this assessment. These tasks will help
build your knowledge of data formats, storage, retrieval, and analysis techniques.

You are required to work on the Titanic dataset from the Moodle site. First, download the given dataset
into your working directory.
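
Before starting the tasks, it can help to confirm that R is looking in the directory where the dataset was saved. The following is a minimal sketch only; the file name "titanic.csv" is an assumption, so substitute the actual name of the file downloaded from Moodle.

```r
# "titanic.csv" is a hypothetical file name; use the name of the file
# you actually downloaded from Moodle.
data_file <- "titanic.csv"

# Show the current working directory, where R resolves relative paths.
working_dir <- getwd()
print(working_dir)

# Check whether the file is present before trying to read it.
if (file.exists(data_file)) {
  message("Found ", data_file, " in ", working_dir)
} else {
  message("Could not find ", data_file, "; check list.files() or setwd().")
}
```

If the file is not found, either move it into the directory printed above or change the working directory with setwd().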

For each task, write R code, generate the output by executing the R code on the given dataset, and save the
output screenshots. Save all R source code, output screenshots, and your analysis of the generated outputs in
an MS Word file. This Word file must be submitted as a report for marking. Each task should be
numbered correctly for marking.

The data analysis tasks are given as follows.

1. Write R code to load the Titanic dataset into a local variable called “titanic”. Include your
screenshot. (1 mark)
2. Write R code to see the number of variables and records in the given dataset. (1 mark)
3. Write R code using the tail() function to view the last 3 rows of the given dataset. (1 mark)
4. Write R code to generate a summary of information on the given dataset that should include the
minimum, maximum, and mean. Write your explanation of the extracted results. (2 marks)

5. Write R code to check for missing values and create a heat map of the missing values. Write your
explanation with a screenshot. (Hint: install the Amelia package and use its missingness map function, missmap.)
(3 marks)

6. Write R code to show a histogram of passenger class. Write your explanation of the generated graph.
(3 marks)

7. Write R code to show a histogram of the child, adult and senior age groups, and provide your explanation. (3 marks)

(Hint: you need to make age a categorical variable, e.g. if age <= 18 then the category is child or youth, and so
on.)

8. Write R code to generate a ggplot to show the relationship between sex and survival. Write your
explanation. (3 marks)

9. Write R code to generate a ggplot to show if there is a correlation between Fare and Survival. Write
your explanation. (3 marks)

10. Write a reflection on how this data analysis knowledge can be used in the future (4 marks).
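
The hints for tasks 5 and 7 can be illustrated on a small made-up data frame. This is a sketch under stated assumptions, not a model answer: the column names (Pclass, Age), the toy values, and the age cut-offs of 18 and 60 are all illustrative and may not match the Moodle dataset or the categories you choose to justify in your report.

```r
# Toy data frame standing in for the Titanic data; the column names
# (Pclass, Age) are assumptions and may differ in the Moodle file.
toy <- data.frame(
  Pclass = c(1, 3, 2, 3, 1),
  Age    = c(4, 35, NA, 71, 18)
)

# Task 5 hint: count missing values per column with base R.
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# If the Amelia package is installed, missmap() draws the missingness map.
if (requireNamespace("Amelia", quietly = TRUE)) {
  Amelia::missmap(toy, main = "Missing values (toy data)")
}

# Task 7 hint: turn numeric age into a categorical variable.
# The cut-offs 18 and 60 are illustrative only; choose and justify your own.
# cut() uses right-closed intervals by default, so age 18 falls in "child".
toy$AgeGroup <- cut(
  toy$Age,
  breaks = c(-Inf, 18, 60, Inf),
  labels = c("child", "adult", "senior")
)
print(table(toy$AgeGroup, useNA = "ifany"))
```

Rows with a missing age stay missing (NA) in the new category column, which is why the missing-value check in task 5 is worth doing before plotting the age groups.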

Assessment Criteria

Assessment 1 will be marked based on the following criteria.

Working R source code provided and screenshots: 12 marks.
Analysis presented on the generated outputs: 24 marks.
Well-written report: 4 marks.

Total: 40 marks

Submission Requirement

Reports are to be written in 12-point Arial font and double-spaced. You are required to submit two files on the
Moodle website:
1. The report, called [StudentID]-report.docx.
2. A ZIP file, called [StudentID]-files.zip, containing all R script files.

Help

A general discussion forum for assessment questions has been set up on the unit Moodle website.

• Please use the forum to help you work through your assessment.

• If you have any specific queries, please feel free to email the unit coordinator and/or your
campus tutor.
