Evolution and genomics, Cesky Krumlov

Daniel Berner
2 February 2016

Genomic analyses using RADseq:

1. Raw data manipulation
Demo
Upload and inspection of a raw Illumina sequence data set
(stickleback RAD sequences from the Misty system, Canada,
unpublished; 100k subset from a single SE100 Illumina lane)
library(ShortRead)
# d <- readFastq(dirPath = 'C:/Users/daniel/Documents/science/teaching/cesky krumlov 2016/course.materials/R.files',
#                pattern = 'illumina.SE100.fastq', withIds = TRUE)
d

## class: ShortReadQ
## length: 100000 reads; width: 100 cycles
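The upload step can be tried without the course file. The following is a minimal sketch that writes two invented FASTQ records to a temporary file and reads them back with readFastq(); the records, the object name toy and the temporary path are assumptions made for illustration and are not part of the stickleback data set.

```r
library(ShortRead)

# Two invented FASTQ records (4 lines each); not the stickleback data
toy.lines <- c(
  "@read1", "CGATATGCAGGACGTACGTA", "+", strrep("I", 20),
  "@read2", "CGATANGCAGGACGTACGTA", "+", strrep("I", 20)
)
toy.file <- tempfile(fileext = ".fastq")   # throwaway location
writeLines(toy.lines, toy.file)

# readFastq() also accepts the full path to a single file
toy <- readFastq(toy.file, withIds = TRUE)
toy
length(toy)          # number of reads
unique(width(toy))   # read length(s)
```

With a real data set, dirPath and pattern would point to the directory and file name as in the commented call above.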

# Call the read IDs of the ShortRead object

id(d)[1:4] # just the first four elements

## A BStringSet instance of length 4

## width seq
## [1] 53 BS-DSFCONTROL03:312:C3...1101:1232:2073 1:Y:0:
## [2] 53 BS-DSFCONTROL03:312:C3...1101:1213:2079 1:Y:0:
## [3] 53 BS-DSFCONTROL03:312:C3...1101:1185:2091 1:Y:0:
## [4] 53 BS-DSFCONTROL03:312:C3...1101:1102:2093 1:Y:0:

# Call the sequences
sread(d)[1:5]

## A DNAStringSet instance of length 5

## width seq
## [1] 100 CTGAATGCAGGTCATTTTGGTN...CGGACGNNNTCCCTGCATCCT
## [2] 100 TAGCATGCAGGAAGTCCGTTGN...CAACTCNNNAAAATTTGCCAA
## [3] 100 GCGCCTGCAGGGCTTTATCCAG...GCTGCTGNNCAGATGTCCTCC
## [4] 100 AGGTATATGCACAAAATGAGAT...CAATCTTNNCAAGCAACAGCA
## [5] 100 NCACGTGCACCAAAAAAAGAGT...NNNNNNNNNNNNNNNNNNNNN

# ... and their qualities

quality(d)[1:5]

## class: FastqQuality
## quality:
## A BStringSet instance of length 5
## width seq
## [1] 100 ;;<;@@2<@2222;@<<><;=#...#####################
## [2] 100 <<<??@<@=222@@@???@??#...#####################
## [3] 100 <<<@@@2<@22<@@@@?@?@@@...#####################
## [4] 100 <<<>?26@=))9>?@@@@@?<?...#####################
## [5] 100 #07>@222:))2=>@?@<;9>=...#####################
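The quality strings shown above are encoded characters, one per base. As a small sketch of how they can be turned into numbers, the example below builds a FastqQuality object from two invented strings and decodes it to integer Phred scores; the strings and the object names q and q.mat are assumptions for illustration.

```r
library(ShortRead)

# Invented quality strings; 'I' and '#' are just example symbols
q <- FastqQuality(c("IIII#", "II###"))

# Decode to a reads-by-cycle matrix of integer Phred scores
# (FastqQuality uses the Phred+33 encoding, so 'I' is 40 and '#' is 2)
q.mat <- as(q, "matrix")
q.mat
rowMeans(q.mat)   # mean quality per read
```

Applied to the demo object, the same coercion would be as(quality(d), "matrix"), which makes the low-quality '#' tails of the reads visible as runs of 2.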

Pattern matching, counting

grep("TATATATATATATATATATA", sread(d))

## [1] 6053 15441 82461 84384

match <- grep("TATATATATATATATATATA", sread(d))

length(match)

## [1] 4
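Counts are often more useful as proportions. The sketch below repeats the pattern search on three invented reads (standing in for sread(d)) with a shortened motif, and also shows vcountPattern() from Biostrings as one possible alternative that returns a count per read; the reads and the motif are assumptions for illustration.

```r
library(ShortRead)   # also attaches Biostrings

# Invented reads standing in for sread(d)
reads <- DNAStringSet(c("ACGTATATATAC", "GGGGCCCCAAAA", "TATATATTTTTT"))

# Indices of reads containing the motif (explicit conversion to character)
hits <- grep("TATATA", as.character(reads))
hits
length(hits) / length(reads)   # proportion of reads with the motif

# Number of motif matches in each read
vcountPattern("TATATA", reads)
```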

Subsetting of ShortRead object

d.match <- d[match]

sread(d.match)

## A DNAStringSet instance of length 4

## width seq
## [1] 100 TAGCATGCAGGGAGGCCTGTGT...TATTTTACACACAACGACAGA
## [2] 100 CTAGGTGCAGGTACAGTGATCG...TGCCTGCTCCCGACCGGCTTC
## [3] 100 CTGATGCAGGACAGGTCCTCCC...ATATATATATATATATATATC
## [4] 100 TAATGTGCAGGAGTCTGTAGTC...TATATATATATATATATATAT

Cleaning a ShortRead object

d.clean <- clean(d) # remove all reads with >= 1 'N'

sread(d.clean)[1]

## A DNAStringSet instance of length 1

## width seq
## [1] 100 CCATGTTGCAGGTGTGAAGGCT...GGGGACACGCCGGCCGTTTGC
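Before discarding reads it can be informative to see how many are affected. The following sketch, on an invented four-read object, counts the N's per read with alphabetFrequency() and compares a manual filter with clean(); the object toy and its contents are assumptions for illustration.

```r
library(ShortRead)

# Invented toy object; two of the four reads contain at least one 'N'
toy <- ShortReadQ(
  sread   = DNAStringSet(c("ACGTACGT", "ACGTNCGT", "NNGTACGT", "TTTTACGT")),
  quality = FastqQuality(rep(strrep("I", 8), 4)),
  id      = BStringSet(paste0("read", 1:4))
)

# Count the N's in each read, then flag reads with at least one
n.count <- alphabetFrequency(sread(toy))[, "N"]
has.N   <- n.count > 0
mean(has.N)                 # proportion of reads containing an 'N'

toy.clean <- toy[!has.N]    # manual version of the N filter
length(toy.clean) == length(clean(toy))
```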

Trimming a ShortRead object

d.trim <- narrow(d.match, start = 1, end = 10) # give start plus either end or width

quality(d.trim)

## class: FastqQuality
## quality:
## A BStringSet instance of length 4
## width seq
## [1] 10 ==>A<224?2
## [2] 10 BBCDF224A2
## [3] 10 @@@FFADDA2
## [4] 10 CCCFF222C2
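narrow() can also remove bases from the start of a read by giving only start. A minimal sketch on an invented two-read object follows; the reads and the object names are assumptions for illustration.

```r
library(ShortRead)

# Invented toy object with 15-base reads
toy <- ShortReadQ(
  sread   = DNAStringSet(c("CGATATGCAGGACGT", "CGATATGCAGGTTTT")),
  quality = FastqQuality(rep(strrep("I", 15), 2)),
  id      = BStringSet(c("read1", "read2"))
)

first10 <- narrow(toy, start = 1, width = 10)  # same result as start = 1, end = 10
no.bc   <- narrow(toy, start = 6)              # drop the first five bases

sread(first10)
sread(no.bc)
width(quality(no.bc))   # qualities are trimmed along with the bases
```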

Write a ShortRead object out as fastq file

# writeFastq(d.clean,
#   file = 'C:/Users/daniel/Documents/science/teaching/cesky krumlov 2016/course.materials/R.files/my.clean.reads.fastq')
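A quick way to check the written file is to read it back. The sketch below does this for an invented two-read object and a temporary output path; both are assumptions for illustration. The compress argument is set explicitly because its default may differ between ShortRead versions (see ?writeFastq).

```r
library(ShortRead)

# Invented toy object
toy <- ShortReadQ(
  sread   = DNAStringSet(c("CGATATGCAGGACGT", "CGATATGCAGGTTTT")),
  quality = FastqQuality(rep(strrep("I", 15), 2)),
  id      = BStringSet(c("read1", "read2"))
)

out.file <- tempfile(fileext = ".fastq")   # throwaway output location

# compress = FALSE requests a plain-text fastq file
writeFastq(toy, file = out.file, compress = FALSE)

# Read the file back and compare the sequences with the original
back <- readFastq(out.file)
identical(as.character(sread(back)), as.character(sread(toy)))
```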

Tasks

- Upload the stickleback data set illumina.SE100.fastq
- Inspect the ID, sequence and quality of reads 1000 to 1002
- Generate a new object X containing the data from reads 10001-20000
- Determine the proportion of X's reads containing one or more 'N', and eliminate them from X
- What proportion of the filtered X is derived from the individual with barcode (first five bases) 'CGATA'?
- Derive the object Y from X, including only these specific CGATA reads. Confirm that this worked by inspecting the reads
- What proportion of Y's reads contain the correct restriction enzyme overhang 'TGCAGG' at the correct position (i.e., following the barcode)? Copy these reads to object Z
- Clip the barcodes from Z, then write Z out as a fastq file
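The barcode and overhang tasks ask for matches at a fixed position, which a plain grep() does not guarantee because it matches anywhere in the read. One possible approach, sketched here on three invented reads, extracts the relevant positions with subseq() and compares them directly; the toy object and all object names are assumptions for illustration, not the course solution.

```r
library(ShortRead)

# Invented toy object: read1 has barcode and overhang, read2 only the
# barcode, read3 only the overhang
toy <- ShortReadQ(
  sread   = DNAStringSet(c("CGATATGCAGGACGT", "CGATAAAAAAAACGT", "TTGCATGCAGGACGT")),
  quality = FastqQuality(rep(strrep("I", 15), 3)),
  id      = BStringSet(paste0("read", 1:3))
)

bc   <- as.character(subseq(sread(toy), start = 1, end = 5))    # bases 1-5
site <- as.character(subseq(sread(toy), start = 6, end = 11))   # bases 6-11

is.bc <- bc == "CGATA"
mean(is.bc)                     # proportion of reads carrying the barcode
mean(site[is.bc] == "TGCAGG")   # of those, proportion with the overhang

keep <- toy[is.bc & site == "TGCAGG"]
sread(keep)
```

Comparing fixed positions avoids counting reads in which 'CGATA' or 'TGCAGG' happens to occur further along the sequence.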
