Merge pull request #33 from GoekeLab/dev
update master branch with blow5 tutorials
cying111 authored Mar 7, 2023
2 parents d9234dd + f323877 commit 01222dd
Showing 10 changed files with 418 additions and 116 deletions.
21 changes: 16 additions & 5 deletions README.md
@@ -32,7 +32,7 @@ https://groups.google.com/forum/#!forum/sg-nex-updates/join

## Data Release and Access

**Latest Release (v0.3)**
**Latest Release (v0.4)**

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5574654.svg)](https://doi.org/10.5281/zenodo.5574654)

@@ -43,6 +43,7 @@ This release includes 86 samples from 11 different cell lines.
You can access the following data through the [AWS Open Data Registry](https://registry.opendata.aws/sgnex/):

- raw files (fast5)
- raw files (blow5)
- basecalled files (fastq)
- aligned reads (genome and transcriptome) (bam)
- tracks for visualisation (bigwig and bigbed)
@@ -51,14 +52,24 @@ You can access the following data through the [AWS Open Data Registry](https://r
- annotation files
- detailed sample and experiment information

You can browse the S3 data [here](http://sg-nex-data.s3-website-ap-southeast-1.amazonaws.com/).
You can browse the S3 data here: 1) [fast5, fastq, and bam](http://sg-nex-data.s3-website-ap-southeast-1.amazonaws.com/) and 2) [blow5](http://sg-nex-data-blow5.s3-website-ap-southeast-1.amazonaws.com/).

Please refer to the [data access tutorial](docs/AWS_data_access_tutorial.md) which describes the S3 data structure and how to access files with [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/s3/). The direct links to the data are listed in the [sample spreadsheet](docs/samples.tsv).

_**Citation**_: Please cite the pre-print describing the SG-NEx data resource when using these data, and add the following details: "The SG-NEx data was accessed on [DATE] at registry.opendata.aws/sg-nex-data".

Chen, Y. _et al._ "A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines." _bioRxiv_ (2021). doi: https://doi.org/10.1101/2021.04.21.440736

**Release Note**

Version Number: V0.4.0
Date: 2023-03-06
Update of the SG-NEx data on AWS. Includes raw signal data in blow5 format.

Version Number: V0.3.0
Date: 2022-07-28
Initial release of the SG-NEx data on AWS. Includes Nanopore direct RNA, cDNA, direct cDNA-Seq, short read RNA-Seq and m6ACE-Seq.

**Release History**

You can find previous releases here in the [release history](https://github.com/GoekeLab/sg-nex-data/releases)
@@ -89,6 +100,8 @@ The following short tutorials are available that demonstrate how to analyse the

- [Identification of m6A with the SG-NEx samples (using m6Anet)](./docs/SG-NEx_m6Anet_tutorial.md)

- [Basecalling and analysing SG-NEx samples in S/BLOW5 format](./docs/SG-NEx_blow5_tutorial.md)

Additional, more detailed workflows can be found here:

- [Transcript discovery, quantification, and differential transcript expression from long read RNA-Seq data (using Bambu)](https://github.com/GoekeLab/bambu)
@@ -99,16 +112,14 @@ Additional, more detailed workflows can be found here:


## Contributors

**GIS Sequencing Platform and Data Generation**
Hwee Meng Low, Yao Fei, Sarah Ng, Wendy Soon, CC Khor

**Cancer Genomics and RNA Modifications**
Viktoriia Iakovleva, Puay Leng Lee, Lixia Xin, Hui En Vanessa Ng, Jia Min Loo, Xuewen Ong, Hui Qi Amanda Ng, Suk Yeah Polly Poon, Hoang-Dai Tran, Kok Hao Edwin Lim, Huck Hui Ng, Boon Ooi Patrick Tan, Huck-Hui Ng, N.Gopalakrishna Iyer, Wai Leong Tam, Wee Joo Chng, Leilei Chen, Ramanuj DasGupta, Yun Shen Winston Chan, Qiang Yu, Torsten Wüstefeld, Wee Siong Sho Goh

**Statistical Modeling and Data Analytics**
Ying Chen, Nadia M. Davidson, Harshil Patel, Yuk Kei Wan, Min Hao Ling, Yu Song Chuah, Naruemon Pratanwanich, Christopher Hendra, Laura Watten, Chelsea Sawyer, Dominik Stanojevic, Philip Andrew Ewels, Andreas Wilm, Mile Sikic, Alexandre Thiery, Michael I. Love, Alicia Oshlak, Jonathan Göke

Ying Chen, Hasindu Gamaarachchi, Nadia M. Davidson, Harshil Patel, Yuk Kei Wan, Min Hao Ling, Yu Song Chuah, Naruemon Pratanwanich, Christopher Hendra, Laura Watten, Chelsea Sawyer, Dominik Stanojevic, Philip Andrew Ewels, Andreas Wilm, Mile Sikic, Alexandre Thiery, Michael I. Love, Alicia Oshlak, Jonathan Göke
## Citing the SG-NEx project

The SG-NEx resource is described in:
2 changes: 1 addition & 1 deletion docs/AWS_README
@@ -10,5 +10,5 @@ The SG-NEx data set is documented here: https://github.com/GoekeLab/sg-nex-data

The folder structure and data access tutorial is described here: https://github.com/GoekeLab/sg-nex-data/blob/master/docs/AWS_data_access_tutorial.md

The data browser link is here: http://sg-nex-data.s3-website-ap-southeast-1.amazonaws.com/
The data browser link is here: http://sg-nex-data.s3-website-ap-southeast-1.amazonaws.com/ and http://sg-nex-data-blow5.s3-website-ap-southeast-1.amazonaws.com/

9 changes: 9 additions & 0 deletions docs/AWS_RELEASE_NOTE
@@ -1,5 +1,14 @@
The Singapore Nanopore Expression (SG-NEx) Project: Release updates on AWS S3 open data

Version Number: V0.4.0
Date: 2023-03-06
By: Ying Chen, Genome Institute of Singapore
Update of the SG-NEx data on AWS. Includes raw signal data in blow5 format. Please refer to https://github.com/GoekeLab/sg-nex-data for detailed documentation.

Data access tutorial: https://github.com/GoekeLab/sg-nex-data/blob/master/docs/AWS_data_access_tutorial.md
Data browser link: http://sg-nex-data-blow5.s3-website-ap-southeast-1.amazonaws.com/
Contact and questions: https://github.com/GoekeLab/sg-nex-data/discussions


Version Number: V0.3.0
Date: 2022-07-28
30 changes: 20 additions & 10 deletions docs/AWS_data_access_tutorial.md
100644 → 100755
@@ -4,15 +4,18 @@ SG-NEx data source contains long read (Oxford Nanopore) RNA sequencing data for

The SG-NEx S3 bucket contains the following types of data:

- [Raw sequencing signal (fast5)](#raw-sequencing-signal)
- [Basecalled sequences (fastq)](#basecalled-sequences)
- [Aligned sequences (bam)](#aligned-sequences)
- [Data visualisation tracks (bigwig/bigbed)](#data-visualisation-tracks)
- [Annotations](#annotations)
- [Processed data for RNA modification detection](#processed-data)
- [Sample and experiment information](#sample-and-experimental-data)
- [Raw sequencing signal (fast5)](#raw-sequencing-signal)
- [Basecalled sequences (fastq)](#basecalled-sequences)
- [Aligned sequences (bam)](#aligned-sequences)
- [Data visualisation tracks (bigwig/bigbed)](#data-visualisation-tracks)
- [Annotations](#annotations)
- [Processed data for RNA modification detection](#processed-data)
- [Sample and experiment information](#sample-and-experimental-data)

Below is the folder index for the open data bucket:
The SG-NEx S3 BLOW5 bucket contains the following types of data:
- [Raw sequencing signal (blow5)](#raw-sequencing-signal-in-blow5-format)

Below is the folder index for the open data buckets:

![folder indexing\!](/images/folder_index.png)

@@ -24,6 +27,14 @@ aws s3 ls --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/fast5/ # l
aws s3 sync --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/fast5/sample_name . # download fast5 files to your local directory
```

# Raw sequencing signal in BLOW5 format
To access raw sequencing (blow5) files:

```bash
aws s3 ls --no-sign-request s3://sg-nex-data-blow5/ # list samples
aws s3 sync --no-sign-request s3://sg-nex-data-blow5/sample_name . # download blow5 file and the index to your local directory
```
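
The commands above show that the two buckets lay out raw signal data differently: fast5 files sit under `data/sequencing_data_ont/fast5/` in `sg-nex-data`, while blow5 samples sit at the top level of `sg-nex-data-blow5`. Below is a minimal sketch of a helper that hides this difference. The function name `sgnex_raw_uri` is hypothetical and not part of the SG-NEx repository, and it only builds the S3 prefix without contacting AWS.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an illustration, not part of the SG-NEx repository):
# print the S3 prefix that holds a sample's raw signal data.
# Bucket names and paths are taken from the commands in this tutorial.
sgnex_raw_uri() {
  local format="${1:-}" sample="${2:-}"
  if [ -z "$format" ] || [ -z "$sample" ]; then
    echo "usage: sgnex_raw_uri <fast5|blow5> <sample_name>" >&2
    return 2
  fi
  case "$format" in
    fast5) printf 's3://sg-nex-data/data/sequencing_data_ont/fast5/%s\n' "$sample" ;;
    blow5) printf 's3://sg-nex-data-blow5/%s\n' "$sample" ;;
    *)
      echo "unknown format: $format (expected fast5 or blow5)" >&2
      return 1
      ;;
  esac
}

# Example use (needs the AWS CLI and network access, so it is left commented out).
# --dryrun lists what would be copied without downloading anything:
# aws s3 sync --no-sign-request --dryrun "$(sgnex_raw_uri blow5 sample_name)" .
```

Replace `sample_name` with a name from the sample spreadsheet, and drop `--dryrun` once the listed files look right.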

# Basecalled sequences
To access basecalled sequencing (fastq) files:

@@ -90,7 +101,6 @@ aws s3 sync --no-sign-request s3://sg-nex-data/data/annotations/gtf_file . # do

## RNA modification detection
Long read direct RNA sequencing allows the detection of RNA modifications with tools such as [xPore](https://github.com/GoekeLab/xpore) and [m6Anet](https://github.com/GoekeLab/m6anet). To simplify the analysis of RNA modifications using the SG-NEx datasets, you can download the processed files to use with xPore and m6Anet.

To download the processed data for differential RNA modification analysis with xPore:
```bash
aws s3 ls --no-sign-request s3://sg-nex-data/data/processed_data/xpore/ # list all samples that have processed data for RNA modification detection using xPore
@@ -106,7 +116,7 @@ These files are provided for a subset of samples, please see [here](/docs/sample

# Sample and experimental data

Detailed information for each sequencing sample is provided [here](/docs/samples.tsv). The data also includes multiplexed samples which share the same fast5 files. The information about the multiplexed samples can be found [here](/docs/multiplexed_samples.tsv). The files can also be accessed directly on S3:
Detailed information for each sequencing sample is provided [here](/docs/samples.tsv). The data also includes multiplexed samples which share the same fast5/blow5 files. The information about the multiplexed samples can be found [here](/docs/multiplexed_samples.tsv). The files can also be accessed directly on S3:


```bash
4 changes: 2 additions & 2 deletions docs/SG-NEx_Bambu_tutorial.md
@@ -46,7 +46,7 @@ Next, we will need to download the required data to run Bambu. The required data
Generally, you may want to learn how to get access to these data using the [data
access
tutorial](https://github.com/GoekeLab/sg-nex-data/blob/updated-documentation/docs/AWS_data_access_tutorial.md). Below we only show the necessary steps to download the required data. The following command requires you to have [AWS CLI](https://aws.amazon.com/cli/) installed.
tutorial](AWS_data_access_tutorial.md). Below we only show the necessary steps to download the required data. The following command requires you to have [AWS CLI](https://aws.amazon.com/cli/) installed.
``` bash
# create a directory to store the data
@@ -68,7 +68,7 @@ aws s3 sync --no-sign-request s3://sg-nex-data/data/data_tutorial/bam ./bambu_tu
You may also download the required data directly from the [SG-NEx AWS S3
bucket](http://sg-nex-data.s3-website-ap-southeast-1.amazonaws.com/) if you are unfamiliar with the AWS CLI. They are stored in the `data/data_tutorial/bam` folder.
**NOTE: We have downsampled the Hg38 genome, A549 and HepG2 samples to ensure this tutorial can be completed in 10 minutes. If you want to run Bambu on the original samples, you can find the sample name [here](https://github.com/GoekeLab/sg-nex-data/blob/updated-documentation/docs/samples.tsv) and amend it into the following code chunk:**
**NOTE: We have downsampled the Hg38 genome, A549 and HepG2 samples to ensure this tutorial can be completed in 10 minutes. If you want to run Bambu on the original samples, you can find the sample name [here](samples.tsv) and amend it into the following code chunk:**
```bash
# Note: Please make sure to replace the "sample_alias" with your sample name