GSE146726 Processing Pipeline

RNA-Seq, 4 steps

Publication

Transcriptome-wide profiles of circular RNA and RNA-binding protein interactions reveal effects on circular RNA biogenesis and cancer pathway expression.

Genome Medicine (2020), PMID 33287884

Dataset

GSE146726

RNA-Seq of circCDYL knockdown (KD) samples in HepG2, J82, and UMUC3 cells and of GRWD1, IGF2BP1, and IGF2BP2 knockdown (KD) samples in J82 and UMUC3 …

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

FASTQ files demultiplexed using Illumina's bcl2fastq v2.20.0.422

bcl2fastq v2.20.0.422

$ Bash example

# Install bcl2fastq (example, actual installation might vary based on system and Illumina's distribution)
# This tool is typically provided by Illumina and may require specific licensing or access.
# For example, it might be available via a system package manager or direct download from Illumina.

# Example command for demultiplexing:
# Replace /path/to/illumina/run/folder with the actual path to your Illumina run directory (containing BCL files).
# Replace /path/to/output/fastq with the desired directory for the output FASTQ files.
# Replace /path/to/SampleSheet.csv with the actual path to your SampleSheet.csv file.
bcl2fastq --runfolder-dir /path/to/illumina/run/folder \
          --output-dir /path/to/output/fastq \
          --sample-sheet /path/to/SampleSheet.csv \
          --no-lane-splitting # Optional: Prevents splitting output by lane if not desired

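
The FASTQ files written by the command above still have to be paired up per sample before trimming. The sketch below assumes the standard bcl2fastq naming with `--no-lane-splitting` (`<Sample>_S<N>_R1_001.fastq.gz` and the matching `_R2_` file). The sample names, the temporary directory, and the `list_pairs` helper are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical demo directory with empty placeholder files standing in for real FASTQs.
FASTQ_DIR="$(mktemp -d)"
for f in sampleA_S1_R1_001 sampleA_S1_R2_001 sampleB_S2_R1_001; do
  touch "${FASTQ_DIR}/${f}.fastq.gz"
done

# list_pairs DIR: print "sample<TAB>R1<TAB>R2" for every R1 file whose R2 mate exists.
# Samples with a missing mate are reported on stderr and skipped.
list_pairs() {
  local dir="$1" r1 r2 sample
  for r1 in "${dir}"/*_R1_001.fastq.gz; do
    [ -e "${r1}" ] || continue                    # the glob matched nothing
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"   # expected mate file
    sample="$(basename "${r1}")"
    sample="${sample%_S[0-9]*_R1_001.fastq.gz}"   # strip the _S<N>_R1_001 suffix
    if [ -e "${r2}" ]; then
      printf '%s\t%s\t%s\n' "${sample}" "${r1}" "${r2}"
    else
      echo "warning: no R2 mate for ${sample}" >&2
    fi
  done
}

list_pairs "${FASTQ_DIR}"
```

Each output line can then feed one Trim Galore call. If lanes were split, the file names also carry an `_L00<lane>_` field and the patterns need adjusting.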

Adapters removed using Trim Galore v0.4.1

Trim Galore v0.4.1

$ Bash example

# Install Trim Galore (requires Cutadapt and FastQC)
# conda install -c bioconda trim-galore

# Define input files (replace with actual file paths)
# Assuming paired-end reads, which is common for many assays.
# If single-end, remove the second input file and the --paired flag.
INPUT_READ1="input_read1.fastq.gz"
INPUT_READ2="input_read2.fastq.gz"

# Define output directory
OUTPUT_DIR="trimmed_reads"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Run Trim Galore to remove adapters and perform quality trimming
# --paired: Process paired-end reads and keep the two files in sync
# --illumina: Trim the Illumina universal adapter instead of relying on adapter auto-detection
# --output_dir (-o): Specify output directory
# Optional: add --fastqc to run FastQC on the trimmed files (it is not enabled by default)
trim_galore --version
trim_galore \
  --paired \
  --illumina \
  --output_dir "${OUTPUT_DIR}" \
  "${INPUT_READ1}" \
  "${INPUT_READ2}"

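
The mapping step needs the trimmed files, not the raw ones. In paired-end mode Trim Galore normally names its validated outputs `<basename>_val_1.fq.gz` and `<basename>_val_2.fq.gz` for gzipped input. The sketch below derives those names from the input names; `trimmed_name` is a hypothetical helper, and the convention should be checked against the files your Trim Galore version actually writes.

```shell
#!/usr/bin/env bash
set -eu

# trimmed_name INPUT MATE OUTDIR: print the expected Trim Galore output path
# for a gzipped paired-end input file (MATE is 1 or 2).
trimmed_name() {
  local input="$1" mate="$2" outdir="$3" base
  base="$(basename "${input}")"
  case "${base}" in
    *.fastq.gz) base="${base%.fastq.gz}" ;;
    *.fq.gz)    base="${base%.fq.gz}" ;;
    *) echo "unsupported extension: ${base}" >&2; return 1 ;;
  esac
  printf '%s/%s_val_%s.fq.gz\n' "${outdir}" "${base}" "${mate}"
}

# Same placeholder names as in the Trim Galore example above.
TRIMMED_R1="$(trimmed_name "input_read1.fastq.gz" 1 "trimmed_reads")"
TRIMMED_R2="$(trimmed_name "input_read2.fastq.gz" 2 "trimmed_reads")"
echo "${TRIMMED_R1}"
echo "${TRIMMED_R2}"
```

These two paths are what would be passed to the aligner in the next step.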

Reads mapped using TopHat2 and Bowtie2 (version 2.1.1)

Bowtie2 v2.1.1

$ Bash example

# Install TopHat2 and Bowtie2 (if not already installed)
# conda install -c bioconda tophat bowtie2=2.1.1  # Bioconda names the package "tophat"; an old Bowtie2 pin may not resolve on every channel

# Define variables
# Reference genome and GTF file are placeholders. The dataset metadata lists hg19 as the genome build.
REFERENCE_GENOME_DIR="/path/to/bowtie2_index/hg19" # Directory containing hg19.1.bt2, hg19.2.bt2, etc.
REFERENCE_GENOME_PREFIX="hg19" # Prefix for the Bowtie2 index files
GTF_FILE="/path/to/annotation/hg19.gtf" # GTF file for known transcripts (e.g., from UCSC or Ensembl)
READS_R1="reads_R1.fastq.gz" # Placeholder for forward reads
READS_R2="reads_R2.fastq.gz" # Placeholder for reverse reads (if paired-end)
OUTPUT_DIR="tophat_output"
THREADS=8 # Number of CPU threads

# Create output directory
mkdir -p "${OUTPUT_DIR}"

# Run TopHat2 for splice-aware alignment using Bowtie2
# TopHat2 version 2.0.x or later uses Bowtie2 by default.
# Parameters are illustrative; the source lists only the tools and versions, not the options used.
#   -o: Output directory
#   -p: Number of threads
#   --library-type fr-firststrand: Assumes a first-strand stranded library; use fr-unstranded or fr-secondstrand if your library prep differs
#   -G: GTF file for known transcripts (improves alignment accuracy for known junctions)
#   <bowtie2_index_prefix>: The prefix for the Bowtie2 index files
#   <reads_R1> [<reads_R2>]: Input FASTQ files
tophat2 \
  -o "${OUTPUT_DIR}" \
  -p "${THREADS}" \
  --library-type fr-firststrand \
  -G "${GTF_FILE}" \
  "${REFERENCE_GENOME_DIR}/${REFERENCE_GENOME_PREFIX}" \
  "${READS_R1}" "${READS_R2}"

# The output will be in the 'tophat_output' directory, including 'accepted_hits.bam'
# which is the aligned BAM file.

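
TopHat2 expects a complete Bowtie2 index at the prefix it is given. A small Bowtie2 index consists of six files (`.1.bt2` to `.4.bt2`, `.rev.1.bt2`, `.rev.2.bt2`), so a pre-flight check along the lines below can catch a wrong prefix before a long run starts. `check_bowtie2_index` is a hypothetical helper, the demo uses empty placeholder files, and large indexes with the `.bt2l` extension are not handled.

```shell
#!/usr/bin/env bash
set -eu

# check_bowtie2_index PREFIX: return 0 if all six Bowtie2 index files exist, 1 otherwise.
check_bowtie2_index() {
  local prefix="$1" suffix missing=0
  for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
    if [ ! -e "${prefix}.${suffix}" ]; then
      echo "missing index file: ${prefix}.${suffix}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Demo with empty placeholder files in a temporary directory (hypothetical paths).
# hg19.rev.2.bt2 is left out on purpose so the check reports it.
INDEX_DIR="$(mktemp -d)"
for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2; do
  touch "${INDEX_DIR}/hg19.${suffix}"
done

if check_bowtie2_index "${INDEX_DIR}/hg19"; then
  echo "index complete"
else
  echo "index incomplete, build it with bowtie2-build before running tophat2"
fi
```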

Read counts were estimated using HTSeq (v0.6.1p1)

HTSeq v0.6.1p1

$ Bash example

# Install HTSeq (if not already installed)
# conda install -c bioconda htseq

# Define input and output files
INPUT_BAM="tophat_output/accepted_hits.bam" # Alignment file from the TopHat2 step (BAM format); replace with your own path
GENE_ANNOTATION_GTF="/path/to/annotation/hg19.gtf" # Must match the genome build used for alignment (hg19 for this dataset); replace with your annotation file
OUTPUT_COUNTS_FILE="gene_read_counts.txt"

# Estimate read counts using htseq-count
# -f bam: Input file format is BAM
# -r pos: Input alignment file is sorted by position
# -s reverse: Strandedness of the library ('yes', 'no', or 'reverse'); 'reverse' matches the fr-firststrand assumption in the TopHat2 example, so adjust it to your library prep
# -a 10: Minimum alignment quality score (default is 10)
# -m union: Mode to handle reads overlapping multiple features (default is 'union')
htseq-count \
  -f bam \
  -r pos \
  -s reverse \
  -a 10 \
  -m union \
  "${INPUT_BAM}" \
  "${GENE_ANNOTATION_GTF}" \
  > "${OUTPUT_COUNTS_FILE}"

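
htseq-count writes one two-column file per sample and appends summary rows whose names start with `__` (for example `__no_feature`). Tools such as DESeq2 usually take a single gene-by-sample matrix, so the per-sample files are commonly merged first. How the authors did this is not described in the source; the sketch below is one possible approach using awk, with made-up gene names and counts and a hypothetical `merge_counts` helper.

```shell
#!/usr/bin/env bash
set -eu

# Toy htseq-count outputs (made-up genes and counts) in a temporary directory.
COUNT_DIR="$(mktemp -d)"
printf 'geneA\t10\ngeneB\t0\n__no_feature\t5\n__ambiguous\t1\n' > "${COUNT_DIR}/ctrl.txt"
printf 'geneA\t3\ngeneB\t7\n__no_feature\t2\n__ambiguous\t0\n' > "${COUNT_DIR}/kd.txt"

# merge_counts FILE...: join htseq-count files by gene ID into one tab-delimited
# matrix. Column names come from the file names; "__" summary rows are dropped;
# a gene absent from a file is written as NA.
merge_counts() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1 {
      nfiles++
      name = FILENAME
      sub(/.*\//, "", name)      # drop the directory part
      sub(/\.txt$/, "", name)    # drop the extension
      header = header OFS name
    }
    /^__/ { next }               # skip htseq-count summary rows
    {
      if (!($1 in seen)) { seen[$1] = 1; order[++ngenes] = $1 }
      count[$1, nfiles] = $2
    }
    END {
      print "gene_id" header
      for (i = 1; i <= ngenes; i++) {
        line = order[i]
        for (j = 1; j <= nfiles; j++) {
          key = order[i] SUBSEP j
          line = line OFS ((key in count) ? count[key] : "NA")
        }
        print line
      }
    }' "$@"
}

merge_counts "${COUNT_DIR}/ctrl.txt" "${COUNT_DIR}/kd.txt"
```

The resulting matrix holds raw counts; normalization (for instance with DESeq2, as the supplementary file description suggests) would happen downstream.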

Tools Used

Trim Galore, Bowtie2

Raw Source Text

Fastq files demultiplexed using Illumina's bcl2fastq v2.20.0.422
Adapters removed using Trim Galore v0.4.1
Reads mapped using TopHat2 and Bowtie2 (version 2.1.1)
Read counts were estimated using HTSeq (v0.6.1p1)
Genome_build: hg19
Supplementary_files_format_and_content: tab-delimited text files include DESeq2 normalized abundance measurements
