GSE51741 Processing Pipeline


Publication

Targeted degradation of sense and antisense C9orf72 RNA foci as therapy for ALS and frontotemporal degeneration.

Proceedings of the National Academy of Sciences of the United States of America (2013) — PMID 24170860

Dataset

Targeted degradation of sense and antisense C9orf72 RNA foci as therapy for amyotrophic lateral sclerosis and frontotemporal dementia

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Illumina Casava 1.8.2 software was used for basecalling.

Illumina Casava v1.8.2

$ Bash example

# Illumina CASAVA 1.8.2 is proprietary Illumina software; the source gives no command or parameters for this step.
# In a typical Illumina workflow, per-cycle base calls (BCL files) are written on the instrument by Real-Time Analysis (RTA),
# and CASAVA then converts those files to FASTQ off-instrument. Treat this description as an assumption about the run setup.
# The output of this step is raw sequencing data in FASTQ format, which serves as input for downstream bioinformatics analysis.

Demultiplexing based on index sequences

bcl2fastq v2.20.0.422 (tool and version inferred with models/gemini-2.5-flash)

$ Bash example

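# bcl2fastq and its version are inferred, not stated in the source. bcl2fastq v2.x is a later
# Illumina tool; CASAVA 1.8.2 shipped its own converter script (configureBclToFastq.pl).
# bcl2fastq is distributed by Illumina; install it following Illumina's instructions.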

# Define input and output directories and sample sheet
RUN_FOLDER_DIR="/path/to/illumina/run/directory" # Directory containing BCL files
OUTPUT_FASTQ_DIR="/path/to/output/fastq/directory"
SAMPLE_SHEET="/path/to/sample_sheet.csv" # This file defines samples and their index sequences

# Execute bcl2fastq for demultiplexing based on index sequences
bcl2fastq --runfolder-dir "${RUN_FOLDER_DIR}" \
          --output-dir "${OUTPUT_FASTQ_DIR}" \
          --sample-sheet "${SAMPLE_SHEET}" \
          --no-lane-splitting \
          --barcode-mismatches 1 \
          --minimum-trimmed-read-length 8 \
          --mask-short-adapter-reads 8 \
          --ignore-missing-bcls \
          --ignore-missing-positions
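As a quick sanity check after demultiplexing, the reads assigned to each index can be tallied from the FASTQ headers. The sketch below assumes CASAVA 1.8-style headers, in which the index sequence is the last colon-separated field of the header line. The function name `count_reads_per_index` and the demo reads are hypothetical and are not taken from GSE51741.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Count reads per index sequence in an uncompressed FASTQ file.
# Assumes CASAVA 1.8-style headers, where the index is the last
# colon-separated field, e.g. "@M1:1:FC:1:1:10:20 1:N:0:ATCACG".
count_reads_per_index() {
  awk '
    NR % 4 == 1 {                      # header line of each 4-line FASTQ record
      n = split($0, field, ":")
      reads[field[n]]++
    }
    END { for (idx in reads) print idx "\t" reads[idx] }
  ' "$1" | sort -k1,1
}

# Tiny demonstration with made-up reads (not data from the dataset)
demo_fastq="$(mktemp)"
printf '%s\n' \
  '@M1:1:FC:1:1:10:20 1:N:0:ATCACG' 'ACGT' '+' 'IIII' \
  '@M1:1:FC:1:1:11:21 1:N:0:CGATGT' 'TTGA' '+' 'IIII' \
  '@M1:1:FC:1:1:12:22 1:N:0:ATCACG' 'GGCA' '+' 'IIII' > "$demo_fastq"
count_reads_per_index "$demo_fastq"
rm -f "$demo_fastq"
```

For gzipped FASTQ files, decompress to a temporary file first or adapt the function to read from standard input.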


Sequenced reads were mapped to Refseq RNA sequences using bowtie v0.12.7 with parameters -q -e 100 -m 10 --best --strata

Bowtie v0.12.7

$ Bash example

# Install Bowtie v0.12.7 if not already installed
# conda install -c bioconda bowtie=0.12.7

# Placeholder for Refseq RNA index. This index needs to be built beforehand
# using bowtie-build from your Refseq RNA FASTA file (e.g., refseq_rna.fasta).
# Example: bowtie-build refseq_rna.fasta refseq_rna_index

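# reads.fastq and output.sam are placeholders for the input reads and the output alignments

# Execute Bowtie mapping with the parameters given in the source.
# -S (SAM output) is an addition not listed in the source; without it, Bowtie 1 writes
# its own default alignment format rather than SAM.
bowtie -q -e 100 -m 10 --best --strata -S refseq_rna_index reads.fastq > output.sam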


Count reads for each gene

featureCounts v2.0.1 (tool inferred with models/gemini-2.5-flash)

$ Bash example

# Install Subread (which includes featureCounts)
# conda install -c bioconda subread

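# Define input and output files (all file names below are placeholders)
# featureCounts accepts SAM or BAM input, so the Bowtie SAM output can be used directly.
# The annotation must use the same reference names as the alignment. The reads here were
# mapped to RefSeq RNA sequences, so a genome-coordinate GTF (for example an Ensembl GRCh38
# release) will not match; supply an annotation keyed on the RefSeq transcript accessions.
INPUT_BAM="aligned_reads.bam"
GENE_ANNOTATION_GTF="refseq_rna_annotation.gtf" # Hypothetical file name
OUTPUT_COUNTS="gene_counts.txt"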

# Execute featureCounts to count reads per gene
# -a: Annotation file
# -o: Output file
# -F GTF: Specify annotation file format
# -t exon: Feature type to count (e.g., 'exon')
# -g gene_id: Attribute type to group features (e.g., 'gene_id')
# -s 2: Strandedness (0=unstranded, 1=forward, 2=reverse). The source does not state the library strandedness; 2 is a placeholder to adjust for your library prep.
# -T 8: Number of threads
featureCounts \
  -a "${GENE_ANNOTATION_GTF}" \
  -o "${OUTPUT_COUNTS}" \
  -F GTF \
  -t exon \
  -g gene_id \
  -s 2 \
  -T 8 \
  "${INPUT_BAM}"

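Because the reads were mapped to RefSeq RNA sequences, each aligned read's reference name in the SAM file is a transcript, so per-gene counts can also be obtained by summing reads over the transcripts of each gene. The following is a minimal sketch of that idea, not the authors' documented method. It assumes a hypothetical two-column, tab-delimited transcript-to-gene file and relies on Bowtie 1 reporting one alignment per read by default. The function name `count_reads_per_gene` and the identifiers `TX1`, `GENE_A`, and so on are made up for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Usage: count_reads_per_gene TX2GENE_TSV ALIGNMENTS_SAM
# TX2GENE_TSV: hypothetical tab-delimited file, transcript ID then gene ID.
count_reads_per_gene() {
  awk -F'\t' '
    NR == FNR   { gene[$1] = $2; next }   # first file: transcript -> gene map
    /^@/        { next }                  # skip SAM header lines
    $3 == "*"   { next }                  # unaligned reads have no reference name
    ($3 in gene) { count[gene[$3]]++ }    # add the read to its gene
    END { for (g in count) print g "\t" count[g] }
  ' "$1" "$2" | sort -k1,1
}

# Tiny demonstration with made-up identifiers (not data from the dataset)
demo_dir="$(mktemp -d)"
printf 'TX1\tGENE_A\nTX2\tGENE_A\nTX3\tGENE_B\n' > "$demo_dir/tx2gene.tsv"
printf '%b\n' \
  '@SQ\tSN:TX1\tLN:1000' \
  'r1\t0\tTX1\t10\t255\t4M\t*\t0\t0\tACGT\tIIII' \
  'r2\t0\tTX2\t25\t255\t4M\t*\t0\t0\tACGT\tIIII' \
  'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII' \
  'r4\t16\tTX3\t7\t255\t4M\t*\t0\t0\tACGT\tIIII' > "$demo_dir/aligned.sam"
count_reads_per_gene "$demo_dir/tx2gene.tsv" "$demo_dir/aligned.sam"
rm -rf "$demo_dir"
```

Reads aligned to transcripts missing from the mapping file are silently dropped in this sketch; a production version would probably report them.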

Raw Source Text

Illumina Casava1.8.2 software used for basecalling.
Demultiplexing based on index sequences
Sequenced reads were mapped to Refseq RNA sequences using bowtie v0.12.7 with parameters -q -e 100 -m 10 --best --strata
Count reads for each genes
Supplementary_files_format_and_content: tab-delimited text files include counts for each genes in each samples.
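The supplementary files are described as tab-delimited tables of per-gene counts for each sample. One way to assemble such a table from per-sample count files is sketched below. It assumes each input is a hypothetical two-column, tab-delimited file (gene, count) without a header line, and it labels each column with the input file's base name. The function name `merge_counts` and the demo file names are illustrative only, and the exact column layout of the deposited files is not stated in the source.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Usage: merge_counts SAMPLE1.txt SAMPLE2.txt ...
# Prints a header row, then one row per gene with one count column per file.
merge_counts() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1 {                           # first line of each input file
      nfiles++
      label = FILENAME
      sub(/.*\//, "", label)             # drop the directory part
      sub(/\.[^.]*$/, "", label)         # drop the file extension
      name[nfiles] = label
    }
    { genes[$1] = 1; count[$1, nfiles] = $2 }
    END {
      header = "gene"
      for (i = 1; i <= nfiles; i++) header = header OFS name[i]
      print header
      for (g in genes) {
        row = g
        for (i = 1; i <= nfiles; i++)
          row = row OFS (((g, i) in count) ? count[g, i] : 0)   # 0 if gene absent
        print row
      }
    }
  ' "$@" | { IFS= read -r header; printf '%s\n' "$header"; sort -k1,1; }
}

# Tiny demonstration with made-up counts (not data from the dataset)
demo_dir="$(mktemp -d)"
printf 'GENE_A\t5\nGENE_B\t2\n' > "$demo_dir/sampleA.txt"
printf 'GENE_A\t3\nGENE_C\t7\n' > "$demo_dir/sampleB.txt"
merge_counts "$demo_dir/sampleA.txt" "$demo_dir/sampleB.txt"
rm -rf "$demo_dir"
```

Genes missing from a sample are filled with 0 here, which is a design choice of this sketch rather than something the source specifies.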
