GSE51684 Processing Pipeline
RNA-Seq
4 steps
Publication
Targeted degradation of sense and antisense C9orf72 RNA foci as therapy for ALS and frontotemporal degeneration. Proceedings of the National Academy of Sciences of the United States of America (2013). PMID 24170860
Dataset
GSE51684: Targeted degradation of sense and antisense C9orf72 RNA foci as therapy for amyotrophic lateral sclerosis and frontotemporal dementia (Multiplex Anal…
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Illumina Casava 1.8.2 software was used for basecalling.
Illumina Casava v1.8.2
$ Bash example
# Casava 1.8.2 is Illumina's proprietary post-run analysis suite.
# Base calls are written as BCL files on the instrument by Real Time Analysis (RTA);
# Casava's BCL converter (configureBclToFastq.pl) then turns them into FASTQ files
# and can demultiplex by index in the same run.
# This step is normally carried out at the sequencing facility, so there is nothing
# to re-run when starting from the deposited FASTQ files.
2
Demultiplexing based on index sequences
$ Bash example
# A sketch assuming demultiplexing was done with Casava 1.8.2 during BCL-to-FASTQ
# conversion; the source only states "Demultiplexing based on index sequences".
# All paths below are placeholders.
# BASECALLS_DIR: BaseCalls directory of the sequencing run folder.
# SAMPLE_SHEET: CSV file listing the lane, sample ID and index sequence of each sample.
# OUTPUT_DIR: directory that will receive the demultiplexed FASTQ files
#             (created by the configure script; it should not exist yet).
BASECALLS_DIR="/path/to/run_folder/Data/Intensities/BaseCalls"
SAMPLE_SHEET="SampleSheet.csv"
OUTPUT_DIR="Unaligned"

# Configure the conversion. The index sequences in the sample sheet decide
# which sample each read is assigned to.
configureBclToFastq.pl \
  --input-dir "${BASECALLS_DIR}" \
  --output-dir "${OUTPUT_DIR}" \
  --sample-sheet "${SAMPLE_SHEET}"

# Run the generated Makefile with 8 parallel jobs.
# FASTQ files are written into per-project, per-sample subdirectories.
make -C "${OUTPUT_DIR}" -j 8
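One way to sanity-check demultiplexed output is to tally the index sequences recorded in the read headers. The sketch below assumes Casava 1.8-style FASTQ headers, in which the second whitespace-separated part of the header ends with the index sequence. The function name `tally_indexes` is hypothetical and not part of any tool named above.

```shell
#!/usr/bin/env bash
# tally_indexes FASTQ
# Print "index<TAB>count" for every index sequence found in the read headers.
# Assumed header layout (Casava 1.8 style):
#   @HWI-ST1:1:FC1:1:1101:1000:2000 1:N:0:GATACA
tally_indexes() {
  awk 'NR % 4 == 1 {
         n = split($2, parts, ":")   # $2 looks like "read:filtered:control:index"
         counts[parts[n]]++          # last field is the index sequence
       }
       END { for (idx in counts) printf "%s\t%d\n", idx, counts[idx] }' "$1" | sort
}
```

For gzipped input, the file can be streamed in, for example `gzip -dc sample.fastq.gz | tally_indexes -`. A file that was demultiplexed cleanly should be dominated by a single index.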
3
Sequenced reads were mapped to Refseq RNA sequences using bowtie v0.12.7 with parameters -q -e 100 -m 10 --best --strata
$ Bash example
# Install Bowtie (example using conda; confirm that version 0.12.7 is available)
# conda install -c bioconda bowtie=0.12.7

# 'refseq_rna_index' is the basename of the Bowtie index files and
# 'reads.fastq' is the input reads file (both are placeholders).
# -S is not among the parameters listed above; it is added so that the output
# is SAM rather than Bowtie's default tabular format.
bowtie -q -e 100 -m 10 --best --strata -S refseq_rna_index reads.fastq > output.sam
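After mapping, it can help to check how many reads actually received an alignment. The following is a minimal sketch that assumes the mapping output is a SAM file; it uses only the SAM FLAG field (bit 0x4 marks an unaligned read). The function name `sam_mapping_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# sam_mapping_summary SAM
# Print the number of aligned and unaligned records in a SAM file.
sam_mapping_summary() {
  awk -F'\t' '
    /^@/ { next }                            # skip SAM header lines
    {
      if (int($2 / 4) % 2 == 1) unaligned++  # FLAG bit 0x4 set: read is unaligned
      else aligned++
    }
    END { printf "aligned\t%d\nunaligned\t%d\n", aligned, unaligned }' "$1"
}
```

Usage would be `sam_mapping_summary output.sam`. The counts are per SAM record, so they equal read counts only when each read is reported at most once.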
4
Count reads for each gene
$ Bash example
# Install Subread (which includes featureCounts) if not already installed
# conda install -c bioconda subread

# Define input and output files
INPUT_BAM="aligned_reads.bam"                      # Replace with your actual aligned BAM file
GENE_ANNOTATION_GTF="Homo_sapiens.GRCh38.109.gtf"  # Replace with your actual GTF file (e.g., from Ensembl or GENCODE)
OUTPUT_COUNTS_FILE="gene_counts.txt"
NUM_THREADS=8                                      # Adjust as needed

# Count reads for each gene using featureCounts.
# featureCounts works on genome coordinates, so INPUT_BAM must come from an alignment
# to the genome build that matches the GTF. Alignments to RefSeq RNA sequences, as
# described in the mapping step, use transcript accessions as reference names and
# will not match a genome GTF.
# -a: Annotation file (GTF/GFF format)
# -o: Output file for read counts
# -F GTF: Specify that the annotation file is in GTF format
# -t exon: Specify feature type to count (e.g., "exon")
# -g gene_id: Specify attribute to group features by (e.g., "gene_id")
# -s 2: Count reads on the reverse strand. The source does not state the library
#       strandedness, so verify this value (0 = unstranded, 1 = forward, 2 = reverse).
# -T: Number of threads
# --primary: Only count primary alignments
featureCounts -a "${GENE_ANNOTATION_GTF}" \
  -o "${OUTPUT_COUNTS_FILE}" \
  -F GTF \
  -t exon \
  -g gene_id \
  -s 2 \
  -T "${NUM_THREADS}" \
  --primary \
  "${INPUT_BAM}"
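Because the reads were mapped to RefSeq RNA sequences, each alignment names a transcript, and per-gene counts can be obtained by summing over the transcripts of each gene. The sketch below is one possible way to do that and is not confirmed by the source as the authors' method. It assumes a SAM file and a hypothetical two-column, tab-separated map of transcript name to gene name; the transcript names must match the reference names in the SAM file exactly. The function name `gene_counts_from_sam` is also hypothetical.

```shell
#!/usr/bin/env bash
# gene_counts_from_sam TX2GENE SAM
# TX2GENE: "transcript_name<TAB>gene_name", one transcript per line (hypothetical file).
# Prints "gene<TAB>count", sorted by gene name.
gene_counts_from_sam() {
  awk -F'\t' '
    NR == FNR { gene[$1] = $2; next }     # first file: load the transcript-to-gene map
    /^@/ { next }                         # skip SAM header lines
    int($2 / 4) % 2 == 1 { next }         # skip unaligned reads (FLAG bit 0x4)
    ($3 in gene) { counts[gene[$3]]++ }   # RNAME (column 3) is the transcript name
    END { for (g in counts) printf "%s\t%d\n", g, counts[g] }' "$1" "$2" | sort
}
```

Usage would be `gene_counts_from_sam tx2gene.tsv output.sam > sample_counts.txt`. Alignments to transcripts missing from the map are silently dropped, so the map should be checked for completeness, and it must not be empty.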
Raw Source Text
Illumina Casava1.8.2 software used for basecalling.
Demultiplexing based on index sequences
Sequenced reads were mapped to Refseq RNA sequences using bowtie v0.12.7 with parameters -q -e 100 -m 10 --best --strata
Count reads for each genes
Supplementary_files_format_and_content: tab-delimited text files include counts for each genes in each samples.
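The supplementary files are described as tab-delimited tables with counts for each gene in each sample. As a rough sketch of how per-sample count files could be combined into such a table, the function below merges any number of two-column files ("gene<TAB>count"). The function name `merge_counts` and the file layout are assumptions, and column names are taken from the file names without their extension.

```shell
#!/usr/bin/env bash
# merge_counts FILE...
# Each FILE holds "gene<TAB>count". Prints a header row followed by one row per
# gene, with 0 for genes absent from a sample. Genes keep first-seen order.
merge_counts() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1 {                               # first line of each file: new sample column
      name = FILENAME
      sub(/.*\//, "", name)                  # strip the directory part
      sub(/\.[^.]*$/, "", name)              # strip the file extension
      samples[++n] = name
    }
    {
      if (!($1 in seen)) { seen[$1] = 1; genes[++m] = $1 }
      count[$1, n] = $2
    }
    END {
      header = "gene"
      for (j = 1; j <= n; j++) header = header OFS samples[j]
      print header
      for (i = 1; i <= m; i++) {
        row = genes[i]
        for (j = 1; j <= n; j++)
          row = row OFS (((genes[i], j) in count) ? count[genes[i], j] : 0)
        print row
      }
    }' "$@"
}
```

Usage would be `merge_counts sampleA.txt sampleB.txt > counts_matrix.txt`. Empty input files produce no column, so they should be checked for beforehand.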