GSE51741 Processing Pipeline
Publication
Targeted degradation of sense and antisense C9orf72 RNA foci as therapy for ALS and frontotemporal degeneration. Proceedings of the National Academy of Sciences of the United States of America (2013), PMID 24170860
Dataset
GSE51741: Targeted degradation of sense and antisense C9orf72 RNA foci as therapy for amyotrophic lateral sclerosis and frontotemporal dementia
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1. Illumina Casava 1.8.2 software used for basecalling.
Illumina Casava v1.8.2
Bash example
# Illumina CASAVA 1.8.2 is proprietary Illumina software.
# Base calls are produced on the instrument by Real Time Analysis (RTA); CASAVA then
# converts the per-cycle BCL files to FASTQ (and can demultiplex) on an analysis machine.
# In CASAVA 1.8.x this conversion is normally configured with configureBclToFastq.pl and
# run with make; paths and options depend on the run, so no command is shown here.
# The output is raw sequencing data in FASTQ format, the input for downstream analysis.
2. Demultiplexing based on index sequences
bcl2fastq v2.20.0.422 (tool and version inferred with models/gemini-2.5-flash)
Bash example
# bcl2fastq is distributed by Illumina; install it from Illumina's software downloads.
# The source text does not name the demultiplexing tool; bcl2fastq v2.20 is an inferred stand-in.

# Define input and output directories and sample sheet
RUN_FOLDER_DIR="/path/to/illumina/run/directory"    # Directory containing BCL files
OUTPUT_FASTQ_DIR="/path/to/output/fastq/directory"
SAMPLE_SHEET="/path/to/sample_sheet.csv"             # Defines samples and their index sequences

# Execute bcl2fastq for demultiplexing based on index sequences
# (bcl2fastq v2.20 spells the option --ignore-missing-bcls and has no --ignore-missing-stats)
bcl2fastq --runfolder-dir "${RUN_FOLDER_DIR}" \
  --output-dir "${OUTPUT_FASTQ_DIR}" \
  --sample-sheet "${SAMPLE_SHEET}" \
  --no-lane-splitting \
  --barcode-mismatches 1 \
  --minimum-trimmed-read-length 8 \
  --mask-short-adapter-reads 8 \
  --ignore-missing-bcls \
  --ignore-missing-positions
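To make index-based demultiplexing concrete, the following is a minimal sketch that works on an already converted FASTQ file rather than on BCL files. It assumes Casava 1.8-style headers, where the index read is the last colon-separated field of the header line. The sample names, index sequences, file names, and the exact-match rule are all hypothetical; real demultiplexers such as bcl2fastq also tolerate barcode mismatches.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"   # scratch directory for the toy example

# Hypothetical index-to-sample table: index sequence <TAB> sample name
printf 'ATCACG\tsampleA\nCGATGT\tsampleB\n' > "${workdir}/indexes.tsv"

# Toy FASTQ: the last colon-separated field of each header is the index read
cat > "${workdir}/undemultiplexed.fastq" <<'EOF'
@r1 1:N:0:ATCACG
ACGT
+
IIII
@r2 1:N:0:CGATGT
TTGA
+
IIII
@r3 1:N:0:ATCACG
GGCC
+
IIII
@r4 1:N:0:NNNNNN
AAAA
+
IIII
EOF

# Route each 4-line record to <sample>.fastq by exact index match;
# reads whose index is not in the table go to Undetermined.fastq
awk -F'\t' -v outdir="${workdir}" '
  NR == FNR { sample[$1] = $2; next }   # first file: load the index table
  FNR % 4 == 1 {                        # header line of a FASTQ record
    n = split($0, parts, ":")
    idx = parts[n]
    out = outdir "/" ((idx in sample) ? sample[idx] : "Undetermined") ".fastq"
  }
  { print > out }                       # all 4 lines follow their header
' "${workdir}/indexes.tsv" "${workdir}/undemultiplexed.fastq"

# Report reads per output file (4 lines per read)
count_reads() { echo $(( $(wc -l < "$1") / 4 )); }
demux_a="$(count_reads "${workdir}/sampleA.fastq")"
demux_b="$(count_reads "${workdir}/sampleB.fastq")"
demux_u="$(count_reads "${workdir}/Undetermined.fastq")"
echo "sampleA=${demux_a} sampleB=${demux_b} Undetermined=${demux_u}"
```

With the toy input, two reads go to sampleA, one to sampleB, and the read with an unreadable index is left undetermined.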
3. Sequenced reads were mapped to RefSeq RNA sequences using bowtie v0.12.7 with parameters -q -e 100 -m 10 --best --strata
Bash example
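# Install Bowtie v0.12.7 if not already installed
# conda install -c bioconda bowtie=0.12.7   # this old release may not be packaged; check availability

# The RefSeq RNA index must be built beforehand with bowtie-build from a
# RefSeq RNA FASTA file (placeholder names):
# bowtie-build refseq_rna.fasta refseq_rna_index

# reads.fastq: placeholder for the input sequenced reads
# output.bowtie.txt: placeholder for the output. Without -S, Bowtie 1 writes its default
# tab-delimited alignment format, not SAM; add -S only if SAM output is wanted.

# Execute Bowtie mapping with the parameters given in the source
#   -q: FASTQ input
#   -e 100: maximum summed quality of mismatched positions
#   -m 10: suppress reads with more than 10 reportable alignments
#   --best --strata: report only alignments from the best stratum
bowtie -q -e 100 -m 10 --best --strata refseq_rna_index reads.fastq > output.bowtie.txt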
4. Count reads for each gene
Bash example
# Install Subread (which includes featureCounts)
# conda install -c bioconda subread

# Note: featureCounts expects genome-coordinate alignments plus a gene annotation.
# Reads aligned to RefSeq RNA sequences (step 3) carry transcript accessions as
# reference names, so this inferred example applies only to a genome-aligned BAM.

# Define input and output files (all names are placeholders)
INPUT_BAM="aligned_reads.bam"                       # your aligned BAM file(s)
GENE_ANNOTATION_GTF="Homo_sapiens.GRCh38.109.gtf"   # example: Ensembl GRCh38 release 109
OUTPUT_COUNTS="gene_counts.txt"
STRANDEDNESS=0   # 0=unstranded, 1=forward, 2=reverse; not stated in the source, set to match the library

# Execute featureCounts to count reads per gene
#   -a: annotation file
#   -o: output file
#   -F GTF: annotation file format
#   -t exon: feature type to count
#   -g gene_id: attribute used to group features into genes
#   -s: strandedness
#   -T 8: number of threads
featureCounts \
  -a "${GENE_ANNOTATION_GTF}" \
  -o "${OUTPUT_COUNTS}" \
  -F GTF \
  -t exon \
  -g gene_id \
  -s "${STRANDEDNESS}" \
  -T 8 \
  "${INPUT_BAM}"
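Because the reads in step 3 are aligned to RefSeq RNA sequences, each alignment names a transcript rather than a genomic position. One possible way to get per-gene counts from such output is to sum alignments through a transcript-to-gene table, sketched below. This is not confirmed as the authors' method. It assumes Bowtie 1's default tab-delimited output (read name in column 1, reference name in column 3); the accessions, gene names, and file names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"   # scratch directory for the toy example

# Hypothetical transcript-to-gene table: transcript accession <TAB> gene
printf 'NM_000001.1\tGENE1\nNM_000002.1\tGENE1\nNM_000003.1\tGENE2\n' \
  > "${workdir}/tx2gene.tsv"

# Toy alignments laid out like Bowtie 1 default output; only columns 1
# (read name) and 3 (reference name) are used. printf repeats the format
# for each (read, transcript) pair of arguments.
printf '%s\t+\t%s\t0\tACGT\tIIII\t0\t\n' \
  read1 NM_000001.1 \
  read2 NM_000002.1 \
  read3 NM_000003.1 \
  read4 NM_000001.1 \
  read5 NM_999999.1 \
  > "${workdir}/alignments.txt"

# Count each read at most once per gene; alignments to transcripts that are
# missing from the table (read5 here) are skipped.
awk -F'\t' '
  NR == FNR { gene[$1] = $2; next }     # first file: transcript -> gene
  ($3 in gene) && !seen[$1 SUBSEP gene[$3]]++ { count[gene[$3]]++ }
  END { for (g in count) print g "\t" count[g] }
' "${workdir}/tx2gene.tsv" "${workdir}/alignments.txt" \
  | sort > "${workdir}/gene_counts.tsv"

gene1_count="$(awk -F'\t' '$1 == "GENE1" { print $2 }' "${workdir}/gene_counts.tsv")"
gene2_count="$(awk -F'\t' '$1 == "GENE2" { print $2 }' "${workdir}/gene_counts.tsv")"
cat "${workdir}/gene_counts.tsv"
```

Collapsing isoforms this way means a read is attributed to a gene regardless of which of its transcripts it matched.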
Raw Source Text
Illumina Casava1.8.2 software used for basecalling.
Demultiplexing based on index sequences
Sequenced reads were mapped to Refseq RNA sequences using bowtie v0.12.7 with parameters -q -e 100 -m 10 --best --strata
Count reads for each genes
Supplementary_files_format_and_content: tab-delimited text files include counts for each genes in each samples.
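The supplementary files are described as tab-delimited text with counts for each gene in each sample. As a rough illustration of that layout, the sketch below merges per-sample two-column count files into one gene-by-sample table. The file naming scheme (`<sample>.counts.tsv`), the sample names, and the choice to fill missing genes with 0 are assumptions, not details from the source.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"   # scratch directory for the toy example

# Hypothetical per-sample count files: gene <TAB> count
printf 'GENE1\t3\nGENE2\t1\n' > "${workdir}/sampleA.counts.tsv"
printf 'GENE1\t5\nGENE3\t2\n' > "${workdir}/sampleB.counts.tsv"

# Build one row per gene and one column per input file
awk -F'\t' -v OFS='\t' '
  FNR == 1 {                            # first line of each file: new sample
    nfiles++
    name = FILENAME
    sub(/.*\//, "", name)               # drop the directory part
    sub(/\.counts\.tsv$/, "", name)     # drop the assumed suffix
    samples[nfiles] = name
  }
  { genes[$1] = 1; count[$1, nfiles] = $2 }
  END {
    header = "gene"
    for (i = 1; i <= nfiles; i++) header = header OFS samples[i]
    print header
    for (g in genes) {
      row = g
      for (i = 1; i <= nfiles; i++)
        row = row OFS (((g, i) in count) ? count[g, i] : 0)
      print row
    }
  }
' "${workdir}/sampleA.counts.tsv" "${workdir}/sampleB.counts.tsv" \
  > "${workdir}/unsorted.tsv"

# Keep the header first and sort the gene rows for a stable order
matrix="${workdir}/count_matrix.tsv"
head -n 1 "${workdir}/unsorted.tsv" > "${matrix}"
tail -n +2 "${workdir}/unsorted.tsv" | sort >> "${matrix}"
cat "${matrix}"
```

Genes absent from a sample's file appear with a count of 0, so every row has the same number of columns.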