GSE137810 Processing Pipeline


Publication

Investigational eIF2B activator DNL343 modulates the integrated stress response in preclinical models of TDP-43 pathology and individuals with ALS in a randomized clinical trial.

Nature Communications (2025), PMID 40825784

Dataset

GSE137810

NYGC ALS Consortium data

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Paired-end Illumina RNA-seq reads (FASTQ) were aligned using STAR (v2.5.2a).

STAR v2.5.2a GitHub

$ Bash example

# Install STAR (example using conda)
# conda create -n star_env star=2.5.2a -c bioconda -c conda-forge
# conda activate star_env

# Build STAR index (example for human GRCh38 genome and GENCODE annotation)
# mkdir -p /path/to/STAR_index_hg38
# STAR --runMode genomeGenerate \
#      --genomeDir /path/to/STAR_index_hg38 \
#      --genomeFastaFiles /path/to/hg38.fa \
#      --sjdbGTFfile /path/to/gencode.vXX.annotation.gtf \
#      --sjdbOverhang 100 \
#      --runThreadN 8
# (--sjdbOverhang should ideally be read length minus 1; 100 is STAR's default. Adjust --runThreadN as needed.)

# Align paired-end RNA-seq reads with STAR.
# The filtering and splice-junction options below follow ENCODE-style settings and are shown for illustration;
# the source does not list the parameters actually used.
# Reference genome: hg38 (GRCh38) - placeholder, replace with actual path
# Input files: sample_R1.fastq.gz, sample_R2.fastq.gz - placeholder, replace with actual file names
# Output prefix: sample_aligned_ - placeholder, replace with desired prefix

STAR --genomeDir /path/to/STAR_index_hg38 \
     --readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz \
     --readFilesCommand zcat \
     --runThreadN 8 \
     --outFileNamePrefix sample_aligned_ \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes All \
     --outFilterType BySJout \
     --outFilterMultimapNmax 20 \
     --alignSJDBoverhangMin 1 \
     --alignSJoverhangMin 8 \
     --alignIntronMin 20 \
     --alignIntronMax 1000000 \
     --alignMatesGapMax 1000000 \
     --sjdbScore 1 \
     --quantMode GeneCounts \
     --limitBAMsortRAM 30000000000 # Adjust RAM limit (e.g., 30GB) as needed

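LeafCutter clusters introns from per-sample junction files, and STAR already reports splice junctions in `<prefix>SJ.out.tab`. The minimal sketch below assumes you convert that table rather than re-extracting junctions from the BAM. The SJ.out.tab column layout follows the STAR manual: 1-based first and last intron base, a strand code of 0/1/2, and uniquely mapped reads in column 7. The six-column BED-like output, the use of unique reads only, and the helper name `star_sj_to_junc` are assumptions. Check the coordinate convention that your LeafCutter version expects before using it.

```python
def star_sj_to_junc(sj_lines, min_unique_reads=1):
    """Convert STAR SJ.out.tab lines to BED-like junction rows (hypothetical helper).

    SJ.out.tab columns (STAR manual): 1 chrom, 2 first intron base (1-based),
    3 last intron base (1-based), 4 strand (0 undefined, 1 +, 2 -), 5 motif,
    6 annotated flag, 7 uniquely mapped reads, 8 multi-mapped reads, 9 max overhang.
    Output columns: chrom, start (0-based), end, name, unique read count, strand.
    """
    strand_codes = {"0": ".", "1": "+", "2": "-"}
    rows = []
    for n, line in enumerate(sj_lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[3]
        unique_reads = int(fields[6])
        if unique_reads < min_unique_reads:
            continue  # junction supported only by multi-mapping reads
        rows.append("\t".join([chrom, str(start - 1), str(end), f"junc{n}",
                               str(unique_reads), strand_codes.get(strand, ".")]))
    return rows


if __name__ == "__main__":
    example = [
        "chr1\t14830\t14969\t2\t2\t1\t12\t3\t38",
        "chr1\t15039\t15795\t2\t2\t1\t0\t5\t30",
    ]
    for row in star_sj_to_junc(example):
        print(row)
```

In the demo, the second junction is dropped because it has no uniquely mapped reads. Multi-mapping support (column 8) is deliberately ignored here.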

We used Leafcutter (v0.2.6) to perform differential splicing analyses comparing each of four defined groups of ALS cases (ALSlow, ALShigh, C9 and FTD) against the Control group. The samples in these analyses, 54 in total, are referred to as the training set (all groups are described in: link to manuscript here).

Leafcutter v0.2.6 GitHub

$ Bash example

# Install LeafCutter from GitHub (Python clustering scripts plus the R package used by leafcutter_ds.R)
# git clone https://github.com/davidaknowles/leafcutter
# Rscript -e 'devtools::install_github("davidaknowles/leafcutter/leafcutter")'

# --- Prepare input files (conceptual; replace with actual paths and sample names) ---

# Directory holding one .junc file per sample (e.g., derived from the STAR alignments)
mkdir -p junctions

# Hypothetical group sizes for illustration only. Per the GEO record, the training set used in
# the four comparisons comprises 54 ALS and Control samples in total; take the real group sizes
# from the manuscript.
NUM_ALS_LOW=10
NUM_ALS_HIGH=10
NUM_C9=10
NUM_FTD=10
NUM_CONTROL=14

# juncfiles.txt lists paths to the junction files, one per line.
# groups_all.txt maps each sample ID to its group (tab-separated, no header).
# Sample IDs must match the column names that leafcutter_cluster.py writes to the counts file
# (derived from the .junc file names); check the counts header before running leafcutter_ds.R.
> juncfiles.txt
> groups_all.txt

# Generate placeholder entries for every group
for spec in "ALSlow:${NUM_ALS_LOW}" "ALShigh:${NUM_ALS_HIGH}" "C9:${NUM_C9}" "FTD:${NUM_FTD}" "Control:${NUM_CONTROL}"; do
    group="${spec%%:*}"
    n="${spec##*:}"
    for i in $(seq 1 "$n"); do
        sample_id="${group}_sample${i}"
        echo "junctions/${sample_id}.junc" >> juncfiles.txt
        printf '%s\t%s\n' "$sample_id" "$group" >> groups_all.txt
    done
done

# --- LeafCutter analysis ---

# The reference genome (GRCh38) is used only upstream, during alignment; LeafCutter works from junction files.

# 1. Cluster introns across samples and build per-sample count tables.
#    Default thresholds are used here; the settings used in the paper are not stated in the source.
python leafcutter/clustering/leafcutter_cluster.py -j juncfiles.txt -o leafcutter_counts
# Outputs include leafcutter_counts_perind.counts.gz (per-sample intron/cluster read ratios)
# and leafcutter_counts_perind_numers.counts.gz (intron read counts), the input to leafcutter_ds.R.

# 2. Differential splicing: each ALS group against Control
mkdir -p leafcutter_results

for grp in ALSlow ALShigh C9 FTD; do
    cmp="${grp}_vs_Control"
    echo "Running differential splicing for ${grp} vs Control..."
    # Exact match on the group column (avoids substring matches that grep -E can produce)
    awk -F'\t' -v g="$grp" '$2 == g || $2 == "Control"' groups_all.txt > "groups_${cmp}.txt"
    # leafcutter_ds.R takes the counts file and the groups file as positional arguments.
    # Optionally add --exon_file=<exons.txt.gz> (e.g., built from a GTF with leafcutter/scripts/gtf_to_exons.R)
    # to annotate clusters with gene names.
    Rscript leafcutter/scripts/leafcutter_ds.R \
        --num_threads 8 \
        --output_prefix "leafcutter_results/${cmp}" \
        leafcutter_counts_perind_numers.counts.gz \
        "groups_${cmp}.txt"
done

# Clean up temporary groups files
rm groups_*_vs_Control.txt

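Each of the four comparisons needs its own two-group file. The minimal Python sketch below performs that split, assuming the tab-separated `groups_all.txt` layout used above (sample ID, then group). It matches on the exact group label rather than on substrings of the line, because a pattern such as `C9` would also match any sample ID that merely contains that string. `split_groups` and `write_groups_files` are hypothetical helper names.

```python
import csv
from pathlib import Path


def split_groups(groups_all, case_groups, control="Control"):
    """Return {comparison_name: [(sample, group), ...]} for each case group vs control.

    groups_all: iterable of (sample_id, group) pairs, e.g. rows of groups_all.txt.
    Matching is on the exact group label, not on substrings of the line.
    """
    rows = list(groups_all)
    return {
        f"{case}_vs_{control}": [(s, g) for s, g in rows if g in (case, control)]
        for case in case_groups
    }


def write_groups_files(groups_path, case_groups, out_dir="."):
    """Read the master groups file (tab-separated, no header); write one file per comparison."""
    with open(groups_path, newline="") as fh:
        rows = [(r[0], r[1]) for r in csv.reader(fh, delimiter="\t") if len(r) >= 2]
    for name, members in split_groups(rows, case_groups).items():
        with open(Path(out_dir) / f"groups_{name}.txt", "w", newline="") as out:
            csv.writer(out, delimiter="\t", lineterminator="\n").writerows(members)
```

For example, `write_groups_files("groups_all.txt", ["ALSlow", "ALShigh", "C9", "FTD"])` would write the four `groups_<group>_vs_Control.txt` files expected by the loop above.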

For each differential splicing analysis, Leafcutter outputs two files: one listing the adjusted p-value (padj) of every intron cluster, and a second listing the splice junctions within those clusters together with their delta PSIs from the differential analysis.

Leafcutter v0.2.6 GitHub

$ Bash example

# leafcutter_ds.R (run in the previous step) writes two tab-separated files per comparison:
#   <output_prefix>_cluster_significance.txt : one row per intron cluster, including the raw
#       p-value (p) and the adjusted p-value (p.adjust), i.e. the cluster padj.
#   <output_prefix>_effect_sizes.txt : one row per intron (splice junction) in the tested
#       clusters, including the per-group PSI estimates and deltapsi.
# Column sets can differ between LeafCutter releases, so inspect the headers before parsing.
# Clusters that could not be tested are reported with NA p-values.

for cmp in ALSlow_vs_Control ALShigh_vs_Control C9_vs_Control FTD_vs_Control; do
    echo "== ${cmp} =="
    head -n 3 "leafcutter_results/${cmp}_cluster_significance.txt"
    head -n 3 "leafcutter_results/${cmp}_effect_sizes.txt"
done

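The two files are linked through the cluster ID embedded in each intron ID. The minimal sketch below uses only the standard library. It assumes intron IDs of the form `chr:start:end:clu_N` (with an optional `_<strand>` suffix in newer releases), cluster IDs of the form `chr:clu_N`, and the column names `cluster`, `p.adjust`, `intron` and `deltapsi`; check these against your own headers. It returns the delta PSI of every junction whose cluster passes the padj threshold. The function names are hypothetical.

```python
import csv
import io


def cluster_of(intron_id):
    """Map chr:start:end:clu_N (or clu_N_<strand>) to the cluster ID chr:clu_N(_<strand>)."""
    parts = intron_id.split(":")
    return f"{parts[0]}:{parts[-1]}"


def junctions_in_significant_clusters(cluster_tsv, effects_tsv, padj=0.1):
    """Return {intron_id: deltapsi} for introns whose cluster has p.adjust < padj.

    cluster_tsv / effects_tsv: text of <prefix>_cluster_significance.txt and
    <prefix>_effect_sizes.txt (assumed column names; see lead-in).
    """
    significant = set()
    for row in csv.DictReader(io.StringIO(cluster_tsv), delimiter="\t"):
        try:
            if float(row["p.adjust"]) < padj:
                significant.add(row["cluster"])
        except ValueError:
            pass  # 'NA' p.adjust for clusters that were not tested
    return {
        row["intron"]: float(row["deltapsi"])
        for row in csv.DictReader(io.StringIO(effects_tsv), delimiter="\t")
        if cluster_of(row["intron"]) in significant
    }
```

To use it, pass the file contents directly, e.g. `junctions_in_significant_clusters(open(sig_path).read(), open(eff_path).read())` for one comparison.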

Clusters with a padj value <0.1 were selected for further analysis; we aggregated splice junctions and their corresponding delta PSIs across the 4 analyses into a single matrix.

Custom Script (Inferred with models/gemini-2.5-flash) vN/A GitHub

$ Bash example

# Filter the LeafCutter results at cluster padj < 0.1 and aggregate the delta PSIs of the
# junctions in those clusters across the four comparisons into one matrix.
# Inputs: the *_cluster_significance.txt and *_effect_sizes.txt files written by
# leafcutter_ds.R into leafcutter_results/ (column names as in recent LeafCutter releases;
# check your headers).
# Requires pandas: conda install -c conda-forge pandas

python3 - <<'EOF'
import pandas as pd

comparisons = ['ALSlow_vs_Control', 'ALShigh_vs_Control', 'C9_vs_Control', 'FTD_vs_Control']
padj_threshold = 0.1
output_matrix_file = 'aggregated_delta_psis.tsv'


def cluster_key(intron_id):
    # Intron IDs look like chr:start:end:clu_N (clu_N_<strand> in newer releases);
    # cluster IDs in the significance table look like chr:clu_N(_<strand>).
    parts = intron_id.split(':')
    return parts[0] + ':' + parts[-1]


per_comparison = []
for cmp in comparisons:
    clusters = pd.read_csv(f'leafcutter_results/{cmp}_cluster_significance.txt', sep='\t')
    effects = pd.read_csv(f'leafcutter_results/{cmp}_effect_sizes.txt', sep='\t')

    # Keep clusters with padj < threshold (NA p.adjust compares as False and is dropped)
    significant = set(clusters.loc[clusters['p.adjust'] < padj_threshold, 'cluster'])

    # Keep the junctions (introns) that belong to a significant cluster
    effects['cluster'] = effects['intron'].map(cluster_key)
    kept = effects.loc[effects['cluster'].isin(significant), ['intron', 'deltapsi']]
    per_comparison.append(kept.rename(columns={'deltapsi': f'deltapsi_{cmp}'}))

# Outer join on the junction ID so that a junction significant in any comparison is kept
merged = per_comparison[0]
for df in per_comparison[1:]:
    merged = merged.merge(df, on='intron', how='outer')

# Junctions outside a comparison's significant clusters are left as NaN (written as empty
# cells) rather than 0, so "not significant" is not confused with "no change". This is a
# choice; the original processing may have handled missing values differently.
merged.to_csv(output_matrix_file, sep='\t', index=False)
print(f'Aggregated matrix with {len(merged)} junctions saved to {output_matrix_file}')
EOF


Differentially spliced events are ordered by the max delta PSI across the 4 analyses (file: sig.junctions.padj01.sorted_by_max_deltapsi_Conlon_et_al_2018.txt).

Custom Script (Inferred with models/gemini-2.5-flash) vN/A

$ Bash example

# Order the aggregated junctions by their maximum absolute delta PSI across the four
# comparisons, largest first. The source does not say whether "max delta PSI" refers to the
# absolute or the signed value; absolute values are assumed here.
# Input: aggregated_delta_psis.tsv from the previous step (first column = junction ID,
# remaining columns = delta PSI per comparison; empty or NA cells = not significant there).
# Note: "padj01" in the output file name refers to the padj < 0.1 cluster filter.

INPUT_FILE="aggregated_delta_psis.tsv"
OUTPUT_FILE="sig.junctions.padj01.sorted_by_max_deltapsi_Conlon_et_al_2018.txt"

# Header: original columns plus the computed max_abs_deltapsi
head -n 1 "${INPUT_FILE}" | awk 'BEGIN { OFS = "\t" } { print $0, "max_abs_deltapsi" }' > "${OUTPUT_FILE}"

# Body: compute max |delta PSI| per row (skipping empty/NA cells), put it first so sort can
# use a fixed key, sort numerically in descending order, then move it to the last column.
tail -n +2 "${INPUT_FILE}" \
  | awk 'BEGIN { FS = OFS = "\t" }
         { m = -1
           for (i = 2; i <= NF; i++) {
             if ($i == "" || $i == "NA") continue
             v = $i + 0
             if (v < 0) v = -v
             if (v > m) m = v
           }
           print m, $0 }' \
  | sort -t $'\t' -k1,1gr \
  | awk 'BEGIN { FS = OFS = "\t" } { m = $1; sub(/^[^\t]*\t/, ""); print $0, m }' \
  >> "${OUTPUT_FILE}"

Splice junction coordinates are intersected with Gencode release 25 annotations.

GENCODE release 25 annotations

$ Bash example

# Define input and output files
# SPLICE_JUNCTIONS_BED: splice junction coordinates in BED format (chrom, start, end, name, ...).
# Replace with the actual path. LeafCutter intron IDs (chr:start:end:clu_...) must first be
# converted to BED rows; check whether your coordinates are 0- or 1-based.
SPLICE_JUNCTIONS_BED="splice_junctions.bed"
GENCODE_GTF_GZ="gencode.v25.annotation.gtf.gz"
GENCODE_GENES_BED="gencode.v25.genes.bed"
INTERSECTED_OUTPUT="intersected_junctions.bed"

# Download the GENCODE release 25 (GRCh38) annotation if needed:
# wget -O gencode.v25.annotation.gtf.gz ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_25/gencode.v25.annotation.gtf.gz

# Ensure bedtools is installed (e.g., via conda)
# conda install -c bioconda bedtools

# Convert GENCODE "gene" records (gene-level features are an assumption here) to sorted BED6,
# using gene_name as the BED name field. GTF is 1-based and closed, so the start is shifted by -1.
zcat "$GENCODE_GTF_GZ" \
  | awk 'BEGIN { FS = OFS = "\t" }
         $3 == "gene" {
           name = "."
           if (match($9, /gene_name "[^"]+"/)) name = substr($9, RSTART + 11, RLENGTH - 12)
           print $1, $4 - 1, $5, name, ".", $7
         }' \
  | sort -k1,1 -k2,2n > "$GENCODE_GENES_BED"

# Sort the junctions, then report each junction together with every overlapping gene
# (-wa -wb writes the full junction record followed by the full gene record).
sort -k1,1 -k2,2n "$SPLICE_JUNCTIONS_BED" > "${SPLICE_JUNCTIONS_BED%.bed}.sorted.bed"
bedtools intersect -a "${SPLICE_JUNCTIONS_BED%.bed}.sorted.bed" -b "$GENCODE_GENES_BED" -wa -wb > "$INTERSECTED_OUTPUT"

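The intersection needs junctions as BED rows, whereas the LeafCutter outputs identify junctions by intron IDs such as `chr1:100:200:clu_1_+`. The minimal sketch below performs that conversion. The source does not state the coordinate convention of these IDs, so the default 1-based-start assumption (shifted by -1 for BED) is only a guess to be checked against your data. Taking the strand from a trailing `_+`/`_-` on the cluster ID is also an assumption about newer LeafCutter releases. `intron_id_to_bed` is a hypothetical name.

```python
def intron_id_to_bed(intron_id, one_based_start=True):
    """Convert a LeafCutter intron ID (chr:start:end:clu_...) to a BED6 line.

    one_based_start=True assumes the ID's start is 1-based and shifts it by -1 for BED;
    set it to False if your IDs are already 0-based. The strand is taken from a trailing
    '_+' / '_-' on the cluster ID when present, otherwise '.'. The full intron ID is kept
    as the BED name so intersection results can be joined back to the matrix.
    """
    chrom, start, end, cluster = intron_id.split(":")
    bed_start = int(start) - 1 if one_based_start else int(start)
    strand = cluster[-1] if cluster.endswith(("_+", "_-")) else "."
    return "\t".join([chrom, str(bed_start), end, intron_id, "0", strand])
```

Applied to the first column of the aggregated matrix (skipping its header), this yields the `splice_junctions.bed` input used above.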

PSI ratios across all samples in Conlon et al. 2018 (77 training and test samples, see link to manuscript here) were computed using the leafcutter_quantify_psi.R script, starting from the splice junction counts (see https://github.com/davidaknowles/leafcutter/issues/34).

Leafcutter GitHub

$ Bash example

# Install LeafCutter from GitHub (R package plus the Python clustering scripts)
# git clone https://github.com/davidaknowles/leafcutter
# Rscript -e 'devtools::install_github("davidaknowles/leafcutter/leafcutter")'

# 1. Cluster junctions across all 77 training and test samples.
#    all_juncfiles.txt (placeholder name) lists one .junc file per sample.
#    Default clustering thresholds are used; the paper's settings are not stated in the source.
python leafcutter/clustering/leafcutter_cluster.py -j all_juncfiles.txt -o all_samples
# Writes all_samples_perind.counts.gz, which stores each intron's reads and its cluster's total
# reads per sample as "numerator/denominator", and all_samples_perind_numers.counts.gz.

# 2. Compute PSI ratios with leafcutter_quantify_psi.R.
#    The script is referenced in LeafCutter issue #34; its command-line arguments are not given
#    in the source, so the call below is a placeholder to be completed from that issue.
# Rscript leafcutter_quantify_psi.R <arguments as described in https://github.com/davidaknowles/leafcutter/issues/34>

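`leafcutter_cluster.py` stores each sample's intron read count and cluster total as a `numerator/denominator` string in `<prefix>_perind.counts.gz`. A ratio per intron and sample therefore follows from dividing the two. The pandas sketch below does exactly that under the assumption of a whitespace-delimited table whose first column holds intron IDs. It is an illustration, not a reimplementation of `leafcutter_quantify_psi.R`, whose exact computation is described in the linked issue. Samples with zero reads in a cluster get NaN. The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def ratios_from_table(table):
    """Turn a DataFrame of 'intron_reads/cluster_reads' strings into numeric ratios."""
    ratios = {}
    for sample in table.columns:
        parts = table[sample].astype(str).str.split("/", expand=True).astype(float)
        # A zero cluster total becomes NaN instead of raising or producing inf
        ratios[sample] = parts[0] / parts[1].replace(0, np.nan)
    return pd.DataFrame(ratios, index=table.index)


def perind_to_ratios(path):
    """Read a *_perind.counts(.gz) table (intron IDs in the first column) and compute ratios."""
    table = pd.read_csv(path, sep=r"\s+", index_col=0, dtype=str)
    return ratios_from_table(table)
```

For example, `perind_to_ratios("all_samples_perind.counts.gz").to_csv("psi_ratios.tsv", sep="\t")` would write one ratio per intron and sample. The output file name is a placeholder.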

Tools Used

STAR

Leafcutter

Raw Source Text

fastq Illumina RNASeq paired-end reads were aligned using STAR (2.5.2a).
We used Leafcutter (0.2.6) to perform differential splicing analyses of each of four defined groups of ALS cases (ALSlow, ALShigh, C9, FTD grouped samples for a total of 54 samples referred to as the training set) against the Control group (all groups are described in: link to manuscript here). In each differential splicing analysis, Leafcutter outputs a file listing the set of cluster padj values, and a second file listing the splice junctions that reside within those clusters and their delta PSIs as a result of the differential analysis.
Clusters with a padj value <0.1 were selected for further analysis; we aggregated splice junctions and their corresponding delta PSIs across the 4 analyses into a single matrix. Differentially spliced events are ordered by the max delta PSI across the 4 analyses (file: sig.junctions.padj01.sorted_by_max_deltapsi_Conlon_et_al_2018.txt).
Splice junction coordinates are intersected with Gencode release 25 annotations.
PSI ratios across all samples in Conlon et al. 2018 (77 training and test samples, see link to manuscript here) were computed using the leafcutter_quantify_psi.R  script starting from the splice junction counts (see https://github.com/davidaknowles/leafcutter/issues/34)
Genome_build: GRCh38
Supplementary_files_format_and_content: sig.junctions.padj01.sorted_by_max_deltapsi_Conlon_et_al_2018.txt : Differentially spliced events, coordinates and gene names are ordered by the max delta PSI across the 4 analyses of 54 ALS and Control samples.
Supplementary_files_format_and_content: DS_leafcutter.ratios_Conlon_et_al_2018.txt : PSI ratios across all samples (77 training and test samples in Conlon et al. 2018, link to manuscript here) were computed using the leafcutter_quantify_psi.R  script starting from the splice junction counts (see https://github.com/davidaknowles/leafcutter/issues/34)
