GSE90650 Processing Pipeline
Publication
NEAT1 scaffolds RNA-binding proteins and the Microprocessor to globally enhance pri-miRNA processing. Nature Structural & Molecular Biology (2017), PMID 28846091
Dataset
GSE90650: The LncRNA NEAT1 Nests RNA Binding Proteins and the Microprocessor to Globally Enhance Pri-miRNA Processing
Processing Steps
-
1
For each CLIP-seq read, the first 4nt index sequence was extracted and the 3'-adaptor sequence CTCGTATGCCGTCTTCTGCTTG was removed with program cutadapt (version 1.8.3).
$ Bash example
# Install cutadapt (1.8.3 is the version cited; it is quite old, so consider a newer release if possible)
# conda install -c bioconda cutadapt=1.8.3

# Remove the 3'-adaptor sequence.
# Note: this call does not handle the 4-nt index at the start of each read.
cutadapt -a CTCGTATGCCGTCTTCTGCTTG -o trimmed_reads.fastq.gz reads.fastq.gz
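The step says the 4-nt index was "extracted", but the source does not say which tool did this or where the index was stored. As a minimal sketch, assuming the index should be kept in the read name, the awk-based helper below moves the first N bases of each read into its header and trims them from the sequence and quality lines. The function name `extract_prefix` and the `_<index>` naming convention are hypothetical, not taken from the original pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): move the first N bases of every
# read into the read ID as "_<bases>" and remove them, together with the
# matching quality characters, from the record.
# Assumes uncompressed 4-line FASTQ records on stdin.
extract_prefix() {
  local n="$1"
  awk -v n="$n" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      # Split the header into the read ID and an optional description
      id = header; desc = ""
      sp = index(header, " ")
      if (sp > 0) { id = substr(header, 1, sp - 1); desc = substr(header, sp) }
      print id "_" substr(seq, 1, n) desc   # ID tagged with the extracted bases
      print substr(seq, n + 1)              # sequence without the prefix
      print plus
      print substr($0, n + 1)               # quality string without the prefix
    }'
}
```

A possible use is `gzip -dc reads.fastq.gz | extract_prefix 4 | gzip > reads.indexed.fastq.gz` before adaptor trimming. The same helper with `5` as the argument would cover a 5-nt prefix.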
-
2
Reads of at least 18 nt were mapped to the human genome (hg19) with Bowtie (version 1.1.1) using the parameters '-l25 -n2 -k1 -m10 -e 200 --best --strata'.
$ Bash example
# Install Bowtie (if not already installed)
# conda install -c bioconda bowtie

# Ensure the hg19 index is built or available.
# If not, build it with bowtie-build:
# bowtie-build <path_to_hg19.fa> hg19

# Align reads to hg19.
# -S requests SAM output; without it Bowtie 1 writes its own alignment format.
bowtie -l 25 -n 2 -k 1 -m 10 -e 200 --best --strata -S hg19 input_reads.fastq > output_aligned_reads.sam
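Step 2 maps only reads of at least 18 nt, and the alignment command does not apply that cutoff itself. A minimal sketch of the length filter, assuming uncompressed 4-line FASTQ input, is shown here with awk. The function name `filter_min_length` is hypothetical. Passing `-m 18` to cutadapt during trimming would be an alternative, and the source does not say which approach the authors used.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): keep only FASTQ records whose
# sequence has at least MIN bases. Reads 4-line FASTQ records on stdin and
# writes the passing records to stdout.
filter_min_length() {
  local min="$1"
  awk -v min="$min" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 && length(seq) >= min {
      print header
      print seq
      print plus
      print $0
    }'
}
```

For example, `filter_min_length 18 < trimmed_reads.fastq > input_reads.fastq` would produce the input for the Bowtie call.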
-
3
For each small RNA-seq read, the first 5 nt (NNNTC) were extracted and the 3' adaptor sequence 'NNCTCGTATGCCGTCTTCTGCTTG' was removed with program cutadapt (version 1.8.3).
$ Bash example
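# Install cutadapt if not already installed
# conda install -c bioconda cutadapt=1.8.3

# Example usage:
# input.fastq.gz holds the raw small RNA-seq reads;
# output.fastq.gz will hold the trimmed reads.
# Assumptions: ^ anchors the 5' pattern to the read start, and -n 2 lets cutadapt
# remove both the 5' and the 3' sequence from one read (by default it removes only one).
cutadapt -g ^NNNTC \
  -a NNCTCGTATGCCGTCTTCTGCTTG \
  -n 2 \
  -o output.fastq.gz \
  input.fastq.gz
-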
4
The target insert sequences were required to be at least 16 nt.
$ Bash example
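# Install fastp if not already installed
# conda install -c bioconda fastp

# Keep only reads of at least 16 nt.
# fastp is used here for illustration; the source does not name the tool used for this step.
# Adapter trimming and quality filtering are switched off so that only read length is filtered.
# Single-end input is assumed. For paired-end data, use --in1, --out1, --in2, --out2.
fastp --in1 input.fastq.gz --out1 output.fastq.gz --length_required 16 \
  --disable_adapter_trimming --disable_quality_filtering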
-
5
The filtered reads were mapped to the human genome (hg19) by Bowtie (version 1.1.1) with parameters '-n 0 -e 80 -l 18 -a -m 5 --best --strata'.
$ Bash example
# Install Bowtie (if not already installed)
# conda install -c bioconda bowtie=1.1.1

# Placeholder for the Bowtie hg19 index, which must be built before alignment.
# Example: bowtie-build <hg19.fa> hg19_index
hg19_index="/path/to/bowtie_indexes/hg19"

# Placeholder for the filtered reads file
filtered_reads="filtered_reads.fastq"

# Placeholder for the output SAM file
output_sam="mapped_reads.sam"

# -S requests SAM output; without it Bowtie 1 writes its own alignment format.
bowtie -n 0 -e 80 -l 18 -a -m 5 --best --strata -S "${hg19_index}" "${filtered_reads}" > "${output_sam}"
-
6
The mapped reads were used to generate the genomic coverage profiles in bigWig format with programs genomeCoverageBed (bedtools version 2.24) and samtools (version 0.1.19).
$ Bash example
# Install bedtools (version 2.24) and samtools (version 0.1.19) if not already installed
# conda install -c bioconda bedtools=2.24 samtools=0.1.19

# Install the UCSC tool bedGraphToBigWig if not already installed
# wget https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedGraphToBigWig -O bedGraphToBigWig
# chmod +x bedGraphToBigWig

# Define input and output files
INPUT_BAM="input.sorted.bam"      # Placeholder: coordinate-sorted, indexed BAM file
OUTPUT_BEDGRAPH="output.bedgraph"
OUTPUT_BIGWIG="output.bigWig"
CHROM_SIZES="hg19.chrom.sizes"    # Placeholder: chromosome sizes for hg19, the build used for mapping

# Example of how to get the hg19 chromosome sizes (if not already available)
# fetchChromSizes hg19 > hg19.chrom.sizes

# Generate bedGraph from BAM using genomeCoverageBed.
# samtools 0.1.19 may have been used to sort/index the BAM before this step.
# The deposited files cover both strands; adding -strand + or -strand - gives per-strand profiles.
# The sort puts chromosomes in the byte order that bedGraphToBigWig expects.
genomeCoverageBed -ibam "$INPUT_BAM" -bg -g "$CHROM_SIZES" | LC_ALL=C sort -k1,1 -k2,2n > "$OUTPUT_BEDGRAPH"

# Convert bedGraph to bigWig.
# bedGraphToBigWig is a UCSC tool commonly used for this conversion.
# Its use is inferred, since bigWig is the stated output format.
bedGraphToBigWig "$OUTPUT_BEDGRAPH" "$CHROM_SIZES" "$OUTPUT_BIGWIG"
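The bedGraph-to-bigWig conversion depends on the bedGraph and the chromosome sizes file describing the same genome build. As a hedged sketch of a pre-flight check, the awk-based helper below reports the first bedGraph line that names a chromosome missing from the sizes file, extends past the chromosome end, or starts before the previous interval on the same chromosome. The function name `check_bedgraph` is hypothetical and the check is not part of the original pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source):
#   check_bedgraph <bedGraph> <chrom.sizes>
# Returns 0 if every bedGraph line passes, 1 at the first line that fails.
check_bedgraph() {
  awk '
    # First file (chrom.sizes): remember the length of each chromosome
    NR == FNR { size[$1] = $2 + 0; next }
    # Second file (bedGraph): chrom, start, end, value
    !($1 in size) {
      print "unknown chromosome at line " FNR ": " $1 > "/dev/stderr"
      bad = 1; exit
    }
    $3 + 0 > size[$1] {
      print "interval past chromosome end at line " FNR > "/dev/stderr"
      bad = 1; exit
    }
    $1 == prev_chrom && $2 + 0 < prev_start {
      print "start out of order at line " FNR > "/dev/stderr"
      bad = 1; exit
    }
    { prev_chrom = $1; prev_start = $2 + 0 }
    END { exit bad }
  ' "$2" "$1"
}
```

One way to use it is `check_bedgraph output.bedgraph hg19.chrom.sizes && bedGraphToBigWig output.bedgraph hg19.chrom.sizes output.bigWig`, so the conversion runs only when the check passes.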
Raw Source Text
For each CLIP-seq read, the first 4nt index sequence was extracted and the 3'-adaptor sequence CTCGTATGCCGTCTTCTGCTTG was removed with program cutadapt (version 1.8.3). The reads with at least 18nt were mapped to the human genome (hg19) by Bowtie (version 1.1.1) with parameters '-l25 -n2 -k1 -m10 -e 200 --best --strata'. For each small RNA-seq read, the firsst 5nt NNNTC was extracted and 3' adaptor sequence 'NNCTCGTATGCCGTCTTCTGCTTG' was removed with program cutadapt (version 1.8.3). The target insert sequences were required to be at least 16 nt. The filtered reads were mapped to the genome genome (hg19) by Bowtie (version 1.1.1) with parameters '-n 0 -e 80 -l 18 -a -m 5 --best --strata'. The mapped reads were used to generate the genomic coverage profiles in bigWig format with programs genomeCoverageBed (bedtools version 2.24) and samtools (version 0.1.19).
Genome_build: hg19
Supplementary_files_format_and_content: bigWig for coverage in both strands