[miso-users] questions for MISO

Yarden Katz yarden at MIT.EDU
Sat Sep 15 15:31:14 EDT 2012


Hi,

See replies below:

On Sep 12, 2012, at 2:59 PM, Jian Duke wrote:

> Hi Yarden,
> I am looking for differential alternative splicing events between cell types. I am new to MISO and want to ask some basic questions before I start running it.
>  
> 1) I am going to install MISO and related software on my desktop. What are the hardware requirements to run MISO smoothly?

I recommend running MISO on a cluster, not on a single machine, because the problem MISO is solving is highly parallelizable.  Each gene or event can be treated independently, which means that you can split your genes/events into chunks, and have each node in the cluster (or each processor of a node) compute a chunk in parallel.  Aside from that, there are no particular hardware requirements.  The indexing step can be memory intensive but that's about it.  
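
To illustrate the chunking idea, here is a minimal Python sketch. It is not MISO's own code: `split_into_chunks`, `process_chunk` and `run_all` are hypothetical names, the per-event computation is a placeholder, and a local thread pool merely stands in for the nodes of a cluster, where each chunk would normally be submitted as its own job.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, num_chunks):
    """Split items into at most num_chunks contiguous chunks of near-equal size."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    base, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer items than chunks: stop instead of emitting empty chunks
        chunks.append(items[start:start + size])
        start += size
    return chunks


def process_chunk(event_ids):
    """Hypothetical placeholder for the independent per-event computation."""
    return {event_id: len(event_id) for event_id in event_ids}


def run_all(event_ids, num_workers):
    """Process every chunk independently and merge the per-chunk results."""
    chunks = split_into_chunks(event_ids, num_workers)
    results = {}
    # Stand-in for a cluster: each chunk could just as well be a separate job.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for partial in pool.map(process_chunk, chunks):
            results.update(partial)
    return results
```

Because no event depends on any other, the chunks can run in any order and the results can simply be merged at the end.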

>  
> 2) My datasets for one cell type are paired-end, but the datasets for the other cell type are single-end. I am going to perform the mapping with Tophat and then use the BAM files as input to run MISO. Can I compare the paired-end data against the single-end data with MISO? Or should I just use the forward reads of the paired-end dataset for mapping with Tophat and take the output BAM as single-end input? Or do you have a better way?

It's fine to compare the final output between a paired-end and a single-end run. As for mapping, you can choose to map the paired-end mates independently (forward and reverse as separate reads). That would more closely match your mapping pipeline for the single-end data sets, and so would eliminate any systematic differences between a paired-end mapping to genome+junctions and a single-end mapping to genome+junctions.
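
One way to prepare the mates as separate reads is sketched below in Python. This is only an illustration under assumptions: the function names are hypothetical, the input is assumed to be well-formed FASTQ with exactly four lines per record, and appending "/1" and "/2" to keep read names distinct is my own convention here, not something Tophat or MISO is known to require.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from FASTQ lines, 4 lines per record."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups the lines into records.
    for header, seq, _plus, qual in zip(*[stripped] * 4):
        yield header[1:], seq, qual  # drop the leading '@' from the header


def mates_as_single_end(mate1_lines, mate2_lines):
    """Yield FASTQ records in which every mate is an independent read."""
    for suffix, lines in (("/1", mate1_lines), ("/2", mate2_lines)):
        for name, seq, qual in read_fastq(lines):
            read_id = name.split()[0]  # keep only the ID, drop any description
            if not read_id.endswith(suffix):
                read_id += suffix  # keep the two mates distinguishable by name
            yield f"@{read_id}\n{seq}\n+\n{qual}\n"
```

The resulting records can be written to one file and mapped with the same single-end settings as the other cell type.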

>  
> 3) I am going to run Tophat on the web-based Galaxy system. Below are the settings for Tophat. Tophat uses the built-in reference genome at Galaxy. In addition, I used the Illumina iGenome version of the gene annotation (way to upload: Galaxy homepage > Shared Data > Data Libraries > iGenomes > mm9 > genes.gtf) during the run. Do you think these settings will conflict with downstream steps in MISO? However, I can choose not to use the iGenome gene annotation. If I select "Use Own Junctions" as No, the "Use Gene Annotation Model" option will not appear and will be set to No. Should I "Use Gene Annotation Model" when running Tophat?

I'm not familiar with this system, and I'm not sure how genes.gtf is used in Tophat. If it's only used to extract known junctions (i.e. junctions described in genes.gtf), then this should be fine with MISO. If anything, it can only improve results, since Tophat is likely to discover more junction reads than it would in de novo-only mode, without being supplied a set of known junctions.
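
To make "junctions described in genes.gtf" concrete, here is a rough Python sketch that derives junctions from the exon records of a GTF file. It says nothing about what Tophat actually does internally. It assumes standard tab-separated GTF lines with a `transcript_id` attribute, the function name is hypothetical, and each junction is reported as the first and last intron base in 1-based coordinates, which is my own choice of convention.

```python
from collections import defaultdict


def junctions_from_gtf_lines(lines):
    """Return sorted (chrom, strand, intron_start, intron_end) tuples."""
    exons = defaultdict(list)  # (chrom, strand, transcript_id) -> [(start, end)]
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records define splice junctions
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        transcript_id = None
        for attr in fields[8].split(";"):
            parts = attr.strip().split(None, 1)
            if len(parts) == 2 and parts[0] == "transcript_id":
                transcript_id = parts[1].strip('"')
        if transcript_id is None:
            continue
        exons[(chrom, strand, transcript_id)].append((start, end))

    junctions = set()
    for (chrom, strand, _tx), tx_exons in exons.items():
        tx_exons.sort()
        # Each pair of consecutive exons in a transcript flanks one intron.
        for (_, left_end), (right_start, _) in zip(tx_exons, tx_exons[1:]):
            if right_start > left_end + 1:
                junctions.add((chrom, strand, left_end + 1, right_start - 1))
    return sorted(junctions)
```

Called as `junctions_from_gtf_lines(open("genes.gtf"))`, this would list each annotated junction once, even when several transcripts share it.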

Best, --Yarden

>  
> Step 2: Tophat for Illumina
> Will you select a reference genome from your history or use a built-in index?: Use a built-in index
> Select a reference genome: /galaxy/data/mm9/bowtie_index/mm9
> Is this library mate-paired?: Single-end
> TopHat settings to use: Full parameter list
> Library Type: FR Unstranded
> Anchor length (at least 3): 8
> Maximum number of mismatches that can appear in the anchor region of spliced alignment: 1
> The minimum intron length: 70
> The maximum intron length: 500000
> Allow indel search: Yes
> Max insertion length: 3
> Max deletion length: 3
> Maximum number of alignments to be allowed: 20
> Minimum intron length that may be found during split-segment (default) search: 50
> Maximum intron length that may be found during split-segment (default) search: 500000
> Number of mismatches allowed in the initial read mapping: 1
> Number of mismatches allowed in each segment alignment for reads mapped independently: 1
> Minimum length of read segments: 18
> Use Own Junctions: Yes
> Use Gene Annotation Model: Yes
> Gene Model Annotations: iGenome version of gene reference mm9
> Use Raw Junctions: No
> Only look for supplied junctions: No
> Use Closure Search: No
> Use Coverage Search: Yes
> Minimum intron length that may be found during coverage search: 50
> Maximum intron length that may be found during coverage search: 20000
> Use Microexon Search: No
>  
> Thanks in advance,
> Jian



