[miso-users] exon-centric analysis

Yarden Katz yarden at MIT.EDU
Tue Apr 24 00:13:48 EDT 2012


Hi Mali,

Replies below:

On Apr 23, 2012, at 7:21 AM, mali salmon wrote:

> OK, I have managed to run MISO after indexing each gff file and then running MISO for each event type separately.
> Now I would like to filter the results. Using the command suggested in the manual, all the splicing events were filtered out
> "python filter_events.py --filter input_bf --num-inc 1 --num-exc 1 --num-sum-inc-exc 10 --delta-psi 0.20 --bayes-factor 10 --output-dir filtered"
> Is this too stringent for exon-centric analysis?

It depends on your goal, but it's not too stringent.  However, filter_events.py is more of a helper utility, an example of the kinds of filters you can apply, than a complete solution.  In particular, it operates on a single pairwise comparison and does not know how many samples you have.  If you had 10 distinct samples, you might want to apply a filter across *all* of these samples, e.g. consider only exons that have at least one exclusion read (--num-exc 1) in *any* of the samples, as opposed to in a particular pairwise comparison.  So it might be worth post-processing the MISO output to reflect these kinds of filters.  But again, it depends on the downstream analyses.
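
As a rough illustration of the "any sample" filter described above, here is a minimal sketch.  It assumes each event's read counts are available as a string of the form "(0,0):12,(1,0):34,(0,1):5,(1,1):20", where (1,0) counts reads consistent only with the inclusion isoform and (0,1) counts reads consistent only with the exclusion isoform.  The function names, the in-memory layout of `samples`, and the example counts are hypothetical; adapt the parsing to whatever your MISO output actually contains.

```python
import re

# Assumed layout of a two-isoform counts string, e.g.
# "(0,0):12,(1,0):34,(0,1):5,(1,1):20".
COUNT_RE = re.compile(r"\((\d),(\d)\):(\d+)")


def parse_counts(field):
    """Return (num_inc, num_exc) from a counts string (assumed format)."""
    counts = {(int(a), int(b)): int(n) for a, b, n in COUNT_RE.findall(field)}
    # Read classes missing from the string are treated as zero.
    return counts.get((1, 0), 0), counts.get((0, 1), 0)


def events_passing_in_any_sample(samples, min_inc=1, min_exc=1, min_sum=10):
    """samples: {sample_name: {event_name: counts_string}} (hypothetical layout).

    Keep an event if at least one sample meets all three thresholds.
    """
    kept = set()
    for events in samples.values():
        for name, field in events.items():
            inc, exc = parse_counts(field)
            if inc >= min_inc and exc >= min_exc and inc + exc >= min_sum:
                kept.add(name)
    return kept


# Made-up counts for two samples, for illustration only.
samples = {
    "sample1": {
        "exonA": "(0,0):40,(1,0):12,(0,1):0,(1,1):30",
        "exonB": "(0,0):5,(1,0):2,(0,1):1",
    },
    "sample2": {
        "exonA": "(1,0):9,(0,1):3,(1,1):25",
        "exonB": "(1,0):1,(1,1):4",
    },
}

kept = events_passing_in_any_sample(samples)
print(sorted(kept))
```

Here exonA fails in sample1 (no exclusion reads) but passes in sample2, so it is kept; exonB has too few reads in both samples and is dropped.  The default thresholds simply mirror the values in the command quoted above.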

> Is it a matter of coverage? Would you suggest to sequence technical replicates in order to increase the coverage?

It's important to apply coverage filters since MISO (by design) does not apply any real coverage filters, so as to not exclude data that users might be interested in.  So if you had an exon with only a couple of reads in it, and no informative (e.g. junction) reads, MISO by default would still process it.  Without proper coverage, the estimates will be unreliable, but luckily sequencing runs are so deep nowadays that one does not need to sequence a lot to get a decent genome-wide view of splicing.

Sequencing one technical replicate and seeing the variability there is always a good idea in my opinion, but overall doing biological replicates is probably more important.  It really depends on your application though.  You can also consider doing several biological replicates, assessing their variability, and, if you find that they are similar, pooling them together for analyses that rely on higher coverage.
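
One simple way to get a feel for whether two replicates are "similar" is sketched below.  This is only an illustration, not something MISO provides: the function name, the 0.1 tolerance, and the example Psi values are all made up, and the right similarity measure depends on your data.

```python
def replicate_agreement(psi_a, psi_b, tolerance=0.1):
    """psi_a, psi_b: {event_name: Psi estimate in [0, 1]}.

    Return the fraction of events present in both replicates whose
    estimates differ by at most `tolerance`, or None if no event is shared.
    """
    shared = sorted(set(psi_a) & set(psi_b))
    if not shared:
        return None
    close = sum(1 for e in shared if abs(psi_a[e] - psi_b[e]) <= tolerance)
    return close / len(shared)


# Made-up Psi estimates for two biological replicates.
rep1 = {"exonA": 0.82, "exonB": 0.10, "exonC": 0.55}
rep2 = {"exonA": 0.79, "exonB": 0.35, "exonC": 0.50, "exonD": 0.90}

agreement = replicate_agreement(rep1, rep2)
print(agreement)
```

Only events estimated in both replicates are compared (exonD is ignored here).  If most events agree, pooling the replicates for coverage-hungry analyses may be reasonable; where to draw that line is a judgment call.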

> And one last issue (for now :-) how does MISO account for the difference in library size?

MISO does not do anything special to account for library size.  The estimates it gives are percentages, so there is no absolute quantity that needs to be normalized.  If one of your samples has deeper coverage, it's definitely true that you'll have more power to detect splicing events in that sample compared with a lower coverage one.
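
To see why a ratio needs no library-size normalization, here is a toy calculation.  Note that this naive read ratio is NOT how MISO computes its estimates; it is only meant to show that a scale factor common to both counts cancels out.  The function name and the counts are made up.

```python
def naive_psi(inc_reads, exc_reads):
    """Toy inclusion ratio inc / (inc + exc); not MISO's actual model."""
    total = inc_reads + exc_reads
    if total == 0:
        raise ValueError("no informative reads for this event")
    return inc_reads / total


# Same event in a shallow library and in one sequenced three times deeper.
shallow = naive_psi(30, 10)
deep = naive_psi(3 * 30, 3 * 10)
print(shallow, deep)
```

Both calls give the same ratio because the factor of 3 cancels.  What changes with depth is how much evidence supports the estimate, which is why the deeper sample gives more power to detect events.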

Best, --Yarden

> Looking forward to your reply
> Thanks
> Mali
> 
> 
> 
> On Mon, Apr 23, 2012 at 8:45 AM, mali salmon <shalmom1 at gmail.com> wrote:
> Thanks Yarden for your reply
> One more question. I have downloaded the mm9 exonic-events gff file from your site. The uncompressed folder contains a few gff files + ensGene.map and locuslink.map folders.
> Can you please explain the content of these two folders? Which one is the indexed folder? Shall I index each gff file separately?
> I have tried to run 
> "run_events_analysis.py --compute-genes-psi mm9 input.bam --output-dir output --read-len 36" but this command yields nothing, so I suppose the problem is with the indexing (I haven't run index_gff.py, assuming the folder downloaded from your site is "ready to use")
> Thanks
> Mali
> 
> 
> 
> On Mon, Apr 23, 2012 at 1:28 AM, Yarden Katz <yarden at mit.edu> wrote:
> Hi Mali,
> 
> see replies below:
> 
> On Apr 22, 2012, at 2:33 AM, mali salmon wrote:
> 
> > Hello MISO users
> > Two questions:
> > 1. I have BAM files generated using Tophat that was given a RefSeq GFF as input. Now I would like to do an exon-centric analysis using the ensembl gff file provided in MISO.
> > Is it a problem that the mapping was done with refseq annotations and not Ensembl?
> 
> You can use any annotation with MISO -- the only requirement is that the chromosome naming schemes in your BAM file match the chromosome naming scheme in your GFF file.  See note on this here: http://genes.mit.edu/burgelab/miso/docs/#human-mouse-gene-models-for-isoform-centric-analyses
> 
> > 2. Before I write it myself, is there a script provided for generating exon-centric gff based on isoform-centric gff files?
> > Thanks
> > Mali
> >
> 
> There is no generic script that takes isoforms and converts them into events.  There are many ways to go about generating distinct types of events (SE, AFE, ALE, ...) that require decisions to be made about what counts as SE versus A3SS, etc.
> 
> You can either write a script that makes these categorizations based on rules that you find reasonable or, if you're using Drosophila, mouse or human genomes, use the event annotations that we provide.  The mapping from isoforms to events is complex and so there are many reasonable ways of doing it -- the events we provide are just one annotation.
> 
> Hope this helps.
> 
> Best, --Yarden
> 
> 
> 
> 
> 



