[miso-users] Assigning (1,1) reads in MISO

Wed Aug 28 16:08:16 EDT 2013

Hi Jake, 

See comments below.  In general, you should never have to use 'assigned_counts'.  

The "counts" field is the raw data; it involves no inference.  The "assigned_counts" field, by contrast, represents the assignment of reads to isoforms in the *last* iteration of the sampler.  So it is just a snapshot, and does not integrate the information from all the iterations.  It merely records which assignment of reads to isoforms the sampler was on at the last iteration.  Since it does not include information from all considered assignments, it is not a good estimate of the Psi value, and is really only there for debugging / backwards compatibility purposes.  
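
To make the relationship between the two fields concrete, here is a minimal Python sketch that recovers how the (1,1) reads were split in that last iteration. It follows the subtraction used in the quoted question below and assumes that assigned_counts for each isoform includes the reads compatible with that isoform only. The function names are hypothetical and are not part of MISO.

```python
import re

def parse_counts(field):
    """Parse a 'counts' string such as '(0,0):46,(1,0):5,(1,1):27'."""
    return {
        (int(a), int(b)): int(n)
        for a, b, n in re.findall(r"\((\d+),(\d+)\):(\d+)", field)
    }

def parse_assigned(field):
    """Parse an 'assigned_counts' string such as '0:8,1:24'."""
    return {int(k): int(v) for k, v in (item.split(":") for item in field.split(","))}

def shared_read_split(counts_field, assigned_field):
    """Return how many (1,1) reads went to the first and second isoform.

    Assumption: reads compatible with only one isoform are always
    assigned to it, so the remainder must be (1,1) reads.
    """
    counts = parse_counts(counts_field)
    assigned = parse_assigned(assigned_field)
    to_first = assigned.get(0, 0) - counts.get((1, 0), 0)
    to_second = assigned.get(1, 0) - counts.get((0, 1), 0)
    return to_first, to_second

# Third event from the quoted message
print(shared_read_split("(0,0):46,(1,0):5,(1,1):27", "0:8,1:24"))  # (3, 24)
```

The result describes a single sampler iteration only, so it should not be read as an estimate of Psi.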

On Aug 26, 2013, at 3:14 PM, Jake Yeung wrote:

> Hi all,
> 
> This is my first post, sorry if it sounds too naive. I have read the paper and the supplementary, but I still have some trouble understanding the way sample_counts in (1,1) go to sample_assigned. 
> 
> For example, for finding the PSI values for SE events in a particular sample, I have the following outputs for three events:
> event=> chr2:238427182:238427291:+@chr2:238428553:238428672:+@chr2:238434244:238434448:+
> counts => (0,0):1028,(0,1):10,(1,0):422,(1,1):462
> assigned_counts=> 0:756,1:138
> 334 (1,1) counts assigned to first isoform, 128 (1,1) counts assigned to second isoform.
> 
> event=>chr2:70520750:70520869:-@chr2:70516482:70516504:-@chr2:70515200:70515324:-
> counts=>(0,0):126,(0,1):4,(1,0):54,(1,1):186
> assigned_counts=>0:207,1:37
> 153 (1,1) counts assigned to first isoform, 33 (1,1) counts assigned to second isoform.
> 
> event=>chr2:122513236:122513529:+@chr2:122514816:122515010:+@chr2:122516286:122516382:+
> counts=>(0,0):46,(1,0):5,(1,1):27
> assigned_counts=>0:8,1:24
> 3 (1,1) counts assigned to first isoform, 24 (1,1) counts assigned to second isoform. 
> 
> I realize there may be a large number of factors that affect the way (1,1) counts are assigned, but I am a bit confused as to why, for the last event, 24 (1,1) counts were assigned to the second isoform when there was no evidence of (0,1) counts in that event.
> 
> Any help with insights with factors that affect the way (1,1) counts are assigned would be most appreciated. Thank you in advance.
> 

The (1,1) reads in general are assigned based on two considerations: (1) the relative length difference between the two isoforms, and (2) the evidence for the expression of one isoform versus the other.  If you have many (0,1) reads, those will of course support isoform 2 (the short isoform, by convention).  Likewise, many (1,0) reads support isoform 1 (the longer isoform, by convention).  However, a priori, longer isoforms are more likely to be sampled, while any individual read is more probable under the shorter isoform.  I.e. if you have a read landing in a constitutive exon, it is -- *all other things equal* -- more likely to have originated from the shorter isoform under the uniform sampling assumption, since the shorter isoform has fewer starting positions.  So MISO tries to balance these factors when assigning reads to isoforms.
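
The balance between these two factors can be illustrated with a toy calculation. This is a simplified sketch, not MISO's actual code: it assumes single-end reads, uniform read start positions, and no overhang constraints, and the function name, isoform lengths, and read length are made up for illustration.

```python
def assignment_probs(psi, len_long, len_short, read_len):
    """Toy probability that a (1,1) read came from each isoform."""
    # Number of possible read start positions on each isoform.
    starts_long = len_long - read_len + 1
    starts_short = len_short - read_len + 1

    # Factor 1: longer isoforms are sampled more often
    # (abundance multiplied by the number of start positions).
    sample_long = psi * starts_long
    sample_short = (1.0 - psi) * starts_short

    # Factor 2: a given read is more probable under the shorter isoform,
    # because its probability is spread over fewer start positions.
    lik_long = 1.0 / starts_long
    lik_short = 1.0 / starts_short

    w_long = sample_long * lik_long
    w_short = sample_short * lik_short
    total = w_long + w_short
    return w_long / total, w_short / total

# Hypothetical isoform lengths and read length
p_long, p_short = assignment_probs(psi=0.3, len_long=500, len_short=350, read_len=36)
print(p_long, p_short)
```

Under these simplifying assumptions the two length effects offset each other for a read compatible with both isoforms, so its assignment mostly tracks the sampler's current Psi value.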

In your particular example, there are two possible reasons for assigning reads to isoform 2.  One possibility is that, as I mentioned earlier, the assigned_counts field is just a snapshot of where the sampler last ended.  That last assignment could be improbable overall, and it could be that better assignments were considered at other iterations (an example of why the assigned_counts field is not a good metric for inference).  The second possibility is that the (1,1) reads can, depending on their ratio to the (1,0) and (0,1) reads, support the short isoform.  Keep in mind that the only reads that fall strictly in the (0,1) class are the exclusion junction reads in this case, and there are few of them.  If you imagine an extreme case where you have a U-shaped density of reads, with lots of reads in the flanking constitutive exons (that are common to the two isoforms), few reads in the middle exon, but also few reads in the exclusion junction, then MISO would tend to assign a low Psi value to this scenario.  The reason is that, as mentioned above, the overabundance of reads in the flanking exons overall supports the shorter isoform, since it has fewer read starting positions.
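
The U-shaped scenario can be sketched with a toy likelihood. This is only an illustration under assumptions that are not taken from MISO's code: single-end reads, uniform read starts, no prior on Psi, and a simple grid search in place of the sampler. Under those assumptions a (1,0) read contributes psi / D, a (0,1) read contributes (1 - psi) / D, and a (1,1) read contributes 1 / D, where D = psi * starts_long + (1 - psi) * starts_short. All read counts and start-position counts are hypothetical, and (0,0) reads are ignored.

```python
import math

def log_likelihood(psi, n_incl, n_excl, n_shared, starts_long, starts_short):
    """Toy log-likelihood of Psi given (1,0), (0,1) and (1,1) read counts."""
    n_total = n_incl + n_excl + n_shared
    denom = psi * starts_long + (1.0 - psi) * starts_short
    return (n_incl * math.log(psi)
            + n_excl * math.log(1.0 - psi)
            - n_total * math.log(denom))

def best_psi(n_incl, n_excl, n_shared, starts_long, starts_short):
    """Grid search for the Psi value with the highest toy likelihood."""
    grid = [i / 1000.0 for i in range(1, 1000)]
    return max(
        grid,
        key=lambda p: log_likelihood(p, n_incl, n_excl, n_shared,
                                     starts_long, starts_short),
    )

# Same few inclusion and exclusion reads, without and with many
# reads in the flanking exons (hypothetical numbers).
few_shared = best_psi(5, 5, 0, starts_long=465, starts_short=315)
u_shaped = best_psi(5, 5, 500, starts_long=465, starts_short=315)
print(few_shared, u_shaped)
```

With equal numbers of (1,0) and (0,1) reads, adding many (1,1) reads pulls the best-scoring Psi in this toy model well below the value obtained without them, which matches the direction of the effect described above.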

In any case, I would look at the Psi values rather than the assigned_counts field -- I cannot think of any use case that should require the assigned_counts and I'll update the manual to reflect this.  If the Psi values do not make sense, it's worthwhile to explore the example further, but the assigned_counts of any given run are only a snapshot and could be unrepresentative.

Best, --Yarden

> Regards,
> 
> 
> J