[miso-users] MISO questions

Thu Feb 28 20:36:44 EST 2013

Hi,

I'm including responses below, and CCing the mailing list since this is probably of wider interest:

On Feb 28, 2013, at 11:07 AM, Li, Robert wrote:

> Dear Dr. Katz,
>  
> I am running MISO using the Isoform-centric model. I try to figure out an easy way to deal with the replicate issue. I attempt to use what you suggested in FAQ regarding “ANOVA-like” way. I have a few questions.
>  
> 1. How are “assigned counts” derived from in the following case?
> counts
> assigned_counts
> (0,0):35,(1,1):4
> 0:1,1:3

The counts that are given in the form (vector):count (i.e. the "counts" fields) are the raw data, and involve no inference. For example, counts of the form:

> (0,0):35,(1,1):4

mean that 35 of the reads could not be assigned (because they violate the overhang constraint, have no read pair in the case of paired-end data, or are inconsistent with the annotated event), and 4 of the reads are consistent with both isoforms. Note that since the "(1,0)" and "(0,1)" classes are missing, no read in your sample supports one isoform to the exclusion of the other, so this is probably not a reliable event in your sample.
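To make the format concrete, here is a minimal Python sketch for parsing the two fields. It is not part of MISO; the function names are made up for illustration, and it assumes the fields look exactly like the strings quoted above.

```python
import re

def parse_counts(field):
    """Parse a "counts" string such as '(0,0):35,(1,1):4'.

    Returns a dict mapping each compatibility vector (a tuple of 0/1,
    one entry per isoform) to the number of reads in that class.
    """
    counts = {}
    for vec, n in re.findall(r"\(([01,]+)\):(\d+)", field):
        counts[tuple(int(x) for x in vec.split(","))] = int(n)
    return counts

def parse_assigned_counts(field):
    """Parse an "assigned_counts" string such as '0:1,1:3'.

    Returns a dict mapping isoform index to the number of reads
    assigned to it in the final sampling step.
    """
    assigned = {}
    for item in field.split(","):
        idx, n = item.split(":")
        assigned[int(idx)] = int(n)
    return assigned

def used_reads(counts):
    """Total reads compatible with at least one isoform."""
    return sum(n for vec, n in counts.items() if any(vec))

counts = parse_counts("(0,0):35,(1,1):4")
assigned = parse_assigned_counts("0:1,1:3")
```

Here used_reads(counts) is 4, which matches the 1 + 3 reads in the assigned counts; the 35 reads in the (0,0) class are not used.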

The "assigned_counts" field, by contrast, is a snapshot of the inference procedure: it tells you what the assignment of reads to the (in this case two) isoforms was when the algorithm finished.  At each step, the MISO inference algorithm considers various assignments of reads to isoform 1 and isoform 2, in proportion to their probability, so the reported values are simply the assignment at the final step.

The best estimate would of course take into account all possible assignments, but I did not want to report fractional read counts in the file (see more below).

>  
> 2.  What would you suggest to use as transcript-specific read counts: assigned counts or transcript unique counts? What would be a better way for count normalization?
>  
> counts
> assigned_counts
> (0,0):587,(0,1):16,(1,0):947,(1,1):998
> 0:1631,1:330
>  
> The unique counts for the 1st transcript in the above case would be (1,0):947 while its assigned counts would 1631.

The most reliable estimate of the counts per isoform is one that uses the Psi values, since the Psi values reflect the best estimate of isoform level expression obtained from the sampling algorithm.

In your case, there are 16 + 947 + 998 = 1961 total reads used by MISO (the 587 reads in the (0,0) class were excluded).  So the best estimate is that Psi*1961 of the reads are assigned to isoform 1, and the remainder, i.e. (1-Psi)*1961 reads, to isoform 2.
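The arithmetic above can be sketched in Python as follows. This is only an illustration for a two-isoform event; the function name is made up, and the Psi value is a placeholder that you would replace with the posterior mean MISO reports for your event.

```python
def psi_based_counts(counts, psi):
    """Split the reads used by MISO between two isoforms using Psi.

    counts: dict mapping compatibility vectors, e.g. (1, 0), to read counts.
    psi:    Psi estimate for isoform 1 (between 0 and 1).
    Returns (expected reads for isoform 1, expected reads for isoform 2).
    """
    # Reads in the (0,0) class are excluded; everything else is used.
    total = sum(n for vec, n in counts.items() if any(vec))
    return psi * total, (1.0 - psi) * total

counts = {(0, 0): 587, (0, 1): 16, (1, 0): 947, (1, 1): 998}

psi = 0.8  # placeholder value, NOT taken from the actual MISO output
iso1_reads, iso2_reads = psi_based_counts(counts, psi)
```

Note that the results are fractional in general, which is why they are not what gets written to the "assigned_counts" field.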

>  
> 3. I understand MISO output counts for isoforms “in the order in which their mRNA entries appear in the GFF file”. Can we easily convert count output with specific transcript ID?
> For example, I’d like to extract 0, 1, 2 with corresponding transcript ID.
> counts
> assigned_counts
> (0,0,0):190,(0,1,1):16,(1,0,0):52,(1,0,1):2,(1,1,0):7,(1,1,1):61
> 0:73,1:50,2:15

MISO gives you the name of the mRNA/transcript from the GFF file in its "isoforms" field.  All you have to do is parse that field to get the mRNA names from the GFF.  If you need to look up anything else about the isoform, you can just look it up in the GFF since you have the mRNA name.
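As a rough sketch of that parsing step in Python: the code below assumes the "isoforms" field is a comma-separated list of quoted mRNA names in GFF order (check this against your own output files), and the transcript names used here are hypothetical placeholders. The function names are also made up for illustration.

```python
def isoform_names(isoforms_field):
    """Split an "isoforms" field into a list of mRNA names.

    Assumes a comma-separated list of quoted names, optionally wrapped
    in square brackets, e.g. "'txA','txB','txC'".
    """
    names = []
    for name in isoforms_field.strip().strip("[]").split(","):
        names.append(name.strip().strip("'\""))
    return names

def label_assigned_counts(isoforms_field, assigned_field):
    """Map assigned counts such as '0:73,1:50,2:15' to mRNA names."""
    names = isoform_names(isoforms_field)
    # Start every isoform at zero, in case an index is absent from the field.
    labeled = {name: 0 for name in names}
    for item in assigned_field.split(","):
        idx, n = item.split(":")
        labeled[names[int(idx)]] = int(n)
    return labeled

# Hypothetical transcript names; real ones come from your GFF file.
labeled = label_assigned_counts("'txA','txB','txC'", "0:73,1:50,2:15")
```

With these placeholder names, labeled maps txA to 73, txB to 50 and txC to 15.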

Best, --Yarden

>  
> I look forward to your answers.
> Thank you very much!
>  
> Robert