[miso-users] Accounting for batch effects

Yarden Katz yarden at MIT.EDU
Thu Jul 25 21:06:22 EDT 2013


Hi Larry,

Sounds good, happy to discuss further.  In the case of degradation, I can imagine cases where you have a transcript-position bias.  We do actually see cases where inference of, say, 3' UTR usage can be affected by 3'-5' bias caused by RNA degradation, but I've only seen that come up in cases where the 3'-5' difference was extreme and potentially part of the biological signal.  In any case, I do think it's interesting to look for batch effects that are preferential to a particular position along the transcript, like you say.
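
One rough way to look for the position-dependent bias described above is to compare read coverage near the 3' end of a transcript with coverage near the 5' end, sample by sample, and then check whether that ratio tracks with batch. The sketch below is only an illustration under stated assumptions: `three_prime_bias` and its `frac` parameter are hypothetical names, not part of MISO, and it assumes per-base coverage has already been extracted and ordered 5' to 3'.

```python
from statistics import mean


def three_prime_bias(coverage, frac=0.2):
    """Ratio of mean coverage in the 3'-most window to the 5'-most window.

    `coverage` is per-base read depth for one transcript, ordered 5' -> 3'.
    `frac` (hypothetical parameter) is the fraction of the transcript used
    for each end window. Values well above 1 suggest a 3' bias, which is
    one possible signature of RNA degradation.
    """
    n = len(coverage)
    if n == 0:
        raise ValueError("coverage must not be empty")
    window = max(1, int(n * frac))
    five_prime = mean(coverage[:window])
    three_prime = mean(coverage[-window:])
    if five_prime == 0:
        # No reads at the 5' end: ratio is undefined or unbounded.
        return float("inf") if three_prime > 0 else float("nan")
    return three_prime / five_prime


# Toy example: one evenly covered transcript and one with 3'-skewed coverage.
uniform = [10] * 10
skewed = [1] * 5 + [10] * 5
bias_uniform = three_prime_bias(uniform)
bias_skewed = three_prime_bias(skewed)
```

Computing this ratio for the same transcripts in every sample and comparing its distribution between batches would give a first indication of whether a batch differs systematically in 3'-5' bias.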

Best, --Yarden

On Jul 25, 2013, at 9:02 PM, Singh, Larry (NIH/NHGRI) [E] wrote:

> Hi Yarden,
> 
> That makes sense to me.  Correct me if I'm wrong, but the only case where
> I can think that this approach may not work is if you have RNA degradation
> for a set of samples and part of the transcript is gone, but I think that
> would be a problem for most approaches.
> 
> I will try your suggestion and see if we find any evidence of batch
> effects after PSI computations.  Thank you once again for your helpful
> e-mails and advice.
> 
> Much appreciated,
> -Larry.
> 
> On 7/25/13 8:31 PM, "Yarden Katz" <yarden at mit.edu> wrote:
> 
>> Hi Larry,
>> 
>> If you were to normalize, I think it would make more sense to use
>> normalized count values (e.g. by quantile normalization) and not the
>> RPKM/FPKM quantity, since that's not really a discrete count value and it
>> already incorporates information about length and library size, which
>> would violate the assumptions of our model.
>> 
>> Since the Psi value is a percentage, I think it's less likely to have
>> various scaling artifacts that result from batch effects on RPKMs.  For
>> example, it's common to compare the raw RPKMs of two independent RNA-Seq
>> libraries and find that one library has way higher RPKMs than another
>> uniformly (which would need to be normalized away).  With Psi values, I
>> think it's less likely to happen, since the inclusion value of an exon
>> (for example) is measured as a percentage of the flanking exons, and that
>> creates a kind of internally controlled measure, at least most of the
>> time.  My suggestion is to try the Psi values as is, and see if there's
>> statistical evidence for a batch effect.
>> 
>> Best, --Yarden
>> 
>> 
>> On Jul 25, 2013, at 8:23 PM, Singh, Larry (NIH/NHGRI) [E] wrote:
>> 
>>> Hi Yarden,
>>> 
>>> Thanks very much for your response.  This may be a naïve question, but
>>> what about using normalized read counts to start with?  For instance,
>>> instead of computing PSI_MISO with read counts, use RPKM (FPKM) instead.
>>> I haven't read the methods in your paper completely, so I apologize if
>>> this suggestion doesn't make sense. :)
>>> 
>>> Thanks again for getting back to me so promptly.
>>> 
>>> Regards,
>>> -Larry.
>>> 
>>> On 7/25/13 8:02 PM, "Yarden Katz" <yarden at mit.edu> wrote:
>>> 
>>>> Hi Larry,
>>>> 
>>>> The problem of batch effects is very similar to the problem of modeling
>>>> variability within biological replicates.  MISO currently doesn't have a
>>>> built-in solution for that.  See this
>>>> (http://genes.mit.edu/burgelab/miso/docs/#answer13) for a discussion of
>>>> the issue and how to deal with it outside of MISO.
>>>> 
>>>> We're working on modeling this problem, but in the meantime the ways to
>>>> address it are discussed in the link above. For what it's worth, the Psi
>>>> quantity is internally normalized.  It doesn't mean that it will not
>>>> suffer potentially from batch effects, but (anecdotally) we've found that
>>>> this quantity suffers less from batch effects compared with RPKM or units
>>>> of gene expression which will be more sensitive to the composition of the
>>>> RNA, etc. -- although gene expression values are certainly easier overall
>>>> to estimate reliably than Psi values.  Again, I'm sure batch effects can
>>>> creep in, but they're less obvious in this context.
>>>> 
>>>> In any case, it's worth seeing what the extent of the batch effects is
>>>> and whether they can be normalized as a post-processing step.  We're
>>>> working on native probabilistic models for this but I imagine that the
>>>> right solution will depend on the kind of batch effects and variability
>>>> in your experiment, and a generic solution that fits every experimental
>>>> design is in my view unlikely.
>>>> 
>>>> Best, --Yarden
>>>> 
>>>> On Jul 25, 2013, at 5:25 PM, Singh, Larry (NIH/NHGRI) [E] wrote:
>>>> 
>>>>> Dear MISO users,
>>>>> 
>>>>> I'm new to MISO, but would like to use it for differential expression
>>>>> and eQTL analyses of a large number of samples.  Initial analyses have
>>>>> shown though that there are likely batch effects.  Is there a method in
>>>>> MISO for accounting for batch effects?  I've searched the web and the
>>>>> miso-users archives and couldn't find an answer.
>>>>> 
>>>>> Thank you kindly for your attention.
>>>>> -Larry.
>>>>> 
>>>>> --
>>>>> Larry N. Singh, Ph.D.
>>>>> Research Fellow
>>>>> Genetic Diseases Research Branch
>>>>> National Human Genome Research Institute, NIH
>>>>> Building 49, Room 4A52
>>>>> 49 Convent Dr., Bethesda, MD 20892-8004
>>>>> (301) 451-4699
>>>>> 
>>>>> _______________________________________________
>>>>> miso-users mailing list
>>>>> miso-users at mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/miso-users
>>>> 
>>> 
>> 
> 
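
Two points from the thread can be sketched in a few lines: Psi is a ratio, so a uniform scaling of read counts cancels out, and Psi values can be tested directly for a batch effect. The code below is a minimal illustration under stated assumptions, not MISO's method. `naive_psi` is a plain read-count ratio rather than MISO's probabilistic estimate, and `permutation_pvalue` is a generic two-group permutation test on one event's Psi values, chosen here as one possible check. Both function names, the `n_perm` and `seed` parameters, and the toy numbers are hypothetical.

```python
import random
from statistics import mean


def naive_psi(inclusion_reads, exclusion_reads):
    """Simple ratio estimate of Psi (NOT MISO's Bayesian estimate)."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return float("nan")
    return inclusion_reads / total


def permutation_pvalue(psi_a, psi_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in mean Psi.

    `psi_a` and `psi_b` hold Psi values for one event in two batches.
    Batch labels are shuffled `n_perm` times; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(psi_a) - mean(psi_b))
    pooled = list(psi_a) + list(psi_b)
    n_a = len(psi_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance so floating-point ties count as "at least".
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


# A uniform 3x scaling of counts leaves the ratio unchanged.
psi_shallow = naive_psi(30, 10)
psi_deep = naive_psi(90, 30)

# Toy Psi values for one exon in two batches (made-up numbers).
batch_1 = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10]
batch_2 = [0.60, 0.62, 0.58, 0.61, 0.59, 0.63]
p_value = permutation_pvalue(batch_1, batch_2)
```

In a real analysis this test would be run per event across many events, with multiple-testing correction and the biological conditions balanced across batches; otherwise a real splicing difference could be mistaken for a batch effect.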



