[miso-users] Accounting for batch effects
Yarden Katz
yarden at MIT.EDU
Thu Jul 25 20:31:51 EDT 2013
Hi Larry,
If you were to normalize, I think it would make more sense to use normalized count values (e.g. from quantile normalization) rather than RPKM/FPKM. RPKM is not a discrete count value, and it already incorporates gene length and library size, which would violate the assumptions of our model.
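Quantile normalization, mentioned above as an example, forces every sample to share the same distribution of values. Below is a minimal pure-Python sketch, assuming a count table with one list per sample and features in the same order in every list. The function name `quantile_normalize` is hypothetical, and ties are broken by position rather than averaged (a simplification).

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists (one per sample).

    Each value is replaced by the mean, across samples, of the values that
    share its rank, so every sample ends up with the same distribution.
    Ties are broken by original position (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of features")
    # Sort each sample to get its empirical distribution.
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        # Feature indices ordered from smallest to largest value in this sample.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Hypothetical counts for three features in two libraries.
    print(quantile_normalize([[10, 40, 30], [25, 5, 100]]))
```

The output is real-valued, so it is no longer a discrete count; the sketch illustrates the normalization idea rather than a MISO input step.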
Since Psi is a percentage, I think it's less likely to show the scaling artifacts that batch effects cause in RPKMs. For example, it's common to compare the raw RPKMs of two independent RNA-Seq libraries and find that one library has uniformly much higher RPKMs than the other (which would need to be normalized away). I think this is less likely to happen with Psi values, since the inclusion level of an exon (for example) is measured relative to the flanking exons, which makes it an internally controlled measure, at least most of the time. My suggestion is to try the Psi values as they are and see if there's statistical evidence for a batch effect.
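The "internally controlled" argument can be made concrete with a simple ratio estimate, followed by one generic way to look for statistical evidence of a batch effect. This is a hedged sketch: `naive_psi` is a hypothetical length-normalized point estimate, not MISO's Bayesian Psi estimate, and `permutation_test` is a standard exact permutation test on one event's Psi values, assuming only a handful of samples per batch.

```python
from itertools import combinations


def naive_psi(inc_reads, exc_reads, inc_len, exc_len):
    """Length-normalized Psi point estimate for one skipped exon.

    Illustration only: this is NOT MISO's Bayesian estimate.
    inc_reads/exc_reads: reads supporting inclusion/exclusion.
    inc_len/exc_len: positions where such reads can align (assumed known).
    """
    inc_density = inc_reads / inc_len
    exc_density = exc_reads / exc_len
    total = inc_density + exc_density
    if total == 0:
        raise ValueError("no informative reads for this event")
    return inc_density / total


def permutation_test(batch_a, batch_b):
    """Exact two-sided permutation test on the difference in mean Psi.

    Enumerates every relabeling of the pooled samples into groups of the
    original sizes, so it is only practical for small sample counts.
    """
    pooled = list(batch_a) + list(batch_b)
    n_a, n_b = len(batch_a), len(batch_b)
    observed = abs(sum(batch_a) / n_a - sum(batch_b) / n_b)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(group_a) / n_a - sum(group_b) / n_b)
        if diff >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


if __name__ == "__main__":
    # Scaling every read count by 10 (e.g. a deeper library) leaves Psi unchanged.
    print(naive_psi(30, 10, 200, 100), naive_psi(300, 100, 200, 100))
    # Hypothetical Psi values for one event in two batches.
    print(permutation_test([0.60, 0.62, 0.58], [0.30, 0.32, 0.28]))
```

With three samples per batch there are only 20 possible relabelings, so the smallest attainable p-value is 0.1; testing many events at once would also call for a multiple-testing correction.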
Best, --Yarden
On Jul 25, 2013, at 8:23 PM, Singh, Larry (NIH/NHGRI) [E] wrote:
> Hi Yarden,
>
> Thanks very much for your response. This may be a naïve question, but
> what about using normalized read counts to start with? For instance,
> instead of computing PSI_MISO from raw read counts, use RPKM (FPKM).
> I haven't read the methods in your paper completely, so I apologize if
> this suggestion doesn't make sense. :)
>
> Thanks again for getting back to me so promptly.
>
> Regards,
> -Larry.
>
> On 7/25/13 8:02 PM, "Yarden Katz" <yarden at mit.edu> wrote:
>
>> Hi Larry,
>>
>> The problem of batch effects is very similar to the problem of modeling
>> variability within biological replicates. MISO currently doesn't have a
>> built-in solution for that. See this FAQ entry
>> (http://genes.mit.edu/burgelab/miso/docs/#answer13) for a discussion of
>> the issue and how to deal with it outside of MISO.
>>
>> We're working on modeling this problem, but in the meantime the ways to
>> address it are discussed in the link above. For what it's worth, the Psi
>> quantity is internally normalized. That doesn't mean it is immune to
>> batch effects, but (anecdotally) we've found that it suffers less from
>> them than RPKM or other units of gene expression, which are more
>> sensitive to the composition of the RNA, etc. -- although gene
>> expression values are certainly easier overall to estimate reliably
>> than Psi values. Again, I'm sure batch effects can creep in, but
>> they're less obvious in this context.
>>
>> In any case, it's worth checking how large the batch effects are and
>> whether they can be normalized away as a post-processing step. We're
>> working on native probabilistic models for this, but I imagine that the
>> right solution will depend on the kind of batch effects and variability
>> in your experiment; in my view, a generic solution that fits all
>> experimental designs is unlikely.
>>
>> Best, --Yarden
>>
>> On Jul 25, 2013, at 5:25 PM, Singh, Larry (NIH/NHGRI) [E] wrote:
>>
>>> Dear MISO users,
>>>
>>> I'm new to MISO, but would like to use it for differential expression
>>> and eQTL analyses of a large number of samples. Initial analyses
>>> suggest, though, that batch effects are present. Is there a method in
>>> MISO for accounting for batch effects? I've searched the web and the
>>> miso-users archives and couldn't find an answer.
>>>
>>> Thank you kindly for your attention.
>>> -Larry.
>>>
>>> --
>>> Larry N. Singh, Ph.D.
>>> Research Fellow
>>> Genetic Diseases Research Branch
>>> National Human Genome Research Institute, NIH
>>> Building 49, Room 4A52
>>> 49 Convent Dr., Bethesda, MD 20892-8004
>>> (301) 451-4699
>>>
>>> _______________________________________________
>>> miso-users mailing list
>>> miso-users at mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/miso-users
>>
>
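Yarden's first reply suggests measuring the extent of the batch effects and checking whether they can be normalized as a post-processing step. One simple, generic option (an assumption here, not something MISO provides or the thread prescribes) is to center each event's Psi values by batch on the logit scale. The names `center_by_batch`, `logit`, and `inv_logit` are hypothetical, and the clipping constant `eps` is an arbitrary choice.

```python
import math
from collections import defaultdict


def logit(p, eps=1e-3):
    # Clip away from 0 and 1 so the logit stays finite.
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def center_by_batch(psi_values, batches):
    """Remove per-batch shifts in one event's Psi values.

    psi_values: Psi for one event across samples.
    batches: batch label per sample, in the same order.
    Each sample's logit(Psi) is shifted so that every batch has the same
    mean as the pooled data, then mapped back to the (0, 1) scale.
    """
    if len(psi_values) != len(batches):
        raise ValueError("need one batch label per sample")
    logits = [logit(p) for p in psi_values]
    grand_mean = sum(logits) / len(logits)
    by_batch = defaultdict(list)
    for x, b in zip(logits, batches):
        by_batch[b].append(x)
    batch_mean = {b: sum(xs) / len(xs) for b, xs in by_batch.items()}
    return [inv_logit(x - batch_mean[b] + grand_mean) for x, b in zip(logits, batches)]


if __name__ == "__main__":
    # Hypothetical Psi values for one event; batch "b" is shifted upward.
    print(center_by_batch([0.4, 0.6, 0.7, 0.9], ["a", "a", "b", "b"]))
```

This removes any difference between batch means, so it is only safe when batch is not confounded with the biological condition being compared; otherwise it erases real signal. For eQTL analyses, including batch as a covariate in the association model is a common alternative.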