[miso-users] Accounting for batch effects
Yarden Katz
yarden at MIT.EDU
Thu Jul 25 21:06:22 EDT 2013
Hi Larry,
Sounds good, happy to discuss further. In the case of degradation, I can imagine cases where you have a transcript-position bias. We do actually see cases where inference of, say, 3' UTR usage can be affected by 3'-5' bias caused by RNA degradation, but I've only seen that come up in cases where the 3'-5' difference was extreme and potentially part of the biological signal. In any case, I do think it's interesting to look for batch effects that are preferential to a particular position along the transcript, like you say.
Best, --Yarden
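
One way to look for the position-dependent bias discussed above is to compare read coverage near the 3' end of a transcript with coverage near the 5' end. The following is a minimal sketch, assuming per-base coverage has already been extracted into a list ordered 5' to 3'. The function name `three_to_five_ratio`, the `frac` window, and the pseudocount are illustrative choices and are not part of MISO.

```python
from statistics import mean

def three_to_five_ratio(coverage, frac=0.2, pseudocount=1.0):
    """Ratio of mean coverage in the 3'-most window to the 5'-most window.

    coverage: per-base read depth along one transcript, ordered 5' -> 3'.
    frac: fraction of the transcript length used for each end window.
    """
    if not coverage:
        raise ValueError("coverage is empty")
    n = max(1, int(len(coverage) * frac))
    five_prime = mean(coverage[:n])
    three_prime = mean(coverage[-n:])
    # The pseudocount avoids division by zero when one end has no reads.
    return (three_prime + pseudocount) / (five_prime + pseudocount)

# Hypothetical coverage vectors, for illustration only.
uniform = [10] * 100
degraded = [2] * 50 + [20] * 50   # reads concentrated toward the 3' end

ratio_uniform = three_to_five_ratio(uniform)
ratio_degraded = three_to_five_ratio(degraded)
```

Comparing the distribution of this ratio between batches would indicate whether one batch is skewed toward the 3' end.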
On Jul 25, 2013, at 9:02 PM, Singh, Larry (NIH/NHGRI) [E] wrote:
> Hi Yarden,
>
> That makes sense to me. Correct me if I'm wrong, but the only case where
> I can think that this approach may not work is if you have RNA degradation
> for a set of samples and part of the transcript is gone, but I think that
> would be a problem for most approaches.
>
> I will try your suggestion and see if we find any evidence of batch
> effects after PSI computations. Thank you once again for your helpful
> e-mails and advice.
>
> Much appreciated,
> -Larry.
>
> On 7/25/13 8:31 PM, "Yarden Katz" <yarden at mit.edu> wrote:
>
>> Hi Larry,
>>
>> If you were to normalize, I think it would make more sense to use the
>> normalized counts values (e.g. by quantile normalization) and not the
>> RPKM/FPKM quantity, since that's not really a discrete count value and it
>> already incorporates information about length and library size which
>> would mess up assumptions of our model.
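
For reference, quantile normalization of counts can be sketched as follows. This is a generic illustration, not a MISO feature: the function name `quantile_normalize` and the example counts are hypothetical, and ties within a sample are broken by position for simplicity.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of equal-length count vectors (one per sample).

    Ties within a sample are broken by position, which is a simplification.
    """
    n_features = len(samples[0])
    if any(len(s) != n_features for s in samples):
        raise ValueError("all samples must have the same number of features")
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(s) for s in samples]
    reference = [sum(col[k] for col in sorted_cols) / len(samples)
                 for k in range(n_features)]
    normalized = []
    for s in samples:
        # order[k] is the index of the k-th smallest value in this sample.
        order = sorted(range(n_features), key=lambda i: s[i])
        out = [0.0] * n_features
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical counts for 4 features in 2 samples; sample_b is roughly 3x deeper.
sample_a = [5, 20, 10, 40]
sample_b = [18, 55, 33, 120]
norm_a, norm_b = quantile_normalize([sample_a, sample_b])
```

Note that the averaged values are generally not integers, so they would need rounding before being treated as discrete counts.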
>>
>> Since the Psi value is a percentage, I think it's less likely to have
>> various scaling artifacts that result from batch effects on RPKMs. For
>> example, it's common to compare the raw RPKMs of two independent RNA-Seq
>> libraries and find that one library has way higher RPKMs than another
>> uniformly (which would need to be normalized away). With Psi values, I
>> think it's less likely to happen, since the inclusion value of an exon
>> (for example) is measured as a percentage of the flanking exons, and that
>> creates a kind of internally controlled measure, at least most of the
>> time. My suggestion is to try the Psi values as is, and see if there's
>> statistical evidence for a batch effect.
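
One simple way to look for statistical evidence of a batch effect in Psi values is a permutation test on a single event. The sketch below assumes point estimates of Psi are available per sample and that samples fall into two batches. The function name `permutation_pvalue` and the example values are hypothetical, and this is a generic test rather than anything provided by MISO.

```python
import random

def permutation_pvalue(batch_a, batch_b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in mean Psi between batches."""
    def avg(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = abs(avg(batch_a) - avg(batch_b))
    pooled = list(batch_a) + list(batch_b)
    n_a = len(batch_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(avg(pooled[:n_a]) - avg(pooled[n_a:]))
        # Small tolerance guards against floating-point noise in the comparison.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical Psi estimates for one exon across samples in two batches.
psi_batch1 = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82]
psi_batch2 = [0.62, 0.66, 0.60, 0.65, 0.63, 0.61]
p_shifted = permutation_pvalue(psi_batch1, psi_batch2)
p_same = permutation_pvalue(psi_batch1, psi_batch1)
```

When this is run over many events, the resulting p-values would need a multiple-testing correction, and a batch that is confounded with a biological condition cannot be separated from it this way.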
>>
>> Best, --Yarden
>>
>>
>> On Jul 25, 2013, at 8:23 PM, Singh, Larry (NIH/NHGRI) [E] wrote:
>>
>>> Hi Yarden,
>>>
>>> Thanks very much for your response. This may be a naïve question, but
>>> what about using normalized read counts to start with? For instance,
>>> rather than computing PSI_MISO with read counts, use RPKM (FPKM) instead.
>>> I haven't read the methods in your paper completely, so I apologize if
>>> this suggestion makes no sense. :)
>>>
>>> Thanks again for getting back to me so promptly.
>>>
>>> Regards,
>>> -Larry.
>>>
>>> On 7/25/13 8:02 PM, "Yarden Katz" <yarden at mit.edu> wrote:
>>>
>>>> Hi Larry,
>>>>
>>>> The problem of batch effects is very similar to the problem of modeling
>>>> variability within biological replicates. MISO currently doesn't have a
>>>> built-in solution for that. See this
>>>> (http://genes.mit.edu/burgelab/miso/docs/#answer13) for a discussion of
>>>> the issue and how to deal with it outside of MISO.
>>>>
>>>> We're working on modeling this problem, but in the meantime the ways to
>>>> address it are discussed in the link above. For what it's worth, the Psi
>>>> quantity is internally normalized. It doesn't mean that it will not
>>>> suffer potentially from batch effects, but (anecdotally) we've found that
>>>> this quantity suffers less from batch effects compared with RPKM or units
>>>> of gene expression which will be more sensitive to the composition of the
>>>> RNA, etc. -- although gene expression values are certainly easier overall
>>>> to estimate reliably than Psi values. Again, I'm sure batch effects can
>>>> creep in, but they're less obvious in this context.
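
To illustrate what "internally normalized" means here, the toy example below uses a simplified count-based Psi (inclusion reads over inclusion plus exclusion reads). This is a stand-in for illustration only and is not MISO's probabilistic estimate; the function name `naive_psi` and the read counts are hypothetical.

```python
def naive_psi(inclusion_reads, exclusion_reads):
    """Count-based Psi: fraction of informative reads supporting inclusion."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no informative reads")
    return inclusion_reads / total

# Hypothetical read counts for one exon in a reference library.
inc, exc = 30, 10

# Suppose a batch effect scales every count at this locus by the same factor.
factor = 4
psi_ref = naive_psi(inc, exc)
psi_scaled = naive_psi(factor * inc, factor * exc)

# An expression-like quantity (total reads at the locus) does change.
total_ref = inc + exc
total_scaled = factor * (inc + exc)
```

The ratio is unchanged by the uniform scaling while the total is not. The invariance no longer holds if a batch effect alters inclusion and exclusion reads differently, which is one way batch effects can still creep into Psi.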
>>>>
>>>> In any case, it's worth seeing what the extent of the batch effects is
>>>> and whether they can be normalized as a post-processing step. We're
>>>> working on native probabilistic models for this but I imagine that the
>>>> right solution will depend on the kind of batch effects and variability
>>>> in your experiment, and a generic solution that fits all experimental
>>>> designs is in my view unlikely.
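
As one example of a post-processing step, per-batch shifts can be removed from Psi on the logit scale. This is a minimal sketch of generic mean-centering, not a method from MISO or from this thread; the function name `center_batches`, the clipping constant `eps`, and the example values are assumptions made for illustration.

```python
import math

def logit(p, eps=1e-6):
    # Clip to avoid infinities at Psi = 0 or 1.
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def center_batches(psi_by_batch):
    """Remove per-batch shifts in logit(Psi) for one event.

    psi_by_batch: dict mapping batch label -> list of Psi values.
    Each batch is shifted so its mean logit matches the overall mean logit.
    """
    logits = {b: [logit(p) for p in vals] for b, vals in psi_by_batch.items()}
    all_vals = [x for vals in logits.values() for x in vals]
    grand_mean = sum(all_vals) / len(all_vals)
    corrected = {}
    for b, vals in logits.items():
        shift = grand_mean - sum(vals) / len(vals)
        corrected[b] = [inv_logit(x + shift) for x in vals]
    return corrected

# Hypothetical Psi values for one exon in two batches.
psi = {"batch1": [0.80, 0.85, 0.78], "batch2": [0.60, 0.66, 0.58]}
corrected = center_batches(psi)
```

Centering of this kind also removes any real biological difference that happens to coincide with batch, so it is only reasonable when conditions are balanced across batches.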
>>>>
>>>> Best, --Yarden
>>>>
>>>> On Jul 25, 2013, at 5:25 PM, Singh, Larry (NIH/NHGRI) [E] wrote:
>>>>
>>>>> Dear MISO users,
>>>>>
>>>>> I'm new to MISO, but would like to use it for differential expression
>>>>> and eQTL analyses of a large number of samples. Initial analyses have
>>>>> shown though that there are likely batch effects. Is there a method in
>>>>> MISO for accounting for batch effects? I've searched the web and the
>>>>> miso-users archives and couldn't find an answer.
>>>>>
>>>>> Thank you kindly for your attention.
>>>>> -Larry.
>>>>>
>>>>> --
>>>>> Larry N. Singh, Ph.D.
>>>>> Research Fellow
>>>>> Genetic Diseases Research Branch
>>>>> National Human Genome Research Institute, NIH
>>>>> Building 49, Room 4A52
>>>>> 49 Convent Dr., Bethesda, MD 20892-8004
>>>>> (301) 451-4699
>>>>>
>>>>> _______________________________________________
>>>>> miso-users mailing list
>>>>> miso-users at mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/miso-users