[Olympus] RFC: Blinding and proposal for rules for presentations and analysis

Sat Aug 30 09:07:33 EDT 2014

Dear Michael,

On 8/29/2014 2:49 PM, Michael Kohl wrote:
> Dear Jan,
> amazingly long email!
> 
> Quick remark on 2.4)
> Can you precompile the code that does the blinding (for linux and mac) so 
> that the executable is being linked to the blinding binary, without making
> the source code of it available? Can we make the code by default fail to 
> compile if the blinding function is not linked? (of course if someone 
> really wants to look at unblinded data it is still possible by actively 
> removing this default requirement but it would not happen accidentally).

I can probably come up with a way of hiding it, but it's a support
nightmare.
I might find a way to at least obfuscate the code so that a curious
glance will not reveal the blinding function. But making it hard to
deactivate is not trivial. If we want to defend against malicious
collaboration members, I think we need to have a different kind of
discussion in the first place.
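
To make the trade-off concrete, here is a minimal sketch (Python for
brevity, not the actual tracking code, and not a proposal for the real
criterion). It assumes the decision to suppress a track comes from a keyed
hash, so reading the source does not reveal which tracks are dropped, and
that the code refuses to run by default when no key is available. The names
BLINDING_KEY, load_key, is_suppressed, blind and the fraction 0.2 are all
made up for illustration. A real criterion would presumably depend on track
properties; the hash only stands in for it.

```python
import hashlib
import hmac
import os

# Hypothetical name of the environment variable holding the secret key.
KEY_ENV = "BLINDING_KEY"


class BlindingUnavailable(RuntimeError):
    """Raised when tracking would otherwise run without blinding."""


def load_key(env=os.environ, allow_unblinded=False):
    """Return the key as bytes, or None only after an explicit opt-out."""
    key = env.get(KEY_ENV)
    if key:
        return key.encode("utf-8")
    if allow_unblinded:
        return None  # deliberate opt-out, cannot happen by accident
    raise BlindingUnavailable(f"{KEY_ENV} is not set; refusing to run unblinded")


def is_suppressed(key, run, event, track, fraction=0.2):
    """Decide from a keyed hash whether a track is dropped (illustrative only)."""
    if key is None:
        return False
    msg = f"{run}:{event}:{track}".encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < fraction


def blind(tracks, key):
    """Keep the tracks that are not suppressed; each track is (run, event, index)."""
    return [t for t in tracks if not is_suppressed(key, *t)]
```

Calling load_key() without the key set raises an error, and only
load_key(allow_unblinded=True) returns None and switches the suppression
off. This guards against accidents only: anybody with the source can still
pass that flag, which is exactly the case I do not intend to fight.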

Best,
Jan

> 
> Best regards
>    Michael
> 
> 
> On Fri, 29 Aug 2014, Jan C. Bernauer wrote:
> 
>> Hi all,
>>
>> We are getting close to real results; I think it is time to raise the topic
>> of blinding again. I also have some proposals regarding forthcoming analyses
>> and presentations.
>>
>> To not spring this on you during the meeting and to give you time to think
>> about these things, I wrote up what I have in mind (heavily numbered so
>> it's easy to refer to single statements):
>>
>> Blinding:
>>
>> 1) Why do we need blinding:
>>
>> 1.1) To protect ourselves from accidental early release of results.
>> 1.2) NOT to protect from deliberate early release of results.
>> 1.3) To stop us from applying a personal bias to the results.
>>
>> 1.3 is the important point, but the continuous discussion about slides
>> in talks shows that 1.1 is valid too.
>>
>> 2) How do we blind:
>>
>> 2.1) We will not blind 12 degree data. That means we will have to
>> continue to be vigilant about not releasing results, and there is a
>> serious danger of biasing ourselves. Why do we believe that the 2-photon
>> effect vanishes at 12 deg? If the answer is "Because the theoreticians
>> say so": first, they don't, and even if they did, why do we measure the
>> 2-photon effect in the first place? If we believe them on that, why do we
>> not believe the rest of their predictions?
>>
>>
>> 2.2) We will only blind the wire chamber. To do so, I will select tracks
>> from data and suppress them in the final tracked file. The selection
>> criteria are known only to me.
>> The question is now /when/ these tracks will be suppressed. The original
>> idea was:
>>
>> 2.3) I will track all files first (= unblinded), and generate from them
>> a new version with tracks removed (=blinded). I will only release the
>> second set of files.
>> This method is rather safe from tampering since I do not have to release
>> the blinding code. If all track reconstruction is done only at MIT this is
>> the preferred method. However, if other institutes/people will also
>> reconstruct from the raw data, then the blinding must be done locally as
>> well, either with the same code or a local version of the blinding code.
>> The tracking is in a state where everybody can do it. Another solution is:
>>
>> 2.4) I will add the blinding to the tracking plugin itself.
>> Of course, this means that the blinding code can be seen and deactivated
>> by everyone. However, I do not want to fight 1.2, so this should be OK.
>> I will make the check explicitly visible in the code, with the actual
>> calculation in a separate file. For general debugging, nobody has to
>> open that file, so just don't.
>>
>>
>> 3) Upcoming new tracked files
>> 3.1) I will release a new spin of the 100 files soon. This should happen
>> before the collaboration meeting unless I run into unexpected problems.
>> 3.2) These files will be blinded following 2.1, since this is what we
>> agreed on earlier.
>>
>>
>>
>> Forthcoming analyses:
>> 4) We are still in a state where we have to work on different tasks
>> separately.
>> 5) All calibration steps should be performed by at least two groups,
>> preferably with different code.
>> 6) We will produce one canonical set of tracked files.
>> 7) The final analysis is probably simple enough that it can be performed
>> by each student/interested person individually. By this, we will have a
>> good sample of systematics of personal bias.
>> 8) These individual analyses can address many systematic errors
>> individually, but not all. Those they cannot address include, for
>> example, many tracking-related systematics.
>>
>> The different analyses will differ in their results, and we have to
>> understand the differences. To make this easier, I would propose the
>> following:
>> 9) Each analysis should be supervised critically by somebody in all
>> steps. I can do this for MIT/ASU, but we need somebody at DESY to
>> fulfill the same role for the people in Europe. The idea behind this is
>> to ensure that there are no mistakes in the ground work which are then
>> hard to find if one sees only the result.
>> 10) The /complete/ code to reproduce the analysis must be checked in to
>> git regularly. This means at least weekly!
>>
>> To get used to this, and to reduce the noise, I propose the following
>> rules for Monday presentations:
>> 11) Talks on Monday should be made available a couple of days in
>> advance. We had this rule before, but it has been eroded somewhat.
>> 12) The talks should either have been presented before in one of the
>> local group meetings (Lumi, MIT or DESY analysis meeting), or been
>> approved by the local analysis coordinator (see 9).
>> 13) The code/scripts to produce the results in the talk must be pushed
>> to git /before/ the approval or the local presentation. If the code is
>> good enough to produce results you want to show, then it is good enough
>> to be shown itself.
>> 14) Each talk has to have a slide describing which code is used, and how
>> to reproduce the results. This doesn't have to be presented in the talk
>> itself, but should be included for reference.
>> 15) If we find something questionable in the beginning of the talk which
>> might invalidate the rest of the findings, we will stop the presentation.
>>
>> I think 13), in combination with 11) and 14), helps on multiple counts.
>> First, it gets us all into the habit of pushing the code into the repo.
>> Further, it may streamline presentations: During the presentations, many
>> questions are of the type: What generator did you use? What options? and
>> so on. While this should be in the talk, having the code allows
>> everybody to check. Additionally, it allows us to debug.
>> The recent bug in Denis's code is a good example: Based on a simple
>> misconception about the C language rules, it is simple to introduce a bug,
>> difficult for the original programmer to identify, but fairly simple for a
>> second person to spot. This source of bugs is so common and recognized by
>> experts that different programming styles take it explicitly into account.
>> Extreme programming, for example, advocates that code is always written by
>> one person with a second person looking over his shoulder.
>> With the code at hand, instead of having to discuss this on two Mondays,
>> it would have been a thing of five minutes.
>>
>> Please think about these points. We then can have a discussion during
>> the collaboration meeting.
>>
>> Best,
>> Jan
>>
>>
>> -- 
>> Dr. Jan C. Bernauer
>> Massachusetts Institute of Technology
>> 77 Massachusetts Ave, Room 26-441
>> Cambridge, MA, 02139, USA
>> Phone:  (617) 253-6580
>>
>>
>> _______________________________________________
>> Olympus mailing list
>> Olympus at mit.edu
>> http://mailman.mit.edu/mailman/listinfo/olympus
>>
> 
> +---------------------------------------------------------------------
> | Dr. Michael Kohl, Associate Professor and Staff Research Scientist
> | Physics Department, Hampton University, Hampton, VA 23668
> | Jefferson Lab, C117, 12000 Jefferson Avenue, Newport News, VA 23606
> | Phone: +1-757-727-5153 (HU), +1-757-269-7343 (Jlab)
> | Fax:   +1-757-728-6910 (HU), +1-757-269-7363 (Jlab)
> | Email: kohlm at jlab.org, Cell: +1-757-256-5122 (USA)
> +---------------------------------------------------------------------
>