[PageOneX] [dev] PageOneX Lines of work
pablo rey
pablo at basurama.org
Sun May 26 01:41:59 EDT 2013
Hi there,
Now that we've launched the tool and people can start testing it, it is a
good moment to have an overview of all the open fronts, and the new boxes
that we can open.
Use the gdoc to edit/add/comment:
https://docs.google.com/document/d/1kXdeIo_iM2_rI_MAyOvN-sfBj2PH7PYqayenHE0H6uk/edit#
Below is just a copy-paste from there.
We'll be having a meeting this coming week (Ed, Rahul and me) to see where
we want to put our time.
Development
Scraping
The user would be able to select from different data sources when starting
a thread.
Print front pages
We are currently scraping from:
- kiosko.net
Future steps would include other data sources:
- newseum (http://www.newseum.org/todaysfrontpages/): they have an archive
  of images since approx. 2002 (stored on CDs!). Possible collaboration
  with Paul Sparow.
- newspapers' print front page archives:
  - set up scrapers for the different newspapers' front pages in PDF (gdoc
    that lists front page data for different newspapers, to help build a
    script:
    <https://docs.google.com/spreadsheet/ccc?key=0AupjZBpCwY8UdEgwUndSeHp5bjBMRHlJME1TSkZRZkE#gid=0>)
  - rescue the scrapers already built (earlier commits)
  - pdf to img
- local files: images uploaded by users
An interesting related project is Xed
(http://diuf.unifr.ch/main/diva/research/research-projects/xed), "a new tool
for extracting hidden structures from electronic documents" (Document Image
Analysis for Libraries, 2004. Proceedings. First International Workshop on.
IEEE).
Online home page
Make preliminary analysis: frequency of scraper, images, html to enable
queries
- Example and database: http://pastpages.com/ (only images)
Thread creation / edit
Better newspaper selection / order
Enable thread and topic forking (wikiscraper model).
Coding
- Fix slow behavior when there are many images or areas in the thread with:
  - ajax
  - resize of images to show smaller images in the display view
- Allow multicoder (sharing thread ownership)
- Allow simultaneous coding
- Add annotation features (for days, for images), annotation standards?
- Capability for magnet areas to avoid overlapping of areas
- Zooming in images to read small texts
- Facilitate the coding process by enabling search for certain words that
  would pre-select front pages (OCR or reading text from PDF should be
  activated; external databases: New York Times API, Lexis Nexis...)
- Allow connection of areas to form one "area-article"; this would make it
  possible to get the "count" of articles per day/newspaper in one topic
  http://numeroteca.org/wp-content/uploads/2012/10/syria-20112-07.png
- Allow multiple taxonomies (Ex: one taxonomy for theme + one taxonomy for
  frame analysis
  http://numeroteca.org/2013/02/06/3-steps-to-measure-the-corruption-coverage-in-spain/)
Export
- Already working: ods + json
- Image .png
- Enable embed option: image? svg?
- Thread: all related data
Display
Test D3 to allow richer visualizations and direct svg integration (Ed is
working on this):
- bar chart split by newspapers
- live selection of newspapers to modify on the fly which newspapers show
  in the bar chart
- array of newspapers in svg
Enable the selection of only certain days for the display view (different
from the original time frame of the thread). Very long threads are
difficult to display. In the past, a year-long study required creating 12
threads. This new feature would make it possible to create 1 thread and
then visualize only certain parts of it.
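As a rough sketch of that day selection (not PageOneX code; the record
fields and the `select_days` helper are made up for illustration), the
display view could filter one long thread by date instead of needing a
separate thread per month:

```python
from datetime import date

def select_days(front_pages, start, end):
    """Return only the front pages dated within [start, end], inclusive."""
    return [page for page in front_pages if start <= page["date"] <= end]

# Hypothetical year-long thread, reduced to three front pages.
thread = [
    {"date": date(2012, 1, 15), "newspaper": "Paper A"},
    {"date": date(2012, 3, 2), "newspaper": "Paper B"},
    {"date": date(2012, 11, 30), "newspaper": "Paper A"},
]

# Display only March, leaving the thread itself untouched.
march = select_days(thread, date(2012, 3, 1), date(2012, 3, 31))
print(len(march))  # prints 1
```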
Exploring new ways to show the data:
Bar chart:
Multiple column
1 column
Comparison among newspapers: % to each topic
Total space (in number of front pages) for each topic
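For the last two items, a minimal sketch of how the numbers could be
computed, assuming each coded area is stored as a fraction of its front
page. The record layout and the function names are hypothetical, not the
actual PageOneX data model:

```python
from collections import defaultdict

# Hypothetical records: (newspaper, topic, fraction of one front page covered)
coded_areas = [
    ("Paper A", "corruption", 0.25),
    ("Paper A", "economy", 0.25),
    ("Paper B", "corruption", 0.5),
    ("Paper A", "corruption", 0.5),
]

def topic_share_by_newspaper(areas):
    """Per newspaper, the % of its coded space that went to each topic."""
    totals = defaultdict(lambda: defaultdict(float))
    for newspaper, topic, fraction in areas:
        totals[newspaper][topic] += fraction
    shares = {}
    for newspaper, by_topic in totals.items():
        coded = sum(by_topic.values())
        shares[newspaper] = {t: 100.0 * v / coded for t, v in by_topic.items()}
    return shares

def total_space_by_topic(areas):
    """Total space per topic, measured in number of front pages."""
    totals = defaultdict(float)
    for _, topic, fraction in areas:
        totals[topic] += fraction
    return dict(totals)

print(topic_share_by_newspaper(coded_areas))
print(total_space_by_topic(coded_areas))
```

Here the percentages are relative to the coded space only; dividing by the
number of front pages instead would give the share of the whole page.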
Platform
- User pages
- Mix with other media data: TV from archive.org, Twitter data...
- Voting/curation of threads to give visibility to other threads (not just
  the last created)
- Show preview of the thread in thread list
Outreach / Community building
Tutorials
Screencast
Case studies
User driven design
Contact existing organizations that do similar work:
- Women in Journalism. Report:
  http://womeninjournalism.co.uk/wp-content/uploads/2012/10/Seen_but_not_heard.pdf
- FAIR: Fairness & Accuracy In Reporting http://fair.org/