In order to better integrate my blog with my website, better manage comment spam, and reduce my dependence on Google, this blog has moved. In order to avoid broken links I won't be deleting content from here, but no new content will be added, so please update your bookmarks and feeds.

Tuesday 2 July 2013

Official statistics; NeSI; REANNZ; Australian eResearch infrastructure - #eResearchNZ2013

Don't know how long I'll be live-blogging, but here's the start of the eResearchNZ 2013 conference:

Some thoughts on what eResearch might glean from official statistics
Len Cook
* Research-based info competes with other sources of info people use to make decisions
Politicians like weathercocks - have to respond to wind. Sources of info include: official stats, case studies, anecdote, and ideology/policy framework. More likely to hear anecdotes than research. NZ's data-rich but poor at getting access to existing data. Confidentiality issues: "Statisticians spend half the time collecting data and the other preventing people from accessing it." Need to shift ideas - recent shifts in legislation a step to this.

* Official statistics has evolved over the last few centuries
19th century: measurement developed to challenge policy. Florence Nightingale wanted to measure wellbeing in military hospitals because it was like taking hundreds of young men, lining them up and shooting them. Mass computation and ingenuity of graphical presentation - all by hand.
20th century: development of sampling, reliability, meshblocks. Common classifications, frameworks.
1990s and beyond: mass monitoring of transactions. Politics of info access/ownership important. Obligations created when data collected. Registers and identifiers now central. Importance of investing in metadata to categorise and integrate information.

* Managing data not just about technology - probably the reverse.

* Structural limitations. Need strong sectoral leadership. Need a chief information office for a sector not for government as a whole.

NeSI's Experience as National Research Infrastructure
Nick Jones, New Zealand eScience Infrastructure
NZ is very good at scientific software. Also significant national investments in data (GeoNet, NZSSDS, StatsNZ, DigitalNZ, cellML, LRIS, LERNZ, CEISMIC, OBIS). But also significant (unintended) siloisation and no investment to break down barriers and integrate. However do have good capability. NeSI wants to enhance existing capabilities but also help people meet each other. Build up shared mission, collegiality.

Heterogeneous systems improve ability to deal with specific datasets, but increasingly need ability to adapt software. NeSI gives capability to support maturing of existing scientific computing capabilities.

CRIs are widespread. So are research universities. All connected by REANNZ (KAREN). Research becoming more highly connected, collaborative. National Science Challenges targeted to building collaboration too. But sector still fragmented and small-scale. "Each project creates, and destroys, its own infrastructure."

Research eInfrastructure roadmap 2012 includes NZ Genomics Ltd (->Bioinformatics Cloud); BeSTGRID, BlueFern, NIWA (->NeSI); BeSTGRID Federation (Tuakiri); KAREN->REANNZ. There's a big gap in the area of research data infrastructure.

Need government investment to overcome coordination failure. Institutions should support national infrastructure. NeSI to create scalable computing infrastructure; provide middleware and user-support; encourage cooperation; contribute to high quality research outputs. In addition to infrastructure have team of experts to support researchers.

REANNZ: An Instrument for Data-intensive Science
Steve Cotter, REANNZ
Move from experimental -> theoretical -> computational sciences, and now to data-intensive science (see "The Fourth Paradigm"). Exponential data growth. Global collaboration and requirement for data mobility. "Science productivity is directly proportional to the ease with which we can move data." Trend towards cloud-based services.

And a trend towards needing lossless networking. Easy to predict capacity for YouTube etc. But when simulating global weather patterns, datasets are giant and unpredictable - big peaks and troughs in traffic. TCP good at handling loss for small packets, but can be crushed by a large packet loss - 80x reduction in data transfer rates for NZ-type distances. So can't rely on commercial networks.
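(My own rough illustration, not from the talk: one common way to see why loss hurts so much over long distances is the Mathis et al. approximation for steady-state TCP throughput, rate <= (MSS / RTT) * (C / sqrt(p)). The segment size, round-trip times and loss rate below are numbers I've assumed for the sake of the sketch, not REANNZ measurements, and the model ignores things like window limits and newer congestion control algorithms.)

```python
import math

# Constant from the Mathis et al. model, roughly 1.22.
MATHIS_C = math.sqrt(3.0 / 2.0)


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate upper bound on TCP throughput in bits per second.

    mss_bytes: maximum segment size in bytes
    rtt_s:     round-trip time in seconds
    loss_rate: packet loss probability (0 < loss_rate <= 1)
    """
    if rtt_s <= 0 or not (0 < loss_rate <= 1):
        raise ValueError("rtt_s must be positive and loss_rate in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss_rate))


if __name__ == "__main__":
    mss = 1460    # assumed: typical Ethernet MSS in bytes
    loss = 1e-5   # assumed: one packet lost in 100,000
    # Assumed round-trip times: a campus link vs. a long international path.
    for label, rtt in [("local (2 ms)", 0.002), ("long-haul (150 ms)", 0.150)]:
        mbps = mathis_throughput_bps(mss, rtt, loss) / 1e6
        print(f"{label}: about {mbps:,.0f} Mbit/s")
```

In this model the same loss rate costs you in direct proportion to the round-trip time, so a path with 75 times the RTT gets roughly 1/75th of the throughput.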

Higgs boson work is an example of the network as part of the scientific instrument and workflow.

Working on customisation, flexibility. Optimising end-to-end: Data transfer node; Science DMZ (REANNZ working with NZ unis, CRIs etc to deploy); perfSONAR.

Firewalls are harmful to large data flows and unnecessary. Not as effective as they once were.

If you can't move 1TB in 20 minutes, talk to REANNZ - they'll raise your expectations.
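(My own back-of-envelope check, not from the talk: what sustained rate does 1TB in 20 minutes actually imply? The sketch below assumes 1TB means 10^12 bytes and ignores protocol overhead; the function name is just mine.)

```python
def required_gbps(num_bytes, seconds):
    """Sustained rate in Gbit/s needed to move num_bytes in the given time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / seconds / 1e9


if __name__ == "__main__":
    one_tb = 10**12  # assumed: decimal terabyte
    rate = required_gbps(one_tb, 20 * 60)
    print(f"1TB in 20 minutes needs about {rate:.1f} Gbit/s sustained")
```

That works out to a bit under 7 Gbit/s end-to-end, which is most of a 10 Gbit/s link.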

Progressing to work with services above the network.

Australian Research Informatics Infrastructure
Rhys Francis, eResearch Coordination Project
Sustained strategic investment over a decade into tools, data, computation, networks, and buildings (for computation). (Personnel hidden in all of these.) Tools are mission critical, data volumes explode, systems grow to exascale, global bandwidth scales up. High ministerial turnover; each one takes about six months then realises we need this infrastructure. Breaking it down into these areas helps explain it to people.

OTOH volume of well-curated data is not exploding.

National capabilities: Want extended bandwidth, better HPC modelling, larger data collections. Shared access highly desirable but very hard to get agreement on how.
Research integration: Want research data commons and problem-oriented digital laboratories.

Hard to explain top, and when you chop it up into bits people think "Any university could have done that bit." But need expertise and need to share it.

In last 7 years added fibre and super-computing infrastructure. Many software tools and lab integration projects. Hundreds of data and data flow improvement projects. Single sign-on. Data commons for publication/discovery. Recruit overseas but still only so much they can resource.

These things are hard, and it was data slowing it down because didn't know where collections would physically be. If you're dealing with petabytes, the only way to move it is by forklift.

eResearch infrastructure brings capabilities to the researcher.
NCI and Pawsey: do computational modeling, data analysis, visualise results
NeCTAR: use new tools, apps, work remotely and collaborate in the cloud
ANDS and RDSI: keep data and observations, describe, collect, share, etc.

Current status (I'm handpicking data-related bulletpoints):
* 50,000 collections published in research data commons
* coordination project to work with implementation projects to deploy data and tools as service for key national data holdings

Looking to 2014:
* data publication delivered
* Australian data discoverable
* 50 petabytes of quality research data online
* colocation of data and computing delivered

Need to focus on content (including people/knowledge, data, tools) as infrastructure. Datasets and skillsets. Less and less bespoke tools; more and more open-source or commercial products.

Need to support and fund infrastructure as business-as-usual.