SMSCBOF

Introduction

The SC08 Blue Gene System Management Community Birds of a Feather meeting runs 5:30-7:00pm US Central time on Tuesday, November 18, 2008, in room(s) 11A/11B.

This is meant to be a discussion forum with representatives from: Argonne National Laboratory, Juelich Research Centre, Lawrence Livermore National Laboratory, IBM, and Brookhaven National Laboratory/Stony Brook University.

We are still asking people to come prepared to talk for no more than 10 minutes about their configurations and the directions they plan to take their sites in. We'll then turn to discussing key issues. To prevent duplication and to share ideas in advance, we've set up this wiki.

Schedule

This is a rough schedule and not yet fixed. Feel free to make suggestions and changes.

  • 5:30-5:40 Opening remarks and introductions - Susan Coghlan, ALCF
    • Opening remarks
    • Community
  • 5:40-6:30 Site Presentations
  • 6:30-6:55 Panel Issues Discussion w/ Stump the Experts
  • 6:55-7:00 Round-up
    • e-mail list admins-wg@bgconsortium.org
    • wiki - William to ask for wiki.bgconsortium.org
    • Thank yous

Site Presentations

Configuration Details

  • Model(s) / Driver(s)
  • Size
  • Topology
  • Queuing System
  • Network setup
  • File System details
  • OSes
  • Workload description
  • Monitoring / Notification System
  • Other things that set your system(s) apart

Issues

  • Brief discussion of issues faced

Directions

  • Cool tools under development
  • Expansion plans

Panel Discussion Issues

Juelich Research Centre

Speaker: Jutta Docter (j.docter@fz-juelich.de) 
               team lead of BlueGene/P system administration 
               experience with various supercomputers (IBM, Cray, Intel, ...) and Systems (BG/P, BG/L, AIX, LoadLeveler, ...)
  1. RAS events
  2. diagnostics
  3. automatic monitoring
  4. hardware stability
  5. software support
  6. software enhancements
  7. documentation

Argonne

  1. support system integration
  2. diagnostics
  3. failure management
  4. Navigator / CLI tools
  5. monitoring

LLNL

  1. Cross compile environment & autoconf
  2. Interpreted languages
  3. Shared memory
  4. Effective use of 2nd core
  5. Effective use of SIMD floating point unit
  6. Scalability of Ethernet for parallel filesystems

Brookhaven National Lab / Stony Brook

Speaker:  Nicholas D'Imperio (dimperio@bnl.gov)
          Blue Gene Systems Coordinator, team lead for System Administration,
          Applications Support, and System Software Development.
  1. LoadLeveler Issues
  2. User Accounting
  3. Offsite Cross Compiling
  4. Future Projects