SMSCBOF

Introduction

The SC08 Blue Gene System Management Community Birds of a Feather meeting is 5:30-7:00pm US Central Time on Tuesday, November 18, 2008, in rooms 11A/11B.

This is meant to be a discussion forum with representatives from: Argonne National Laboratory, Juelich Research Centre, Lawrence Livermore National Laboratory, IBM, and Brookhaven National Laboratory/Stony Brook University.

We are still asking people to come prepared to talk for no more than 10 minutes about their configurations and the directions they plan to take their sites in. We'll then turn to discussing key issues. To prevent duplication and to share ideas in advance, we've set up this wiki.

Schedule

This is a rough schedule and nothing is fixed yet. Feel free to make suggestions and changes.

  • 5:30-5:40 Opening remarks and introductions - Susan Coghlan, ALCF
    • Opening remarks
    • Community
  • 5:40-6:30 Site Presentations
  • 6:30-6:55 Panel Issues Discussion w/ Stump the Experts
  • 6:55-7:00 Round-up
    • e-mail list admins-wg@bgconsortium.org
    • wiki - William to ask for wiki.bgconsortium.org
    • Thank yous

Site Presentations

Configuration Details

  • Model(s) / Driver(s)
  • Size
  • Topology
  • Queuing System
  • Network setup
  • File System details
  • OSes
  • Workload description
  • Monitoring / Notification System
  • Other things that set your system(s) apart

Issues

  • Brief discussion of issues faced

Directions

  • Cool tools under development
  • Expansion plans

Panel Discussion Issues

Juelich Research Centre

Speaker:  Jutta Docter (j.docter@fz-juelich.de)
          Team lead of BlueGene/P system administration.
          Experience with various supercomputers (IBM, Cray, Intel, ...) and systems (BG/P, BG/L, AIX, LoadLeveler, ...).
  1. RAS events
  2. diagnostics
  3. automatic monitoring
  4. hardware stability
  5. software support
  6. software enhancements
  7. documentation

Argonne

  1. support system integration
  2. diagnostics
  3. failure management
  4. Navigator / CLI tools
  5. monitoring

LLNL

  1. Cross compile environment & autoconf
  2. Interpreted languages
  3. Shared memory
  4. Effective use of 2nd core
  5. Effective use of SIMD floating point unit
  6. Scalability of Ethernet for parallel filesystems

Brookhaven National Lab / Stony Brook

Speaker:  Nicholas D'Imperio (dimperio@bnl.gov)
          Blue Gene Systems Coordinator, team lead for System Administration,
          Applications Support, and System Software Development.
  1. LoadLeveler Issues
  2. User Accounting
  3. Offsite Cross Compiling
  4. Future Projects