SLIDE 1

Introduction and Motivation A Model for Maintenance and Management of Computing Laboratories Paraná Digital Project Conclusion

Managing a Grid of Computer Laboratories for Educational Purposes

Luis C. E. De Bona, Marcos Castilho, Fabiano Silva, Daniel Weingartner, Luis H. A. Lourenço, Bruno Ribas

Center for Scientific Computing and Free Software (C3SL)
Federal University of Paraná
Department of Computer Science

Luis C. E. Bona Managing a Grid of Computer Laboratories

SLIDE 2

Outline

1. Introduction and Motivation
2. A Model for Maintenance and Management of Computing Laboratories
3. Paraná Digital Project
4. Conclusion

SLIDE 4

Digital Inclusion Policies

Computer laboratories are a necessary tool in a student’s learning process
The falling cost of hardware led to the creation and expansion of digital inclusion policies
The lack of a specialized workforce is a huge problem!

SLIDE 5

Managing Computer Laboratories

Having an expert manage each laboratory in a huge public school network is impossible
It demands an effort to define a new administration model
All system management tasks should be performed either automatically or by experts
But we need to minimize the experts’ work

SLIDE 6

A New Administration Model

This paper presents a model that allows the administration of thousands of computing laboratories with minimum human intervention
Based on autonomic computing concepts: self-configuration, self-optimization, self-healing, and self-protection
Two kinds of human intervention:
  Local execution of simple tasks through user-friendly interfaces
  Remote execution of critical or unpredictable tasks by a team of experts

SLIDE 7

Paraná Digital Project (PRD)

The model was implemented by the Paraná Digital Project
More than 2100 public school laboratories
GNU/Linux software and GPL licence

SLIDE 9

A Model for Management of Computing Labs

The goal:
  To keep computing laboratories in working condition
  To dispense with specialized staff in each laboratory

Geographically distributed computing laboratories are interconnected, forming a computational grid
Hardware, operating system and main applications are considered fairly homogeneous
Tasks needed to install and manage the laboratories are classified into local and global management tasks

SLIDE 10

Management Tasks

Local management tasks
  Specific aspects of each laboratory
  Translated into simple, high-level decisions offered by a user-friendly interface
  Performed by an ordinary user called the local manager

Global management tasks
  To guarantee that the laboratory offers the expected services
  Determined globally and executed uniformly across the grid
  Most of these tasks should be automated
  To implement this, the proposal is to apply concepts of autonomic computing

SLIDE 11

The Local Managers

They are not computer experts
They make simple, high-level decisions
They use a user-friendly interface to perform local tasks:
  User accounts, disk quotas, etc.
  Initial installation or re-installation procedures
  Contact the call center in case of problems

SLIDE 12

Global managers: the Core

A small group of specialized system managers
They are able to manage hundreds of laboratories
They specify the system’s features, the software to be installed, and what should be restricted or allowed
Their decisions are propagated through self-management:
  Self-configuration
  Self-optimization
  Self-healing
  Self-protection

SLIDE 13

Self-Configuration

A continuous process aiming to keep the system configured under varying time and environment conditions
The configuration policies are defined for the entire grid by a core management team; no questions should be put to the local manager
To reduce management complexity, a computing model based on graphical terminals is employed
After installation, the system is continuously updated through the network
The update system is based on modern software package management systems

SLIDE 14

Self-Optimization

Data provided by monitoring systems can be used by experts to optimize system parameters
Since laboratories have similar configurations, global monitoring information is very useful
Historical and comparative analysis of performance metrics can be used to optimize the system
Based on global monitoring information, experts can enforce new configuration parameters
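
The comparative analysis mentioned above could be sketched as follows. This is only an illustration, not the PRD implementation: the metric (average server load), the school names, the sample values and the tolerance are all assumptions. The idea is that, because the laboratories are nearly homogeneous, a laboratory whose metric lies far from the grid-wide median is a candidate for closer inspection.

```python
from statistics import median


def find_outliers(metric_by_lab, tolerance=0.5):
    """Return the labs whose metric deviates from the grid median by more
    than `tolerance`, expressed as a fraction of the median.

    Hypothetical helper; the threshold is an arbitrary choice.
    """
    grid_median = median(metric_by_lab.values())
    if grid_median == 0:
        # A relative deviation is undefined when the median is zero.
        return {}
    outliers = {}
    for lab, value in metric_by_lab.items():
        deviation = abs(value - grid_median) / grid_median
        if deviation > tolerance:
            outliers[lab] = round(deviation, 2)
    return outliers


# Invented sample data: average server load reported by four laboratories.
sample_load = {"school-a": 1.0, "school-b": 1.2, "school-c": 0.8, "school-d": 3.0}
print(find_outliers(sample_load))
```

With these invented values only `school-d` is reported, since its load is well above the median of the other laboratories.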

SLIDE 15

Self-Healing and Self-Protection

The system must detect, diagnose, treat and prevent problems due to bugs or hardware failures, leaving minimal decisions to the local manager
The downtime of computing laboratories can be drastically reduced when certain aspects of the hardware and the operating system are tracked:
  Keeping the system up to date is the first step
  Hard disks and filesystems are the most failure-prone components
  The hard disk’s self-monitoring facility must be monitored
  Redundant arrays (RAID) are really useful
  But it is also necessary to perform periodic filesystem integrity verification
When a complete re-installation is needed, a local procedure must try to reinstall the system automatically, without loss of user data
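
One way to picture the detect, diagnose and treat cycle is a small decision function over the results of the periodic checks. The sketch below is an assumption made for illustration: the slides do not describe the actual policy, and the `HealthReport` fields, the action names and the order of the rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HealthReport:
    """Hypothetical summary of one school server's periodic checks."""
    disk_self_test_ok: bool   # result of the disk's self-monitoring facility
    raid_degraded: bool       # True if one disk of the array has failed
    filesystem_errors: int    # errors found by the integrity verification


def choose_action(report):
    """Pick an action for the report, keeping the local manager's role minimal."""
    if not report.disk_self_test_ok and report.raid_degraded:
        # No healthy redundancy is left, so the experts must step in.
        return "notify-core"
    if not report.disk_self_test_ok or report.raid_degraded:
        # The array still works, but a disk needs attention on site.
        return "ask-local-manager-to-contact-call-center"
    if report.filesystem_errors > 0:
        return "repair-filesystem"
    return "none"


print(choose_action(HealthReport(True, False, 0)))
print(choose_action(HealthReport(True, True, 0)))
```

The only decision left to the local manager in this sketch is to contact the call center, which matches the role the slides give that person.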

SLIDE 17

Paraná Digital Project

The aim of the project is to provide every school in Paraná State with a computing laboratory
Paraná State has about 1,500,000 students, 57,000 teachers and 2,100 schools distributed over 399 cities and 199,314 km²
A huge testbed for the proposed model
The first laboratory was installed in June 2006 and, as of August 2008, 2,126 schools were operational
The management team is composed of 12 highly trained Unix managers, taking care of the entire network (approximately 44 thousand stations)
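
To get a feel for the scale, the figures quoted above can be turned into a few ratios. The snippet uses only the numbers on this slide; the rounded totals (44 thousand stations, 1,500,000 students) are approximations in the source, so the results are rough.

```python
# Figures quoted on the slide (as of August 2008).
schools = 2126
stations = 44_000       # "approximately 44 thousand stations"
managers = 12
students = 1_500_000    # "about 1,500,000 students"

schools_per_manager = schools / managers
stations_per_manager = stations / managers
stations_per_school = stations / schools
students_per_station = students / stations

print(f"{schools_per_manager:.0f} schools per manager")
print(f"{stations_per_manager:.0f} stations per manager")
print(f"{stations_per_school:.1f} stations per school")
print(f"{students_per_station:.0f} students per station")
```

Each manager is therefore responsible for roughly 177 schools and about 3,700 stations.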

SLIDE 18

PRD’s architecture

SLIDE 19

PRD’s architecture

In each school:
  a laboratory composed of 20 X-terminals
  one processing server called the school server

The school server acts simultaneously as:
  processing and storage unit, gateway to the network, firewall and access point to the Core
  It runs a Debian-based GNU/Linux distribution; all servers have the same software packages installed

At the Core, a proxy-controlled connection to the Internet is provided

SLIDE 20

Initial installation

A CD-ROM containing a standard system image
The server checks periodically for updates at the central mirror
X-terminals must automatically recognize and configure hardware
The CD-ROM allows automated installation and configuration by the local manager
This automated procedure minimizes service disruption and data loss
A regular user must have an account created by the local manager
The local manager does not have root privileges
This ensures that critical tasks are globally defined on the grid

SLIDE 21

System Upgrade

Frequent system upgrades are necessary to provide new functionalities, address security problems and propagate new software, tools, or policies from the Core

automatic daily upgrades and triggered upgrades

SLIDE 22

The daily automatic upgrade

Based on Debian’s apt-get tools
Every night, each school server looks for new software packages
The single mirror ensures that all servers will install exactly the same software
It is quite simple to propagate a new tool or configuration over the entire network
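
The slide states only that the nightly upgrade relies on apt-get and a single mirror. To illustrate why a single mirror makes all servers converge on the same software, the sketch below models the mirror and each server as a mapping from package name to version. Package names and version strings are invented, and versions are compared by plain equality, which is far simpler than what a real package manager does.

```python
def plan_upgrade(mirror, installed):
    """Return the packages a server must install or upgrade to match the mirror.

    Both arguments map package name -> version string (hypothetical model).
    """
    return {
        name: version
        for name, version in mirror.items()
        if installed.get(name) != version
    }


def apply_upgrade(installed, plan):
    """Return the server's package state after the plan has been applied."""
    updated = dict(installed)
    updated.update(plan)
    return updated


# Invented example: two servers that drifted apart before the nightly run.
mirror = {"kernel": "2.6.18-6", "browser": "2.0.0.16", "lab-tools": "1.3"}
server_a = {"kernel": "2.6.18-5", "browser": "2.0.0.16"}
server_b = {"kernel": "2.6.18-6", "browser": "2.0.0.14", "lab-tools": "1.2"}

after_a = apply_upgrade(server_a, plan_upgrade(mirror, server_a))
after_b = apply_upgrade(server_b, plan_upgrade(mirror, server_b))
print(after_a == after_b == mirror)
```

Publishing a new tool then amounts to adding one entry to the mirror; every server picks it up on its next nightly run. The sketch does not remove packages that are absent from the mirror.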

SLIDE 23

The triggered upgrade

The Core can force all school servers to upgrade
This happens as soon as the network link comes up
Use cases:
  the kernel exploit that allowed an ordinary user to become root
  the upgrade from sarge to etch (in February 2008)
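
The behaviour described above, where a forced upgrade reaches a server as soon as its link comes up, can be modelled in a few lines. The `Core` and `SchoolServer` classes and their methods are hypothetical names for this sketch; the slides do not say how the mechanism is actually implemented. The release names are the ones from the use case above.

```python
class Core:
    """Hypothetical stand-in for the central management site."""

    def __init__(self):
        self.required_release = None
        self.servers = []

    def force_upgrade(self, release):
        # Record the requirement, then upgrade every server reachable right now.
        self.required_release = release
        for server in self.servers:
            if server.online:
                server.upgrade_to(release)


class SchoolServer:
    """Hypothetical model of one school server's upgrade state."""

    def __init__(self, core, online=True, release="sarge"):
        self.core = core
        self.online = online
        self.release = release
        core.servers.append(self)

    def upgrade_to(self, release):
        self.release = release

    def link_up(self):
        # When connectivity returns, catch up with any upgrade forced meanwhile.
        self.online = True
        required = self.core.required_release
        if required is not None and self.release != required:
            self.upgrade_to(required)


core = Core()
connected = SchoolServer(core, online=True)
disconnected = SchoolServer(core, online=False)

core.force_upgrade("etch")
print(connected.release, disconnected.release)  # etch sarge
disconnected.link_up()
print(disconnected.release)                     # etch
```

Keeping the required release at the Core means a school with an unreliable link cannot stay behind for longer than its outage.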

SLIDE 24

System Monitoring

An essential feature of autonomic systems, providing the information that allows the system’s self-optimization and self-healing
It is at the heart of the PRD network, revealing the real state of the whole grid
In the PRD model, there are two different monitoring systems:
  the statistics center
  the diagnosis system

SLIDE 25

The Statistics Center

A web site with strategic information
It allows an overview of the network’s growth and provides data concerning laboratory usage
This data is automatically stored in the central database
This provides strategic information for decision support

SLIDE 26

Snapshot of the statistics center web page

SLIDE 28

The Instant Diagnosis System

[Plot: Schools, Users/1000 and Login time/1000 from Sep/07 to Jul/08; vertical axis from 500 to 2500]

SLIDE 29

Manual System Inspection

Manual interventions are undesirable
The Core can remotely log in to a server via SSH
But only to inspect the server’s behavior
All changes should happen in a global manner across the grid
This means: global management!

SLIDE 31

Conclusion

How to manage computing laboratories without a local system manager?
We proposed a model that allows the administration of thousands of computing laboratories with minimum human intervention
The system is managed as a whole
Paraná Digital has more than 2,100 schools and only 12 managers
