SLIDE 1

UHD Grid Computing Team
Department of Computer and Mathematical Sciences
University of Houston-Downtown

SLIDE 2

Outline

Grid Laboratory Architecture Interface Clusters Application Projects

SLIDE 3

Historical Review

An Integrated Lab Package for Introductory Computer Science Course – CSI

The Design of Phil2000

Java Application
Hierarchical Design
Labs -> Tasks -> Activities
Explorer
Plug-ins to run applications

SLIDE 4

Grid Infrastructure for Laboratory

An interface that is extensible to incorporate more lab modules and customizable to different course structures

Solution: a lab explorer

A computational backbone that provides services for various lab activities

Solution: an array of servers that run on a computational grid

SLIDE 5

The pyramid model of the project

Interface of functional units Client/server model Lab modules Module language Architecture specification

Design theme Application theme

Multi-agent system

SLIDE 6

Labs to implement

Topology: Circuiting messages in a ring
Collective communications: Matrix transpose
Group management: Matrix multiplication with Fox’s algorithm
Scientific computation: Solving linear systems with Jacobi’s algorithm
Combinatorial search: Traveling salesman problem
Parallel I/O: Vector processing - Summation
Performance analysis: Visualization with Upshot – Trapezoidal rule problem
Parallel library: Solving linear systems with ScaLAPACK
Scalability analysis: Bitonic sorting
LAN configuration: The use of NICs and hubs
Network analysis: Monitoring a chat room
Address resolution: Experiment with ARP burst
IP masquerading: Clustered web servers
WAN configuration: The use of routers
Performance tuning: Deal with congestion
Service configuration: The configuration of a networked file system
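
As an illustration of one lab from the list above, the Jacobi iteration named in the scientific computation lab can be sketched serially in C. This is a minimal sketch, not the lab's actual code: the function name jacobi_solve, the row-by-row matrix layout, and the stopping rule are assumptions made for the example. In a parallel version, each process would typically update its own block of rows and exchange the updated entries after every sweep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Absolute value helper, so the sketch needs no math library. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/*
 * Hypothetical helper: solves A x = b by Jacobi iteration.
 * A is an n-by-n matrix stored row by row; x holds the initial guess on
 * entry and the approximate solution on return.
 * Returns the number of sweeps performed, or -1 if the largest change per
 * sweep never dropped below tol (or memory could not be allocated).
 */
int jacobi_solve(const double *A, const double *b, double *x,
                 size_t n, double tol, int max_sweeps)
{
    assert(n > 0 && tol > 0.0);
    double *x_new = malloc(n * sizeof *x_new);
    if (x_new == NULL) return -1;

    for (int sweep = 1; sweep <= max_sweeps; sweep++) {
        double max_change = 0.0;
        for (size_t i = 0; i < n; i++) {
            /* x_new[i] = (b[i] - sum over j != i of A[i][j] * x[j]) / A[i][i] */
            double sum = b[i];
            for (size_t j = 0; j < n; j++)
                if (j != i) sum -= A[i * n + j] * x[j];
            x_new[i] = sum / A[i * n + i];
            double change = abs_d(x_new[i] - x[i]);
            if (change > max_change) max_change = change;
        }
        for (size_t i = 0; i < n; i++) x[i] = x_new[i];
        if (max_change < tol) { free(x_new); return sweep; }
    }
    free(x_new);
    return -1;
}
```

The iteration is only guaranteed to converge for suitable matrices, for example strictly diagonally dominant ones, so a lab exercise would normally start from such a system.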

SLIDE 7

The Main Menu Frame End Screens Cards Tree Panel Upper Toolbar Lower Toolbar Save Open Print Next

The Lab Layout

SLIDE 8

Frame title & icon Screen menu

Tree

Upper ToolBar Screen panel ScrollPane Lower ToolBar Button

The Main Menu

SLIDE 9

Open and Close Menus

Open or Save lab Message

The file *.mla

SLIDE 10

Print

Print layout frame Print Dialog Data Table Print Table Print Dialog Scroll Pane

SLIDE 11

Help Menu

Help Frame

SLIDE 12

Other Software used

Visual C++
MPICH
JumpShot

SLIDE 13

Node Calling and Services Menu

Services menu Scroll pane Radio Button Text area Text field Scroll Menu

Node Calling Menu

SLIDE 14

An Example

Performance evaluation of parallel programs

SLIDE 15

Performance Analysis - Activities

a. Performance Prediction
b. Compilation and execution
c. Profiling

SLIDE 16

Khoi Nguyen

Construct small cluster/NOW.
CMS lab cluster (16 nodes)
Linux Beowulf Cluster functional
Successfully ran all sorting algorithms with expected results.

Create a GUI as an overlay using Java.

http://www.netlib.org/utk/papers/mpi-book/node2.html

SLIDE 17

16 nodes
RedHat 9.0 as operating system
Configure rsh in root and user directories
Install MPICH ver. 1.2.6 (Unix, all flavors)
Test with MPICH test programs and sorting program

SLIDE 18

The Cluster

SLIDE 19

Test results – Quick sort

Quicksort chart data (x-axis: Processors; values are the charts' data labels, in extracted order)

Elements      1p       2p       4p       8p       16p
10,000        0.042    0.0145   0.0129   0.0158   0.019
100,000       0.162    0.181    0.198    0.189    0.224
1,000,000     1.49     1.62     1.51     1.34     1.34
10,000,000    16.085   14.68    14.58    14.15    12.49

SLIDE 20

Test results – Merge sort

Merge sort chart data (x-axis: Processors; values are the charts' data labels, in extracted order)

Elements      1p       2p       4p       8p       16p
10,000        0.0491   0.0434   0.0302   0.031    0.029
100,000       0.248    0.223    0.179    0.147    0.155
10,000,000    31.75    25.72    17.89    13.85    12.14

Merge sort performance increase for 1,000,000 elements: 16.59%, 51.70%, 99.25%, 118.85% (listed in extracted order; the value 2.67 appears next to the 1p label)
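
The timings above are easier to compare once they are converted to speedup and parallel efficiency. The sketch below assumes that the chart values are run times and that they are listed in the order 1p, 2p, 4p, 8p, 16p; the function names are illustrative only.

```c
#include <assert.h>

/* Speedup S(p) = T(1) / T(p): how many times faster the run on p
 * processors is compared with the run on one processor. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency E(p) = S(p) / p: the fraction of ideal (linear)
 * speedup that was actually achieved. */
double efficiency(double t_serial, double t_parallel, int processors)
{
    assert(processors > 0);
    return speedup(t_serial, t_parallel) / processors;
}
```

Under those assumptions, the 10,000,000-element merge sort values (31.75 on 1p and 12.14 on 16p) give a speedup of about 2.6, which is an efficiency of roughly 16% on 16 processors.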

SLIDE 21

The GUI will be a basic window applet

Action buttons
Introduction
Run your MPI program
Run Demo programs: Sorting Programs, Distribution sample programs
Help

Output Window

SLIDE 22

Finished GUI - Introduction

SLIDE 23

Open your MPI file!

SLIDE 24

Compile & Build your MPI Program!

SLIDE 25

Running your MPI program!

SLIDE 26

Program Running in Console…

SLIDE 27

ABSTRACT

The construction and performance of computer clusters running different operating systems are studied. A Windows XP cluster and a Linux ‘Beowulf’ cluster were constructed to conduct a time-based analysis. Details on the construction, configuration, and relative performance of the clusters are discussed.

INTRODUCTION

The typical von Neumann architecture has led us to increase processing power through more transistors, larger address spaces, and more physical memory. However, a more efficient way is message passing between multiple processors. The concept of message passing is to achieve parallelism through functions that explicitly transmit data from one process to another. The Message Passing Interface (MPI) is simply a “library” of functions that can be called from C/C++ and FORTRAN programs. MPI programs make use of multiple processors by assigning each processor a task. The processors work in parallel and exchange data: one sends a packet of data and another receives it. MPI programs are designed to operate most efficiently on multiple processors, and they are widely used on Scalable Parallel Computers (SPCs) and Networks of Workstations (NOWs). A ‘cluster’ is simply a collection of computers (2 or more) working in parallel to accomplish a given task. Here, two different clusters were constructed to run different sorting algorithms and sample MPI programs. For convenience, a Java applet was also developed to launch these programs. Cluster construction and performance results are discussed in the sections that follow.

Khoi Nguyen, Computer & Mathematical Sciences, University of Houston – Downtown
Advisor: Dr. Hong Lin
Fall 2004

CLUSTER CONSTRUCTION

XP Cluster: 2 nodes (AMD Athlon 1.33 GHz and Pentium III 850 MHz, w/ 512 MB system memory) linked via a router/switch (see Figure 2).

Linux Beowulf Cluster: 16 nodes, consisting of 15 Pentium II 350 MHz nodes w/ 128 MB system memory and 1 server node (Pentium 550 MHz w/ 256 MB system memory). All nodes were linked via a 10/100 Mbps Ethernet LAN switch. A KVM switch was installed for only 4 nodes (see Figure 3).

SORTING ALGORITHMS

Parallel implementations of merge sort (O(log₂ n)) and bitonic sort (O((log₂ n)²/2)) were used to conduct the time-based analysis. Serial implementations were also included to serve as controls.
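
To make the bitonic sort bound more concrete, here is a serial C sketch of the bitonic sorting network. It is not the poster's parallel implementation: the function name is an assumption, and the sketch only handles array lengths that are powers of two. Each pass of the innermost loop is one compare-exchange stage whose comparisons are independent of each other, which is what a parallel version exploits. For n = 2^k elements there are k(k+1)/2 such stages, which is on the order of the (log₂ n)²/2 quoted above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sorts a[0..n-1] in ascending order with the bitonic sorting network.
 * n must be a power of two. Returns the number of compare-exchange
 * stages that were executed.
 */
size_t bitonic_sort(int *a, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    size_t stages = 0;

    for (size_t k = 2; k <= n; k <<= 1) {          /* size of the bitonic sequences being merged */
        for (size_t j = k >> 1; j > 0; j >>= 1) {  /* distance between compared elements */
            for (size_t i = 0; i < n; i++) {
                size_t partner = i ^ j;
                if (partner > i) {
                    /* Blocks alternate between ascending and descending order. */
                    int ascending = ((i & k) == 0);
                    int out_of_order = ascending ? (a[i] > a[partner])
                                                 : (a[i] < a[partner]);
                    if (out_of_order) {
                        int tmp = a[i];
                        a[i] = a[partner];
                        a[partner] = tmp;
                    }
                }
            }
            stages++;
        }
    }
    return stages;
}
```

For 8 elements this performs 6 stages (k = 3, so 3 × 4 / 2), regardless of the input values.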

XP CLUSTER CONFIGURATION

The configuration for this cluster required an older protocol, NetBEUI, although TCP/IP is the more widely used protocol today. The MPICH installation was mirrored on each node; user names and passwords had to be identical, and the executable program file had to be in the same location on each node. Either node could function as the server at the user’s discretion: whichever node launched the program became the server. MPICH ran processes in a ‘round-robin’ fashion.
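
The round-robin behavior mentioned above can be pictured as dealing process ranks to nodes in turn. The function below only illustrates that idea and uses a hypothetical name; the actual placement is decided by the MPICH launcher and its host configuration.

```c
#include <assert.h>

/* Round-robin placement: rank 0 goes to node 0, rank 1 to node 1, and so
 * on, wrapping back to node 0 once every node has received a process. */
int node_for_rank(int rank, int num_nodes)
{
    assert(rank >= 0 && num_nodes > 0);
    return rank % num_nodes;
}
```

On a two-node cluster such as the XP cluster, four processes would land on nodes 0, 1, 0, 1 under this scheme.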

BEOWULF CLUSTER CONFIGURATION

This architecture requires the installation of a Linux distribution on each node. One node functions as the server, with which the user interacts directly. The rest of the nodes serve as computational slaves (see Figure 3). Fedora (latest Red Hat) was installed on each node. Remote Shell (‘rsh’) was used for communication between the server and the nodes. Each node was configured the same way, differing only in IP address and hostname. The latest MPICH distribution for UNIX was installed in the same directory on each node. Sample programs included in the distribution were tested on 8 nodes.
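
One of the sample programs shipped with MPICH is cpi, which estimates π by numerically integrating 4/(1+x²) over [0, 1], with each process summing every numprocs-th interval. The serial sketch below imitates that cyclic work split without calling MPI: the loop over rank stands in for the processes, and the running sum stands in for the final reduction. The function name and signature are assumptions made for the example.

```c
#include <assert.h>

/*
 * Midpoint-rule estimate of pi, with the intervals dealt out cyclically
 * to `numprocs` simulated processes in the style of the cpi example.
 */
double estimate_pi(int intervals, int numprocs)
{
    assert(intervals > 0 && numprocs > 0);
    double h = 1.0 / intervals;   /* width of one interval */
    double pi = 0.0;

    for (int rank = 0; rank < numprocs; rank++) {
        /* Work of one simulated process: intervals rank+1, rank+1+numprocs, ... */
        double partial = 0.0;
        for (int i = rank + 1; i <= intervals; i += numprocs) {
            double x = h * (i - 0.5);          /* midpoint of interval i */
            partial += 4.0 / (1.0 + x * x);
        }
        pi += h * partial;                     /* stands in for the reduction */
    }
    return pi;
}
```

Because every simulated process handles roughly intervals/numprocs terms, the work per process shrinks as processes are added, while the reduction cost does not.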

[Figure: cluster diagram; labels: Workstations, LAN Switch, KVM Switch]

[Chart: XP Cluster – Mergesort – 2 Processors; x-axis: ELEMENTS (10000, 100000, 1000000, 10000000); y-axis: Time (s); series: Serial, Parallel]

CPI Test Program - Beowulf (x-axis: PROCESSORS; y-axis: TIME (s))

Processors   1p      2p     3p      4p      5p      6p      7p      8p      9p      10p
Time (s)     0.942   0.65   0.468   0.367   0.315   0.356   0.354   0.257   0.229   0.307

TO BE CONTINUED…

Inconclusive and inconsistent data drawn from the XP cluster have led me to choose the Beowulf design. MPI programs compete for resources under XP, so priority scheduling is required to run them efficiently there. This is a rather cumbersome and inconvenient process; moreover, the Beowulf cluster offers more flexibility in configuration. The XP cluster has been abandoned, and continued research is currently being conducted on the Beowulf cluster.

REFERENCES

Pacheco, Peter. Parallel Programming with MPI. Morgan Kaufmann, 1997.
Pacheco, Peter. Parallel Programming with MPI. 2002. <http://www.cs.usfca.edu/mpi/>

SLIDE 28

External Support

NSF Major Research Instrumentation Grant

2 more clusters, each with 16 nodes

Master server node to distribute applications

External storage

SLIDE 29

2nd & 3rd Cluster Configuration

Danil Safin and Hooman Hemmati

Sharing Internet connection using Firestarter
Network configuration and booting on LAN
Enable remote/secure shells and allow passwordless login
Sharing files with Network File System
Install and configure MPICH2

SLIDE 30

Real-Time Intelligent Agent Traffic Controllers

Outline

Problem
Goals
Traffic model
Simulation test bed
Preliminary results

SLIDE 31

Problem

Coordinating a network of traffic lights on inner-city streets, such as downtown, can be very challenging

Traffic configurations change constantly
One might have to wait for the light to turn green at an intersection where there is no car on the cross street

How long should the light stay green at a busy street?
Timed traffic lights
Sensors

SLIDE 32

Goals

Develop a traffic model that can manage a network of traffic lights efficiently by maximizing traffic flow and minimizing travel time

Construct a visual simulation test bed to verify our model and perform comparisons to other models

SLIDE 33

Real-Time Intelligent Agent Traffic Controller Model

Consists of two types of agents:
Communication Agents: send traffic information to, and receive it from, their neighbors
Computation Agents: decide whether to keep or change the current traffic light, based on the information from the Communication Agents

SLIDE 34

Communication Agents

Send and receive traffic information

Number of cars passing at the intersection
Number of cars waiting at the intersection
Time the light has stayed green
Waiting time for a red light to change to green
Average speed of current traffic
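
The traffic information listed above maps naturally onto a small record that a Communication Agent could exchange with its neighbors. The struct below is a sketch under that assumption; the type name, field names, and units are invented for illustration and do not come from the project's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message exchanged between neighboring Communication Agents. */
typedef struct {
    int    cars_passing;     /* cars currently passing through the intersection */
    int    cars_waiting;     /* cars waiting at the intersection */
    double green_elapsed_s;  /* how long the light has stayed green (seconds) */
    double red_wait_s;       /* waiting time until the red light turns green (seconds) */
    double avg_speed;        /* average speed of current traffic */
} TrafficInfo;

/* Adds up the waiting cars reported by all neighbors. */
int total_waiting(const TrafficInfo *reports, size_t count)
{
    assert(reports != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += reports[i].cars_waiting;
    return total;
}
```

A Computation Agent could use an aggregate like total_waiting as one of the inputs to its decision.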

SLIDE 35

Computation Agents

Establish rules from traffic information
Convert to fuzzy rules
Construct fuzzy controller
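
As a rough picture of what converting a rule into a fuzzy rule involves, the sketch below evaluates one invented rule: IF the queue at the red light is long AND the green phase has been long THEN switch the light. The membership functions, their thresholds (20 cars, 10 to 50 seconds), and the function names are all assumptions made for the example, not values from the project.

```c
#include <assert.h>

/* Rising-ramp membership function: 0 at or below lo, 1 at or above hi,
 * and linear in between. */
double ramp_up(double x, double lo, double hi)
{
    assert(hi > lo);
    if (x <= lo) return 0.0;
    if (x >= hi) return 1.0;
    return (x - lo) / (hi - lo);
}

/* Degree (0..1) to which the rule "queue is long AND green is long"
 * holds. The fuzzy AND is taken as the minimum of the two memberships. */
double switch_degree(double cars_waiting, double green_elapsed_s)
{
    double queue_is_long = ramp_up(cars_waiting, 0.0, 20.0);
    double green_is_long = ramp_up(green_elapsed_s, 10.0, 50.0);
    return queue_is_long < green_is_long ? queue_is_long : green_is_long;
}
```

A complete fuzzy controller would combine several such rules and turn the resulting degrees into a concrete keep-or-switch decision.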

SLIDE 36

Simulation Test Bed

Implement clusters where each node serves as a traffic light at an intersection

Rules for moving cars

SLIDE 37

Simulations

City of 10 horizontal and 10 vertical streets, each with 3 lanes, containing a total of 100 traffic lights.

Tests were run with a maximum of 1500 cars (heavy traffic), 1000 (moderate), and 500 (light traffic).

The cars entered the city map randomly, with no preference for any street or direction.

Two traffic models were compared:

Timed traffic lights with a checkerboard pattern (no adjacent intersections have the same light).

Simple traffic controllers. Each controller based its decision on the number of cars on the two sides of the intersection and the maximum time that any car can be made to wait at a red light.

The data gathered were the mean speed of a single car and the average of all cars' mean speeds.
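
The simple traffic controller described above decides from two inputs: the number of cars on each side of the intersection and a cap on how long any car may wait at a red light. The slide does not give the exact rule, so the comparison used below is an assumption, as are all the names.

```c
#include <assert.h>

typedef enum { KEEP_LIGHT, SWITCH_LIGHT } LightAction;

/*
 * Hypothetical decision rule for one intersection:
 *  - switch if a car on the red side has waited as long as allowed;
 *  - otherwise switch only if more cars are waiting on the red side
 *    than are being served on the green side.
 */
LightAction simple_controller(int cars_on_green, int cars_on_red,
                              double longest_red_wait_s, double max_wait_s)
{
    assert(cars_on_green >= 0 && cars_on_red >= 0 && max_wait_s > 0.0);
    if (cars_on_red > 0 && longest_red_wait_s >= max_wait_s)
        return SWITCH_LIGHT;
    if (cars_on_red > cars_on_green)
        return SWITCH_LIGHT;
    return KEEP_LIGHT;
}
```

Unlike a timed light, a rule of this kind never holds a green light for an empty street while cars queue on the cross street.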

SLIDE 38

Simulations

Intelligent Agent Traffic Controllers Standard Traffic Controllers

SLIDE 39

Preliminary Results

Two models:

Standard traffic lights
Simple intelligent agent controlled traffic lights

Measured average speed of all cars

IA traffic lights are 20% to 90% more efficient than standard lights

SLIDE 40

Intelligent Traffic Control by Agents

SLIDE 41

Data Mining

SLIDE 42

Data Mining Results

Gabriel Williams: Text mining on clusters
Danil Safin: HTML document preprocessing

[Chart: Datamining Runtime; x-axis: # of Nodes (1, 8, 16); y-axis: Runtime]

[Chart: Datamining Speedup; x-axis: # of Nodes (1, 8, 16); y-axis: Speedup]

SLIDE 43

E-Learning Agent System Design

System arranged in a hierarchical tree structure
Two entry points into the system:
Command line
Network socket, to service web requests

SLIDE 44

E-Learning Agent System Design (continued)

Maintenance Agent Control Agent Student Information Agent Instructor Notification and Recommendation Agent Control Agent Control Agent Master Control Agent Link DB Student Registrar Student DB

SLIDE 45

The Agents Developed For This Project

Master Control Agent

Main entry point into the system

Control Agent Maintenance Agent

Instructor

Notification and Recommendation Agent

Student

Student Information Agent

Registrar

SLIDE 46

Implementing MPI into the Agent System

MPI provides the communication mechanism for the agents
Packs data into an MPI data structure
Uses MPI_Send and MPI_Recv
Common for all agents

MPI limitations
Process creation limitation
Eliminates the 1:many relationship between control and information agents

SLIDE 47

E-Learning System Agent Code Architecture

Data Structures

JOB and INPUTJOB

struct JOB {
    int cmdSource;
    int cmdDestination;
    int cmdCode;
    int cmdResult;
    JOB_SOURCE ioSource;
    Socket* jobSocket;
    int messageLength;
    char messageInfo[MAX_STRING_SIZE];
};

struct INPUTJOB {
    int jobType;
    int cmdCode;
    Socket* jobSocket;
    char commandText[MAX_STRING_SIZE];
};
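
To travel between agents through MPI_Send and MPI_Recv, a job's fields have to be laid out as a flat buffer. The sketch below shows one way to do that for a simplified, hypothetical JobMessage that keeps only the plain-data fields of JOB; the socket pointer is left out because a pointer has no meaning in another process. The value of MAX_STRING_SIZE and the function names are assumptions, and the MPI calls themselves are omitted, so the code uses only memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STRING_SIZE 256  /* assumed value; the slide does not state it */

/* Simplified, hypothetical stand-in for the plain-data part of JOB. */
typedef struct {
    int  cmdSource;
    int  cmdDestination;
    int  cmdCode;
    int  cmdResult;
    int  messageLength;                 /* number of used bytes in messageInfo */
    char messageInfo[MAX_STRING_SIZE];
} JobMessage;

/* Packs the five integers followed by the used part of the message text.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t pack_job(const JobMessage *job, unsigned char *buf, size_t buf_size)
{
    assert(job->messageLength >= 0 && job->messageLength <= MAX_STRING_SIZE);
    int header[5] = { job->cmdSource, job->cmdDestination, job->cmdCode,
                      job->cmdResult, job->messageLength };
    size_t needed = sizeof header + (size_t)job->messageLength;
    if (buf_size < needed) return 0;
    memcpy(buf, header, sizeof header);
    memcpy(buf + sizeof header, job->messageInfo, (size_t)job->messageLength);
    return needed;
}

/* Reverses pack_job. Returns 0 on success, -1 if the buffer is malformed. */
int unpack_job(const unsigned char *buf, size_t len, JobMessage *job)
{
    int header[5];
    if (len < sizeof header) return -1;
    memcpy(header, buf, sizeof header);
    if (header[4] < 0 || header[4] > MAX_STRING_SIZE) return -1;
    if (len < sizeof header + (size_t)header[4]) return -1;
    job->cmdSource = header[0];
    job->cmdDestination = header[1];
    job->cmdCode = header[2];
    job->cmdResult = header[3];
    job->messageLength = header[4];
    memset(job->messageInfo, 0, MAX_STRING_SIZE);
    memcpy(job->messageInfo, buf + sizeof header, (size_t)header[4]);
    return 0;
}
```

A buffer produced this way could be sent as raw bytes between machines with the same integer size and byte order; heterogeneous clusters would need a portable encoding instead.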

SLIDE 48

E-Learning System Agent Code Architecture (continued)

Class baseAgent

Contains MPI communication and process creation mechanisms

All agent classes inherit from this class

Agent classes

Demos for each agent class represented

WEBAGENT.EXE

CGI program used to communicate with the Agent system from a web server

SLIDE 49

Client/Server Web Interface

SLIDE 50

Multi-Agent Course-Scheduling System

The ultimate goal of this project is to formulate a formal system for creating multi-agent systems (MAS), so that one no longer has to rely on a high-level specification language. This will be accomplished by creating a gamma calculus parser and running it on a prototype MAS, in order to formulate a method for a formal system of creating multi-agent systems. So far, a prototype E-Learning MAS and a preliminary Beliefs-Desires-Intentions (BDI) model, using argumentation-based negotiation, have been created.

SLIDE 51

Methods

The methods of implementation are as follows:
1. First, it was imperative to create a model MAS on which to run the calculus parser. The chosen model was an E-Learning Environment MAS. This model was built using four main agents to distribute tasks: Master-Control Agent, Student Agent, Instructor Agent, and Registrar Agent. These agents handle the registration and enrollment of students and the management of course content.
2. Second, advanced logic was needed to run with these agents. The logic chosen was argumentation-based negotiation. Using this with a BDI model, the agents argue among themselves to achieve their particular goals. What is important about this model is that each agent argues for its beliefs, and if other agents are coerced, they create a compromise among themselves.
3. Third, a gamma-calculus parser is to be created to run on the MAS. This will allow data to be collected and interpreted to formulate a method for the development of a formal system of creating multi-agent systems.

SLIDE 52

Results

The multi-agent system prototype has been completed successfully. Using MPI, the MAS divides the work into tasks and sends each task to the appropriate agent. This system runs concurrently with a server/client socket structure. The Master-Control agent handles information received from the server socket, which waits for a client on the same machine to communicate. This client gathers its information through Apache and JavaServer Pages. Thus the client of this system is a simple web page in which a user enters data to be used by the MAS.

The argumentation-based negotiation logic with the BDI model has been preliminarily created. The rough prototype successfully integrates desires, in which the registrar argues beliefs about classes that cannot be taught and classes that must be taught. When the system begins, the instructor argues for the class it wants and the registrar responds, arguing for the changes it may need to make. This is similar to what goes on between the student and the registrar: the student has a list of desired classes and must argue to get the registrar to accept its proposal. Current work is to make the model more complex and to add a visual aid to the program. All of this was done using Jason with AgentSpeak.

SLIDE 53

SLIDE 54

SLIDE 55

Virus particle reconstruction from cryo-electron microscopy

Step 1: extract individual particle images from cryo-electron micrographs or CCD images.

Step 2: determine orientations
Step 3: 3-D reconstruction. Execute Steps 2 and 3 repeatedly until convergence.

Step 4: dock atomic model into 3D density map

[Figure: illustration of Steps 1 to 4; labels: X, Y, Z, refinement]

SLIDE 56

Challenges

Increase resolution from 20-30 Å to 5 Å.

Increase the number of projections, N, from a few hundred to several thousand.

Increase the size of pixel frames from about 150² to 300².

SLIDE 57

Computational challenges

Assuming:

2,000 projections each of size 300 × 300 pixels.

Then:

we will solve at most 3 × 300² = 270,000 linear least squares problems, each having 2,000 equations and 300 unknowns.

The number of arithmetic operations:

270,000 × 2,000 × 300² = O(5 × 10¹³)
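
The estimate above follows from the usual rule of thumb that a dense least squares problem with m equations and n unknowns costs on the order of m × n² arithmetic operations. The helper below simply multiplies this out in 64-bit integers; its name is an assumption, and constant factors are ignored, as they are on the slide.

```c
#include <assert.h>

/* Rough operation count: problems * equations * unknowns^2. */
unsigned long long least_squares_ops(unsigned long long problems,
                                     unsigned long long equations,
                                     unsigned long long unknowns)
{
    /* The rule of thumb is meant for overdetermined systems. */
    assert(equations >= unknowns);
    return problems * equations * unknowns * unknowns;
}
```

With 3 × 300² = 270,000 problems, 2,000 equations, and 300 unknowns, the product is 48,600,000,000,000, or about 5 × 10¹³.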

SLIDE 58

Test on UHD cluster

[Chart: P3DR Runtime; x-axis: # of Nodes (1, 2, 4, 8, 16); y-axis: Runtime]

[Chart: P3DR Speedup; x-axis: # of Nodes (1, 2, 4, 8, 16); y-axis: Speedup]

SLIDE 59

SLIDE 60

SLIDE 61
