Building a large scale SaaS app Open Source, Storage and Scalability - PowerPoint PPT Presentation

Building a large scale SaaS app: Open Source, Storage and Scalability
Dan Hanley, CTO
http://www.magus.co.uk
14 March, 2008

Agenda
• Who are Magus?
  - What do we do?
  - Who do we do it for?
• How do we do it?
  - SOA
  - Scalability
  - Storage
  - F/OSS

The Magus proposition
• Leading provider of innovative web-content engineering solutions to global corporations
• Specialise in managed applications that help clients build value from their online assets and from the wider web
• Three main applications:
  - ActiveStandards
  - RemoteSearch
  - CrucialInformation
• Delivering solutions since 1995

Our managed applications
Delivering Software-as-a-service (ASP model)
  - ActiveStandards: designed to help companies stay on-brand, on-line by tracking and managing corporate web standards compliance, worldwide
  - RemoteSearch: a multi-site search engine, providing integrated search frameworks for enterprise websites
  - CrucialInformation: a premium current awareness service delivering high-quality, strategic intelligence from the web and syndicated services

ActiveStandards

RemoteSearch

CrucialInformation

Social Networking

Our clients

Technically - where we were
• 1 product
• Web design business
• All home grown
• No appservers
• No failover
• No common infrastructure
• Scalability worries
• No version control
• Unclear methodology

Technically - where we are now
• 3 main applications
• Bespoke capability
• Common infrastructure
• Platform of services
• Fault tolerant
• Scalable
• Defined process & methodology

Approach
• Do a lot with a little - 35 people, punching above our weight
• Don't reinvent the wheel
• Extract commonality - keep it DRY

The components of the stack
• Trawl
• Routing
• Harvest
• Store
• Index
• Quartz
• Search
• ClientEngine
• Analysis
• Profile
• Monitor
• LinkChecker

Logical architecture
REST (not SOAP)

Trawl
• Responsible for managing the gathering of data in its raw form into the Store.
• Currently have Trawlers for:
  - HTTP
  - FTP (several flavors)
  - RSS, Atom etc
  - SMTP
  - Google
  - Technorati
  - Moreover
  - FT (several flavors)

Trawler service
Pluggable architecture based on JMX MBean service
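As a rough illustration of what a JMX-based plug-in could look like, the sketch below registers one trawler as a standard MBean and drives it through the MBean server. The class names, the ObjectName and the counter attribute are hypothetical and are not taken from the Magus code; only the javax.management calls are standard JDK API.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class TrawlerPluginSketch {

    /** Management interface; JMX pairs it with HttpTrawler through the "MBean" suffix. */
    public interface HttpTrawlerMBean {
        String getProtocol();
        int getFetchedCount();
        void trawl(String source);
    }

    /** Hypothetical plug-in. A real trawler would fetch the source and write it to the Store. */
    public static class HttpTrawler implements HttpTrawlerMBean {
        private int fetched;

        public String getProtocol() { return "HTTP"; }
        public int getFetchedCount() { return fetched; }
        public void trawl(String source) { fetched++; } // stand-in for the real fetch
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Assumed naming scheme: one ObjectName per protocol, so plug-ins can be listed by pattern.
        ObjectName name = new ObjectName("example.trawl:type=Trawler,protocol=HTTP");
        server.registerMBean(new HttpTrawler(), name);

        // Any JMX console could now invoke the operation; here it is done in-process.
        server.invoke(name, "trawl",
                new Object[] {"http://www.magus.co.uk"},
                new String[] {"java.lang.String"});
        System.out.println(server.getAttribute(name, "Protocol")
                + " trawler fetched " + server.getAttribute(name, "FetchedCount"));
    }
}
```

Adding another protocol under this assumption means writing one more class and interface pair and registering it under its own ObjectName; nothing in the calling code changes.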

Harvest
• Responsible for extracting explicit data from Links and storing the fielded data in the database, and the non fielded data in the Store.

Harvest service
Pluggable architecture based on JMX MBean service

Index
• Responsible for building, purging, maintaining indices.

Search
• Responsible for searching indices and delivering results.

Analysis
• Responsible for deriving scores for information implicit in the page
  - Sentiment
  - Readability
  - Language detection etc

Monitor
− Badly named, should be called “Classifier”
− Responsible for creating filings between Links and Categories.
− A Link can be a bookmark, news item, blog article etc.
− A Category can be Users Bookmarks, News Topic, an AST Guideline etc.

Classifier (monitor) service
Pluggable architecture based on JMX MBean service

LinkChecker
• Responsible for checking the life of links and removing them correctly from the system when they have expired.

Routing
• Manages the workflow of jobs through the stack
• Has the capability to dynamically loadbalance workloads.
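The slides do not say how Routing balances load. One common policy is to send each job to the node with the fewest outstanding jobs, sketched below. The router class, its methods and the node names used in main are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RoutingSketch {

    /** Tracks outstanding jobs per worker node and routes each new job to the least busy one. */
    static class LeastLoadedRouter {
        private final Map<String, Integer> outstanding = new LinkedHashMap<>();

        void addWorker(String node) {
            outstanding.put(node, 0);
        }

        /** Picks the node with the fewest outstanding jobs; ties go to the node registered first. */
        String route(String job) {
            String best = null;
            for (Map.Entry<String, Integer> entry : outstanding.entrySet()) {
                if (best == null || entry.getValue() < outstanding.get(best)) {
                    best = entry.getKey();
                }
            }
            if (best == null) {
                throw new IllegalStateException("no workers registered");
            }
            outstanding.put(best, outstanding.get(best) + 1);
            return best;
        }

        /** Called when a node reports a finished job; unknown nodes are ignored. */
        void completed(String node) {
            outstanding.computeIfPresent(node, (key, count) -> Math.max(0, count - 1));
        }
    }

    public static void main(String[] args) {
        LeastLoadedRouter router = new LeastLoadedRouter();
        router.addWorker("node-a");
        router.addWorker("node-b");
        System.out.println(router.route("harvest link 1"));
        System.out.println(router.route("harvest link 2"));
        router.completed("node-b");
        System.out.println(router.route("harvest link 3"));
    }
}
```

Because the counts change as jobs finish, a slow node naturally receives less work, which is one way to read "dynamically" in the bullet above.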

Content stores
• We needed a multiple terabyte (currently 24 TB) distributed, fail safe, filesystem
• NFS was crumbling under load
• ZFS was vapourware
• Lustre was too complex
• We built our own!
• Magus Contentstores, responsible for holding both the raw and processed non fielded content of links which have been trawled and harvested

Content stores - configuration

    <mbean code="uk.co.magus.store.service.StoreService"
           name="magus.service.store:service=StoreServiceLocalCalls">
      <attribute name="JndiName">magus/services/StoreServiceLocalCalls</attribute>
      <attribute name="Config">
        <TryEachStripeStore>
          <List>
            <MirrorStore>
              <List>
                <RemoteStore>nas:1299;StoreServiceRemoteCallsInvokeTarget</RemoteStore>
                <RemoteStore>m4:1099;StoreServiceRemoteCallsInvokeTarget</RemoteStore>
              </List>
            </MirrorStore>
            <MirrorStore>
              <List>
                <RemoteStore>nas:1199;StoreServiceRemoteCallsInvokeTarget</RemoteStore>
                <RemoteStore>m5:1099;StoreServiceRemoteCallsInvokeTarget</RemoteStore>
              </List>
            </MirrorStore>
          </List>
        </TryEachStripeStore>
      </attribute>
      <depends>jboss:service=Naming</depends>
    </mbean>
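The configuration nests two mirrored pairs inside a striping store. The sketch below shows one way such a composition could behave. The semantics are assumptions made for illustration, not the Magus implementation: a mirror writes to every replica and reads from the first one that answers, and the stripe store places each key by hash and tries every stripe on read. The Store interface and the in-memory stand-in for RemoteStore are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StoreCompositionSketch {

    interface Store {
        void put(String key, byte[] content);
        byte[] get(String key); // null when the key is absent
    }

    /** In-memory stand-in for a RemoteStore; can be switched off to simulate a dead node. */
    static class MemoryStore implements Store {
        private final Map<String, byte[]> data = new HashMap<>();
        boolean available = true;

        public void put(String key, byte[] content) {
            if (!available) throw new IllegalStateException("store unavailable");
            data.put(key, content);
        }

        public byte[] get(String key) {
            if (!available) throw new IllegalStateException("store unavailable");
            return data.get(key);
        }
    }

    /** Assumed semantics: write to every replica, read from the first replica that answers. */
    static class MirrorStore implements Store {
        private final List<Store> replicas;

        MirrorStore(List<Store> replicas) { this.replicas = replicas; }

        public void put(String key, byte[] content) {
            for (Store replica : replicas) replica.put(key, content);
        }

        public byte[] get(String key) {
            for (Store replica : replicas) {
                try {
                    byte[] found = replica.get(key);
                    if (found != null) return found;
                } catch (IllegalStateException unavailable) {
                    // this replica is down; fall through to the next one
                }
            }
            return null;
        }
    }

    /** Assumed semantics: place each key on one stripe by hash, try every stripe on read. */
    static class TryEachStripeStore implements Store {
        private final List<Store> stripes;

        TryEachStripeStore(List<Store> stripes) { this.stripes = stripes; }

        public void put(String key, byte[] content) {
            stripes.get(Math.floorMod(key.hashCode(), stripes.size())).put(key, content);
        }

        public byte[] get(String key) {
            for (Store stripe : stripes) {
                byte[] found = stripe.get(key);
                if (found != null) return found;
            }
            return null;
        }
    }

    public static void main(String[] args) {
        MemoryStore nasA = new MemoryStore(), m4 = new MemoryStore();
        MemoryStore nasB = new MemoryStore(), m5 = new MemoryStore();
        Store root = new TryEachStripeStore(Arrays.<Store>asList(
                new MirrorStore(Arrays.<Store>asList(nasA, m4)),
                new MirrorStore(Arrays.<Store>asList(nasB, m5))));

        root.put("link-42", "raw page".getBytes(StandardCharsets.UTF_8));
        nasA.available = false; // lose one replica of each mirror
        nasB.available = false;
        System.out.println(new String(root.get("link-42"), StandardCharsets.UTF_8));
    }
}
```

Under these assumptions each stripe survives the loss of one of its two replicas on reads, which matches the "fail safe" requirement on the previous slide.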

Store Interfaces

Store JMX Beans

Contentstore - engines
Can use many types of engine on a node
Currently supports:
  - Mysql
  - SleepyCat
  - Filesystem
These can be decorated to enhance functionality
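"Decorated" here presumably refers to the decorator pattern: a wrapper implements the same engine interface and adds behaviour around the engine it wraps. The sketch below adds gzip compression to any engine. The Engine interface, the map-backed engine and the choice of compression as the added feature are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class EngineDecoratorSketch {

    interface Engine {
        void write(String key, byte[] content);
        byte[] read(String key); // null when the key is absent
    }

    /** Stand-in for a real engine such as a filesystem or database backed one. */
    static class MapEngine implements Engine {
        final Map<String, byte[]> data = new HashMap<>();

        public void write(String key, byte[] content) { data.put(key, content); }
        public byte[] read(String key) { return data.get(key); }
    }

    /** Decorator: same interface, compresses on write and decompresses on read. */
    static class CompressingEngine implements Engine {
        private final Engine inner;

        CompressingEngine(Engine inner) { this.inner = inner; }

        public void write(String key, byte[] content) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
                gzip.write(content);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            inner.write(key, buffer.toByteArray());
        }

        public byte[] read(String key) {
            byte[] stored = inner.read(key);
            if (stored == null) return null;
            try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(stored))) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = gzip.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        MapEngine raw = new MapEngine();
        Engine engine = new CompressingEngine(raw);
        engine.write("link-1", new byte[10000]); // highly repetitive content compresses well
        System.out.println(raw.data.get("link-1").length + " bytes stored for "
                + engine.read("link-1").length + " bytes of content");
    }
}
```

Because the decorator only depends on the interface, the same wrapper could sit in front of any of the engine types listed above.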

Content Store Classes

Quartz
• Responsible for firing messages on time.
• The “heartbeat” of the stack.

Client Engine
• Responsible for stack based processing for Client Applications.
• Keeps “heavy lifting” out of the Web Tier.
• Coordinates Client Applications requests across multiple stack services.

Management Application
• Manage taxonomy
• Manage rules
• Manage scheduling
• Focus on managing the business
• Leave service management to JMX or web consoles
• Swing

Management App

Profile
• An internal service used to collect metrics on system wide performance

Infrastructure architecture

Methodology
• Agility - sprints
• Issue tracking - Jira
• Regular, scheduled, deployments
• Consolidated build & version control

Deployment
1. Check out
2. Code / local test
3. Check in
4. Auto check out
5. Build / unit tests / metrics
6 & 12. Notify
7. Publish results
8. Deploy
9 & 11. FIT tests
10. Deploy dependencies
13. Prepare release note
14. Get release note
15. Reject release
16. Get application artifacts
17. Manage test & production environments
18. Deploy applications
19. Deploy applications
Diagram labels: Subversion (code repository), Developer, Developer Local Box, Bamboo, Dev Cluster, Release Note, Granite TL, Product Owner / Dev TL, Test Cluster, Stress Test, Jboss ON, Production

Throughput
• 11,000 sources in system
• ~16,000,000 pages rolling store
• ~200,000 new pages per day
• Average < 2 minutes from page detection to fully classified and indexed.

Cost comparisons
• Apples and oranges?

Proprietary                                           Licence Free
Product                 Per CPU     CPUs  Total       Product     Per CPU  CPUs  Total
Oracle                  20,000.00   10    $200,000    MySql       $0.00    10    $0.00
Weblogic AS             10,000.00   38    $380,000    Jboss AS    $0.00    38    $0.00
MS Windows Server       3,919.00    48    $188,112    Redhat/Apa  $0.00    48    $0.00
Visual Team Studio      1,000.00    12    $12,000     Eclipse     $0.00    12    $0.00
ClearCase               4,125.00    1     $4,125      Subversion  $0.00    1     $0.00
Jira                    2,000.00    1     $2,000      Trac        $0.00    1     $0.00
Autonomy IDOL bundl     75,000.00   2     $150,000    Carrot2     $0.00    12    $0.00
IBM Intelligent Datami  132,000.00  1     $132,000    LingPipe    $0.00    12    $0.00
Verity K2               50,000.00   2     $100,000    Lucene      $0.00    8     $0.00
                                                      UIMA        $0.00    12    $0.00
Total                                     $1,068,237                             $0.00
                                          £580,531.26
                                          €849,629.77

Questions?
Thank you
