A centralised Elasticsearch service: design ideas and status
Feb. 2 2016

● Motivation for a centralised ES service
● Project
  – Mandate
  – Organisation
  – Team
● Strategy and design
  – Status and resources
  – Issues
  – PaaS
● Conclusions/summary
Motivation

● Strategy until 12/2015:
  – Go and do it yourself
  – A basic setup description was provided
  – No support for user-owned instances
● Reasoning:
  – Often conflicting user requirements
  – Risk of labour-intensive installations that cannot be covered with the available manpower
Motivation

● Current status:
  – O(20) clusters on the radar from checking Foreman
  – Monitoring runs 3 clusters (meter, timber and development) hosted on O(50) machines
  – Many clusters use physical hardware (because of I/O)
● A significant amount of resources with an unknown security setup
Project mandate

Set up a centralised ES service and consolidate existing installations.

● Offer something that can be used out of the box
● An attractive service for newcomers
● Cover as many use cases as possible
● Offer plugins like Kibana "as is"
Project organisation

● Twiki as entry point for service managers:
  https://twiki.cern.ch/twiki/bin/view/IT/ElasticSearchWeb
● Agile project management
  – JIRA ITES project
  – 4-6 week sprints (the first has just ended)
  – Daily scrums (except Mondays) at 9:30 in 31-S-27
  – End-user documentation on GitBook:
    http://esdocs.web.cern.ch/esdocs/
Team

● The team spans sections and groups
● Part-time participation from:
  – Compute and Monitoring (4 people)
  – Databases (2 people)
Strategy and design

● Profit from existing experience as much as possible
● Get user feedback from the beginning
Strategy and design

● Lots of interest from outside
  – Rumours spread quickly
  – An official announcement went out to the experiment lists asking for their requirements
  – Announcement at ITUM
● Quite detailed responses received already, e.g. from ATLAS
● User meetings have already taken place with ATLAS, DB and CDA
Strategy and design

● Two options for the implementation:
  – Fully Puppet-managed and monitored
  – A PaaS-like approach
● Going for the first option for now
  – PaaS investigated in parallel
  – Existing experience in the monitoring team with Heat (see the sketch below)
  – Playing with the upcoming OpenShift service
  – Involve DB on Demand staff
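As an illustration of the Heat experience mentioned above, a single ES data node can be described in a minimal HOT template roughly like this sketch; the image, flavour, key and network names are placeholders rather than the actual monitoring-team setup.

    heat_template_version: 2015-04-30
    description: Minimal sketch of one ES data node (all names are placeholders)

    resources:
      es_data_node:
        type: OS::Nova::Server
        properties:
          name: es-data-01
          image: "CC7 - x86_64"        # placeholder image name
          flavor: m1.xlarge            # 8-core flavour, as requested for testing
          key_name: es-service-key     # placeholder keypair
          networks:
            - network: "CERN_NETWORK"  # placeholder network name

    outputs:
      data_node_ip:
        description: IP address of the data node
        value: { get_attr: [es_data_node, first_address] }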
Status and resources

● Plan for an as-big-as-possible shared instance
  – Big in terms of applications/number of users
  – Public data
  – Different QoS offerings:
    ● Local (SSD-based) storage
    ● Network-attached storage with tunable IOPS (see the sketch below)
● Dedicated instances only for specific requirements, e.g. specific security constraints or redundancy
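One possible way to express the "tunable IOPS" offering on OpenStack block storage is sketched below with the standard cinder client; the names, volume size and IOPS values are purely illustrative.

    # define a front-end QoS spec capping IOPS (values are placeholders)
    cinder qos-create es-standard consumer=front-end read_iops_sec=500 write_iops_sec=500

    # create a volume type and associate the QoS spec with it
    cinder type-create es-standard-io
    cinder qos-associate <qos-spec-id> <volume-type-id>

    # create a data volume of that type for an ES data node
    cinder create --volume-type es-standard-io --name es-data-01-vol 200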
Status and resources

● Default setup
  – Support ES 2.x only
  – Kibana 4 provided "as is"
  – 3 different host types (see the sketch below):
    ● Search nodes
    ● Master nodes (redundant setup)
    ● Data nodes (2 different types, depending on QoS)
● Puppet setup
  – Currently being developed
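A minimal sketch of how the three host types map onto ES 2.x settings in elasticsearch.yml; the actual Puppet-managed configuration is still being developed, so this is illustrative only.

    # master node: eligible for master election, holds no data
    node.master: true
    node.data: false

    # data node: holds the indices; flavour/storage differs per QoS class
    node.master: false
    node.data: true

    # search node: coordinates queries and serves Kibana, holds no data
    node.master: false
    node.data: false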
Status and resources

● Resources:
  – Virtual machines only
    ● No containers
    ● No physical hardware
  – Limited amount of resources to play with for now
  – Pending request for more resources: RQF0536509
Issues to address

● Main issue: ACLs
  – Used by SDC, required by others as well (e.g. DB)
  – A security module is available from Elastic
    ● Commercial; licence fees depend on the number of nodes
  – A free security module exists in alpha version
    ● Started testing it (see the sketch below for the kind of per-index role needed)
● Need a certain amount of resources available for serious testing
  – The current quota of 25 VMs and 50 cores is not enough
  – Need the m1.xlarge flavour (8 cores)
  – The core quota is already used up
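To illustrate the kind of per-index ACL being evaluated, a role definition in the style of Elastic's commercial Shield plugin (2.x roles.yml) could look roughly like the sketch below; the role and index names are hypothetical, and the free alpha module under test may use a different syntax.

    # read-only access to one project's indices (names are hypothetical)
    sdc_reader:
      cluster: monitor
      indices:
        'sdc-*':
          privileges: read

    # full control over the same indices for the owning team
    sdc_admin:
      indices:
        'sdc-*':
          privileges: all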
Issues to address

● ES and Kibana versions
  – Many customers rely on Kibana 3; we may have to support this somehow
● Flume does not yet work with ES 2.x
  – A patch is available on GitHub
  – Testing will start once the test instance is on ES 2.x
● Implementation of QoS on the shared instance
  – Still to be done
  – Waiting for SSD-based resources
Timelines

● Aiming at an initial test instance by March
● Several customers are keen to start testing
  – ATLAS needs to move off their current ES resources within 2 months
  – Agreed with them to give them access to a non-production test instance by March
  – SDC is keen to move over as soon as possible
  – DB and CDA are ready to test
  – Batch monitoring is a good candidate as well
PaaS

● Checked how Amazon does it
● Users would own their instances but IT owns the resources
  – Instances are created by giving only a few needed parameters
  – The shared instance could be run as PaaS by us
● Several candidates on the market
  – Started to play with OpenShift (see the sketch below)
    ● Upcoming service which will also be used for GitLab, Jenkins, ...
    ● Very few steps are needed to get an ES instance up
  – Other options on the market
    ● Heat, Cloudify, ...
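For reference, a throw-away ES instance on an OpenShift-style PaaS really does take only a handful of commands; the sketch below uses the standard oc client, and the endpoint URL, project name and image tag are hypothetical.

    # log in to the OpenShift endpoint (URL is a placeholder)
    oc login https://openshift.example.ch

    # create a project and deploy a single-node ES from a public image
    oc new-project es-sandbox
    oc new-app elasticsearch:2.2 --name=es-test

    # expose the ES REST port via a route
    oc expose service es-test --port=9200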
Collaboration with external people

● ES meetup at CERN
  – Took place on 8/2/2016 at CERN
  – Close collaboration with ATLAS in organising the event
● Next: ElasticON
  – 17-19 Feb. 2016, San Francisco
  – Pablo will represent us there
Conclusions

● A centralised Elasticsearch service is being set up
  – Consolidation of existing instances is overdue
  – Lots of interest from IT and the experiments
● Aiming at first test instances in Q2 2016
● Aiming at a production-ready service by Q4 2016
● Also looking into PaaS options