From Zero to Useless to Hero: Make Runtime Data Useful in Teams. Robert Hoffmann (@robhoffmax) and Florian Lautenschlager (@flolaut), FOSDEM 2020
Contact us if you want. =) Dr. Florian Lautenschlager, Software Architect. Robert Hoffmann, Lead Architect VPaaS. {name.surname}@telekom.de, {name.surname}@qaware.de
“Hallo Magenta”: Building a European Voice Assistant Platform
From Zero to 1: international co-development, > 900 collaborators, > 500 active Git repos, > 100 services
T=Zero
Complex Architecture. Complex Software System. Complex Analysis. [Architecture diagram: skills (e.g. Weather, Radio), voice services, device services, admin services, IDM and storage services running as pods on Kubernetes in "some cloud", behind API gateways and a proxy, connected over HTTP/HTTPS to external services, SQL and NoSQL databases and a CDN.]
Advanced toolchain needed. Standard used. Metrics (sampling-based): Grafana. Textual logs (event-based): Humio. Spans (event-based): Jaeger. Together with cloud components, the toolchain covers probes, collection, transport, storage and exploration.
Generic-Standard-Runtime-Data-Smarthub-Service-Data-Model: concept, best practices and standard for metrics, tracing and logging. Standard readiness and liveness checks. Standard metrics for incoming and outgoing requests. Standard database metrics and specific ones. TraceId in every response.
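To make "TraceId in every response" concrete, here is a minimal sketch in Python of one way a service could attach the current trace ID to every log line and echo it back to the caller. It is not the Smarthub implementation: the header name `X-Trace-Id`, the logger name and the `handle_request` helper are hypothetical, and a real setup would take the ID from the tracer (e.g. Jaeger) instead of generating it.

```python
import contextvars
import io
import logging
import secrets
from typing import TextIO

# Holds the trace ID of the request currently being handled.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")

# Hypothetical header name; the talk does not specify one.
TRACE_HEADER = "X-Trace-Id"


class TraceIdFilter(logging.Filter):
    """Copies the current trace ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


def build_logger(stream: TextIO) -> logging.Logger:
    """Creates a logger whose lines always contain trace_id=<id>."""
    logger = logging.getLogger("smarthub.service")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # keep the sketch idempotent
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def handle_request(headers: dict, logger: logging.Logger) -> dict:
    """Hypothetical request handler that returns the response headers."""
    # Reuse the caller's trace ID if present, otherwise start a new one.
    trace_id = headers.get(TRACE_HEADER) or secrets.token_hex(16)
    token = current_trace_id.set(trace_id)
    try:
        logger.info("request received")
        # ... business logic would run here ...
        logger.info("request finished")
        # Echo the trace ID so users, testers and support can quote it.
        return {TRACE_HEADER: trace_id}
    finally:
        current_trace_id.reset(token)


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    print(handle_request({TRACE_HEADER: "abc123"}, log))
    print(buffer.getvalue(), end="")
```

The context variable keeps the ID scoped to the request, so the same pattern also works when requests are handled concurrently with `asyncio`.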
“Done. This solves all our problems. They will love it!” – We, the ignorant ones
Our team: Colorful. Mobile developers, platform developers, testers, skill developers, production management, data scientists, operation heroes, first-level support.
Our solution: Monochrome. Mobile developers, platform developers, testers, skill developers, production management, data scientists, operation heroes, first-level support.
T=Useless. Because we are monochrome.
Nobody wants to be a beginner. Optimize for intermediates. [Diagram: user skill levels Beginner, Intermediate, Expert; our toolchain sits at Expert.] Source: About Face, Alan Cooper
What we did to move our solution from expert to intermediate. Useful = Utility + Usability. 🤕 Utility: whether it provides the features you need. ✅ You can find all the information... Usability: how easy and pleasant these features are to use (learnability, efficiency, memorability, error handling, satisfaction). ❌ ...but only if you really know how and where to look (as an expert). Source: Usability 101, Jakob Nielsen, https://www.nngroup.com/articles/usability-101-introduction-to-usability/
Developer-, tester- & operations-oriented. Close gaps: link data and tools as much as possible. Dashboards with links to logs and e2e test runs.
Developer-oriented. Close gaps: link data and tools as much as possible. Pipeline UI: promote software and get runtime data.
Developer-oriented. Close gaps: link data and tools as much as possible. Pipeline dashboards with logs and traces.
Developer-, tester- & operations-oriented. Close gaps: link data and tools as much as possible. Gangway landing page to access k8s, logs, traces and metrics.
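Linking data and tools mostly comes down to generating deep links from identifiers that every tool shares, such as a trace ID or a service name. The following Python sketch shows the idea under stated assumptions: all base URLs are hypothetical placeholders, the `/trace/<traceID>` path is Jaeger UI's single-trace view, and the log-search and dashboard query formats are invented for illustration, not Humio's or Grafana's documented URL schemes.

```python
from dataclasses import dataclass
from urllib.parse import quote, urlencode


@dataclass(frozen=True)
class ToolLinks:
    """Builds cross-tool links for a landing page or dashboard.

    All base URLs are hypothetical placeholders for your own installation.
    """

    jaeger_base: str = "https://jaeger.example.com"
    logs_base: str = "https://logs.example.com/search"
    dashboard_base: str = "https://grafana.example.com/d/service-overview"

    def trace_url(self, trace_id: str) -> str:
        # Jaeger UI shows a single trace under /trace/<traceID>.
        return f"{self.jaeger_base}/trace/{quote(trace_id)}"

    def logs_url(self, trace_id: str) -> str:
        # Hypothetical query format: search all logs for this trace ID.
        return f"{self.logs_base}?{urlencode({'query': f'trace_id={trace_id}'})}"

    def dashboard_url(self, service: str) -> str:
        # Hypothetical dashboard variable that selects the service.
        return f"{self.dashboard_base}?{urlencode({'var-service': service})}"

    def landing_page(self, service: str, trace_id: str) -> dict:
        """Collects traces, logs and metrics for one request in one place."""
        return {
            "traces": self.trace_url(trace_id),
            "logs": self.logs_url(trace_id),
            "metrics": self.dashboard_url(service),
        }


if __name__ == "__main__":
    for name, url in ToolLinks().landing_page("weather", "00f067aa0ba902b7").items():
        print(f"{name}: {url}")
```

Keeping the URL templates in one place means dashboards, pipeline UIs and landing pages all link the same way, and a tool migration only touches this configuration.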
First-level- & operations-oriented. Make functional use: first-level support integration. GDPR-aware debugging in production for first-level customer support: token-based, user-specific debug logging and tracing.
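The talk does not describe how the debug tokens work. One plausible reading is sketched below in Python: support issues a short-lived random token for one user, the user's client sends it along, and the service raises the log level only for requests that carry a valid token, so personal data is not logged verbosely for everyone. The `DebugTokenRegistry` class, the `X-Debug-Token` header and the one-hour lifetime are all hypothetical.

```python
import logging
import secrets
import time
from typing import Callable, Dict, Optional

DEBUG_HEADER = "X-Debug-Token"  # hypothetical header name


class DebugTokenRegistry:
    """Issues short-lived debug tokens and checks them per request.

    Sketch only: tokens live in memory; a real service would share them
    across instances, e.g. via a database.
    """

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry: Dict[str, float] = {}  # token -> expiry timestamp

    def issue(self) -> str:
        """Creates a new token, e.g. when support starts a debug session."""
        token = secrets.token_urlsafe(16)
        self._expiry[token] = self._clock() + self._ttl
        return token

    def revoke(self, token: str) -> None:
        """Ends a debug session early."""
        self._expiry.pop(token, None)

    def is_valid(self, token: Optional[str]) -> bool:
        if not token:
            return False
        expiry = self._expiry.get(token)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # Expired tokens are dropped, so debug output stops on its own.
            del self._expiry[token]
            return False
        return True


def log_level_for(headers: dict, registry: DebugTokenRegistry) -> int:
    """Only requests with a valid token get verbose logging (and tracing)."""
    if registry.is_valid(headers.get(DEBUG_HEADER)):
        return logging.DEBUG
    return logging.INFO


if __name__ == "__main__":
    registry = DebugTokenRegistry(ttl_seconds=60)
    token = registry.issue()
    print(logging.getLevelName(log_level_for({DEBUG_HEADER: token}, registry)))
    print(logging.getLevelName(log_level_for({}, registry)))
```

Because the token expires and can be revoked, verbose data collection stays limited to the user and time window that support actually needs.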
Developer-, tester-, first-level- & operations-oriented. Make functional use: resolve tickets more easily. Reference trace IDs as a common basis to discuss and find relevant data.
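If trace IDs are quoted in tickets, tooling can pick them up automatically. This Python sketch scans ticket text for trace IDs and turns them into links. It assumes trace IDs are written as 16 or 32 hexadecimal digits (64- or 128-bit IDs, as Jaeger uses); tracers that strip leading zeros or use another format need a different pattern. The Jaeger host is a hypothetical placeholder.

```python
import re
from typing import List

# Assumption: trace IDs appear as 16 or 32 hex digits; adjust for your tracer.
TRACE_ID_PATTERN = re.compile(r"\b(?:[0-9a-f]{32}|[0-9a-f]{16})\b", re.IGNORECASE)


def find_trace_ids(ticket_text: str) -> List[str]:
    """Returns trace IDs in order of first appearance, without duplicates."""
    found: List[str] = []
    for match in TRACE_ID_PATTERN.finditer(ticket_text):
        trace_id = match.group(0).lower()
        if trace_id not in found:
            found.append(trace_id)
    return found


def link_ticket(ticket_text: str,
                jaeger_base: str = "https://jaeger.example.com") -> List[str]:
    """Builds Jaeger UI links (/trace/<id>) for every ID in the ticket."""
    # The default host is a hypothetical placeholder.
    return [f"{jaeger_base}/trace/{tid}" for tid in find_trace_ids(ticket_text)]


if __name__ == "__main__":
    ticket = "Weather skill failed twice, see trace 00f067aa0ba902b7."
    print(link_ticket(ticket))
```

A ticket system hook or bot could post these links as a comment, so everyone working on the ticket starts from the same trace.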
Everybody-oriented. Lower the access hurdle: CLI & chatbot integration. Any project member has easy access: just open your chat. Anyone can learn by example: see how others use the service. Support in case of an error, from other people or via commands: • Trace: request a trace • Logs: request the application log
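The chatbot commands listed above can be pictured as a small command handler. The Python sketch below answers `trace <id>` and `logs <id>` with links; it is independent of any chat platform API, and the base URLs, the log query format and the accepted ID format (16 to 32 hex digits) are hypothetical assumptions.

```python
import re
from urllib.parse import urlencode

# Hypothetical base URLs of the team's tracing and log tools.
JAEGER_BASE = "https://jaeger.example.com"
LOG_SEARCH_BASE = "https://logs.example.com/search"

# Assumption: trace IDs are 16 to 32 hex digits.
HEX_ID = re.compile(r"[0-9a-f]{16,32}")

HELP_TEXT = "Usage: 'trace <trace-id>' or 'logs <trace-id>'"


def handle_chat_message(message: str) -> str:
    """Answers a chat command with a link to the requested runtime data."""
    parts = message.strip().split()
    if len(parts) != 2:
        return HELP_TEXT
    command, trace_id = parts[0].lower(), parts[1].lower()
    if not HEX_ID.fullmatch(trace_id):
        return f"'{parts[1]}' does not look like a trace ID. {HELP_TEXT}"
    if command == "trace":
        # Jaeger UI shows a single trace under /trace/<traceID>.
        return f"Request trace: {JAEGER_BASE}/trace/{trace_id}"
    if command == "logs":
        # Hypothetical log-search query format.
        query = urlencode({"query": f"trace_id={trace_id}"})
        return f"Application log: {LOG_SEARCH_BASE}?{query}"
    return HELP_TEXT


if __name__ == "__main__":
    print(handle_chat_message("trace 00f067aa0ba902b7"))
    print(handle_chat_message("logs 00f067aa0ba902b7"))
```

When the bot answers in a shared channel, every request and reply doubles as an example that other team members can copy, which supports the "learn by example" point.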
Everybody-oriented. Cultural changes we have observed. Visibility and increased trust: the toolchain acts as a safety net because it shows the runtime behavior. People can be sure they understand their services, e.g. in case of an error. Self-awareness: accept and understand that software has runtime behavior. Not all developers feel comfortable with dynamic analysis, but now they have the means to see and understand it. Clear communication: communication within and across teams is easier. Different people can easily share the same context, e.g. trace ID, log messages, request flow. Error culture: failures are more easily accepted. Because the software system is visible and cross-team communication is clear, people tend to accept failures and work together on solutions. Ownership: increased acceptance is the foundation for end-to-end responsibility. Due to visibility and increased trust, clear communication and a healthy error culture, people are more inclined to take ownership of their services.
T=Hero. Because we are a little bit colorful.
Start here: Standardize metrics, logs & traces. Select toolchain & tools. Link and combine them as far as possible. Integrate them into your team's everyday tools & processes.
Recommendations
More recommendations