- Martin Fowler's definition of a microservice: small; independently deployable; deployed via automation.
- Compared to a monolithic application, a microservice architecture sacrifices consistency and accepts increased deployment complexity in exchange for a more distributed architecture with strong separation, where each module can use its own style; in a word: gaining Partition by sacrificing Consistency.
- By exposing REST APIs, each microservice can choose its own technology stack instead of forcing every team onto the same one.
- In a monolithic application, separation is done at the module level, but everything shares one code repository, so module boundaries are blurred and depend on manually defined standards; those standards often can't be enforced, and over time the cross-boundary code becomes more and more unmanageable. In a microservice architecture, separation happens at the application level: the cost of crossing a boundary is significantly higher, so it happens less often and separation is preserved.
- In a distributed architecture, a single database can no longer guarantee transactional consistency across services; message middleware or some event-driven mechanism is needed to provide eventual consistency.
- As more applications are deployed as microservices, one release can contain multiple applications, each possibly with a different tech stack; this raises the bar for DevOps and automation and further blurs the boundary between development and operations.
- One strategy for the transition from monolith to microservices is to take baby steps:
- minimize intrusion into the existing applications;
- use a mainstream microservice architecture;
- focus on the services that truly need the microservice transition.
- For example, the company picked three modules for this strategy:
- Registry Center: a Consul cluster handles all service registration, and Consul Template refreshes the Nginx configuration to implement load balancing (a hedged consul-template sketch follows this list);
- Configuration Center: a self-developed system called "Matrix" refreshes configuration through customized plugins, minimizing intrusion into existing applications;
- Authorization Center: based on Spring Security OAuth, with SSO support (built on the WeChat Enterprise Account).
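- A minimal sketch of the Consul Template flow above (template path, service name "web" and reload command are assumptions; flag names vary slightly between consul-template versions):
- # /etc/nginx/upstream.ctmpl renders every healthy "web" instance into an Nginx upstream block:
- # upstream web {
- #   {{ range service "web" }}server {{ .Address }}:{{ .Port }};{{ end }}
- # }
- consul-template \
-   -consul-addr 127.0.0.1:8500 \
-   -template "/etc/nginx/upstream.ctmpl:/etc/nginx/conf.d/upstream.conf:nginx -s reload"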
- Based on Spring Boot: compared to a traditional web application deployed on Tomcat/JBoss/WebLogic with XML configuration, an application container and a .war package, a Spring Boot application can be run with "java -jar ...". On startup, Spring Boot brings up an embedded web container (such as Tomcat) and then loads all configuration, classes and beans of the application. Because the web container is wrapped inside the app, Spring Boot is well suited for microservice development.
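- A minimal sketch of the packaging/run model (artifact name is hypothetical):
- mvn -q package                                                  # builds a self-contained "fat" jar with the embedded web container
- java -jar target/demo-0.0.1-SNAPSHOT.jar --server.port=8080     # no external Tomcat/JBoss/WebLogic needed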
- Comparing Dubbo and Spring Cloud:
- Dubbo is an open-source project from Alibaba, started in 2011; it exposes services as Spring beans in an EJB-like style. It can work, but it is hard to upgrade and maintain: at the time of writing, the last release was still from 10/2014 and the supported Spring version was 2.5.6.SEC03, while Spring 5 is about to be released.
- Spring Cloud is the leading microservice framework in the Java community and is also built on Spring Boot; following its quick-start guide you can deploy a microservice demo in no time. However, it is better suited to rewriting applications or starting new projects than to migrating existing applications to a microservice architecture, because integration with legacy code is very hard.
- Registry Center:
- It is one of the most important components in a microservice architecture; its essence is to decouple service providers from service consumers.
- Any microservice should have more than one provider because of the distributed architecture; moreover, to support elastic scaling, the number of providers and their distribution are dynamic and cannot be decided in advance. Static load balancing is therefore no longer suitable; a separate component is needed to manage service registration and discovery, and that is the job of the Registry Center.
- Three example architectures for a Registry Center:
- Inside the application: e.g. Netflix Eureka integrates service registration and discovery inside the application, so an app can do the job itself; another example is using ZooKeeper or etcd to implement a registry mechanism, often seen in large enterprise environments.
- Outside the application: e.g. Airbnb SmartStack treats applications as black boxes and registers services with the registry through means outside the application.
- DNS based: e.g. SkyDNS, using DNS SRV records; it can also be considered part of the "outside the application" category. Due to DNS caching limitations it is not suitable as the registry for a complex microservice system.
- From a DevOps perspective, five points need to be considered:
- Liveness check: after a service registers, how is its liveness tested to ensure it is usable (a hedged Consul registration/health-check example follows this list);
- Load balancing: when multiple providers exist, how are calls balanced among them;
- Integration: how do the service provider side and the API caller side integrate with the registry;
- Dependency on the environment: what impact does introducing a registry have on the runtime environment;
- Availability: how is high availability of the registry itself ensured, in particular how is a single point of failure avoided.
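- A hedged example of registering a service with a health check through the local Consul agent HTTP API (service name, port and health path are assumptions):
- curl -X PUT http://127.0.0.1:8500/v1/agent/service/register -d '{
-   "Name": "orders",
-   "Port": 8080,
-   "Check": { "HTTP": "http://127.0.0.1:8080/health", "Interval": "10s" }
- }'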
- Some comparison of Eureka, SmartStack and Consul:
- Eureka case: there are clear boundaries between provider, consumer and registry, and it avoids a central bus to keep the distributed cluster highly available; the cons are that it intrudes into the application and only supports Java applications.
- SmartStack case: it is the most complex architecture of the three, relying on ZooKeeper, HAProxy, Nerve and Synapse, so it puts high demands on DevOps; since it sits outside the application, it can be adapted to any application.
- Consul is by nature an outside-the-application registry, yet the Consul SDK and a local agent can be integrated to simplify registration; when services run inside containers, Registrator can implement self-registration. Service discovery normally relies on the SDK, but Consul Template can be used to remove that dependency.
- Case study: Consul was picked as the Registry Center architecture, for two main reasons:
- Minimize intrusion into existing applications, one of the key principles of this project;
- Reduce complexity by using Registrator and Consul Template to implement service self-registration and discovery (see the hedged Registrator sketch below).
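- A hedged sketch of the Registrator approach (image tag and Consul address are assumptions); Registrator watches the local Docker daemon and registers/deregisters containers in Consul automatically:
- docker run -d --name registrator --net=host \
-   -v /var/run/docker.sock:/tmp/docker.sock \
-   gliderlabs/registrator:latest consul://127.0.0.1:8500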
- Configuration Center: everything from a PaaS platform down to a small caching setup depends on specific configuration to provide its service, and the same is true for microservices.
- Some ways to categorize configuration:
- By source: source code, files, databases, remote API calls, etc.
- By runtime environment: dev, test, pre-release, production, etc.
- By integration phase: compile time, packaging time, or run time:
- Compile time: configuration lives inside the source code, or the configuration files are committed to the repository together with the code;
- Packaging time: configuration is packaged into the application artifact;
- Run time: the concrete configuration is unknown until the app starts; it is fetched from a local or remote source and then the application starts.
- By loading method: loaded once versus dynamically reloaded.
- In the beginning, configuration and source code live together in the git repository; then, for security reasons, configuration data is separated out and kept on CI servers, merged into the application package by packaging tools, or placed directly in specific folders on the servers. As complexity grows these methods can no longer keep up, which is when a dedicated configuration center is needed.
- To pick an architecture, consider the following points:
- Security: configuration data should not live in source code, to keep production secure;
- Environments need to be separate (dev/test/qa/prod), so configuration data must be separated per environment;
- Within the same environment, configuration data must be consistent across all services;
- Configuration data must be remotely controllable in a distributed environment; in short, it must be manageable.
- Some examples:
- Spring Cloud Config: Spring-specific; an environment maps to a profile and an app version maps to a label. On the server side it uses git/filesystem/Vault to store and manage configuration; on the client side, Spring Config loads configuration at run time (a hedged curl example against its REST endpoints follows this block);
- Disconf: developed by Baidu engineers; the server side has a GUI to manage all runtime environments, apps and configuration files; the client side uses Spring AOP to automatically load and refresh configuration.
- In this case, because those clients require Spring, the self-developed "Matrix" is used instead; it also works for non-Spring services and has three special capabilities:
- It separates configuration files from configuration items: for files it uses plugins (SBT, Maven, Gradle) and packaging tools to minimize intrusion into the CI system; for items it provides an SDK that retrieves them from the server and adds a caching mechanism;
- It adds a dimension to an app, so the same app can be deployed against different server or app versions while its configuration data stays manageable;
- Configuration data is version controlled, like Git, and can be rolled back to a previous version.
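- Spring Cloud Config Server exposes configuration over REST endpoints of the form /{application}/{profile}[/{label}]; a hedged example (host, port and application name are assumptions):
- curl http://config-server:8888/myapp/dev            # "dev" profile of application "myapp"
- curl http://config-server:8888/myapp/dev/master     # same, pinned to the "master" label (branch)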
- Authentication Center: it covers authentication and authorization; three common architectures:
- Simple auth: not provided by the service itself but by the environment, e.g. IP whitelists or domain names; used in simple apps.
- Protocol auth: the service provides key pairs; to call the service, the client uses the key to generate an auth header (carrying the caller's ID), and the service authenticates based on that header; used in client/server style applications (such as Amazon S3).
- Central auth: with an Authentication Center, a client first requests an authorization token from the auth center and then submits the request together with the token; when the service receives the request, it asks the auth center to resolve the caller's identity from the token and then decides whether to authorize access. This separates auth from the service, survives application upgrades, and only requires the auth center to refresh its rules to adapt.
- About OAuth:
- A: Authorization Request: client -> resource owner
- B: Authorization Grant: resource owner -> client
- C: Authorization Grant: client -> Authorization Server
- D: Access Token: Authorization Server -> Client
- E: Access Token: Client -> Resource Server
- F: Protected Resources: Resource Server -> client
- In microservices, the service provider is the combination of resource owner and authorization server.
- OAuth 2.0 has Bearer Tokens: the client identity is stored inside the access token, so the resource server can validate the token itself, reducing the six steps A-F to four.
- Two main Bearer Token formats are SAML and JWT: SAML is XML-based, JWT is JSON-based; since JSON is widely used in microservices, JWT is more common.
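- A hedged sketch of the central-auth flow with Spring Security OAuth's default token endpoint (hosts, client id/secret and resource path are assumptions):
- # 1. client asks the auth center for a token (client-credentials grant)
- curl -u my-client:my-secret -d "grant_type=client_credentials&scope=read" http://auth-server/oauth/token
- # 2. client calls the protected resource with the returned bearer token
- curl -H "Authorization: Bearer $ACCESS_TOKEN" http://resource-server/api/orders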
- Some OAuth frameworks:
- CAS
- Apache Oltu
- Spring Security OAuth
- OAuth-Apis
- CAS and Spring Security OAuth are better in terms of integration cost and extensibility.
- In this case Spring Security OAuth is used, with private certificates, scope-based authorization and domain checks.
Saturday, October 14, 2017
Case study note for micro service
Friday, September 22, 2017
Study note on RunDeck
- An open-source tool to help automate repetitive tasks; it organizes commands into workflows and runs them against a pre-configured list of hosts, using SSH (key-based login) to execute the workflows remotely.
- See http://rundeck.org for documentation, the admin guide and videos.
- A Java-based application run from the RunDeck server or admin node, with SSH trust established from the RunDeck servers to the remote hosts. Projects and jobs can be created in the web GUI or the CLI; the nodes (hosts) live in resources.xml in each project; RunDeck stores job definitions and history in the configured database.
- Some basic terminology:
- Job: sequence of steps to run, a.k.a workflow
- Node: remote host
- Project: logical group of jobs/nodes to organize work better;
- Resources : collection of nodes
- Resource Tagging in resources.xml:
- best practice to assign tags to nodes, thus can group them dynamically;
- For example:
- <node name="dc1sittom01" description="DC1 SIT Environment Tomcat server01" tags="ALL,DC1,SIT,TOMCAT,TEST" hostname="dc1sittom01" osArch="amd64" osFamily="unix" osName="Linux"/>
- The tags above describe the different groups this host can belong to, so a job can target all hosts with particular tags (or combinations of tags).
- Filter example: tags: TOMCAT !hostname: dc1prodtom01* !tags: UAT
- Jobs/workflows can be reused to help automate routine work.
Thursday, September 21, 2017
Study notes for a K8S cloud deployment case study
- Architecture:
- Three K8s masters: Master1, Master2, Master3; each master runs:
- docker -> kubelet -> kube-apiserver <- kube-scheduler, kube-controller-manager
- each K8s node in the cluster runs:
- flannel <- docker -> kube-proxy, kubelet
- an etcd cluster with 3 nodes for the key/value store
- load balancer: docker -> haproxy + keepalived <- kubectl, clients, etc.
- How are the components above connected?
- 1. all apiservers rely on the etcd servers: --etcd_servers=<etcd-server-url>
- 2. controller-manager and scheduler rely on the apiserver: --master=<api-server-url>
- 3. kubelet on each node relies on the apiserver: --api-servers=<api-server-url>
- 4. kube-proxy on each node relies on the apiserver: --master=<api-server-url>
- 5. all apiservers sit behind the haproxy + keepalived load balancer pod
- As shown above, all K8s components run as containers, which makes them easy to deploy and upgrade: docker run --restart=always --name xxx_component xxx_component_image:xxx_version; --restart=always ensures self-healing.
- To avoid delays during a docker restart (caused by docker pull), pre-download the images before the upgrade.
- Ansible is used for configuration management, for example:
- ansible-playbook --limit=etcds -i hosts/cd-prod k8s.yaml --tags etcd; (just update the version info)
- An example inventory file:
- [etcds]
- 10.10.100.191
- 10.10.100.192
- 10.10.100.193
- [etcds:vars]
- etcd_version=3.0.3
- etcd3 is used; the etcd cluster provides redundancy, and the data storage needs to be redundant as well.
- K8s version 1.6.6; the three master servers run as static pods, with kubelet responsible for monitoring and auto-restart; everything runs in docker containers.
- haproxy and keepalived provide a VIP for kube-apiserver to ensure HA: haproxy does the load balancing, keepalived monitors haproxy and keeps it highly available; clients access the apiserver via <VIP>:<Port>.
- For kube-controller-manager and kube-scheduler HA, leader election matters; set --leader-elect=true in the start parameters.
- Ansible playbooks are used for kubectl commands such as drain, cordon, uncordon.
- An example from an Ansible playbook (the docker run command for etcd):
- docker run --restart=always -d \
- -v /var/etcd/data:/var/etcd/data \
- -v /etc/localtime:/etc/localtime \
- -p 4001:4001 -p 2380:2380 -p 2379:2379 \
- --name etcd 10.10.190.190:10500/root/etcd:{{ etcd_version }} \
- /usr/local/bin/etcd \
- -name {{ ansible_eth0.ipv4.address }} \
- -advertise-client-urls http://{{ ansible_eth0.ipv4.address }}:2379,http://{{ ansible_eth0.ipv4.address }}:4001 \
- -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
- -initial-advertise-peer-urls http://{{ ansible_eth0.ipv4.address }}:2380 \
- -listen-peer-urls http://0.0.0.0:2380 \
- -initial-cluster {% for item in groups['etcds'] %}{{ item }}=http://{{ item }}:2380{% if not loop.last %},{% endif %}{% endfor %} \
- -initial-cluster-state new
- The storage backend uses Ceph RBD, providing stateful services and backing the docker registry.
- When a pod dies or moves to another node, storage must be persistent to provide a stateful service.
- K8s has many Persistent Volume options: pd on GCE, ebs on AWS, GlusterFS, etc.
- The ceph-common, kubelet and kube-controller-manager containers all mount the following 3 volumes:
- --volume=/sbin/modprobe:/sbin/modprobe:ro \
- --volume=/lib/modules:/lib/modules:ro \
- --volume=/dev:/dev:ro
- rbd supports dynamic provisioning, single-writer and multi-reader but not multi-writer; GlusterFS supports multi-writer but is not used in this case yet.
- The docker registry uses Swift as its backend; to improve push/pull efficiency, Redis caches the metadata; everything runs as containers from the official docker images, for example:
- sudo docker run -d --restart=always -p 10500:10500 --volume=/etc/docker-distribution/registry/config.yml:/etc/docker-distribution/registry/config.yml --name registry registry:2.6.1 /etc/docker-distribution/registry/config.yml
- config.yml is based on https://github.com/docker/docker.github.io/blob/master/registry/deploying.md
- Harbor provides HA for the docker registry, running in another pod on the K8s cluster; image data and the harbor-db are all mounted through Ceph PVs, so if a Harbor node or pod goes down Harbor stays highly available, and Swift is no longer needed.
- PV and StorageClass are limited to single Namespace, thus can't support using namespace to provide dynamic provision in multi-tenant environment yet;
- The network backend uses Flannel; some clusters have switched to OVS.
- Flannel supports pod-to-pod communication across nodes, but it cannot separate multiple tenants and is not good at per-pod network rate limiting.
- So custom K8S-OVS components were built to implement these features. They use Open vSwitch to provide SDN for K8s and are based on the principles of OpenShift SDN; since OpenShift's SDN is tightly integrated with OpenShift and cannot be used as a separate plugin the way Flannel or Calico can with K8s, the K8S-OVS plugin was custom built: it offers functions similar to OpenShift SDN but serves K8s as a plugin.
- K8S-OVS supports single-tenant and multi-tenant setups and implements the following features:
- Single tenant: uses Open vSwitch + VxLAN to turn the pods on K8s into one big L2 network, enabling pod-to-pod communication.
- Multi-tenant: also uses Open vSwitch + VxLAN to form the L2 network for pods, and additionally uses namespaces to separate tenant networks: a pod in one namespace cannot access pods/services in another namespace.
- Multi-tenant: a namespace can be configured so that its pods can communicate with pods/services in any other namespace.
- Multi-tenant: namespaces joined as above can also be separated again.
- Multi-tenant and single tenant: both support flow control, so pods on the same node share network bandwidth and avoid the "noisy neighbour" problem.
- Multi-tenant and single tenant: both support load balancing for external traffic.
- "Join" means allowing pod/service communication between two tenant networks; "separation" reverses that operation and splits the joined networks back into two tenant networks; "public network" means a tenant network may communicate with all other tenant networks.
- Different tenant networks have different VNIs (from VxLAN); K8S-OVS stores the VNI mappings in etcd, for example:
- etcdctl ls /k8s.ovs.com/ovs/network/k8ssdn/netnamespaces (helloworld1, helloworld2)
- etcdctl get /k8s.ovs.com/ovs/network/k8ssdn/netnamespaces/helloworld1
- {"NetName":"helloworld1","NetID":300900,"Action":"","Namespace":""}
- etcdctl get /k8s.ovs.com/ovs/network/k8ssdn/netnamespaces/helloworld2
- {"NetName":"helloworld2","NetID":4000852,"Action":"","Namespace":""}
- etcdctl update /k8s.ovs.com/ovs/network/k8ssdn/netnamespaces/helloworld1 '{"NetName":"helloworld1","NetID":300900,"Action":"join","Namespace":"helloworld2"}'
- etcdctl get /k8s.ovs.com/ovs/network/k8ssdn/netnamespaces/helloworld1
- After the join action above, it shows the same VNI as helloworld2: 4000852.
- At the application layer, LVS is used for L4 load balancing and Nginx + Ingress Controller for L7 load balancing on Ingress; kube-proxy/services were abandoned in this case.
- CI/CD solution architecture:
- Opads (developed in PHP) is the front end and Pluto (provides a REST API and interfaces with the K8s apiserver) is the back end, supporting 400 different applications and about 10,000 containers.
- CI/CD pipeline is like following:
- developer login -> OPADS { code server (gitlab/gerrit) -> sonar, autotest, benchmark, Jenkins/CI Runner -- push --> Docker Registry } --pull --> PLUTO { deploy (call api) } --> K8S { sit-cluster, uat-cluster, prod-cluster }
- Application images (Tomcat, PHP, Java, NodeJS, etc.) carry version numbers; application code is either mounted into a container directory or loaded via a Dockerfile ONBUILD instruction.
- Mount: easy to use, no rebuild needed, fast, but with strong dependencies: if the app code changes while the base image stays unchanged, the build fails.
- ONBUILD: handles dependencies better and makes version rollback easy, but requires a rebuild every time and is slow; choose based on the use case.
- If no suitable base image exists, developers submit a JIRA ticket asking the DevOps team to build one.
- Select the code branch and version to deploy; for each environment (sit, uat, prod) you can see the number of pod replicas, uptime, names, creation time, node details on K8s, node selectors, etc., or use the Gotty web console to inspect container status.
- Elasticity/HPA, load balancing, blue/green deployment, Sonar for code quality, test modules, benchmark modules, etc. are all components of the CI/CD PaaS.
- Monitoring and Alert solutions used
- Status about K8S: such as: Docker, Etcd, Flannel/OVS etc;
- System performance, such as cpu, memory, disk, network, filesystem, processes etc;
- application status: rc/rs/deployment, Pod, Service etc;
- Custom shell scripts scheduled via crond were used at first to monitor these components.
- For containers: Heapster + InfluxDB + Grafana:
- Docker -> Heapster -- sink --> InfluxDB (also running in docker) --> Grafana -- alert
- Each K8s node: flannel <-- docker <-- cAdvisor <-- kubelet --> heapster
- Heapster -- get node list --> K8s master kube-apiserver (which sits in docker -> kubelet -> kube-apiserver <-- kube-scheduler, kube-controller-manager)
- On each node, kubelet calls the cAdvisor API to collect container information (resources, performance, etc.), so both node and container information are collected.
- The information can be tagged, then aggregated and sent to the InfluxDB sink; Grafana visualizes the data.
- Heapster needs --source to point at the master URL, --sink to point at InfluxDB, and --metric_resolution for the interval, e.g. 30s (see the hedged example below).
- Heapster has two kinds of backend storage: metricSink and influxdbSink.
- MetricSink keeps metrics data in local memory; it is created by default and may consume a large amount of memory; the Heapster API reads its data from here.
- InfluxDB is where the data is ultimately stored; newer versions of Heapster can point to multiple InfluxDB instances.
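- A hedged example of wiring Heapster to the cluster and InfluxDB (URLs reuse the placeholders above; exact flag spelling varies between Heapster versions):
- heapster \
-   --source=kubernetes:https://<VIP>:<Port> \
-   --sink=influxdb:http://monitoring-influxdb:8086 \
-   --metric_resolution=30s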
- Grafana can use regular expressions to check application status, CPU, memory, disk, network and filesystem status, and can sort and search.
- When a defined threshold is crossed, Grafana can send simple alerts by email; in this case monitoring points and warning/alert policies were created in Grafana and integrated with Zabbix, which emits the warnings/alerts.
- Container logs include:
- K8s component logs;
- system resource usage logs;
- container runtime logs.
- An earlier solution, since abandoned: use Flume to collect logs, with Flume running in a pod, configuring a source (the app log folder), a channel, and a sink to Hippo/Kafka.
- To check logs you had to log in to Hippo, which was cumbersome;
- each application needed a separate Flume pod, a waste of resources;
- Hippo is not containerized in this case and is shared with other non-container infrastructure, so it was slow during peak usage.
- Now switched to Fluentd + Kafka + ES + a customized GUI:
- Kafka cluster (topics) -- fluentd-es --> Elasticsearch cluster --> Log API engine --> Log GUI (history/current/download/search/other logs)
- Each K8s node: flannel --> docker --> kubelet, kube-proxy --> /var/log, /var/lib/docker/containers --> Fluentd -- fluentd-kafka --> Kafka cluster (topics)
Wednesday, September 20, 2017
Study notes for Splunk Administration Part I.
- Example deployment:
- 1 search head server, 16-core CPU, 16GB memory;
- 8 index servers, 8 core CPU, 24GB memory each;
- 1 deployment server: 8 core CPU, 32GB memory;
- 2 syslog servers: 2 core CPU, 12GB memory each;
- 1 job server: 8 core CPU, 16GB memory;
- File system layout example:
- /opt/splunk 10GB local filesystem, for splunk binaries;
- /usr/local/splunk 5TB for splunk data (hot/cold/warm), not shared;
- /usr/local/splunk/frozen 5TB NFS volume for archived (frozen) data, shared;
- Example Splunk package names:
- splunk
- splunkapp-pdf
- splunk-unix-forwarder
- splunkapp-unix-splunk
- splunkforwarder
- Splunk ports needed:
- 443: splunk indexers web;
- 8089:
- splunk search head - splunk indexer;
- Splunk deployment server;
- Splunk License server
- 9997: Splunk forwarder - splunk indexer
- Example Config/Settings:
- SPLUNK_DB=/splunk/splunkserver/ On Indexer servers;
- SPLUNK_HOME=/opt/splunk
- SPLUNK_SERVER_NAME=<splunk-server>
- SPLUNK_WEB_NAME=<splunk-web>
- MOGOC_DISABLE_SHM=1
- Splunk search head server config: /opt/splunk/etc/system/local/distsearch.conf
- [distributedSearch]
- servers = https://<IP_INDEXER1>:8089,https://<IP_INDEXER2>:8089,......
- Splunk agent install examples:
- Create the splunk user and group; the splunk user's home directory needs to be created;
- Create directory /opt/splunkforwarder with 0755 permission;
- /etc/init.d/splunk should be included in the RPM and get installed;
- rpm -ivh splunkforwarder-......;
- /opt/splunkforwarder/bin/splunk start --accept-license (accepts the license)
- Create two folders and set owner to splunk user:
- /opt/splunkforwarder/etc/apps/<my_deployment_client>
- /opt/splunkforwarder/etc/apps/<my_deployment_client>/local
- create empty deploymentclient.conf with proper splunk user permission
- Modify the above file to point at the deployment server, with the following contents:
- [target-broker:deploymentServer]
- targetUri = <splunkdeployserver>:8089
- phoneHomeIntervalInSecs = 600
- Create app.conf under the above .../local folder with the following contents:
- [install]
- state = enabled
- Configure the inputs.conf file (/opt/splunkforwarder/etc/system/local/inputs.conf) with the following content:
- [default]
- _meta=servername:XXX
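- A hedged follow-up once the forwarder is installed (indexer host and credentials are hypothetical; an outputs.conf pushed from the deployment server would normally replace the manual step):
- /opt/splunkforwarder/bin/splunk add forward-server <splunkindexer>:9997 -auth admin:changeme
- /opt/splunkforwarder/bin/splunk list forward-server
- /opt/splunkforwarder/bin/splunk restart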
- Splunk configuration directories: default, local, app; each has a different scope.
- default directory: $SPLUNK_HOME/etc/system/default, has pre-configured versions of the configuration files; Should never change manually, they will be overwritten during upgrades;
- local directory: $SPLUNK_HOME/etc/system/local, won't be overwritten during upgrades; Most changes are made here, they are site wide, shared by all apps;
- app directory: $SPLUNK_HOME/etc/apps
- such as time settings: $SPLUNK_HOME/etc/apps/search/local/
- users directory: $SPLUNK_HOME/etc/users: specific configurations for users;
- reference documents: $SPLUNK_HOME/etc/system/README:
- .spec: such as inputs.conf.spec, specifies syntax, list of available attributes and variables;
- .example: such as inputs.conf.example: has examples to reference;
Tuesday, September 19, 2017
Study notes on Kubernetes and Helm
- Kubernetes probes: Liveness and Readiness
- Liveness probe: container level; if the container hangs or crashes, kubelet kills it and, depending on the restart policy, may restart it locally or have it rescheduled onto another node (e.g. because of resource shortage or Kubernetes QoS policies).
- Readiness probe: service level; if the service is interrupted, the pod IP is removed from the endpoints, so the malfunctioning pod is taken off the load balancer while healthy/ready pods stay in service.
- Three types of handlers exist for both liveness and readiness probes:
- ExecAction: execute a command; exit code 0 is success, anything else is failure; better for non-TCP and non-HTTP services.
- TCPSocketAction: test connectivity to a port on the container IP; if it can connect, success. It only tests connectivity: if the service is broken but the port is still listening, it may not catch the error condition.
- HTTPGetAction: HTTP GET request; a status code >= 200 and < 400 is success; better for HTTP service health checks, but a health-check path must be provided.
- Three test results:
- Success
- Failure
- Unknown
- It is recommended to have both liveness and readiness probes; they can coexist in the same container, and the readiness probe timeout should be shorter than the liveness probe's. First the malfunctioning pod is taken off the load balancer; if it can't recover on its own, the liveness probe kicks in and the container is restarted according to the restart policy (restart in place, or a new pod on another node). A minimal sketch follows.
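- A minimal sketch of both probes on one container (image, paths and timings are assumptions, not from these notes):
- kubectl apply -f - <<'EOF'
- apiVersion: v1
- kind: Pod
- metadata:
-   name: probe-demo
- spec:
-   containers:
-   - name: web
-     image: nginx:1.13
-     readinessProbe:              # failure only removes the pod from service endpoints
-       httpGet: { path: /, port: 80 }
-       initialDelaySeconds: 5
-       periodSeconds: 5
-     livenessProbe:               # failure makes kubelet restart the container
-       httpGet: { path: /, port: 80 }
-       initialDelaySeconds: 15
-       periodSeconds: 10
- EOF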
- Challenges introduced by the Kubernetes framework:
- manage/update configuration files in k8s;
- deploy k8s apps with large amount of configuration files;
- share and re-use k8s configuration files /settings;
- parameterized templates supporting different running environment;
- manage app lifecycles: release, rollback, diff, history etc;
- service orchestration/control specific steps in a release cycle;
- verification after release/deployment;
- Helm was introduced to solve the challenges above; Helm is to Kubernetes what apt-get/yum is to Linux. Built by Deis (acquired by Microsoft), it packages Kubernetes resources (deployments, services, ingress, etc.) into a chart; charts in turn can be stored and shared in a chart repository. The official chart repository is maintained by Kubernetes Charts, but Helm also allows private chart repositories. It has two components, the Helm client (CLI) and the Tiller server, both implemented in Go and communicating over gRPC; the Tiller server uses the Kubernetes client library to talk to Kubernetes via REST + JSON.
- Helm has three important concepts:
- chart: include all information needed to deploy a service instance on Kubernetes;
- config: include all configuration/settings in an application release;
- release: A running instance of a chart and its associated config/settings;
- Helm can do the following:
- create a new chart;
- package a chart into tgz format;
- upload/download charts to/from a chart repository;
- deploy and delete charts in a Kubernetes cluster;
- manage the release lifecycle of charts installed by Helm.
- Helm Client CLI can do following:
- Develop chart locally;
- Manage chart repository;
- Communicate/Interact with Tiller server;
- Release chart;
- Query release status;
- Upgrade or delete release
- Tiller Server is a server deployed on the Kubernetes cluster; it has no database backend and instead uses Kubernetes ConfigMaps to store its information. It interacts with the Helm client and the Kubernetes API server and is responsible for:
- listening for requests from the Helm client;
- constructing a release from a chart and its configuration/settings;
- installing charts onto the Kubernetes cluster and tracking the subsequent releases;
- interacting with Kubernetes to upgrade or delete charts.
- Helm Client manages charts, Tiller Server manages releases
- A chart repository stores an index.yaml and packaged chart files on an HTTP server. Any HTTP server that can serve YAML and tar files can become a chart repository, e.g. GCS (Google Cloud Storage), Amazon S3, GitHub Pages, or your own web server.
- An example Helm release process:
- helm fetch stable/mysql --version 0.2.6 --untar (retrieve mysql 0.2.6 package and untar)
- helm lint mysql ( checking failures in chart)
- helm create mychart (create mychart directory, including mychart/{charts, Chart.yaml, templates/{deployment.yaml,_helpers.tpl,ingress.yaml,NOTES.txt,service.yaml}, values.yaml}) , 2 directories, 7 files
- Chart.yaml: metadata of chart, includes: name, description, release version etc;
- values.yaml: variables stored in template files;
- templates: all template files;
- charts/: storage path for chart
- NOTES.txt: information for post-deployment uses, such as how to use chart, default values etc;
- Install chart has following methods:
- helm install stable/mariadb (specify chart name)
- helm install ./nginx-2.1.0.tgz (specify package file)
- helm install ./nginx (specify directory contains packages)
- helm install https://example.com/charts/nginx-2.1.0.tgz (specify chart URL)
- Setting values in the configuration can be done in the following ways:
- helm install -f myvalues.yaml ./redis (Specify a config file)
- helm install --set name=prod ./redis (Specify key=value format)
- helm install -n mysql -f mysql/values.yaml --set resources.requests.memory=512Mi mysql (deploy a release)
- helm status mysql (check release status)
- helm list -a (check release, tag "-a" for all releases, deployed/failed-deployment/deleting/deleted-releases )
- helm upgrade mysql -f mysql/values.yaml --set resources.requests.memory=1024Mi mysql (upgrade a deployed release)
- helm get --revision 1 mysql ( you would see memory was set to 512Mi)
- helm get --revision 2 mysql ( you would see memory was set to 1024Mi)
- helm rollback --debug mysql 1 ( it would rollback and memory would be set to 512Mi)
- helm delete mysql ( even after deleted, the release history would still be preserved and can be recovered )
- helm rollback --debug mysql 2 ( a deleted release got recovered, because information are stored in Kubernetes ConfigMap, when delete/upgrade Tiller, the application config/setting/data are preserved )
- helm list -a (see all releases)
- helm history mysql (alias: helm hist; check the revision history of a release)
- helm get --revision <n> mysql (get configuration information for a specific revision; even after the release is deleted, the information is not lost and it can still be rolled back)
- Helm's configuration management: Helm charts use the Go template language; variable values live in values.yaml and can be replaced with --values YAML_FILE_PATH or --set key1=value1,key2=value2 during chart deployment; {{...}} marks template variables, which can also be set in the following ways (a small sketch follows the SubChart notes below):
- Pipelines: like Linux/Unix pipelines, e.g. drink: {{ .Values.favorite.drink | repeat 10 | quote }}; food: {{ .Values.favorite.food | upper | quote }}
- Named templates: defined in _helpers.tpl in the form {{/* ... */}}, then used via include, e.g. {{- include "my_labels" }} to insert into ConfigMap metadata, or {{ include $mytemplate }};
- SubChart: a chart that another chart depends on is a subchart;
- a subchart does not depend on its parent chart; it is independent and has its own values and templates;
- a subchart cannot access the parent chart's values, but the parent chart can override the subchart's values;
- both subchart and parent chart can access Helm's global values.
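- A small hedged sketch tying values.yaml and a template together (chart layout as created by helm create; the favorite.* keys mirror the pipeline example above and are hypothetical):
- # mychart/values.yaml:            favorite: { drink: coffee, food: pizza }
- # mychart/templates/configmap.yaml:
- #   apiVersion: v1
- #   kind: ConfigMap
- #   metadata:
- #     name: {{ .Release.Name }}-config
- #   data:
- #     drink: {{ .Values.favorite.drink | repeat 3 | quote }}
- #     food: {{ .Values.favorite.food | upper | quote }}
- helm install --dry-run --debug ./mychart --set favorite.drink=tea   # renders the templates without deploying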
- Helm hooks: helm install, helm upgrade, helm rollback and helm delete can all have hooks, of the following types:
- pre-install
- post-install
- pre-delete:
- post-delete
- pre-upgrade
- post-upgrade
- pre-rollback
- post-rollback
- Helm hooks have weights, which can be negative or positive; hooks are loaded according to weight but ordering is not guaranteed; if weights are equal they run in a-z alphabetical order, which may change in future releases. Helm has two policies for deleting hook resources:
- hook-succeeded
- hook-failed
- Controlling service start order, two strategies:
- Init Container: pod/rc/deployment YAML files only describe container start order, not service start order, but a Kubernetes Init Container can do this; command: ['sh', '-c', "curl --connect-timeout 3 --max-time 5 --retry 10 --retry-delay 5 --retry-max-time 60 serviceB:portB/pathB/"]
- Helm hook: a pre-install hook; put the command above into a pre-install-job container as a pre-install hook. Since Helm hooks are blocking operations, it blocks before helm install and thereby controls the service start order (an Init Container sketch follows).
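- A hedged sketch of the Init Container approach above (image is an assumption; serviceB:portB/pathB are the placeholders from the note; busybox ships wget rather than curl):
- kubectl apply -f - <<'EOF'
- apiVersion: v1
- kind: Pod
- metadata:
-   name: service-a
- spec:
-   initContainers:                # blocks the app container until serviceB responds
-   - name: wait-for-service-b
-     image: busybox:1.27
-     command: ['sh', '-c', 'until wget -q -O /dev/null http://serviceB:portB/pathB/; do sleep 5; done']
-   containers:
-   - name: app
-     image: my-app:latest         # hypothetical application image
- EOF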
- QoS in Kubernetes, two categories:
- requests: 0~node-allocatable; 0 <= requests <=node-allocatable
- limits: requests~unlimited; requests <= limits <= Infinity
- If CPU usage > limits, pod won't be killed but will be limited; if no limits set, pod could use all cpu
- If MEM usage > limits, pod container processes will be killed by OOM, if killed by OOM, it will try to restart container or create new pod on local node or other node;
- QoS class:
- Guaranteed: every container in the pod has limits set, and any requests are equal to the limits; such a pod gets the Guaranteed QoS class (by default, requests default to limits when only limits are defined);
- Burstable: if at least one container in the pod has requests and limits that differ (or only partially set), the pod gets the Burstable QoS class;
- Best-Effort: if no container in the pod has requests or limits set, the pod gets the Best-Effort QoS class;
- CPU is a compressible resource: if a pod exceeds its CPU limit, its processes are throttled but not killed; memory is not compressible: if a node runs short on memory, processes are killed by the kernel OOM killer.
- Best-Effort pods -> Burstable pods -> Guaranteed pods (highest QoS class);
- Best-Effort pods: if MEM is used up, these pods will be killed first;
- Burstable pods: if MEM is used up, and there is no other Best-Effort containers, these pods will be killed;
- Guaranteed pods: if MEM is used up, and there is no other Burstable and/or Best-Effort container, these pods will be killed;
- If a pod processes exceed the limits but node still has resources, system will try restart container on that node or create new pod on local node or other nodes;
- With enough resources, use Guaranteed QoS pods for stability;
- Use different QoS classes for different services to get better resource utilization while maintaining service quality (an example resources stanza follows).
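- A hedged example of the resources stanza that determines the QoS class (image and values are arbitrary):
- kubectl apply -f - <<'EOF'
- apiVersion: v1
- kind: Pod
- metadata:
-   name: qos-demo
- spec:
-   containers:
-   - name: app
-     image: nginx:1.13
-     resources:
-       requests: { cpu: 500m, memory: 256Mi }   # equal to limits for every container -> Guaranteed
-       limits:   { cpu: 500m, memory: 256Mi }   # raise limits above requests (or omit some) -> Burstable
- EOF
- kubectl describe pod qos-demo | grep "QoS Class"   # shows the class Kubernetes assigned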
- The Tiller service runs as a pod with a cluster IP; the Helm client connects through kubectl proxy.
- Kubernetes service discovery has 2 methods:
- Environment variables: not visible to pods that were started before the service defining them was created;
- Kubernetes DNS: recommended.
- HAProxy/Nginx can be used to replace kube-proxy for better performance.
Monday, September 18, 2017
Study notes for MongoDB basic
- BSON type: https://docs.mongodb.com/manual/reference/bson-types/
- ObjectId: unique identifier for document, similar to the auto increment _id in traditional RDBMS
- sudo apt-get install -y mongodb-org; service mongod status/start/stop; mongod --config /path/to/mongod.conf
- dbpath/port/path/verbosity: sudo mongod --dbpath /my/db --port 47017
- MongoDB Shell: mongo; export PATH=<mongo dir>/bin
- mongo --host db.mydomain.com --port 47017; mongo -u my_db_user -p my_db_passwd --host db.mydomain.com --port 47017
- Some shell commands:
- > db
- > show databases
- > use [db_name]
- > db.<collection>.<operation>(<query>, <command-options>)
- Example DB:
{ _id: ObjectId(XXXXXXXXXXXXXXXX), title: "My test book 1", author: "K F Tester", isbn: ###############, reviews: 8, category: [ "non-fiction", "entertainment" ] }
- Example statements
- Traditional SQL: SELECT <fields> FROM <table> WHERE <condition>
- Following are MongoDB CRUD statements (Create, Read, Update, Delete):
- db.books.find({author: "K F Tester" })
- db.books.find({author: "K F Tester", reviews: 8 }) (Implicit AND)
- db.books.find({ $or: [ {author: "K F Tester"}, {reviews: 8} ]}) (OR)
- db.books.find({author: "K F Tester", $or: [ {reviews: 8}, {reviews: 10} ] }) (AND + OR )
- db.books.find({ reviews: { $eq: 8 } }) (equals, similar operators: $gt, $gte, $lt, $lte, $ne, $in, $nin )
- db.books.update({ title: "My test book 1" }, { $set: { title: "My test book 1.1" } })
- db.books.update({ author: "K F Tester" }, { $set: { title: "My test book 1.2", author: "K F Tester2" } }) (Note: only updates the first match)
- db.books.update({ author: "K F Tester" }, { $set: { title: "My test book 1.3", author: "K F Tester3" } }, { multi: true }) (Note: updates all matches)
- db.books.updateOne({ title: "My test book 1.1" }, { $set: { title: "My test book 1.4" } })
- db.books.updateMany({ title: "My test book 1.1" }, { $set: { title: "My test book 1.4" } })
- db.books.insert([{ title: "My test book 2.0", author: "K F Tester3", ....}, { title: "My test book 3.0", author: "K F Tester4", ...}]) (Note: multiple documents are passed as an array)
- db.books.remove({ author: " K F Tester3" }) (Note: it will delete all matches)
- db.books.remove({ author: " K F Tester3" }, { justOne: true }) (Note: only delete first match)
- db.books.deleteOne({ author: " K F Tester3" }) (Note: recommended to avoid confusion)
- db.books.deleteMany({ author: " K F Tester3" })
- MongoDB admin clients: Robomongo https://robomongo.org can do all above operations in GUI
Sunday, September 17, 2017
Study notes for Big Data - Part II
- Cloud Computing: Self-Service; Elasticity; Pay-as-use; Convenient Access; Resource Pooling
- 3 Vendor's (Amazon, Azure, Google) offerings
- Storage: Amazon S3; Azure Storage; Google Cloud Storage
- MapReduce: Amazon EMR (Elastic MapReduce); Hadoop on Azure/HDInsight; Google Cloud Dataflow / Dataproc (Spark/Hadoop services) / Datalab (interactive tool to explore/analyze/visualize data)
- RDBMS: Amazon RDS (Aurora/Oracle/SQL Server/PostgreSQL/MySQL/MariaDB); Azure SQL Database as a service; Google Cloud SQL (MySQL)
- NoSQL: Amazon DynamoDB or EMR with Apache HBase; Azure DocumentDB/Table Storage; Google Cloud Bigtable/Cloud Datastore
- DataWarehouse: Amazon Redshift; Azure SQL Data warehouse as a service; Google BigQuery
- Stream processing: Amazon Kinesis; Azure Stream analytics; Google Cloud Dataflow
- Machine Learning: Amazon ML; AzureML/ Azure Cognitive Services; Google Cloud Machine Learning Services/Vision API/Speech API/Natural Language API/Translate API
- Hadoop Basics: Google's MapReduce; Google File System; Yahoo Engineer: Doug Cutting; Written in Java in 2005, Apache Project;
- Hadoop Common (Libraries/Utilities); HDFS (Hadoop Distributed File System); Hadoop MapReduce (programming model for large-scale data proc); Hadoop YARN
- HDFS: Master/Slave model: Namenode is Master, controls all metadata, multiple Data nodes are Slaves, manage storage attached to nodes where they run on; Files split in blocks; blocks stored on datanodes; blocks replicated; block size/replication factors configurable per file; Namenode makes decisions on replication and receives heartbeat and BlockReport from data nodes; Hadoop is a data service on top of regular file systems, fault tolerant; resilient; clustered; Tuned for large files: GB or TB per file and tens of millions of files in a cluster; Write once, read many; Optimized for throughput rather than latency; long-running batch processing; move compute near data;
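- A few basic HDFS shell commands matching the model above (paths are hypothetical):
- hdfs dfs -mkdir -p /data/logs
- hdfs dfs -put access.log /data/logs/        # the file is split into blocks and replicated across datanodes
- hdfs dfs -ls /data/logs
- hdfs dfs -cat /data/logs/access.log | head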
- YARN: Yet Another Resource Negotiator (Cluster/Resource manager for resource allocation, scheduling, distributing and parallelizing tasks); Global ResourceManager (Scheduler and ApplicationManager) and per-machine NodeManager; Hadoop job (Input/Output location; MapReduce func() interfaces/classes; job params; written in Java, Hadoop streaming/Hadoop Pipes) client submits job (jar/exe) and config to ResourceManager in YARN which distributes to workers (scheduling/monitoring/status/diag)
- MapReduce is a software framework for large-scale data processing: parallel processing on clusters of commodity hardware, reliable and fault tolerant. Two components: an API for MapReduce workflows in Java, and a set of services managing workflows/scheduling/distribution/parallelization. Four phases: Split, Map, Shuffle/Sort/Group, Reduce, working exclusively on <key,value> pairs: the input is split, map() produces a <key,value> pair per record, the mapper output is shuffled/sorted/grouped and passed to the reducer, and the reducer is applied to <key,value> pairs with the same key, aggregating values per key. Jobs specify input/output locations, MapReduce function interfaces/classes and job parameters; they are written in Java (or via Hadoop Streaming/Hadoop Pipes). It can't be used for problems like Fibonacci where a value depends on the previous output. Compiled language; lower level of abstraction; more complex; high code performance; good for complex business logic; for structured and unstructured data. (A Hadoop Streaming sketch follows.)
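- The canonical Hadoop Streaming example, using plain shell utilities as mapper and reducer (jar path and HDFS paths are assumptions and vary by distribution):
- hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
-   -input /data/logs \
-   -output /data/linecount-out \
-   -mapper /bin/cat \
-   -reducer /usr/bin/wc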
- Hadoop ecosystem: data management: HDFS/HBase/YARN; data access: MapReduce / Hive (data warehouse, SQL-like HiveQL queries) / Pig (scripting alternative to the Java job client); data ingestion/integration: Flume/Sqoop/Kafka/Storm; data operations/monitoring: Ambari/ZooKeeper/Oozie; data governance/security: Falcon/Ranger/Knox; ML: Mahout; R connectors (statistics).
- Hive: a data warehouse solution for tasks like ETL, summarization, query and analysis; a SQL-like interface to datasets in HDFS or HBase; the Hive engine compiles HiveQL queries into MapReduce jobs; tables/DBs are created first, data is loaded later; comes with a CLI; two components: HCatalog (table/storage management layer) and WebHCat (REST interface). SQL-like query language; higher level of abstraction; less complex; code performance lower than raw MapReduce; good for ad hoc analysis; for both structured and unstructured data; built by Facebook and donated to the Apache Foundation.
- Pig provides a scripting alternative; the Pig engine converts scripts into MapReduce programs. Two components: Pig Latin (the language) and the runtime environment (parser, optimizer, compiler and execution engine), which converts Pig scripts into MapReduce jobs run on the Hadoop MapReduce engine. Pig Latin consists of data types, statements, and general and relational operators such as join, group, filter, etc.; it can be extended with UDFs (User Defined Functions) in Java/Python/Ruby, called from Pig Latin. Two execution modes: local mode (runs in a single JVM using the local FS) and MapReduce mode (translated into MapReduce jobs and run on a Hadoop cluster). Hive is for querying data; Pig is for preparing data to make it suitable for querying. Higher level of abstraction; less complex; code performance lower than raw MapReduce; easy for writing joins; for both structured and unstructured data; built by Yahoo and donated to the Apache Foundation.
- Data ingestion and integration tools:
- Flume: moves large streaming log data (e.g. web servers, social media); channel-based transactions (sender and receiver); rate control; typical sources/destinations: tail, syslog/log4j, social media, HDFS/HBase; architecture: Flume data source -> Flume data channel -> Flume data sinks -> HDFS; the push model does not scale well; tightly integrated with Hadoop; suitable for streaming data; has its own query-processing engine that helps move data to the sinks.
- Sqoop: connectors (MySQL, PostgreSQL with DB APIs for efficient bulk transfer, Oracle, SQL Server and DB2), plus JDBC connectors; supports both import and export, text files or Avro/Sequence files; used when data sits in an RDBMS, data warehouse or NoSQL store.
- Kafka: a distributed streaming platform used to build real-time data pipelines and streaming apps; a publish-subscribe messaging system for real-time event processing; stores streams of records in a fault-tolerant way and processes streams as they occur; runs as a cluster; stores streams of records in categories called topics; each record has a {key, value, timestamp}; four core APIs: Producers, Consumers, Stream Processors and Connectors; consumers pull data from topics; not tightly integrated with Hadoop, so you may need to write your own producers/consumers; Kafka + Flume works well, e.g. a Kafka source writing to a Flume agent (a console-tool sketch follows the Storm notes below).
- Storm: Real-time message computation system - not a queue; Three abstractions:
- Spout: a source of streams in a computation (such as a Kafka broker, the Twitter streaming API, etc.);
- Bolt: process input streams and produces output streams, functions include: filters, joins, aggregation, to DB etc;
- Topology: network of spouts and bolts representing multi-stage stream computation, bolt subscribing to the output stream of other spout or bolt; topology run indefinitely once deployed;
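- A hedged sketch of the basic Kafka console flow mentioned above (broker/ZooKeeper addresses and topic name are assumptions; the scripts ship with the Kafka distribution):
- kafka-topics.sh --create --zookeeper zk1:2181 --replication-factor 1 --partitions 3 --topic app-logs
- echo '{"event":"login","user":"u1"}' | kafka-console-producer.sh --broker-list kafka1:9092 --topic app-logs
- kafka-console-consumer.sh --bootstrap-server kafka1:9092 --topic app-logs --from-beginning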
- Data operations tools:
- Ambari: installation lifecycle management, admin, monitor systems for Hadoop; Web GUI with Ambari REST APIs; Provision (step-by-step wizard), Configuration mgmt; Central mgmt for services start/stop/reconf; Dashboard for status monitoring, metrics collection and alert framework
- Oozie: a Java web app used to schedule Hadoop jobs as a workflow engine; it combines multiple jobs into a logical unit of work and triggers workflow actions; tightly integrated with Hadoop jobs such as MapReduce, Hive, Pig, Sqoop, and system Java/shell jobs; action nodes + control-flow nodes are arranged in a DAG (Directed Acyclic Graph) written in hPDL (an XML process definition language); control-flow nodes define the workflow and control the execution path (decision, fork, join nodes, etc.).
- ZooKeeper: coordinates distributed processes; helps store and mediate updates to important configuration info; it provides: naming service; configuration management; process synchronization; leader election; reliable messaging (publish/subscribe queue with guaranteed message delivery).
- NoSQL (Not Only SQL) basics: schema-less; no strict data structure; agile; scales horizontally (sharding/replication); integrates tightly with cloud computing; better performance for unstructured data; CAP theorem: Consistency, Availability, Partition tolerance (only 2 of the 3 can be fully satisfied); eventual consistency is better for read-heavy apps, versus strong consistency (tunable in e.g. Amazon DynamoDB); better suited for storing and processing Big Data.
- Four categories of NoSQL: (Choosing type requires analysis of app requirements and NoSQL properties)
- Key-value: unique key, value in any format; useful for session info, user profiles, preferences, shopping-cart data; don't use it when the data has relationships or you need to operate on multiple keys at once; examples exist both disk-backed (Redis) and purely in memory (Memcached), chosen per app needs; others: Amazon DynamoDB, Riak and Couchbase.
- Document: can be XML, JSON, EDI, SWIFT, etc.; documents sit in the value part; self-describing; hierarchical tree structures, collections and scalar values; indexed with B-trees and queried with a JavaScript query engine; examples: MongoDB and CouchDB.
- Column family: stores data in column families keyed by a row key; a column family is a container for an ordered collection of rows; rows can have different columns, so columns are easy to add; super columns exist (a container of columns: a map of columns with name and value); examples: HBase, Cassandra, Hypertable.
- Graph: nodes/edges/properties based on graph theory; stores entities and relationships (nodes and properties); relationships are edges with properties and have directional significance; joins are very fast since they are persisted rather than calculated; requires up-front design; examples: Neo4j, Infinite Graph, OrientDB.
- Apache Spark (a top-level Apache project): a cluster computing framework for data processing (data science pipelines) and analysis, parallel on commodity hardware, fast and easy (the API is optimized for fast job processing); a comprehensive unified framework for Big Data analytics; rich APIs for SQL, ML, graph processing, ETL, streaming, interactive and batch processing; in-depth analysis with SQL, statistics and machine learning (ML); wide use cases (intrusion detection, product recommendation, risk estimation, genomic analysis, etc.). Very fast thanks to in-memory caching and a DAG-based processing engine (100x faster than MapReduce in memory and 10x faster on disk); suitable for iterative machine-learning algorithms, fast real-time responses to user queries (in memory), and low-latency analysis of live data streams. General-purpose programming model: Java, Scala, Python; libraries/APIs for batch, streaming, ML, queries, etc.; interactive shells for Python and Scala; written in Scala on top of the JVM (performance and reliability), so tools from the Java stack can be used; strong integration with the Hadoop ecosystem; supports HDFS, Cassandra, S3 and HBase; it is not a data storage system; traditional BI tools can connect via JDBC/ODBC; the DataFrame API is pluggable through Spark SQL; built by UC Berkeley AMPLab, motivated by MapReduce and applying machine learning in a scalable fashion.
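- A hedged example of submitting the bundled SparkPi job locally (jar path varies by Spark version/distribution):
- spark-submit --class org.apache.spark.examples.SparkPi \
-   --master 'local[4]' \
-   $SPARK_HOME/examples/jars/spark-examples_*.jar 100
- # interactive exploration: spark-shell (Scala) or pyspark (Python)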