Friday, September 22, 2017

Study note on RunDeck


  • Open-source tool to help automate repetitive tasks;  helps organize commands into workflows and run them against a pre-configured list of hosts,  using ssh (key-based login) to remotely execute these workflows;
  • http://rundeck.org  for documentation, admin guide and videos;
  • Java-based application, run from a Rundeck server or admin node,  with ssh trust established from the Rundeck server to the remote hosts;  can create projects/jobs in the Web GUI or CLI;  the nodes (hosts) are listed in resources.xml in each project;  RunDeck stores job definitions and history in a configured database;
  • Some basic terminology:
    • Job:  sequence of steps to run, a.k.a workflow
    • Node: remote host
    • Project: logical group of jobs/nodes to organize work better;
    • Resources : collection of nodes
  • Resource Tagging in resources.xml:
    • best practice to assign tags to nodes, thus can group them dynamically;
    • For example:
      • <node name="dc1sittom01" description="DC1 SIT Environment TomCat server01" tags="ALL,DC1,SIT,TOMCAT,TEST" hostname="dc1sittom01" osArch="amd64" osFamily="unix" osName="Linux"/>
      • The TAGs above describe the different groups this host can belong to, so a job can target all hosts carrying a particular TAG (or combination of TAGs)
      • Filter:  for example:  tags: TOMCAT !hostname: dc1prodtom01 !tags: UAT
  • Jobs/workflows can be reused repeatedly to help automate routine work
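  • A minimal sketch of what such a job could look like in Rundeck's YAML job format (job name, filter and command are made up for illustration, reusing the tag filter style above):
    • - name: restart-tomcat-sit
    •   description: Restart Tomcat on the SIT Tomcat nodes
    •   loglevel: INFO
    •   nodefilters:
    •     filter: 'tags: TOMCAT !hostname: dc1prodtom01'
    •   sequence:
    •     keepgoing: false
    •     strategy: node-first
    •     commands:
    •       - exec: sudo systemctl restart tomcat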

Thursday, September 21, 2017

Study notes for a K8S cloud deployment case study


  • Architecture:
    • Three K8s Masters:  Master1, Master2, Master3,  each Master runs:
      • docker -> kubelet -> kube-apiserver <- kube-scheduler, kube-controller-manager
    • each K8s Node in the cluster has:
      • flannel <- docker -> kube-proxy, kubelet
    • etcd cluster with 3 nodes for key/value store
    • load balancer:   docker -> haproxy + keepalived  <- kubectl, clients, etc.
    • How are the above components connected together?
      • 1. all apiserver rely on etcd servers,  --etcd_servers=<etcd-server-url>
      • 2. controller-manager, scheduler all rely on apiserver,  --master=<api-server-url>
      • 3. kubelet on each node relies on apiserver, --api-servers=<api-server-url>
      • 4. kube-proxy on each node relies on apiserver,  --master=<api-server-url>
      • 5. all apiservers are behind the haproxy + keepalived load balancer pod
  • As shown above, all K8s components run as containers, easy to deploy and upgrade;  use:  docker run --restart=always --name xxx_component xxx_component_image:xxx_version;    --restart=always ensures self-healing;
  • to avoid delays during docker restart (caused by docker pull), pre-pull the images before the upgrade
  • Ansible is used for configuration management; for example:
    • ansible-playbook --limit=etcds -i hosts/cd-prod k8s.yaml --tags etcd;  (just update the version info)
    • An example template file:
      • [etcds]
      • 10.10.100.191
      • 10.10.100.192
      • 10.10.100.193
      • [etcds:vars]
      • etcd_version=3.0.3
  • etcd3 is used;  the etcd cluster ensures redundancy;  the data storage needs to be redundant as well;
  • k8s version: 1.6.6; the three master servers run the control-plane components as Static Pods,  kubelet is responsible for monitoring and auto-restarting them; all run in docker containers;
  • haproxy and keepalived provide a VIP for kube-apiserver to ensure HA;  haproxy does the load balancing, keepalived is responsible for monitoring haproxy and ensuring its HA;  clients access the apiserver using <VIP>:<Port>
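    • A minimal haproxy.cfg sketch for this kind of setup (bind port, server names and IPs are assumptions, not from the case study; keepalived then floats the VIP between the haproxy instances):
      • frontend kube-apiserver
      •     bind *:8443
      •     mode tcp
      •     default_backend apiservers
      • backend apiservers
      •     mode tcp
      •     balance roundrobin
      •     server master1 10.10.100.201:6443 check
      •     server master2 10.10.100.202:6443 check
      •     server master3 10.10.100.203:6443 check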
  • for kube-controller-manager and kube-scheduler HA, leader election is important; set --leader-elect=true in the start parameters
  • Ansible playbooks are also used to run kubectl commands, such as drain, cordon, uncordon;
  • An example command from the Ansible playbook (the docker run for etcd; a play-level sketch follows after this command):
    • docker run --restart=always -d \
    • -v /var/etcd/data:/var/etcd/data \
    • -v /etc/localtime:/etc/localtime \
    • -p 4001:4001 -p 2380:2380 -p 2379:2379 \
    • --name etcd 10.10.190.190:10500/root/etcd:{{ etcd_version }} \
    • /usr/local/bin/etcd \
    • -name {{ ansible_eth0.ipv4.address }} \
    • -advertise-client-urls http://{{ ansible_eth0.ipv4.address }}:2379,http://{{ ansible_eth0.ipv4.address }}:4001 \
    • -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
    • -initial-advertise-peer-urls http://{{ ansible_eth0.ipv4.address }}:2380 \
    • -listen-peer-urls http://0.0.0.0:2380 \
    • -initial-cluster {% for item in groups['etcds'] %}{{ item }}=http://{{ item }}:2380{% if not loop.last %},{% endif %}{% endfor %} \
    • -initial-cluster-state new
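  • A minimal sketch of the play that could wrap the command above so that --limit=etcds and --tags etcd select it; the task layout and the docker_container module are assumptions (the case study may simply shell out to docker run), only the etcds group, the etcd tag and {{ etcd_version }} come from these notes:
    • - hosts: etcds
    •   tasks:
    •     - name: run etcd at the version set in the inventory
    •       tags: etcd
    •       docker_container:
    •         name: etcd
    •         image: "10.10.190.190:10500/root/etcd:{{ etcd_version }}"
    •         state: started
    •         restart_policy: always
    •         volumes:
    •           - /var/etcd/data:/var/etcd/data
    •           - /etc/localtime:/etc/localtime
    •         ports:
    •           - "4001:4001"
    •           - "2380:2380"
    •           - "2379:2379"
    •         # pass the same /usr/local/bin/etcd argument list as in the example above via the command option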
  • The storage backend uses Ceph RBD, to provide stateful services and back the docker-registry
    • When a pod goes down or moves to another node, storage needs to be persistent to provide a stateful service;
    • K8s has many options to provide Persistent Volumes:  pd in GCE, ebs in AWS, GlusterFS etc.
    • ceph-common, kubelet, kube-controller-manager containers all have following 3 volumes:
      • --volume=/sbin/modprobe:/sbin/modprobe:ro \
      • --volume=/lib/modules:/lib/modules:ro \
      • --volume=/dev:/dev:ro 
    • rbd supports dynamic provisioning, single-writer and multi-reader, but not multi-writer; GlusterFS can support multi-writer, but is not in use for this case yet (a StorageClass sketch follows at the end of this storage section);
    • the docker-registry used Swift as its backend; to improve push/pull efficiency, Redis was used to cache metadata; all provided as containers using official docker images;  for example:
      • sudo docker run -d --restart=always -p 10500:10500 --volume=/etc/docker-distribution/registry/config.yml:/etc/docker-distribution/registry/config.yml --name registry registry:2.6.1 /etc/docker-distribution/registry/config.yml
      • config.yml is based on https://github.com/docker/docker.github.io/blob/master/registry/deploying.md
      • Harbor is used to provide HA for the docker-registry, running in another pod on the K8S cluster; the image data and harbor-db are all on Ceph PV mounts,  so if a Harbor node or Pod goes down, HA for Harbor is preserved,  and Swift is no longer needed;
      • PV and StorageClass are limited to a single Namespace,  so they can't yet support per-namespace dynamic provisioning in a multi-tenant environment;
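    • A minimal sketch of dynamic RBD provisioning through a StorageClass, as supported in K8s 1.6 (monitor address, pool and secret names are assumptions, not from the case study):
      • kind: StorageClass
      • apiVersion: storage.k8s.io/v1
      • metadata:
      •   name: ceph-rbd
      • provisioner: kubernetes.io/rbd
      • parameters:
      •   monitors: "10.10.100.11:6789"
      •   adminId: admin
      •   adminSecretName: ceph-admin-secret
      •   adminSecretNamespace: kube-system
      •   pool: kube
      •   userId: kube
      •   userSecretName: ceph-user-secret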
  • The network backend uses Flannel; some clusters switched to OVS;
    • Flannel supports Pod-to-Pod communication across nodes, but it can't provide multi-tenant separation and is not good at per-Pod network rate limiting;
    • thus the custom-built K8S-OVS component was created to implement these features;  it uses Open vSwitch to provide SDN for K8S and is based on the principles of OpenShift SDN; since the SDN in OpenShift is tightly integrated with OpenShift and can't be used as a separate plug-in the way Flannel or Calico can for K8S, the K8S-OVS plug-in was custom built;  it has similar functions to OpenShift SDN and can serve K8S as a plug-in;
    • K8S-OVS supports single-tenant and multi-tenant modes; it implements the following features:
      • Single Tenant:  uses Open vSwitch + VxLAN to make the Pods on K8S one big L2 network, enabling Pod-to-Pod communication;
      • Multi-Tenant:  also uses Open vSwitch + VxLAN to form the L2 network for Pods, and can additionally use Namespaces to separate tenant networks;  a pod in one namespace cannot access the pods/services in another namespace;
      • Multi-Tenant: a Namespace can be configured so that the pods in that namespace can communicate with pods/services in any other Namespace;
      • Multi-Tenant:  the namespaces joined as mentioned above can also be separated again;
      • Multi-Tenant and Single Tenant:  both support flow control, so pods on the same node share network bandwidth fairly and avoid the "noisy-neighbour" issue;
      • Multi-Tenant and Single Tenant: both support load balancing for external traffic;
    • "Join" means allowing pod/service communication between two tenant networks;  "separation" means the reverse operation, splitting two joined tenant networks back into separate tenant networks;   "public network" means a tenant network is allowed to communicate with all other tenant networks;
    • Different tenant networks have different VNIs (from VxLAN);  K8S-OVS stores the VNI relations in etcd, for example:
      • etcdctl ls /k8s.ovs.com/ovs/network/k8ssdn/netnamespaces  (helloworld1, helloworld2 )
      • etcdctl get /k8s.ovs.com/ovs/network/k8ssdn/netnamespaces/helloworld1
        • {"NetName":"helloworld1","NetID":300900,"Action":"","Namespace":""}
      • etcdctl get /k8s.ovs.com/ovs/network/k8ssdn/netnamespaces/helloworld2
        • {"NetName":"helloworld2","NetID":4000852,"Action":"","Namespace":""}
      • etcdctl update /k8s.ovs.com/ovs/network/k8ssdn/netnamespaces/helloworld1 '{"NetName":"helloworld1","NetID":300900,"Action":"join","Namespace":"helloworld2"}'
      • etcdctl get /k8s.ovs.com/ovs/network/k8ssdn/netnamespaces/helloworld1
        • Will return the same VNI as helloworld2 (4000852) after the join action above
    • At the application layer, LVS is used for L4 load balancing and Nginx + Ingress Controller for L7 load balancing on Ingress;  kube-proxy/services were abandoned in this case;
  • CI/CD solution architecture:
    • Opads (developed in PHP) is used for the front-end, Pluto (provides a REST API and interfaces with the K8S apiserver) for the back-end,  supporting 400 different applications, about 10,000 containers
    • CI/CD pipeline is like following:
      • developer login ->  OPADS { code server (gitlab/gerrit) -> sonar, autotest, benchmark, Jenkins/CI Runner -- push --> Docker Registry } --pull --> PLUTO { deploy (call api) } --> K8S { sit-cluster, uat-cluster, prod-cluster }
      • app images (such as Tomcat, PHP, Java, NodeJS etc.) carry a version number;  either mount the source code into a container directory, or use Dockerfile ONBUILD to load the app code
        • mount:   easy to use, no rebuild needed, fast; but the dependency coupling is high: if the app code changes while the base image stays unchanged, the build can fail;
        • ONBUILD:  handles dependencies better and makes version rollback easy, but needs a rebuild every time, which is slow;   choose based on the use case (a Dockerfile sketch follows at the end of this CI/CD section);
      • If a suitable base image can't be found, the developer submits a JIRA ticket to ask the DEVOPS team to build the image;
      • Select the code branch and version to deploy for the different environments (sit, uat, prod); can see the number of Pod replicas, uptime, name, creation time, node details on K8S, node selector etc.,  or use the Gotty web console to see container status;
      • Elasticity / HPA, load balancing,  blue/green deployment;  plus Sonar for code quality, test modules, benchmark modules etc.; all are components in the CI/CD PAAS;
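    • A minimal sketch of the ONBUILD approach mentioned above (image names and the war path are hypothetical; the registry host is the one from the etcd example earlier):
      • # base image Dockerfile, built once and pushed to the registry
      • FROM tomcat:8.0
      • ONBUILD COPY target/app.war /usr/local/tomcat/webapps/ROOT.war
      • # each application's Dockerfile then only needs a FROM line, built by CI for every release
      • FROM 10.10.190.190:10500/base/tomcat-onbuild:1.0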
  • Monitoring and Alert solutions used
    • Status about K8S: such as: Docker, Etcd, Flannel/OVS etc;  
    • System performance, such as cpu, memory, disk, network, filesystem, processes etc;
    • application status:  rc/rs/deployment, Pod, Service etc;
      • Used custom shell scripts run from crond to monitor these components;
      • For containers, used Heapster + Influxdb +Grafana
      • Docker -> Heapster -- sink --> influxdb (on top of docker as well) --> grafana -- alert;  
      • Each K8S node:  flannel <-- docker <-- cAdvisor <--  kubelet -> heapster
      • Heapster --get node list--> K8s Master kube-apiserver (which sits in the docker -> kubelet -> kube-apiserver <-- kube-scheduler, kube-controller-manager chain)
      • Each node use kubelet calling cAdvisor API to collect container information (resource, performance etc.), thus both node and container information are collected;  
      • information can be tagged, then aggregated and send to sink in influxdb,  use Grafana visualize data; 
      • Heapster needs --source to point to the Master URL, --sink to point to InfluxDB, and --metric_resolution for the interval, such as 30s (seconds); an example command is at the end of this monitoring section
      • Backend storage for Heapster has two types: metricSink, and influxdbSink.  
      • MetricSink holds metrics data in local memory; it is created by default and may consume a large amount of memory; the Heapster API gets its data from here;
      • InfluxDB is where the data is stored;  newer versions of Heapster can point to multiple InfluxDB instances;
      • Grafana can use regular expressions to check application status, cpu, mem, disk, network, FS status, or sort and search;
      • When a defined threshold is crossed, Grafana can do simple alerting through email, but this case created monitoring points and warning/alert policies in Grafana and integrated them with Zabbix;  Zabbix emits those warnings/alerts;
      • Container logs include: 
        • K8S component logs; 
        • system resource usage logs;
        • container running logs
      • Another solution, which was abandoned:  use Flume to collect logs; Flume runs in a pod;  configure the source (app log folder), channel and sink to Hippo/Kafka;
        • When logs need to be checked, one has to log in to Hippo, which is cumbersome;
        • Each application needs a separate Flume pod, a waste of resources;
        • Hippo is not containerized in this case and is shared by other non-container infrastructure, so it is slow during peak usage times;
      • Now switched to Fluentd + Kafka + ES + a customized GUI
        • Kafka cluster topics  -- fluentd-es --> ElasticSearch cluster --> Log API Engine --> Log GUI (history/current/download/search/other logs)
        • Each K8S node:  flannel --> docker --> kubelet, kube-proxy  --> /var/log, /var/lib/docker/containers --> Fluentd -- fluentd-kafka -->  Kafka cluster (topic)
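    • For reference, a Heapster start command wired up as described above might look like this (the in-cluster apiserver URL and the InfluxDB address are assumptions, not from the case study):
      • heapster --source=kubernetes:https://kubernetes.default --sink=influxdb:http://monitoring-influxdb:8086 --metric_resolution=30s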

Wednesday, September 20, 2017

Study notes for Splunk Administration Part I.


  • Example deployment:
    • 1 search head server,  16 core CPU, 16GB memory;
    • 8 index servers,  8 core CPU, 24GB memory each;
    • 1 deployment server:  8 core CPU, 32GB memory;
    • 2 syslog servers:  2 core CPU, 12GB memory each;
    • 1 job server:  8 core CPU, 16GB memory;
  • File system layout example:
    • /opt/splunk   10GB local filesystem, for splunk binaries;
    • /usr/local/splunk  5TB  for splunk data (hot/cold/warm), not shared;
    • /usr/local/splunk/frozen  5TB NFS volume for archived (frozen) data, shared;
  • Example Splunk package names:
    • splunk
    • splunkapp-pdf
    • splunk-unix-forwarder
    • splunkapp-unix-splunk
    • splunkforwarder
  • Splunk ports needed:
    • 443:  splunk indexers web;
    • 8089: 
      • splunk search head - splunk indexer;  
      • Splunk deployment server; 
      • Splunk License server
    • 9997: Splunk forwarder - splunk indexer
  • Example Config/Settings:
    • SPLUNK_DB=/splunk/splunkserver/    On Indexer servers;
    • SPLUNK_HOME=/opt/splunk
    • SPLUNK_SERVER_NAME=<splunk-server>
    • SPLUNK_WEB_NAME=<splunk-web>
    • MONGOC_DISABLE_SHM=1
  • Splunk search head server config:  /opt/splunk/etc/system/local/distsearch.conf
    • [distributedSearch]  servers = https://<IP_INDEXER1>:8089,https://<IP_INDEXER2>:8089,......
  • Splunk agent install examples:
    • Create the splunk user and group; a home directory needs to be created by the splunk user;
    • Create directory /opt/splunkforwarder with 0755 permission;
    • /etc/init.d/splunk  should be included in the RPM and get installed;
    • rpm -ivh splunkforwarder-......;
    • /opt/splunkforwarder/bin/splunk start --accept-license   (accept the license)
    • Create two folders and set owner to splunk user:
      • /opt/splunkforwarder/etc/apps/<my_deployment_client>
      • /opt/splunkforwarder/etc/apps/<my_deployment_client>/local
    • create empty deploymentclient.conf with proper splunk user permission
    • Modify above file with following contents:
      • Deployment Server
      • [target-broker:deploymentServer]
      • targetUri = <splunkdeployserver>:8089
      • phoneHomeIntervalInSecs = 600
    • Create app.conf under the above .../local folder with the following contents:
      • [install]
      • state = enabled
    • Configure the inputs.conf file /opt/splunkforwarder/etc/system/local/inputs.conf with the following content:
      • [default]
      • _meta = servername::XXX
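    • A typical next step (not from the original note) is to add monitor stanzas to the same inputs.conf so the forwarder actually picks up log files; the index and sourcetype here are just examples:
      • [monitor:///var/log/messages]
      • index = os
      • sourcetype = syslog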
  • Splunk configuration directories:  default, local, app;  each has a different scope;
    • default directory:  $SPLUNK_HOME/etc/system/default,  has pre-configured versions  of the configuration files;  Should never change manually, they will be overwritten during upgrades;
    • local directory:  $SPLUNK_HOME/etc/system/local,  won't be overwritten during upgrades; Most changes are made here, they are site wide, shared by all apps;
    • app directory:  $SPLUNK_HOME/etc/apps
      • such as time settings:  $SPLUNK_HOME/etc/apps/search/local/
    • users directory:  $SPLUNK_HOME/etc/users:  specific configurations for users;
    • reference documents:  $SPLUNK_HOME/etc/system/README:
      • .spec:  such as inputs.conf.spec,  specifies syntax, list of available attributes and variables;
      • .example:  such as inputs.conf.example:  has examples to reference;

Tuesday, September 19, 2017

Study notes on Kubernetes and Helm


  • Kubernetes probes:  Liveness and Readiness
    • Liveness probe: container level, if hung/crash, kubelet kill container, then based on restart policy may restart container locally, or restart on other nodes due to lack of resources or due to Kubernetes QoS policies;
    • Readiness probe: service level; if the service is interrupted, the Pod IP is removed from the endpoints, so the malfunctioning pod is taken off the load balancer while healthy/ready pods remain in service
  • Three types handlers for both liveness and readiness probes:
    • ExecAction:    execute command, exit code 0 success, others fail; better for non-tcp and non-http services;
    • TCPSocketAction:  tests connectivity to a port on the container IP; if it can connect, success; it only tests connectivity, so if the service is not functioning but the port is still listening, it may not catch the error condition;
    • HTTPGetAction: HTTP GET request; a response code >= 200 and < 400 is success;  better for http service health checks, need to provide a health check path;
  • Three test results:
    • Success
    • Failure
    • Unknown
  • Recommended to have both liveness and readiness probes; they can co-exist in the same container; the timeout for the readiness probe should be less than that of the liveness probe: first the malfunctioning service pod gets taken off the load balancer; if it can't self-heal/recover, the liveness probe kicks in and restarts the container based on the restart policy (restart the container or start a new pod on another node)
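  • A minimal sketch of both probes on a single container (path, port and timings are assumptions):
    • livenessProbe:
    •   httpGet:
    •     path: /healthz
    •     port: 8080
    •   initialDelaySeconds: 15
    •   timeoutSeconds: 5
    •   periodSeconds: 10
    • readinessProbe:
    •   httpGet:
    •     path: /ready
    •     port: 8080
    •   initialDelaySeconds: 5
    •   timeoutSeconds: 3
    •   periodSeconds: 10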
  • Challenges introduced by the Kubernetes framework: 
    • manage/update configuration files in k8s;
    • deploy k8s apps with large amount of configuration files;
    • share and re-use k8s configuration files /settings;
    • parameterized templates supporting different running environment;
    • manage app lifecycles: release, rollback, diff, history etc;
    • service orchestration/control specific steps in a release cycle;
    • verification after release/deployment;
  • Helm was introduced to solve the above challenges;  Helm is to Kubernetes what apt-get/yum is to Linux; built by Deis, later acquired by Microsoft; it helps package Kubernetes resources (such as deployments, services, ingress etc) into a chart;  in turn, charts can be stored and shared in a chart repository; the official chart repository is maintained by Kubernetes Charts, but Helm allows us to build private chart repositories; it has two components:  the Helm client (CLI) and the Tiller Server; both are implemented in Go, using gRPC for Helm client/server communication; the Tiller server uses the kubernetes client library to communicate with Kubernetes using REST + JSON format; 
  • Helm has three important concepts:
    • chart:  include all information needed to deploy a service instance on Kubernetes;
    • config: include all configuration/settings in an application release;
    • release:  A running instance of a chart and its associated config/settings;
  • Helm can do following:
    • create a new chart, 
    • package chart in to tgz format; 
    • upload/download chart to chart repository; 
    • deploy and delete chart into kubernete cluster; 
    • manage release lifecycles of charts installed by Helm;
  • Helm Client CLI can do following:
    • Develop chart locally;
    • Manage chart repository;
    • Communicate/Interact with Tiller server;
    • Release chart;
    • Query release status;
    • Upgrade or delete release 
  • Tiller Server is a server deployed on the kubernetes cluster; it doesn't have any DB backend, instead it uses Kubernetes ConfigMaps to store information;  it interacts with the Helm client and the Kubernetes API server; it is responsible for:
    • Listen on requests from Helm client;
    • Construct a release using chart and its configuration/settings;
    • Install charts onto the Kubernetes cluster and track the releases deployed afterwards;
    • Interact with Kubernetes to upgrade or delete charts;
  • Helm Client manages charts, Tiller Server manages releases
  • A chart repository can store index.yaml and packaged chart files on an HTTP server. Any HTTP server that can serve YAML and tar files can be turned into a chart repository, such as GCS (Google Cloud Storage), Amazon S3, Github Pages, or your own web server;
  • An example Helm release process:
    • helm fetch stable/mysql --version 0.2.6 --untar    (retrieve mysql 0.2.6 package and untar)
    • helm lint mysql   ( checking failures in chart)
    • helm create mychart   (create mychart directory, including mychart/{charts, Chart.yaml, templates/{deployment.yaml,_helpers.tpl,ingress.yaml,NOTES.txt,service.yaml}, values.yaml}) ,  2 directories, 7 files
      • Chart.yaml:  metadata of chart,  includes: name, description, release version etc;
      • values.yaml:  variables stored in template files;
      • templates:   all template files;
      • charts/:   storage path for chart
      • NOTES.txt:  information for post-deployment uses, such as how to use chart, default values etc;
    • Install chart has following methods:
      • helm install stable/mariadb   (specify chart name)
      • helm install ./nginx-2.1.0.tgz   (specify package file)
      • helm install ./nginx  (specify directory contains packages)
      • helm install https://example.com/charts/nginx-2.1.0.tgz (specify chart URL)
    • Setting values in the configuration has the following methods:
      • helm install -f myvalues.yaml ./redis   (Specify a config file)
      • helm install --set name=prod ./redis (Specify key=value format)
    • helm install -n mysql -f mysql/values.yaml --set resources.requests.memory=512Mi mysql     (deploy a release)
    • helm status mysql   (check release status)
    • helm list -a  (check release, tag "-a" for all releases, deployed/failed-deployment/deleting/deleted-releases )
    • helm upgrade mysql -f mysql/values.yaml --set resources.requests.memory=1024Mi mysql  (upgrade a deployed release)
    • helm get --revision 1 mysql   ( you would see memory was set to 512Mi)
    • helm get --revision 2 mysql   ( you would see memory was set to 1024Mi)
    • helm rollback --debug mysql 1    ( it would rollback and memory would be set to 512Mi)
    • helm delete mysql   ( even after deleted, the release history would still be preserved and can be recovered )
    • helm rollback --debug mysql 2  ( a deleted release got recovered, because information are stored in Kubernetes ConfigMap, when delete/upgrade Tiller, the application config/setting/data are preserved )
    • helm list -a   (see all releases)
    • helm hist  (check a specific release)
    • helm get --revision (get configuration information for a specific release; even if the release was deleted, the information is not lost and it can still be rolled back)
  • Helm configuration management:  a Helm Chart uses the Go template language;  variable values are stored in values.yaml, and can be replaced using  --values YAML_FILE_PATH or --set key1=value1,key2=value2 during chart deployment; {{...}} marks the template variables; values can also be set in the following ways (a values/template sketch follows at the end of this list):
    • PIPELINES:  like Linux/Unix pipeline,  drink: {{.Values.favorite.drink|repeat 10 |quote }}; food: {{ .Values.favorite.food|upper|quote }} 
    • Named templates: defined in _helpers.tpl (wrapped in {{ define "my_labels" }} ... {{ end }}, with {{/* ... */}} used for comments), then inserted using include, for example  {{- include "my_labels" }} into configMap metadata, or {{ include $mytemplate }};
    • SubChart:    a chart depends on other chart, the one depend on is subchart, 
      • a subchart doesn't need to depend on the parent chart; it is independent and has its own values and templates; 
      • a subchart can't access the parent chart's values, but the parent chart can overwrite subchart values
      • subchart or parent chart can both access global values in helm;
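    • A small sketch tying values.yaml to a template, reusing the pipeline example above (chart and key names are hypothetical):
      • # values.yaml
      • favorite:
      •   drink: coffee
      •   food: pizza
      • # templates/configmap.yaml
      • apiVersion: v1
      • kind: ConfigMap
      • metadata:
      •   name: {{ .Release.Name }}-configmap
      • data:
      •   drink: {{ .Values.favorite.drink | repeat 10 | quote }}
      •   food: {{ .Values.favorite.food | upper | quote }}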
  • Helm Hooks: helm install, helm upgrade, helm rollback and helm delete can all have hooks; the hook types are:
    • pre-install
    • post-install
    • pre-delete:
    • post-delete
    • pre-upgrade
    • post-upgrade
    • pre-rollback
    • post-rollback
  • Helm hooks have weights, which can be negative or positive values; hooks are loaded in weight order, though this is not guaranteed; if weights are equal they load in a-z alphabetical order (this may change in future releases);  Helm has two strategies for deleting hook resources (a hook manifest sketch follows after this list):  
    • hook-succeeded
    • hook-failed 
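    • A minimal sketch of how a hook is declared: the hook type, weight and delete policy are set through annotations on an ordinary manifest (the Job below is hypothetical):
      • apiVersion: batch/v1
      • kind: Job
      • metadata:
      •   name: "{{ .Release.Name }}-pre-install"
      •   annotations:
      •     "helm.sh/hook": pre-install
      •     "helm.sh/hook-weight": "-5"
      •     "helm.sh/hook-delete-policy": hook-succeeded
      • spec:
      •   template:
      •     spec:
      •       restartPolicy: Never
      •       containers:
      •       - name: pre-install-job
      •         image: busybox
      •         command: ['sh', '-c', 'echo pre-install checks would run here']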
  • Control service start sequence, two strategies:
    • Init Container:  pod/rc/deployment yaml files only describe container start order, not service start order, but a Kubernetes Init Container can do this; command: ['sh', '-c', "curl --connect-timeout 3 --max-time 5 --retry 10 --retry-delay 5 --retry-max-time 60 serviceB:portB/pathB/"]  (see the initContainers sketch after this list)
    • Helm Hook:  pre-install hook; put the above command into a pre-install-job container as a pre-install hook;  since a Helm Hook is a blocking operation, it blocks before helm install completes, thus controlling the service start order;
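    • A minimal sketch of the Init Container approach (serviceB/portB/pathB are the same placeholders as in the command above; busybox's wget stands in for curl here, and the app image is hypothetical):
      • spec:
      •   initContainers:
      •   - name: wait-for-serviceb
      •     image: busybox
      •     command: ['sh', '-c', 'until wget -q -O- http://serviceB:portB/pathB/; do sleep 5; done']
      •   containers:
      •   - name: app
      •     image: myapp:latest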
  • QoS in Kubernetes, two categories:
    • requests: 0~node-allocatable;   0 <= requests <=node-allocatable
    • limits:  requests~unlimited;  requests <= limits <= Infinity
    • If CPU usage > limits,  pod won't be killed but will be limited; if no limits set, pod could use all cpu 
    • If MEM usage > limits, pod container processes will be killed by OOM, if killed by OOM, it will try to restart container or create new pod on local node or other node;
  • QoS class: 
    • Guaranteed: every container in the pod must have limits set, and any requests must equal the limits (by default, request = limit if request is not defined); such a pod has the Guaranteed QoS class (a pod spec sketch follows at the end of this QoS list)
    • Burstable: if at least one of containers in a pod has different requests and limits, this pod has burstable QoS class;
    • Best-Effort: if all resources in pod do not have request and limits, this pod has Best-Effort QoS class;
    • CPU is a compressible resource:  if a pod exceeds its limits, its CPU is throttled but the process won't be killed;  MEM is not compressible: if a node is short on memory, processes will be killed by the kernel OOM killer;
    • Best-Effort pods -> Burstable pods -> Guaranteed pods  (highest  QoS class);  
      • Best-Effort pods:  if MEM is used up, these pods will be killed first;
      • Burstable pods:  if MEM is used up, and there is no other Best-Effort containers, these pods will be killed;
      • Guaranteed pods:  if MEM is used up, and there is no other Burstable and/or Best-Effort container, these pods will be killed;
    • If a pod processes exceed the limits but node still has resources, system will try restart container on that node or create new pod on local node or other nodes;
    • If have enough resources, using Guaranteed QoS pods to get stability;
    • Use different QoS class for different services to gain better resource utilization while maintain service quality;
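    • A minimal sketch of a Guaranteed-class pod spec (every container has requests equal to limits; image and values are arbitrary):
      • apiVersion: v1
      • kind: Pod
      • metadata:
      •   name: qos-demo
      • spec:
      •   containers:
      •   - name: app
      •     image: nginx
      •     resources:
      •       requests:
      •         cpu: 500m
      •         memory: 256Mi
      •       limits:
      •         cpu: 500m
      •         memory: 256Mi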
  • Tiller service is in the form of a pod; using cluster IP,  helm client using kubectl proxy to connect; 
  • Kubernetes service discovery has 2 methods:
    • Environment variables: only injected at pod start, so a pod started before the service that defines the variables exists won't see them; 
    • Kubernetes DNS: recommended
  • Can use HAProxy/Nginx to replace kube-proxy for better performance

Monday, September 18, 2017

Study notes for MongoDB basic


  • BSON type:  https://docs.mongodb.com/manual/reference/bson-types/
  • ObjectId:  unique identifier for document, similar to the auto increment  _id in traditional RDBMS
  • sudo apt-get install -y mongodb-org; service mongod status/start/stop; mongod --config /path/to/mongod.conf 
  • dbpath/port/path/verbosity  :  sudo mongod --dbpath /my/db --port 47017
  • MongoDB Shell:  mongo; export PATH=$PATH:<mongo dir>/bin 
  • mongo --host db.mydomain.com --port 47017; mongo -u my_db_user -p my_db_passwd --host db.mydomain.com --port 47017
  • Some shell commands:
    •  > db   
    •  > show databases  
    •  > use [db_name]   
    •  > db.<collection>.<operation>(<query>, <command-options>)
  • Example DB:
  • {
      _id:   ObjectId(XXXXXXXXXXXXXXXX),
      title:  "My test book 1",
      author:  "K F Tester",
      isbn:   ###############,
      reviews:  8,
      category: [ "non-fiction", "entertainment" ]
    }
  • Example statements
    • Traditional SQL:   SELECT <fields> FROM <table> WHERE <condition>
    • Following are MongoDB CRUD statements (Create, Read, Update, Delete):
      • db.books.find({author: "K F Tester" })
      • db.books.find({ author: "K F Tester", reviews: 8 })   (Implicit AND)
      • db.books.find({ $or: [ {author: "K F Tester"}, {reviews: 8} ]})    (OR) 
      • db.books.find({author: "K F Tester",  $or: [ {reviews: 8}, {reviews: 10} ] })    (AND + OR )
      • db.books.find({ reviews: { $eq: 8 } })  (equals, similar operators: $gt, $gte, $lt, $lte, $ne, $in, $nin )
      • db.books.update({ title: "My test book 1" }, { $set: { title: "My test book 1.1" } })
      • db.books.update({ author: "K F Tester" }, { $set: { title: "My test book 1.2", author: "K F Tester2" } })    (Note:  only updates the first match)
      • db.books.update({ author: "K F Tester" }, { $set: { title: "My test book 1.3", author: "K F Tester3" } }, { multi: true })    (Note: updates all matches)
      • db.books.updateOne({ title: "My test book 1.1" }, { $set: { title: "My test book 1.4" } })
      • db.books.updateMany({ title: "My test book 1.1" }, { $set: { title: "My test book 1.4" } })
      • db.books.insert([{ title: "My test book 2.0", author: " K F Tester3", ....}, { title: "My test book 3.0", author: "K F Tester4", ...}])    (Note: pass an array to insert multiple documents)
      • db.books.remove({ author: " K F Tester3" })    (Note: it will delete all matches)
      • db.books.remove({ author: " K F Tester3" }, { justOne: true })  (Note: only delete first match)
      • db.books.deleteOne({ author: " K F Tester3" })   (Note: recommended to avoid confusion)
      • db.books.deleteMany({ author: " K F Tester3" })
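      • A combined example (not from the original note) using the operators above with a projection and sort, based on the sample document's fields:
      • db.books.find({ category: { $in: ["non-fiction"] }, reviews: { $gte: 8 } }, { title: 1, author: 1, _id: 0 }).sort({ reviews: -1 })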
  • MongoDB admin clients:  Robomongo  https://robomongo.org   can do all above operations in GUI

Sunday, September 17, 2017

Study notes for Big Data - Part II

My study notes on Big Data - Part II 


  • Cloud Computing:  Self-Service; Elasticity; Pay-as-use; Convenient Access; Resource Pooling
  • 3 Vendor's (Amazon, Azure, Google) offerings
    •     Storage: Amazon S3; Azure Storage; Google Cloud Storage
    •     MapReduce: Amazon EMR (Elastic MapReduce); Hadoop on Azure/HDInsight;  Google Cloud Dataflow/Dataproc (Spark/Hadoop services)/Datalab (interactive tool to explore/analyze/visualize data)
    •     RDBMs:  Amazon RDS(Aurora/Oracle/MsSQL/PostgreSQL/MySQL/MariaDB); Azure SQL Database as a service;  Google Cloud SQL(MySQL)
    •     NoSQL:  Amazon DynamoDB or EMR with Apache Hbase;  Azure DocumentDB/TableStorage; Google Cloud BigTable/Cloud Datastore
    •     DataWarehouse: Amazon Redshift;  Azure SQL Data warehouse as a service;  Google BigQuery
    •     Stream processing:  Amazon Kinesis;  Azure Stream analytics; Google Cloud Dataflow
    •     Machine Learning:  Amazon ML;  AzureML/ Azure Cognitive Services; Google Cloud Machine Learning Services/Vision API/Speech API/Natural Language API/Translate API
  • Hadoop Basics:   Google's MapReduce; Google File System; Yahoo Engineer: Doug Cutting; Written in Java in 2005, Apache Project; 
  • Hadoop Common (Libraries/Utilities); HDFS (Hadoop Distributed File System); Hadoop MapReduce (programming model for large-scale data proc); Hadoop YARN
  • HDFS: Master/Slave model:  Namenode is Master, controls all metadata, multiple Data nodes are Slaves, manage storage attached to nodes where they run on; Files split in blocks; blocks stored on datanodes; blocks replicated; block size/replication factors configurable per file; Namenode makes decisions on replication and receives heartbeat and BlockReport from data nodes; Hadoop is a data service on top of regular file systems, fault tolerant; resilient; clustered; Tuned for large files: GB or TB per file and tens of millions of files in a cluster; Write once, read many; Optimized for throughput rather than latency; long-running batch processing; move compute near data; 
  • YARN:  Yet Another Resource Negotiator (Cluster/Resource manager for resource allocation, scheduling, distributing and parallelizing tasks); Global ResourceManager (Scheduler and ApplicationManager) and per-machine NodeManager; Hadoop job (Input/Output location; MapReduce func() interfaces/classes; job params; written in Java, Hadoop streaming/Hadoop Pipes) client submits job (jar/exe) and config to ResourceManager in YARN which distributes to workers (scheduling/monitoring/status/diag)
  • MapReduce is a Software framework;  For large scale data processing; Parallel Processing; Clusters of commodity H/W; reliable/fault-tolerant;  Two Components: API for MapReduce workflows in java;  A set of services managing workflows/scheduling/distribution/parallelizing; Four Phases:  Split; Map; Shuffle/Sort/Group; Reduce ; exclusively on <key,value> pairs; input splits, map() produces a <key, value> pair for each record; output of mappers shuffled/sorted/grouped and passed to reducer;  Reducer applied to <key, value> pairs with the same key/aggregates value for pairs with same key;  Can't be used for Fibonacci (value depends on previous input);  Compiled Lang; Lower level of abstraction; more complex; code performance high; for complex business logic; for structured and unstructured data;
  • Hadoop ecosystem:  Data Mgmt HDFS/HBase/YARN; Data Access MapReduce/Hive(DataWarehouse SQL query HiveQL)/Pig(Scripting alternative for Java job client); Data Ingestion/integration: Flume/Sqoop/Kafka/Storm; Data Mon: Ambari/Zookeeper/Oozie; Data Govern/Secure: Falcon/Ranger/Knox; ML: Mahout; R Connectors (Statistics)
  • Hive: data warehouse solution for tasks like ETL, summarization, query, analysis;  SQL-like interface for datasets in HDFS or HBase; the Hive engine compiles HiveQL queries into MapReduce jobs; Tables/DBs are created first, data loaded later; Comes with a CLI; Two components: HCatalog: table/storage mgmt layer; WebHCat: REST interface; SQL-like query Lang; Higher level of abstraction; less complex; code performance less than MapReduce; for ad hoc analysis; for both structured and unstructured data; Built by Facebook, open-sourced to the Apache Foundation;
  • Pig provides a scripting alternative; the Pig engine converts scripts to MapReduce programs; Two components: Pig Latin: the lang;  Runtime Env: incl. parser, optimizer, compiler and exec engine: converts Pig scripts to MapReduce Jobs that will be run on the Hadoop MapReduce engine; Pig Latin consists of data types, statements, general and relational operators such as join, group, filter etc;  Can be extended using Java/Python/Ruby etc, as UDFs (User Defined Functions) called from Pig Latin;  Two exec modes -- Local mode: Runs in a single JVM, uses the local FS; MapReduce Mode: Translates into MapReduce jobs and runs on a Hadoop cluster;  Hive is for querying data, Pig is for preparing data to make it suitable for query; Higher level of abstraction; less complex; code performance less than MapReduce; Easy for writing joins; for both structured and unstructured data; Built by Yahoo and open-sourced to the Apache Foundation;
  • Data ingestion and integration tools:  
    • Flume: Move large log streaming data (such as web servers, social media);  Channel-based transactions (sender and receiver); rate control;  Four types of source/destinations: tail; system logs/log4j; social medias; HDFS/HBase;  Arch: Flume Data Source -> Flume Data Channel -> Flume Data Sinks -> HDFS;  Push model not scaling well; tightly integrate with Hadoop; Suitable for streaming data; has own query processing engine helps moving data to sinks; 
    • Sqoop: connectors (MySQL, PostgreSQL with DB APIs for effective bulk transfer, Oracle, SQL Server and DB2), JDBC connectors, support both import and export, text files or Avro/Sequence files; used when data is in an RDBMS, data warehouse or NoSQL store;
    • Kafka: A distributed streaming platform used for building real-time data pipelines and streaming apps;  Publish-Subscribe messaging system for real-time event processing; stores streams of records in a fault-tolerant way;  processes streams as they occur; Runs on a cluster;  stores streams of records in categories named topics; each record has a {key, value, timestamp}; Four core APIs:  Producers, Consumers, Stream Processors and Connectors; Pull data from topics; Not tightly integrated with Hadoop, may need to write own producers/consumers; Kafka + Flume works well;  Use a Kafka source to write to a Flume agent;
    • Storm: Real-time message computation system - not a queue; Three abstractions:  
      • Spout: source of streams in a computation (such as a Kafka broker, Twitter streaming API etc);  
      • Bolt:  process input streams and produces output streams, functions include: filters, joins, aggregation, to DB etc;
      • Topology: network of spouts and bolts representing multi-stage stream computation,  bolt subscribing to the output stream of other spout or bolt; topology run indefinitely once deployed;
  • Data operations tools:
    • Ambari: installation lifecycle management, admin, monitor systems for Hadoop; Web GUI with Ambari REST APIs; Provision (step-by-step wizard), Configuration mgmt; Central mgmt for services start/stop/reconf;  Dashboard for status monitoring, metrics collection and alert framework
    • Oozie:  Java web app used to schedule Hadoop jobs as a workflow engine, combines multiple jobs into a logical unit of work; triggers workflow actions; tightly integrated with Hadoop jobs such as MapReduce, Hive, Pig, Sqoop, and system Java/Shell jobs; Action nodes + control flow nodes arranged in a DAG (Directed Acyclic Graph); written in hPDL (XML Process Definition Language); Control flow nodes define the workflow and control the exec path (decision, fork, join nodes etc)
    • Zookeeper:  coordinates distributed processes;  helps store and mediate updates to important configuration info;  It provides:  Naming service; Configuration management; Process synchronization; Self-election; Reliable messaging (publish/subscribe queue, guaranteed message delivery)
  • NoSQL (Not Only SQL) Basics:  Schema less; no strict data structure; agile; scale horizontally (Sharding/Replication); tightly integrate cloud computing; better performance for unstructured data; CAP theorem:  Consistency; Availability; Partition tolerance (can only satisfy 2 out of 3); Eventual consistency better for read heavy apps; v.s Strong Consistency (can be tuned, such as Amazon DynamoDB) ; Better suited for storing and processing Big Data;  
  • Four categories of NoSQL: (Choosing type requires analysis of app requirements and NoSQL properties)
    • Key-value;  unique key, value in any format; useful for session info, user profiles, preferences, shopping cart info;  don't use when data has relationships or you need to operate on multiple keys at the same time;  Examples: Both on disk (Redis) or in Memory (Memcached), choose as the app needs;   Others:  Amazon DynamoDB, Riak and Couchbase
    • Document;  can be XML, JSON, EDI, SWIFT etc;  documents in the value part; self-describing; hierarchical tree structures, collections and scalar values; indexed using Btree queried using javascript query engine;  Examples: MongoDB and CouchDB
    • Column family; store data in column families with a row key;  a column family is a container for an ordered collection of rows; rows can have different columns thus easy to add; can have super columns (a container of columns; a map of columns with name and value ); Examples: HBase, Cassandra, Hypertable
    • Graph: nodes/edges/properties based on graph theory; store entities and relationships (nodes and properties); relations as edges with properties and have directional significance; joins are very fast as they are persisted rather than calculated; need design, Examples: Neo4J, Infinite Graph, OrientDB
  • Apache Spark (Superstar project status at Apache) : A cluster computing framework for data processing (data science pipelines) and analysis; parallel on commodity H/W; fast and easy (API optimized for fast job processing); comprehensive unified framework for Big Data Analytics; Apache Top Level project;  Allows rich APIs such as SQL, ML, graph processing, ETL, streaming, interactive and batch processing; in-depth analysis with SQL, statistics, machine learning (ML); wide use cases (intrusion detection, product recommendations, risk estimation, genomic analysis etc);  Very fast because of in-memory caching and a DAG-based processing engine (100 times faster than MapReduce in memory and 10 times faster on disk); suitable for iterative algorithms in machine learning; fast real-time response to user queries (in-memory); low latency analysis for live data streams;  General purpose prog model: Java, Scala, Python;  Libraries/APIs for batch, stream, ML, queries etc;  Interactive shell for Python and Scala; Written in Scala on top of the JVM (performance and reliability) ; Tools from the Java stack can be used; Strong integration with the Hadoop ecosystem;  Supports HDFS, Cassandra, S3 and HBase; It is not a data storage system; can access traditional BI tools and use JDBC/ODBC; DataFrame API pluggable using Spark SQL;  Built by UC Berkeley AMPLab;  Motivated by MapReduce, and applies Machine Learning in a scalable fashion; 

Saturday, September 16, 2017

Study notes for Big Data - Part I

My study notes on Big Data - Part I

  • 4 V's of Big Data:  Volume, Variety, Velocity, Veracity or Variability
  • Format of Big Data: Structured (SQL), Semi-structured (EDI, SWIFT, XML), Unstructured (multimedia, text, images)
  • Big Data Analytics: Basic (report, dashboard,visualization,slice/dice); Advanced (ML, statistics, text analytics, neural networks, data mining); Operationalized (embed analytics in business process); Business decision (decision-making that drives $$$)
  • Big Data Trends:  Machine Learning; Embedding Intelligence; In the Cloud; IOT+BigData+Cloud; NoSQL; Real-time analytics; Challenges (Privacy, Discrimination, Spying, Hacking )
  • Cycle of Big Data Management: Capture - Organize - Integrate - Analyze - Act - Capture
  • Components:  Physical Infrastructure; Security Infra; Data Stores; Organize/Integrate; Analytics
  • Phys Infra:  Support 4 Vs;  Cloud: Perf/Avail/Scala/Flex/Cost
  • Security Infra:  Data Access; App Access; Data Encryption; Threat Detection
  • Data Store: DFS (HDFS); NoSQL (Cassandra, MongoDB); RDBMs (Oracle MySQL); Real-time (Kafka, Storm, Spark streaming )
  • DFS:  5 Transparencies: Access; Concurrency; Failure; Scalability; Replication
  • RDBMs:  ACID:  Atomicity; Consistency; Isolation; Durability
  • NoSQL:  Document-oriented; Column-oriented; Graph DB; Key-Value
  • Org/Integrate Data:  Cleaning; Transformation; Normalization
  • 2 types of data integration:  multiple data sources ;  unstructured source with structured big data
  • Process/Organize Big Data:  ETL; Hadoop's MapReduce;  Spark's SQL
  • ETL: Extract; Transform; Load
  • Data Warehouse:  RDBMs; by subject area; highly transformed; strictly defined use cases 
  • Hadoop's MapReduce: batch processing large volumes of data (scalable/resilient); Apache Spark:  complex analytics using ML models in an interactive approach
  • Data Lake:  Contains: all data; different types; "Schema-on-read"; agile/adapts to business changes quickly;   Hard to secure; commodity hardware in a cluster; used for advanced analytics by data scientists
  • Data Warehouse:  use case oriented; transactional/quantitative metrics; "Schema-on-write"; time-consuming when need modify business process;  Old Tech/Mature in Security; enterprise grade hardware;  Used for operational analysis/reports/KPIs/slices of data
  • Analyzing Big Data:  Predictive; Advanced (deep learning; speech/face/genomic); Social Media Ana; Text Ana; Alerts/Recommends/Prescriptive; Reports/Dashboard/Visualization; In summary: Basic BI Solution + Advanced Analytics; combines statistics, data mining, machine learning and have wide use cases: 
    • Descriptive Analytics (What happened):  Excel; RDBMS; Data Warehouse (IBM Cognos, Teradata);  Reporting (Jasper Reports); Business Intelligence (Tableau, Qlik);  Visualizations (Tableau, Qlik); Programming Languages (R, D3.js)
    • Predictive Analytics (What could happen in the future): Combines statistics, data mining and machine learning techniques: Linear Regression; Logistic Regression; Decision trees and Random forests; Naive Bayes theorem; Clustering; Neural network; Link analysis (graph theory),  Tools include: R, Apache Mahout, Apache Spark MLlib, H2O, NumPy, SciPy; IBM SPSS, SAS, SAP, RapidMiner; Google Prediction API, Amazon Machine Learning, Azure Machine Learning 
    • Prescriptive Analytics (What can I do to make this happen):  combines tools such as business rules, algorithms, machine learning, computational modeling  etc; Tools include: SAS, IBM, Dell Statistica