How much is your customer worth?

October 1, 2020 by Bluetab

Our client is a multinational leader in the energy sector with investments in extraction, generation and distribution, with a significant presence in Europe and Latin America. It is currently developing business intelligence initiatives, exploiting its data with embedded solutions on cloud platforms. 

The problem it faced was significant: to build any use case, it had to consult countless sources of information generated manually by various departments, including text files and spreadsheets, as well as information systems ranging from Oracle DB to Salesforce.

«The problem it faced was significant: to build any use case, it had to consult countless sources of information generated manually»

The solution was clear: all the necessary information needed to be concentrated in a single, secure, continually available, organised and, above all, cost-efficient place. The decision was to implement a Data Lake in the AWS Cloud.

As the project evolved, the client grew concerned about the vulnerabilities of its local servers, where it had experienced problems with service availability and even a computer virus intrusion, so /bluetab proposed migrating the most critical processes entirely to the cloud. These include a customer segmentation model developed in R.

Segmenting the customer portfolio relies on an ETL developed in Python, with Amazon Redshift as the DWH, and a Big Data EMR cluster that is spun up on demand to run Scala jobs over the large volumes of transaction information generated daily. The results of the process, previously hosted on and exploited from a MicroStrategy server, are now delivered as reports and dashboards in Power BI.
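
As an illustration of the on-demand pattern described above, the sketch below launches a transient EMR cluster with boto3, runs a single Spark step (a Scala job packaged as a JAR in S3) and lets the cluster terminate when the step finishes. It is a minimal sketch only: the cluster sizing, release label, S3 paths and spark-submit arguments are placeholders, not the client's actual configuration.

import boto3

# Minimal sketch: launch a transient EMR cluster that runs one Spark step
# and shuts itself down afterwards. All names, paths and sizes are placeholders.
emr = boto3.client("emr", region_name="eu-west-1")

response = emr.run_job_flow(
    Name="daily-transactions-segmentation",          # hypothetical job name
    ReleaseLabel="emr-5.30.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,        # terminate when the step ends
    },
    Steps=[
        {
            "Name": "segmentation-etl",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "--class", "com.example.SegmentationJob",    # placeholder class
                    "s3://example-bucket/jars/segmentation.jar",  # placeholder JAR
                ],
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    VisibleToAllUsers=True,
)

print("Launched cluster:", response["JobFlowId"])

Because KeepJobFlowAliveWhenNoSteps is false, the cluster only exists while the job runs, which is what keeps the on-demand cost profile low.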

«…the new architecture design and better management of cloud services in their daily use enabled us to optimise cloud billing, reducing OPEX by over 50%»

Not only did we manage to integrate a significant quantity of business information into a centralised, governed repository, but the new architecture design and better management of cloud services in their daily use enabled us to optimise cloud billing, reducing OPEX by over 50%. Additionally, this new model enables accelerated development of any initiative requiring use of this data, thereby reducing project cost.

Now our customer wants to test and leverage the tools we put into their hands to answer a more complex question: how much are my customers worth?

Its traditional segmentation model in the distribution business was based primarily on an analysis of payment history and turnover. In this way it predicted the likelihood of defaults on new services and the potential value of the customer in billing terms. All of this, crossed with financial statement information, still left a model with ample room for improvement.

«At /bluetab we have experience in development of analytical models that ensure efficient and measurable application of the most suitable algorithms for each problem and each data set»

At /bluetab we have experience in the development of analytical models that ensure efficient and measurable application of the most suitable algorithms for each problem and each data set, but the market now provides solutions in the form of very mature analytical models that, with minimal parametrisation, deliver good results while drastically reducing development time. We therefore used a well-proven CLV (Customer Lifetime Value) model to help our client evaluate the potential life-cycle value of its customers.

In the new scenario we have added variables such as after-sales service costs (recovery management, CC incident resolution costs, intermediary billing agent costs, etc.) and provisioning logistics costs to the customers’ income and expense data. The model can also include geographical positioning data for distribution costs, market maturity in terms of market share, or crosses with information provided by different market sources. This means our client can make better estimates of the value of its current and potential customers, and model and forecast profitability for new markets or new services.
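
To make the idea concrete, here is a minimal sketch of a simple discounted-margin CLV calculation in Python. It assumes a basic retention/discount formula and illustrative cost fields (after-sales and logistics costs); the actual model used in the project is not detailed here and will differ.

import pandas as pd

# Minimal CLV sketch: expected lifetime value per customer using the simple
# formula CLV = net_margin * r / (1 + d - r), where r is the annual retention
# rate and d the discount rate. Column names and figures are placeholders.
customers = pd.DataFrame({
    "customer_id":      ["A-001", "A-002", "A-003"],
    "annual_revenue":   [1200.0, 540.0, 2300.0],
    "supply_cost":      [800.0, 350.0, 1500.0],
    "after_sales_cost": [60.0, 25.0, 140.0],   # CC incidents, recovery management...
    "logistics_cost":   [40.0, 20.0, 90.0],    # provisioning logistics
    "retention_rate":   [0.85, 0.60, 0.92],    # estimated per segment
})

DISCOUNT_RATE = 0.10  # illustrative cost of capital

net_margin = (
    customers["annual_revenue"]
    - customers["supply_cost"]
    - customers["after_sales_cost"]
    - customers["logistics_cost"]
)

customers["clv"] = net_margin * customers["retention_rate"] / (
    1 + DISCOUNT_RATE - customers["retention_rate"]
)

print(customers[["customer_id", "clv"]])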

The potential benefit from applying analytical models depends on less “sexy” aspects, such as consistent organisation and governance of the data in the back office, the quality of the data feeding the model, implementing the model following DevOps best practices, and constant communication with the client to ensure business alignment and to extract and visualise valuable conclusions from the information obtained. And at /bluetab we believe this is only possible with expert technical knowledge and a deep commitment to understanding our clients’ businesses.

«The potential benefit from application of the analytical models is only possible with expert technical knowledge and a deep commitment to understanding our clients’ businesses»

We have a Plan B

September 17, 2020 by Bluetab

The Reto Pelayo Vida (Pelayo Life Challenge, an annual event for women survivors of cancer), sponsored by /bluetab, has always sought out remote parts of the planet, distant locations and extraordinary landscapes. But the world is experiencing times we have not seen before, and the safety of those on the expedition must be ensured, so a Plan B has been put into effect, in line with the circumstances:

The expedition will depart from Bilbao and, in three stages, will circumnavigate the entire peninsula, passing through the Strait of Gibraltar and stopping in Malaga and Valencia to finish in Barcelona.

The Reto Pelayo Vida 2020 in numbers: nautical miles sailed, candidates, Olympic champions on board, and the yacht’s draught, length in feet and beam.

For another year, in line with our Equality Plan, we bluetabbers are providing vital support to these women.

Bank Fraud detection with machine learning II

September 17, 2020 by Bluetab

A model to catch them all!

The creation of descriptive and predictive models is based on statistics and recognising patterns in groups with similar characteristics. We have created a methodology that enables detection of anomalies using historical ATM channel transaction behaviour for one of the most important financial institutions in Latin America.

Together with the client and a group of statisticians and technology experts, we created a tool for audit processes that facilitates the detection of anomalies on the fronts that are most important and most actionable for the business, from operational and availability issues and technology errors to internal or external fraud.

The ATM channel is an area of the business in direct contact with the public that uses it, and it is vulnerable for reasons such as connectivity and hardware faults. The number of daily transactions in a network of over 2,000 ATMs involves a huge number of indicators and metrics of a technological and operational nature. Until now, a group of auditors was given the task of manually sampling and analysing this data stream to identify risks in the operation of the ATM channel.

The operational characteristics mean that the task of recognising patterns is different for each ATM, as the technology in each unit and the volume of transactions are influenced by factors such as seasonal phenomena, demography and even the economic status of the area. To tackle this challenge, /bluetab developed a framework around Python and SQL for segmenting the most appropriate typologies according to variable criteria and detecting anomalies across a set of over 40 key indicators of channel operation. This involved unsupervised learning models and time series that enable us to differentiate between groups of comparable cash machines and achieve more accurate anomaly detection.
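
The details of the framework are not published, but a minimal sketch of the general idea (cluster comparable ATMs first, then flag indicator values that deviate strongly from their own cluster's behaviour) might look like this in Python with scikit-learn. The feature names, data and thresholds are illustrative assumptions.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Illustrative daily indicators per ATM (placeholder feature names and values).
rng = np.random.default_rng(42)
atms = pd.DataFrame({
    "atm_id":        [f"ATM-{i:04d}" for i in range(1, 201)],
    "daily_txns":    rng.poisson(350, 200),
    "cash_out_rate": rng.beta(2, 50, 200),
    "error_rate":    rng.beta(2, 80, 200),
    "downtime_min":  rng.gamma(2.0, 10.0, 200),
})
features = ["daily_txns", "cash_out_rate", "error_rate", "downtime_min"]

# 1) Segment ATMs into comparable groups.
X = StandardScaler().fit_transform(atms[features])
atms["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

# 2) Flag anomalies: indicators more than 3 standard deviations away
#    from the mean of the ATM's own cluster.
cluster_mean = atms.groupby("cluster")[features].transform("mean")
cluster_std = atms.groupby("cluster")[features].transform("std")
z_scores = (atms[features] - cluster_mean) / cluster_std
atms["is_anomaly"] = (z_scores.abs() > 3.0).any(axis=1)

print(atms.loc[atms["is_anomaly"], ["atm_id", "cluster"] + features])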

The purely mathematical results of this framework were condensed and translated into business-manageable vulnerability metrics, which we developed together with the end user in Qliksense. In this way, we handed the client an analysis environment that covers all the significant operational aspects, but which also enables incorporation of other scenarios on demand.

Now the auditors can analyse months of information, taking into account temporal market conditions, geographic location or technological and transactional characteristics, where previously they only had the capacity to analyse samples.

We are working with our client to drive initiatives that incorporate technology, make the operation more efficient and speed up the response time to any incidents.

Bank Fraud detection with machine learning

September 17, 2020 by Bluetab

The financial sector is currently immersed in a fight against bank fraud, which is one of its biggest challenges. Spanish banking saw a 17.7% increase in 2018 in claims for improper transactions or charges compared with the previous year, and in 2017 alone there were over 123,064 online fraud incidents against companies and individuals.

The banking system is confronting the battle against fraud from a technological point of view. It is currently in the midst of a digitisation process and, with investments of around 4,000 million euros per year, it is putting its efforts into adoption of new technologies such as Big Data and Artificial Intelligence. These technologies are intended to improve and automate various processes, including detection and management of fraud.

At /bluetab we are undertaking a variety of initiatives within the technological framework of Big Data and Artificial Intelligence in the financial sector. Within the framework of our “Advanced Analytics & Machine Learning” initiatives, we are currently collaborating on Security and Fraud projects where, through the use of Big Data and Artificial Intelligence, we are able to help our clients create more accurate predictive models.

So, how can machine learning help prevent bank fraud? Focusing on our collaborations within the fraud area, /bluetab addresses these types of initiatives starting from a series of transfers identified as fraud and a data set of user sessions in electronic banking. The challenge is to generate a model that can predict when a session may be fraudulent, while paying close attention to the false positives and negatives that the model may produce.

Understanding the business and the data is critical to successful modelling.

In overcoming these kinds of technological challenges, we have seen that a sound methodology is of vital importance. At /bluetab we use an in-house, ad-hoc adaptation of the CRISP-DM methodology for banking, in which we distinguish the following phases:

  • Understanding the business
  • Understanding the data
  • Data quality
  • Construction of intelligent predictors
  • Modelling


We believe that in online fraud detection projects, understanding the business and the data is of great importance for proper modelling. Good data analysis lets us observe how the variables relate to the target variable (fraud), as well as other, no less important, statistical aspects (data distribution, search for outliers, etc.). These analyses reveal variables with great predictive capacity, which we call “diamond variables”. Attributes such as the number of visits to the website, the device used for the connection, the operating system or the browser used for the session (among others) are usually strongly related to bank fraud. Moreover, the study of these variables shows that, individually, they can cover over 90% of fraudulent transactions. In other words, analysing and understanding the business and the data enables you to evaluate the best way of approaching a solution without getting lost in a sea of data.
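
As a hedged illustration of how such “diamond variables” can be screened, the following pandas sketch measures, for each candidate attribute, what share of the known fraudulent sessions is concentrated in its riskiest values. The column names and data are placeholders; the 90% coverage figure quoted above comes from the project itself, not from this sketch.

import pandas as pd

# Illustrative session-level data set: one row per e-banking session,
# "is_fraud" marks sessions linked to fraudulent transfers.
sessions = pd.DataFrame({
    "device":   ["mobile", "desktop", "mobile", "tablet", "desktop", "mobile"],
    "browser":  ["chrome", "firefox", "unknown", "chrome", "unknown", "unknown"],
    "is_fraud": [0, 0, 1, 0, 1, 1],
})

def fraud_coverage(df: pd.DataFrame, column: str, top_k: int = 2) -> float:
    """Share of all fraud captured by the top_k riskiest values of `column`."""
    by_value = df.groupby(column)["is_fraud"].agg(["sum", "mean"])
    riskiest = by_value.sort_values("mean", ascending=False).head(top_k)
    return riskiest["sum"].sum() / df["is_fraud"].sum()

for col in ["device", "browser"]:
    print(f"{col}: top values cover {fraud_coverage(sessions, col):.0%} of fraud")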

Once you understand the business and the data, and have obtained the variables with the greatest predictive power, it is essential to have tools and processes that ensure the quality of those variables. Training the predictive models with reliable variables and historical data is indispensable: training with low-quality variables could lead to erratic models with major impacts on the business.

After ensuring the reliability of the selected predictor variables, the next step is to construct intelligent predictor variables. Even though the variables selected in the previous steps have a strong relationship with the variable to be predicted (the target), they can cause certain problems during modelling, which is why data preparation is necessary. This preparation involves adapting the variables for use by the algorithm, such as handling nulls or categorical variables. Proper handling of the outliers identified in the previous steps must also be performed, to avoid including information that could distort the model.

With the aim of “tuning” the result, it is similarly of vital importance to apply various transformations to the variables to improve the model’s predictive power. Basic mathematical transformations such as exponentials, logarithms or standardisation, together with more complex transformations such as WoE (Weight of Evidence), make it possible to substantially improve the quality of the predictive models thanks to the use of more highly processed variables, facilitating the task of the model.
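
For readers unfamiliar with WoE: for each category of a variable it compares the distribution of non-fraud (“good”) and fraud (“bad”) cases, WoE = ln(%good / %bad). A minimal pandas sketch, with hypothetical column names and a small smoothing constant of our own choosing, could look like this:

import numpy as np
import pandas as pd

def weight_of_evidence(df: pd.DataFrame, feature: str, target: str) -> pd.Series:
    """WoE per category: ln(share of non-fraud / share of fraud).
    A small constant avoids division by zero for rare categories."""
    eps = 0.5
    counts = pd.crosstab(df[feature], df[target])        # columns: 0 (good), 1 (bad)
    good = (counts[0] + eps) / (counts[0].sum() + eps)
    bad = (counts[1] + eps) / (counts[1].sum() + eps)
    return np.log(good / bad)

# Illustrative data: browser used in the session vs. fraud label.
sessions = pd.DataFrame({
    "browser":  ["chrome", "chrome", "firefox", "unknown", "unknown", "edge"],
    "is_fraud": [0, 0, 0, 1, 1, 0],
})

woe = weight_of_evidence(sessions, "browser", "is_fraud")
sessions["browser_woe"] = sessions["browser"].map(woe)
print(sessions)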

Finally, the modelling stage focuses on pitting different types of algorithms with different hyperparameter configurations against each other to arrive at the model that generates the best prediction. This is where tools such as Spark help enormously, as they make it possible to train different algorithms and configurations quickly, thanks to distributed computing.
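
As a sketch of that last step (not the project’s actual code), a grid search over a gradient-boosted tree classifier in Spark ML could be set up as follows; the input path and the feature and label column names are assumptions:

from pyspark.sql import SparkSession
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.appName("fraud-model-selection").getOrCreate()

# Assumed input: a DataFrame with a vector column "features" and a 0/1 "label".
train = spark.read.parquet("s3://example-bucket/fraud/train/")  # placeholder path

gbt = GBTClassifier(featuresCol="features", labelCol="label")

# Hyperparameter grid to explore.
grid = (ParamGridBuilder()
        .addGrid(gbt.maxDepth, [3, 5, 7])
        .addGrid(gbt.maxIter, [20, 50])
        .build())

evaluator = BinaryClassificationEvaluator(labelCol="label",
                                          metricName="areaUnderPR")

cv = CrossValidator(estimator=gbt,
                    estimatorParamMaps=grid,
                    evaluator=evaluator,
                    numFolds=3,
                    parallelism=4)   # train candidate models in parallel

best_model = cv.fit(train).bestModel
print(best_model.extractParamMap())

Using the area under the precision-recall curve as the selection metric is one common way of keeping the focus on the false positives and negatives mentioned above.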

To keep the application sustainable and avoid model obsolescence, this methodology needs to be repeated monthly for each use case, and even more frequently for an initiative such as bank fraud, because new forms of fraud may arise that are not covered by the trained models. It is therefore important to understand and select the variables with which to retrain the models so that they do not become obsolete over time, which could seriously harm the business.

In summary, a good working methodology is vital when addressing problems within the world of Artificial Intelligence and Advanced Analytics, with phases for understanding the business and the data being essential. Having specialised internal tools to enable these types of projects to be executed in just a few weeks is now a must, to generate quick wins for our clients and their business.

Spying on your Kubernetes with Kubewatch

September 14, 2020 by Bluetab

At Cloud Practice we aim to encourage adoption of the cloud as a way of working in the IT world. To help with this task, we will be publishing articles on good practices and use cases, as well as others covering key cloud services.

This time we will talk about Kubewatch.

What is Kubewatch?

Kubewatch is a utility developed by Bitnami Labs that watches a Kubernetes cluster and sends notifications to various communication systems.

Supported webhooks are:

  • Slack
  • Hipchat
  • Mattermost
  • Flock
  • Webhook
  • Smtp

Kubewatch integration with Slack

The available images are published in the bitnami/kubewatch GitHub repository.

You can download the latest version to test it in your local environment:

$ docker pull bitnami/kubewatch 

Once inside the container, you can play with the options:

$ kubewatch -h

Kubewatch: A watcher for Kubernetes

kubewatch is a Kubernetes watcher that publishes notifications
to Slack/hipchat/mattermost/flock channels. It watches the cluster
for resource changes and notifies them through webhooks.

supported webhooks:
 - slack
 - hipchat
 - mattermost
 - flock
 - webhook
 - smtp

Usage:
  kubewatch [flags]
  kubewatch [command]

Available Commands:
  config      modify kubewatch configuration
  resource    manage resources to be watched
  version     print version

Flags:
  -h, --help   help for kubewatch

Use "kubewatch [command] --help" for more information about a command. 

For what types of resources can you get notifications?

  • Deployments
  • Replication controllers
  • ReplicaSets
  • DaemonSets
  • Services
  • Pods
  • Jobs
  • Secrets
  • ConfigMaps
  • Persistent volumes
  • Namespaces
  • Ingress controllers

When will you receive a notification?

As soon as there is an action on a Kubernetes object, such as creation, destruction or updating.

Configuration

Firstly, create a Slack channel and associate a webhook with it. To do this, go to the Apps section of Slack, search for “Incoming WebHooks” and press “Add to Slack”:

If there is no channel created for this purpose, register a new one:

In this example, the channel to be created will be called “k8s-notifications”. Then configure the webhook in the “Incoming WebHooks” panel by adding a new configuration, where you will need to select the name of the channel to which you want to send notifications. Once selected, the configuration will return a “Webhook URL” that will be used to configure Kubewatch. Optionally, you can select the icon (“Customize Icon” option) that will be shown on the events and the name under which they will arrive (“Customize Name” option).
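
Before pointing Kubewatch at it, you may want to verify the webhook itself. This quick check is not part of the original setup; it simply posts a test message with Python’s requests library (replace the placeholder with the Webhook URL Slack returned):

import requests

# Optional sanity check: post a test message to the Slack Incoming WebHook.
WEBHOOK_URL = "https://hooks.slack.com/services/<your_webhook>"

response = requests.post(
    WEBHOOK_URL,
    json={"text": "Hello from the k8s-notifications channel test"},
)
response.raise_for_status()  # Slack answers 200 with body "ok" when it works
print(response.text)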

You are now ready to configure the Kubernetes resources. There are some example manifests, and also the option of installing via Helm, on the Kubewatch GitHub. However, here we will build our own.

First, create a file “kubewatch-configmap.yml” with the ConfigMap that will be used to configure the Kubewatch container:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kubewatch
data:
  .kubewatch.yaml: |
    handler:
      webhook:
        url: https://hooks.slack.com/services/<your_webhook>
    resource:
      deployment: true
      replicationcontroller: true
      replicaset: false
      daemonset: true
      services: true
      pod: false
      job: false
      secret: true
      configmap: true
      persistentvolume: true
      namespace: false 

You simply need to enable the types of resources on which you wish to receive notifications with “true” or disable them with “false”. Also set the url of the Incoming WebHook registered previously.

Now, for your container to have access to the Kubernetes resources through its API, register the “kubewatch-service-account.yml” file with a Service Account, a Cluster Role and a Cluster Role Binding:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: kubewatch
rules:
- apiGroups: ["*"]
  resources: ["pods", "pods/exec", "replicationcontrollers", "namespaces", "deployments", "deployments/scale", "services", "daemonsets", "secrets", "replicasets", "persistentvolumes"]
  verbs: ["get", "watch", "list"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kubewatch
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kubewatch
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubewatch
subjects:
  - kind: ServiceAccount
    name: kubewatch
    namespace: default 

Finally, create a “kubewatch.yml” file to deploy the application:

apiVersion: v1
kind: Pod
metadata:
  name: kubewatch
  namespace: default
spec:
  serviceAccountName: kubewatch
  containers:
  - image: bitnami/kubewatch:0.0.4
    imagePullPolicy: Always
    name: kubewatch
    envFrom:
      - configMapRef:
          name: kubewatch
    volumeMounts:
    - name: config-volume
      mountPath: /opt/bitnami/kubewatch/.kubewatch.yaml
      subPath: .kubewatch.yaml
  - image: bitnami/kubectl:1.16.3
    args:
      - proxy
      - "-p"
      - "8080"
    name: proxy
    imagePullPolicy: Always
  restartPolicy: Always
  volumes:
  - name: config-volume
    configMap:
      name: kubewatch
      defaultMode: 0755 

You will see that the value of the “mountPath” key is the path inside the container where your ConfigMap’s configuration will be written (/opt/bitnami/kubewatch/.kubewatch.yaml). You can find more information on how to mount configurations in Kubernetes here. In this example, the application is deployed as a single pod; obviously, in a production system you would need to define a Deployment with an appropriate number of replicas to keep it active even if a pod is lost.

Once the manifests are ready, apply them to your cluster:

$ kubectl apply  -f kubewatch-configmap.yml -f kubewatch-service-account.yml -f kubewatch.yml 

The service will be ready in a few seconds:

$ kubectl get pods |grep -w kubewatch

kubewatch                                  2/2     Running     0          1m 

The Kubewatch pod has two associated containers: kubewatch itself and a kubectl proxy, the latter used to connect to the Kubernetes API.

$   kubectl get pod kubewatch  -o jsonpath='{.spec.containers[*].name}'

kubewatch proxy 

Check through the logs that the two containers have started up correctly and without error messages:

$ kubectl logs kubewatch kubewatch

==> Config file exists...
level=info msg="Starting kubewatch controller" pkg=kubewatch-daemonset
level=info msg="Starting kubewatch controller" pkg=kubewatch-service
level=info msg="Starting kubewatch controller" pkg="kubewatch-replication controller"
level=info msg="Starting kubewatch controller" pkg="kubewatch-persistent volume"
level=info msg="Starting kubewatch controller" pkg=kubewatch-secret
level=info msg="Starting kubewatch controller" pkg=kubewatch-deployment
level=info msg="Starting kubewatch controller" pkg=kubewatch-namespace
... 
$ kubectl logs kubewatch proxy

Starting to serve on 127.0.0.1:8080 

You can also access the Kubewatch container to test the CLI, view the configuration, etc.:

$  kubectl exec -it kubewatch -c kubewatch /bin/bash 

Your event notifier is now ready!

Now you need to test it. Let’s use the creation of a deployment as an example to test proper operation:

$ kubectl create deployment nginx-testing --image=nginx
$ kubectl logs -f  kubewatch kubewatch

level=info msg="Processing update to deployment: default/nginx-testing" pkg=kubewatch-deployment 

The logs now alert you that the new event has been detected, so go to your Slack channel to confirm it:

The event has been successfully reported!

Now you can eliminate the test deployment:

$ kubectl delete deploy nginx-testing 

Conclusions

Obviously, Kubewatch does not replace the basic alerting and monitoring systems that every production orchestrator needs, but it does provide an easy and effective way to extend control over the creation and modification of resources in Kubernetes. In this example we configured Kubewatch across the whole cluster, “spying” on all kinds of events. Some of these are of little use if the platform is run as a service: we would be notified of every pod created, removed or updated by each development team in its own namespace, which is common, legitimate and adds no value. It may be more appropriate to filter by the namespaces for which you wish to receive notifications, such as kube-system, which is where administrative services are generally hosted and where only administrators should have access. In that case, you simply need to specify the namespace in your ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kubewatch
data:
  .kubewatch.yaml: |
    namespace: "kube-system"
    handler:
      webhook:
        url: https://hooks.slack.com/services/<your_webhook>
    resource:
      deployment: true
      replicationcontroller: true
      replicaset: false 

Another interesting use may be to “listen” to the cluster after a significant configuration change, such as a new auto-scaling strategy or integration tooling, as Kubewatch will notify us of every scale up and scale down, which can be especially useful at the start. In short, Kubewatch extends control over our clusters, and we decide how far that control reaches. In later articles we will look at how to manage logs and metrics productively.

Introduction to HashiCorp products

August 25, 2020 by Bluetab

Jorge de Diego

Cloud DevOps Engineer

At Cloud Practice we look to encourage the use of HashiCorp products, and to do this we are going to publish themed articles on each of them.

Due to the numerous possibilities their products offer, this article provides an overview; we will go into detail on each product in later publications, presenting unconventional use cases that demonstrate the potential of HashiCorp products.

Why HashiCorp?

HashiCorp has been developing a variety of Open Source products over recent years that offer cross-cutting infrastructure management in cloud and on-premises environments. These products have set standards in infrastructure automation.

Its products now provide robust solutions in the fields of provisioning, security, interconnection and workload orchestration.

The source code for its products is released under the MPL 2.0 licence, which has been well received within the Open Source community (over 100 developers continually provide improvements).

In addition to the products we present in this article, they also have attractive Enterprise solutions.

As regards the impact of its solutions, Terraform has become a market standard. This means HashiCorp is doing things right, so getting to know and learning about its solutions is worthwhile.

Although we focus here on their use in Cloud environments, these solutions are also widely used on-premises; however, they reach their full potential when working in the cloud.

What are their products?

HashiCorp has the following products:

  • Terraform: infrastructure as code.
  • Vagrant: virtual machines for test environments.
  • Packer: automated building of images.
  • Nomad: workload “orchestrator”.
  • Vault: management of secrets and data protection.
  • Consul: management and discovery of services in distributed environments.


We summarise the main product features below. We will go into detail on each of them in later publications.

Terraform has positioned itself as the most widespread product in the field of provisioning of Infrastructure as Code.

It uses a specific language (HCL) to deploy infrastructure through the use of various Cloud providers. Terraform also lets you manage resources from other technologies or services, such as GitLab or Kubernetes.

Terraform uses a declarative language and ensures that what is deployed is exactly what is described in the code.

The typical example used to illustrate the difference from the procedural paradigm followed by, for example, Ansible is:

Initially we want to deploy 5 EC2 instances on AWS:

Ansible:

- ec2:
    count: 5
    image: ami-id
    instance_type: t2.micro 

Terraform:

resource "aws_instance" "ejemplo" {
  count         = 5
  ami           = "ami-id"
  instance_type = "t2.micro"
} 

As you can see, there are virtually no differences between the coding. Now we need to deploy two more instances to cope with a heavier workload:

Ansible:

- ec2:
    count: 2
    image: ami-id
    instance_type: t2.micro 

Terraform:

resource "aws_instance" "ejemplo" {
  count         = 7
  ami           = "ami-id"
  instance_type = "t2.micro"
} 

At this point we can see the difference: in Ansible we would set the number of instances to deploy to 2 so that a total of 7 are deployed, whereas in Terraform we would set the number directly to 7, and Terraform works out that it needs to deploy 2 more because 5 are already deployed.

Another important aspect is that Terraform does not need a master server, as Chef or Puppet do. It is a distributed tool whose common, centralised element is the tfstate file (explained in the next paragraph). It can be launched from anywhere with access to the tfstate (which can be remote, stored in shared storage such as AWS S3) and with Terraform installed, which is distributed as a binary downloadable from the HashiCorp website.

The last point to note about Terraform is that it relies on a file called tfstate, in which the information about infrastructure status is progressively stored and updated, and which it consults to decide whether changes need to be made to the infrastructure. It is very important to understand that this state is all Terraform knows: it will not connect to AWS to see what is actually deployed. This means you must avoid making manual changes to infrastructure deployed by Terraform (never manual changes, ever), as the tfstate file will not be updated and inconsistencies will be created.

Vagrant enables you to deploy test environments locally, quickly and very simply, based on code. By default it is used with VirtualBox, but it is compatible with other providers such as VMware or Hyper-V. Machines can be deployed on Cloud providers such as AWS by installing plug-ins. I personally cannot find advantages in using Vagrant for this function.

The machines it deploys are based on boxes, and you indicate in the code which box you want to deploy. Its way of operating can be compared with Docker in this respect, but the underlying basis is completely different: Vagrant creates virtual machines using a virtualisation tool underneath (VirtualBox), while Docker deploys containers on top of containerisation technology (Vagrant versus Docker).

Typical use cases include testing Ansible playbooks, recreating a laboratory environment locally in just a few minutes, giving a demo, etc.

Vagrant is based on a Vagrantfile. From the directory where this Vagrantfile is located, running vagrant up makes the tool start deploying what is described in that file.

The steps to launch a virtual machine with Vagrant are:

1. Install Vagrant. It is distributed as a binary file, like all other HashiCorp products.

2. With an example Vagrantfile:

Vagrant.configure("2") do |config|
  config.vm.box = "gbailey/amzn2"
end 

PS: the parameter “2” within configure refers to the Vagrant configuration version, which I had to check for this post.

3. Run within the directory where the Vagrantfile is located

vagrant up 

4. Enter the machine using ssh

vagrant ssh 

5. Destroy the machine

vagrant destroy 

Vagrant manages ssh access by creating a private key and storing it in the .vagrant directory, apart from other metadata. The machine can be accessed by running vagrant ssh. The machines deployed can also be displayed in the VirtualBox application.

Packer lets you automate the building of machine images. It can be used, for example, to build an image on a Cloud provider such as AWS with an initial configuration already applied, so that it can be deployed any number of times. This means you only need to provision it once; when you deploy an instance from that image, it already has the desired configuration, with no need to spend more time provisioning it.

A simple example would be:

1. Install Packer. Similarly, it is a binary that will need to be placed in a folder located on our path.

2. Create a file, e.g. builder.json. A small script is also created in bash (the link shown in builder.json is dummy):

{
  "variables": {
    "aws_access_key": "{{env `AWS_ACCESS_KEY_ID`}}",
    "aws_secret_key": "{{env `AWS_SECRET_ACCESS_KEY`}}",
    "region": "eu-west-1"
  },
  "builders": [
    {
      "access_key": "{{user `aws_access_key`}}",
      "ami_name": "my-custom-ami-{{timestamp}}",
      "instance_type": "t2.micro",
      "region": "{{user `region`}}",
      "secret_key": "{{user `aws_secret_key`}}",
      "source_ami_filter": {
        "filters": {
            "virtualization-type": "hvm",
            "name": "amzn2-ami-hvm-2*",
            "root-device-type": "ebs"
        },
        "owners": ["amazon"],
        "most_recent": true
      },
      "ssh_username": "ec2-user",
      "type": "amazon-ebs"
    }
  ],
  "provisioners": [
    {
        "type": "shell",
        "inline": [
            "curl -sL https://raw.githubusercontent.com/example/master/userdata/ec2-userdata.sh | sudo bash -xe"
        ]
    }
  ]
} 

Provisioners use built-in and third-party software to install and configure the machine image after booting. Our example ec2-userdata.sh:

yum install -y \
  python3 \
  python3-pip

pip3 install cowsay 

3. Run:

packer build builder.json 

And we would already have an AMI provisioned with cowsay installed. Now it will never be necessary, as a first step after launching an instance, to install cowsay, because we will already have it in the base image, as it should be.

As you would expect, Packer not only works with AWS, but also with any Cloud provider such as Azure or GCP. It also works with VirtualBox and VMware and a long list of builders. You can also create an image by nesting builders in the Packer configuration file that is the same for various Cloud providers. This is very interesting for multi-cloud environments, where different configurations need to be replicated.

Nomad is a workload orchestrator. Through its Task Drivers it executes tasks in isolated environments. The most common use case is orchestrating Docker containers. It has two basic actors: Nomad Server and Nomad Client. Both are run with the same binary but different configurations: Servers organise the cluster, while Clients run the tasks.

A little “Hello world” in Nomad could be:

1. Download and install Nomad and run it (in development mode; this mode must never be used in production):

nomad agent -dev 

Once this has been done, you can access the Nomad UI at localhost:4646

2. Run an example job that will launch a Grafana image

nomad job run grafana.nomad 

grafana.nomad:

job "grafana" {
    datacenters = [
        "dc1"
    ]

    group "grafana" {
        task "grafana"{
            driver = "docker"

            config {
                image = "grafana/grafana"
            }

            resources {
                cpu = 500
                memory = 256

            network {
                mbits = 10

                port "http" {
                    static = "3000"
                }
            }
            }
        }
    }
} 

3. Access localhost:3000 to check that you can reach Grafana, and access the Nomad UI (localhost:4646) to view the job

4. Destroy the job

nomad stop grafana 

5. Stop the Nomad agent. If it has been run as indicated here, pressing Control-C on the terminal where it is running will suffice

In short, Nomad is a very light orchestrator, unlike Kubernetes. Logically, it does not have the same features as Kubernetes, but it offers the ability to run workloads with high availability easily and without needing a large amount of resources. We will look at more detailed examples in the future post on Nomad.

Vault manages secrets. Users or applications can request secrets or identities through its API. These users are authenticated in Vault, which has a connection to a trusted identity provider such as an Active Directory, for example. Vault works with two types of actors, like the other tools mentioned, Server and Agent.

When Vault starts up, it starts in a sealed state (Seal) and several keys are generated for unsealing it (Unseal). A real use case would be too long for this article, so we will look at it in detail in the dedicated article on Vault. However, you can download it and test it in dev mode, as noted previously for Nomad. In this mode Vault starts up unsealed, stores everything in memory without requiring authentication or TLS, and uses a single unseal key. This lets you play with its features and explore it quickly and easily.

See this link for further information.

1. Install Vault. Like the rest of HashiCorp’s products, it is distributed as a binary through its website. Once done, launch (this mode must never be used in production):

vault server -dev 

2. Vault will display the root token used to authenticate against Vault, as well as the single unseal key it creates. Vault is now accessible via localhost:8200

3. In another terminal, export the following variables to be able to conduct tests:

export VAULT_ADDR='http://127.0.0.1:8200'
export VAULT_DEV_ROOT_TOKEN_ID="token-showed-when-running-vault-dev" 

4. Create a secret (in the terminal where the above variables were exported)

vault kv put secret/test supersecreto=iewdubwef293bd8dn0as90jdasd 

5. Recover that secret

vault kv get secret/test 

These steps are the Hello world of Vault, to check its operation. We will look in detail at the operation and installation of Vault in the article on it, as well as at further features such as the policies, exploring the UI, etc.
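
Since applications usually consume secrets through Vault’s HTTP API rather than the CLI, here is a minimal sketch of the same put/get using the hvac Python client against the dev server above. It assumes the KV v2 engine mounted at secret/ (the dev-mode default) and the environment variable exported in step 3; it is an illustration, not part of the original walkthrough.

import os
import hvac

# Connect to the dev server started above, authenticating with the root token.
client = hvac.Client(
    url="http://127.0.0.1:8200",
    token=os.environ["VAULT_DEV_ROOT_TOKEN_ID"],
)
assert client.is_authenticated()

# Write the same secret as `vault kv put secret/test ...` (KV v2 engine).
client.secrets.kv.v2.create_or_update_secret(
    path="test",
    secret={"supersecreto": "iewdubwef293bd8dn0as90jdasd"},
)

# Read it back, equivalent to `vault kv get secret/test`.
read = client.secrets.kv.v2.read_secret_version(path="test")
print(read["data"]["data"]["supersecreto"])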

Consul is used to create a network of services. It operates in a distributed manner, and a Consul client runs wherever there are services you want to register. Consul includes health checking of services, a key-value database and features to secure network traffic. Like Nomad and Vault, Consul has two primary actors: Server and Agent.

Just as we did with Nomad and Vault, we are going to run Consul in dev mode to show a small example defining a service:

1. Install Consul. Create a working folder and within it create a json file with a service to be defined (example folder name: consul.d).

web.json file (HashiCorp website example):

{
  "service": {
    "name": "web",
    "tags": [
      "rails"
    ],
    "port": 80
  }
} 

2. Run the Consul agent by pointing it to the configuration directory created before where web.json is located (this mode must never be used in production):

consul agent -dev -enable-script-checks -config-dir=./consul.d 

3. From now on, although there is nothing actually running on the indicated port (80), a service has been registered with a DNS name that can be queried to access it. Its name will be NAME.service.consul; in our case NAME is web. Run the following to query it:

dig @127.0.0.1 -p 8600 web.service.consul 

Note: Consul’s DNS interface listens on port 8600 by default.

Similarly, you can dig for a record of SRV type:

dig @127.0.0.1 -p 8600 web.service.consul SRV 

You can also query through the service tags (TAG.NAME.service.consul):

dig @127.0.0.1 -p 8600 rails.web.service.consul 

And you can query through the API:

curl http://localhost:8500/v1/catalog/service/web 
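
The same catalog query can also be made from application code. As an illustration (not part of the original walkthrough), a small Python script using requests against the endpoint above could print each registered instance of the service:

import requests

# Query Consul's catalog for the "web" service registered above.
response = requests.get("http://localhost:8500/v1/catalog/service/web")
response.raise_for_status()

for instance in response.json():
    # Each entry describes one registered instance of the service.
    print(
        instance["ServiceName"],
        instance.get("ServiceAddress") or instance["Address"],
        instance["ServicePort"],
        instance["ServiceTags"],
    )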

Final note: Consul is used as a complement to other products, since it registers services provided by other tools already deployed. That is why this example is not particularly illustrative on its own. Apart from the dedicated article on Consul, it will appear as an example in other articles, such as the one on Nomad.

Future articles will also explain Consul Service Mesh for interconnecting services via sidecar proxies, the federation of Nomad datacentres in Consul to perform deployments, and intentions, which define the rules under which one service may communicate with another.

Conclusions

All these products have great potential in their respective areas and offer cross-cutting management that can help to avoid the famous vendor lock-in by Cloud providers. But the most important thing is their interoperability and compatibility.

Jorge de Diego
Cloud DevOps Engineer

My name is Jorge de Diego and I specialise in Cloud environments. I usually work with AWS, although I have knowledge of GCP and Azure. I joined Bluetab in September 2019 and have been working on Cloud DevOps and security tasks since then. I am interested in everything related to technology, especially security models and Infrastructure fields. You can spot me in the office by my shorts.
