Kylo Setup for Data Lake Management


Kylo is a feature-rich data lake platform built on Apache Hadoop and Apache Spark. It provides a data lake solution enabling self-service data ingest, data preparation, and data discovery. It integrates best practices around metadata capture, security, and data quality, and it contains many special-purpose routines for data lake operations leveraging Apache Spark and Apache Hive.

Furthermore, it provides a flexible data processing framework (leveraging Apache NiFi) for building batch or streaming pipeline templates and for enabling self-service features without compromising governance requirements. It has an integrated metadata server currently compatible with databases such as MySQL and Postgres, and it can be integrated with Apache Ranger or Sentry for security and with Cloudera Navigator or Ambari for cluster monitoring.

Kylo’s web application layer offers features oriented to business users and IT operations personnel, including data analysts, data stewards, and data scientists. It utilizes Apache NiFi as its scheduler and orchestration engine, providing an integrated framework for designing new types of pipelines with 200+ processors (data connectors and transforms).


Installing Prerequisites

  • Install MySQL (password: hadoop).
    Optional: change the “bind-address” setting in the /etc/mysql/my.cnf file and restart MySQL to enable access from outside the server.
  • Ensure that you have root privileges on “/opt/”, since all tools below are installed there.
  • Download Java 8 and extract it to /opt/java8.
    Source: wget --no-check-certificate --no-cookies --header “Cookie: oraclelicense=accept-securebackup-cookie” -P /opt/
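The MySQL step above can be sketched as follows. The bind address value 0.0.0.0 is an assumption shown only to illustrate opening remote access; pick whatever your network actually requires.

```shell
# Install MySQL; set the root password to 'hadoop' when prompted.
sudo apt-get install -y mysql-server
# Optional: allow connections from outside the server.
# The 0.0.0.0 value is an assumption for illustration, not from the original guide.
sudo sed -i 's/^bind-address.*/bind-address = 0.0.0.0/' /etc/mysql/my.cnf
sudo service mysql restart
```

These commands require root privileges and network access, so run them on the target machine rather than copying blindly.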

Note: Ensure that JDK 1.8.92 or above is configured; otherwise, the “kylo-alerts-default” module will not compile.

  • Download Scala and extract it into /opt/scala2.
    Source: wget -P /opt/
    wget -P /opt/
  • Download Spark 2 and extract it into /opt/spark2.
    Source: wget -P /opt/
  • Download Maven 3 (binary distribution) and extract it into /opt/maven3.
    Source: wget -P /opt/
  • For the Maven build, install Alien on Ubuntu to support RPM packaging; the build will then produce both RPM and deb packages.
  • Set environment variables in ~/.bashrc & “/etc/profile (for all users)” file.
    • JAVA_HOME=/opt/java8
    • JRE_HOME=/opt/java8/jre
    • SCALA_HOME=/opt/scala2
    • SPARK_HOME=/opt/spark2
    • MAVEN_HOME=/opt/maven3
    • M2_HOME=/opt/maven3
  • Open a new session in PuTTY, or source the profile files, to load the added environment variables.
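The environment variables listed above can be appended to ~/.bashrc and /etc/profile as shown below; the PATH additions are an assumption (not listed in the original steps) but are needed for the tools to be found on the command line.

```shell
# Environment variables for the tools installed under /opt.
# Append these lines to ~/.bashrc and to /etc/profile (for all users).
export JAVA_HOME=/opt/java8
export JRE_HOME=/opt/java8/jre
export SCALA_HOME=/opt/scala2
export SPARK_HOME=/opt/spark2
export MAVEN_HOME=/opt/maven3
export M2_HOME=/opt/maven3
# PATH entries are an assumption so the binaries are reachable without full paths:
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$MAVEN_HOME/bin
# To load the variables without opening a new session:
# source ~/.bashrc
```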

Test Configuration

  • Check whether Java, Scala, and Maven are properly configured.
  • Check whether Spark is properly configured.
  • Note: Move all the downloaded tar files into another directory called “tar_files”.
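The configuration checks above can be scripted in one pass; on a machine where a tool is not yet on the PATH, it is reported as missing instead of failing the script.

```shell
# Report the version of each required tool, or flag it as missing from the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: $("$1" "$2" 2>&1 | head -n 1)"
  else
    echo "$1: NOT FOUND"
  fi
}
check_tool java -version
check_tool scala -version
check_tool mvn -version
check_tool spark-shell --version

# Tidy up the downloaded archives as noted above (paths assumed):
# mkdir -p /opt/tar_files && mv /opt/*.tgz /opt/*.tar.gz /opt/tar_files/
```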

Building, Installing, and Setting up Kylo using Deb Package in Linux Ubuntu Machine

Downloading Kylo from GitHub

  • Download Kylo from the GitHub location provided in the Reference section.
  • Extract the zip file using unzip.
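A sketch of the extract step; the archive name kylo-master.zip is an assumption based on GitHub's default naming for a branch download, so substitute the file you actually downloaded.

```shell
cd /opt
# Archive name is an assumption; use the file actually downloaded from GitHub.
unzip kylo-master.zip
cd kylo-master
```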

Executing Maven to Create Deb

  • Create the deb using the below command.
    It will take around 10-20 minutes to download packages.
  • Clean and compile all class files, and package all modules (core, UI, service, setup) into RPM & deb packages using the below command:
  • Skip unit testing for faster Maven builds using the below command:
  • If you already have downloaded packages, run MVN in offline mode using the below command:

    Note: “mvn clean install” will create both RPM & deb packages. To build only one package, go to install module (/opt/kylo-master/install/) and execute the below command after building all other modules:
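The Maven invocations referenced in the steps above use standard Maven flags; they are collected here as a sketch (the source tree path follows the extraction step earlier).

```shell
cd /opt/kylo-master

# Full build: clean, compile, and package every module into RPM + deb packages
# (the first run downloads dependencies and can take 10-20 minutes):
mvn clean install

# Skip unit tests for a faster build:
mvn clean install -DskipTests

# Offline mode, once all dependencies are already in the local repository:
mvn clean install -DskipTests -o

# Build only the install module after all other modules have been built:
cd install && mvn clean install
```

Requires Maven and the Kylo source tree, so these commands are illustrative rather than directly runnable here.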

Copying Deb

Copy the deb from “/opt/kylo-master/install/target/deb/kylo-x.x.x-SNAPSHOT.deb” to “/opt/kylo/setup” using the below command:
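A sketch of the copy step; the glob avoids hard-coding the x.x.x version in the filename.

```shell
# Create the target directory if it does not exist yet, then copy the built deb.
sudo mkdir -p /opt/kylo/setup
sudo cp /opt/kylo-master/install/target/deb/kylo-*-SNAPSHOT.deb /opt/kylo/setup/
```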

Creating Users and Their Groups

  • Create the following users:
    useradd -r -m -s /bin/bash nifi
    useradd -r -m -s /bin/bash kylo
    useradd -r -m -s /bin/bash activemq
    useradd -r -m -s /bin/bash elasticsearch
  • Check whether groups were created for the above users in “/etc/group”.
  • If not, create groups for the users by executing the below command:
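If any of the groups are missing, they can be created and assigned with the sketch below (a plausible form of the elided command, not the exact one from the original guide).

```shell
# Create a matching group for any user that lacks one, then make it their
# primary group.
for u in nifi kylo activemq elasticsearch; do
  groupadd -f "$u"       # -f: succeed quietly if the group already exists
  usermod -g "$u" "$u"
done
```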

Installing kylo.deb

Install kylo.deb, which packages the whole setup, using the below command:
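A sketch of the install step, again using a glob over the snapshot version:

```shell
# Install the packaged Kylo setup; requires root privileges.
sudo dpkg -i /opt/kylo/setup/kylo-*-SNAPSHOT.deb
```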

Downloading Binary Files

To download all required binary files (JDK, Elasticsearch, ActiveMQ, Apache NiFi) locally, run the below script:

These files will be added to the below directories with different user privileges.

  • Directory: /opt/kylo/
  • Directory: /opt/kylo/setup

Setting up Binary Files

  • Run the below script to set up the JDK, Elasticsearch, ActiveMQ, and NiFi.
  • To run the setup in offline mode (using the binaries downloaded earlier), run the below script:


  • Before executing the above script, ensure that SPARK_HOME is set.
  • During setup, perform the following:
    • Choose MySQL and carefully provide connection details (host: localhost, username: root, password: hadoop).
    • Enter “y” three times to install Elasticsearch, ActiveMQ, and NiFi.
    • Choose Java option [3] and provide home “/opt/java8”.
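The setup steps above are driven by the wizard script that Kylo ships in its setup directory; the invocation below follows the Kylo installation documentation (the -o offline flag included).

```shell
cd /opt/kylo/setup
# Interactive setup of the JDK, Elasticsearch, ActiveMQ, and NiFi:
sudo ./setup-wizard.sh
# Offline mode, using the binaries downloaded locally beforehand:
# sudo ./setup-wizard.sh -o
```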

Once the setup wizard is completed, the below services will be added:

  • ActiveMQ
  • Elasticsearch
  • kylo-service
  • kylo-spark-shell
  • kylo-ui
  • NiFi

Note: Manually install and start a service if it was not installed by the wizard.
For example, NiFi: cd /opt/nifi/current/bin/; ./nifi.sh start

  • Check whether the tables were created in the MySQL database. If not, execute the below command:
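The table check can be done from the MySQL client; the schema-setup script path and arguments shown in the comment are assumptions (they vary by Kylo version), so verify them against your installation before running.

```shell
# Verify that the kylo schema and its tables were created:
mysql -u root -phadoop -e "SHOW TABLES IN kylo;"
# If they are missing, run the SQL setup script bundled with the Kylo setup.
# Path and arguments below are assumptions; check your Kylo version:
# sudo /opt/kylo/setup/sql/mysql/setup-mysql.sh localhost root hadoop
```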

Starting Server

To start the server (kylo-ui, kylo-services, kylo-spark-shell), execute the below script:
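The start script below is the one Kylo installs at the top of its install directory, per the Kylo documentation:

```shell
# Starts kylo-ui, kylo-services, and kylo-spark-shell in one go.
sudo /opt/kylo/start-kylo-apps.sh
```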

Checking Service Status

  • Check the status of all services using the below script:
  • Run the below script to check all Kylo services:
  • Run the below script to check the NiFi service:
  • Run the below script to check the ActiveMQ service:
  • Run the below script to check the Elasticsearch service:
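The status checks above map onto the following commands (the kylo-service wrapper is from the Kylo documentation; the others use the standard service manager):

```shell
# All Kylo services (kylo-ui, kylo-services, kylo-spark-shell) at once:
sudo kylo-service status
# Individual supporting services:
sudo service nifi status
sudo service activemq status
sudo service elasticsearch status
```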

Accessing UI

  • Open the Kylo UI by accessing the URL: http://{IP}:8400/
  • Log in with the default credentials (username: dladmin, password: thinkbig).


Troubleshooting

ActiveMQ is not Running

Problem: ActiveMQ is not running and shows an error.

This problem occurs because ActiveMQ reads JAVA_HOME from whichever of the below files it finds first, even if the variable is defined in /etc/environment:

  • /etc/default/activemq
  • $HOME/.activemqrc
  • $INSTALLDIR/apache-activemq-/bin/env

Solution: Add “JAVA_HOME=/opt/java8” in the first line of the file “/etc/default/activemq” and start it.
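The fix can be scripted as below; it is demonstrated on a scratch copy so the edit is visible, with the real commands shown in comments.

```shell
# Demonstrated on a scratch copy; apply the same edit to /etc/default/activemq.
demo=/tmp/activemq-defaults-demo
printf 'ACTIVEMQ_USER="activemq"\n' > "$demo"   # stand-in for the existing file
sed -i '1i JAVA_HOME=/opt/java8' "$demo"        # insert as the first line
head -n 1 "$demo"
# On the real system:
# sudo sed -i '1i JAVA_HOME=/opt/java8' /etc/default/activemq
# sudo service activemq start
```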

Elasticsearch is not Running

Problem: Elasticsearch is not running and shows an error when trying to start.

This problem occurs because JAVA_HOME is set only in the “root” user’s environment, while Elasticsearch runs as the “elasticsearch” user.

Solution: Add “JAVA_HOME=/opt/java8” in the first line of the file “/etc/init.d/elasticsearch” and restart it.

Alternatively, install Java using apt-get. On Ubuntu or Debian, the package ships with OpenJDK due to licensing restrictions. To fix the Java path problem, run the below command:
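A plausible form of the elided command sequence, assuming Ubuntu's OpenJDK 8 package:

```shell
# Install a system-wide JDK so the elasticsearch user can find Java.
sudo apt-get install -y openjdk-8-jdk
# Pick the desired JDK if several are installed:
sudo update-alternatives --config java
sudo service elasticsearch restart
```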

Kylo-spark-shell is not Running

Problem: Kylo-spark-shell is not running and shows the below error in the file “/var/log/kylo-services/kylo-spark-shell.err”.

Solution: Add environment variables such as JAVA, Spark, and Scala (if possible, all variables) in “/etc/profile”, and make sure that they are set for all users.

Kylo-alerts-default is not Compiling

Problem: Kylo-alerts-default is not compiling and throws an error.

Solution: Make sure that JDK 1.8.92 or above is configured; otherwise, the “kylo-alerts-default” module will not compile.

Integrating with Hortonworks

  • Log in to the NameNode server and execute the below commands to add users into HDFS:
    • For the kylo-service node
    • For the NameNode / master node
  • Change the metastore configuration in the properties file under “/opt/kylo/kylo-services/conf/”.
    • hive.datasource.url=jdbc:hive2://xxxxxxxx:10000/default
    • hive.datasource.username=hive
    • hive.datasource.password=hive
    • nifi.service.hive_thrift_service.database_connection_url=jdbc:hive2://xxxxxxxx:10000/default
  • Restart server.
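A sketch of the HDFS user step above; the /user directory layout is an assumption (the common Hadoop convention), so adjust it to your cluster.

```shell
# On the NameNode, create HDFS home directories for the kylo and nifi users.
# Paths follow the usual /user/<name> convention, which is an assumption here.
sudo -u hdfs hdfs dfs -mkdir -p /user/kylo /user/nifi
sudo -u hdfs hdfs dfs -chown kylo:kylo /user/kylo
sudo -u hdfs hdfs dfs -chown nifi:nifi /user/nifi
```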


Conclusion

Kylo is a feature-rich data lake platform built on Apache Hadoop and Apache Spark, and with the steps above you can now successfully set it up.

In the upcoming blog, we will discuss changing the configurations of NiFi components (for example, HiveThriftConnection) in existing templates and creating new templates.