Author: Treselle Engineering

The Beginner’s Guide to Remote Work & Running a Remote Team

The year 2020 will be known for a lot of things, not least of which is the rise of remote work. Companies like Facebook, Microsoft, Square and Spotify are all examples of huge companies transitioning to remote work, even beyond the forced “work from home” period enforced by mandated social distancing. Remote work, location-independent work, […]

9 Features of a High-Performing Leave Management System

In this guide, we’ll discuss nine aspects to consider if you want to create the best leave management system for your team. We’ll also talk about why leave management is important, and why it’s a good idea to use leave tracking software. But before we jump in, let’s briefly cover what is the purpose […]

A Complete Guide to Leave Management for Remote Teams

As a business owner or HR manager, not tracking your employee vacation calendar can lead to a stressful situation down the road. You might end up in a position where too many employees take a leave of absence at the same time, causing you to miss client deadlines. Or it might result in your customers […]

How to Create & Manage a Vacation Calendar for Remote Teams

Vacation days are a vital part of employee wellness. There’s nothing like a few days off from work to recharge and rejuvenate morale. Well-rested employees tend to exceed expectations, enjoy their work, and create a positive work atmosphere. With remote work on the rise, the question becomes: how do you manage vacation days for […]

Text Normalization with Spark – Part 2

Overview: This is the second in a two-part series on Text Normalization using Spark. In this blog post, we explore the jargon of Apache Spark (jobs, stages, and executors) through a Text Normalization application, using the Spark history server UI. To get a better understanding of the use case, please refer to our Text Normalization […]

Importing and Analyzing Data in Datameer

Overview: Datameer, an end-to-end big data analytics platform, is built on Apache Hadoop to perform integration, analysis, and visualization of massive volumes of both structured and unstructured data. It can be rapidly integrated with any data source, new or existing, to deliver an easy-to-use, cost-effective, and sophisticated solution for big data […]

Kylo Setup for Data Lake Management

Overview: Kylo is a feature-rich data lake platform built on Apache Hadoop and Apache Spark. It provides a data lake solution enabling self-service data ingest, data preparation, and data discovery. It integrates best practices around metadata capture, security, and data quality. It contains many special-purpose routines for data lake operations leveraging Apache Spark and Apache […]

Apache Spark Performance Tuning – Degree of Parallelism

Overview: This is the third article of a four-part series about Apache Spark on YARN. Apache Spark allows developers […]

Embrace Relationships with Neo4J, R & Java

Introduction: Graphs are everywhere, used by everyone, for everything. Neo4j is one of the most popular […]

Customer Churn – Logistic Regression with R

Overview: In the customer management lifecycle, customer churn refers to a customer’s decision to end the business relationship. It is also referred to as loss of clients […]