Category Archives: Technology

JavaScript – Loop promises one by one with a sleep in between

//Async task
let asyncTask = (input) => {
  //Below is a sample
  return new Promise((resolve, reject) => {
    console.log('Task no: ' + input);
    resolve();
  });
};

//Loop function that throttles async tasks - one by one, with a sleep interval in between.
let loop = (count, interval, onCompletion) => {
  //Execute the async task and recurse while count > 0
  asyncTask(count).then(
    response => {
      count--;
      if (count) {
        setTimeout(() => loop(count, interval, onCompletion), interval);
      } else {
        onCompletion();
      }
    },
    error => {
      console.log('Some error happened.');
    });
};

//This function is called back after looping through all async tasks.
let onCompletion = () => {
  console.log('Completed');
};

//Run the loop
let loopCount = 5;
let loopInterval = 1000;
loop(loopCount, loopInterval, onCompletion);
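On newer runtimes the same throttled loop can be written more directly with async/await. Below is a hedged sketch (it assumes Node.js 8+; `runSequentially` and `sleep` are names I made up for illustration, and a shorter demo interval is used):

```javascript
// Sleep helper: a promise that resolves after ms milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run tasks one by one, sleeping between them, and collect the results.
const runSequentially = async (count, interval) => {
  const results = [];
  for (let i = count; i > 0; i--) {
    // Each task starts only after the previous one has resolved.
    results.push(await Promise.resolve('Task no: ' + i));
    if (i > 1) await sleep(interval); // sleep between tasks, not after the last
  }
  return results;
};

runSequentially(5, 100).then(() => console.log('Completed'));
```

The `for` loop plus `await` replaces the explicit recursion and the separate completion callback.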

Enabling JWT authentication for plugin routes in HapiJS APIs

If you are securing your HapiJS APIs using JWT, below is the code snippet most tutorials suggest:

server.register([
	{ register: require('hapi-auth-jwt') },
	{ register: require('./routes/test-route') }
	],
	(err) => {
		if (err) {
			console.error('Failed to load a plugin:', err);
		} else {
			//For JWT
			server.auth.strategy('token', 'jwt', {
				key: new Buffer(process.env.AUTH_CLIENT_SECRET, 'base64'),
				verifyOptions: {
					algorithms: ['HS256'],
					audience: process.env.AUTH_CLIENT_ID
				}
			});

			//For testing
			server.route({
				method: 'GET',
				path: '/',
				config: { auth: 'token' },
				handler: function (request, reply) {
					reply('API server running hapi and secure!');
				}
			});
		}
	}
);

//Server start
server.start((err) => {
	if (err) {
		throw err;
	}
	console.log(`Server running at: ${server.info.uri}`);
});

In the “GET /” route, the config auth: ‘token’ specifies that the ‘token’ JWT auth strategy should be applied.
However, a problem might arise when you want to include a route from a plugin – let’s say a “GET /test” route needs to be added from ./routes/test-route.js.
In test-route.js, when I added config: {auth: ‘token’} under “GET /test”, Hapi complains “Error: Unknown authentication strategy token in /test”. This is because the auth strategy “token” is defined externally in server.js (if that’s your entry point).

The solution is to specify server.auth.default(‘token’); in your entry point, or server.js. With this configuration, we don’t need to specify config: {auth: ‘token’} under each route. If we want to exclude a route from authentication, we can specify config: {auth: false} under that route.

The solution looks like this:

server.register([
	{ register: require('hapi-auth-jwt') },
	{ register: require('./routes/test-route') }
	],
	(err) => {
		if (err) {
			console.error('Failed to load a plugin:', err);
		} else {
			//For JWT
			server.auth.strategy('token', 'jwt', {
				key: new Buffer(process.env.AUTH_CLIENT_SECRET, 'base64'),
				verifyOptions: {
					algorithms: ['HS256'],
					audience: process.env.AUTH_CLIENT_ID
				}
			});

			//This enables auth for routes under plugins too.
			server.auth.default('token');

			//For testing - auth included by default
			server.route({
				method: 'GET',
				path: '/',
				handler: function (request, reply) {
					reply('API server running hapi and secure!');
				}
			});

			//For testing - auth excluded through config
			server.route({
				method: 'GET',
				path: '/open',
				config: { auth: false },
				handler: function (request, reply) {
					reply('API server running hapi!');
				}
			});
		}
	}
);

//Server start
server.start((err) => {
	if (err) {
		throw err;
	}
	console.log(`Server running at: ${server.info.uri}`);
});

Ionic 2, AngularFire 2 and Firebase 3

  1. Create a new Ionic 2 project:
    ionic start example-ionic blank --v2
  2. Install Firebase 3 and AngularFire 2
    cd example-ionic
    npm install angularfire2 firebase --save
    typings install file:node_modules/angularfire2/firebase3.d.ts --save --global
  3. In app.ts
    import {Component} from '@angular/core';
    import {Platform, ionicBootstrap} from 'ionic-angular';
    import {StatusBar} from 'ionic-native';
    import {HomePage} from './pages/home/home';
    
    import {
     defaultFirebase,
     FIREBASE_PROVIDERS
    } from 'angularfire2';
    
    const COMMON_CONFIG = {
     apiKey: "YOUR_API_KEY",
     authDomain: "YOUR_FIREBASE.firebaseapp.com",
     databaseURL: "https://YOUR_FIREBASE.firebaseio.com",
     storageBucket: "YOUR_FIREBASE.appspot.com"
    };
    
    @Component({
     template: '<ion-nav [root]="rootPage"></ion-nav>',
     providers: [
     FIREBASE_PROVIDERS,
     defaultFirebase(COMMON_CONFIG)
     ]
    })
    export class MyApp {
     rootPage: any = HomePage;
    
     constructor(platform: Platform) {
     platform.ready().then(() => {
     // Okay, so the platform is ready and our plugins are available.
     // Here you can do any higher level native things you might need.
     StatusBar.styleDefault();
     });
     }
    }
    
    ionicBootstrap(MyApp);
  4. In home.ts
    import {Component} from '@angular/core';
    import {NavController} from 'ionic-angular';
    
    import {
     AngularFire,
     FirebaseObjectObservable
    } from 'angularfire2';
    
    @Component({
     templateUrl: 'build/pages/home/home.html'
    })
    export class HomePage {
     item: FirebaseObjectObservable<any>;
    
     constructor(private navCtrl: NavController, af: AngularFire) {
     this.item = af.database.object('/item');
     }
    }
  5. In home.html
    <ion-header>
     <ion-navbar>
     <ion-title>
     Ionic Blank
     </ion-title>
     </ion-navbar>
    </ion-header>
    
    <ion-content padding>
     The world is your oyster.
     <p>
     {{ (item | async)?.name }}
     </p>
    </ion-content>
  6. In your Firebase console, make sure you have an object under /item with a “name” property. This is what we load in our example code above.
  7. Test by running the app.
    ionic serve

Become DevOps overnight: Continuous deployment for your scalable cloud app.

Some of the things we hate spending time on during development are setting up environments, building and deploying. The good news is that nowadays there are plenty of tools to solve this. In this post I would like to share a very quick way of becoming DevOps overnight and automating all the boring parts of getting your product running seamlessly as you develop.

Snapshot of my Tutum Services after setting up continuous deployment

#1. IaaS, PaaS, SaaS and tech stack decisions

At the start of our project we had to decide what our tech stack was going to be – our philosophy was to use IaaS for any stateless processes or jobs, like API servers or event processors. For persistence alone we decided to go with SaaS solutions. We picked NodeJS for APIs and Java / Python for daemon processes. Being part of Microsoft BizSpark, we run all these processes on Azure Linux instances. For temporary persistence, we found AWS pretty good performance- and price-wise and picked Kinesis + DynamoDB. S3 was chosen for long-term storage. The strategy was to be able to easily swap cloud service providers at any point in the future, with almost no tight coupling to any vendor.

#2. Local development

Local development has to be as fast as possible – personally I find that using Docker in the early stages of development slows me down and also messes up my local machine with chunky images. So on my local machine I prefer to run my apps the standard way, without any containerization.

#3. Dockerization

Docker is simply awesome when it comes to deploying programs to cloud instances. I can also easily do horizontal-scaling and load-balancing tests with just multiple Docker container instances on a single node. All that’s needed is to add a simple Dockerfile to every project directory. A NodeJS example is shown below.

FROM node:0.12

# Bundle app source
ADD . /src
# Install app dependencies
RUN cd /src; npm install

EXPOSE 3000
CMD ["node", "/src/app.js"]

#4. Continuous deployment with Tutum

Tutum is still in beta, but it’s awesome and free (at least for now)! The first step in setting up Tutum is to go to Account Info and add Cloud Providers and Source Providers – in our case Microsoft Azure and GitHub. Tutum has a very clear definition of the components required for setting up continuous deployment:

a. Repository – Here we create a new (private) repository in Tutum and link it to our GitHub repository to sync on every update. The source code gets pulled from GitHub and Docker images are built inside Tutum’s repository with every GitHub update.

b. Node – We can create Azure instances right from Tutum. You have to set up Tutum so it can access Azure. Each instance is a Node in Tutum.

c. Services – A service is a process or a program that you run. A service comprises one or more Docker containers, depending on whether we scale or not. Services can be deployed on one or more nodes to scale horizontally.

While creating nodes and services, Tutum allows us to specify tags like “dev”, “prod”, “front-end”, “back-end”. The tags determine on what nodes a service gets deployed. Thus we can have separate nodes for “front-end dev”, another for “front-end prod”, etc.

Tutum is not super fast yet – I believe it’s mainly due to the time taken to build Docker images. But it’s still decent enough. For continuous deployment, we have to select the “Autodeploy” option while creating the service. Another good feature I found in Tutum is that there are jumpstart services like an HA load balancer – it really makes setting up a high-availability API cluster a breeze.

#5. Slack Integration

Like so many other startups, we are quite excited about Slack. I have seen Slack integration with other continuous integration products like CircleCI and was pleasantly surprised to see even the Tutum beta had it. I created a new channel in our Slack and, from its integration settings, enabled an Incoming Webhook – this gives a URL I have to paste into Tutum > Account Info > Notifications > Slack. And that’s it: we have continuous deployment ready with all the bells and whistles.

As I mentioned at the start, there are multiple options to automate build, test and deployment. This post suggests a very economical yet scalable solution using Tutum – and I was literally able to learn and get everything running overnight!

Back to learning Grammar with ANTLR

This post is going to be about language processing. Language processing could be anything like an arithmetic expression evaluator, a SQL parser or even a compiler or interpreter. Many times when we build user-facing products, we give users a new language to interact with the product. Say, if you have used JIRA for project management, it gives you the JIRA Query Language. Google also has a language for search, as documented here – https://support.google.com/websearch/answer/136861?hl=en. Splunk has its own language called SPL. How to build such a system is what we will see in this post.

A test use case

I always believe that to learn something we need a problem to solve that can serve as a use case. Let’s say I want to come up with a new language that’s simpler than SQL, and I want the user to be able to key in the text below:

Abishek AND (country=India OR city=NY) LOGIN 404 | show name city

And this should fetch the name and city fields from a table where the text matches “Abishek”, and Abishek could either be in some city in India or have gone to New York. We also need to filter results that contain the text LOGIN and 404, as we are trying to trace what happened when Abishek was trying to log in but landed on some error codes. Say the data is in a database; what we need here is a language parser to understand the input, and then a translator that can translate it to SQL so that we can run the query on the DB.

What is ANTLR?

From antlr.org: ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. ANTLR can greatly help solve our use case pretty quickly. There are a few more similar tools, like javacc, but I found ANTLR to be the best documented and the top project in this space.

The first step: Grammar

When we want a parser, an approach many take is to write the parser from scratch. I remember doing so in an interview where I was asked to write an arithmetic expression evaluator. Though this approach works, it’s not the best choice when you have complex operators, keywords and many choices. Choices are an interesting thing – if you know Scala you will realise 5 + 3 is the same as 5.+(3). Usually there is more than one way to say things; in our example we could either say “LOGIN AND 404” or just “LOGIN 404”. Grammar involves identifying these choices, sequences and tokens.

ANTLR uses a variant of the popular LL(*) parsing technique (http://en.wikipedia.org/wiki/LL_parser), which takes a top-down approach. So we define the grammar from the top down – first look at what the input is. Say a file input can have a set of statements; statements can be classified into different statement types based on identifying patterns and tokens. Then statements can be broken down into different types of expressions, and expressions can contain operators and operands.

With this approach, a quick grammar I came up with for our use case is shown below:

grammar Simpleql;

statement : expr command* ; 
expr : expr ('AND' | 'OR' | 'NOT') expr # expopexp
 | expr expr # expexp
 | predicate # predicexpr
 | text # textexpr
 | '(' expr ')' # exprgroup
 ;
predicate : text ('=' | '!=' | '>=' | '<=' | '>' | '<') text ; 
command : '| show' text* # showcmd
 | '| show' text (',' text)* # showcsv
 ;
text : NUMBER # numbertxt 
 | QTEXT # quotedtxt
 | UQTEXT # unquotedtxt
 ;

AND : 'AND' ;
OR : 'OR' ;
NOT : 'NOT' ;
EQUALS : '=' ;
NOTEQUALS : '!=' ;
GREQUALS : '>=' ;
LSEQUALS : '<=' ;
GREATERTHAN : '>' ;
LESSTHAN : '<' ;

NUMBER : DIGIT+
 | DIGIT+ '.' DIGIT+
 | '.' DIGIT+
 ;
QTEXT : '"' (ESC|.)*? '"' ;
UQTEXT : ~[ ()=,<>!\r\n]+ ;

fragment
DIGIT : [0-9] ;
fragment
ESC : '\\"' | '\\\\' ; 

WS : [ \t\r\n]+ -> skip ;

Going by top down approach:

  • We can see that in my case, the input is a statement.
  • A statement comprises an expression part and a command part.
  • An expression has multiple patterns – it can be two expressions connected by an expression operator.
  • An expression can internally be two expressions without an explicit operator between them.
  • An expression can be a predicate – a predicate is of the pattern <text> <operator> <text>.
  • An expression can be just a text, e.g. when we just want to do a full-text search on “LOGIN”.
  • An expression can be an expression inside brackets, for grouping.
  • A command starts with a pipe, then a command word like “show”, followed by arguments.

Creating the Lexer, Parser and Listener

With ANTLR, once you come up with the grammar, you are close to done! ANTLR generates the lexer, parser and listener code for us. The lexer helps with breaking our input into tokens; we usually don’t deal with the lexer directly. What we will use is the parser – the parser can give us a parsed expression tree like the one shown below.

Parsed Tree

ANTLR also gives you a tree walker that can traverse the tree, and a base listener with methods that get called as the walker navigates the tree. All I had to do to implement the translator was extend the listener, override the methods for the nodes I am interested in, and use a stack to push the translations at each node. And that’s all – my robust translator was ready pretty fast. I am not going to post an ANTLR setup and running guide here, because that’s quite clear in their documentation. But feel free to reach out to me in case of any clarifications!
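The listener-plus-stack idea can be illustrated with a hand-rolled sketch in plain JavaScript – this is not ANTLR-generated code, just a toy post-order walk over a hypothetical parse tree for `name=Abishek AND city=NY`, combining child translations from a stack the way an exit* listener method would:

```javascript
// Toy parse tree for: name=Abishek AND city=NY
const tree = {
  type: 'expop', op: 'AND',
  left:  { type: 'predicate', field: 'name', op: '=', value: 'Abishek' },
  right: { type: 'predicate', field: 'city', op: '=', value: 'NY' }
};

// Post-order walk: translate the children first, then combine their
// translations from the stack - the same pattern an ANTLR exit* method uses.
const translate = (node, stack) => {
  if (node.type === 'predicate') {
    stack.push(`${node.field} ${node.op} '${node.value}'`);
    return;
  }
  translate(node.left, stack);
  translate(node.right, stack);
  const right = stack.pop();
  const left = stack.pop();
  stack.push(`(${left} ${node.op} ${right})`);
};

const stack = [];
translate(tree, stack);
console.log('WHERE ' + stack.pop()); // → WHERE (name = 'Abishek' AND city = 'NY')
```

In the real translator the walker drives the traversal and you only implement the push/pop logic in the listener methods.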

Apache Spark’s missing link for Realtime Interactive Data Analysis

Spark and Interactive Data Analysis

Interactive data analysis is a scenario where a human asks a data question and expects an answer in human time. Another characteristic of interactive data analysis is that usually a series of questions is asked – an operations analyst investigating site traffic might first want to group by geographic location, then drill down to other demographics like device type and user agent, and finally filter by a suspicious IP. A main requirement here is the ability to cache the data, as multiple queries are fired on the same data set – this is where Apache Spark fits naturally. Spark’s RDDs can be cached in memory with graceful fallback, which is many times faster than reading from disk and selecting the relevant data set every time.

Adding a “Realtime” scenario

The word “realtime” has become a little confusing lately. There are two kinds of realtime here: first, the data needs realtime ingestion and must be available for action or querying immediately; second, a user asks a query and looks for an immediate answer in real time. The second case is the same as interactive analysis; the first case is what we’ll focus on now.

So, the use case I wish to solve with Spark is realtime and interactive analysis. At first look, Spark looks great, with Spark SQL for simplifying access, Spark Streaming for realtime data and the core Spark for data on a Hadoop-compatible source. The catch here is how to view and query both streaming data and historical data as a single RDD. In many cases, like log files or click-stream events, we have a realtime data stream and historical data which functionally form a single table. However, the design of Spark and Spark Streaming is similar to the lambda architecture, where you have a separate speed layer and a separate batch layer, and querying a merged view is a challenge.

The workaround I found here is to keep ingesting the data in realtime into Hadoop and keep recomputing the RDDs for each query, or at a particular frequency, but this takes away the advantage of caching RDDs for future queries. I do understand this is an intentional design limitation of RDDs. Well, a problem or a limitation is an opportunity to improvise, and I am looking to prototype a solution for this use case. I will be glad to hear any ideas in this space.

Existing solutions: In-memory DBs

Druid Architecture


The existing solution for the use case we have been talking about is to use in-memory DBs like MemSQL (not open source) or Druid. These DBs are columnar and designed from the ground up for analytics. A point to note, however, is that these in-memory DBs expect structured data, so we cannot ingest a plain-text log file directly into these systems and extract fields for querying the way we do with Spark. But if you are dealing with structured data, these in-memory DBs should be a great fit.

Thanks,
Abishek, LogBase

Powering my daily commute with analytics

For the last couple of days I have been wondering what I could do to save on my commute time. Every day I travel ~25 km, and in Bangalore traffic it takes away 90 minutes of my time. Of course with experience we learn which route is better and when traffic is lighter, but it would be good to have data backing this up to help save a few more minutes.

Bangalore Traffic


I wanted to be able to answer questions like:

  • Can I start a little later on Mondays than on Fridays?
  • What is the optimum time to start my commute?
  • Which route is good for which day?
  • If it rained an hour back, how does it affect my commute?
  • How is the traffic different for different months?
  • At what rate is the traffic slowing me down every month?
  • When should I take out my car instead of motorcycle?

… and more

I started thinking of doing a small hobby project – a small analytics platform to help me collect and analyze data. I initially thought of building a Raspberry Pi data logger, but decided to start with a cheaper version using my Android phone. There are existing Android apps that track your location and can plot a map or speed chart – but I wanted raw data, so that I have the flexibility to come up with my own queries.

Data Collection

App User Interface


I wrote a native Android app that I start when I begin my commute every day and stop at the end. The app takes location data from the device GPS and stores it in its embedded SQLite database. Later, whenever my device is connected to the internet, the app allows me to post the collected data to cloud storage.

Cloud Storage

I wanted to store the collected data in a cloud database so that data sync is easily available. I wrote a very simple REST API that accepts posted JSON and hosted the server on Heroku. This API receives logged data from the mobile device and in turn persists it in MongoHQ, who generously gave out 512MB of free storage. For now I felt this was sufficient, as I wanted lower cost rather than scalability.

Analytics Interface

My requirement here is to be able to fetch data from my Mongo cloud, slice and dice it on the fly, plot charts and do some statistical analysis in an interactive way. I used Python, and it’s a great fit here – it has a MongoDB client, matplotlib for plotting charts, almost no learning curve compared to R, IPython and notebook interfaces, and plenty of modules for data analytics.

Test Run

I took a short ride out at night, so there wasn’t traffic. But I did halt a minute in between and came back. There is still some scope for calibrating the logging frequency, as I found my logger showed 1.2 km in total against my bike showing 1.6 km. But quite a good test run to start with, as it did capture my halt!

Tripmeter after test run


Data logged in MongoHQ


Dist Vs Time Plot


The plan is to keep collecting data, try some predictive analytics with this and keep exploring insights.

If you would like to collect and play with your commute data feel free to fork my project on GitHub:

https://github.com/cyberabis/logstr

https://github.com/cyberabis/logstr-server