Bottom-up Hadoop! Bad use cases for Hadoop.

No doubt Big Data has big momentum. With so much attention, quite a few folks, including me, have started working bottom-up on Big Data: we have a solution in the form of Hadoop, and we are searching for a suitable problem.

Hadoop is not a magic bullet, and it is good to be aware of its limitations. Here I am putting together cases where the Hadoop ecosystem is NOT a good fit.

Hadoop file system (HDFS) limitations

  • Not suitable for random reads/writes.
  • Not suitable for frequently changing data: HDFS is write-once, read-many, though appends are supported.
  • Not suitable for huge numbers of small files, because the NameNode keeps all file and block metadata in memory.
  • Writes take longer because each block is replicated to multiple DataNodes.
  • Designed for unstructured data rather than structured records.
  • Infrastructure setup is complex, as with any distributed system.
  • Not suitable for low-latency applications that need quick responses.
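The small-files point is easy to see with a back-of-envelope estimate. A minimal sketch, assuming the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file or block) — the helper function and constants here are illustrative, not part of any Hadoop API:

```python
# Rough rule of thumb (assumption): ~150 bytes of NameNode heap
# per namespace object (one per file entry, one per block).
BYTES_PER_OBJECT = 150

def namenode_memory_bytes(num_files, blocks_per_file=1):
    """Estimate NameNode heap needed for num_files files."""
    objects = num_files * (1 + blocks_per_file)  # file entry + its blocks
    return objects * BYTES_PER_OBJECT

# 100 million small files (one block each) vs. the same data packed
# into 1 million large files of 100 blocks each:
small = namenode_memory_bytes(100_000_000)                      # ~28 GiB
large = namenode_memory_bytes(1_000_000, blocks_per_file=100)   # ~14 GiB
print(small / 2**30, large / 2**30)
```

Under these assumptions, storing the same data as many small files roughly doubles the NameNode's memory footprint, which is why HDFS favors a smaller number of large files.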

MapReduce limitations

  • MapReduce is suited to “batch queries”. It does not work well for point queries.
  • Performs well when processing the full data set – e.g. sorting, aggregations. Not best suited for selecting and working on a small subset of the data.
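The batch nature of the model is easier to see in miniature. A toy in-memory sketch of the MapReduce pattern (the `map_reduce` helper is hypothetical, not the Hadoop API): every record passes through the map and shuffle phases, so the whole data set is scanned even if you only care about one key:

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:             # "map" phase: visits every record
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": group values by key
    # "reduce" phase: one reducer call per distinct key
    return {key: reducer(key, values) for key, values in groups.items()}

# Classic word count: a full-data-set aggregation MapReduce handles well.
lines = ["big data", "big momentum"]
counts = map_reduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda word, ones: sum(ones),
)
print(counts)  # {'big': 2, 'data': 1, 'momentum': 1}
```

Asking this pipeline "how many times does the word *big* appear?" still costs a pass over every line, which is why point queries belong in a random-access store like HBase instead.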

HBase limitations (HBase addresses random access, suitability for structured data, and low latency)

  • Works best for table scans or lookups by row key.
  • Compromises on availability (it favors consistency in CAP terms).
  • HBase is NOT an RDBMS and does not support SQL; it is better thought of as a map sorted by key.
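That “map sorted by key” mental model can be sketched directly. A toy illustration (assumption: this `SortedKeyMap` class is invented for the example and is not the HBase API): point lookups and contiguous row-key range scans are cheap, while any other predicate would mean scanning everything:

```python
import bisect

class SortedKeyMap:
    """Toy model of HBase's data model: values keyed by a sorted row key."""

    def __init__(self):
        self._keys = []    # row keys, kept in sorted order
        self._values = {}  # row key -> row data

    def put(self, row_key, value):
        if row_key not in self._values:
            bisect.insort(self._keys, row_key)  # maintain sort on insert
        self._values[row_key] = value

    def get(self, row_key):
        return self._values.get(row_key)        # cheap point lookup by key

    def scan(self, start, stop):
        """All rows with start <= key < stop, in key order (cheap range scan)."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]

table = SortedKeyMap()
for key in ["row3", "row1", "row2", "row9"]:
    table.put(key, {"cf:col": key.upper()})

print(table.get("row2"))           # {'cf:col': 'ROW2'}
print(table.scan("row1", "row3"))  # row1 and row2, in key order
```

Anything you want to query quickly must therefore be encoded into the row key up front; there is no query planner to rescue an ad hoc SQL-style filter.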
