No doubt, Big Data has big momentum. With so much attention, quite a few folks, including me, have started working on Big Data bottom-up – we have a solution in the form of Hadoop, and we are searching for a suitable problem.
Hadoop is not a magic bullet, and it helps to be aware of its limitations. Below I have put together cases where the Hadoop ecosystem is NOT a good fit.
Hadoop file system limitations
- Not suitable for random read/write
- Not suitable for transient or frequently updated data. HDFS follows a write-once, read-many model; appends are supported, but in-place updates are not.
- Not suitable for a huge number of small files: the NameNode keeps all file and block metadata in memory, so millions of small files exhaust its heap.
- Data writes take longer because each block is replicated to multiple DataNodes (three copies by default).
- Geared toward unstructured data; there is no built-in support for schemas or record-level access.
- Infrastructure setup is complex as with any distributed system
- Not suitable for low-latency applications that need quick responses.
- MapReduce is suitable for “batch queries”. It does not work well for point queries.
- Performs well when processing the full data set, e.g. sorting and aggregations. Not well suited for selecting and working on a small subset of the data.
HBase limitations (HBase addresses random access, structured data, and low latency)
- Works best for table scans or lookups by primary (row) key.
- Compromises on availability: in CAP terms, HBase favors consistency and partition tolerance.
- HBase is NOT an RDBMS and does not speak SQL; it is closer to a distributed map sorted by row key.
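The “sorted map” mental model explains both the strength (point lookups and range scans by row key) and the weakness (no ad-hoc SQL queries). A minimal toy model in Python – purely illustrative, not HBase’s actual Java API, and the class and method names are my own – looks like this:

```python
# Toy model of HBase's data model: a map kept sorted by row key.
# Illustrative sketch only -- not the real HBase client API.
import bisect

class SortedKVTable:
    def __init__(self):
        self._keys = []   # row keys, kept sorted (like HBase regions)
        self._rows = {}   # row key -> value

    def put(self, row_key, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = value

    def get(self, row_key):
        # Fast point lookup by row key -- the access pattern HBase is built for.
        return self._rows.get(row_key)

    def scan(self, start_key, stop_key):
        # Efficient range scan over contiguous row keys (stop is exclusive).
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

table = SortedKVTable()
table.put("user#002", "bob")
table.put("user#001", "alice")
table.put("user#003", "carol")

print(table.get("user#002"))               # point lookup by row key
print(table.scan("user#001", "user#003"))  # range scan over sorted keys
```

Anything that is not a key lookup or a key-range scan – say, “find all rows where the value contains ‘bob’” – degenerates into a full scan, which is exactly why query patterns must be designed into the row key up front.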