Cassandra? What do ye daemons think?

I played around with a small test cluster and Apache Spark ~3 years ago for a side project.

Like any non-relational DB it has its strengths, so it may be a perfect fit for some use cases (e.g. HUGE amounts of periodic "write once, read sometimes" data like logs), but the overhead of the underlying JVM is gigantic, so I wouldn't consider it practical for smaller database requirements.
It also has (had?) major issues with data integrity: I occasionally saw up to 5% data loss when inserting test data consisting of ~6 million rows in 4 partitions. I also got different results for the same query depending on which node handled it; the problems ranged from (huge) differences in row counts to missing data in individual queries.
This might (should!?) be fixed by now, but back then I considered it way too fragile for any production use, especially because the cluster broke from time to time for no apparent reason and updates always caused issues...
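One possible (and here unverified) contributor to node-dependent results is Cassandra's tunable consistency: if writes and reads each touch only one replica, a read can land on a replica that never received the write. A rough sketch of the usual overlap rule follows; the function names are made up for illustration, and nothing in it reflects the consistency levels the test cluster above actually used.

```python
# Minimal sketch of the replica-overlap rule behind tunable consistency.
# Function names are hypothetical; the replication factor of 3 is an assumption.

def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a majority (QUORUM)."""
    return replication_factor // 2 + 1


def read_overlaps_write(write_replicas: int, read_replicas: int,
                        replication_factor: int) -> bool:
    """True if every read set must share at least one replica with every write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3
# Writing and reading at ONE: a read may hit a replica the write never reached.
print("ONE/ONE overlap:", read_overlaps_write(1, 1, rf))
# Writing and reading at QUORUM: read and write sets always share a replica.
print("QUORUM/QUORUM overlap:", read_overlaps_write(quorum(rf), quorum(rf), rf))
```

Overlap alone only says a read reaches at least one replica that acknowledged the write. It does not rule out other failure modes, such as the timestamp-based conflict resolution discussed in the Jepsen post linked below.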

A Jepsen review from roughly the same time as my testing:
https://aphyr.com/posts/294-jepsen-cassandra
 