SMACK Stack – A Modern Tool Set for Data Science
Date: May 8, 2017

What Is SMACK?

Combining powerful data science tools is nothing new. However, while most data science tool sets focus on delivering the key aspects of data analytics for big data scenarios, some go beyond that. One tool set that handles not just big data but also event processing, and does so very quickly, is the SMACK stack. Its name is an acronym of the tools it comprises: Spark (the main processing engine), Mesos (the cluster resource manager that hosts the whole ecosystem), Akka (the actor-based toolkit for building concurrent, event-driven applications), Cassandra (the distributed database handling storage and retrieval of data), and Kafka (the message broker).

Usefulness of the SMACK Stack

SMACK combines many features that make it useful for a range of specialized data science tasks. For example, Spark and Akka enable you to build data analysis pipelines that handle both large data files and event processing. They can also be tuned to meet your latency requirements while delivering throughput within the desired specifications. As for the coordination and administration of the various tasks, Mesos has you covered, although other resource managers such as YARN could also be used. For the persistence and distributed storage of events, you can rely on Cassandra, while Kafka takes care of event transport.
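To make this division of labor concrete, here is a minimal sketch in plain Python. It does not use any real Kafka, Spark, Akka, or Cassandra API: an in-memory queue stands in for the broker, a plain function for the processing step, and a dictionary for the storage layer. The event fields (`page`, `ms`) and all function names are hypothetical, chosen only for illustration.

```python
import queue
from collections import defaultdict

# Stand-ins only: none of these are real Kafka, Spark, Akka, or Cassandra APIs.
broker = queue.Queue()      # plays the role of a Kafka topic (event transport)
store = defaultdict(int)    # plays the role of a Cassandra table (persistence)


def produce(events):
    """Publish raw events to the broker stand-in."""
    for event in events:
        broker.put(event)


def process(event):
    """Processing step (the Spark/Akka role): extract a key and a value."""
    return event["page"], event["ms"]


def run_pipeline():
    """Drain the broker, process each event, and persist the running total."""
    handled = 0
    while not broker.empty():
        page, ms = process(broker.get())
        store[page] += ms
        handled += 1
    return handled


# Hypothetical page-view events with a load time in milliseconds.
produce([
    {"page": "/home", "ms": 120},
    {"page": "/docs", "ms": 80},
    {"page": "/home", "ms": 30},
])
handled = run_pipeline()
print(handled, dict(store))
```

In a real deployment each stand-in would be replaced by the corresponding component, with Mesos (or another resource manager) scheduling the processes across a cluster. The point of the sketch is only that transport, processing, and storage are separate, swappable stages.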

Although SMACK does not offer first-class support for newer languages like Julia, which may well become more common in data science in the years to come, it does handle established ones such as Scala, Java, Python and R. Moreover, its data processing engine (Spark) is generally significantly faster than Hadoop MapReduce, particularly for in-memory and iterative workloads. What’s more, every component of the stack is open source, so you don’t have to worry about licensing fees, which makes the framework cost-effective (your main costs would be the infrastructure it runs on and hiring and/or training the data scientists who will use it).

In addition, SMACK is adept at handling streaming data, which is quite common, particularly for companies that work with dynamic data such as web logs. Big data also tends to be diverse (one of the Vs of big data stands for Variety), something SMACK can handle as well, because its components cover batch processing, stream processing, messaging, and storage. So, for a wide range of big data problems, SMACK is likely to be able to help you solve them.
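As a rough illustration of what processing streaming web log data can involve, the sketch below groups events into fixed, non-overlapping time windows (often called tumbling windows) and counts HTTP status codes per window. This is plain Python rather than Spark code, and the function name and sample events are assumptions made up for the example.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count status codes per fixed-size time window.

    `events` is an iterable of (timestamp_in_seconds, http_status) pairs.
    Returns a dict mapping each window's start time to a Counter of statuses.
    """
    windows = {}
    for ts, status in events:
        # Align the timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        windows.setdefault(window_start, Counter())[status] += 1
    return windows


# Hypothetical web log events: (seconds since start, HTTP status code).
log_events = [(0, 200), (3, 404), (9, 200), (10, 200), (14, 500)]
counts = tumbling_window_counts(log_events, window_seconds=10)

for start in sorted(counts):
    print(start, dict(counts[start]))
```

A stream processing engine applies the same idea continuously and in parallel across a cluster, rather than over a finished list held in memory.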

Conclusions and Next Steps

SMACK is very popular today not because of some innovative new technology but because it combines the features of several technologies in a way that lets you extract a lot of value from the big data you have access to. Yet it cannot do everything by itself, no matter how robust each of its components is. Like any other big data framework, SMACK requires knowledgeable and competent data science professionals to make the most of it. Data Science Partnership is in a position to provide you with access to such professionals, so that you too can get the most out of this powerful tool set. Feel free to reach out to us for more information.

Zacharias Voulgaris

Zach is the Chief Technical Officer at Data Science Partnership. He studied Production Engineering and Management at the Technical University of Crete, shifted to Computer Science through a Master’s in Information Systems & Technology (City University of London), and then to Data Science through a PhD in Machine Learning (University of London). He has worked at Georgia Tech as a Research Fellow, at an e-marketing startup in Cyprus as an SEO manager, and as a Data Scientist at both Elavon (GA) and G2 (WA). He was also a Program Manager at Microsoft, working on a data analytics pipeline for Bing.

