With the development of Big Data technologies, more companies, institutions and universities are turning to the Hadoop data processing framework for their data solutions.
Hadoop has become a preferred solution for many institutions because of its linear scalability, affordability, versatility and strong community support.
Big data adoption in Indonesia, however, is still not growing as fast as expected. This can be attributed to several factors:
Special Skills Needed
Although Hadoop has a very complete supporting ecosystem, implementing a Hadoop-based system that fits an organization's needs requires relatively deep expertise in the installation, configuration and monitoring of each of its components.
Professional Support Is Relatively Expensive and Limited
The absence of local support for Hadoop implementations often results in less cost-effective solutions, especially at the enterprise level, since professional support must be obtained from overseas providers. The support provided is often limited, for example to email or calls, due to time and distance constraints.
To answer these problems, SOLUSI247, one of the leading Big Data implementors in Indonesia, launched a Hadoop distribution package, or distro, named Yava Hadoop. This distribution is developed by local experts and is supported by idBigData, the big data community in Indonesia.
Data Store and Resource Manager
The reliability and linear scalability of HDFS provide storage for data in a variety of formats. Because it supports both commodity servers and high-end servers, it offers a broad selection of solutions based on needs and available resources. The YARN cluster management system enables various processing engines to run on top of HDFS.
With the support of YARN cluster management, Yava provides a wide range of data access and processing capabilities in a single cluster: batch, streaming, interactive and real time.
MapReduce serves batch processing; Phoenix, Hive and Tez serve SQL-based processing; Pig provides scripting; HBase provides NoSQL storage; Solr provides search; Storm handles streaming; Spark handles in-memory processing; and Mahout supports data mining and machine learning. The YARN resource manager allows a single cluster to fulfill these different processing needs, avoiding costly and inconvenient duplication of data.
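As an illustration of the batch model, the classic word count can be sketched as a map phase and a reduce phase. The sketch below simulates the intermediate shuffle/sort step locally in Python; a real job would distribute these phases across the cluster, for example via Hadoop Streaming or the Java API.

```python
# Word-count sketch of the MapReduce batch model.
# The shuffle/sort between the phases is simulated locally here;
# on a cluster, Hadoop performs it between the map and reduce tasks.
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def reduce_phase(pairs):
    """Sum the counts for each word after the shuffle/sort."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))


def word_count(lines):
    """Run both phases over an in-memory input, as a single node would."""
    return dict(reduce_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

The same map and reduce functions, reading from stdin and writing to stdout, could be handed to Hadoop Streaming to run over HDFS data at scale.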
Apache Sqoop enables efficient data transfer between Hadoop and structured data stores such as Teradata, Netezza, Oracle, MySQL, PostgreSQL and HSQLDB.
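A typical Sqoop import can be sketched as a single command. The database host, credentials, table and target directory below are hypothetical placeholders:

```shell
# Illustrative only: host, user, table and paths are placeholders.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table transactions \
  --target-dir /data/sales/transactions \
  --num-mappers 4
```

Sqoop translates such an import into parallel map tasks (four here, via `--num-mappers`), each pulling a slice of the table into HDFS.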
Apache Flume is used for streaming large amounts of data into HDFS, for example logs from production machines or the network. Flume provides a simple and flexible architecture, with reliable failover and recovery mechanisms.
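A Flume agent is wired together through a properties file as a source, a channel and a sink. The minimal single-agent sketch below tails a hypothetical application log into HDFS; the agent name, log path and HDFS path are placeholders:

```properties
# Hypothetical single-agent configuration: tail an app log into HDFS.
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /data/logs/%Y-%m-%d
agent1.sinks.sink1.channel = ch1
```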
Apache Ambari is a framework for provisioning, managing and monitoring Apache Hadoop clusters. Ambari provides a simple and elegant user interface, and it can be integrated with existing operational tools such as Microsoft System Center and Teradata Viewpoint.
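Besides the web UI, Ambari exposes a REST API, which is what enables that kind of integration. A sketch of listing the clusters an Ambari server manages (the host and credentials are placeholders) could look like:

```shell
# Illustrative only: host and credentials are placeholders.
curl -u admin:admin \
  -H "X-Requested-By: ambari" \
  http://ambari.example.com:8080/api/v1/clusters
```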
Apache ZooKeeper™ provides a distributed configuration, synchronization and naming registry for distributed systems. ZooKeeper is used to store and manage critical changes in configurations.
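For illustration, storing and reading a configuration value with the bundled zkCli.sh shell might look like the following; the znode paths and the value are hypothetical:

```shell
# Inside a zkCli.sh session (paths and value are placeholders):
create /app/config ""
create /app/config/batch.size "500"
get /app/config/batch.size
```

Every client watching that znode can be notified when the value changes, which is how ZooKeeper propagates critical configuration updates across a cluster.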
Apache Oozie provides workflow scheduling tools to manage jobs in enterprise Hadoop deployments.
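An Oozie workflow is defined as an XML document of actions and transitions. The minimal sketch below, in which the workflow, action and script names are hypothetical, runs a single Pig script and then ends:

```xml
<!-- Hypothetical minimal workflow: one Pig action, then end. -->
<workflow-app name="daily-etl" xmlns="uri:oozie:workflow:0.5">
  <start to="clean-data"/>
  <action name="clean-data">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>clean.pig</script>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <fail name="fail">
    <message>Pig job failed</message>
  </fail>
  <end name="end"/>
</workflow-app>
```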
HGrid247 is a comprehensive ETL process designer. Its ease of use and intuitive drag-and-drop interface reduce the time and complexity of data integration. By eliminating the need to write Java or other code for MapReduce, Spark, Storm and Tez jobs and scripts, HGrid247 empowers developers and designers to build big data jobs using visual tools. This speeds up the work and increases team productivity, because ETL developers and designers can focus on the design of the process.