DriveScale: Delivering True Independence in Storage Scaling

A key benefit of hyperconverged infrastructure (HCI) is standardization. Each node in an HCI cluster brings a standardized, predictable unit of compute, network and storage. As your needs increase, you simply add another standardized node to the cluster to meet them. At the same time, a common critique of HCI is that the standardized unit is too coarse to let compute and storage scale independently of each other.

For general enterprise applications, this lack of granularity may not be a major concern – as companies grow and demands increase, the need for additional compute resources generally correlates with the need for additional storage resources. However, the critique is most evident in many Big Data use cases, where enterprises may run a very dense cluster of compute against enormous, disproportionate amounts of data.

In Big Data use cases, compute resources must be able to scale independently of storage resources. In other words, compute and storage resources must be disaggregated from each other. One approach is to use some form of networked storage, but this can pose two major problems for Big Data (especially Hadoop) use cases – first, Hadoop prefers to work with independent disks; and second, storage controllers and the access paths to them can become bottlenecks, while configuring storage systems typically requires specialist knowledge of the storage system itself.

A further problem for Hadoop use cases is that traditional storage media use a direct point-to-point interface, which means that disks have to be wired directly to a given server or servers within a cluster. This makes standing up and breaking down clusters highly manual and cumbersome.

Introducing DriveScale

DriveScale solves these problems by introducing a low-cost Ethernet-to-SAS converter, turning each drive into an iSCSI drive. Since disk drives now appear as Ethernet-connected networked devices, assigning and deploying many individual disks to a cluster, and later reassigning them, becomes very easy. The cost-competitive nature of 10Gbps interfaces eliminates concerns over performance, and DriveScale keeps the disks local to each rack so that bandwidth is high and latency is low, resulting in performance parity with local drives.

The result is simple: a cluster of nodes can now address any number of available iSCSI drives as needed. Hadoop can use its native tools to aggregate the drives as it sees fit. Drives can be provisioned and reprovisioned without changing cabling or manual intervention.
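
To make the provisioning idea concrete, here is a minimal sketch in Python of a rack-local drive pool whose drives are assigned to compute nodes and later handed back. It is purely illustrative and is not DriveScale's software or API: the DrivePool class, its assign and release methods, and the drive and node names are all hypothetical, and the sketch models only the bookkeeping, not any iSCSI traffic.

```python
from dataclasses import dataclass, field


@dataclass
class DrivePool:
    """Hypothetical model of a rack-local pool of network-attached drives."""

    free: list[str]                                   # drives not bound to any node
    assigned: dict[str, list[str]] = field(default_factory=dict)

    def assign(self, node: str, count: int) -> list[str]:
        """Bind `count` free drives to `node`; no recabling is modeled or needed."""
        if count > len(self.free):
            raise ValueError(f"only {len(self.free)} free drives, {count} requested")
        picked, self.free = self.free[:count], self.free[count:]
        self.assigned.setdefault(node, []).extend(picked)
        return picked

    def release(self, node: str) -> list[str]:
        """Return every drive held by `node` to the free pool."""
        returned = self.assigned.pop(node, [])
        self.free.extend(returned)
        return returned


# Twelve drives in one shelf (names are made up for illustration).
pool = DrivePool(free=[f"drive-{i:02d}" for i in range(12)])

pool.assign("node-a", 8)   # a storage-heavy node takes most of the shelf
pool.assign("node-b", 2)   # a compute-heavy node takes only a few drives
pool.release("node-a")     # the first cluster is torn down
pool.assign("node-b", 6)   # its drives are reprovisioned to the other node
```

In this toy model the ratio of drives to nodes changes purely through assignment, which is the sense in which compute and storage scale independently.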

The cost of implementing a DriveScale solution is marginal, since the only additional hardware necessary is the Ethernet-to-SAS bridge. Hadoop users can still take advantage of the commodity pricing of drives, since they are housed in simple JBOD shelves with no storage system overhead.

Our opinion

Neuralytix believes that what DriveScale has created is a very straightforward solution to a fast-growing problem, especially for large Big Data users. The simplicity of the solution is its elegance.

While there are key-value Ethernet-connected drives, such as the Seagate Kinetic drive, these drives are not designed for general-purpose use cases that leverage native interfaces and tools. Likewise, investing in very high-end scalable (typically monolithic) storage systems is not cost effective for Big Data use cases.

Neuralytix sees opportunities not only for Big Data, but also for commercial high-performance computing (HPC) environments.

With the maturity and accelerating acceptance of large-scale Hadoop and Big Data deployments, Neuralytix believes DriveScale has found a market with pent-up demand for its solution.

Read the original story by Ben Woo in Neuralytix