Charles Babcock | InformationWeek
Starting Tuesday, Joyent is offering a new type of storage combined with compute services. Joyent Manta Storage Service will keep data stored close to the servers that will analyze and work with it.
That description might be applied to Hadoop, which runs big data batch jobs on a single cluster. Joyent Manta will use MapReduce to bring data and compute together on one or several clusters. The clusters may be located in one or several data centers. And unlike Hadoop, the data will remain close to the compute resources on a persistent basis, eliminating the long extract, transform and load (ETL) processes that often accompany the execution of a big data job.
Thus, Manta Storage Service will be able to do a single job or a series of jobs, some of them batch, some of them near real-time analytics. Manta, in fact, could fit a Hadoop job into its compute and object storage system, and run a wide variety of other applications as well, depending on what its users want to do. As a service from Joyent, customers will be charged — by the second — for the Manta services they use.
In short, the development team at Joyent has taken many of the core principles of big data management and built them into a general-purpose system on the Joyent infrastructure. It’s available as both a scalable big data service and an elastic, general-purpose compute service. Because it potentially eliminates the long load times of other big data systems, it can be used for server log analysis, website visitor analysis, search index generation, exchange trading analysis and other uses involving large amounts of data being generated in real time, said Jason Hoffman, CTO and co-founder of Joyent.
Applications on Manta run in virtual machines on servers, and the system can fire up virtual machines close to the data in a second or third data center, as needed, to keep the compute resources and storage working closely together, Hoffman said.
Like Amazon’s S3 or Microsoft Azure’s Blob service, Manta is an object storage system that can be accessed through a single API. But it has no limit on the size of the objects it stores or on the number of objects it’s able to house. And unlike some NoSQL systems that risk a disconcerting, if momentary, loss of data integrity, Manta will be “strongly consistent on its writes, and highly available on its reads,” said Hoffman in an interview before the announcement. In other words, it will seek to combine near-relational-database data integrity with high-speed, Cassandra- or MongoDB-type data availability.
When Manta gets a job to do, it’s able to break it down into parts related to the distribution of data, assign a virtual machine to each part and generate a virtual machine close to the location of the data with which it must work. Hoffman said it does so in a cost-effective manner.
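The idea of splitting a job along the lines of where its data lives can be illustrated with a minimal Python sketch. It is not Joyent's implementation or API: the function name `plan_tasks`, the object names and the node names are all hypothetical, and the sketch assumes only that each object is known to reside on one storage node.

```python
from collections import defaultdict


def plan_tasks(object_locations: dict[str, str]) -> dict[str, list[str]]:
    """Group objects by the storage node that holds them.

    Each resulting group stands for one part of the job, to be handled
    by a virtual machine started on (or near) that node. Hypothetical
    illustration only; real schedulers weigh many more factors.
    """
    plan: dict[str, list[str]] = defaultdict(list)
    for obj, node in object_locations.items():
        plan[node].append(obj)
    return dict(plan)


# Hypothetical inventory: which node stores which log file.
locations = {
    "logs/2013-06-24.gz": "node-a",
    "logs/2013-06-25.gz": "node-b",
    "logs/2013-06-26.gz": "node-a",
}

for node, objects in plan_tasks(locations).items():
    print(f"start VM near {node} for {len(objects)} object(s): {objects}")
```

In this sketch the job becomes two parts, one per node, so no object has to be copied across the network before work on it begins.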
Joyent will charge a price “slightly less” than Amazon Web Services’ S3 storage pricing, Hoffman said. Amazon currently charges 9.5 cents per GB per month for the first TB stored in its standard S3 service.
For the compute side of the service, it will charge 0.004 cents per second for each GB of virtual machine capacity created. A task using 1,000 1-GB virtual machines would cost 4 cents per second, Hoffman said. A thousand 32-GB virtual machines running for one second would cost $1.28. Each virtual machine running on Manta may be assigned from a minimum of 32 GB of storage up to one TB.
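The compute figures above can be checked with a few lines of Python. The sketch assumes the quoted rate of 0.004 cents applies per GB per second, which is what Hoffman's two examples imply, and it ignores any rounding or minimum-charge rules Joyent may apply. The function name `compute_cost` is illustrative and not part of any Joyent tool.

```python
from decimal import Decimal

# Assumed rate: 0.004 cents per GB per second, expressed in dollars.
RATE_DOLLARS_PER_GB_SECOND = Decimal("0.00004")


def compute_cost(vm_count: int, gb_per_vm: int, seconds: int) -> Decimal:
    """Return the cost in dollars under a flat per-GB-second rate."""
    return RATE_DOLLARS_PER_GB_SECOND * vm_count * gb_per_vm * seconds


# The two cases quoted in the article, each running for one second.
small = compute_cost(vm_count=1000, gb_per_vm=1, seconds=1)
large = compute_cost(vm_count=1000, gb_per_vm=32, seconds=1)

print(f"1,000 x 1-GB VMs for 1 second:  ${small:.2f}")
print(f"1,000 x 32-GB VMs for 1 second: ${large:.2f}")
```

Under that assumption the two cases come to 4 cents and $1.28, matching the figures Hoffman gave.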
One of the goals of the system is to bring workloads to Joyent that use its storage service and seek to get big data jobs done fast, Hoffman said.
The “new [Manta] compute-on-storage innovation is a fundamental paradigm shift that changes the economics and utility of object storage and high-performance big data analysis,” said Vish Nandlall, CTO and head of strategy at Ericsson North America, one of two users quoted in the announcement. Fifty percent of the world’s smartphone traffic goes through Ericsson data centers and it’s seeking ways to keep up with the data they generate, he noted.
“Copying data across a network from storage onto a compute cluster can take hours” using predecessor systems to Manta, said Konstantin Gredeskoul, CTO of Wanelo, an online shopping site. “We are now able [to] perform complex cohort analysis and retention reports across hundreds of gigabytes of data in a couple of minutes,” he said.
Hoffman said Manta has been under development for nearly four years at Joyent by the same teams that have built the Joyent compute infrastructure. It was offered in a private beta for six months before Tuesday’s announcement.