
Downsides of storing binary data in Riak?


What are the problems, if any, of storing binary data in Riak?

Does it affect the maintainability and performance of the cluster?

What would the performance differences be between using Riak for this rather than a distributed file system?

asked May 23, 2011 3:57 pm CDT
posted via StackOverflow

4 Answers


The only problem I can think of is storing binary values larger than 50 MB, which they advise against. Otherwise, storing arbitrary data is the whole point of Riak:

Another reason one might pick Riak is for flexibility in modeling your data. Riak will store any data you tell it to in a content-agnostic way - it does not enforce tables, columns, or referential integrity. This means you can store binary files right alongside more programmer-transparent formats like JSON or XML.

Source: Schema Design in Riak - Introduction
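As a rough illustration of the content-agnostic point, here is a minimal sketch of storing raw bytes through Riak's HTTP interface. The host, port, bucket, and key are assumptions made up for the example, and the URL layout depends on the Riak version (older releases used `/riak/<bucket>/<key>`), so check it against your own installation. The request is only built here, not sent.

```python
import urllib.parse
import urllib.request


def build_put_request(base_url, bucket, key, payload,
                      content_type="application/octet-stream"):
    """Build (but do not send) an HTTP PUT that stores raw bytes under bucket/key.

    The /buckets/<bucket>/keys/<key> path is an assumption about the
    Riak version in use; adjust it for your cluster.
    """
    url = "{}/buckets/{}/keys/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    # Riak stores the body as-is and returns it with the same Content-Type.
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Hypothetical node address, bucket, and key.
req = build_put_request(
    "http://127.0.0.1:8098", "images", "logo.png",
    b"\x89PNG\r\n\x1a\n", content_type="image/png",
)
# To actually send it: urllib.request.urlopen(req)
```

The same call works for JSON or XML by changing only the payload and the content type, which is the flexibility the quote describes.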

answered May 23, 2011 4:23 pm CDT

I personally haven't noticed any issues storing data such as images and documents (both DOC and PDF) in Riak. I don't have performance numbers, but I might be able to gather some if I remember.

One thing of note: with Riak you can use Luwak, which provides an API for storing large files. It has been pretty useful.
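To give a feel for the general idea of storing a large file as many small values, here is a hand-rolled sketch. It is not Luwak's API or its on-disk layout; the chunk size, key naming scheme, and manifest fields are all assumptions invented for the example.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for your backend


def split_into_chunks(key, data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size chunks plus a manifest that lists them."""
    chunks = {}
    chunk_keys = []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk_key = "{}:chunk:{}".format(key, index)  # hypothetical key scheme
        chunks[chunk_key] = data[offset:offset + chunk_size]
        chunk_keys.append(chunk_key)
    manifest = {
        "key": key,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "chunks": chunk_keys,
    }
    return manifest, chunks


def reassemble(manifest, chunks):
    """Concatenate the chunks in manifest order and verify the checksum."""
    data = b"".join(chunks[k] for k in manifest["chunks"])
    if hashlib.sha256(data).hexdigest() != manifest["sha256"]:
        raise ValueError("checksum mismatch while reassembling " + manifest["key"])
    return data


# Each entry in `chunks`, and the manifest itself, would be stored as a
# separate small value in the key/value store.
payload = bytes(range(256)) * 1000
manifest, chunks = split_into_chunks("report.pdf", payload)
restored = reassemble(manifest, chunks)
```

Every stored value stays small, at the cost of extra reads and some bookkeeping to keep the manifest and its chunks consistent.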

answered May 23, 2011 4:23 pm CDT

One problem may be that it is difficult, if not impossible, to use JavaScript map/reduce across your binary data. You'll probably need Erlang for that.

answered May 23, 2011 4:23 pm CDT

Adding to @Oscar-Godson's excellent answer, you're likely to experience problems with values much smaller than 50 MB. Bitcask is best suited for values of up to a few KB. If you're storing large values, you may want to consider an alternative storage backend, such as innostore.

I don't have experience with storing binary values, but we have a medium-sized cluster in production (5 nodes, on the order of 100M values, tens of TB) and we're seeing frequent errors when inserting and retrieving values that are hundreds of KB in size. Performance in this case is inconsistent (sometimes it works, sometimes it doesn't), so if you're going to test, test at scale.

We're also seeing problems with large values when running map-reduce queries: they simply time out. However, that may be less relevant to binary values (as @Matt-Ranney mentioned).

Also see @Stephen-C's answer here

answered May 23, 2011 4:23 pm CDT
