Which data model will work best for big data?

Sean Salleh
October 7, 2013

In the old days (say, pre-Apple iPhone), you could be cool about your data model. Relatively speaking, there wasn’t that much variety in the data to be had in legacy databases or over the web. Straightforward relationships between pieces of data or records could usually be defined for data identification, storage, access and manipulation. But now with big data, the day of the traditional data model may be over. Most of the information explosion is unstructured, unruly and unmanageable. How can modelers work with big data to reach meaningful conclusions, if the data model changes every time they blink?

Approach #1 – Change nothing, use what you have

Some applications built to handle big data impose their own data model. While this may optimize performance and results in some cases, it penalizes them in others. Industrial-strength modeling solutions work better overall when data modelers can flip between different data models, according to the problem at hand. In some cases tweaking may suffice; in others a full data model redesign is required. This is one of the advantages that Analytica offers: data models can be built up, modified, slimmed down or extended with ease compared to other solutions, including spreadsheets.

Approach #2 – Ignore big data and its models, it may go away

But no, it won’t go away. On the other hand, it’s valid to question just how much big data is either useful or necessary. Sure, in some cases you can reveal hitherto hidden customer behavior by processing petabytes of web transaction data and hit pay dirt, or anticipate leaks in cooling systems for nuclear power plants. But in many cases, it’s not a matter of having more data; it’s a matter of homing in on the right subset of data. In fact, the more variables you sift through, the higher the risk of overfitting and of finding spurious correlations that arise purely by chance.
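One way to see how spurious correlations creep in is a small Python experiment. This is only a sketch under stated assumptions: it reads "more data" as "more candidate variables", every column is pure random noise, and the names pearson and strongest_spurious_correlation are made up for this illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_correlation(n_features, n_rows=30, seed=0):
    """Largest |correlation| between a random target and n_features
    random, unrelated columns (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        # Each feature is independent noise, so any correlation is chance.
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
    return best


if __name__ == "__main__":
    for n_features in (5, 50, 2000):
        print(n_features, round(strongest_spurious_correlation(n_features), 3))
```

Because none of the columns has any real relationship to the target, whatever correlation the search turns up is an accident, and the best accident can only get stronger as more columns are searched.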

Approach #3 – Become less wrong about big data models over time

Like the proverbial elephant, big data needs to be eaten one piece at a time. Too little data is problematic because erroneous conclusions can be drawn all too quickly. Toss a coin just three times and you’ll see why: the split between heads and tails can never be even. Toss a coin three hundred thousand times and the results should come out very close to half heads and half tails. So, the theory goes, it is with big data. As more and more data is processed, data models are improved and so are the conclusions reached. If you think this sounds rather like increasing the runs of a Monte Carlo simulation, you’re right. In this sense, Monte Carlo methods like the ones integrated into Analytica have been handling ‘big data’ for years.
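The coin-toss argument can be checked with a few lines of Python. This is a minimal sketch and not anything taken from Analytica: the function name heads_fraction and the fixed seed are assumptions made for the illustration.

```python
import random


def heads_fraction(n_tosses, seed=0):
    """Simulate n_tosses fair coin tosses and return the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


if __name__ == "__main__":
    for n in (3, 300, 300_000):
        print(f"{n:>7} tosses: {heads_fraction(n):.4f} heads")
```

With three tosses the share of heads can only be 0, 1/3, 2/3 or 1, so it never lands on one half. With three hundred thousand tosses it should settle very near 0.5, which is the same effect as adding runs to a Monte Carlo simulation.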

Approach #4 – Recognize the potential and the limitations of big data

Before diving into big data models headfirst, this approach offers a pragmatic assessment: big data is useful when it does things that small data can’t. In other words, if the data and models you had before big data still give you good answers, then don’t go chasing big data rainbows. If your results don’t tally with reality, however, and big data offers better perspectives, then that’s the way to go – using Analytica’s flexibility to help you define and redefine data models until you get to the ones you need for improved results.

If you’d like to know how Analytica, the modeling software from Lumina, can help you with data models both big and small, then try a free evaluation of Analytica to see what it can do for you.

Sean Salleh

Sean Salleh is a data scientist with experience in guiding marketing strategy by building marketing mix models, forecasting models, scenario planning models, and algorithms. He is passionate about consumer technologies and resource management. He has master's degrees in Operations Research from the University of California, Irvine and in Mathematics from Northeastern University.