This blog post is a collaboration between StreamBase consultant Adam Diamant and StreamBase CEO Mark Palmer. It is best practice #4 in the series “The Best Practices of Real-Time Innovation” featured on the StreamBase event processing blog.
During a recent StreamBase customer tour, a theme emerged: customers are incorporating statistical computing into their real-time analytical applications. For example, integrating CEP with Python (NumPy, SciPy), R, MATLAB, LAPACK, and QuantLib helps users perform statistical analysis against information in motion.
So best practice #4 is "Incorporate Statistical Computing with Event Driven Systems."
THE SITUATION: QUANTITATIVE ANALYSIS STUCK LOOKING BACKWARDS
Harvard Business Review calls data science the “sexiest job of the 21st century.” Even baseball teams like my own Boston Red Sox hire mathematicians to crunch data.
In most industries, data scientists are still looking at information in the rear-view mirror, simulating outcomes based on historical data. Most seek to predict long-term trends and make decisions on daily, weekly, monthly, or yearly boundaries.
But a growing number of data science problems are being tackled in the present, using data that’s as close to live as possible. For example, in finance, quants run statistical analysis on data that’s sometimes less than a millisecond old to perform real-time portfolio optimization, real-time asset liability modeling, real-time quantitative risk modeling, and real-time derivative pricing.
As mobile computing, sensor technology, and big data technologies continue to increase the volume, velocity, and variety of data available, data scientists will increasingly be required to analyze information in real time, as it streams by.
THE SOLUTION: USE CEP AND STATISTICAL COMPUTING PACKAGES TOGETHER AGAINST LIVE DATA
Numerical computing environments (we'll use MATLAB as an example, since it's widely known) are often used for algorithm development, data analysis, visualization, matrix manipulation, and statistical analysis. In finance, quants use them for time-series analysis, portfolio optimization, asset liability modeling, quantitative risk modeling, and derivative pricing.
A CEP platform and a numerical computing environment can be integrated to perform continuous, on-the-fly statistical analysis of data in motion, enabling automated decision making and online evaluation of live business conditions. The diagram above shows a working system that uses StreamBase to continuously absorb a stream of events from an external source (e.g., stock market data, sensor data, customer transactions) and feed it into MATLAB for real-time analysis.
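To make "continuous, on-the-fly statistical analysis" a little more concrete, here is a minimal sketch in plain Python. It is not StreamBase or MATLAB code and does not use either product's API: the function names, the window size, the threshold, and the sample prices are all made up for illustration. Each incoming event is scored against a sliding window of the events that came before it, and anything unusual is flagged as it arrives.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscores(events, window=5):
    """Yield (value, z) for each event that arrives after the window is full.

    z is computed against the previous `window` values only, so every
    event is judged against recent history, not against itself.
    """
    recent = deque(maxlen=window)
    for value in events:
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            z = (value - mu) / sigma if sigma > 0 else 0.0
            yield value, z
        recent.append(value)  # the oldest value falls out automatically


def flag_outliers(events, window=5, threshold=3.0):
    """Hypothetical 'action' step: collect events that look abnormal."""
    return [v for v, z in rolling_zscores(events, window) if abs(z) > threshold]


# Illustrative price ticks (invented data), with one obvious spike.
prices = [100.0, 100.2, 99.9, 100.1, 100.0, 100.1, 107.5, 100.2]
alerts = flag_outliers(prices)
print(alerts)
```

Because `rolling_zscores` is a generator, it works just as well on an unbounded feed as on a list. In a real deployment the windowing would sit in the event processing layer and the statistics in the numerical package.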
StreamBase is used to connect to, normalize, and clean large amounts of streaming data. It then calls MATLAB for statistical analysis and takes an action based on the results of the calculations. For example, in the diagram above, a matrix is passed from StreamBase to MATLAB, a QR decomposition is performed, and the result is passed back into StreamBase for use downstream. All of this is done in real time by combining StreamBase’s ability to handle large amounts of streaming data with MATLAB’s ability to perform complex mathematical calculations.
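The round trip described here (clean the data, hand a matrix to the numerical engine, get a QR decomposition back) can be sketched without either product. The Python below is a stand-in, not the tutorial's code: in the real system MATLAB's own `qr` would do the factorization, whereas this sketch swaps in a hand-written modified Gram-Schmidt routine so it runs with the standard library alone. The function names and the sample matrix are hypothetical.

```python
import math


def qr_decompose(a):
    """Return (q, r) with a = q * r, via modified Gram-Schmidt.

    `a` is a list of rows. q has orthonormal columns and r is upper
    triangular. Stand-in for the call the numerical engine would make.
    """
    rows, cols = len(a), len(a[0])
    # v[j] starts as column j of a and is orthogonalized in place.
    v = [[float(a[i][j]) for i in range(rows)] for j in range(cols)]
    q_cols = []
    r = [[0.0] * cols for _ in range(cols)]
    for j in range(cols):
        norm = math.sqrt(sum(x * x for x in v[j]))
        if norm == 0.0:  # crude rank check; only catches an exactly zero column
            raise ValueError("columns are linearly dependent")
        r[j][j] = norm
        q_j = [x / norm for x in v[j]]
        q_cols.append(q_j)
        # Remove the q_j component from every remaining column.
        for k in range(j + 1, cols):
            r[j][k] = sum(x * y for x, y in zip(q_j, v[k]))
            v[k] = [y - r[j][k] * x for x, y in zip(q_j, v[k])]
    q = [[q_cols[j][i] for j in range(cols)] for i in range(rows)]
    return q, r


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


# "Clean" step: drop rows with missing fields (invented sample data).
raw = [[1.0, 2.0], [3.0, None], [3.0, 4.0], [5.0, 6.0]]
clean = [row for row in raw if None not in row]

# "Analyze" step, then verify the result that would flow back downstream.
q, r = qr_decompose(clean)
rebuilt = matmul(q, r)
max_error = max(abs(rebuilt[i][j] - clean[i][j])
                for i in range(len(clean)) for j in range(len(clean[0])))
print(max_error < 1e-9)
```

The point of the sketch is the shape of the pipeline rather than the factorization itself: cleaning happens before the matrix crosses the boundary, and only a compact result comes back for the downstream logic to act on.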
The 6-part StreamBase tutorial by Adam Diamant demonstrates how to integrate MATLAB and StreamBase. The session covers:
1) Introduction: why incorporate StreamBase and statistical computing?
2) Opening and closing the MATLAB connection in StreamBase
3) Passing data from MATLAB to StreamBase
4) Evaluating expressions in StreamBase
5) Evaluating functions in MATLAB
6) Example integrations
The code used in the tutorial is free and available for download on the StreamBase Component Exchange (SBX) for those with a licensed copy of MATLAB. You can also download StreamBase for free at www.streambase.com.
THE IMPLICATION: FAMILIAR STATISTICAL ANALYTICS, BUT IN REAL-TIME
The resulting system architecture helps bring traditionally backwards-looking analytical tools into the world of real-time computing. Out-of-the-box integration makes it easy to bridge these two worlds, real-time and historical analytics, in just minutes. That opens the door for data scientists to apply their algorithms not only to observe business conditions but to act on them while the actions still matter.
Post Script:
Some folks have had problems downloading the full resolution image of this app. Here it is: