In 2009, StreamBase began talking publicly about using Complex Event Processing (CEP) with Twitter. The topic sparked a flurry of mainstream media coverage about CEP on Wall Street in outlets such as CNBC, the Wall Street Journal, and the Financial Times. Even Popular Mechanics named CEP as one of 10 Tech Concepts to watch, along with Fracking and Flywheel Energy Storage! (Watch this short video for an introduction to CEP.)
Last week, USA Today published an article about CEP and trading on the front page of its Money section: "Wall Street traders mine tweets to gain competitive edge." The report covered developments such as:
- The opening of Derwent Capital, a firm that specializes in news-driven trading strategies, including Twitter
- Two recent reports from Adam Honore of AITE Group (on trading strategies and on unstructured data), which revealed that interest in unstructured data has jumped from 2% in 2008 to 35% in 2011
- Rich Brown, who talked about how the Thomson Reuters news sentiment engine is more effective than the "quirky and abbreviated language of Twitter"
- Johan Bollen, a professor of informatics at Indiana University, whose research claims an 87% accuracy rate in using Twitter mood measurements to predict Dow stock prices three to four days later
- MarketPsych, a firm that used to run sentiment-based strategies and has switched from being a hedge fund to a software provider
The article shows that Twitter continues to be a controversial topic: some tweets are clearly important - for example, the CME Group tweets news about the commodities market and has over 780,000 followers. On the other hand, computer-measured sentiment is just that - one more barometer of what the masses are thinking. Good investment decisions are powered by experience and knowledge, and sentiment is one input of many.
The noise over the use of Twitter on Wall Street drowns out the real story, which is why the article quoted me as saying that Twitter is "a ripple, not a wave" on Wall Street (and on Main Street as well).
The big wave is big data, and Twitter generates just a few buckets of sand in the world of big data today.
In "New Rules for Big Data," the Economist reported that Google processes a petabyte of data an hour. Here's a way to put Twitter in context*: think of a bucket of sand. If every grain of sand in the bucket were 1 byte of data, then:
- The entire work of Shakespeare fills just one bucket of sand (about 5MB)
- A fast financial market data feed (OPRA) fills a beach of sand in 24 hours (about 5TB)
- Google processes all the sand in the world every week (about 100PB)
- We generate 60% more sand every year
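For readers who want to check the arithmetic, here is a rough Python sketch of the bucket analogy. It assumes decimal units (1 MB = 10^6 bytes) and takes the post's approximate sizes at face value; the function names `buckets` and `grow` are made up for this illustration.

```python
# Back-of-the-envelope sketch of the buckets-of-sand analogy.
# Assumption: decimal (SI) units; the sizes are the post's rough figures.
MB = 10**6
TB = 10**12
PB = 10**15

# One bucket = the entire work of Shakespeare, about 5MB at 1 byte per grain.
BUCKET_BYTES = 5 * MB


def buckets(num_bytes):
    """Return how many buckets of sand hold num_bytes, at 1 byte per grain."""
    return num_bytes / BUCKET_BYTES


def grow(num_bytes, years, rate=0.60):
    """Return the data volume after compounding growth at `rate` per year."""
    return num_bytes * (1 + rate) ** years


if __name__ == "__main__":
    # A day of OPRA market data (about 5TB) is a million buckets.
    print(f"OPRA, one day:    {buckets(5 * TB):,.0f} buckets")
    # A week of Google processing (about 100PB) is 20 billion buckets.
    print(f"Google, one week: {buckets(100 * PB):,.0f} buckets")
    # At a petabyte an hour, a week works out to 168PB, the same order
    # of magnitude as the rounded 100PB figure.
    print(f"1PB/hour for a week: {24 * 7} PB")
    # 60% more every year compounds to roughly 10x in five years.
    print(f"Growth after 5 years: {grow(1, 5):.1f}x")
```

The numbers are only as good as the rounded inputs, but they show the scale: a day of OPRA is about a million Shakespeares, and 60% annual growth means roughly ten times as much sand in five years.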
In these terms, all of Twitter generates only a sand castle of quality data a day. While some of that data is very valuable, such as tweets from CME Group (@CMEGroup), most of it simply tells us about mass sentiment.
On the other hand, the debate about Twitter does reveal the opportunity and challenge of big data. The Economist used an 1894 quote from Oscar Wilde to put that challenge in context: "It is a very sad thing that nowadays there is so little useless information."
So the big ruckus about Twitter on Wall Street reveals the real big story - today's big data challenge.
* Thanks to Rob Doherty at Cloudswitch for sharing Chad Sakac's (EMC) buckets of sand concept, to the Economist's "New Rules for Big Data" for sizing data from Google, and to @RJMetrics on big data (TED Philly talk) for the conceptual description of big data.