## Guest Post: Topological Data Analysis and Finance

### August 2, 2013

*Editor: Guest poster Nathan Coombs brings us observations and speculations from the cutting edge of “big data” analysis in finance.*

**Observations and speculations about topological data analysis**

By Nathan Coombs

Those involved in the social studies of finance should be interested in innovations taking place within the field of topological data analysis (TDA). This sophisticated approach to exploiting big data looks set to change how complex information is utilised, with unknown repercussions for the operation of financial and ‘real’ markets. The technology may not have yet entered into financial activities, but it almost certainly soon will.

The start-up firm, Ayasdi, is at the forefront of pioneering commercial TDA applications. Founded in 2008 by Stanford mathematics professor Gunnar Carlsson, along with Gurjeet Singh and Harlan Sexton, Ayasdi now has operational models for automated analysis of high-dimensional data sets. They have attracted $30.6 million in funding as of July 2013, and signed up major pharmaceutical groups, government agencies, and oil and gas companies. According to their website they also have their sights set on bringing the technology to finance. Even US President Obama reportedly asked for a demonstration of their system. The company’s marketing stresses that their systems conduct what they call ‘automated discovery’. That is, Ayasdi’s algorithms will uncover patterns without a human agent first formulating a hypothesis in order to conduct statistical tests. They thus claim their system is able to discover things that you didn’t even know you were looking for.

TDA of the type being developed by Ayasdi is able to take this unprecedented approach because it can discern patterns in large data sets which would otherwise be difficult to extract. Normal data mining approaches, when applied to high-dimensional data sets, are bound by computational limits; TDA, on the other hand, can circumvent the information processing horizon of conventional techniques. Carlsson et al. (2013) enumerate the features unique to topology which explain its efficacy in dealing with data:

First is the coordinate-free nature of topological analysis. Topology is the study of shapes and deformations; it is therefore primarily qualitative. This makes it highly suited to the study of data when the relations (distance between pairs of points: a metric space) are typically more important than the specific coordinates used.

Second is topology’s capacity to describe the changes within shapes and patterns under relatively small deformations. The ability of topology to hold on to the invariance of shapes as they are stretched and contorted means that when applied to data sets this form of analysis will not be overly sensitive to noise.

Third, topology permits the compressed representation of shapes. Rather than taking an object in all its complexity, topology builds a finite representation in the form of a network, also called a simplicial complex.

These features of topology provide a powerful way to analyse large data sets. The compression capacities of topology, in combination with its coordinate-free manipulation of data and the resistance of its simplicial complexes to noise, mean that once a data set is converted into topological form, algorithms can find patterns within it far more efficiently and powerfully.
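To make these three features a little more concrete, here is a minimal toy sketch in Python. It is not Ayasdi's method (their algorithms are not published in detail); it assumes a hypothetical six-point data set and a hand-picked scale `epsilon`. It builds the network of points lying within `epsilon` of one another (the 1-skeleton of what topologists call a Vietoris-Rips complex) and counts its connected components, which is the zeroth Betti number. Only pairwise distances are used, so rotating the data leaves the answer unchanged.

```python
import math
from itertools import combinations


def rips_graph(points, epsilon):
    """Return the edges joining every pair of points at distance <= epsilon."""
    return [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) <= epsilon
    ]


def count_components(n, edges):
    """Count connected components (zeroth Betti number) with union-find."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n)})


def rotate(points, angle):
    """Rotate 2-D points about the origin; pairwise distances are preserved."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Hypothetical toy data: two well-separated clusters of three points each.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
epsilon = 0.5  # assumed scale, chosen by hand for this example

components = count_components(len(cloud), rips_graph(cloud, epsilon))

# Changing the coordinates (here, a rotation) does not change the network.
rotated = rotate(cloud, 1.0)
components_rotated = count_components(len(rotated), rips_graph(rotated, epsilon))
```

Both `components` and `components_rotated` come out as 2: six coordinate pairs are compressed into a statement about shape ("two clusters") that does not depend on the coordinate system. Persistent homology, mentioned below, goes further by tracking how such counts change as `epsilon` varies rather than fixing a single scale.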

How these topological insights are turned into functional algorithms is, however, complex. In order to follow in fine detail Carlsson’s expositions of the methods involved (e.g. 2009), training in the field of algebraic topology is probably necessary. The terms persistent homology, Delaunay triangulation, Betti numbers, and functoriality should be familiar to anyone attempting an in-depth understanding of the scholarly papers. Compared, for example, to the Black-Scholes-Merton formula for option pricing, which can be interpreted fairly easily by anyone with a grasp of the syntax of differential calculus, TDA works at a level of mathematical sophistication practically inaccessible to those without advanced mathematical training. In this it is not unique; most high-level information processing methods are complex. But it does result in a curiously blurred line between academic and commercial research.

Whereas, for example, the proprietary models created by quants in investment banks are typically only published academically after a lag time of a number of years, Carlsson and his colleagues at Ayasdi published their approach ahead of beginning commercial operations. Although these publications do not detail the specific algorithms the company developed to turn TDA into operational software, they do lay out most of the conceptual work lying behind it.

Why this openness about their approach? Partly, at least, the answer seems to rest with the complexity of the mathematics involved. As co-founder Gurjeet Singh puts it: ‘Ayasdi’s topology-oriented approach to data analysis is unique because few people understand the underlying technology … “As far as we know, we are the only company capitalizing on topological data analysis,” he added. “It’s such a small community of people who even understand how this works. We know them all. A lot of them work at our company already.”’

It is a situation that poses both challenges and opportunities for sociologists of finance. The challenge lies with getting up to speed with this branch of mathematics so that it is possible to follow the technical work pursued by companies like Ayasdi. The opportunity is that since those involved in TDA seem relatively open in publishing their methods, researchers are not restricted to following developments years after they have already been deployed in the marketplace. Researchers should be able to follow theoretical developments in TDA synchronously with their application over the following years.

What might we expect of TDA when it is inevitably applied to finance? In the first instance, it should give a marked advantage to those firms who are early adopters. The capacity of its algorithms to detect unknown patterns – indeed, patterns in places no one even thought to look – should lend these firms the ability to exploit pricing anomalies. As the technology becomes more widespread, however, the exhaustive nature of TDA – it can literally discover every pattern there is to discover within a data set – could lead to the elimination of anomalies. As soon as they appear, they will be instantaneously exploited and hence erased. Every possible anomaly could be detected by every trading firm simultaneously, and with that, according to the efficient market hypothesis, any potential for arbitrage profits would disappear.

Of course, TDA is not a static technology; the addition of new and different algorithms to the fundamental data mining model could lead to various forms of it emerging. Similarly, the stratification of risk tolerance amongst market participants could lead to borderline cases where the statistical significance of the patterns detected by its algorithms separates high-risk from low-risk traders. But at its most fundamental, it does not seem obvious how TDA could do more than expose universally all pricing anomalies. TDA might therefore sound the death knell for conventional arbitrage-oriented forms of trading.

Beyond this, it is still too early to speculate further about the consequences of the technology for finance. Across the broader sweep of applications in the ‘real’ economy, however, TDA will likely deepen the automation of logistics, marketing and even strategic decision-making. The technology’s capacity to automate discovery and feed such insights into predictive analytics may herald an era of economic transition, whereby it is no longer just routine tasks such as card payments and credit checks which are automated, but, moreover, middle class professional work such as research and management, previously believed to require the ineluctable human touch. In turn, such changes raise profound questions for the epistemology underlying many free market theories like F.A. Hayek’s, which place emphasis on the engaged, culturally-sustained practical knowledge of market participants. With the increasing mathematization of economic processes attendant to the automation of coordination activity, the pertinence of such epistemologies may well be on the wane.

**References**

Carlsson, G. et al. (2013) ‘Extracting insights from the shape of complex data using topology’, Scientific Reports, No. 3. Available at: http://www.nature.com/srep/2013/130207/srep01236/full/srep01236.html

Carlsson, G. (2009) ‘Topology and Data’, Bulletin of the American Mathematical Society, Vol. 46, No. 2, April, pp. 255-308. Available at: http://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01249-X/

Dr. Nathan Coombs is beginning as a Research Fellow at the University of Edinburgh in October. His postdoctoral project concerns the challenge of ‘big data’ and other automating technologies for fundamental theories of political economy. He is co-editor of the Journal of Critical Globalisation Studies.

August 5, 2013 at 11:59 am

Reblogged this on Automation of Everything and commented:

Read my short contribution on topological data analysis at the Socializing Finance blog.

August 6, 2013 at 5:31 pm

It seems early to be speculating on how this tool might one day be appropriated by the markets. Finance scholars have their hands full with the numerous analytic tools that are being used to automate action in the markets on a daily basis; I’m not sure we’re ready to tackle the ones that aren’t being used on behalf of aspirational tech companies that are craving to implement them. TDA is ‘sophisticated’, ‘complex’, ‘forefront’, ‘pioneering’, ‘unprecedented’, ‘unique’, ‘efficacious’, ‘highly suited’. Wow, I get it! But what does ‘automated discovery’ really mean? Did I miss something?

August 6, 2013 at 6:09 pm

This seems unjustifiably optimistic to me. I’m no expert, but it seems that there are numerous barriers to the use of high dimensional data, i.e. the old ‘curse of dimensionality’ when making statistical inferences on high dimensional datasets. You are optimistic that ‘every possible anomaly could be detected by every trading firm simultaneously’. But wouldn’t the computational resources required to do this be astronomical? There was a paper that circulated a while back that seemingly proved that markets can be efficient if and only if ‘every computational problem whose solution can be verified in polynomial time can also be solved in polynomial time’ (i.e. P=NP). While P=NP has been neither proven nor disproven, many computer scientists believe it extremely unlikely to be true.

The degree of optimism expressed in this post also seems to ignore the politics surrounding IT system adoption within banks and the degree of technological lock-in and path dependence that characterise their development. This is an under-researched area, but in my own research on the interest rate derivatives markets, I’ve run across several seemingly promising tech startups that also hope to ‘revolutionise’ how derivatives pricing and risk management is done. And each year it seems that there is at least one major speaker at quant conferences promising a new vision of how to apply new developments in computing to the derivatives world. But banks appear to be sticking with ‘antiquated’ platforms because (a) these platforms ensure that banks are able to maintain control over their source code, (b) banks want to be sure that they can source enough programmers to work on that platform, and (c) developing a new firm-wide pricing system is extremely costly with little potential upside. I mean, consider the fact that the Black-Scholes model is still used as a quotation device (and in some cases, to price + risk manage trades) in many option markets, even though it has been ‘wrong’ since the late 1980s!

August 6, 2013 at 8:29 pm

I thought the post was interesting because of what it shows about finance’s dreams for data analysis, even if those dreams are expressed in wildly optimistic terms by the firm trying to sell TDA.

As an aside, if you haven’t seen/read Cosma Shalizi’s brilliant essay on “Red Plenty” and the computational problems of economic planning, I can’t recommend it enough: In Soviet Union, Optimization Problem Solves You. Taylor’s comments on P=NP reminded me of it.

August 7, 2013 at 12:06 pm

For the sake of spirited conversation, I’m going to disagree with both Dan and Nathan. How should finance studies position itself with regards to new analytic technology? This is an important question to answer as we face a wave – a wave! – of interest in finance that is rising in the i-affiliate disciplines.

The financial industry does not have an endogenous dream for Big Data analysis; it is being confronted by a dream generated by sales-oriented analytic companies. This story is not new. The i-infrastructure of banking and finance has always been built at the intersection between two very different industries with agonistic visions of what information is and how it works. (Nathan highlights this confrontation when he gestures at replacing Hayekian politics with the vision of information generated by his start-up.)

Tech entrepreneurs are quite capable of documenting and broadcasting their own dreams and visions. After all, pushing new technology is what they do for a living. It seems to me the purpose of social science is somewhat different. Isn’t our job to pierce through this silky smooth veneer in search of the contradictions built into functioning technologies that defy straight-forward explanations…?

August 7, 2013 at 3:17 pm

I mentioned yesterday that I think this post is a tad too optimistic about the potential of the underlying technology. With that said, Nathan raises some very interesting questions that resonated with my own experiences as a researcher.

The first is the problem of developing ‘interactional expertise’ (to use Collins’ term) in a highly technical field. This is particularly difficult in a nascent field, because there usually aren’t classes or textbooks one can refer to in order to learn the material needed to speak with practitioners. In the case of topology + big data, I doubt even an introductory text on algebraic topology would be useful, because the methods that these firms use are probably quite close to the ‘bleeding edge’. By comparison, learning about other technical areas of finance (i.e. stochastic calculus and options pricing) is relatively easy. Most universities offer coursework that teaches students exactly what they need to know to get up to speed, and there are now countless textbooks on mathematical finance that can teach you the important concepts and ideas of quant finance.

Second, assuming that TDA *does* turn into a full-blown field, I am left wondering how it will shape the culture of this field of mathematics. Topology strikes me as one of the ‘purer’ areas of mathematics at the moment, and I would imagine that nobody goes into this field looking to make money. I took an intro class on topology back in undergrad (we didn’t cover any algebraic topology; just really basic point-set topology), and while it was a lot of fun and sharpened my thinking considerably, the only *practical* thing I walked away with was the ability to make some really high-brow sex jokes (https://en.wikipedia.org/wiki/Hairy_ball_theorem).

If you look back 20 years ago, I imagine the study of martingales and stochastic processes was about where algebraic topology is today: quite ‘pure’, but with no ‘blockbuster’ real world application. Researchers in these fields didn’t have to confront the hairy ethical issues that, say, cryptologists have to engage with given the potential for their work to be abused. Harrison and Kreps’ (1979) paper created an obligatory passage point, of sorts, for the field of derivatives pricing. During the 1980s and 1990s, it became very difficult to do important work on derivatives pricing or get hired in a bank unless you had a background in measure-theoretic probability theory. What I’m curious about is how this shift from a pure field to a ‘profitable’ field changes the social dynamics of the discipline.

August 8, 2013 at 7:57 am

Martha, Taylor and Dan,

Many thanks for your feedback. Lots of good comments and questions here, only some of which I am in a position to answer having only recently started research on this topic.

I think the comment is fair regarding TDA’s relevance for sociologists of finance. Sure, you have your hands full, and yes, there is no real reason for those trying to understand the workings of financial markets to worry too much about technologies that have not become operative yet. I agree.

For me, the interest in topological data analysis is broader than its potential for finance, however. It concerns its capacity to surpass the computational limitations imposed by traditional computational techniques and what this may mean more generally for economic coordination practices. As my rather vague reference towards Hayekian epistemology was intimating, for me the interest lies in how these technologies connected with ‘big data’ may be fundamentally changing the way coordination processes are reaching: internalising conceptual reflexivity, reducing the role of practical knowledge, and excising the human touch so important for grounding the Hayekian epistemology of the free market.

Martha asks: “But what does ‘automated discovery’ really mean?” As I understand it, this means that the topological approach is so powerful in discerning patterns in the data that a hypothesis does not need to be formed in order to run statistical tests. If you wanted to take the same approach through conventional means you would have to run something like an exhaustive correlation on a data-set, running tests on every possible combination of dependent and independent variables (I’m no expert on this though, so feel free to correct me). As Taylor notes, this runs into obvious information processing barriers. The topological approach, on the other hand, eliminates all the extraneous noise and unnecessary metrics on the first pass, and so can cut straight to the pattern-bearing relations *before* having to run any such correlation upon the data set. Therefore (and I hope I am getting this right here) it is running its statistical tests only on a sub-set of the whole data-set that its topological conversion algorithms have already identified as significant for holding patterned data connections. They describe this process as ‘filtering’. From what I can tell, it represents a major breakthrough. Hence Ayasdi’s reports that, at a technical demonstration at a major pharmaceutical firm, their software detected in 20 minutes all the results that had taken a large team of data scientists 11 months to find. This could be written off as just hype, but my hunch tells me otherwise in this case.

Although there is no guarantee that TDA will be picked up and implemented by finance, I cannot see any reason why it would not be implemented. Simply put, it can do things that no conventional data mining technique currently can, and the fact that it is an automated system working upon the whole data-set means that it should be possible to integrate it seamlessly with automated trading systems. As I see it, the technology is not proposing a new pricing system or more exotic financial products, but rather just a more efficient and powerful way to detect anomalies in the pricing set according to existing models. In sum, I think the TDA applications promised by Ayasdi set it apart from the plethora of other data mining companies floating around financial circuits eager to land a sale. The mathematics behind TDA has been in gestation for almost 10 years, supported by generous grants and research initiatives. It represents a leap forward in how data is handled compared to the offerings of all the other companies out there. And the intellectual barriers to entry are set as high as they conceivably could be. But we will see. It is, of course, possible that other companies will develop alternative procedures that can engender much the same computational capacities. I’m not sure how; but it is a fast moving industry.

Taylor’s comment on how it might affect the scholarly culture around algebraic topology is a pertinent one. Until recently, algebraic topology was one of the purest branches of abstract mathematics with seemingly little practical application. There are hints floating around too that homotopy type theory might be making its way into computational applications in the near future. It would be surprising indeed if this did not lead to changes in the scholarly culture around topological ‘pure’ mathematics.

August 8, 2013 at 8:12 am

* When I say “sub-set of the whole data set” I of course do not mean a *sampled* sub-set, but rather a sub-set of the data that holds the pertinent patterning connections of the whole data set. A representative sub-set, in a sense.

March 10, 2015 at 3:35 pm

Hi Nathan,

I would really like to see how the story of TDA and Ayasdi has evolved since your last post.

Otherwise, it is quite interesting that some post-actor-network approaches claim that a ‘topological’ way of thinking about social phenomena is preferable to a ‘Euclidean’ one (see, for example, After ANT: complexity, naming and topology by John Law, 1999).