Forget the mining boom, Australia. It's time for the data mining boom.
Big Data isn't just the latest IT industry buzz term. Earlier this month I visited with an old friend at her lake house in Connecticut, USA. Born in Narrandera, NSW, Debra Walton has been based in New York for going on two decades, and is now Chief Content Officer of Thomson Reuters.
We got talking about big data and its challenges. As Debra's senior executive role attests, the global finance industry is well advanced in developing data mining tools to capture greater business benefits. The amount of data in the world is staggering and growing exponentially, and organisations like Debra's are keen to mine and commercialise it. As Debra explains:
"Products moving around the world can be tracked using chips, while satellites give traders and analysts up to the minute information about crop health, mine production, and global shipments that previously would have taken days or weeks to obtain. Machines will handle tasks—such as reading text-based SEC filings—and extract important information far quicker than humans ever could, and the data they generate will sideline those who 'rely on their gut.'"
Debra's sketch of big data gives an insight into its breadth and depth. But what is big data exactly? Functional definitions abound. Big data can be defined as data sets so large and complex that they become difficult to process using standard databases and data processing tools and techniques. Gartner’s widely accepted definition describes big data as “...high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight, decision making and process optimisation”.
Big data includes structured data and unstructured data (anything that does not fit in typical relational databases, such as email, text, video and audio files). There is also a distinction between current data and dark data – information assets that organisations collect, process and store during regular business activities, but generally fail to use for other purposes.
Big Data in Australia
Australian institutions could play a big part in this brave new world. For example, RoZetta, a recently launched commercial offshoot of Sirca, a not-for-profit big data organisation owned by 40 Australian and NZ universities, can process, analyse and graph every trade from every sharemarket in the world in 2.5 hours. Its insights are already served up via Thomson Reuters and Dow Jones to the world's leading financial institutions. (RoZetta gets its name from a mashup of the Rosetta Stone and the zettabyte. A zettabyte is the very embodiment of big data: a unit equivalent to the contents of about 250 billion DVDs).
Big data extends earlier statistical techniques, and marks a turn to empirical approaches to decision-making. RoZetta also hopes to use its smarts to create user-friendly tools that would allow traders to "walk through" landscapes created by mapping share market activity. Another tool would draw listed companies as "shapes" based on real-time analysis of financial variables such as yield, PE values and dividend payouts, allowing anyone managing a financial portfolio to select a mix of shares that best matches their risk appetite.
But commercialising big data doesn't come cheap. What those definitions reveal is that there are significant challenges in managing big data - be it capture, curation, storage, search, collaboration, sharing, analysis or visualisation. Australian participants in a recent international survey of IT decision makers by Vanson Bourne cited the inadequacies of their organisations' existing infrastructure as the most common barrier to implementing big data projects. The need to invest is recognised: big data-related spend was predicted to make up 23 per cent of IT budgets within three years.
Big Data and the Legal Profession
When I first entered the legal profession as a paralegal at Freehills in Sydney in the late 1980s, we were amongst the first to use computers in the preparation of legal documentation and case research.
For law firms today, leveraging big data is a way to survive the new marketplace, to differentiate themselves with something other than their mantra of excellent lawyering. Client files, archives, timesheet and billing systems each contain large amounts of information. Externally, there are social media networks, case law databases, and media reports. This data can be used not only to improve law firm self-management, but to assist clients to understand the real opportunities and risks they face so they can make intelligent business decisions.
Some insight into the extraordinary potential for statistical data mining and modelling can be seen in the work of US-based Juristat, which allows lawyers and their clients to visually plot their chances of success at every stage of the patent application process. Juristat uses a proprietary natural language processing algorithm to organise and structure raw US Patent & Trademark Office data into behavioural models of individual examiners, allowing attorneys to more accurately predict an examiner's future conduct. Juristat says its research shows that the merits of an argument often matter less than an examiner’s bias, timing, and the prosecutor’s personal skill. By tracking and identifying examiner biases, Juristat provides the tools for patent applicants to optimise outcomes. But the results of the deep dives these tools provide can be revealing in other ways: Juristat's latest blog reveals that male inventors have a greater chance of success at the USPTO - but female examiners are faster.
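Juristat's actual algorithms are proprietary, but the core idea of building behavioural models from examiner histories can be illustrated with a toy sketch. Everything below - the examiner IDs, the outcomes, and the naive rate-based model - is invented for demonstration:

```python
# Toy illustration only - not Juristat's method. Estimates each patent
# examiner's future allowance rate from their historical decisions.
from collections import defaultdict

def allowance_rates(records):
    """records: iterable of (examiner_id, allowed) pairs from past filings.
    Returns each examiner's historical allowance rate as a naive predictor
    of how they will treat future applications."""
    counts = defaultdict(lambda: [0, 0])  # examiner -> [allowed, total]
    for examiner, allowed in records:
        counts[examiner][0] += int(allowed)
        counts[examiner][1] += 1
    return {e: allowed / total for e, (allowed, total) in counts.items()}

# Invented history: examiner E101 allows 2 of 3, E202 allows 1 of 3.
history = [("E101", True), ("E101", True), ("E101", False),
           ("E202", False), ("E202", False), ("E202", True)]
rates = allowance_rates(history)
print(round(rates["E101"], 2))  # 0.67 - E101 looks far more favourable
```

A production system would condition on far more than the examiner's identity - technology class, claim language, timing - but even this crude rate shows why knowing which examiner you have drawn changes the predicted outcome.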
Big Data and the Courts
So far, the Australian legal industry has been slow to capitalise on the huge amounts of data available from the Courts on a 24/7 basis, such as:
- general case data
- docket lists
- court calendars
- social media
Will use of big data undermine court procedures? Or could big data help overcome common procedural issues such as delays, spiralling costs, and unequal access to justice?
The Australian Productivity Commission seemed to think so: one key recommendation in its 2014 Inquiry into Access to Justice Arrangements Report was for the Australian and State and Territory Governments to provide funding for a "civil justice data clearinghouse" ([25.3]) because data on the civil legal system left "much to be desired" and the absence of data hampered policy evaluation and caused a reliance on qualitative assessments.
In the US, the RAND Institute for Civil Justice used big data in a review of electronic discovery to identify ways to decrease the costs of civil litigation; for example, data analytic techniques can identify all the documents that have more than a fixed probability of being discoverable. Elsewhere, a predictive model has been developed to track settlement outcomes of securities fraud class action lawsuits.
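The e-discovery example can be sketched in a few lines. The document scores and cutoff below are invented; in practice the probabilities would come from a classifier trained on documents already reviewed by lawyers:

```python
# Hypothetical sketch of threshold-based e-discovery culling: only
# documents whose predicted probability of being discoverable exceeds a
# fixed cutoff are routed to costly human review.
def select_for_review(scored_docs, cutoff=0.5):
    """scored_docs: list of (doc_id, probability) pairs.
    Returns the ids of documents scoring above the cutoff."""
    return [doc_id for doc_id, p in scored_docs if p > cutoff]

# Invented scores for four documents in a review set.
scores = [("doc1", 0.92), ("doc2", 0.15), ("doc3", 0.55), ("doc4", 0.30)]
print(select_for_review(scores, cutoff=0.5))  # ['doc1', 'doc3']
```

The cost saving comes from the documents left out: here half the set never reaches a human reviewer.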
But big data requires careful interpretation. The law’s traditional concern with natural justice means that transparency is crucial. What assumptions were made in the analytic process around choice of datasets and algorithms? As UNSW's Lyria Bennett Moses and Janet Chan illustrate, relying on past data, including past settlements, when making settlement decisions can create a feedback loop so that an initial bias in favour of plaintiffs or defendants is perpetuated.
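That feedback loop can be made concrete with a toy simulation (all figures invented): if each new settlement simply anchors on the average of past outcomes, a single early, plaintiff-favourable result is reproduced indefinitely, whatever the merits of later cases.

```python
# Toy simulation of a settlement feedback loop - all figures invented.
def next_settlement(merits_value, past_outcomes, anchor_weight=1.0):
    """Predict the next settlement as a blend of case merits and the
    average of past outcomes. anchor_weight=1.0 ignores merits entirely."""
    anchor = sum(past_outcomes) / len(past_outcomes)
    return anchor_weight * anchor + (1 - anchor_weight) * merits_value

history = [150_000]  # one early outcome skewed in the plaintiff's favour
for _ in range(5):
    history.append(next_settlement(merits_value=100_000, past_outcomes=history))
print(round(history[-1]))  # 150000 - the initial bias never washes out
```

Lowering anchor_weight lets the merits pull outcomes back toward their true value - which is precisely why transparency about how heavily a model leans on past data matters.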
Challenges to the Law
With businesses already heading into the cloud, the privacy and security of big data are dramatically changing the legal landscape - especially internationally. It is no real surprise that, because big data is a relatively new phenomenon, there is concern that little, if any, legislation or case law has developed that properly appreciates its dynamics. Australia is no different; the landscape of big data - its use, and regulation - is still in a period of fluidity and uncertainty.
How should privacy risks be weighed against big data rewards? What role, if any, will the Australian Privacy Principles relating to the collection, use and protection of customer data play? The Department of Finance assures us that the Australian Public Service Big Data Strategy "sets out the actions that the Government is taking to harness the opportunities afforded by big data without compromising the privacy of individuals". The Whole of Government Data Analytics Centre of Excellence (CoE) is touted as a space to build analytics capability across government - but its establishment by the Australian Taxation Office gives some foresight into the intended use of its capabilities, and the need for caution.
How should company boards and management approach data sovereignty and cloud issues? The risks are all too real. In July 2012 hackers accessed hosting provider Melbourne IT's servers, deliberately targeting data belonging to its customer, telecommunications company AAPT, as a protest against proposed new laws requiring telecommunications companies to keep call records for two years. Confidential information of AAPT clients was published.
And what about national security issues? The Data to Decisions Cooperative Research Centre (D2D CRC) is already exploring the potential for Big Data analytics in law enforcement and national security. Should individual rights and freedoms be forced to give way when citizens expect their government to keep them safe, and to equip itself with the power to do so?
Big Data and the Bar
During my travels in the US this month I also attended the Australian Bar Association's biennial conference, held in Boston and Washington. The conference theme "Survival of the Fittest: Challenges for Advocates in the 21st Century" prompted the NSW Chief Justice to pose the question: "iAdvocate v Rumpole: Who Will Survive?".
The Chief's clear message was that it's no good to do a Rumpole and bemoan “[i]f I don’t like the way the times are moving, I shall refuse to accompany them”. Indeed, the Chief Justice urged that "barristers, as a group within the legal profession, should lead the way in embracing, rather than resisting technology".
Of course, embracing these opportunities will inevitably carry its obligations. His Honour noted that since 2012, the American Bar Association's amended Model Rules of Professional Conduct have incorporated the obligation to maintain the requisite knowledge and skill of the “benefits and risks associated with relevant technology”. Many (including this author) argue that a rule to similar effect should be introduced in Australia. That means, in an era of big data, the time will surely come when counsel's advice on prospects will need to take into account data from predictive modelling of case outcomes. And as the complexity of data mining and analysis evolves, the expectation that advocates will be able to confidently and accurately synthesise and articulate that material will only increase.
Saturday, 25 July 2015
Dominique Hogan-Doran is an Australian barrister based in Sydney and a technology enthusiast. She is a former President of Australian Women Lawyers and the Australian Bar Association's nominee on the Law Council of Australia's Future Focus Committee.