These notes are taken from a most satisfying CLSIG event with Alex G Smith. It was a timely wake-up call about how machine learning is being perceived: instead of the marauding AI robots of horror movies, it should be the thought of messy, misused data that keeps us up at night. Above all, it confirmed what most of us already know: information professionals have been designing and providing ‘AI’ for years. Many things currently being touted as AI are nothing more than data and algorithms. Analytics and automatic documentation are not AI.
Filters are essential in searches, but as a life choice...?
The term ‘filter bubble’ was coined by Eli Pariser in 2011, and depending on the intentions of the publisher or advertiser, filter bubbles can be used for good or bad. Our online consumption habits are a dream for contextual search engines, targeted advertising or political manipulation. But the same habits also enable subscription databases to recommend related information. After all, when librarians are familiar with their users’ interests, we can anticipate their needs. That saved book or forwarded article is what makes people trust us.
Preference-based information provision can require a nuanced approach to ethics. Not everyone wants their interests tracked, and they are entitled to privacy; however, some industries seem to be more ‘acceptable’ than others. The music industry has a long history of recommendation engines, for instance Last.fm, iTunes and Spotify. In these cases AI is already here: your past music choices influence future suggestions, which is how Spotify chooses what makes it onto your playlists.
Human behaviour is as messy as our data
Cambridge Analytica remains in the news following its recent descent into administration and further allegations involving Russia. Many points remain to be made about the moral and ethical lines crossed by both the company and Facebook. Criticism of Mark Zuckerberg and Facebook continues, but anecdotal evidence suggests that people’s behaviour hasn’t changed much.
In many ways we have been sleepwalking into this data mess. How many people actually read app or site terms and conditions? How many people are willing to give information to organisations simply because they want free WiFi? Or allow sites access to their personal profiles just to take a quiz? You can be sure that Facebook is not unique; this is just the tip of the iceberg.
‘Humans are excellent at creating mess. We are currently polluting ourselves with unstructured data, and unrealistically expecting AI - or machine learning - to clean it all up.’
Data leaks always have an ethical angle. Take, for instance, the 11 million Panama Papers documents, which are still being scoured for evidence of corporate wrongdoing. Meanwhile, many innocent individuals named in these papers are at risk of exposure, and the two developers who implemented an open source search engine for the documents have taken pains to protect people’s privacy.
Organising and sorting this data has been made easier because much of it was leaked with its metadata attached. Many connections have been made, bringing more transparency around shell companies. This is not AI, but simply the application of conventional data management, search and analysis.
Lawyers and logic and legal data
Law and legal language can be relatively logical. Both knowledge management and legal eDiscovery involve organising vast quantities of messy data. Know-how depends on flowcharts, checklists and taxonomies; to borrow Alex’s phrase, ‘IA (information architecture) before AI’. There is some AI in the form of supervised machine learning, but self-learning is still some way off. Training sets take time to set up and they have to be right, otherwise there is a danger of ‘garbage in, garbage out’. But much of the value lies in connecting tacit knowledge, constructing searches, and selecting and evaluating sources. These are all skills which law librarians have in abundance.
That’s all very well, but what skills will the legal profession require in the future? Given that law is data-heavy and we are not robots, there will be a need for data literacy. There will always be a requirement to make connections and apply judgement. Perhaps we shouldn’t be so focused on finding the right answer, and should instead make the most of our human ability to explore and include multiple viewpoints.
Only then will we make AI work in a truly human way, with genuine respect for data protection and integrity. Do you agree?