On Privacy - Part 2

Saturday, August 28, 2010. Posted by Cecilia Loureiro-Koechlin
The other day I googled my name. Most of the hits were expected: information about me that I knew was searchable, because I had previously decided to make that data public. Some hits were from data aggregators showing profiles assembled from data taken from various other websites. All of these were wrong, of course. They had taken data without my consent (although the data are public, I never agreed to their use by another website) and without my verifying their accuracy and quality. These aggregators mix data that is mine with data from people with similar names. The result is, of course, an unreliable and unethical source of information. This isn't good, is it? Anyway, even though I don't like it, I was half expecting this kind of link.

There was, however, one hit (on the fifth page) that was completely unexpected. It was a link to a Facebook Music Application page saying that I had "claimed" a song on a specific date two years earlier. The music website appeared embedded within Facebook, just as it would if I had logged into my Facebook account. But I hadn't logged in. Two things worried me more, though:
1. It showed my Facebook Profile Picture, which I had made private to my friends only.
2. It stated that I had "claimed" the song and that I had been the first person to "send the song".

Why am I worried?

1. I had previously checked all my privacy controls on Facebook and made sure everything I wanted private was private. Obviously I missed something.

2. My Facebook Profile Picture is a photo that I uploaded to Facebook, not to this second application/website. This photo is not public; it is private. And this application is displaying it to everyone. Even if I had set my photo to "public", meaning everyone within and outside Facebook can see my picture, it would be my "Facebook photo", not my "music-application-within-Facebook picture". This change of context is misleading.

3. I never "claimed" or "sent the song". What I did was build a list of songs I liked from a broader list offered by the application. In other words, I created a playlist with songs I got from the application. I never knew whether I had been the first to pick a song, and I never cared. What I do know is that I didn't want to "claim" anything, let alone be tagged as the one who "sent the song". When I built my playlist I was not aware of these "other" things I was doing; the application never gave me a clue. Or maybe nothing like that happened in the version of the application I used, until someone decided to change it and add new events. So now it uses different wording to categorise actions I performed when they were called something else. The change of wording is misleading.

What did I do?

I removed the application from Facebook.

Was any of the information displayed by this music application inappropriate or embarrassing?

No, there was no embarrassing personal information here. I would not have minded this information being made public had I actually done what it said I did, and had I been informed beforehand. But the information was not true, and it was taken out of context. It was just frustrating to see how easily I can lose control over my data.
Stuff to read about privacy:
Nissenbaum, H. (1997) “Toward an approach to privacy in public: the challenges of information technology,” Ethics and Behavior 7(3), pp. 207–219.
Nissenbaum, H. (1998), “Protecting Privacy in an Information Age: The Problem of Privacy in Public,” Law and Philosophy, 17, pp. 559-596.
Regulating the Use of Social Media Data

On Privacy - Part 1

Tuesday, August 10, 2010. Posted by Cecilia Loureiro-Koechlin
You may have read (a lot) about online privacy recently: privacy on social networks and on web applications that feed or are fed by social networks. I am also a user of those networks and hear/read what people are saying. People are either ignorant or panicking: afraid of the unknown, unable to control what is happening to their data, or at least not knowing whether the controls they apply actually work. Even though I am technology literate, I sometimes doubt whether my data are safe out there; I am concerned too. So I thought I would blog about this. This is what is happening, from my point of view anyway.

With the rise of Web 2.0 technologies (and open source), the ways in which people engage with technology have changed profoundly. This includes techies and normal users :) Techies can now create applications in no time. They can access sources of information through APIs and create networks of interconnected applications (mashups, widgets, APIs, Atom feeds, etc.). Normal users have acquired new powers, as they can now also create content using the applications techies develop. Users have come in their millions to create blogs and wikis, open accounts on social networking sites, buy online (e.g., eBay, Amazon), read the news, watch videos, etc. In all these places users enter their information, trusting it will be kept safe and private.

Because users now have so many accounts here and there (some of which they forget about), application providers have thought of ways to interconnect those applications and help users manage their information (ha, although “help” might be in their minds, I think their main aim is to make more profit out of those interconnections and to turn the web into a super-massive web).

A hypothetical example: tweeting from another site that I just added Filemón as a friend, or that I just tagged myself as a social software fan. For that, I would need to access my Twitter account while connected to the first site.

To do this, information must be exchanged between these applications, and at least the user is (hopefully) aware that two applications are involved. There are other cases where this isn’t clear: when, within one application, we access other applications (external, but looking internal) that pick up our information. For example, a music application where we select the songs we like and share them with our contacts on a social networking site. That music application might have its own website, accounts, etc., and hold our data taken from the social networking site, although we may never have opened an account with it.

What they do with our data is unknown, at least to the lay person. Actually, most people are not even aware that other applications can be collecting information about them while they are just using the one application they know. And if to all that confusion we add the (legal but unethical?) aggregation of (private) data without users’ consent, we get a complete mess of data and application interconnection. A mess that, of course, the techies can understand, and that even people like me find promising in terms of its potential for developing the semantic web, for example.

But what do normal (semi-technology-literate) users think? People get lost in the mess and do one of two things:
1. They start fearing it. The mess is such a messy mess that people start fearing it. We fear what we do not understand. We fear what we cannot control.
2. Or they just do not care, over-trust, or are so ignorant they are not aware of any problem. They use the web and disseminate their data with little care. They publish their photos, their contacts, their address, PEI (personal embarrassing information), etc., taking lots of risks such as identity theft, being exposed to people they do not want to be exposed to (their boss) and loss of privacy.

My advice to people in general is to be careful about where they go online and the kinds of content they upload there. Trust only sites where you have control over your data and its privacy.

But this is not about telling people that if they want to be online they have to be public, or, in other words, that "privacy is dead." That is wrong and stupid. My hope is that online providers (hey, Facebook!) start taking privacy issues more seriously. Not limiting it to the technical (yeah, (.) private ( ) public options are not enough; privacy is not a boolean variable), but considering its human and social aspects: learning how people deal with their privacy offline and online, and understanding the implications that making assumptions and getting privacy wrong have for people and societies. (Hmm, "public" by default is wrong!) Because the data they hold is not only data; it is information about people and their lives. Data is people.

I think it is going to be a s-l-o-w process: creating laws, enforcing laws, creating awareness and learning to be careful online, all while technologies develop at the speed of light. Difficult, but not impossible.



Note: I think data aggregation is a good thing when it is regulated, when trusted sources are used, when owners of data are aware of who is using their data and for what, and when users can, if they decide, stop sharing it with some applications or with everyone. Aggregation of data and the semantic web can help make content more accessible, organised and therefore useful. But if you don't do it right you can mislead people, violate contextual integrity, threaten privacy and more. I wrote about that here.



Something extra to read:
Howard, A. (2010) Online privacy debates heat up in Washington. O'Reilly Radar
(2010) IT privacy campaigners celebrate. BCS
Don't share things you don't want people to discover

On data aggregation: Benefits and Issues

Wednesday, April 21, 2010. Posted by Cecilia Loureiro-Koechlin
My work at the moment has to do with digital repositories, registries and data aggregation. I work at a university as a project analyst and have the privilege of witnessing state-of-the-art technology development and, most importantly, users’ reactions. Developers around me use semantic web technologies to create systems that harvest and update data about research from a variety of sources, store it and record its provenance, and preserve, give access to and display these data. The result is a registry that mirrors data which can have a variety of uses.

Data about research is data that describes research activities and researchers. Most of these data come from already publicly available sources: departmental and project websites. These sources number in the hundreds (lots of URLs to remember!). They are dispersed and disconnected. The point of collecting all these data is to have them in one place and to build connections between data objects that originally were not connected. Data objects are researchers, projects, grants, etc. For example, we can find researcher A’s biography on website “one”, a list of his publications on website “two”, his name on three project websites, his name in grants on a research council website and a list of research interests on a group website. We can put everything together and investigate whether all these data belong to the same person. If so, we can present a much more complete picture of this researcher.
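To make the "do these data belong to the same person?" step concrete, here is a minimal Python sketch. It is not the registry's actual implementation, which this post does not describe. It assumes each harvested record is a dictionary with hypothetical fields `source`, `name`, `email` and `fact`, and it groups records into profiles using whichever fields we choose as the matching key. The records are invented for illustration. Matching on the name alone merges two different people into one profile (the similar-names mix-up aggregators make); adding a second identifier keeps them apart.

```python
from collections import defaultdict


def normalise(value: str) -> str:
    """Lower-case and collapse whitespace so 'Jane  Doe' and 'jane doe' compare equal."""
    return " ".join(value.lower().split())


def aggregate(records, key_fields):
    """Group records whose key fields agree (after normalising) into one profile.

    Every non-key field is collected into a list on the merged profile.
    """
    profiles = defaultdict(lambda: defaultdict(list))
    for record in records:
        key = tuple(normalise(record[field]) for field in key_fields)
        for field, value in record.items():
            if field not in key_fields:
                profiles[key][field].append(value)
    return {key: dict(fields) for key, fields in profiles.items()}


# Hypothetical records harvested from three different websites.
records = [
    {"source": "website one", "name": "Jane Doe", "email": "jdoe@chem.example", "fact": "biography"},
    {"source": "website two", "name": "jane doe", "email": "jdoe@chem.example", "fact": "publication list"},
    {"source": "project site", "name": "Jane  Doe", "email": "j.doe@hist.example", "fact": "project membership"},
]

# Matching on name only: all three records collapse into one "Jane Doe".
by_name = aggregate(records, ["name"])

# Matching on name and email: the two people stay separate.
by_name_and_email = aggregate(records, ["name", "email"])

for key, profile in by_name_and_email.items():
    print(key, profile["fact"])
```

Real matching is harder than this (initials, changed e-mail addresses, typos), which is why the question is something to investigate rather than assume.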

The benefits of data aggregation are obvious, at least to us. We can create improved pictures of researchers and their research activities at individual, departmental, university and field levels. Having all these connections can facilitate the discovery of research opportunities or trends. We can identify connections between researchers who do not know each other, for example if they have similar research interests. We can build connections between research groups or identify research islands. Instead of having to navigate through a huge number of websites (via Google), users can access this information in one place. It can also help the inexperienced (e.g., students) find information.

To give you an idea of how this works: users can see these data via a registry explorer (a search engine), or use APIs to create websites, widgets, etc. All of these have been developed in the office. The list of benefits is much longer than this, but I think the above gives you a good idea.

With all these fantastic benefits, one might think no one could resist data aggregation. Everyone would prefer to access aggregated data rather than the individual, disconnected sources. Everyone would like to be aggregated so they can have a nice online profile. Well, that is not entirely true. While some people (geeks!) love the idea, others think data aggregation raises many issues and brings many risks, risks they think are not worth taking.

--> This bit is a more general discussion
Having read some general literature on this topic, I can summarise the main issues here. Data aggregation:
  • Threatens individuals’ privacy: one aspect of privacy is controlling information about oneself. I decide what to disclose (or not disclose) about myself, and where. Do we have the right to take data from sources which are not ours, store them, aggregate them and display them, even though these data are already public? Even if we publish data in a public place, we have a right to privacy over them. Publishing something doesn’t mean we want everyone to read it.
  • Allows systems of surveillance: people may choose to disclose some of their private data in bits in different places, but by aggregating their data we are not only exposing those bits but creating a more comprehensive picture of people’s activities and interests. Aggregated data can play a big role in Big Brother-style monitoring of people. That is an invasion of privacy, isn’t it?
  • Can lead to security problems: since data aggregation makes it easier to identify people (people can be identified by putting together bits of anonymous data), it can facilitate identity theft and other kinds of crime.
  • Can mislead people: aggregated data are not always comprehensible or true. How reliable are data aggregators and their sources? How do we know whether the data presented are correct and belong to the same person? One’s profile can get mixed up with someone else’s, and that can lead to serious misunderstandings.
  • Does not always respect the original intentions of the data’s creators. Can we use data as we wish, for purposes different from those originally intended by their owners? Would this be ethical? How can we enforce principles like the use limitation principle and the purpose specification principle? (van Wel and Royakkers, 2004)
  • Can violate contextual integrity; in other words, it can de-contextualise data, changing their original meaning: the process of collecting and aggregating data involves moving information from its original (appropriate) context to others that are not necessarily appropriate. Some people will find this morally offensive (Nissenbaum, 1998).
The above are general issues and apply mostly to online data aggregators, which are spreading rapidly over the web (e.g., http://www.nodalbits.com/bits/spokeo-latest-personal-data-aggregator-exposing-data-privacy-fears/). These aggregators are hungry machines: they pick up everything they can (with or without permission) and offer their data (?) to a variety of businesses and users.
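The security point, that people can be identified by putting together bits of anonymous data, can be illustrated with a small Python sketch. Both datasets and the choice of postcode, birth year and gender as linking fields are invented for illustration. An "anonymous" dataset with no names is joined with a public directory that shares those fields, and any row whose combination matches exactly one named person is re-identified.

```python
from collections import defaultdict

# Hypothetical "anonymised" data: no names, only a few ordinary attributes.
anonymous_rows = [
    {"postcode": "AB1", "birth_year": 1975, "gender": "F", "note": "on sick leave"},
    {"postcode": "AB1", "birth_year": 1980, "gender": "M", "note": "applied for a loan"},
]

# Hypothetical public directory that happens to share those attributes.
public_directory = [
    {"name": "Ana", "postcode": "AB1", "birth_year": 1975, "gender": "F"},
    {"name": "Ben", "postcode": "AB1", "birth_year": 1980, "gender": "M"},
    {"name": "Carl", "postcode": "AB1", "birth_year": 1980, "gender": "M"},
]

LINK_FIELDS = ("postcode", "birth_year", "gender")


def reidentify(rows, directory, link_fields=LINK_FIELDS):
    """Return (name, row) pairs for rows whose link fields match exactly one person."""
    index = defaultdict(list)
    for person in directory:
        index[tuple(person[f] for f in link_fields)].append(person["name"])

    matches = []
    for row in rows:
        candidates = index.get(tuple(row[f] for f in link_fields), [])
        if len(candidates) == 1:  # a unique combination singles someone out
            matches.append((candidates[0], row))
    return matches


for name, row in reidentify(anonymous_rows, public_directory):
    print(name, "->", row["note"])
```

The second row stays anonymous only because two people in the directory share its combination; join on one more field and it may not.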

Nissenbaum (1997) warns us about two misleading (but common) assumptions:
  • Erroneous assumption 1: There is a realm of public information about persons to which no privacy norms apply.
  • Erroneous assumption 2: An aggregation of information does not violate privacy if its parts, taken individually, do not.
--> end of general discussion

Some of these issues also apply, in part, to the work we are doing with information about research.

While we aggregate data in a much smaller, more limited and controlled universe, we are facing some challenges as well. We are using research data, not personal data, from official, public university websites, and we always make sure to ask our contributors for consent. If someone does not want to be in the registry, we do not take their data. Simple.

Although we are not dealing with data of an intimate, personal nature, we are exposing the work of researchers. Whereas some researchers would like the publicity, others would consider this information private (to themselves or a small circle of colleagues), at least at the early stages of their work.

In some ways we are creating a system of surveillance where others can monitor performance. Again, not everyone likes to be watched.

People have raised other concerns too, such as:
  • How complete and accurate a picture can we build of their department or university if some people choose not to contribute and we do not control the sources? How useful can an incomplete registry be?
  • How many errors or gaps will be identified or made more evident once data are aggregated? Can they be corrected?
  • Sharing: Do we need to share what we are doing? We do not want everyone to see what we are doing. (Research-data/research-activity privacy?)
  • Duplication: Is it going to replace our official websites? Why are you duplicating them?
  • Coverage: I am only interested in my field of research and I know where I can find relevant information. Aggregating research data is not useful.
Interesting, isn’t it? There are more issues, of course, but again I hope the above gives you a good idea.

My work in the coming months is to try to clarify these issues with a set of users and to identify ways to address them (solve or soften them). I can see this will involve three areas of work: one, improving the collection and visualisation of data; two, educating users and publicising our services better; and three, listening to what our contributors say about their aggregated data. Yes, software development is not only about coding but about finding out what people need and how they will react to what we do.

You can read:
Ethics of data mining and aggregation
Data aggregation: Actually a threat?
van Wel, L. and Royakkers, L. (2004) “Ethical issues in web data mining,” Ethics and Information Technology 6, pp. 129–140.
Nissenbaum, H. (1997) “Toward an approach to privacy in public: the challenges of information technology,” Ethics and Behavior 7(3), pp. 207–219.
Nissenbaum, H. (1998), “Protecting Privacy in an Information Age: The Problem of Privacy in Public,” Law and Philosophy, 17, pp. 559-596.

Also
Exploring a ‘Deep Web’ That Google Can’t Grasp

On Twitter social and not so social experiences

Monday, April 19, 2010. Posted by Cecilia Loureiro-Koechlin
I’ve been on Twitter for over a year now and I have to say my opinion of it has changed a bit (http://clk0.blogspot.com/2009/08/this-is-what-i-think-about-twitter.html). I joined when a friend of mine, Dr T, told me it was fun and that he found it extremely useful. I find it useful too, but perhaps not to the same extent. Dr T’s experience has been quite different from mine.

Here I want to compare our two completely different Twitter experiences. Dr T’s has been extremely social, active and multi-dimensional, whereas mine has been rather individual and uni-dimensional. Why? I guess it is because of our different original aims and motivations, and our behaviour.

On Twitter, as on any other SNS, you can create your own network of contacts, and that network will define a great part of your future interactions. Depending on the time you devote to it, you can build up a following list of people whose tweets you find interesting, perhaps people you think you would like to meet in real life (and I am not talking about celebrities!). Dr T was keen on meeting new people and being part of something on Twitter. A group, a community? I was just curious and wanted access to information (news, trivia, etc.) that I couldn’t (or didn’t have the time to) get by other means.

Dr T talks to people a lot. I just read tweets, broadcast a little and seldom address anyone. Talking means using the @ symbol, for example, to address one or more people; it means replying when they address you and following threads of conversation. Conversations are extremely important for building online relationships. Conversations and socialisation in the online world basically mean the same thing. Conversations define the social in social networking.

Dr T tweets from his bed, his kitchen, his office, the gym, pubs, etc. I tweet from the office. Dr T uses Twitter in conjunction with other tools (e.g., Foursquare, Tumblr, Facebook). I cannot be bothered. He has attended tweetups! and was part of a public Twitter art display. I found that amusing. He has tweeted five to six times more than me. He spends time looking after his following list and adding more people. He is much more conscientious about the people he follows and who follow him. I do not have a strategy for following people. I don’t mind noise and I have never spent more than two minutes checking my following and followers lists. I follow people with different interests. Actually, I do not have a topic as such; I just follow random interesting people. I find people when they are referred to in tweets and sometimes when they talk to me. Many of the people I follow do not tweet more than once a week. Maybe that is why I do not get much noise! Hmm, no... Some of them tweet 24/7, but I am not watching 24/7.

The above shows how different our online behaviour has been, but the consequences of those behaviours have been even more dissimilar. Dr T has been able to build real friendships over Twitter. He has met some of these people and thinks they are cool. I, on the other hand, haven’t been able to move beyond my computer screen. Not that I haven't tried. I tried mobile tweeting but got frustrated when the client's provider started to charge. I know, I could have looked for another client, but to be honest, I couldn't be bothered.

Update: I got Twitter on my mobile again. It took me a bit of time and a new mobile :)