I’ve been working on this proposal for OpenCourseWare at Dartmouth College to help with my advocacy there. I’m hoping to present it to President Kim after I return to campus in the Spring. I also hope that it will be useful for activists at other schools. It’s of course CC licensed. Click here for a recent version in .doc format. There’s an unformatted (wiki markup only) copy/paste of the full text after the jump. It’s also up on the freeculture.org wiki, if you want to play around with it in wiki format. However, since I may not check the wiki page often, the best way to suggest changes back to me is probably to email me or write in the comments.
I want to collect a bunch of python scripts that log in to your account on a network service, scrape out as much data of yours as possible, and save that data in an easily parseable format. For some services, such as gmail and Google Docs (and some other Google services, thanks in part to the Data Liberation Front), this is mostly a question of doing the logging in and then clicking a few buttons to use something convenient like an “export” feature. For other services, such as facebook, a bunch of python needs to be written that manually crawls through pages and grabs all relevant data.
I want to have all of these scripts in a central, open repository that invites contributions. (As an aside, I’d also like a more general central repository/tracker for bits of python that are useful for crawling websites–from a full, manual wikipedia scraper to something like a single function that gets you a logged-in cookie for gmail.com). Once I have this repository, I want to pack all the software together with an easy front-end where you simply enter your login credentials for all the network services that you use. I want this to run every single night, doing incremental backups of all of my data.
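Here is a rough sketch of how the repository and the nightly incremental run could fit together. Everything in it is an assumption made for illustration: the `EXPORTERS` registry, the `exporter` decorator, the placeholder `export_example` scraper, and the one-JSON-file-per-item layout are hypothetical names and choices, not code that exists anywhere yet. Real exporters would do the logging in and crawling where the placeholder returns canned data.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical registry: service name -> function that returns that
# service's data as a list of JSON-serializable items, each with an "id".
EXPORTERS = {}


def exporter(name):
    """Register a scraper function under a service name."""
    def register(func):
        EXPORTERS[name] = func
        return func
    return register


@exporter("example")
def export_example(credentials):
    # Placeholder: a real exporter would log in and crawl pages here.
    return [{"id": "1", "text": "hello"}, {"id": "2", "text": "world"}]


def digest(item):
    """Stable fingerprint of an item, used to detect changes."""
    blob = json.dumps(item, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def backup(service, credentials, root):
    """Write only new or changed items; return how many were written."""
    out_dir = Path(root) / service
    out_dir.mkdir(parents=True, exist_ok=True)
    index_path = out_dir / "index.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else {}

    written = 0
    for item in EXPORTERS[service](credentials):
        fingerprint = digest(item)
        if index.get(item["id"]) == fingerprint:
            continue  # unchanged since the last run
        (out_dir / f"{item['id']}.json").write_text(json.dumps(item, indent=2))
        index[item["id"]] = fingerprint
        written += 1

    index_path.write_text(json.dumps(index, indent=2))
    return written
```

Running `backup("example", {}, "backups")` twice in a row writes two files the first time and nothing the second time, which is the incremental behavior I’m after. A cron job could call it once per registered service every night.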
I want this for the implication for computing autonomy–using proprietary network services is still bad because I can’t understand or build on (or share) the code that I am running (or, more closely, causing to be run), but at least this way I can feel like I have more “control” over my data (more on this two paragraphs down).
I also want this for the practical benefits. Techies sometimes say about computer data that “if it isn’t backed up, it doesn’t exist.” We often (rightfully) think of putting our data into a network service as making it more likely to live forever–after all, they often have their own backup system which is much more robust and heavily funded than ours. But of course we should not assume anything about what a proprietary network service is doing behind the scenes! There may be no backup system at all, or it may fail (see the sidekickocalypse), or the service-providers may just decide to pull the plug and let all your data die (see the Geocitenocide). Also, if a service goes down but it has backups, there is nothing you can do to expedite the process of having those backups restored. Your data is “safe” but it’s locked up until further notice. This is why I want a client that runs these scrapes every night, ideally as incremental backups (maybe a browser plugin can track when you add data to a site? Maybe this whole idea ought to be implemented as a browser plugin?).
Of course, knowing that I always have access to my data in easily-parsable formats has another important advantage: it makes it easier to leave one network service for another (or just altogether), especially if an additional part of this project is to collect scripts that can take these exported chunks of data and import them back into other services or just other pieces of software (I’d like all of my flickr photos on facebook and also in my on-disk copy of F-Spot. Also post profile data from facebook back on my myspace. kthxbai.) Locking up data has long been a malicious strategic device used to keep people using your software/service even after they’ve decided they don’t want to anymore–from network services that are data black holes to locally-run software that uses DRM or generally proprietary file formats.
With this type of control over my data, it’s easier to leave a proprietary network service (this is a good reminder that computing autonomy is strongly related to data control–or data ownership, if you prefer). This has useful implications for people who occupy a middle ground on computing freedom in relation to network services. These people may believe that the usefulness of a computing resource is more important than its respect for one’s freedom–these are the “I’d use a Free alternative if one existed that was as good (or at least nearly as good)” people. These people will have fewer excuses when a great piece of free software comes together that fits their needs (and, conversely, they’ll have the freedom to leave if/when a nicer proprietary service comes along).
Of course, the TOS for some (most?) of these services may be violated by the use of the software that I’m describing. However, I for one would love to see what happens when a bunch of people violate TOSs by doing this. Can we script cleverly enough that the service can never tell? Do people get found out and all lose their accounts? Do they really care that much, now that they have all of their data? Do they write angry blog posts about how they were “booted off of facebook for trying to download [or even 'take back'] their own data,” which eventually end up on larger news outlets? Do these stories make people care more about data portability and computing autonomy in network services? Does facebook come back with its tail between its legs and implement its own export feature?
Every few days, I get an email from twitter telling me that someone new is following me. 70% of the time it’s spammers, 20% of the time it’s people I don’t care about, and 10% of the time it’s people who I want to follow back (and I’m thankful that twitter gave me the heads up!)
The problem:
Though these emails include a few statistics about the user (number of tweets, number of followers, number of people following), they don’t include any of their actual tweets. This is unfortunate, because the best way to tell whether or not you care about someone on twitter is to look at what they say! As it is, I have to click a link in the email to visit their profile if I want to do that.
The solution: Some python that I wrote, which logs in to your email, finds these messages from twitter, deletes them, and sends you more useful messages that include recent tweets by the person who has just followed you!
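This isn’t the actual script, just a minimal sketch of the middle step (turning a follow notification into a more useful message), using only the standard library. The subject-line pattern in `FOLLOW_SUBJECT` is an assumption about how the notification is worded, and `fetch_recent_tweets` is a hypothetical callable that you would have to supply. Logging in to the mailbox, deleting the original, and sending the new message are left out.

```python
import re
from email import message_from_string
from email.message import EmailMessage

# Assumed subject format, e.g. "Jane Doe (@jane) is now following you ...".
# The real notification wording may differ.
FOLLOW_SUBJECT = re.compile(r"\(@?(?P<user>\w+)\) is now following you", re.I)


def follower_from(raw_email):
    """Return the follower's username, or None if this isn't a follow notice."""
    subject = message_from_string(raw_email)["Subject"] or ""
    match = FOLLOW_SUBJECT.search(subject)
    return match.group("user") if match else None


def richer_notice(raw_email, fetch_recent_tweets, to_addr):
    """Build a replacement email that quotes the follower's recent tweets.

    fetch_recent_tweets is a hypothetical callable: username -> list of str.
    """
    user = follower_from(raw_email)
    if user is None:
        return None  # not a follow notification; leave it alone

    tweets = fetch_recent_tweets(user)
    body = "\n".join(f"- {tweet}" for tweet in tweets) or "(no tweets yet)"

    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = f"@{user} is now following you"
    msg.set_content(f"Recent tweets from @{user}:\n\n{body}\n")
    return msg
```

Passing the tweet fetcher in as an argument keeps the email handling testable without touching the network.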
In the future, I’d like to make this happen for identica as well. Also, I’d like to (and am very confident that I can write the code to) be able to follow the person back by responding to the email.
For example, today I wrote a python script that scrapes the images out of a gallery2 install. I did this for a few reasons:
*I just learned a bunch of new tricks for vim, so I wanted to exercise them before I forget them
*I want to become comfortable writing python
*I want to become comfortable with (at least some of) the python libraries involved in web scraping
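To give a flavor of what the scraping part looks like, here is a minimal sketch using only the standard library. It is not the script I wrote, and it assumes nothing about gallery2’s actual markup: it just collects the `src` of every `<img>` tag on a page and resolves it against the page’s URL. `ImageLinkParser` and `image_urls` are names made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect the absolute URL of every <img> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page's own URL.
            self.image_urls.append(urljoin(self.base_url, src))


def image_urls(html, base_url):
    """Return the image URLs found in an HTML string, in page order."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls
```

A real scraper would fetch each album page, feed it through `image_urls`, and then download each result.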
Update: Huge Huge thanks to Cole Ott, Kevin Donovan, and Jared Benedict for their help collecting arguments and editing drafts. Sorry I forgot to include this note initially. This piece would have sucked far more without you guys’ help!
I’m still waiting to see whether or not The Dartmouth will publish my response. So far I haven’t heard anything back.
In her article, Johnson questions the benefits of OpenCourseWare, writing, “we should not fool ourselves into thinking that publishing course material serves any purpose but to garner publicity for the College.” Publicity is only one advantage of OpenCourseWare, and it is perhaps the least important one. OCW also has immense advantages within the university. The 2005 audit of MIT OCW showed that 71% of students, 42% of alumni, and 59% of faculty used it. Students use it to learn more about a course that they’re considering signing up for, or to follow along with one that they can’t fit into their schedules. Alumni use it for continuing education and to maintain a feeling of connection with their alma mater (donations, anyone?). Most importantly, professors use OCW to observe their colleagues (both on campus and at other schools) in order to learn from their teaching methods and to identify potential collaborations. In this way, OpenCourseWare expands learning across generations within the university itself.
However, the whole point of OCW is that it expands learning beyond the university. In her article, Johnson challenges the idea that OCW systems are effective learning resources, writing “A student would have to be a very rare breed of self-starter to be able to gain anything from the available course materials.” Now, I’ll be the first to say that there’s nothing quite like being in the classroom and taking part in a dialogue involving both students and professors. It is my humble opinion that this is the most effective way to educate, and Dartmouth professors ought to stress discussion more in their courses. This is also the reason why a Dartmouth OCW system would never de-incentivize enrollment in the college.
That being said, lectures and syllabi can be incredibly useful to people who don’t have access to high-quality learning resources or who just need something that is free, accessible, and fast. Imagine the farmer in Kenya who wants to increase his crop yield or the student in Argentina who can’t understand her out-of-date textbook. One student in the US used MIT’s OCW to help him study for the physics AP exam. The front page of the MIT OCW website has a large banner linking to a page with many of these stories of how their system has been useful to students, educators, and independent learners around the world.
Clearly, OCW is not just something that worked once for MIT because they’re a big name and they were the first. The OpenCourseWare Consortium has over 200 member universities, and that doesn’t include many other open course projects such as Yale Open Courses and Harvard Medical School’s MyCourses. The future of education allows people of all backgrounds access to learning resources from top professors around the world with the click of a button.
If we are to make the world’s problems our own problems, as President Kim has urged us to, there is an obvious moral argument for why we as an institution should not be hoarding the great learning resources that we are creating. The demand for higher education is increasing far more rapidly than our universities can accommodate. Our mission should be to expand education and knowledge worldwide, from Hanover to Hanoi.
Johnson’s article brings up a cost-benefit analysis, which is an important thing to consider. It’s true, MIT OCW is expensive. However, it is largely funded by outside grants (from the William and Flora Hewlett Foundation and the Andrew W. Mellon Foundation, to name a few), so it is not true that every dollar put towards OCW is taken away from another aspect of the university. Furthermore, MIT is not the only model: the University of Michigan significantly reduces costs in their OCW system by using students in the publishing process through their dScribe system. Here at Dartmouth, many courses in both Thayer and the Physics & Astronomy department are already being captured on video for internal use. Also, many courses in the computer science department have lecture notes, syllabi, homework assignments, and even practice exams that are publicly available from the department’s website. The cost of making these materials OpenCourseWare would be very small; the barriers are almost purely administrative.
Finally, implementing OpenCourseWare at Dartmouth would be far more than simply a hop on the higher-ed bandwagon. Because of the transparency of OCW, each new system has the ability to observe existing ones to learn from and build off of them with fresh ideas. Though it’s clear that OpenCourseWare is part of the future of higher education, it’s not yet clear what the most effective system looks like. Dartmouth could really push the movement by exploring how OCW systems could be more collaborative and participatory. I sincerely believe that by building on the work of our peers and adding our own twist, Dartmouth can use OpenCourseWare to truly advance higher education in a lasting way.
One Friday this summer, the other Creative Commons interns and I went to Stanford Law School for a talk and discussion led by Ryan Calo of the Center for Internet and Society. We joined with interns working at Free Culture-type orgs around the Bay Area and had a really interesting discussion about privacy policies and notice.
The Issues:
Privacy policies are unclear. They are written in legalese that laypersons can’t understand.
Privacy policies are unreasonable. Because people don’t read them and because “users” have no alternatives, companies are free to retain the right to do whatever they want with your information.
Privacy policies are non-negotiable. You can either accept the policy or refuse to use the product/service. There is hardly ever an alternative product/service with a more liberal privacy policy.
In response to these issues, a few things have been proposed. There were two parallel but different approaches that each involved what I think of as a Creative Commons-like model (It’s worth noting as well that Ryan had another fascinating idea involving incorporating human avatars into interfaces, about which he has a blog post). Essentially, the two ideas, as I recall them, broke down thus:
The user could brand her content with the privacy options that she wants, with some sort of badge.
The service could brand its privacy policy with some sort of human-readable badge or notice.
The former seems more difficult to successfully implement—you would need all participating services to comply. I’m more interested in the latter proposal, mostly because it seems so elegant in its simplicity. I’m imagining shamelessly copying some aspects of Creative Commons licenses:
Three-tiered views: lawyer-readable legalese, human-readable plain English (in simple, bullet-pointed terms), and machine-readable metadata (RDFa or something).
Standardization: all privacy policies generated from a set of more-or-less on/off switches, like CC’s commercial/noncommercial, remix/no-derivs, copyleft/noncopyleft.
My idea is that the service-provider would go to mycoolprivacypolicy.com, and use a simple interface like the CC license chooser to piece together their privacy policy. They would be given the legal code as well as the machine and human code.
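To make the “on/off switches” idea concrete, here is a minimal sketch of what a chooser might produce. The three switches are hypothetical (a real standard would need a carefully vetted list), and I’m using JSON as a stand-in for the machine-readable layer purely to keep the example short; RDFa or something similar would be the more likely real format. The legal code tier is left out.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PrivacyPolicy:
    # Hypothetical switches, chosen only for illustration.
    shares_with_third_parties: bool
    uses_data_for_promotion: bool
    may_change_without_notice: bool

    def human_readable(self):
        """Plain-English bullet points, one per switch."""
        lines = [
            "We share your data with third parties."
            if self.shares_with_third_parties
            else "We do not share your data with third parties.",
            "We may use your data for promotional purposes."
            if self.uses_data_for_promotion
            else "We do not use your data for promotional purposes.",
            "This policy can change at any time without notice."
            if self.may_change_without_notice
            else "We will notify you before this policy changes.",
        ]
        return "\n".join(f"* {line}" for line in lines)

    def machine_readable(self):
        """The same switches as JSON, for browsers and plugins to parse."""
        return json.dumps(asdict(self), sort_keys=True)
```

Because every policy is built from the same switches, two services’ policies can be compared switch by switch, which is the whole point of standardizing.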
The readability issue with privacy policies is solved by the human-readable code. The unreasonability and non-negotiability of these privacy policies is also helped, but less directly.
With the two characteristics I outlined above, you could imagine browser plugins that allowed users to engage in a dialogue with the privacy implications of their browsing. For example, you could tell your browser to notify you whenever you were on a website that reserved the right to use your information for promotional purposes. You could have it remember when a privacy policy stipulates that it can change at any time, and alert you when a change occurs. Basically, this adds up to a system where people take privacy policies seriously again, where they are actually read and thought about. When people are paying attention to privacy, services will compete over it, and users will win. In other words, more reasonable privacy policies will crop up because services will want to be the first to Truly Respect Your Privacy ™, which will help with the unreasonability issue as well as the non-negotiability issue (policies won’t actually be negotiable, but users will have choices).
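The logic such a plugin would need is small. As a sketch, assume a site publishes its policy as a flat JSON object of true/false switches (that format and the switch names are assumptions for illustration, not an existing standard). The hypothetical `privacy_alerts` reports which switches the user cares about are turned on, and `policy_changed` compares a remembered copy of the policy with the current one.

```python
import json


def privacy_alerts(policy_json, watched_switches):
    """Return the watched switches that this site's policy turns on."""
    policy = json.loads(policy_json)
    return sorted(name for name in watched_switches if policy.get(name) is True)


def policy_changed(previous_json, current_json):
    """Report which switches differ between two versions of a policy.

    Returns a dict of switch name -> (old value, new value).
    """
    before = json.loads(previous_json)
    after = json.loads(current_json)
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }
```

The plugin itself would only need to fetch the policy on each visit, store the last version it saw, and show a notification whenever either function returns something non-empty.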
Perhaps the first step in implementing such a system is figuring out the standard for these privacy policies. In other words, what are the yes/no questions that need to be answered in order to build a full privacy policy? Perhaps services require the ability to have different answers for different pieces of data? I might write here again soon with a first stab at such a list.
P3P is a (now defunct?) project that I really ought to research further, but basically seems to be exactly what I’m discussing here. It might include the necessary standards that I just mentioned.
If P3P is now defunct, why did it fail? As I recall from our conversation that Friday, the answer was “nobody implemented it.” I’d like to close with this thought: perhaps we are at a unique moment where P3P or something similar is about to have many great opportunities to be adopted, if the right people talk about it soon. Let me explain.
During my last week in San Francisco, I saw Evan Prodromou of identi.ca and autonomous, as well as my boss Nathan Yergler and Google’s Chris DiBona, speak at CC Salon SF. Evan talked specifically about Free Network Services, and one thing that he said that really struck me with its blunt simplicity was that we need to basically clone all networking websites … twitter, facebook, dopplr, digg, last.fm … everything. Before you accuse Evan of trivializing the development of Free Software, I should note that he also said that we could make this process fun and improve on these services in ways beyond simply making them free. Indeed, the project is already under way, with sites like identi.ca and libre.fm already picking up steam, and mumblings about many others floating around.
Perhaps privacy is relevant enough to computing freedom that it ought to be included in any sort of definition of a Free Network Service. Perhaps not. Either way, there is certainly a great deal of overlap. Libre.fm even devotes (at the time of writing) almost half of its home page to a statement about its liberal privacy policy.
My point is that if we’re going to be rebuilding the social web right now—and we are—then we ought to make sure that it ships with a “solution” to privacy. We need to make discussions about a P3P-like system part of our discussions about Free Network Services.
It was a Friday night, early last quarter at Dartmouth. Nobody had much work to do. I could have gone out to some shitty frat party and had an okay time. I could have sat in my room and listened to music or blogged or something.
Instead, I threw some duct tape in my backpack and headed to Cole’s dorm. I’d had a lot of fun hanging out with some friends from hellosilo over spring break, who turned me onto the idea of coworking. Perhaps the best thing one can do with her time is create things. Having interesting conversation is a close second. With coworking, she gets both.
The next morning, a lot of my friends had nothing to show for their previous nights’ work other than the occasional hangover. I had a chunk of duct-taped cardboard. I think I win.
Attention Tree! by Philipp Klinger (Off to Ville de Québec) on flickr, cc-by-nc-nd
That is, every set of information and logic should be organized into a data structure, such as a tree.
I came up with this idea while I was taking a course called Technology and Power, which was basically a philosophy course. We read Heidegger, and I decided that I hated philosophy (we later read some other philosophers who convinced me that I didn’t _actually_ hate philosophy, such as Foucault). But I just kept thinking, “there’s got to be a better way to get these ideas across.” We happened to be working with file structures in a computer science course that I was taking at the time.
As I was reading all of this philosophy, I quickly began to suspect that the authors were descending into poetry and losing track of the logical flow of their arguments. This is something that I do _not_ appreciate. I like poetry and I like logical arguments, but mixing the two does both a disservice. A statement that sounds profound and important and interesting and just ooey-gooey-delicious is no substitute for a statement that is supported by logic.
This was my mid-term paper for Ehrlich’s FILM42 class in Spring ‘09. Embed of the animation below, followed by the paper after the jump (no promises that the video will remain, but the wikipedia page sure isn’t going anywhere).