Category: Data Custody

Data Custody Decentralisation and Neutrality

Yet another smart device rendered useless

Post author By Rahul
Post date 2020-08-04

More news about an Internet-connected hardware product being rendered unusable as a result of a business decision: “Canadian smart glasses going ‘offline’ weeks after company bought by Google”

North said Focals 1.0, its first generation of smart glasses released last year, will be discontinued. The wearables company also said it has cancelled any plans to ship its second-generation Focals 2.0.

“Focals smart glasses and its services are being discontinued and will no longer be available after July 31, 2020. You won’t be able to connect your glasses through the app or use any features, abilities, or experiments from your glasses,” the statement read.

As of Saturday, users will no longer be able to log into the Focals app and its support services will be discontinued. The app will also be removed from Google Play and the Apple App Store.

We have seen many examples of this happening before: “Smart devices are services, not products”, “More on the inherent temporariness of internet-connected devices”.

If your device or appliance requires an Internet connection to the manufacturer to function, it’s a big risk. Be very conscious of this when you make purchases. As an alternative, consider devices that work with the open source home automation framework OpenHAB, and invest the time needed to make it work. It’s not an out-of-the-box experience, but you alone get to decide how long your device lasts.

Communities

Post author By Rahul
Post date 2020-07-25

We discussed the coming explosion of independent publishers in a large number of niches that combine content, community and commerce.

Stewarding a community is different from having a large number of followers. The former venture capitalist Li Jin describes the hallmarks of a true community:

I believe the following need to be present: high intentionality, P2P interactions, & UGC content.

1) Intentionality: Members seek out the community as a destination, not just as part of a broader platform’s feed

2) P2P interactions: Strong engagement and ties between members

3) UGC content: Members contribute content vs. just engaging w/ what’s broadcasted to them

Just as a publisher’s content can be spread across a site, Instagram, Twitter, a newsletter and a YouTube channel, the corresponding communities can exist in a variety of places.

The journalist Jon Russell, currently of The Ken, runs his own group on Telegram that, as of this writing, has over five hundred and seventy members.

The writer Jacob Lund Fisker‘s Early Retirement Extreme community runs as a bulletin board.

Azeem Azhar runs both his newsletter and his community on Substack using Substack’s discussion threads feature named, well, Community. Here is an example paid newsletter issue with its community.

Many others run private Slack groups.

Interesting to me is that these communities are almost all off the public web and in the dark forests of the Internet, not indexable by Google and other search engines. As Li Jin describes above, truly vibrant communities may form because of a common interest in the publisher’s content, but it is their discussion that adds the most value. Their not being open to the internet is what engenders their openness.

Audience as Capital Data Custody

A mistake, online presence and ownership

Post author By Rahul
Post date 2020-07-16

This is a short but important one. Earlier this week, Google lost ownership of the blogspot.in domain. On the day this post was written, it is still marked ‘for sale’.

If you created your blog in India, Google automatically redirected the <your user name>.blogspot.com domain to <your user name>.blogspot.in. Over time, the latter became the domain most sites would link to. This means with this change, all the links that referenced your blog via blogspot.in – Twitter, Facebook, other blogs, even Google search results – are now broken.
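For anyone with old links to clean up, a minimal sketch of rewriting a blogspot.in address back to its blogspot.com form. It assumes the .com address still serves the blog, which is my assumption and not something Google has promised; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_blogspot_in(url: str) -> str:
    """Return url with a *.blogspot.in host swapped for *.blogspot.com.

    Assumption: the blog is still reachable on the .com host.
    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.endswith(".blogspot.in"):
        new_host = host[: -len(".in")] + ".com"
        # Keep an explicit port if the original URL carried one.
        netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
        parts = parts._replace(netloc=netloc)
    return urlunsplit(parts)


print(rewrite_blogspot_in("http://example.blogspot.in/2020/07/post.html"))
```

This only helps with links you control, such as those in your own posts or bookmarks. Links on other people's sites stay broken, which is the point of the post.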

In the new Fire 2.0 era your presence online is your capital, meaning the ownership of your identity must reside with you. Not with LinkedIn or Twitter or Facebook or about.me. Nor with Google or WordPress or Substack, all of whose business priorities and decisions are independent of yours. These can all be destinations where you publish, build your tribe, create your reputation, but your canonical identity should be your own domain.

This instance seems to have been a mistake, but it could just as well have been Google deprecating country-specific domain names – a business decision. Like its decision to shutter Google+. Or Yahoo’s to shut down GeoCities. If your blog was on one of those, it’s gone from the Internet. Or it could have been on LiveJournal, which was sold by the American company Six Apart to a Russian media entity and now conforms to Russian law and serves Russian ads. In all of these cases, you do not have control over your presence.

If, like Google, you ever forget to renew your domain, let it be your own oversight.

Audience as Capital Data Custody Discovery and Curation Making Money Online The Dark Forest of the Internet

For newsletters to become the new blogs, discovery is the missing piece

Post author By Rahul
Post date 2020-07-10

The last couple of posts described why online archival of sites and blogs is something I’m interested in. Specifically, the web is getting old, domains expire, blog hosting services change. That reminded me of this article from 2013 by the blogger Jason Kottke:

Instead of blogging, people are posting to Tumblr, tweeting, pinning things to their board, posting to Reddit, Snapchatting, updating Facebook statuses, Instagramming, and publishing on Medium. In 1997, wired teens created online diaries, and in 2004 the blog was king. Today, teens are about as likely to start a blog (over Instagramming or Snapchatting) as they are to buy a music CD. Blogs are for 40-somethings with kids.

Kottke himself is one of the Internet’s most well-known, longest-published bloggers, having written for twenty-two years running, with well over ten of those full-time. But his essay highlighted a trend that has continued unabated. There are more people writing online than ever before, but that has increasingly been on closed platforms like Medium.

The trend around newsletters is encouraging. We have talked before of how major journalists moving to their own newsletters could even spawn a wave of independent, reader-supported journalism. There are many hundreds of high-quality newsletters now, to the point where discovering them is going to be an issue. There is no good search/browse/recommend for newsletters yet.

Newsletters are email, a technology much older than the web itself. But they make it easier to keep track of someone’s writing than a blog does. RSS and RSS readers never really caught on because they were one more piece of software readers had to use, but everyone has an email inbox. For the writer, publishing an email is as simple as, and probably simpler than, publishing a blog post.

The downside is discovery – where do you find interesting things people are writing?

Discovery is going to be particularly important if newsletters are to thrive as an easy means of casual writing and distribution for the average person. Newsletters have been around from very early on, in the form of people just mailing a group of friends and growing organically from there, but the latest wave of newsletter services is typified by the venture-funded Substack, for whom monetization is an important goal. That changes what the service optimizes discovery and promotion for: newsletters about topics that are ‘current’, that have the highest chance of conversion to paid, and not the long tail. It starts looking like other Silicon Valley businesses:

Arguably, it’s another example of money and prestige coming for an internet-age creative format that was better when it was a hush-hush community activity—non-remunerative, an anti-discovery algorithm, full of in-speak, artistically strange (see: podcasts, blogs, fan fiction, memes).

Without discovery, newsletters aren’t going to replace social media as the place most people share what’s interesting to them. Nevertheless, they remain an extremely hopeful medium for independent, direct-to-reader journalism.

Data Custody Decentralisation and Neutrality Privacy and Anonymity

The Library of Congress and online archival – Part 2

Post author By Rahul
Post date 2020-07-09

(Part 1)

Online archival is important to me. I am particularly interested in blogs that still have great value but are no longer maintained – some of these are by people who I had followed in the 2000s. Some of these are friends who have long since stopped writing other than on social media.

If their writings are on third-party services like Blogspot, the service itself can be shut down or they can be taken down because of inactivity. If they are on their own domain, the owner may allow the domain to expire.

In some cases, the owner may deliberately erase posts, asking even the Internet Archive to delete its records. The current head of the Microsoft-owned GitHub, Nat Friedman, used to write a fun, eclectic, useful, and – to me – inspirational blog that blended his personal and professional lives. Some years ago it was wiped clean of the content I used to follow. More recently it was wiped again. Now it’s just a Medium-hosted blog with a half-dozen posts. I respect Nat’s decision to not have his old life displayed online. I just wish I had my own archive of it, one that I of course intend to keep private.

For now I have a short list of sites that I have downloaded using wget, with flags to download images and other linked content, and change URLs to local ones so I can browse the site offline. I’m interested in whether the US Library of Congress’ online archival format, Web ARChive, and its toolset, is an improvement.
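A minimal sketch of what such a wget invocation might look like, assembled in Python so it can be reused across a list of sites. The exact flag set is an assumption on my part, chosen from GNU wget's documented options to match the description above; the function name, destination directory and WARC file name are hypothetical.

```python
import shlex
from typing import List, Optional


def build_wget_archive_command(
    site_url: str, dest_dir: str, warc_name: Optional[str] = None
) -> List[str]:
    """Build (but do not run) a wget command that copies a site for offline reading."""
    cmd = [
        "wget",
        "--recursive",         # follow links within the site
        "--level=inf",         # no limit on recursion depth
        "--page-requisites",   # also fetch images, CSS and scripts each page needs
        "--convert-links",     # rewrite links to point at the local copies
        "--adjust-extension",  # save HTML pages with an .html suffix
        "--no-parent",         # never climb above the starting directory
        f"--directory-prefix={dest_dir}",
    ]
    if warc_name is not None:
        # Optionally also record the crawl as a WARC file
        # (needs a wget build with WARC support).
        cmd.append(f"--warc-file={warc_name}")
    cmd.append(site_url)
    return cmd


# Print the command for inspection; pass the list to subprocess.run to execute it.
print(shlex.join(build_wget_archive_command("https://example.com/", "archive")))
```

Building the command as a list keeps URLs with odd characters from being mangled by the shell, and printing it first makes it easy to check before pointing it at a large site.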

Endnote: Archiving entire blogs or websites is different from individual articles, of course. We’ve seen my iOS shortcut that both saves a Markdown-formatted cruft-less version of online articles locally as well as optionally saves to one of Instapaper, Pocket or Evernote.

(ends)

Data Custody Decentralisation and Neutrality Uncategorized

The Library of Congress and online archival – Part 1

Post author By Rahul
Post date 2020-07-08

This past weekend I read about the US Library of Congress’ online archival system, partly out of simple fascination with the scale at which they operate, and partly to learn from it, to create my own offline archive of web pages and websites that are important to me.

The Library of Congress’ site describes the process:

The Library’s goal is to create an archival copy—essentially a snapshot—of how the site appeared at a particular point in time. The Library attempts to archive as much of the site as possible, including html pages, images, flash, PDFs, and audio and video files to provide context for future researchers. The Library (and its agents) use special software to download copies of web content and preserve it in a standard format. The crawling tools start with a “seed URL” – for instance, a homepage – and the crawler follows the links it finds, preserving content as it goes. Library staff also add scoping instructions for the crawler to follow links to that organization’s host on related domains, such as third party sites and social media platforms, based on permissions policies.

The Library of Congress uses open source and custom-developed software to manage different stages of the overall workflow. The Library has developed and implemented an in-house workflow tool called Digiboard, which enables staff to select websites for archiving, manage and track required permissions and notices, perform quality review processes, among other tasks. To perform the web harvesting activity which downloads the content, we primarily use the Heritrix archival web crawler. For replay of archived content, the Library has deployed a version of OpenWayback to allow researchers to view the archives. Additionally, the program uses Library-wide digital library services to transfer, manage, and store digital content. Institutions and others interested in learning more about Digiboard and other tools the Library uses can contact the Web Archiving team for more information. The Library is continually evaluating available open-source tools that might be helpful for preserving web content.

It’s extremely encouraging that it explicitly specifies open-source tools. The most interesting part to me is the data format it uses:

Web archives are created and stored in the Web ARChive (WARC) and (for some older collections) the Internet Archive ARC container file formats.

I am now digging into the tools available to save, search and view articles in this format.
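To get a feel for the format, here is a rough sketch of what a single WARC record looks like on disk: a version line, a block of named header fields, a blank line, the captured content, and a two-line terminator. It is a simplified illustration using only the standard library, not the Library of Congress' tooling. The function name is hypothetical, and a real archive would normally be written by a crawler or a dedicated WARC library.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(
    target_uri: str, payload: bytes, content_type: str = "text/html"
) -> bytes:
    """Serialise one uncompressed WARC/1.0 'resource' record (simplified sketch)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),  # length of the content block in bytes
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Each record ends with two CRLF pairs after the content block.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


record = build_warc_resource_record("https://example.com/", b"<p>hello</p>")
print(record.decode("utf-8"))
```

A WARC file is essentially a sequence of such records, often gzip-compressed one record at a time, which is what lets one file hold a whole crawl.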

(Part 2 – A little more on why this is important to me)

Audience as Capital Data Custody Making Money Online Privacy and Anonymity Real-World Crypto

On the independence of editorial and business during business model transitions

Post author By Rahul
Post date 2020-07-04

This Financial Times article on the effect of the pandemic on the already precarious state of newspapers’ finances is a good read overall. And at least during the pandemic, it is not behind the FT’s strict paywall.

This little bit in particular stood out for me:

While the audience for online news jumped to new highs during the pandemic, most sites convert fewer than 1 per cent of website visitors into paying readers. Although there are no sector-wide figures, some publishers admit most of those that do pay in America and Europe are older, more wealthy and white.

If it is the dominant class in any market that is the one that pays, there is a risk of the newspaper biasing its coverage towards the interests of that class. Today’s advertiser-driven model carries the same risk – does the move to paid subscriptions simply swap one set of patrons for another?

All media has had tension between business and editorial, and good media has always had a wall between the two sides. But that tension is heightened at times of major business model transitions like this. In the new model, you have a direct relationship with your audience, which pays you. When you lose them, you lose both your readership and your revenue. Independence of editorial gets harder.

This is going to be the big test for both news organizations and independent publishers with the inevitable move to pay-to-read.

End note:

One model is to rely entirely on donations, and force them to be anonymous, for example via cryptocurrency. We explored this briefly in part 4 of our series on 21st Century Media. Now neither side of the news organization has any way of knowing who the audience is. It is unclear if there is a natural upper bound on how large a news organization can be and still run on donations alone. That altruism seems to be the natural governance model for the internet doesn’t mean it is a viable business model.

Another variation of this model could be for news organizations to move to subscriptions, but for a third party neutral organization to act as the trustee of the identities of subscribers. Now this organization could be supported by donations, but now we’re talking about one or a handful that need to be supported, not every news org.

(ends)

Data Custody Privacy and Anonymity

More on the inherent temporariness of internet-connected devices

Post author By Rahul
Post date 2020-06-29

About a month ago, we saw how you never really own your internet-connected smart devices, how you’re essentially just renting them until it becomes inconvenient for the provider. This Wall Street Journal article I read has more examples of such devices and the consequences of them ceasing to work:

An automated pet feeder that stopped dispensing food even though its reservoir was topped up because the company was facing pandemic-related trouble.
A stationary bike, whose main selling point was live workout competitions with other owners, had all of its tech disabled because its maker lost a legal dispute, leaving its bikes no different from traditional ‘dumb’ exercycles.
An in-vehicle diagnostic tool from 2016 that promised 5 years of 3G connectivity shut down along with the company itself, again because of pandemic-related business challenges.

These couple of lines towards the end of the article sum up the issue well:

That’s the cost of the pace of technology today. The vinyl record has gone mostly unchanged for over 50 years, and my record player has never required a firmware update. All of our newer gadgets will likely be obsolete within three, four, or five years, depending on the abilities and willingness of the companies that make them. We pay for new gear, gumming up landfills with our retired, defunct cyber curios when we fail to recycle them properly.

Data Custody Privacy and Anonymity The Next Computer

Renting storage while being storage-rich

Post author By Rahul
Post date 2020-06-28

Something I wrote a few days ago has stayed with me. In describing my re-adoption of the P2P file-syncing tool Resilio Sync, I had said

It seems strange to me that I’m paying to rent a few dozen GB on some company’s servers far away when I have already paid for hundreds of GB of high-performance storage on all my devices: my iPhone has 128GB, my iPad 256GB, my Macbook 250GB – all solid-state…

I also have two spinning-disk WD external hard drives: one 2TB, another 1TB. Taken together that is a lot of storage. And yet I pay for 200GB of iCloud storage every month at INR 219 in India, which is roughly the USD 2.99 Apple charges in the US. The 2TB drive now costs INR 5700, which is just 26 months of my iCloud fees. Put another way, I could buy a new drive roughly every two years, even assuming prices don’t drop, for what I’m paying Apple to host my data.
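The arithmetic, as a quick check. The two prices are the ones quoted above; the function name is just for illustration.

```python
ICLOUD_MONTHLY_INR = 219   # 200GB iCloud plan, per month
DRIVE_2TB_INR = 5700       # one-time price of the 2TB external drive


def months_to_match_drive_price(drive_price: float, monthly_fee: float) -> float:
    """Months of subscription fees that add up to the price of the drive."""
    return drive_price / monthly_fee


months = months_to_match_drive_price(DRIVE_2TB_INR, ICLOUD_MONTHLY_INR)
print(f"{months:.1f} months, about {months / 12:.1f} years")
# prints "26.0 months, about 2.2 years"
```

This ignores electricity, drive failure and the value of off-site redundancy, so it is a lower bound on the real cost of self-hosting, not a full comparison.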

My iCloud storage looks like this:

So there’s still a lot of free space, most of the used space is Photos, and the next biggest contributor is iCloud Drive followed by Backups.

Now I have always wanted to find a better way of managing my photos. In terms of data custody, Apple Photos stores all photos in its proprietary library database, so while my photos are on-disk they are not in an open format. In addition, syncing with iCloud is near-hopeless – even leaving my external hard drive (where my Library resides) plugged into my Macbook Pro overnight doesn’t complete the sync, and causes my external drive to heat up uncomfortably. To the point where it once shut down. So while this is not yet a solved problem, I now have one more incentive to solve photo management.

Back when I had an iPod Touch (2008), iPhone 4 (2011), iPhone 5 (2012), I used to diligently back up to disk with iTunes. Some time after, though, I probably gave in to iOS prompting and switched to iCloud backups. There’s more than enough free space on my Macbook Pro to back up my iPhone and iPad, and I can always move the backup file to an external hard drive. The Macbook Pro itself is backed up to an external Time Machine drive, so the backups are safe. Plus of course my iTunes collection is backed up as well. And, if I move them off iCloud, my photos too. At some point in the past I had set up rsync to one-way mirror my Time Machine disk to the other (larger) external drive, so I can have that extra layer of redundancy if I like (the drives are mostly unused).

Finally, iCloud Drive. On Mac OS, the sync service doesn’t really matter. Files are files, in a hierarchy of directories. It does matter on iOS though. But as I wrote in the previous post that I quoted from at the beginning, Resilio Sync is now a first-class file provider and not that different from using iCloud Drive. My devices are mostly on the same Wifi network for most of the day and in any case are linked to fast, cheap 4G internet.

Over the next few weeks, I’m going to try and migrate my data off iCloud so I can get my storage needs back to the free 5GB tier. And as a happy side effect, be more responsible about data custody.

End note: I used iCloud storage because it is easy to be lazy. As I said, I used to be diligent about backing up my iPhone to my Mac via iTunes and my Mac to a drive via Time Machine. At some point I opted in to have both backed up to iCloud [1], because signing up to a paid plan was as easy as an in-app purchase, and it was reassuring seeing all your devices backed up:

I traded time and discipline, both of which I have, for money. For many people it is the right choice to make. For me, it’s not. And that needs to change.

[1] Well, the Mac through the Desktop and Documents sync with iCloud.

Data Custody

Google Docs, Notion and collaboration paradigms

Post author By Rahul
Post date 2020-06-27

A recent blog post about the state of Google’s services caught my attention. The provocatively titled “Google blew a ten-year lead” makes the case that innovation across many services – Google Docs, Sheets, Drive, Gmail, among others – has stagnated. This part about “office” software in particular:

Docs and Sheets haven’t changed in a decade. Google Drive remains impossible to navigate. Sharing is complicated.

I’ve given up on Google Docs. I can never find the documents Andy shares with me. The formatting is tired and stuck in the you-might-print-this-out paradigm. Notion is a much better place to write and brainstorm with people.

When it comes to Docs, Google’s lead was well over ten years. I was an early user of Writely, which Google bought in 2006. What is routine today was magical fifteen years ago. After that it quickly launched the triad of cloud word processor, spreadsheet and presentation software complete with auto-save and real-time collaboration.

And in 2012, Drive brought it all together. Now you had a browser-based way to manage all your Google documents plus other files that you chose to upload. This was a pretty good place to be in 2012.

Since then, Drive + Docs/Sheets/Slides have steadily improved. There’s now richer editing, versioning, APIs, cross-application embedding (a portion of a Sheet inside Docs), publish to the web, fine-grained sharing and increasingly capable mobile apps. Drive behaves in some ways like a desktop application with right-click menus, multi-item select, drag-and-drop – all inside a browser.

But when the writer compares Docs to Notion, you realise that what I just described are iterative improvements on the old desktop-based paradigm pioneered in the 80s by Lotus and then Microsoft Office [1]. Notion’s documents are internet-native [2]. They resemble web pages more than documents. They blend together databases and linear pages and can switch between those views. Hierarchies are seamless and natural. Collaboration is workspace-first, which is really how teams work, distributed or not. I may be a skeptic about the use of Notion as a general-purpose information management system, but I think it is more naturally suited to online collaborative work than Google Docs. It is, quite literally, a paradigm shift.

(ends)

[1] This is not to diminish the work that has gone into this.

[2] The super-new Roam Research is interesting as well.