More than JSON: ActivityPub and JSON-LD

In which our hero discovers the power of normalization and JSON-LD

The problem with JSON

I’ve been doing a lot of research for my current side project, Pterotype. It’s a new kind of social network built as a WordPress plugin that respects your freedom, encourages choice, and interoperates with existing social networks through the power of ActivityPub. It’s undergone several iterations already – the beta has been out for a while now, and I’ve been working hard on a version 2 for the last several months.

One of the things I wasn’t satisfied with in the first version of Pterotype was the way it stores incoming data. ActivityPub messages are serialized in a dialect of JSON called JSON-LD. I didn’t really get JSON-LD when I started this project. It seems overcomplicated and confusing, and I was more interested in shipping something that worked than understanding the theoretical underpinnings of the federated web. So I just kept the incoming data in JSON format. This worked, sort of, but I kept running into annoying, hard-to-reason about situations. For example, consider this ActivityPub object, representing a new note that Sally published:

"@context": "",
"id": "",
"type": "Create",
"actor": {
"type": "Person",
"id": "",
"name": "Sally"
"object": {
"id": "",
"type": "Note",
"content": "This is a simple note"
"published": "2015-01-25T12:34:56Z"

The problem is that the above object, according to the ActivityPub specification, is semantically equivalent to this one:

"@context": "",
"id": "",
"type": "Create",
"actor": "",
"object": "",
"published": "2015-01-25T12:34:56Z"

This is the object graph in action – the actor and object properties are pointers to other objects, and as such they can either be JSON objects embedded within the Create activity, or URIs that dereference to the actual object (dereferencing is a fancy word for following the URI and replacing it with whatever JSON object is on the other side). Since I was representing these ActivityPub objects in this JSON format, that meant that whenever I saw an actor or object property, I always had to check whether it was an object or a URI and if it was a URI I had to dereference it to the proper object. This led to tons of annoying boilerplate and conditionals:

if ( is_string( $activity['object'] ) ) {
$activity['object'] = dereference_object( $activity['object'] );

Yikes. So I came up with what I thought was a clever solution: just walk the object graph and dereference every URI I found whenever I saw a new JSON object. So I would receive Sally’s Create activity and traverse the JSON representation of its graph, dereferencing the actor and object objects in the process. This effectively turned the second representation above into the first one. Problem solved, right?

Well, not quite. There are actually a bunch of problems with that approach. First, not all URIs in the JSON object should be dereferenced. For example, there is an ActivityPub attribute called url that is – you guessed it – a URL! And it is supposed to stay a URL, not get dereferenced to some other thing. Okay, so I’ll only dereference URIs that belong to attributes I know should contain references to other objects – actor, object, etc. But there’s still a problem! There’s no guarantee that we’ll be able to successfully dereference a URI. Maybe the server that was hosting that object went down. Maybe there’s a temporary network failure. Maybe it’s the year 3000 and bitrot has taken down 80% of the internet. The point is, even if we preemptively dereference all the URIs we can, we still need to handle the case where we couldn’t access the actual object and are stuck with the URI. Which means we still need those stupid conditionals everywhere!

JSON-LD to the rescue

So what’s the actual solution for this? Well, as it turns out these were exactly the types of issues that JSON-LD is designed to solve. JSON-LD provides a way to normalize data into a standard form based on a context that defines a schema for the data. Here’s the second version of Sally’s activity from above after undergoing JSON-LD expansion:

"": [
"@id": ""
"@id": "",
"": [
"@id": ""
"": [
"@type": "",
"@value": "2015-01-25T12:34:56Z"
"@type": [

So what’s up with those weird URL-looking attributes? And why has everything become an array?

The expansion algorithm has normalized the data into a form that is supposed to be universally normalized. The attributes object, actor, etc. – have become URIs with a universal meaning and a known schema. In other words, any application that speaks JSON-LD knows what an is, even if they don’t know what an actor is.

Importantly for our purposes, take a look at what the object field has turned into. We went from:

"object": ""


"": [
"@id": ""

Because the object attribute is specified in the ActivityStreams JSON-LD vocabulary to be of @type: @id, the expansion process was able to infer that object ought to be, well, an object. This neatly solves the problem of “is this string attribute actually a reference” – all references are clearly marked by their @id attributes now. Plus, this allows us to be smarter about when we dereference an object – for example, we can defer dereferencing until we actually need to access the attributes of the linked object. This approach also addresses the problem of network errors when dereferencing – if we can’t dereference, we just end up with an object that has only an @id, which can still be handled gracefully by the application.

Hopefully this gave some insight into the types of challenges involved with building ActivityPub-powered applications and the point of JSON-LD. Have questions? Did I do something wrong? Let me know in the comments or on the Fediverse!

ActivityPub: Good Enough for Jazz

Kaniini, one of the lead developers of Pleroma, recently published a blog post called ActivityPub: The “Worse is Better” Approach to Federated Social Networking. It’s a critique of the security and safety of the ActivityPub protocol. They make some good points:

  • ActivityPub doesn’t support fine-grained access control checks, e.g. I want someone to be able to see my posts but not respond to them
  • Instances you’ve banned can still see threads from your instance in some ActivityPub implementations, because someone from a third instance replies to the thread and that reply reaches the banned instance

The post also generated an interesting Fediverse thread discussing the tradeoffs between proliferating the existing protocol versus making changes to it, and whether it would be possible to improve the protocol without breaking backward compatibility. It’s worth a read.

Here’s the thing: ActivityPub is a protocol, and protocols are only valuable as long as there is software out there actually using the protocol. At the end of the day, that’s the most important measure of success. Don’t get me wrong – protocols need to do the job they set out to do well. But at some point, the protocol works well enough that it becomes more important to foster adoption than to continue improving. I believe that ActivityPub has reached that point.

Now, I’m not suggesting that we stop development on the protocol. But future improvements to it should be iterative, building on the existing specification, and backward compatible whenever possible. For example, by all means let’s come up with a better access control model for ActivityPub – but we should also come up with a compatibility layer that assumes some default set of access capabilities for implementations that haven’t upgraded. This lets us move forward without leaving the protocol’s participants behind, preserving ActivityPub’s value.

We are in good company here. This model is exactly how HTTP became the protocol that powers the internet. If you have the time, check out this excellent (brief) history of the HTTP protocol. Here are the highlights: Tim Berners-Lee came up with HTTP 0.9, which was an extremely simple protocol that allowed clients to request a resource and receive a response. HTTP 1.0 added headers and a variety of other features. HTTP 1.1 added performance optimizations and fixed ambiguities in the 1.0 specification.

Critically, all of these versions of HTTP were similar enough that a server that supported HTTP 1.1 could trivially also support HTTP 1.0 and 0.9 (because 0.9 is actually a subset of 1.1). In fact, the Apache and Nginx web servers, which power most websites on the internet, still support HTTP 0.9! By designing and iterating on HTTP in a way that preserved backward compatibility, the early web pioneers were able to build a robust, performant, secure protocol while still encouraging global adoption.

If we want the Fediverse to be just as robust, performant, secure, and globally adopted, we should take the same approach.

Announcing Pterotype

In my last post, I wrote about an emerging web standard called ActivityPub that lets web services interoperate and form a federated, open social network. I made an argument about how important this new standard is – how it tears down walled gardens, discourages monopolies and centralization, and encourages user freedom.

I genuinely believe what I wrote, too. And so, to put my money where my mouth is, I’m excited to announce Pterotype! It’s a WordPress plugin that gives your blog an ActivityPub feed so that it can take advantage of all the benefits ActivityPub has to offer.

Why WordPress?

My mission is to open up the entire internet. I want every website, every social network, and every blog to be a part of the Fediverse. And WordPress runs literally 30% of the internet. It’s not my favorite piece of software, and I certainly never expected to write any PHP, but the fact is that writing a WordPress plugin is the highest-impact way to grow the Fediverse the fastest.

So wait, what does this actually do?

Great question, glad you asked. Pterotype makes your blog look like a Mastodon/Pleroma/whatever account to users on those platforms. So, if you install Pterotype on your blog, Mastodon users will be able to search for in Mastodon and see your blog as if it was a Mastodon user. If they follow your blog within Mastodon (or Pleroma, or…), your new posts will show up in their home feed. This is what I meant in my last post about ActivityPub making sites first-class citizens in social networks – you don’t need a Mastodon account to make this work, and your content will show up in any service that implements ActivityPub without you needing an account on those platforms either.

Here’s what this blog looks like from Mastodon:

The plugin also syncs up comments between WordPress and the Fediverse. Replies from Mastodon et. al on your posts will show as WordPress comments, and comments from WordPress will show up as replies in the Fediverse. This is what I meant about tearing down walled gardens: people can comment on your blog posts using the platform of their choice, instead of being limited by the platform hosting the content.

Sounds amazing! Can I use it now?

Yes, with caveats. Pterotype is in early beta. The core features are in there – your blog will get a Fediverse profile, posts will federate, and comments will sync up – but it’s a pretty fiddly (and sometimes buggy) experience at the moment. If you do want to try it out, the plugin is in the plugin repository. If you install it on your blog, please consider signing up for the beta program as well – it’s how I’m collecting feedback and bug reports so I can make the plugin the best that it can be.

If you’d rather just follow my progress and dive in when it’s finished, that’s fine too! I made my development roadmap publicly available, and the plugin itself is open-source on GitHub. I’m currently doing a major refactor, pulling out all of the ActivityPub-related logic into its own library – once that’s done, it’ll be back to business as usual adding features and stability to Pterotype.

If you’ve read this far, and this project resonates with you, then you might be interested in becoming a sponsor on Patreon. Pterotype is free and open-source, so this is its only source of funding. For moment-to-moment updates, you can follow me on Mastodon.

See you on the Fediverse!

What is ActivityPub, and how will it change the internet?

A new kind of social network

There’s a new social network in town. It’s called Mastodon. You might have even heard of it. On the surface, Mastodon feels a lot like Twitter: you post “toots” up to 500 characters; you follow other users who say interesting things; you can favorite a toot or re-post it to your own followers. But Mastodon is different from Twitter in some fundamental ways. It offers many more ways for users to control the posts they see. It fosters awareness of the effect your posts have on others through a content warning system and encourages accessibility with captioned images. At its core, though, there’s a more fundamental difference from existing social networks: Mastodon isn’t controlled by a single corporation. Anyone can operate a Mastodon server, and users on any server can interact with users on any other Mastodon server.

This decentralized model is called federation. Email is a good analogy here: I can have a Gmail account and you can have an Outlook account, but we can still send mail to each other. In the same way, I can have an account on, and you can have an account on, but we can still follow each other, like and re-post each other’s toots, and @mention each other. Just like Gmail servers know how to talk to Outlook servers, Mastodon servers know how to talk to other Mastodon servers (if you hear people talking about a Mastodon “instance”, they mean server). And just like Gmail and Outlook are controlled by different corporations, Mastodon servers are owned and operated by many different people and organizations. If you wanted, you could host your own Mastodon instance!

Why does this matter? It means that Mastodon users have choice about where they hang out online. If Twitter decides that your posts shouldn’t be on their platform, they can shut down your account and there’s nothing you can do about it (or conversely, if they decide your f-ed up content is totally fine, there’s nothing anyone else can do about it). On the other hand, if you disagree with the administrators of your Mastodon instance, you have the choice to move your account to another instance (switching providers, as it were) or to host your own instance if you’re willing to dedicate the time and effort.

The federated model also tends to align incentives better than centralized alternatives. Mastodon instances are usually run and moderated by members of the community that uses that particular Mastodon server – for example, I’m part of a community of tech folks over at, and our server is administrated and moderated by a member of the community. He has a vested interest in making a nice place to hang out since he hangs out there too. Contrast that with Twitter: Twitter is owned and operated by a massive, venture-backed, for-profit corporation. Now, I’m certainly not against companies making money (more on that later), but Twitter only cares about making Twitter a nice place to hang out to the extent that they profit by it, which has led them to make some user-unfriendly choices.

So Mastodon is pretty cool. But that’s not what really gets me excited. I’m excited about how Mastodon servers allow users on different instances to interact. It’s a protocol called ActivityPub, and it’s going to change the way the internet works.


ActivityPub is a social networking protocol. Think of it as a language that describes social networks: the nouns are users and posts, and the verbs are like, follow, share, create… ActivityPub gives applications a shared vocabulary that they can use to communicate with each other. If a server implements ActivityPub, it can publish posts that any other server that implements ActivityPub knows how to share, like and reply to. It can also share, like, or reply to posts from other servers that speak ActivityPub on behalf of its users.

This is how Mastodon instances let users interact with users on other instances: because every Mastodon instance implements ActivityPub, one instance knows how to interpret a post published from another instance, how to like a post from another instance, how to follow a user from another instance, etc.

ActivityPub is much bigger than just Mastodon, though. It’s a language that any application can implement. For example, there’s a YouTube clone called PeerTube that also implements ActivityPub. Because it speaks the same language as Mastodon, a Mastodon user can follow a PeerTube user. If the PeerTube user posts a new video, it will show up in the Mastodon user’s feed. The Mastodon user can comment on the PeerTube video directly from Mastodon. Think about that for a second. Any app that implements ActivityPub becomes part of a massive social network, one that conserves user choice and tears down walled gardens. Imagine if you could log into Facebook and see posts from your friends on Instagram and Twitter, without needing an Instagram or Twitter account.

So here’s how ActivityPub is going to change the internet:

No more walled gardens

ActivityPub separates content from platform. Posts from one platform propagate to other platforms, and users don’t need to make separate accounts on every platform that they want to use. This has an additional benefit: since your ActivityPub identity (your Mastodon account, your PeerTube account, etc.) is valid across all ActivityPub-compliant applications, it serves as a much stronger identity signal, preventing malicious actors from impersonating you (e.g. creating a Twitter account in your name). If you can share one account across multiple platforms, no one can pretend to be you on those platforms – you are already there!

Social networking comes built-in

With traditional internet media, you need to rely on external services to share your work on social networks. If you want people to share your YouTube video around, you need to post it to Facebook or Twitter. But ActvityPub-enabled applications are social by nature. A PeerTube video can be shared or liked by default by users on Mastodon. A Plume blogger can build an audience on Mastodon or PeerTube without any additional effort since Mastodon and PeerTube users can follow Plume blogs natively. Users on all these platforms see content from the other apps on the platform of their choice. And Mastodon, PeerTube, and Plume are just the beginning – as more platforms begin implementing ActivityPub, the federated network grows exponentially.

Network effects that help users instead of harming them

“Network effects” leaves kind of a dirty taste in my mouth. It’s usually used as a euphemism for “vendor lock-in”; the reason that Facebook became such a giant was that everyone needed to be on Facebook to participate in Facebook’s network. However, ActivityPub flips this equation on its head. As more platforms become ActivityPub compliant, it becomes more valuable for platforms implement ActivityPub: more apps means more users on the federated network, more posts to read and share, and more choice for users. This network effect discourages vendor lock-in. In the end, the users win.

It’s going to be an uphill battle

I hope I’ve convinced you of the radical impact that ActivityPub could have on the internet. But there are some significant barriers preventing widespread adoption. The thorniest one is money.

Why is money an issue? Aren’t Mastodon and PeerTube free and open-source? Well, first of all, open source projects need funding too (that’s a big topic that deserves its own blog post, so I’m leaving it alone for now). The bigger issue right now is user adoption. The ActivityPub network is only viable if people use it, and to compete in any significant way with Facebook and Twitter we need a lot of people to use it. To compete with the big guys, we need big money. We need to be able to spread the word through marketing and blogging, to engineer new ActivityPub applications, and to support people working full-time on bringing about this revolution.

I know this isn’t necessarily a popular view in the open-source world, but I see funding as a critical priority to bring about the vision that ActivityPub promises. Unfortunately, it’s not clear how to obtain it.

All the major ActivityPub-compliant applications I’ve written about are open source projects, built and run by volunteers with tiny budgets. Traditional social networking companies like Twitter and Facebook are funded by selling advertisements on their platform. But in order to make any significant revenue from ads, you need a centralized audience whose attention you control. Facebook needs to be able to say, “we have X billion users; give us your money and we will show them your ads”. Plus, the big social companies extract value from their users by segmenting them based on their behavior and interests, enabling micro-targeted ad campaigns.

None of that is possible when the users and content are spread across many servers and platforms. There is no centralized audience to segment and advertise to. We’ll need to rethink the fundamental business model of social networking if we want ActivityPub to take off.

That being said, I do think ActivityPub offers tremendous business value. It turns your corporate blog into a social network by offering native sharing, following, liking, and replying. And it does so on your customer’s terms, which not only prevents abusive, spammy content but also helps your company’s reputation with users and potential customers. These benefits are valuable, and I think there is a way to turn that value into funding.

It’s important to think about how to make this revolution happen. ActivityPub has the potential to change the way we think and act on the internet, in a way that encourages decentralization and puts users first again. That’s a vision worth fighting for.

A DSL for Music

Haskell School of Music

I recently discovered Haskell School of Music. It’s a book about algorithmic music, which is awesome because: a) I’ve been obsessed with procedural generation for years and b) I like music as much as I like programming. So you can imagine my excitement when I discovered that someone had written a textbook combining my favorite areas of study.

Haskell School of Music is aimed at intermediate-level CS students, so it covers a lot of the basics of functional programming. It aims to be an introduction to the Haskell programming language while also thoroughly examining computer music. It starts simply by defining the data structures that represent music, and progresses to functional programming concepts, procedurally generating music, and doing signal processing and MIDI interfacing to actually play the songs.

I like Haskell, but I want to write music in Clojure. Why? First of all, because it’s the One True Language (it’s fine if you disagree with me – your opinion is valid even if it’s objectively incorrect). But more importantly, Clojure excels as an environment for writing domain-specific languages (DSLs). And as it turns out, writing algorithmic music using a DSL is a major win. Not only are DSLs expressive enough to portray creative expression, but DSLs written in Lisps are inherently extensible – whereas Haskell’s static typing adds barriers to extensibility. There’s also an excellent music synthesis library for Clojure, Overtone, that I want to be able to take advantage of.

Before we can explore what a DSL for music would look like, we need to understand how HSoM represents music as data.

Music as data

HSoM breaks music down into its component pieces. It represents music using Haskell data structures:

type Octave = Int
data PitchClass = Cff | Cf | C | Dff -- ...etc, all the way to Bss
type Pitch = (PitchClass, Octave)
type Duration = Rational
data Primitive a = Note Duration a
                 | Rest Duration
data Control =
  Tempo Rational
  Transpose Int
  -- ...a bunch more
data Music a =
  Prim (Primitive a)
  | Music a :+: Music a
  | Music a :=: Music a
  | Modify Control (Music a)

Many of these type declarations are straightfoward, but a couple bear further discussion. A PitchClass is an algebraic data type representing all of the pitches: C#, Ab, F, and so on. By pairing a pitch class with an octave, we get a Pitch, which represents a specific note (for instance, middle C would be (C, 4). A Primitive is a basic music building block, either a note or a rest. Note that it is polymorphic: this is so that we can define types like Note Duration Pitch but also types like Note Duration (Pitch, Loudness) so that we can attach additional data to a primitive if we need to. A control represents the concept of making a modification to some music by changing the tempo, transposing it, or otherwise changing the output while keeping the underlying notes the same.

The Music type is where things get really interesting. It’s an algebraic data type representing the concept of music in general. In fact, it’s powerful enough to fully represent any piece of music, from Hozier to Bach. A Music value is one of four possible types: a Prim, which is either a note or a rest; a Modify, which takes another Music as an argument and modifies it in some way; the :+: infix constructor, which represents two separate Music values played sequentially; and the :=: infix constructor, which represents two separate Music values played simultaneously.

The Music type has some important properties. First, it’s polymorphic for the same reason that the Primitive type is. This allows us to attach any type of data we want to music primitives, letting us express any musical concept (volume, gain, you name it).

Second, three out of its four constructors are recursive – they take other Music values as arguments. This is the key that makes the data model so powerful. It allows you to model arbitrary configurations of notes, e.g. Note 1/4 (C 4) :=: Note 1/4 (E 4) :=: Note 1/4 (G 4) is a C major triad, and that expression evaulates to a Music value that can itself be passed to Modify, :+:, or :=: to weave it into a larger piece of music.

The result is an extraordinarily concise definition that still manages to encompass all possible pieces of music. Using these data structures, we can describe any song we can imagine.

But as powerful as this data type is, I wouldn’t call it a domain-specific language. The static type system makes it inflexible: how would you combine a Note 1/8 ((C 4) 8), representing a note with pitch and loudness, with a Note 1/8 (E 4), representing a note with just a pitch? Sure, you could write a function to convert from one to the other, but at that point you’ve lost elegance and flexibility.

Here’s where Clojure comes in.

A DSL for music with Clojure

What would a domain-specific language for music look like in Clojure? I found inspiration in the HTML templating library Hiccup. Hiccup represents HTML documents (a graph of complex nested nodes, just like music values) using Clojure vectors, like so:

[:div {:class "foo"} [:p "foo"]]

The Hiccup vectors are actually a DSL that can describe arbitrary HTML markup. Anything that can be expressed in HTML can be expressed using Hiccup. It straddles the line between data and code – the vector is flexible and expressive enough to represent any web page, but can be manipulated using standard Clojure library functions.

If we apply this idea to the data structure from HSoM, we end up with something like this:

;; notes and rests are maps
(def eighth-note-c
  {:duration 1/8
   :pitch [:C 4]})

(def eighth-note-e
  {:duration 1/8
   :pitch [:E 4]})

(def eighth-note-rest
  {:duration 1/8})

;; simultaneous music values
[:= eighth-note-c eighth-note-e]

;; sequential music values
[:+ eighth-note-c eighth-note-rest eighth-note-e]

;; modifying music values
 {:tempo 120      ;; the control
  :transpose 3}
 {:duration 1/8   ;; the note
  :pitch [C 4]}]

;; :=, :+, and :modify can operate on any music value,
;; including arbitrary nesting
[:modify {:tempo 120}
    [:= {:duration 1/4
         :pitch [D 4]}
      {:duration 1/4
       :pitch [F 4]}]
    [:= {:duration 1/4
         :pitch [C 4]}
      {:duration 1/4
       :pitch [E 4]}]]]

At first glance, this looks the same as the Haskell data types from HSoM. Both representations represent notes with pitch and duration; both use the :modify, := and :+ operators to compose music; both support recursive composition of any depth.

But the Clojure version is actually more expressive and flexible than the Haskell equivalent. A note can have any metadata we want attached:

{:duration 1/4
 :pitch [:C 4]
 :loudness 6}

Our DSL has no problem composing notes with differing metadata:

 {:duration 1/4
  :pitch [:C 4]
  :loudness 6}
 {:duration 1/8
  :pitch [:Eb 4]
  :staccato true}]

Furthermore, because Clojure is dynamically typed and supports duck typing via map keywords, we can write functions that operate on all notes and music values, even those with unexpected metadata.

Like the Hiccup vectors, our music vectors blur the boundary between a DSL and a data structure. The vectors are expressive enough to represent any musical concept, but can still be passed around and operated on by normal Clojure functions. As an added advantage, the vectors look similar enough to the HSoM data structures that I can easily follow along with the textbook using Clojure and Overtone.

What’s next

So I have a way to represent music in Clojure now. What’s next? Haskell School of Music ships with a library called Euterpea that knows how to turn the Music data structure into actual sound. So the next step for me is probably porting something like that to Clojure. I’m hoping to offload most of that work to Overtone. After that, I’ll explore algorithmic composition using the techniques outlined in HSoM. Stay tuned!