Turtle vs RDF/XML vs N-Triples vs JSON-LD
Edit (04/05/2019): I expanded this article to include JSON-LD and added the contents section below. I had not included JSON-LD originally because I never really use it but agree it needed to be added for completeness.
Introduction
What Do the Formats Look Like?
Which is the Best Format for Me?
Conclusion
TL;DR
Appendix
Very simply, linked data is stored as triples that consist of a subject, predicate and object. These triples connect entities to other entities or literals (more on them later) to create a directed knowledge graph:
In this example we can see that Tokyo is located in the country Japan and has an area of 2188km². The entities (Tokyo and Japan) and the predicates each have a URI that identifies them in a machine-readable way. The area, shown in the rectangle, does not have a URI because it is a literal (a constant value), which we discuss in detail later.
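To make the subject, predicate and object structure concrete, here is a minimal sketch of the Tokyo example as plain Python data. The `http://example.org/` URIs, the `Triple` type and the helper names are assumptions made up for illustration; they are not real vocabulary terms.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # named "obj" to avoid shadowing the built-in "object"


# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/"

triples = [
    # Entity -> entity: Tokyo is located in Japan.
    Triple(EX + "Tokyo", EX + "locatedIn", EX + "Japan"),
    # Entity -> literal: the area is a constant value, so it has no URI.
    Triple(EX + "Tokyo", EX + "areaKm2", '"2188"'),
]


def is_literal(term: str) -> bool:
    """In this sketch, literals are the terms written in double quotes."""
    return term.startswith('"')


def outgoing_edges(graph, subject):
    """Return the (predicate, object) pairs leaving one node of the graph."""
    return [(t.predicate, t.obj) for t in graph if t.subject == subject]


for predicate, obj in outgoing_edges(triples, EX + "Tokyo"):
    kind = "literal" if is_literal(obj) else "entity"
    print(predicate, "->", obj, f"({kind})")
```

Each triple is one directed edge, which is why a list of triples is enough to describe the whole graph.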
What Do the Formats Look Like?
RDF is commonly stored in one of four formats: N-Triples (.nt), Turtle (.ttl), JSON-LD (.jsonld, although .json is also common) or RDF/XML (.rdf). Which you use is mainly down to preference, as all of these formats are supported by the main RDF libraries and triplestores. Each does have benefits and disadvantages, however, which I highlight later in this article.
Storing and reading RDF as N-Triples is simple as every line of a .nt file is a single triple (<subject> <predicate> <object>) that together form a directed knowledge graph:
Similar to the small example at the beginning of this article, circles represent entities, rectangles represent literals and arrows illustrate the predicates. This knowledge graph, with its 11 connections, is represented in N-Triples format as follows:
Note: Code blocks of each example can be found in the appendix.
Here we have 11 triples containing information about Bob Marley and Jamaica extracted from DBpedia — the linked data representation of Wikipedia. For illustration purposes I have coloured entities red, predicates purple, literals green and literal tags plus datatypes orange.
The fifth triple in this example links the entity of Bob Marley to the entity of his birthplace, Jamaica. As the triples that follow show, an object can in turn be the subject of other triples (even predicates can be subjects or objects), which is how the desired knowledge graph is built up. You will notice that the literals in the ninth and tenth triples are numbers and have an attached datatype (hence the name typed literal), in this case float. Datatypes are often used to ensure that values are valid; for example, ages are usually integers. Finally, in the eleventh triple, we can see the foaf ontology used once more to connect the entity of Jamaica to its homepage URL (a URL being a kind of URI).
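As a rough illustration of how entities, plain literals with language tags and typed literals are written on an N-Triples line, here is a small sketch. The helper names (`uri`, `literal`, `nt_line`) are my own, and the latitude value is a placeholder rather than the figure from the DBpedia example.

```python
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


def uri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str, lang: str = None, datatype: str = None) -> str:
    """Quote a literal and attach either a language tag or a datatype."""
    # Escape the characters that would otherwise break the quoted string.
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    text = f'"{escaped}"'
    if lang:
        return f"{text}@{lang}"
    if datatype:
        return f"{text}^^<{datatype}>"
    return text


def nt_line(subject: str, predicate: str, obj: str) -> str:
    """One triple per line: subject, predicate, object, then a period."""
    return f"{subject} {predicate} {obj} ."


bob = uri("http://dbpedia.org/resource/Bob_Marley")
jamaica = uri("http://dbpedia.org/resource/Jamaica")

print(nt_line(bob, uri("http://dbpedia.org/ontology/birthPlace"), jamaica))
print(nt_line(bob, uri("http://www.w3.org/2000/01/rdf-schema#label"),
              literal("Bob Marley", lang="en")))
# Placeholder latitude value, for illustration only.
print(nt_line(jamaica, uri("http://www.w3.org/2003/01/geo/wgs84_pos#lat"),
              literal("18.0", datatype=XSD_FLOAT)))
```

Note that Jamaica appears as an object in the first line and as a subject in the last, which is how separate lines join up into one graph.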
RDF in Turtle format is much easier for a human to read, as you can define prefixes at the beginning of the .ttl file, which shortens each triple. Another feature of Turtle is that multiple triples with the same subject are grouped into blocks (so the URI for Bob Marley, for example, is not listed repeatedly):
This represents the exact same knowledge graph as the N-Triples above. In the top section, prefixes are defined so that the long repeated sections of the URIs can be written in their short form. For example, the line
can be shortened in Turtle to
You can see that the data about Bob Marley and Jamaica is split into separate blocks. This grouping, along with the defined prefixes, makes the Turtle format a lot easier to understand than N-Triples. Notice that the pieces of information about a subject are separated by semicolons, and that each block finishes with a full stop and a newline before the next subject begins.
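The two Turtle features described above, prefixes and grouping by subject, can be sketched in a few lines. This is a simplified illustration and not a full Turtle serializer: it assumes terms arrive in N-Triples form, and it only shortens URIs whose local part is plain letters, digits and underscores.

```python
from collections import defaultdict

PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def shorten(term: str) -> str:
    """Rewrite <namespace + local> as prefix:local when a prefix matches."""
    if not (term.startswith("<") and term.endswith(">")):
        return term  # literals are left untouched
    iri = term[1:-1]
    for prefix, namespace in PREFIXES.items():
        local = iri[len(namespace):]
        # Keep the sketch to simple local names only.
        if iri.startswith(namespace) and local.replace("_", "").isalnum():
            return f"{prefix}:{local}"
    return term


def to_turtle(triples) -> str:
    """Group triples by subject and emit one Turtle block per subject."""
    by_subject = defaultdict(list)
    for subject, predicate, obj in triples:
        by_subject[subject].append((predicate, obj))

    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines.append("")
    for subject, pairs in by_subject.items():
        lines.append(shorten(subject))
        for i, (predicate, obj) in enumerate(pairs):
            # Semicolon between statements, full stop after the last one.
            end = " ." if i == len(pairs) - 1 else " ;"
            lines.append(f"    {shorten(predicate)} {shorten(obj)}{end}")
        lines.append("")
    return "\n".join(lines)


example = [
    ("<http://dbpedia.org/resource/Bob_Marley>",
     "<http://www.w3.org/2000/01/rdf-schema#label>", '"Bob Marley"@en'),
    ("<http://dbpedia.org/resource/Bob_Marley>",
     "<http://dbpedia.org/ontology/birthPlace>",
     "<http://dbpedia.org/resource/Jamaica>"),
    ("<http://dbpedia.org/resource/Jamaica>",
     "<http://www.w3.org/2000/01/rdf-schema#label>", '"Jamaica"@en'),
]

turtle = to_turtle(example)
print(turtle)
```

The subject `dbr:Bob_Marley` is written once for both of its statements, which is where most of the saving over N-Triples comes from.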
Next we have JSON-LD, which fits linked data into an already existing format: JSON. Turtle is more human-readable, and I compare the two in more detail later in this article.
I have again colour-coded this example, which represents the same knowledge graph:
For clarity, I am referring to JSON objects as “blocks” here to avoid the confusion between JSON objects and linked data objects (subject, predicate, object).
Each entity (coloured red) is declared on the top level. This is why the entities within the Bob Marley and Jamaica blocks are declared again at the end of the .json file.
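Because every entity sits in its own top-level block, the triples can be read back with nothing more than a JSON parser. The sketch below assumes a document shaped like the one described above (a list of blocks, each with an `@id`, full URIs as keys, and values wrapped in `@id` or `@value` blocks); JSON-LD allows other shapes that this does not handle, and the sample document is a cut-down illustration rather than the full example.

```python
import json

document = json.loads("""
[
  {
    "@id": "http://dbpedia.org/resource/Bob_Marley",
    "http://dbpedia.org/ontology/birthPlace": [
      {"@id": "http://dbpedia.org/resource/Jamaica"}
    ],
    "http://www.w3.org/2000/01/rdf-schema#label": [
      {"@value": "Bob Marley", "@language": "en"}
    ]
  },
  {
    "@id": "http://dbpedia.org/resource/Jamaica",
    "http://www.w3.org/2000/01/rdf-schema#label": [
      {"@value": "Jamaica", "@language": "en"}
    ]
  }
]
""")


def triples_from_blocks(blocks):
    """Yield (subject, predicate, object) for each value in each block."""
    for block in blocks:
        subject = block["@id"]
        for predicate, values in block.items():
            if predicate.startswith("@"):
                continue  # keywords such as @id are not predicates
            for value in values:
                if "@id" in value:
                    yield (subject, predicate, value["@id"])     # entity
                else:
                    yield (subject, predicate, value["@value"])  # literal


for triple in triples_from_blocks(document):
    print(triple)
```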
RDF/XML is the oldest RDF format. It is not used as often anymore but, because of its age, it is still treated as a standard. Again, here is RDF/XML representing the exact same knowledge graph as the N-Triples, Turtle and JSON-LD above.
Like Turtle, prefixes can be defined at the top of RDF/XML files to avoid unnecessary repetition of URIs. As we can see, however, RDF/XML is still not as human-readable as Turtle, but there are a few considerations to make before choosing which format you should store your RDF in.
Which is the Best Format for Me?
It might seem like an obvious decision to always store your linked data in the much prettier (most human-readable) Turtle format but, as I mentioned above, there are advantages and disadvantages to using each of the above formats.
As mentioned, RDF/XML was the first RDF format created and is therefore considered the standard format. This means that many RDF libraries and triplestores output RDF in this format by default.
If you want to work with legacy RDF systems or want to use XML libraries to manipulate your data (as RDF/XML is valid XML) then RDF/XML should be the format you use.
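As a sketch of what "use XML libraries to manipulate your data" can look like, the snippet below reads a small RDF/XML fragment with Python's standard `xml.etree.ElementTree`. It only understands the one layout used here (`rdf:Description` elements with `rdf:about`, and properties as direct children); RDF/XML permits many equivalent layouts, so a dedicated RDF parser is the safer choice for real data. The fragment is a cut-down illustration, not the full example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dbo="http://dbpedia.org/ontology/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Bob_Marley">
    <rdfs:label xml:lang="en">Bob Marley</rdfs:label>
    <dbo:birthPlace rdf:resource="http://dbpedia.org/resource/Jamaica"/>
  </rdf:Description>
</rdf:RDF>
"""


def simple_triples(root):
    """Yield triples from rdf:Description elements in the simple layout."""
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree tags look like "{namespace}local"; join the two
            # parts back into the full predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # rdf:resource marks an entity; otherwise the text is a literal.
            obj = resource if resource is not None else prop.text
            yield (subject, predicate, obj)


root = ET.fromstring(rdf_xml)
for triple in simple_triples(root):
    print(triple)
```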
This format, however, was created in an attempt to store a new data structure in an old format. RDF/XML is therefore falling in popularity due to the benefits of the other, newer RDF formats. Some newer triplestores may not load RDF/XML at all, so check what yours supports.
Like RDF/XML, JSON-LD is an attempt to store a new data structure in an existing format.
The advantages are therefore similar: JSON-LD is valid JSON, which means you can manipulate it with the many standard JSON libraries.
Again, however, not every triplestore supports loading JSON-LD, so check before you commit to it.
Turtle is similar to RDF/XML in many ways but is, quite simply, much nicer to look at. If your RDF is to be read by humans at any point, it is clearly best to store it in Turtle format, as all of the ‘mess’ is removed. This lack of ‘mess’ also makes Turtle the preferred format to use if bandwidth is an issue.
All modern RDF libraries and triplestores can work with Turtle RDF and as there are no opening and closing lines at the beginning and end of a Turtle file (unlike RDF/XML), the data can be streamed in blocks. Systems today often require live data streams or request data through APIs so the ability to stream linked data is something you must take into consideration.
N-Triples are simple: each line consists of a subject, predicate and object separated by spaces and ending with a period. This makes N-Triples very easy to parse and, like Turtle, all modern RDF libraries and triplestores can work with N-Triples.
N-Triples may seem expensive to store, but the extreme verbosity that makes them so easy to parse also helps modern compression techniques, because the same long URIs are repeated on line after line. RDF stored as N-Triples can therefore achieve highly efficient compression ratios.
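To get a feel for how well repeated URIs compress, the sketch below generates some synthetic N-Triples and compresses them with Python's standard `zlib`. The `http://example.org/` URIs are made up, and this artificial data is far more repetitive than a real dataset, so treat the printed ratio as an illustration of the effect rather than a benchmark.

```python
import zlib


def synthetic_nt(count: int) -> bytes:
    """Build `count` N-Triples lines that share long, repeated URIs."""
    lines = (
        f"<http://example.org/resource/item{i}> "
        "<http://example.org/ontology/partOf> "
        "<http://example.org/resource/collection> .\n"
        for i in range(count)
    )
    return "".join(lines).encode("utf-8")


raw = synthetic_nt(1000)
compressed = zlib.compress(raw, level=9)

print(f"raw:        {len(raw)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {len(raw) / len(compressed):.1f}x")
```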
As discussed above, Turtle can be streamed in blocks. N-Triples, however, can be streamed line by line, which makes it much more robust. If a line of an N-Triples file is lost, only one triple is lost. If part of a Turtle block is lost, however, that entire block is lost, which could potentially be thousands of triples.
Similarly, if you rearrange the lines in an N-Triples file, your file is still valid and the knowledge graph that your triples represent is unaffected. If you do this to a Turtle file, however, the RDF becomes invalid. This is a problem if you wish to efficiently process multiple incoming streams or API responses. If your system is processing N-Triples, your data can be handled as soon as you receive line one.
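The line-by-line properties described above can be checked with a small sketch. `parse_line` is a deliberately simplified splitter of my own, not a conforming N-Triples parser: it assumes the subject and predicate contain no spaces and that each line ends with a period.

```python
import random


def parse_line(line: str):
    """Split one simplified N-Triples line into (subject, predicate, object)."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank lines and comments carry no triple
    if not line.endswith("."):
        raise ValueError(f"not a complete triple: {line!r}")
    body = line[:-1].rstrip()
    # Split on the first two spaces only, so literals may contain spaces.
    subject, predicate, obj = body.split(" ", 2)
    return (subject, predicate, obj)


def stream_triples(lines):
    """Yield each triple as soon as its line arrives."""
    for line in lines:
        triple = parse_line(line)
        if triple is not None:
            yield triple


lines = [
    '<http://dbpedia.org/resource/Bob_Marley> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Bob Marley"@en .',
    '<http://dbpedia.org/resource/Bob_Marley> '
    '<http://dbpedia.org/ontology/birthPlace> '
    '<http://dbpedia.org/resource/Jamaica> .',
    '<http://dbpedia.org/resource/Jamaica> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Jamaica"@en .',
]

# Reordering the lines leaves the set of triples (the graph) unchanged.
shuffled = lines[:]
random.Random(0).shuffle(shuffled)
same_graph = set(stream_triples(lines)) == set(stream_triples(shuffled))

# Dropping one line loses exactly one triple and nothing else.
remaining = set(stream_triples(lines[:1] + lines[2:]))
lost = set(stream_triples(lines)) - remaining

print("same graph after shuffle:", same_graph)
print("triples lost with one dropped line:", len(lost))
```

Because `stream_triples` is a generator, a consumer can start acting on the first triple before the rest of the input has arrived.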
Triples are made up of three parts: a subject, predicate and object.
These combine to create a directed knowledge graph.
Entities are represented by URIs.
Objects can hold information (constant values) about their subjects; these are called literals.
Plain literals are strings that can optionally have attached tags (such as a language tag).
Typed literals have attached datatypes (such as a number having the attached integer datatype).
Each example can be found here: