Why the Semantic Web Is Hard (Part I)

A long time ago I worked on the Protege project, building tools to help people represent knowledge. At the time, the Semantic Web was starting to become more important, and I was very optimistic. Roughly speaking, the core thoughts going through my head were along the lines of:

  • The core problem with knowledge representation is that people keep building small, incompatible knowledge bases. If we want to build what is now known as General Artificial Intelligence, we need much bigger knowledge bases. If you only have two dozen concepts and fifty instances, how the knowledge is represented, and the nuances of the representation language, just don’t matter (in much the same way that if you’re only writing a very small program, almost any programming language will work fine).
  • Building large-scale knowledge bases is hard. It requires a huge amount of effort, and the serious participation of many experts, to build a useful knowledge base.
  • Maintaining a large-scale knowledge base is probably even harder than building it. You need to be able to spot false statements, you need to be able to say “that was true in 1993, not in 2001, but true again in 2004”, you need some form of automated reasoner that can spot inconsistencies as they emerge and weigh inconsistent statements against each other, you need a truth maintenance system …
  • The Semantic Web, which enables us to grab individual bits of knowledge from different places and sew them together, might be really awesome. It has a very limited representation language (unclear semantics, but easy to parse) and the ability to represent large amounts of knowledge as triples (see the sketch after this list). If it takes off, we should be able to build large-scale knowledge bases out of many small ones.
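
To make the “triples” point concrete, here is a minimal sketch using the third-party Python rdflib library (assuming it is installed); the example.org URIs and the two tiny graphs are made up for illustration. Merging two independently-authored graphs is just the set union of their triples, which is exactly why sewing small knowledge bases together looks so appealing on paper.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/")  # hypothetical namespace for the example

# One small knowledge base, maintained by one group ...
g1 = Graph()
g1.add((EX.alice, RDF.type, FOAF.Person))
g1.add((EX.alice, FOAF.knows, EX.bob))

# ... and another, maintained independently somewhere else.
g2 = Graph()
g2.add((EX.bob, RDF.type, FOAF.Person))
g2.add((EX.bob, FOAF.name, Literal("Bob")))

# "Sewing them together" is just the union of their triples.
merged = g1 + g2
for s, p, o in merged:
    print(s, p, o)
```

Of course, nothing in that union checks whether the two graphs actually agree on what their terms mean, and that is where the trouble starts.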

(I was also very excited by microformats).

But then, in what I thought was the biggest act of intellectual stupidity since the “WS-*” debacle, RDF led to OWL, which became OWL 2, and … in short, instead of building large knowledge bases, or even shareable fragments of them, the knowledge-representation community mostly focused on representational semantics and language design, and the Semantic Web vanished in a puff of smoke, leaving only Croatan-like traces behind on GitHub for future generations to ponder.

I still mostly believe in the four bullet points above. We will not get to AI without somehow creating large-scale knowledge bases, and they will be very hard to build and maintain.

However, I’m pretty sure that the Semantic Web makes no sense. At least not in the way I was thinking about it. Sewing together large numbers of small knowledge bases into a coherent whole is hard.

Since I suspect that the software industry will be making a hard run at General Artificial Intelligence (GAI) in the 2020s, and I think there will be a fair number of people trying to revive Semantic-Web-style ideas, I think it’s worth explaining why it makes no sense in a little more detail.

Continue reading with Part II.
