Why the Semantic Web Is Hard (Conclusion)
Part 1 of this series was about the state of knowledge representation in the early 2000s, when the Semantic Web was picking up steam. Part 2 introduced the problem of chronic pain and PEMF therapy, and Part 3 documented an extensive list of (hard) barriers to effectively aggregating the PEMF studies.
Indeed, the biggest conclusion from Part 3 is that the task of aggregating many knowledge bases into one almost certainly requires intelligence. Even if we restrict the domain to managing chronic pain, and even if we restrict the application to meta-analysis of therapeutic studies, we simply cannot aggregate large numbers of small knowledge bases without already having intelligence at hand.
And, clearly, if we can’t aggregate medical data from a semantic web, the general project I had envisioned (automatically assembling large-scale knowledge bases from large numbers of disparate sources) is doomed to fail. And, to the extent that General Artificial Intelligence (GAI) requires such a knowledge base to get going, we’ve got problems.
So what to do? We can either:
- Centralize things. Almost all of the problems mentioned in Part 3 come down to “lots of people, with different levels of knowledge, cultural biases, and goals, are encoding the information.” For PEMF, if we could fund a single large-scale study of PEMF, we would have a starting point. There’d always be more to learn, and we’d still have to solve how to add information to an existing body of knowledge (and how to maintain knowledge bases), but we’d have a start. But a large-scale study seems expensive and unlikely: simply put, there’s nobody with enough to gain to fund it. Society as a whole has a lot to gain from understanding how to manage (and even cure) chronic pain, but none of the individual actors do. So this scenario boils down to government-funded knowledge-base creation (which, in the large-scale sense, is what Cyc was).
- Create a large-scale standardized vocabulary with very loose semantics, and then build knowledge-base merging assistants that would let smaller knowledge bases be aggregated, one at a time, by already-intelligent beings. Presumably these tools would be able to spot obvious incompatibilities between the data sets (a minimal sketch of such a check follows this list).
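To make the second option a little more concrete, here is a minimal sketch, in Python, of the kind of check such a merging assistant might run. Everything in it is hypothetical and invented for illustration: the shared vocabulary, the field names, and the sample values are not drawn from any actual PEMF study. The point is only that a tool can flag obvious conflicts (unknown terms, mismatched encodings, contradictory values) while leaving the actual resolution to an intelligent reader.

```python
# Hypothetical sketch of a knowledge-base merging assistant.
# All vocabulary terms, field names, and values are invented for illustration.

SHARED_VOCABULARY = {"frequency_hz", "session_minutes", "outcome_measure", "pain_scale"}

study_a = {
    "frequency_hz": 50,
    "session_minutes": 30,
    "outcome_measure": "VAS",   # visual analog scale
}

study_b = {
    "frequency_hz": "50 kHz",   # same predicate, incompatible encoding
    "session_minutes": 30,
    "pain_relief_pct": 40,      # predicate outside the shared vocabulary
}

def flag_incompatibilities(kb1: dict, kb2: dict, vocabulary: set) -> list[str]:
    """Report obvious problems; a human still decides how to resolve them."""
    issues = []
    # Flag predicates that fall outside the loosely shared vocabulary.
    for name, kb in (("A", kb1), ("B", kb2)):
        for predicate in kb:
            if predicate not in vocabulary:
                issues.append(f"Study {name}: '{predicate}' is not in the shared vocabulary")
    # Flag the same predicate carrying differently typed or conflicting values.
    for predicate in kb1.keys() & kb2.keys():
        v1, v2 = kb1[predicate], kb2[predicate]
        if type(v1) is not type(v2):
            issues.append(f"'{predicate}': values {v1!r} and {v2!r} have different types")
        elif v1 != v2:
            issues.append(f"'{predicate}': conflicting values {v1!r} vs {v2!r}")
    return issues

for issue in flag_incompatibilities(study_a, study_b, SHARED_VOCABULARY):
    print("-", issue)
```

Even this toy version shows where the hard part lives: the assistant can say that `frequency_hz` was encoded incompatibly, but deciding whether the two studies are actually comparable still takes an intelligent being.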
The first approach seems very unlikely and, in addition, wouldn’t let us reuse much of the information that has already been published or gathered. The second approach is, I believe, still too hard to do (it’s also not really in anyone’s interest), but it is the eventual point of things like schema.org or OKN.
Since neither of these approaches is feasible, this line of reasoning also implies that GAI is not going to come about because people have encoded knowledge into a knowledge base, because we’ve aggregated a significant number of knowledge bases into one large-scale knowledge base, or, really, because we have any large-scale human-readable knowledge bases at all.
In short, if GAI happens in the next 100 years, it will be through self-learning systems that aren’t using declarative knowledge or formal representations for most of their knowledge.