Semantic Web
The Semantic Web is a vision for linking data across webpages, applications and files. Some people consider it part of the natural evolution of the web, in which Web 1.0 was about linked webpages, Web 2.0 was about linked apps and Web 3.0 is about linked data. It was actually part of computer scientist Tim Berners-Lee's original plan for the World Wide Web but was not practical to implement at scale at the time.
The grand vision is that all data will someday be connected in a single Semantic Web. In practice, today's semantic webs are fractured across specialized uses, including search engine optimization (SEO), business knowledge management and controlled data sharing.
In SEO, all major search engines now support Semantic Web capabilities for connecting information using specialized schemas, such as those published by Schema.org, that describe common categories of entities a person might query, including products, books, movies, recipes and businesses. These schemas help generate the rich results, or enhanced summaries, that appear in Google search results.
In the case of business knowledge management, companies can use various tools to curate a semantic network (knowledge graph), a graphical representation of entities and their relationships, from information scraped from corporate documents, business services and the public web. This can improve planning, analysis and collaboration in the organization.
Controlled data sharing is more aspirational and experimental. The core idea is that individuals and businesses could create a secure data repository about themselves and their interests and then share links to this data with trusted parties, such as businesses, doctors or government agencies.
Elements of the Semantic Web framework
Berners-Lee proposed an illustration or model called the Semantic Web Stack to help visualize the different kinds of tools and operations that must come together to enable the Semantic Web. The stack can help developers explore ways to go from simply linking to other webpages to linking data and information across webpages, documents, applications and data sources.
The following is a breakdown:
- At the bottom of the stack, raw data can be expressed using Unicode text characters. Uniform Resource Identifiers (URIs) give each resource a globally unique name, so data can be linked both within and across pages.
- Next up, the Extensible Markup Language (XML) is often used to structure information in pages in a machine-readable format.
- Above that, the Resource Description Framework (RDF) provides a standard way for describing entities, properties and the relationships between them for data exchange.
- The Web Ontology Language (OWL) formalizes a way to represent knowledge about and between entities. It works with the W3C's Rule Interchange Format (RIF) to describe things that are harder to formalize.
- The SPARQL query language can search data stored across different sources and work with OWL and RDF to find information.
- Other technologies also need to work with these core semantic processing services to secure data, create an audit trail to enforce trust and provide a user experience.
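To make the middle layers of the stack concrete, here is a minimal sketch of RDF data written in Turtle syntax. The `ex:` namespace and the individual facts are invented for illustration; `schema:` refers to the real Schema.org vocabulary:

```turtle
@prefix ex:     <http://example.org/> .
@prefix schema: <https://schema.org/> .

# Each statement is a subject-predicate-object triple
ex:tim a schema:Person ;
    schema:name "Tim Berners-Lee" ;
    schema:worksFor ex:w3c .

ex:w3c a schema:Organization ;
    schema:name "World Wide Web Consortium" .
```

A SPARQL query over that graph could then join the triples to answer a question no single statement contains:

```sparql
# Find the names of people and the organizations they work for
PREFIX schema: <https://schema.org/>
SELECT ?personName ?orgName WHERE {
  ?person a schema:Person ;
          schema:name ?personName ;
          schema:worksFor ?org .
  ?org schema:name ?orgName .
}
```

Because both the data and the query use shared URIs rather than free-form text, the same query works no matter which site or database published the triples.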
Potential Semantic Web uses
There are several actual and potential applications of the Semantic Web, including the following:
SEO is the most common use today. A website owner or content creator adds linked data tags according to standard search engine schemas, which makes it easier for search engines to automatically extract data about, for example, store hours, product types, addresses and third-party reviews. The Rotten Tomatoes website, for example, saw a 25% higher click-through rate on pages where it added structured data.
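In practice, a page usually embeds this markup as a JSON-LD block inside a `<script type="application/ld+json">` tag using Schema.org types. The store details below are invented for illustration:

```json
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Example Grocery",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Main St.",
    "addressLocality": "Springfield",
    "postalCode": "01101"
  },
  "openingHours": "Mo-Sa 08:00-21:00",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "212"
  }
}
```

Search engines parse blocks like this to populate the store hours, addresses and ratings shown in result summaries, without having to guess at the meaning of the page's visible text.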
Auto summarization. Websites and third-party apps can use tagged data to automatically pull specific types of information from various sites into summary cards. For example, movie theaters can list showtimes, movie reviews, theater locations and discount pricing that shows up in searches.
Sharing product details across the supply chain using GS1 Web Vocabulary. This allows manufacturers and wholesalers to automatically transmit information about foods, beverages and other consumer products in a computer-accessible manner. It makes it easier for websites to list nutrition labels, sizes, allergy information, awards, expiration dates and availability dates with grocery stores and online shops that may sell a product.
Standardizing skill taxonomies. Learning platforms, job websites and HR teams may all use different terms to describe job skills. Increasingly, enterprises use Semantic Web technologies to translate different ways of describing skills into a standard taxonomy. This can help teams broaden their applicant search and improve the training programs they develop for employees.
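One common building block for such taxonomies is the W3C's SKOS (Simple Knowledge Organization System) vocabulary. The sketch below, using a hypothetical `ex:` namespace, shows how alternate labels can map the informal names different teams use onto one preferred term:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/skills/> .

ex:javascript a skos:Concept ;
    skos:prefLabel "JavaScript"@en ;
    skos:altLabel  "JS"@en, "ECMAScript"@en ;
    skos:broader   ex:webDevelopment .

ex:webDevelopment a skos:Concept ;
    skos:prefLabel "Web development"@en .
```

A resume listing "JS" and a job posting asking for "JavaScript" can then be matched through the shared concept rather than by exact string comparison.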
Providing controlled data access. Consumers fill out dozens of forms containing the same information, such as name, address, Social Security number and preferences, with dozens of different companies. If any of those organizations is breached, that data is exposed to hackers. To address these problems, Berners-Lee's company, Inrupt, is working with various communities, hospitals and governments to roll out secure data pods built on the open source Solid protocol, which lets consumers control who can access their data.
Digital twin data sharing. Several vendors, including Bentley and Siemens, are developing connected semantic webs for industry and infrastructure that they call the industrial metaverse. These next-generation digital twin platforms combine industry-specific ontologies, controlled access and data connectivity to let users view and edit the same data about buildings, roads and factories from various applications and perspectives.
How is the Semantic Web related to Web 3.0?
The Semantic Web is often called Web 3.0. Berners-Lee described something like the Semantic Web in the earliest days of his work on the World Wide Web, which began in 1989. At the time, he was developing sophisticated applications for creating, editing and viewing connected data. But these all required expensive NeXT workstations, and the software was not ready for mass consumption.
The popularity of the Mosaic browser helped build a critical mass of enthusiasm and support for web formats. The later development of programmable content in JavaScript, which soon became the standard for browser-based programming, opened opportunities for content creation and interactive apps.
Then Tim O'Reilly, founder and CEO of O'Reilly Media, popularized the term Web 2.0 with a conference of the same name. However, Web 2.0 still did not formalize a way to describe the data on a page, the defining capability of the Semantic Web. Meanwhile, Berners-Lee continued his quest to connect data through his work at the World Wide Web Consortium.
The convention of referring to the Semantic Web as Web 3.0 later began to take hold among influential observers. In 2006, journalist John Markoff wrote in The New York Times that a Web 3.0 built on a semantic web represented the future of the internet. In 2007, futurist and inventor Nova Spivack suggested that Web 2.0 was about collective intelligence, while the new Web 3.0 would be about connective intelligence. Spivack predicted that Web 3.0 would start with a data web and evolve into a full-blown Semantic Web over the next decade.
Gavin Wood coined the term Web3 in 2014 to describe a decentralized online ecosystem based on blockchain. Inrupt, which has continued some of Berners-Lee's pioneering work, argues that the Semantic Web is about building Web 3.0, which is distinct from the term Web3. The main point of contention is that Web3's focus on blockchain adds considerable overhead. In contrast, Inrupt's approach focuses on secure centralized storage that is controlled by data owners to enforce identity and access control, simplify application interoperability and ensure data governance. Proponents claim that these mechanisms add the missing ingredients required for the Semantic Web to evolve from a platform for better searches to a more connected web of trusted data.
Semantic Web limitations and criticisms
The first generation of Semantic Web tools required deep expertise in ontologies and knowledge representation. As a result, their primary use has been adding better metadata to websites to describe the things on a page. This requires the extra step of filling in metadata whenever a page is added or changed, although content management systems are getting better at automating that step.
However, this only really simplifies the challenges for SEO. Building more sophisticated Semantic Web applications that combine data from multiple sites remains a difficult problem, made more complex by the use of different schemas to describe data and by differences in how individuals describe the world.
Semantic analysis for identifying a sentence's subject, predicate and object is great for learning English, but it is not always consistent when analyzing sentences written by different people, which can vary enormously. Things can get more convoluted when it comes to popular buzzwords that can mean different and sometimes contradictory things. For example, while scientists all seem to agree a quantum leap is the smallest change in energy an atom can make, marketers all seem to think it is pretty big.
The other major challenge is building trust in the data represented in a semantic web. It is becoming increasingly important to know not only what is written on a page but who said it and what their biases might be. A recommendation from a highly respected site like Consumer Reports will likely carry a different weight than one from SpamBob432 on Amazon. Efforts to provide an audit trail for the Semantic Web could not only connect the data but also help establish its quality and trustworthiness.
The future of the Semantic Web
An increasing number of websites automatically add semantic data to their pages to boost search engine results. But there is still a long way to go before data about things is fully linked across webpages. Translating the meaning of data across different applications is a complex problem to solve.
Innovations in AI and natural language processing might help bridge some of these gaps, particularly in specific domains like skill taxonomies, contract intelligence or building digital twins. Increasingly, the future may involve a hybrid approach that combines better governance of the schemas an organization or industry uses to describe data with AI and statistical techniques to fill in the gaps. Getting closer to the original vision of a web of connected data will require a combination of better structure, better tools and a chain of trust.