The most surprising thing about Weaviate schema configuration is that it’s less about listing what data you have and more about declaring what relationships your data can express.
Let’s see how that plays out. Imagine we’re building a knowledge graph for movies. We want to store movies, actors, and directors, and link them together.
Here’s a snippet of a Weaviate schema in JSON:
    {
      "classes": [
        {
          "class": "Movie",
          "description": "A film or television production",
          "properties": [
            {
              "name": "title",
              "dataType": ["text"],
              "description": "The official title of the movie"
            },
            {
              "name": "releaseYear",
              "dataType": ["int"],
              "description": "The year the movie was released"
            },
            {
              "name": "directedBy",
              "dataType": ["Person"],
              "description": "The person who directed the movie"
            },
            {
              "name": "actedIn",
              "dataType": ["Person"],
              "description": "The actors who acted in the movie"
            }
          ]
        },
        {
          "class": "Person",
          "description": "An individual who participated in a movie",
          "properties": [
            {
              "name": "name",
              "dataType": ["text"],
              "description": "The full name of the person"
            },
            {
              "name": "born",
              "dataType": ["date"],
              "description": "The date of birth of the person"
            },
            {
              "name": "directed",
              "dataType": ["Movie"],
              "description": "Movies directed by this person"
            },
            {
              "name": "actedIn",
              "dataType": ["Movie"],
              "description": "Movies this person acted in"
            }
          ]
        }
      ]
    }
When you create this schema, Weaviate sets up the underlying data structures. The Movie class will have fields for title (text), releaseYear (integer), and importantly, directedBy and actedIn. These last two are not simple text fields; their dataType is set to ["Person"]. This tells Weaviate that these properties will link to objects of the Person class. Similarly, the Person class has directed and actedIn properties that link back to Movie objects.
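One practical wrinkle with a schema like this: a reference property can normally only be added once its target class exists, so two classes that point at each other are usually created in two passes. The following is a minimal sketch in plain Python (no client library) of how the schema could be split. `split_schema` and `is_reference` are hypothetical helpers, and the rule "a capitalized dataType entry is a class name" is an assumption based on Weaviate's naming convention.

```python
import copy


def is_reference(prop):
    """Treat a property as a cross-reference if its dataType names a class.

    Assumption: class names start with an uppercase letter, primitives do not.
    """
    return any(t[:1].isupper() for t in prop["dataType"])


def split_schema(schema):
    """Return (classes with primitive properties only, deferred references)."""
    first_pass, second_pass = [], []
    for cls in schema["classes"]:
        stripped = copy.deepcopy(cls)
        stripped["properties"] = [
            p for p in cls["properties"] if not is_reference(p)
        ]
        first_pass.append(stripped)
        # Remember which class each reference property belongs to.
        second_pass.extend(
            (cls["class"], p) for p in cls["properties"] if is_reference(p)
        )
    return first_pass, second_pass


# Shortened version of the schema above.
schema = {
    "classes": [
        {"class": "Movie", "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "directedBy", "dataType": ["Person"]},
        ]},
        {"class": "Person", "properties": [
            {"name": "name", "dataType": ["text"]},
            {"name": "directed", "dataType": ["Movie"]},
        ]},
    ]
}

classes, references = split_schema(schema)
```

The first list would be sent class by class, and each entry of the second list would then be added as a property on the class it names.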
Think of it like defining tables and foreign keys in a relational database, except that the links are first-class and can be followed directly in queries. When you add a Movie object, say "Inception" released in 2010, and want to link it to the Person object representing Christopher Nolan, you don’t store Nolan’s name in the directedBy field. Instead, you store a reference containing the UUID of that Person object (Weaviate calls this a beacon). Weaviate then treats it as a traversable link from the "Inception" movie to the "Christopher Nolan" person.
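To make the linking step concrete, here is a sketch of what the object payload for "Inception" might look like. It only builds the JSON body and does not contact a server. The UUID is made up, `make_beacon` is a hypothetical helper, and `localhost` in the beacon is the conventional placeholder host.

```python
import json
import uuid


def make_beacon(class_name, object_id):
    """Build a beacon URI pointing at one object of the given class."""
    return f"weaviate://localhost/{class_name}/{object_id}"


# Hypothetical ID; in practice this is the UUID of the stored Person object.
nolan_id = str(uuid.UUID("11111111-2222-3333-4444-555555555555"))

inception = {
    "class": "Movie",
    "properties": {
        "title": "Inception",
        "releaseYear": 2010,
        # A reference property holds a list of beacons, not a name string.
        "directedBy": [{"beacon": make_beacon("Person", nolan_id)}],
    },
}

print(json.dumps(inception, indent=2))
```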
This is where the "relationship" focus becomes clear. Weaviate isn’t just storing movie titles and actor names; it’s storing the connections: who directed which movie and who acted in which movie. Because those connections are part of the schema, queries can follow them. You could ask Weaviate: "Give me all movies directed by people who also acted in at least one movie released after 2000." In a relational database this kind of query typically needs several joins; in Weaviate it can be expressed as a filter whose path walks from one reference property to the next.
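As a rough illustration, the example question could be written as a GraphQL `Get` query whose `where` path hops across references. This sketch only assembles the query string. `where_filter` is a hypothetical helper, and whether a path this deep is accepted may depend on the Weaviate version, so treat it as an assumption to verify.

```python
import json


def where_filter(path, operator, value_key, value):
    """Render a Weaviate-style GraphQL `where` argument as text."""
    return (
        f"{{path: {json.dumps(path)}, operator: {operator}, "
        f"{value_key}: {json.dumps(value)}}}"
    )


# Follow Movie.directedBy -> Person.actedIn -> Movie.releaseYear.
path = ["directedBy", "Person", "actedIn", "Movie", "releaseYear"]
where = where_filter(path, "GreaterThan", "valueInt", 2000)

query = f"""
{{
  Get {{
    Movie(where: {where}) {{
      title
      directedBy {{ ... on Person {{ name }} }}
    }}
  }}
}}
"""

print(query)
```

The resulting string would be posted to the GraphQL endpoint; the `... on Person` fragment is how a reference property is expanded in the result.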
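The dataType is always a list. For a cross-reference it holds one class name (like ["Person"]), or several class names if the property may point to objects of more than one class. For a primitive property it holds a single type such as text, int, number, boolean, date, uuid, geoCoordinates, phoneNumber, or blob, or an array form such as text[] or int[]. Note that floating-point values use number (there is no float type), date holds a full RFC 3339 timestamp (there is no separate dateTime type), and string is deprecated in favor of text in recent versions.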
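A small checker can make the distinction between primitive and reference types explicit. This is a sketch under assumptions: the `PRIMITIVES` set reflects the primitive types as commonly documented for recent Weaviate versions and may not be exhaustive, not every primitive necessarily has an array form, and `classify` is a hypothetical helper.

```python
# Assumed set of primitive type names; check it against your Weaviate version.
PRIMITIVES = {
    "text", "int", "number", "boolean", "date",
    "uuid", "geoCoordinates", "phoneNumber", "blob", "object",
}


def classify(data_type):
    """Return 'primitive', 'primitive array', or 'reference' for a dataType list."""
    if not data_type:
        raise ValueError("dataType must not be empty")
    # Class names are capitalized, so an all-capitalized list is a reference.
    if all(t[:1].isupper() for t in data_type):
        return "reference"
    if len(data_type) == 1:
        base = data_type[0]
        if base in PRIMITIVES:
            return "primitive"
        if base.endswith("[]") and base[:-2] in PRIMITIVES:
            return "primitive array"
    raise ValueError(f"unrecognized dataType: {data_type}")


for dt in (["text"], ["text[]"], ["Person"], ["Person", "Movie"]):
    print(dt, "->", classify(dt))
```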
Inverted indexes are crucial for filtering and keyword search performance. Older schemas control this per property with indexInverted, which defaults to true; in recent versions that flag is deprecated in favor of the more specific indexFilterable and indexSearchable settings. When the index is enabled, Weaviate builds an inverted index for that property, allowing fast filtering and searching. Reference properties (like directedBy linking to Person) can be used in filters as well, by following the reference to a property of the target class.
Consider this: if you define a property like actedIn on the Movie class with dataType: ["Person"], and a reciprocal property actedIn on the Person class with dataType: ["Movie"], Weaviate does not infer that the two are inverses of each other. Cross-references are directional, and the schema has no setting that ties a property to its reverse. When you add a link from a movie to a person, the link back from the person to the movie has to be added as a second, explicit reference. For example, linking "Inception" to Christopher Nolan in both directions means writing directedBy on the Movie object and directed on the Person object. Keeping the two sides in sync is the application’s job, and it is what makes "back-traversals" from Person to Movie possible.
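Assuming each direction of a link has to be written explicitly, a two-way link comes down to two reference writes. The sketch below only describes the two REST calls as dictionaries rather than sending them. `reference_request` and `link_both_ways` are hypothetical helpers, the IDs are placeholders, and the path follows the object references endpoint as commonly documented.

```python
def reference_request(from_class, from_id, prop, to_class, to_id):
    """Describe one REST call that adds a single one-way reference."""
    return {
        "method": "POST",
        "path": f"/v1/objects/{from_class}/{from_id}/references/{prop}",
        "body": {"beacon": f"weaviate://localhost/{to_class}/{to_id}"},
    }


def link_both_ways(movie_id, person_id):
    """Return the two calls needed to link a movie and its director both ways."""
    return [
        reference_request("Movie", movie_id, "directedBy", "Person", person_id),
        reference_request("Person", person_id, "directed", "Movie", movie_id),
    ]


# Placeholder IDs standing in for real object UUIDs.
calls = link_both_ways("movie-uuid", "person-uuid")
for call in calls:
    print(call["method"], call["path"], call["body"])
```

If only the first call is made, the movie knows its director but the person's directed list stays empty.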
The schema also dictates how your data is indexed for search through two per-property flags, indexFilterable and indexSearchable, both true by default where they apply. indexFilterable applies to any property type and allows the property to be used in where filters (e.g., releaseYear = 2010). indexSearchable applies only to text properties and allows keyword (BM25) search within the text content, for example over title.
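The index flags are ordinary keys on a property definition. Here is a sketch of a builder that sets them, assuming that indexSearchable is only meaningful for text and text[] properties. `make_property` is a hypothetical helper, not part of any client library.

```python
import json


def make_property(name, data_type, filterable=True, searchable=None):
    """Build a property definition with explicit index flags."""
    is_text = data_type in ("text", "text[]")
    if searchable is None:
        # Default: keyword search only where it applies.
        searchable = is_text
    if searchable and not is_text:
        raise ValueError("indexSearchable only applies to text properties")
    return {
        "name": name,
        "dataType": [data_type],
        "indexFilterable": filterable,
        "indexSearchable": searchable,
    }


title = make_property("title", "text")
release_year = make_property("releaseYear", "int")

print(json.dumps([title, release_year], indent=2))
```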
You can also define tokenization and vectorization settings directly in the schema. The vectorizer is chosen at the class level, and individual properties can be excluded from vectorization through the property’s moduleConfig if they are not intended for semantic search. This level of control allows you to fine-tune how each piece of data is indexed and made searchable, impacting both performance and the quality of search results.
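Here is a sketch of how a per-property vectorization setting might look in a class definition. The module name `text2vec-openai` is only an example of a vectorizer module, `imdbId` is a hypothetical property that is not part of the schema above, and `vectorized_properties` is a hypothetical helper for inspecting the result.

```python
# Assumption: a text2vec module with this name is enabled on the server.
VECTORIZER = "text2vec-openai"

movie_class = {
    "class": "Movie",
    "vectorizer": VECTORIZER,
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {
            # Hypothetical identifier field that should not affect the vector.
            "name": "imdbId",
            "dataType": ["text"],
            "moduleConfig": {VECTORIZER: {"skip": True}},
        },
    ],
}


def vectorized_properties(cls):
    """List the properties that are not marked as skipped for the vectorizer."""
    module = cls["vectorizer"]
    return [
        p["name"]
        for p in cls["properties"]
        if not p.get("moduleConfig", {}).get(module, {}).get("skip", False)
    ]


print(vectorized_properties(movie_class))
```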
If you have a property that is an array of text values (dataType text[]), like genres: ["Sci-Fi", "Action"], you can specify tokenization: "word" or tokenization: "field", among other options. word tokenization, the default, lowercases each element and splits it on non-alphanumeric characters (so "Sci-Fi" becomes "sci" and "fi"), while field tokenization treats each whole element as a single token. This affects which filter and keyword queries match these values.
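The difference between the two tokenization modes is easy to see with a small approximation. These functions mimic the behavior described above and are not Weaviate's actual tokenizer. In particular, the word splitter here only handles ASCII letters and digits.

```python
import re


def tokenize_word(value):
    """Approximate `word`: keep alphanumeric runs and lowercase them."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", value)]


def tokenize_field(value):
    """Approximate `field`: the whole value, whitespace-trimmed, is one token."""
    return [value.strip()]


genres = ["Sci-Fi", "Action"]
word_tokens = [tokenize_word(g) for g in genres]
field_tokens = [tokenize_field(g) for g in genres]

print(word_tokens)
print(field_tokens)
```

With word tokenization a filter for "fi" would match the first genre, while with field tokenization only the exact value "Sci-Fi" would.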
The next logical step after defining your core classes and properties is to explore how to manage data imports and schema evolution without downtime.