Data Layout and Modeling Guide
Introduction
This guide is the central source of truth for structuring your data before sending it to Luigi's Box. Proper data modeling is the single most important factor for achieving high-quality search results and recommendations.
The concepts and conventions described here apply to all indexing methods, whether you are using the real-time Content Update API or providing periodic Feeds.
The index-object: The core data structure
The fundamental unit of information in Luigi's Box is the index-object. Think of it as a container
for a single, searchable item, such as a product, category, brand, or article.
While the concept is the same for both the API and Feeds, the way you structure the data is different. The API utilizes a nested JSON object, whereas Feeds typically employ a flatter XML or JSON structure.
Every index-object follows this basic structure:
{
  "identity": "<unique-identity>",
  "type": "<object-type>",
  "active_from": "<iso-8601-date>",
  "active_to": "<iso-8601-date>",
  "fields": {
    "title": "<title>",
    "web_url": "<web-url>",
    "...": "..."
  },
  "nested": [
    { "...": "..." }
  ]
}
Top-level parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| identity | String | ✓ | Unique identifier for the object at the index level. Must match the identity reported by analytics events. See Identity guide. |
| type | String | ✓ | Object type (e.g., item, category, article). Different types can be searched separately. |
| generation | String | | Object generation marker for bulk data synchronization. |
| active_from | String | | ISO 8601 date/time when the object becomes searchable (e.g., 2019-05-17T21:12:35+00:00). |
| active_to | String | | ISO 8601 date/time when the object stops being searchable (e.g., 2019-05-17T21:12:35+00:00). |
| fields | Object | ✓ | Object attributes. Every field is searchable and can be used for filtering. Must include a title field. |
| nested | Array | | Array of nested objects (categories, variants, etc.) linked to the current object. |
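The required parameters listed above can be checked locally before any data is sent. The following is a minimal Python sketch, assuming the payload is assembled as a plain dictionary; `validate_index_object` is a hypothetical helper written for this guide, not part of any Luigi's Box library.

```python
from datetime import datetime

REQUIRED_TOP_LEVEL = ("identity", "type", "fields")


def validate_index_object(obj: dict) -> list[str]:
    """Return a list of problems found in an index-object (empty if none)."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in obj:
            problems.append(f"missing required parameter: {key}")

    fields = obj.get("fields")
    if isinstance(fields, dict) and "title" not in fields:
        problems.append("fields must include a title")

    # active_from / active_to are optional, but should parse as ISO 8601.
    for key in ("active_from", "active_to"):
        if key in obj:
            try:
                # Older Python versions do not accept a trailing "Z".
                datetime.fromisoformat(str(obj[key]).replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"{key} is not a valid ISO 8601 date/time")
    return problems
```

An empty list means the object passed these local checks; it does not guarantee that the indexing request itself will be accepted.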
Indexing types
For a typical ecommerce store, you will want to index several types of content. The type parameter
determines how the content is categorized and searched.
| Logical Type | Recommended type Name | How to Index |
|---|---|---|
| Products | item or product | Index as a standalone object. This is your primary content. |
| Categories | category | Index as a nested object along with the product it belongs to. See Nested categories. |
| Brands | brand | Index as a nested object along with the product it belongs to. |
| Articles / Blog posts | article | Index as a standalone object. |
Structuring the fields object: Core principles
The fields object is where you define most of your searchable data.
Field naming conventions
- Use descriptive, human-readable names (e.g., "Screen Size", "color").
- Avoid using dots (.) or brackets ([]) in your field names. These characters can interfere with data access.
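The naming rule about dots and brackets is easy to enforce before indexing. A small Python sketch, assuming fields are held in a dictionary; `invalid_field_names` is a hypothetical helper name:

```python
FORBIDDEN_CHARS = ".[]"


def invalid_field_names(fields: dict) -> list[str]:
    """Return the field names that contain a dot or a bracket."""
    return [
        name
        for name in fields
        if any(ch in name for ch in FORBIDDEN_CHARS)
    ]
```

For example, `invalid_field_names({"Screen Size": "15", "specs.weight": 1})` flags only `specs.weight`.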
Data types
Luigi's Box automatically infers the data type (text, numeric, boolean, date) from the first
value it sees for an attribute. Once set, the type for a given field cannot be changed via the API.
For dates, always use the ISO 8601 format (e.g., 2025-10-14T10:00:00Z).
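Because a field's type is fixed by the first value indexed, it helps to check that a batch uses one kind of value per field. The Python sketch below is a rough local approximation only: it does not reproduce how Luigi's Box infers types, it treats every string (including dates) as text, and all function names are hypothetical. It also shows one way to produce the ISO 8601 format with the standard library.

```python
from datetime import datetime, timezone


def to_iso8601_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC with a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def rough_kind(value) -> str:
    """Coarse category for a scalar value (local approximation only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    return "text"


def mixed_type_fields(objects: list[dict]) -> set[str]:
    """Names of fields whose values fall into more than one coarse category."""
    first_kind: dict[str, str] = {}
    mixed: set[str] = set()
    for obj in objects:
        for name, value in obj.get("fields", {}).items():
            items = value if isinstance(value, list) else [value]
            for item in items:
                kind = rough_kind(item)
                # setdefault records the first kind seen for this field.
                if first_kind.setdefault(name, kind) != kind:
                    mixed.add(name)
    return mixed
```

A field reported by `mixed_type_fields` (for instance a weight sent once as `1.5` and once as `"1.5 kg"`) is worth normalizing before the first indexing run.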
Using arrays
You can provide multiple values for any attribute by using an array. This is useful for tags, available colors, available sizes, and so on.
"fields": {
  "color": ["Red", "Black", "Blue"],
  "tags": ["New Arrival", "Eco-Friendly"],
  "size": ["S", "M", "L", "XL"]
}
Special fields
Certain field names have special, built-in behaviors that power ranking, filtering, and display logic. Using them is key to unlocking the full potential of the platform.
Core display fields
| Field Name | Type | Required | Description |
|---|---|---|---|
| title | String | ✓ | The primary display name of the object. Used as the title in autocomplete and search widgets. This field is essential for search relevance and should contain the most important identifying information. |
| web_url | String | ✓ | The canonical URL of the object on the web. Used to generate clickable links in search results and autocomplete. Must be a valid, absolute URL (e.g., https://example.com/products/item-123). |
| image_link | String | | A URL to the primary image for the object. Used by autocomplete.js, search.js, and recco.js libraries to display product images in search results and recommendations. Should be an absolute URL to a web-accessible image. |
Pricing fields
| Field Name | Type | Required | Description |
|---|---|---|---|
| price | String | | A fully formatted price string including currency symbols and locale-appropriate formatting (e.g., "€19.99", "1,232.60 €", "kr12,341", "8 129 zł"). The system automatically extracts a numeric value into price_amount for sorting and filtering. If you use an unusual format or need precise control, explicitly send price_amount. |
| price_amount | Number | | The numeric value of the price, auto-extracted from price if not provided. Used for sorting, filtering, and range queries. Send this explicitly if your price format is non-standard or if you want to ensure accuracy. |
| price_old | String | | The original price before discount, formatted like price. Used to calculate and display discount percentages or savings in the UI. Useful for showing "was/now" pricing. |
| price_old_amount | Number | | The numeric value of the old price, auto-extracted from price_old if not provided. |
| price_* | String | | Any field starting with price_ (e.g., price_eur, price_usd, price_czk) is treated as a price field. A corresponding _amount field (e.g., price_eur_amount) is automatically extracted unless explicitly provided. Useful for multi-currency stores where you need to support multiple price points. |
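When the display format is unusual, sending the numeric companion explicitly avoids relying on auto-extraction. The Python sketch below builds a price field together with its _amount field from a `Decimal`; `price_fields` is a hypothetical helper, and rounding to two decimal places is an assumption that may not suit every currency.

```python
from decimal import Decimal, ROUND_HALF_UP


def price_fields(amount: Decimal, display: str, suffix: str = "") -> dict:
    """Build a formatted price field and its explicit numeric companion.

    suffix=""    -> price / price_amount
    suffix="eur" -> price_eur / price_eur_amount
    suffix="old" -> price_old / price_old_amount
    """
    name = f"price_{suffix}" if suffix else "price"
    # Assumption: two decimal places are enough for the store's currencies.
    rounded = amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {name: display, f"{name}_amount": float(rounded)}
```

The returned dictionary can be merged into the fields object, for example `fields.update(price_fields(Decimal("249.99"), "€249.99"))`.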
Availability fields
| Field Name | Type | Required | Description |
|---|---|---|---|
| availability | Number | | A binary availability indicator. Must be 1 (available) or 0 (unavailable). Available results are automatically prioritized in ranking. If omitted, the object is treated as available. |
| availability_rank | Number | | A more advanced and granular version of availability. Accepts values from 1 (most available) to 15 (unavailable). Use this for nuanced availability states like "low stock" (e.g., 3), "backorder" (e.g., 8), or "out of stock" (e.g., 15). Takes precedence over availability if both are provided. Lower numbers rank higher. |
| availability_rank_text | String | | The exact availability message to display to users (e.g., "Ships within 14 days", "Only 2 left in stock", "Pre-order", "Usually ships in 24 hours"). This field does not affect ranking but is used for frontend display. |
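One way to keep the three availability fields consistent is to derive them from a single stock state. The Python sketch below uses hypothetical state names; the ranks for low stock, backorder, and out of stock follow the examples in the table, while treating in-stock items as rank 1 and backorders as available are assumptions to adjust for your store.

```python
# Hypothetical stock states mapped onto the 1-15 scale.
AVAILABILITY_RANK = {
    "in_stock": 1,
    "low_stock": 3,
    "backorder": 8,
    "out_of_stock": 15,
}


def availability_fields(state: str, message: str) -> dict:
    """Derive all three availability fields from one stock state.

    Raises KeyError for a state that is not in AVAILABILITY_RANK.
    """
    rank = AVAILABILITY_RANK[state]
    return {
        # Only rank 15 (unavailable) maps to the binary value 0.
        "availability": 0 if rank == 15 else 1,
        "availability_rank": rank,
        "availability_rank_text": message,
    }
```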
Ranking signal fields
| Field Name | Type | Required | Description |
|---|---|---|---|
| boost | Number | | A manual ranking boost for the object. Accepts values 1, 2, or 3, where higher values increase the item's position in search results. Use sparingly for promotional items, featured products, or seasonal highlights. Overuse can diminish search relevance. |
| introduced_at | Date | | The date when the product was added to your catalog (ISO 8601 format, e.g., 2025-10-14T10:00:00Z). Used as a ranking signal to prioritize newer items. Particularly useful for fashion, electronics, or time-sensitive content where recency matters. |
| _margin | Number | | A hidden field representing the item's relative profit margin. Must be a float between 0.0 and 1.0 (e.g., 0.42 = 42% margin). Used internally as a ranking signal to prioritize higher-margin products in search results. Not exposed in API responses. Prefix with underscore to keep it hidden. |
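The value ranges in this table can be enforced when the signals are computed. In the Python sketch below, the margin formula (price minus cost, divided by price) is an assumption, since this guide only requires a float between 0.0 and 1.0; both helper names are hypothetical.

```python
def margin_field(price_amount: float, cost: float) -> float:
    """Relative margin (price - cost) / price, clamped to [0.0, 1.0]."""
    if price_amount <= 0:
        return 0.0
    margin = (price_amount - cost) / price_amount
    return round(min(1.0, max(0.0, margin)), 4)


def checked_boost(value: int) -> int:
    """Return the boost unchanged if it is one of the accepted values."""
    if value not in (1, 2, 3):
        raise ValueError("boost must be 1, 2, or 3")
    return value
```

Clamping keeps loss-making items (cost above price) at 0.0 instead of sending a negative value.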
Pattern-based special fields
| Field Name | Type | Required | Description |
|---|---|---|---|
| _* | Any | | Any field starting with an underscore (e.g., _internal_note, _supplier_id, _cost) is treated as hidden. These fields are fully searchable and can be used for filtering, but are never exposed in public API responses. Useful for internal metadata, ranking signals, sensitive information, or business logic that shouldn't be visible to end users. |
| geo_* | Object | | Any field starting with geo_ (e.g., geo_location, geo_store, geo_warehouse) is treated as a geographical location point. Value must be an object with lat and lon properties: {"lat": 49.0448, "lon": 18.5530}. Use geo_location as the standard field name for the primary location. Enables location-based search, filtering, and distance calculations. |
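Both patterns can be illustrated locally. The Python sketch below builds a geo_* value with basic range checks and shows which fields would remain once underscore-prefixed names are dropped; it only mimics the documented behavior for illustration, and both function names are hypothetical.

```python
def geo_point(lat: float, lon: float) -> dict:
    """Build a geo_* value, rejecting coordinates outside valid ranges."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("lat must be between -90 and 90")
    if not -180.0 <= lon <= 180.0:
        raise ValueError("lon must be between -180 and 180")
    return {"lat": lat, "lon": lon}


def public_view(fields: dict) -> dict:
    """Local illustration: drop hidden (underscore-prefixed) fields."""
    return {
        name: value
        for name, value in fields.items()
        if not name.startswith("_")
    }
```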
Reserved fields
| Field Name | Description |
|---|---|
| _category | Reserved for internal use. Do not use this field name in your data. The system uses it for internal category processing and hierarchy management. Using this field may cause unexpected behavior. |
Special fields: Complete example
Here's a comprehensive example showing how to use multiple special fields together:
{
  "identity": "premium-headphones-2024",
  "type": "item",
  "fields": {
    "title": "Premium Wireless Headphones",
    "web_url": "https://example.com/products/premium-headphones",
    "image_link": "https://cdn.example.com/images/headphones.jpg",
    "price": "€249.99",
    "price_old": "€299.99",
    "price_usd": "$279.99",
    "availability": 1,
    "availability_rank": 2,
    "availability_rank_text": "Only 3 left in stock",
    "boost": 2,
    "introduced_at": "2024-09-15T08:00:00Z",
    "_margin": 0.35,
    "color": ["Black", "Silver"],
    "brand": "AudioTech",
    "_supplier_id": "SUPP-8842",
    "_cost": 150.00,
    "geo_location": {
      "lat": 40.7128,
      "lon": -74.0060
    }
  }
}
Advanced data modeling: Using the nested array
The nested array allows you to model complex relationships between your objects, such as categories,
brands, and product variants.
Nested categories / ancestors
Most often, a product belongs to a category that is part of a hierarchy (e.g., a "White T-shirt" product belongs to the "T-shirts" category, which is under "Men", which is under "Apparel").
To correctly represent a product's full category path (e.g., "Apparel > Men > T-shirts"), you must
provide the complete hierarchy. This is essential for powering user-facing features like breadcrumb
navigation and hierarchical faceting.
This is achieved by sending the most specific category that a product belongs to (the "leaf" category)
as a nested object, and then listing all of its parent categories in a special ancestors array.
Conceptual model
Imagine a product, a "White T-shirt", has a category path of "Apparel > Men > T-shirts".
- The leaf category: The most specific category the product belongs to is "T-shirts". This will be your primary nested object.
- The ancestors: The parent categories that form the path to this leaf are "Apparel" and "Men". These will go into the ancestors array.
- The order: The order is crucial. The ancestors array must be ordered from top-level down to immediate parent:
  - First ancestor: "Apparel"
  - Second ancestor: "Men"
Implementation
Based on the model above, here is how you would structure the JSON payload.
Rule 1: The nested object is the leaf category
The nested array should contain an object for the most specific category, in this case, "T-shirts".
Rule 2: The ancestors array defines the path
Inside the fields of the "T-shirts" category, you add the ancestors array.
Rule 3: The ancestors array must be in order
The array must list "Apparel" first, followed by "Men".
Example: Product in a single hierarchy
{
  "objects": [
    {
      "identity": "74f5cdd860b5d9585b18edfab7c21670",
      "type": "item",
      "fields": {
        "title": "White T-shirt",
        "web_url": "/products/1"
      },
      "nested": [
        {
          "type": "category", // This is the LEAF category (most specific)
          "identity": "category-t-shirts",
          "fields": {
            "title": "T-shirts",
            "web_url": "/categories/apparel/men/t-shirts",
            "ancestors": [
              {
                "type": "category", // FIRST ancestor (top-level parent)
                "identity": "category-apparel",
                "fields": {
                  "title": "Apparel",
                  "web_url": "/categories/apparel"
                }
              },
              {
                "type": "category", // SECOND ancestor (immediate parent of T-shirts)
                "identity": "category-men",
                "fields": {
                  "title": "Men",
                  "web_url": "/categories/apparel/men"
                }
              }
            ]
          }
        }
      ]
    }
  ]
}
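The three rules can be applied mechanically when the category path is already known. The Python sketch below turns a top-down path into a leaf object with an ordered ancestors array; the function names and the (identity, title, web_url) tuple layout are assumptions made for illustration.

```python
def category_object(identity: str, title: str, web_url: str) -> dict:
    """Build one category object in the shape used by the example above."""
    return {
        "type": "category",
        "identity": identity,
        "fields": {"title": title, "web_url": web_url},
    }


def nested_category(path: list[tuple[str, str, str]]) -> dict:
    """Turn a top-down category path into a leaf object with ancestors.

    `path` lists (identity, title, web_url) tuples from the top-level
    category down to the leaf, e.g. Apparel, Men, T-shirts.
    """
    if not path:
        raise ValueError("category path must not be empty")
    *parents, leaf = path
    nested = category_object(*leaf)
    if parents:
        # Keep the top-down order: top-level first, immediate parent last.
        nested["fields"]["ancestors"] = [category_object(*p) for p in parents]
    return nested
```

For a product in several hierarchies, calling `nested_category` once per path yields one nested object per leaf category.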
Multiple category hierarchies
If a product belongs to more than one category path (e.g., "Cheddar cheese" is in both
"Dairy > Cow milk" and "Wine > Snacks"), provide a nested object for each leaf category,
with each one containing its own respective ancestors path.
Example: Product in multiple hierarchies
{
  "objects": [
    {
      "identity": "5e119a13ec6511e323bfdc41cd181fdb",
      "type": "item",
      "fields": {
        "title": "Cheddar cheese",
        "web_url": "/products/1"
      },
      "nested": [
        {
          // FIRST CATEGORY PATH: Dairy > Cow milk
          "type": "category", // LEAF category for first path (Cow Milk)
          "identity": "1692378648",
          "fields": {
            "title": "Cow milk",
            "image_link": "/images/cow-milk.png",
            "ancestors": [{ // Path: Dairy → Cow milk
              "type": "category", // Top-level parent for first path (Dairy)
              "identity": "category-dairy",
              "fields": {
                "title": "Dairy",
                "web_url": "/categories/dairy",
                "image_link": "/images/dairy.png"
              }
            }]
          }
        },
        {
          // SECOND CATEGORY PATH: Wine > Snacks
          "type": "category", // LEAF category for second path (Snacks)
          "identity": "category-snacks",
          "fields": {
            "title": "Snacks",
            "web_url": "/categories/wine/snacks",
            "image_link": "/images/snacks.png",
            "ancestors": [{
              "type": "category", // Top-level parent for second path (Wine)
              "identity": "category-wine",
              "fields": {
                "title": "Wine",
                "web_url": "/categories/wine",
                "image_link": "/images/wine.png"
              }
            }]
          }
        }
      ]
    }
  ]
}
If you are integrating Product listing, see searching within full category hierarchy to make sure you get the best results.
Nested variants
For products that come in different variations (e.g., by size or color), you can index them as nested objects with the type set to "variant". Each variant must have its own unique identity. This allows the search to group variants and display the most relevant one.
{
  "identity": "tshirt-main-product",
  "type": "item",
  "fields": { "title": "Premium T-shirt" },
  "nested": [
    {
      "type": "variant",
      "identity": "tshirt-main-product-red-m",
      "fields": {
        "title": "Red T-shirt - M",
        "color": "Red",
        "size": "M",
        "web_url": "/products/tshirt?variant=red-m"
      }
    },
    {
      "type": "variant",
      "identity": "tshirt-main-product-blue-l",
      "fields": {
        "title": "Blue T-shirt - L",
        "color": "Blue",
        "size": "L",
        "web_url": "/products/tshirt?variant=blue-l"
      }
    }
  ]
}
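Variant objects like the ones above can be generated from a list of color and size combinations. The Python sketch below derives each variant's identity from the parent identity, mirroring the naming used in the example; that naming scheme and the `variant_objects` helper are illustrative assumptions, and any scheme works as long as each identity is unique.

```python
def variant_objects(
    parent_identity: str,
    base_title: str,
    base_url: str,
    combos: list[tuple[str, str]],
) -> list[dict]:
    """Build nested variant objects from (color, size) combinations."""
    variants = []
    for color, size in combos:
        slug = f"{color}-{size}".lower()
        variants.append({
            "type": "variant",
            "identity": f"{parent_identity}-{slug}",
            "fields": {
                "title": f"{color} {base_title} - {size}",
                "color": color,
                "size": size,
                "web_url": f"{base_url}?variant={slug}",
            },
        })
    # Each variant must have its own unique identity.
    identities = [v["identity"] for v in variants]
    if len(set(identities)) != len(identities):
        raise ValueError("variant identities must be unique")
    return variants
```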
Post-indexing information
Searchability and visibility
- Searchable by default: Every field you send is automatically searchable.
- Hidden fields: To index an attribute for internal use (like for ranking) but prevent it from appearing in public API responses, prefix its name with an underscore (e.g., _margin).
Output data structure
- Arrays by default: In API responses, all field values are returned as arrays, even if you indexed a single scalar value. This simplifies frontend development by eliminating the need to check the data type.
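Since every field value comes back as an array, frontend or integration code that expects a single value can take the first element. A minimal Python sketch, assuming the returned fields have already been parsed into a dictionary; the sample data and the `first_value` helper are hypothetical.

```python
def first_value(fields: dict, name: str, default=None):
    """Return the first element of a field that is returned as an array."""
    values = fields.get(name) or []
    return values[0] if values else default


# Hypothetical fields of one returned object: scalars arrive as arrays.
returned_fields = {
    "title": ["Premium Wireless Headphones"],
    "color": ["Black", "Silver"],
}
title = first_value(returned_fields, "title")
brand = first_value(returned_fields, "brand", default="n/a")
```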
Derived fields
The system automatically creates some fields for you during processing (e.g., category_lvl_1,
category_lvl_2, ..., category_lvl_5). You do not need to send these in your indexing requests.
Additional resources
- Content update API documentation
- Feeds documentation
- Identity
- Product listing pages
- Category hierarchy best practices