Identity
The concept of object identity is important for the proper functioning of the Luigi's Box feedback loop.
The "object" in "object identity" refers to a particular content type on your website, such as products, categories, brands or articles. These are the most common types; depending on your specific setup and business requirements, you may have fewer or more.
The "identity" refers to a unique identifier by which an object can be unambiguously identified. The identity must be unique across all types. If you use the same identity for several objects, the object indexed later overwrites the data of the earlier object sharing that ID. Identities can be either numerical or textual.
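Because a reused identity silently overwrites data, it can be worth checking for clashes before sending anything. The following is a minimal sketch, assuming your objects are available as plain dictionaries with hypothetical `type` and `identity` keys (these names are illustrative, not Luigi's Box field names). It compares identities as strings, which is a conservative assumption that treats `42` and `"42"` as the same identity.

```python
from collections import defaultdict


def find_duplicate_identities(objects):
    """Return identities used by more than one object, mapped to the types involved.

    Each object is a dict with hypothetical "type" and "identity" keys.
    Identities are compared as strings (a conservative assumption), so the
    numeric 42 and the text "42" are reported as a clash.
    """
    seen = defaultdict(list)
    for obj in objects:
        seen[str(obj["identity"])].append(obj["type"])
    # Keep only identities shared by two or more objects.
    return {identity: types for identity, types in seen.items() if len(types) > 1}


catalog = [
    {"type": "item", "identity": "SKU-1001"},
    {"type": "category", "identity": "42"},
    {"type": "brand", "identity": 42},  # clashes with the category above
]
print(find_duplicate_identities(catalog))  # {'42': ['category', 'brand']}
```

An empty result means every identity in the batch is unique across types.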
Object identity plays a key role in two processes in Luigi's Box.
Object identification for updates
Object identities serve as IDs in all Luigi's Box data stores. When you use the Indexing APIs to send data for an object, we first look up an existing object by its identity. If such an object exists, it is replaced. If it does not exist, a new object with that identity is created.
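The lookup-then-replace-or-create behavior can be modeled with a toy in-memory store. This is an illustrative sketch only; `InMemoryStore` and its methods are hypothetical and do not correspond to any Luigi's Box API. Note that the whole object is replaced rather than merged, matching the description above.

```python
class InMemoryStore:
    """Toy stand-in for a data store keyed by object identity (illustrative only)."""

    def __init__(self):
        self._objects = {}

    def upsert(self, identity, data):
        """Replace the object with this identity if it exists, otherwise create it.

        Returns "replaced" or "created" so both outcomes are visible.
        The stored object is replaced as a whole, not merged field by field.
        """
        outcome = "replaced" if identity in self._objects else "created"
        self._objects[identity] = data
        return outcome

    def get(self, identity):
        return self._objects.get(identity)


store = InMemoryStore()
print(store.upsert("SKU-1001", {"title": "Red shoe", "price": 50}))  # created
print(store.upsert("SKU-1001", {"title": "Red shoe", "price": 45}))  # replaced
print(store.get("SKU-1001"))  # {'title': 'Red shoe', 'price': 45}
```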
Because of this mechanism, it is very important to use immutable identities that persist over the full lifetime of the object. Depending on your implementation, choosing a mutable identity may lead to duplicates in the data. For example, if you use the URL as the object identity and that URL changes, you may inadvertently index a duplicate object, leaving the old version with the original URL in the index without removing it. Dealing with mutable identities is technically challenging, and the problem is best avoided altogether by choosing an immutable object identity, such as your internal product ID or SKU.
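The duplication can be reproduced with a plain dictionary standing in for the index. The URLs and SKU below are made-up examples, and `index` is a hypothetical helper that only mimics the replace-or-create behavior.

```python
def index(store, identity, data):
    # Toy upsert: replaces the entry if the identity exists, creates it otherwise.
    store[identity] = data


# Identity = URL: the product URL changes between two indexing runs.
by_url = {}
index(by_url, "https://example.com/red-shoe", {"sku": "SKU-1001", "title": "Red shoe"})
index(by_url, "https://example.com/shoes/red-shoe", {"sku": "SKU-1001", "title": "Red shoe"})

# Identity = SKU: the same two runs, keyed by the immutable product code.
by_sku = {}
index(by_sku, "SKU-1001", {"url": "https://example.com/red-shoe", "title": "Red shoe"})
index(by_sku, "SKU-1001", {"url": "https://example.com/shoes/red-shoe", "title": "Red shoe"})

print(len(by_url))  # 2: the old URL version is left behind as a duplicate
print(len(by_sku))  # 1: the second run replaced the first
```

With the SKU as identity, the URL becomes an ordinary attribute that can change freely.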
Pairing
Object identities are used to match analytics data with catalog data. Every user interaction with an object provides a signal to the models; for example, if an object ends up in a purchase event, that event is a strong signal for a variety of models. For this feedback loop to work, the object identity in analytics has to match the identity in the catalog.
From the perspective of the feedback loop, it is vital that you identify products using the same identifier that you use when indexing the data. Decide on the object identity early on, because when you change the identity, the models forget everything they have learned about that object. You can check your pairing in the Luigi's Box application under Catalog browser > Data quality checks.
Aim for non-zero pairing statistics in the application. Low pairing numbers usually indicate that your analytics and catalog identities are not aligned.
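To sanity-check alignment on your own side, you can compare the identities your analytics report with those you index. The sketch below computes a simple pairing rate; it is an illustrative metric and not necessarily how the Luigi's Box application calculates its pairing statistics. The identities are made-up examples.

```python
def pairing_rate(analytics_identities, catalog_identities):
    """Share of distinct identities seen in analytics that also exist in the catalog.

    Illustrative metric only; the Luigi's Box application may compute
    its pairing statistics differently.
    """
    analytics = set(analytics_identities)
    if not analytics:
        return 0.0
    paired = analytics & set(catalog_identities)
    return len(paired) / len(analytics)


catalog_ids = ["SKU-1001", "SKU-1002", "SKU-1003"]

# Analytics reports URLs while the catalog is indexed by SKU: nothing pairs.
print(pairing_rate(["https://example.com/red-shoe"], catalog_ids))  # 0.0

# Analytics reports the same SKUs as the catalog, plus one unknown identity.
print(pairing_rate(["SKU-1001", "SKU-1002", "SKU-9999"], catalog_ids))
```

A rate of zero is the typical symptom of analytics and catalog using different kinds of identifiers.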
Migrating from URLs to identities
Historically, we recommended and used URLs as object identities because they are easy to collect and understand. Over time, however, URLs have caused more hassle than benefit, because they keep changing. When a product's URL changes, the product becomes unpaired from all historically collected data and drops in ranking.
If you are using URLs as object identities and would like to migrate to immutable identifiers such as product codes, contact our support. The process involves:
Model migration
- If your catalog data already contains the attribute that will be promoted to identity, we will rebuild the models using this attribute and no input is needed on your side.
- If the catalog does not yet contain the attribute that will be promoted to identity, we will ask you to provide a mapping CSV file with two columns: URL and object identity. We will use this file to migrate the models in the background so you do not lose any historical data.
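If your product data is available in code, the mapping file can be generated with Python's standard `csv` module. This is a minimal sketch, assuming product records with hypothetical `url` and `sku` keys; the header names `url` and `identity` are placeholders, so confirm the exact expected format with support before sending the file.

```python
import csv
import io


def build_mapping_csv(products):
    """Write a two-column CSV mapping each product URL to its new identity.

    `products` is an iterable of dicts with hypothetical "url" and "sku" keys.
    The header row ("url", "identity") is a placeholder, not a confirmed format.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "identity"])
    for product in products:
        writer.writerow([product["url"], product["sku"]])
    return buffer.getvalue()


products = [
    {"url": "https://example.com/red-shoe", "sku": "SKU-1001"},
    {"url": "https://example.com/blue-shoe", "sku": "SKU-1002"},
]
print(build_mapping_csv(products))
```

Using `csv.writer` rather than string concatenation ensures URLs containing commas or quotes are escaped correctly.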
Updating analytics data collection
- If you report analytics using the Events API, you will need to update the API calls to report identities instead of URLs.
- If your analytics rely on the JavaScript collector, you will need to expose the object identities on the frontend. See the Product identification guide for more details.
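For the Events API case, if the code that sends events only has the URL at hand during the transition, a lookup table can bridge the gap. The sketch below is illustrative: the flat event dictionary with `url` and `identity` keys is hypothetical and is not the Events API schema, and `replace_url_with_identity` is a made-up helper.

```python
def replace_url_with_identity(event, url_to_identity):
    """Return a copy of an event whose "url" field is replaced by an "identity" field.

    The event shape is hypothetical and only illustrates the lookup; it is
    not the Events API schema. A KeyError is raised for unmapped URLs so
    they are noticed instead of being reported with a wrong ID.
    """
    rewritten = {key: value for key, value in event.items() if key != "url"}
    rewritten["identity"] = url_to_identity[event["url"]]
    return rewritten


url_to_identity = {"https://example.com/red-shoe": "SKU-1001"}
event = {"type": "purchase", "url": "https://example.com/red-shoe"}
print(replace_url_with_identity(event, url_to_identity))
# {'type': 'purchase', 'identity': 'SKU-1001'}
```

Failing loudly on unknown URLs is a deliberate choice: an event reported under a wrong or missing identity would not pair with the catalog.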
Reindexing catalog from URLs to identities
- If you are indexing data using the Content Updates API, you will need to reindex the data using the identities. We recommend reindexing product by product: first delete the product by its URL, then index it again under its identity. This way you avoid duplicates in your catalog.
- If you are indexing data using Feeds, we will manage the reindex for you.
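The delete-then-index order for the Content Updates API can be sketched against an in-memory stand-in. `FakeCatalog`, its `delete` and `index` methods, and the `url`/`sku` product keys are all hypothetical and are not Content Updates API calls; the sketch only shows the ordering that prevents duplicates.

```python
class FakeCatalog:
    """In-memory stand-in for a catalog; delete() and index() are hypothetical."""

    def __init__(self, objects):
        self.objects = dict(objects)

    def delete(self, identity):
        # Remove the object if present; deleting a missing identity is a no-op here.
        self.objects.pop(identity, None)

    def index(self, identity, data):
        self.objects[identity] = data


def migrate_product_by_product(catalog, products):
    """Re-key each product: delete its URL-keyed copy, then index it under its SKU."""
    for product in products:
        catalog.delete(product["url"])          # remove the old URL-keyed object first
        catalog.index(product["sku"], product)  # then index under the immutable identity


catalog = FakeCatalog({"https://example.com/red-shoe": {"title": "Red shoe"}})
products = [{"url": "https://example.com/red-shoe", "sku": "SKU-1001", "title": "Red shoe"}]
migrate_product_by_product(catalog, products)
print(sorted(catalog.objects))  # ['SKU-1001']
```

Processing one product at a time keeps the window in which a product is missing from the catalog as short as possible.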
Basic properties of the identity
To sum up, these are the basic properties of the identity:
- Unique across all types
- Immutable
- Numerical or textual
- Used consistently in analytics and catalog data