Analytics
Analytics is a fundamental part of Luigi's Box services. It not only lets you see the performance of individual services (such as Search or Recommender), but also collects user interactions with your website, which are then fed to the models that drive ranking and product selection.
Luigi's Box analytics is designed to be collected at the "interaction site", independent of the API calls you make to get the products. This usually means the frontend of your website or application. Note that calling the search or recommender API does not automatically record that search or recommendation in analytics; the analytics event has to be sent separately by the API consumer. If you use Luigi's Box frontend libraries, they send the analytics events automatically.
When should you care about implementing Luigi's Box analytics? When you are integrating on the backend, or on the frontend not using Luigi's Box libraries.
You can opt for JavaScript-based tracking, where you push events to the dataLayer that describe the products users see, or you can send events directly to our REST API. The REST API is the likely choice if you prefer to track searches from your backend, or if you need to track an environment where JavaScript is not an option (e.g., mobile apps).
Identity and the feedback loop
One of the primary goals of analytics is to provide feedback to various AI models inside Luigi's Box. From the perspective of the feedback loop, it is vital that you identify the products using the same identifier that you use when indexing the data. Object identity is something that you should decide early on, because when you change the identity, the models will forget everything that they learned about that object historically.
Concepts
Regardless of which integration you choose, there are several concepts that you need to understand. Together they form a complete picture of the user interaction.
Concept | Description |
---|---|
List | List of products that the user saw |
Query | what the user typed into your searchbox |
Filters | additional restrictions used, e.g. search only in "Accessories" department |
Search results | specifically title, URL address, position in the list of results and price (if applicable). |
Conversion intent | did the user add an item to the shopping cart? |
List
Lists are a generic group of products that were presented to the user. They will typically be one of: autocomplete, search results, recommendations or product listing.
Query
Query is what the user typed into the search box to get search results.
It is important that you don't encode the query in any way (e.g., you should do no percent-encoding or whitespace trimming). Use exactly the same query string that the user typed, even if you do some internal preprocessing before using the query for search.
Note that a valid search may not have a query at all. For example, imagine an advanced search where the user types no query, but chooses to search for all products from a certain brand. In this case you would use an empty query with filters.
See below for filters explanation.
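As a rough illustration of the two points above, the sketch below sends a normalized copy of the query to the search backend while keeping the query exactly as typed for analytics, and it also handles a search that has only filters and an empty query. The names `run_search`, `fake_backend`, and the `query`/`filters` keys are hypothetical and chosen for this example only; they are not the Luigi's Box payload format.

```python
def run_search(raw_query, filters, search_backend):
    """Search with a normalized query, but describe the search for
    analytics with the query exactly as the user typed it."""
    # Internal preprocessing is fine for the search call itself.
    normalized = raw_query.strip().lower()
    results = search_backend(normalized, filters)
    # For analytics, keep the untouched string (no trimming, no encoding).
    analytics = {"query": raw_query, "filters": dict(filters)}
    return results, analytics


# Hypothetical backend that only echoes what it was asked for.
def fake_backend(query, filters):
    return [f"result for {query!r} with {sorted(filters.items())}"]


results, typed = run_search("  Black Shirt ", {}, fake_backend)
# A valid search with no query at all, only a brand filter.
_, brand_only = run_search("", {"Brand": "Acme"}, fake_backend)
```

The backend sees the normalized text, while `typed["query"]` still contains the original spacing and capitalization.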
Filters
Anything that influences which products are displayed should be annotated as filters. Sorting, facets, and sometimes even the category have an effect on what is displayed.
Luigi's Box only cares about active filters. When you are showing a facet where a user can limit phones by the manufacturer, but the user did not select anything yet, it's not a filter and you should not annotate it.
Take a look at the picture above. There are 4 filters: Color, Price, Brand and Sort by. Notice that the Brand filter is unused, so there is no need to annotate it. In this case, you should send:
- Color: Black,
- Price: 20-800,
- Sort: Price.
Note that some filters are implicit and not visible or modifiable by the user.
Imagine a scenario where your users are assigned specific access levels and you are limiting the search results to only show results the user can access. In this case, you should send the access level as a filter.
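The rule "only active filters, plus implicit ones" could be sketched as follows. This is a minimal example under assumptions: the function name `active_filters`, the representation of an unused facet as `None` or an empty value, and the `AccessLevel` key are all hypothetical.

```python
def active_filters(facets, implicit=None):
    """Return only the filters that actually restrict the displayed products.

    facets   -- mapping of facet name to the selected value, or None/empty
                when the user has not selected anything
    implicit -- filters applied by the system that the user cannot see or change
    """
    # Drop facets that are shown on the page but not selected.
    active = {
        name: value
        for name, value in facets.items()
        if value not in (None, "", [], ())
    }
    # Implicit restrictions influence the results too, so they are included.
    active.update(implicit or {})
    return active


# The example from the text: Brand is displayed but unused.
ui_state = {"Color": "Black", "Price": "20-800", "Brand": None, "Sort": "Price"}
visible_only = active_filters(ui_state)
with_access = active_filters(ui_state, implicit={"AccessLevel": "member"})
```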
Search results
The relevant information about a search result is:
- Title — this will usually be the same title that you show to the user, e.g. a product title. The title is not required to be unique (and in many cases it won't be, but that's ok).
- URL — the canonical link to the search result (e.g. a product). It should be a valid URL, including the http/https protocol. The URLs should be unique, and each URL should link to exactly one product. In our experience, the most common problem is that the URLs are not canonical. The URL should contain only the parameters needed to link to the product and nothing more. Some examples of URLs that violate this requirement, followed by the correct form:
URL | Note |
---|---|
https://www.example.com/products/black-shirt?ref=search | Notice the ref parameter, which is not necessary for the link to work and is only used for analytical purposes. |
https://www.example.com/products/black-shirt?q=shirt&page=2 | Notice the q and page parameters, which are not necessary for the link to work and are only used to construct a link back to search. |
https://www.example.com/products/black-shirt?gclid=283h1bxz81jzgj | Notice the gclid parameter, which is not necessary for the link to work and is only used for tracking purposes. |
https://www.example.com/products/black-shirt ✔ | This is the correct canonical URL that should be used in all cases. |
- Position — a number indicating a search result position in the full list of search results, considering pagination.
- Price (optional) — price of the product. This may not always be available, e.g., the product is not in stock anymore, or the search result is a blog post, which is not sellable and thus has no price. It's ok to not annotate price if the search result does not have a price (i.e. the search result is an article and not a product).
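One possible way to enforce the canonical-URL requirement is to strip every query parameter that is not needed to reach the product. The sketch below uses Python's standard `urllib.parse`; the function name `canonicalize_url` and the `keep_params` allowlist are assumptions made for this example, and which parameters are actually required depends on your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize_url(url, keep_params=frozenset()):
    """Drop query parameters (and the fragment) that are not needed
    for the URL to link to the product."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("URL must include the http/https protocol")
    # Keep only the parameters explicitly listed as required.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in keep_params
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


canonical = "https://www.example.com/products/black-shirt"
cleaned = [
    canonicalize_url(canonical + "?ref=search"),
    canonicalize_url(canonical + "?q=shirt&page=2"),
    canonicalize_url(canonical + "?gclid=283h1bxz81jzgj"),
]
# A hypothetical shop whose product pages need an "id" parameter.
with_id = canonicalize_url("https://shop.example.com/p?id=42&gclid=abc", {"id"})
```

An allowlist is used here rather than a blocklist of known tracking parameters, so a new tracking parameter cannot silently produce a second URL for the same product.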
Luigi's Box only cares about search results that are visible. Imagine a scenario where your search found thousands of search results, but you only present the first 20, along with a pagination component that lets the user page through the search results and view additional ones.
In this case, you should only annotate the 20 visible search results, and ignore the rest. The first result will have position 1, the last result position 20. When the user clicks through to page 2, annotate the visible results, and the first result will have position 21, the last one position 40.
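The position arithmetic described above amounts to adding a page offset to the index within the page. A minimal sketch, assuming 1-based page numbers and a fixed page size; the function name `annotate_positions` and the `position` key are hypothetical:

```python
def annotate_positions(visible_results, page, page_size):
    """Attach the position in the full result list to each visible result."""
    # Results on earlier pages come before everything on this page.
    offset = (page - 1) * page_size
    return [
        {**result, "position": offset + index}
        for index, result in enumerate(visible_results, start=1)
    ]


page_of_results = [{"title": f"Product {n}"} for n in range(20)]
first_page = annotate_positions(page_of_results, page=1, page_size=20)
second_page = annotate_positions(page_of_results, page=2, page_size=20)
```

With 20 results per page, the first page is numbered 1 to 20 and the second page 21 to 40, matching the example in the text.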
Conversions
Everything that is important to you from the business perspective can be tracked as “conversion”, regardless of whether it is an action of “buying” an item, “liking” it, “favoriting” it, or anything else.
There are usually many different types of conversions found at different places.
It is usually possible to convert both from the list of search results and the product detail itself. You should annotate both conversion actions.
It's ok to have several conversion actions for the same product, e.g. buy a product, or add a product to favorite list. Just make sure to give each conversion a different name.
Autocomplete
The process of annotating autocomplete results is the same as with the regular Search Results. Here are the notable differences:
- Autocomplete is usually not paginated, so you can safely skip the position annotations.
- In case your Autocomplete results don't show prices, you can safely skip the price attribute.
- Sometimes autocomplete can have filters too. See the image below for an example. In this case, you must annotate this filter, e.g. Category: Phones.
Product listings
Any display of a product list that is of interest can be considered a “search”. Therefore, if the user clicks a menu element and is taken to a list of products which they can manipulate through filters and/or sorting, this is considered a “search” too (albeit without a query, only with filters, if any) and you should send it to Luigi's Box to be analyzed. It makes sense to send the category name as a filter.
Recommendations
The process for annotating recommendations is similar to annotating Search Results. Here are the notable differences:
- There is no query
- Send the recommendation type in filters using the special RecommenderModel filter
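The two differences above could look like this in code. Only the `RecommenderModel` filter name comes from the text; the function name, the dictionary layout, and the `similar_items` recommendation type are hypothetical and do not represent the actual event format.

```python
def describe_recommendation_list(recommendation_type, items):
    """Describe a list of recommended products for analytics."""
    return {
        # No "query" key: recommendations are not triggered by a typed query.
        "filters": {"RecommenderModel": recommendation_type},
        "items": list(items),
    }


recommended = describe_recommendation_list(
    "similar_items",  # hypothetical recommendation type
    [{"title": "Black shirt", "url": "https://www.example.com/products/black-shirt"}],
)
```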
Context
The context in an analytics event refers to a specific business situation which should be considered when applying the feedback to the models. By setting a context, you are creating a separate version of the ranking model to be used specifically for search or autocomplete requests made in that context.
The context will usually be set to a specific warehouse or location, on the assumption that user behavior varies significantly between contexts.
Setting a new context forces creation of a new model, in effect taking data away from other models. Before starting to use the context feature, make sure that:
- the user behavior varies across contexts
- you are not creating too many models (contexts)
The context takes the form of a key/value pair, where each key/value pair maps to a single model. When you use context "warehouse": "Berlin" for a portion of the analytics events and "warehouse": "Munchen" for another portion, 2 distinct and independent models will be created.
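Before adopting contexts, it may help to estimate how your events would be split across models. The sketch below counts events per distinct key/value pair; it is an illustration only, and the event layout with a `context` dictionary, the function name `events_per_model`, and the separate bucket for events without a context are assumptions.

```python
from collections import Counter


def events_per_model(events):
    """Count how many analytics events each context would receive."""
    counts = Counter()
    for event in events:
        context = event.get("context")
        # Each distinct key/value pair corresponds to one model;
        # None collects events sent without any context.
        key = tuple(sorted(context.items())) if context else None
        counts[key] += 1
    return counts


events = [
    {"context": {"warehouse": "Berlin"}},
    {"context": {"warehouse": "Berlin"}},
    {"context": {"warehouse": "Munchen"}},
    {},
]
split = events_per_model(events)
distinct_contexts = [key for key in split if key is not None]
```

A long list of contexts, each with only a handful of events, suggests that too many models would be created and each would see too little data.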
Note that since contexts are an advanced feature, the context is currently only supported when pushing analytics events via API.
See the multi-warehouse solution for more information.