Data design

In this sprint, you will design your data pipeline, including what data to sync to Algolia and how often to update it. By the end of this sprint, you will have all of your data live in Algolia. Once you have completed the tasks below, you can move on to the next sprint.

Team members

Depending on the size of your company, some of these roles may be filled by the same person. In this sprint, it is important to identify these roles and get in contact with the people filling them.

Project Manager

Planning and project oversight

Systems Architects

Analyze and design IT components

Back End Engineers

Build application business logic, server scripts, and APIs

First, you should review the mockups created in the previous sprint and identify all data types included. For instance, are articles, products, or FAQs included in the mockups? Any data type you want to be searchable must have its own Algolia index.

Within the data types to index, there are four attribute types to include: searchable data, filterable data, display data, and business data.

Business data includes any metrics that are important to how you want the results to be ranked. Examples include number of clicks, sales margin, release date, and distance from the user.
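To make these concrete, here is a hypothetical record combining all four attribute types; every attribute name below is illustrative rather than required by Algolia:

```js
// A hypothetical record for a "products" index, annotated with the four
// attribute types. All attribute names here are illustrative.
const productRecord = {
  objectID: "SKU-12345",       // unique identifier, required by Algolia

  // Searchable data: matched against the user's query
  name: "Trail Running Shoe",
  description: "Lightweight shoe for off-road running",

  // Filterable data: used for facets and filters
  brand: "Acme",
  category: "Footwear",

  // Display data: shown in results, not searched
  imageUrl: "https://example.com/images/sku-12345.jpg",

  // Business data: feeds custom ranking
  clicks: 1284,
  salesMargin: 0.32,
  dateReleased: 1672531200     // Unix timestamp; numbers rank more reliably than date strings
};
```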

Now that you know all of the data types you want to sync into Algolia, you need to think about how you want to structure this data and how often it needs to be updated.

Best practices are covered in this webinar, including how to handle different ranking strategies such as ‘sort bys’. It’s likely the data you have requires some transformation, and some use cases, such as handling multiple languages, also require specific indexing strategies.
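For instance, a ‘sort by’ in Algolia is implemented as a replica index whose ranking formula puts the sort attribute first. A minimal sketch with the v4 JavaScript client, assuming a products index with a numeric price attribute:

```js
const algoliasearch = require("algoliasearch");

// Placeholder credentials; use an admin API key server-side only.
const client = algoliasearch("YOUR_APP_ID", "YOUR_ADMIN_API_KEY");

async function configureSortByPrice() {
  // Declare a standard replica of the primary index for the sort-by option.
  await client.initIndex("products").setSettings({
    replicas: ["products_price_desc"],
  });

  // On the replica, put the sort attribute first in the ranking formula.
  await client.initIndex("products_price_desc").setSettings({
    ranking: [
      "desc(price)",
      "typo", "geo", "words", "filters",
      "proximity", "attribute", "exact", "custom",
    ],
  });
}

configureSortByPrice().catch(console.error);
```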

At this point, it makes sense to create a system diagram of what data will pass between which systems and how often.


The tools you use to build your pipeline depend on the systems you are pulling data out of. For each data type, identify the system you need access to and check the relevant section below.

Out-of-the-box connectors

Algolia supported:

Shopify

The first step of implementing a Shopify integration is setting up a full reindex. Once you have validated your Algolia account, you can trigger this straight away. This creates three indices: products, pages, and collections. If you want to enrich these indices further with data from an API or third-party system, you can use metafields. If you want to enrich them with data managed directly in Shopify, use named tags. If you have the option, we recommend named tags, as metafields can slow down the indexing process.
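As a rough illustration of the named-tags idea (treat the exact record shape as an assumption; the Shopify integration docs describe the precise encoding), a product tag written as name:value surfaces as a structured attribute on the indexed record:

```js
// Tags set on the product in Shopify, following the name:value convention:
//   "color:blue", "season:summer"
//
// After indexing, they appear as a structured attribute on the record
// (the shape below is illustrative):
const enrichedRecord = {
  objectID: "7654321",
  title: "Linen Shirt",
  named_tags: {
    color: "blue",
    season: "summer",
  },
};
```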

Adobe Commerce (Magento)

The first step of implementing an Adobe Commerce (Magento) integration is to install the extension. Next, add your credentials, enable indexing, and push your initial data to Algolia. If you need to transform data, then install the CustomAlgolia extension at the same time.

Salesforce Commerce Cloud

The first step of setting up the SFCC integration is to download, install, and set up the Algolia cartridge. Note that you may need to customize the indexing scripts within the cartridge to index specific, non-default data to Algolia.

Community supported:

If you are unable to use one of our out-of-the-box integrations, you can check out connector options built by third parties and the Algolia community.


Do it yourself

If you are unable to use one of our out-of-the-box connectors, you will need to use the API clients to sync the desired data into Algolia. The official Algolia API clients have all the indexing methods you need and are available in PHP, Ruby, JavaScript, Python, Swift, Kotlin, Android, .NET, Java, Go, and Scala.

If the system with the required data has an integration point where you can use one of the API clients listed above, this is the optimal point from which to index to Algolia.
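As a minimal sketch of that initial push, assuming the v4 JavaScript client with placeholder credentials, index name, and records:

```js
const algoliasearch = require("algoliasearch");

// Placeholder credentials and index name; replace with your own.
const client = algoliasearch("YOUR_APP_ID", "YOUR_ADMIN_API_KEY");
const index = client.initIndex("products");

// Records pulled from your source system. Each record needs a stable
// objectID so later updates and deletes can target the same record.
const records = [
  { objectID: "SKU-1", name: "Trail Running Shoe", brand: "Acme" },
  { objectID: "SKU-2", name: "Road Running Shoe", brand: "Acme" },
];

index
  .saveObjects(records)
  .then(({ objectIDs }) => console.log("Indexed:", objectIDs))
  .catch(console.error);
```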

If you can access changes to the data (deltas), you can use the addObjects, partialUpdateObjects, and deleteObjects methods.

If you can only access the entire database, you can use replaceAllObjects.
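The sketch below shows both paths, reusing the client and index from the previous example. Method names vary slightly between client versions; for example, the v4 JavaScript client uses saveObjects where older clients use addObjects:

```js
// Delta sync: push only what changed since the last run.
async function syncDeltas({ added, changed, deletedIDs }) {
  await index.saveObjects(added);            // new records (addObjects in older clients)
  await index.partialUpdateObjects(changed); // touch only the attributes that changed
  await index.deleteObjects(deletedIDs);     // remove by objectID
}

// Full sync: rebuild from a complete export. replaceAllObjects stages the
// records in a temporary index and swaps it in atomically, so searches
// never see a half-synced index.
async function fullResync(allRecords) {
  await index.replaceAllObjects(allRecords, { safe: true });
}
```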


Algolia Crawler

The Algolia Crawler is a good fit for your implementation if you have static HTML content you want to index. For example, it is a great way to index data for a site search implementation. Optionally, you can enrich crawled static content with ranking data from Google Analytics or Adobe Analytics.

You can manage all configuration details in the Crawler Editor as a JSON file. Once you’ve set up your startUrls and sitemaps, you can run the Crawler and use the path explorer and data analysis to determine which URLs have been crawled and which haven’t. Then you can update the configuration to ensure all required URLs are crawled.

Once all required URLs are being crawled, you can configure how the Crawler should extract data into records, using a JavaScript function within the configuration.
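A sketch of such a configuration, with placeholder URLs, index name, and CSS selectors (in the Crawler Editor the configuration is wrapped in a new Crawler({...}) call):

```js
new Crawler({
  appId: "YOUR_APP_ID",
  apiKey: "YOUR_CRAWLER_API_KEY",
  // Where crawling starts, plus sitemaps for discovering the rest.
  startUrls: ["https://www.example.com/"],
  sitemaps: ["https://www.example.com/sitemap.xml"],
  actions: [
    {
      indexName: "site_pages",
      pathsToMatch: ["https://www.example.com/**"],
      // $ is a Cheerio-like instance over the fetched page; return one or
      // more records per URL.
      recordExtractor: ({ url, $ }) => [
        {
          objectID: url.href,
          title: $("h1").first().text(),
          content: $("main p").text(),
        },
      ],
    },
  ],
});
```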


Now that you have your data indexed in Algolia, the next step is to set some initial relevance configurations so that you can test relevance within the dashboard. The three key settings to configure are searchable attributes, attributes for faceting, and attributes to retrieve.
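A sketch of those three settings, assuming the hypothetical products index and attributes from the earlier examples:

```js
// Initial relevance settings, using the v4 JavaScript client from the
// earlier sketches.
client
  .initIndex("products")
  .setSettings({
    // Ordered by importance: matches in "name" outrank matches in "description".
    searchableAttributes: ["name", "brand", "description"],
    // Attributes you plan to filter on or build facet UIs from.
    attributesForFaceting: ["brand", "category"],
    // Keep responses lean: return only what the UI displays.
    attributesToRetrieve: ["name", "brand", "imageUrl"],
  })
  .then(() => console.log("Relevance settings applied"))
  .catch(console.error);
```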