Blog 3: Mastering Elasticsearch Indexing and Mapping with Python
Welcome to the third installment of our series, “Mastering Elasticsearch with Python: A Beginner’s Guide.” In this post, we’re delving deeply into the world of Elasticsearch indexing and mapping. We will cover various aspects of indexing, including creating indices with custom mappings, understanding field types, exploring analysis, and implementing bulk operations in Python.
Part 1: Introduction to Elasticsearch Indexing
Understanding an Index in Elasticsearch
- Definition: An index in Elasticsearch is a collection of documents with similar characteristics. It’s a key concept in Elasticsearch, analogous to a database in traditional relational databases.
- Purpose: Indices provide an organized structure to store and retrieve data efficiently.
Index Creation Basics
- Creating an Index: Indices are created to store documents. Each index has settings and mappings.
Part 2: Detailed Mapping and Analysis
Mapping in Depth
- What is Mapping?: Mapping is the process of defining how a document and its fields are stored and indexed. Think of it as defining the schema for a table in a SQL database.
Field Types in Mapping
- Text vs. Keyword: `text` fields are analyzed and tokenized, ideal for full-text search. `keyword` fields are not analyzed and are used for exact matches, aggregations, and sorting.
- Other Common Field Types: `date` for dates and times; `long` and `integer` for numeric data; `boolean` for true/false values; `nested` for nested objects or arrays.
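The practical difference between `text` and `keyword` can be sketched in plain Python (an illustration only, not the actual Lucene analysis chain):

```python
import re

def analyze(value):
    # Rough stand-in for a text field's standard analyzer:
    # lowercase, then split on non-word characters
    return [t for t in re.split(r"\W+", value.lower()) if t]

title = "Mastering Search Engines"   # indexed as text: stored as tokens
tag = "Search Engines"               # indexed as keyword: stored verbatim

print("search" in analyze(title))    # True: text matches token by token
print(tag == "search engines")       # False: keyword requires an exact match
```

This is why you search `text` fields with full-text queries but filter and aggregate on `keyword` fields.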
Custom Mapping Creation
- Example: Creating an index with a specific mapping.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "published_on": {"type": "date"},
            "content": {"type": "text"},
            "tags": {"type": "keyword"}
        }
    }
}

# ignore=400 suppresses the error returned if the index already exists
es.indices.create(index='my_articles', body=mapping, ignore=400)
Analysis: The Heart of Full-Text Search
- Understanding Analysis: Analysis is the process that converts text into the tokens or terms stored in an inverted index for searching. It typically involves three stages: character filtering, tokenization, and token filtering.
Tokenizers and Analyzers
- Tokenizers: Split text into individual terms or tokens.
- Analyzers: A combination of a tokenizer and filters. Elasticsearch offers built-in analyzers like `standard`, `simple`, `whitespace`, and more.
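To build intuition for how these built-in analyzers differ, here is a rough pure-Python approximation of each (the real implementations follow Unicode segmentation rules and are more sophisticated; these are sketches only):

```python
import re

def standard_like(text):
    # Approximates the standard analyzer: lowercase, split on word boundaries
    return [t for t in re.split(r"\W+", text.lower()) if t]

def simple_like(text):
    # Approximates the simple analyzer: lowercase, split on anything non-alphabetic
    return [t for t in re.split(r"[^a-zA-Z]+", text.lower()) if t]

def whitespace_like(text):
    # The whitespace analyzer splits on whitespace only and keeps case
    return text.split()

s = "Quick-Fox_2 jumps"
print(standard_like(s))    # ['quick', 'fox_2', 'jumps']
print(simple_like(s))      # ['quick', 'fox', 'jumps']
print(whitespace_like(s))  # ['Quick-Fox_2', 'jumps']
```

Notice how the same input produces three different token streams, which directly changes what a search can match.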
Creating Custom Analyzers
- Example: Creating an index with a custom analyzer.
custom_analyzer = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # lowercase all tokens, then strip diacritics (e.g. é -> e)
                    "filter": ["lowercase", "asciifolding"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "my_custom_analyzer"
            }
        }
    }
}

es.indices.create(index='custom_analysis', body=custom_analyzer, ignore=400)
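Conceptually, `my_custom_analyzer` runs text through the pipeline tokenizer → lowercase → asciifolding. A pure-Python sketch of that pipeline (illustrative only, not Elasticsearch's actual implementation):

```python
import re
import unicodedata

def asciifold(token):
    # Roughly what the asciifolding filter does: strip diacritics
    return unicodedata.normalize("NFKD", token).encode("ascii", "ignore").decode()

def my_custom_analyzer(text):
    tokens = [t for t in re.split(r"\W+", text) if t]  # ~standard tokenizer
    tokens = [t.lower() for t in tokens]               # lowercase filter
    return [asciifold(t) for t in tokens]              # asciifolding filter

print(my_custom_analyzer("Café Déjà Vu"))  # ['cafe', 'deja', 'vu']
```

After folding, a search for "cafe" matches a document containing "Café", which is exactly why asciifolding is popular for multilingual content.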
Part 3: Advanced Indexing Techniques
Bulk Indexing for Efficiency
- Why Bulk Indexing?: Bulk indexing is a method to index multiple documents in a single request, which is more efficient than individual indexing, especially for large datasets.
Implementing Bulk Indexing in Python
- Code Example:
from elasticsearch.helpers import bulk

# 'es' is the Elasticsearch client set up in the earlier posts of this series
documents = [
    {"_index": "my_articles", "_id": 1, "_source": {"title": "Elasticsearch Basics", "published_on": "2023-03-01", "content": "Learning Elasticsearch", "tags": ["search", "database"]}},
    # Add more documents here
]

success, _ = bulk(es, documents)
print(f"Successfully indexed {success} documents")
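For larger datasets it is common to build the action dicts with a generator rather than a hand-written list, so the whole dataset never has to sit in memory at once. A small sketch (the `to_bulk_actions` helper and its field names are illustrative, not part of the client API):

```python
def to_bulk_actions(articles, index="my_articles"):
    # Yield one bulk action per article; the bulk helper consumes
    # this generator lazily, one chunk at a time
    for doc_id, article in enumerate(articles, start=1):
        yield {"_index": index, "_id": doc_id, "_source": article}

articles = [
    {"title": "Elasticsearch Basics", "tags": ["search", "database"]},
    {"title": "Custom Analyzers", "tags": ["analysis"]},
]

actions = list(to_bulk_actions(articles))
print(len(actions))       # 2
print(actions[1]["_id"])  # 2

# With a live cluster you would pass the generator straight to bulk():
# success, _ = bulk(es, to_bulk_actions(articles))
```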
Retrieving and Updating Mapping
- Get Current Mapping:
current_mapping = es.indices.get_mapping(index='my_articles')
print(current_mapping)
- Updating Mapping: New fields can be added to an existing mapping (for example with `es.indices.put_mapping`), but the type of an existing field cannot be changed after creation without reindexing the data into a new index.
Part 4: Conclusion
We’ve explored the intricate details of Elasticsearch indexing and mapping in this blog. Understanding these concepts is crucial for effectively using Elasticsearch’s powerful search capabilities.
Up Next: In our next blog, we’ll delve into constructing complex search queries in Elasticsearch and how to harness its full-text search capabilities using Python.
Key Takeaways:
- Indexing in Elasticsearch is a multifaceted process, crucial for organizing and searching data.
- Mapping defines how data is stored and indexed, with various field types and custom analyzers enhancing search capabilities.
- Bulk operations are efficient for large-scale indexing tasks.
Practice these concepts and feel free to share your experiences or queries in the comments below. Happy indexing!