Using Compound Indexes

In the previous sections, we’ve been using compound indexes, which are indexes with more than one key in them. Compound indexes are a little more complicated to think about than single-key indexes, but they are very powerful. This section covers them in more depth.

Here, we will walk through an example that gives you an idea of the type of thinking you need to do when you are designing compound indexes. The goal is for our read and write operations to be as efficient as possible—but as with so many things, this requires some upfront thinking and some experimentation.

To be sure we get the right indexes in place, it is necessary to test our indexes under some real-world workloads and make adjustments from there. However, there are some best practices we can apply as we design our indexes.

First, we need to consider the selectivity of the index. We are interested in the degree to which, for a given query pattern, the index is going to minimize the number of records scanned. We need to consider selectivity in light of all operations necessary to satisfy a query, and sometimes make tradeoffs. We will need to consider, for example, how sorts are handled.

Let’s look at an example. For this, we will use a student dataset containing approximately one million records. Documents in this dataset resemble the following:

{
  "_id": ObjectId("585d817db4743f74e2da067c"),
  "student_id": 0,
  "scores": [
    {
      "type": "exam",
      "score": 38.05000060199827
    },
    {
      "type": "quiz",
      "score": 79.45079445008987
    },
    {
      "type": "homework",
      "score": 74.50150548699534
    },
    {
      "type": "homework",
      "score": 74.68381684615845
    }
  ],
  "class_id": 127
}

We will begin with two indexes and look at how MongoDB uses these indexes (or doesn’t) in order to satisfy queries. These two indexes are created as follows:

db.students.createIndex({ class_id: 1 });
{
  "createdCollectionAutomatically": true,
  "numIndexesBefore": 1,
  "numIndexesAfter": 2,
  "ok": 1
}
db.students.createIndex({ student_id: 1, class_id: 1 });
{
  "createdCollectionAutomatically": false,
  "numIndexesBefore": 2,
  "numIndexesAfter": 3,
  "ok": 1
}

In working with this dataset, we will consider the following query, because it illustrates several of the issues that we have to think about in designing our indexes:

db.students.find({ student_id: { $gt: 500000 }, class_id: 54 }).sort({ student_id: 1 }).explain('executionStats');

Note that in this query we are requesting all records with an ID greater than 500,000, so about half of the records. We are also constraining the search to records for the class with ID 54. There are about 500 classes represented in this dataset. Finally, we are sorting in ascending order based on "student_id". Note that this is the same field on which we are doing a multivalue query. Throughout this example we will look at the execution stats that the explain method provides to illustrate how MongoDB will handle this query.

If we run the query, the output of the explain method tells us how MongoDB used indexes to satisfy it:

{
  "queryPlanner": {
    "plannerVersion": 1,
    "namespace": "test.students",
    "indexFilterSet": false,
    "parsedQuery": {
      "$and": [
        {
          "class_id": {
            "$eq": 54
          }
        },
        {
          "student_id": {
            "$gt": 500000
          }
        }
      ]
    },
    "winningPlan": {
      "stage": "FETCH",
      "inputStage": {
        "stage": "IXSCAN",
        "keyPattern": {
          "student_id": 1,
          "class_id": 1
        },
        "indexName": "student_id_1_class_id_1",
        "isMultiKey": false,
        "multiKeyPaths": {
          "student_id": [],
          "class_id": []
        },
        "isUnique": false,
        "isSparse": false,
        "isPartial": false,
        "indexVersion": 2,
        "direction": "forward",
        "indexBounds": {
          "student_id": ["(500000.0, inf.0]"],
          "class_id": ["[54.0, 54.0]"]
        }
      }
    },
    "rejectedPlans": [
      {
        "stage": "SORT",
        "sortPattern": {
          "student_id": 1
        },
        "inputStage": {
          "stage": "SORT_KEY_GENERATOR",
          "inputStage": {
            "stage": "FETCH",
            "filter": {
              "student_id": {
                "$gt": 500000
              }
            },
            "inputStage": {
              "stage": "IXSCAN",
              "keyPattern": {
                "class_id": 1
              },
              "indexName": "class_id_1",
              "isMultiKey": false,
              "multiKeyPaths": {
                "class_id": []
              },
              "isUnique": false,
              "isSparse": false,
              "isPartial": false,
              "indexVersion": 2,
              "direction": "forward",
              "indexBounds": {
                "class_id": ["[54.0, 54.0]"]
              }
            }
          }
        }
      }
    ]
  },
  "executionStats": {
    "executionSuccess": true,
    "nReturned": 0,
    "executionTimeMillis": 0,
    "totalKeysExamined": 0,
    "totalDocsExamined": 0,
    "executionStages": {
      "stage": "FETCH",
      "nReturned": 0,
      "executionTimeMillisEstimate": 0,
      "works": 2,
      "advanced": 0,
      "needTime": 0,
      "needYield": 0,
      "saveState": 0,
      "restoreState": 0,
      "isEOF": 1,
      "invalidates": 0,
      "docsExamined": 0,
      "alreadyHasObj": 0,
      "inputStage": {
        "stage": "IXSCAN",
        "nReturned": 0,
        "executionTimeMillisEstimate": 0,
        "works": 1,
        "advanced": 0,
        "needTime": 0,
        "needYield": 0,
        "saveState": 0,
        "restoreState": 0,
        "isEOF": 1,
        "invalidates": 0,
        "keyPattern": {
          "student_id": 1,
          "class_id": 1
        },
        "indexName": "student_id_1_class_id_1",
        "isMultiKey": false,
        "multiKeyPaths": {
          "student_id": [],
          "class_id": []
        },
        "isUnique": false,
        "isSparse": false,
        "isPartial": false,
        "indexVersion": 2,
        "direction": "forward",
        "indexBounds": {
          "student_id": ["(500000.0, inf.0]"],
          "class_id": ["[54.0, 54.0]"]
        },
        "keysExamined": 0,
        "seeks": 1,
        "dupsTested": 0,
        "dupsDropped": 0,
        "seenInvalidated": 0
      }
    }
  },
  "serverInfo": {
    "host": "admatic-instance-1",
    "port": 27017,
    "version": "4.0.23",
    "gitVersion": "07c6611b38d2aacbdb1846b688db70b3273170fb"
  },
  "ok": 1
}

As with most data output from MongoDB, the explain output is JSON. Let’s look first at the bottom half of this output, which is almost entirely the execution stats. The "executionStats" field contains statistics that describe the completed query execution for the winning query plan. We will look at query plans and the query plan output from explain a little later.

Within "executionStats", first we will look at "totalKeysExamined". This is how many keys within the index MongoDB walked through in order to generate the result set. We can compare "totalKeysExamined" to "nReturned" to get a sense for how much of the index MongoDB had to traverse in order to find just the documents matching the query.

results matching ""

    No results matching ""