[Mongo] Searching Effectively Without a Search Engine (Feat. $text, $search, $regex, aggregate)

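Problem

When other characters (especially spaces) appeared between the words of a search term, it was hard to get the intended documents back.

I expected users to type spaces quite often when searching for a product, so I wanted to handle this case proactively.

Below are the original code and its search results.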

async findBySearch(filter) {
    // Substring match on name: the whole search term is used as one regex pattern
    const products = await Product.find({ name: { $regex: filter } });
    return products;
}

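{ name: '여름 반팔 니트' } // "summer short-sleeve knit"

// search term => found?
// '여름 반팔' ("summer short-sleeve") => O
// '반팔 니트' ("short-sleeve knit") => O
// '여름 니트' ("summer knit") => X
// '여름반팔' ("summer short-sleeve", no space) => X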
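To see why the last two searches fail: `{ $regex: filter }` treats the entire search term as a single pattern, so its characters, including the space, must appear contiguously in `name`. The sketch below reproduces that check with a plain JavaScript `RegExp`, with no database involved. `matchesLikeRegexQuery` is a hypothetical helper, and it assumes the term contains no regex metacharacters; for plain literal text like this, JavaScript's regex engine gives the same answer as MongoDB's.

```javascript
// Hypothetical helper: mimics `{ name: { $regex: term } }` for a single document.
// The whole term is one pattern, so it must occur as one contiguous substring.
function matchesLikeRegexQuery(name, term) {
  return new RegExp(term).test(name);
}

const productName = '여름 반팔 니트'; // "summer short-sleeve knit"

for (const term of ['여름 반팔', '반팔 니트', '여름 니트', '여름반팔']) {
  console.log(`'${term}' => ${matchesLikeRegexQuery(productName, term) ? 'O' : 'X'}`);
}
```

This prints O, O, X, X, matching the results above: '여름 니트' skips the middle word and '여름반팔' drops the space, so neither is a contiguous substring of the name.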

$text, $search

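Advantages

1. A search term containing spaces is automatically treated with OR logic

(e.g. "여름 니트" ("summer knit") => searches for 여름 OR 니트)

2. Values can be searched across multiple fields at once.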
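A toy model in plain JavaScript can illustrate both advantages. It is only an approximation assumed for illustration: it splits on whitespace and compares whole words, while real $text indexing also applies language-specific stemming and stop-word removal. The helper names and the sample `shortDescription` value are hypothetical.

```javascript
// Toy model of $text's OR behaviour (not MongoDB's implementation): the search string is
// split on whitespace, and a document matches if ANY term equals a whole word in ANY listed field.
// Stemming and stop words, which real $text applies, are ignored here.
function tokenize(text) {
  return String(text ?? '').toLowerCase().split(/\s+/).filter(Boolean);
}

function matchesLikeTextSearch(doc, fields, search) {
  const terms = tokenize(search);
  return fields.some((field) => {
    const words = new Set(tokenize(doc[field]));
    return terms.some((term) => words.has(term));
  });
}

// Hypothetical sample document; the shortDescription value is made up ("cool linen fabric").
const doc = { name: '여름 반팔 니트', shortDescription: '시원한 린넨 소재' };
const fields = ['name', 'shortDescription'];

console.log(matchesLikeTextSearch(doc, fields, '여름 니트')); // true: 여름 OR 니트 (advantage 1)
console.log(matchesLikeTextSearch(doc, fields, '린넨'));      // true: found in another field (advantage 2)
console.log(matchesLikeTextSearch(doc, fields, '겨울 패딩')); // false: neither term appears
```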

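Setup

1. Create a text index

$text is a query operator that searches the string content of a collection's fields, so the target fields must be covered by a text index. A collection can have only one text index, but that index may include several fields.

If you use MongoDB directly or through the shell, create the index with the db.collection.createIndex() method.

Mongoose, instead of working with the collection object, declares the index on the schema.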
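Both routes take the same two arguments: an index specification mapping each field to `'text'`, and an options object (where the `weights` option goes). The hypothetical helper below only builds that pair to make its shape explicit; actually passing it to `db.products.createIndex(...)` in mongosh or `ProductSchema.index(...)` in Mongoose is left as comments, since that needs a database and a schema.

```javascript
// Hypothetical helper: builds the (keys, options) pair accepted by both
// mongosh's db.collection.createIndex(keys, options) and mongoose's schema.index(fields, options).
// Every listed field becomes part of the collection's single text index.
function buildTextIndexArgs(fields, options = {}) {
  const keys = {};
  for (const field of fields) {
    keys[field] = 'text';
  }
  return [keys, options];
}

const [keys, options] = buildTextIndexArgs(['name', 'keyword']);
// mongosh:  db.products.createIndex(keys, options)
// mongoose: ProductSchema.index(keys, options)
console.log(keys); // { name: 'text', keyword: 'text' }
```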

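weights option

The weights option sets how much each field counts when MongoDB scores how relevant a document is to the search keywords (a field's weight defaults to 1).
It matters a great deal if the results are to reflect what the user actually meant.
The schema.js code further below shows it applied.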
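To get a feel for why `name: 20` outranks `detailDescription: 1`, here is a deliberately simplified scoring function. It is a toy assumption, not MongoDB's textScore formula (which is computed internally and also depends on factors such as how often terms occur); it merely adds a field's weight for every search term found as a whole word in that field. The function name and sample documents are made up.

```javascript
// Toy illustration of field weights (NOT MongoDB's textScore formula):
// a document earns `weight` points for each search term found as a whole word in a field.
function toyWeightedScore(doc, weights, search) {
  const terms = search.trim().toLowerCase().split(/\s+/).filter(Boolean);
  let score = 0;
  for (const [field, weight] of Object.entries(weights)) {
    const words = new Set(String(doc[field] ?? '').toLowerCase().split(/\s+/));
    for (const term of terms) {
      if (words.has(term)) score += weight;
    }
  }
  return score;
}

const weights = { name: 20, keyword: 10, shortDescription: 2, detailDescription: 1 };
const inName = { name: '여름 니트', detailDescription: '' };                  // term in name
const inDetail = { name: '린넨 셔츠', detailDescription: '여름 니트 느낌' };  // term only in detail

console.log(toyWeightedScore(inName, weights, '니트'));   // 20
console.log(toyWeightedScore(inDetail, weights, '니트')); // 1
```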

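2. Sort search results by search score

The score is read with the { $meta: 'textScore' } expression.

$meta returns metadata, i.e. information about how the document was matched rather than its stored fields. It accepts keywords such as 'textScore' and 'indexKey'.

'textScore' is the relevance score the document received for the $text query, with the field weights taken into account.

'indexKey' returns the document's index key when a non-text index is used; it is meant for inspection and debugging, not for ranking.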

//schema.js
ProductSchema.index(
  {
    name: 'text',
    shortDescription: 'text',
    detailDescription: 'text',
    keyword: 'text',
  },
  {
    weights: {
      name: 20,
      keyword: 10,
      shortDescription: 2,
      detailDescription: 1,
    },
  },
);

//model.js
async findBySearch(filter) {
    const products = await Product.find(
      { $text: { $search: filter } },     // space-separated terms are ORed
      { score: { $meta: 'textScore' } },  // project the relevance score into each document
    ).sort({ score: { $meta: 'textScore' } }); // highest score first
    return products;
}
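In plain JavaScript terms, the query above behaves roughly like the sketch below: documents that do not match `$text` are not returned at all, each match carries its `score`, and the sort puts the highest score first. `rankByScore` and the sample scores are hypothetical stand-ins; in the real query the scores come from MongoDB.

```javascript
// Plain-JS picture of find({ $text }, { score: { $meta: 'textScore' } }).sort(...):
// non-matching documents are dropped, each match carries its score, highest score first.
function rankByScore(docs, scoreOf) {
  return docs
    .map((doc) => ({ ...doc, score: scoreOf(doc) }))
    .filter((doc) => doc.score > 0)      // score 0 stands for "did not match $text"
    .sort((a, b) => b.score - a.score);  // like sort({ score: { $meta: 'textScore' } })
}

const products = [
  { _id: 1, name: '여름 반팔 니트' },
  { _id: 2, name: '겨울 패딩' },
  { _id: 3, name: '여름 니트 조끼' },
];
// Made-up scores standing in for MongoDB's textScore.
const sampleScores = { 1: 0.75, 2: 0, 3: 1.5 };
const ranked = rankByScore(products, (p) => sampleScores[p._id]);
console.log(ranked.map((p) => p._id)); // [ 3, 1 ]
```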

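Further problems

$text, $search VS. $regex

1. When the search term is '니트' ("knit"), $text/$search cannot find a document named '여름니트' ("summerknit", written without a space), because $text matches whole words (tokens), not parts of words.

2. $text can be made case-sensitive or diacritic-sensitive through options ($caseSensitive, $diacriticSensitive), but there is no option for partial (substring) matching. It would be nice to put a regular expression inside $search, but that is not possible.

(It feels like a waste of all the studying... I hope MongoDB adds this someday.)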

https://stackoverflow.com/questions/24343156/mongodb-prefix-wildcard-fulltext-search-text-find-part-with-search-string

mongoDB prefix wildcard: fulltext-search ($text) find part with search-string


3. Switching to $regex, on the other hand, means losing the ability to sort by textScore.
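The trade-off in points 1 and 3 can be shown side by side with two toy matchers (plain JavaScript, hypothetical names, not MongoDB code). Substring matching, which is what a literal `$regex` does, finds '니트' inside '여름니트' but yields only true/false with no relevance score; whole-token matching, roughly what `$text` does, misses it.

```javascript
// Substring matching: what a literal (escaped) $regex does. No relevance score, just true/false.
function regexStyleMatch(value, term) {
  return value.includes(term);
}

// Whole-token matching: roughly what $text does (ignoring stemming and stop words).
function textStyleMatch(value, term) {
  return value.split(/\s+/).includes(term);
}

for (const name of ['여름 반팔 니트', '여름니트']) {
  console.log(name, { regex: regexStyleMatch(name, '니트'), text: textStyleMatch(name, '니트') });
}
```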

Here is a good article comparing the two:

https://medium.com/nodejs-server/text-search-vs-regex-in-mongodb-c9cf11dc8816

$text, $search vs $regex In MongoDB


$unionWith

- Reference: https://www.mongodb.com/community/forums/t/combine-text-with-regex/153083/2

Combine $text with $regex


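Logic

1. Use $unionWith to combine a $text/$search pipeline (hereafter "Search") with a $match/$regex pipeline (hereafter "Regex").

2. Unlike Search, Regex produces no score, so I chose to assign it an arbitrary fixed score.

3. $unionWith keeps duplicates (a document matched by both pipelines appears twice), so the results are merged with $group.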
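The Search/Regex union can also be assembled by a small helper, which makes it reusable across fields. This is a sketch under assumptions: `buildUnionSearchPipeline`, `escapeRegex`, and the field names are hypothetical, and the `$sort`/`$group` tail is left to the caller. Escaping is added because passing the raw search term into `$regex` lets input such as `(` or `+` be read as pattern syntax; `$text` keeps the raw term, since it is not a regex.

```javascript
// Escape regex metacharacters so user input such as "c++" or "(sale" is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical builder for the "Search" + "Regex" union described above.
// stringFields / arrayFields are placeholders for the real schema fields.
function buildUnionSearchPipeline(filter, { coll, stringFields, arrayFields, regexScore = 1 }) {
  const pattern = escapeRegex(filter);
  const regexConditions = [
    ...stringFields.map((f) => ({ [f]: { $regex: pattern } })),
    ...arrayFields.map((f) => ({ [f]: { $elemMatch: { $regex: pattern } } })), // array of strings
  ];
  return [
    { $match: { $text: { $search: filter } } }, // a $match with $text must be the first stage
    { $addFields: { score: { $meta: 'textScore' } } },
    {
      $unionWith: {
        coll,
        pipeline: [
          { $match: { $or: regexConditions } },
          { $addFields: { score: regexScore } }, // fixed score: regex matches have no textScore
        ],
      },
    },
  ];
}

// Usage (needs a Mongoose model, so shown as a comment):
// const results = await Product.aggregate([
//   ...buildUnionSearchPipeline(filter, { coll: 'products', stringFields: ['name'], arrayFields: ['keyword'] }),
//   /* $sort, $group, $sort as in the code below */
// ]);
```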

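const regex_unionWith = await Product.aggregate([
      // "Search": a $match with $text must be the first stage of the pipeline
      { $match: { $text: { $search: filter } } },
      { $addFields: { score: { $meta: 'textScore' } } },
      { $unionWith: {
          coll: 'products',
          // "Regex": partial matches that $text cannot find
          // (filter is used as a raw pattern; escape it if it comes from user input)
          pipeline: [
            { $match: {
                $or: [
                  { targetField1: { $regex: filter } },
                  { targetField2: { $regex: filter } },
                  { targetField3: { $elemMatch: { $regex: filter } } }, // array field
                ],
             }},
             { $addFields: { score: 1 } }, // no textScore for regex matches, so assign a fixed score
          ],
      }},
      { $sort: { score: -1 } }, // sort before $group so that $first picks from ordered input
      { $group: {
        _id: "$_id",
        field1: {$first: "$field1"},
        ...,
        score: {$sum: "$score"} // a document found by both pipelines gets both scores
      } },
      { $sort: { score: -1 } }, // $group does not preserve order, so sort the merged results again
    ]);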
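Because `$unionWith` can return the same document twice, the `$sort` + `$group` tail is what turns the union into one ranked list. The plain-JavaScript model below mimics that step on in-memory rows, with no database involved; `mergeUnionResults` and the sample rows are hypothetical, and it assumes every row has an `_id` and a numeric `score`.

```javascript
// Plain-JS model of the $sort + $group + $sort tail: merge rows by _id,
// keep the first (highest-scored) copy's fields, and sum the scores.
function mergeUnionResults(rows) {
  const sorted = [...rows].sort((a, b) => b.score - a.score); // like { $sort: { score: -1 } }
  const byId = new Map();
  for (const row of sorted) {
    const key = String(row._id);
    const seen = byId.get(key);
    if (seen) {
      seen.score += row.score;   // like score: { $sum: '$score' }
    } else {
      byId.set(key, { ...row }); // like $first: the first copy supplies the other fields
    }
  }
  // $group output has no guaranteed order, hence the final sort.
  return [...byId.values()].sort((a, b) => b.score - a.score);
}

const rows = [
  { _id: 1, name: '여름 반팔 니트', score: 1.5 }, // from Search (textScore)
  { _id: 2, name: '여름니트', score: 1 },         // from Regex only
  { _id: 1, name: '여름 반팔 니트', score: 1 },   // same document, also from Regex
];
console.log(mergeUnionResults(rows).map((r) => [r._id, r.score])); // [ [ 1, 2.5 ], [ 2, 1 ] ]
```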

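Shortcomings

1. In this project every field is sent to the front end, so writing out the $group stage field by field is tedious.

2. I'm still researching a way to give each regex field a different score ($regexMatch, $accumulator).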
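One possible direction for both points, offered as a sketch rather than the approach used in this project: `$first: '$$ROOT'` keeps the whole document so no field has to be listed, `$replaceRoot` with `$mergeObjects` flattens it back, and `$regexMatch` (MongoDB 4.2+) inside `$cond` can give each regex field its own weight. The helper names and weights are hypothetical, and the scoring stage assumes string fields only, since `$regexMatch` expects string input (array fields would need extra handling).

```javascript
// 1) Hypothetical merge stages: keep the whole document via $$ROOT instead of listing fields,
//    then flatten it back and overwrite score with the summed value.
function buildMergeStages() {
  return [
    { $sort: { score: -1 } },
    { $group: { _id: '$_id', doc: { $first: '$$ROOT' }, score: { $sum: '$score' } } },
    { $replaceRoot: { newRoot: { $mergeObjects: ['$doc', { score: '$score' }] } } },
    { $sort: { score: -1 } },
  ];
}

// 2) Hypothetical per-field regex score: add a field's weight when $regexMatch succeeds on it.
//    Intended to replace { $addFields: { score: 1 } } inside the Regex sub-pipeline.
function buildRegexScoreStage(pattern, weights) {
  const parts = Object.entries(weights).map(([field, weight]) => ({
    $cond: [{ $regexMatch: { input: `$${field}`, regex: pattern } }, weight, 0],
  }));
  return { $addFields: { score: { $add: parts } } };
}

// Usage idea (needs a database, so shown as a comment):
// pipeline: [{ $match: { $or: [...] } }, buildRegexScoreStage(pattern, { name: 20, shortDescription: 2 })]
// ...followed at the top level by ...buildMergeStages()
```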

Other posts in the 'Database' category

[DB] The practicality and caveats of connection pools	2022.09.20
[Sequelize] Handling snake_case as camelCase	2022.07.28
[Mongo] Why find / limit queries matter (feat. Array.slice())	2022.07.02

평범하지 않은 삽질
