---
title: MongoDB Data Modeling Examples
date: 2020-09-12 10:43:53
categories:
- Databases
- Document Databases
- MongoDB
tags:
- Databases
- Document Databases
- MongoDB
- Modeling
permalink: /pages/88c7d3/
---
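# MongoDB Data Modeling Examples
## Relationship Models
### One-to-One Relationships with Embedded Documents
#### One-to-One Relationships with Embedded Documents - Embedded Document Pattern
Consider a patron and the patron's address stored as two documents, with the address referencing the patron: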
```json
// patron document
{
_id: "joe",
name: "Joe Bookreader"
}
// address document
{
patron_id: "joe", // reference to patron document
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: "12345"
}
```
Embedding the address in the patron document merges the two into one:
```json
{
"_id": "joe",
"name": "Joe Bookreader",
"address": {
"street": "123 Fake Street",
"city": "Faketon",
"state": "MA",
"zip": "12345"
}
}
```
#### One-to-One Relationships with Embedded Documents - Subset Pattern
Suppose a collection stores movie information in documents like the following:
```json
{
"_id": 1,
"title": "The Arrival of a Train",
"year": 1896,
"runtime": 1,
"released": ISODate("1896-01-25"),
"poster": "http://ia.media-imdb.com/images/M/MV5BMjEyNDk5MDYzOV5BMl5BanBnXkFtZTgwNjIxMTEwMzE@._V1_SX300.jpg",
"plot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, ...",
"fullplot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, the line dissolves. The doors of the railway-cars open, and people on the platform help passengers to get off.",
"lastupdated": ISODate("2015-08-15T10:06:53"),
"type": "movie",
"directors": ["Auguste Lumière", "Louis Lumière"],
"imdb": {
"rating": 7.3,
"votes": 5043,
"id": 12
},
"countries": ["France"],
"genres": ["Documentary", "Short"],
"tomatoes": {
"viewer": {
"rating": 3.7,
"numReviews": 59
},
"lastUpdated": ISODate("2020-01-09T00:02:53")
}
}
```
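Some views in the application only need to show summary information about a movie and do not need details such as `fullplot` or `poster`. We can therefore split the original document in two and link the two parts through an id field.
The `movie` collection holds the summary information: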
```json
// movie collection
{
"_id": 1,
"title": "The Arrival of a Train",
"year": 1896,
"runtime": 1,
"released": ISODate("1896-01-25"),
"type": "movie",
"directors": ["Auguste Lumière", "Louis Lumière"],
"countries": ["France"],
"genres": ["Documentary", "Short"]
}
```
The `movie_details` collection holds the detail information:
```json
// movie_details collection
{
"_id": 156,
"movie_id": 1, // reference to the movie collection
"poster": "http://ia.media-imdb.com/images/M/MV5BMjEyNDk5MDYzOV5BMl5BanBnXkFtZTgwNjIxMTEwMzE@._V1_SX300.jpg",
"plot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, ...",
"fullplot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, the line dissolves. The doors of the railway-cars open, and people on the platform help passengers to get off.",
"lastupdated": ISODate("2015-08-15T10:06:53"),
"imdb": {
"rating": 7.3,
"votes": 5043,
"id": 12
},
"tomatoes": {
"viewer": {
"rating": 3.7,
"numReviews": 59
},
"lastUpdated": ISODate("2020-01-09T00:02:53")
}
}
```
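The split described above can be sketched in plain JavaScript. This is an in-memory illustration only, not MongoDB API code: the helper `splitMovie`, its `detailsId` parameter, and the `DETAIL_FIELDS` list are hypothetical names introduced here, with the field list taken from the example documents.

```javascript
// Fields that move to the details document (assumed from the example above).
const DETAIL_FIELDS = ["poster", "plot", "fullplot", "lastupdated", "imdb", "tomatoes"];

// Hypothetical helper: split one full movie document into a summary
// document and a details document that references it through movie_id.
function splitMovie(movie, detailsId) {
  const summary = {};
  const details = { _id: detailsId, movie_id: movie._id };
  for (const [key, value] of Object.entries(movie)) {
    if (DETAIL_FIELDS.includes(key)) {
      details[key] = value;
    } else {
      summary[key] = value;
    }
  }
  return { summary, details };
}

const { summary, details } = splitMovie(
  {
    _id: 1,
    title: "The Arrival of a Train",
    year: 1896,
    fullplot: "A group of people are standing in a straight line ...",
    imdb: { rating: 7.3, votes: 5043, id: 12 }
  },
  156
);
console.log(summary);
console.log(details);
```

Summary views then read only the smaller document, and the details document is fetched by `movie_id` when needed.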
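### One-to-Many Relationships with Embedded Documents
#### One-to-Many Relationships with Embedded Documents - Embedded Document Pattern
Consider a patron with several addresses, each stored as a separate document that references the patron: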
```json
// patron document
{
_id: "joe",
name: "Joe Bookreader"
}
// address documents
{
patron_id: "joe", // reference to patron document
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: "12345"
}
{
patron_id: "joe",
street: "1 Some Other Street",
city: "Boston",
state: "MA",
zip: "12345"
}
```
Embedding the addresses as an array in the patron document merges them into one:
```json
{
"_id": "joe",
"name": "Joe Bookreader",
"addresses": [
{
"street": "123 Fake Street",
"city": "Faketon",
"state": "MA",
"zip": "12345"
},
{
"street": "1 Some Other Street",
"city": "Boston",
"state": "MA",
"zip": "12345"
}
]
}
```
#### One-to-Many Relationships with Embedded Documents - Subset Pattern
Consider a collection that an e-commerce site uses to represent products:
```json
{
"_id": 1,
"name": "Super Widget",
"description": "This is the most useful item in your toolbox.",
"price": { "value": NumberDecimal("119.99"), "currency": "USD" },
"reviews": [
{
"review_id": 786,
"review_author": "Kristina",
"review_text": "This is indeed an amazing widget.",
"published_date": ISODate("2019-02-18")
},
{
"review_id": 785,
"review_author": "Trina",
"review_text": "Nice product. Slow shipping.",
"published_date": ISODate("2019-02-17")
},
...
{
"review_id": 1,
"review_author": "Hans",
"review_text": "Meh, it's okay.",
"published_date": ISODate("2017-12-06")
}
]
}
```
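Reviews are sorted in reverse chronological order. When a user visits a product page, the application loads the ten most recent reviews. Instead of storing every review with the product, you can split the data into two collections.
The product collection stores information about each product, including its ten most recent reviews: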
```json
{
"_id": 1,
"name": "Super Widget",
"description": "This is the most useful item in your toolbox.",
"price": { "value": NumberDecimal("119.99"), "currency": "USD" },
"reviews": [
{
"review_id": 786,
"review_author": "Kristina",
"review_text": "This is indeed an amazing widget.",
"published_date": ISODate("2019-02-18")
}
...
{
"review_id": 776,
"review_author": "Pablo",
"review_text": "Amazing!",
"published_date": ISODate("2019-02-16")
}
]
}
```
The `review` collection stores all reviews:
```json
{
"review_id": 786,
"product_id": 1,
"review_author": "Kristina",
"review_text": "This is indeed an amazing widget.",
"published_date": ISODate("2019-02-18")
}
{
"review_id": 785,
"product_id": 1,
"review_author": "Trina",
"review_text": "Nice product. Slow shipping.",
"published_date": ISODate("2019-02-17")
}
...
{
"review_id": 1,
"product_id": 1,
"review_author": "Hans",
"review_text": "Meh, it's okay.",
"published_date": ISODate("2017-12-06")
}
```
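One way to keep the embedded list limited to the newest reviews is MongoDB's `$push` operator with the `$each`, `$sort`, and `$slice` modifiers. The sketch below builds such an update document and also mimics its effect in memory so it can run without a database. The helper names and the `products` and `reviews` collection names in the comments are assumptions, not part of the original example.

```javascript
// Update document: append the review, sort newest first,
// and keep only the first `limit` entries of the array.
function recentReviewsUpdate(review, limit = 10) {
  return {
    $push: {
      reviews: { $each: [review], $sort: { published_date: -1 }, $slice: limit }
    }
  };
}
// In mongosh this might be used as (collection names are assumptions):
//   db.products.updateOne({ _id: 1 }, recentReviewsUpdate(review))
//   db.reviews.insertOne({ ...review, product_id: 1 })

// In-memory equivalent of the update above, for illustration only.
function addReviewInMemory(product, review, limit = 10) {
  const reviews = [...product.reviews, review]
    .sort((a, b) => b.published_date - a.published_date)
    .slice(0, limit);
  return { ...product, reviews };
}

const product = {
  _id: 1,
  name: "Super Widget",
  reviews: [
    { review_id: 2, published_date: new Date("2019-02-17") },
    { review_id: 1, published_date: new Date("2017-12-06") }
  ]
};
const updated = addReviewInMemory(
  product,
  { review_id: 3, published_date: new Date("2019-02-18") },
  2
);
console.log(updated.reviews.map((r) => r.review_id)); // [ 3, 2 ]
```

The full history still lives in the separate review collection, so dropping the oldest embedded entry loses no data.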
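### One-to-Many Relationships with Document References
Consider the following example, which maps the relationship between publishers and books.
It shows the advantage of referencing over embedding: the publisher information does not have to be repeated in every book.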
```json
{
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher: {
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}
}
{
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English",
publisher: {
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}
}
```
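To avoid repeating the publisher data, use references and keep the publisher information in a collection separate from the books. When using references, the growth of the relationship determines where to store the reference. If the number of books per publisher is small and grows slowly, storing the book references inside the publisher document can be useful. If the number of books per publisher is unbounded, however, this data model leads to mutable, ever-growing arrays, as in the following example: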
```json
{
name: "O'Reilly Media",
founded: 1980,
location: "CA",
books: [123456789, 234567890, ...]
}
{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}
{
_id: 234567890,
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English"
}
```
To avoid mutable, growing arrays, store the publisher reference inside the book document:
```json
{
_id: "oreilly",
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}
{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly"
}
{
_id: 234567890,
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English",
publisher_id: "oreilly"
}
```
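With the reference stored on the book, the publisher can be attached at read time using the `$lookup` aggregation stage. The sketch below shows such a pipeline together with an in-memory equivalent that runs without a database. The collection names `books` and `publishers` and the helper `joinPublishers` are assumptions; the original example does not name them.

```javascript
// Aggregation pipeline: attach the publisher document to each book.
// "publishers" is an assumed collection name.
const booksWithPublisherPipeline = [
  {
    $lookup: {
      from: "publishers",
      localField: "publisher_id",
      foreignField: "_id",
      as: "publisher"
    }
  },
  // $lookup produces an array; unwind it because a book has one publisher.
  { $unwind: "$publisher" }
];
// In mongosh: db.books.aggregate(booksWithPublisherPipeline)

// In-memory equivalent of the pipeline above, for illustration only.
function joinPublishers(books, publishers) {
  const byId = new Map(publishers.map((p) => [p._id, p]));
  return books
    .filter((b) => byId.has(b.publisher_id))
    .map((b) => ({ ...b, publisher: byId.get(b.publisher_id) }));
}

const joined = joinPublishers(
  [{ _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" }],
  [{ _id: "oreilly", name: "O'Reilly Media", founded: 1980, location: "CA" }]
);
console.log(joined[0].publisher.name);
```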
## Tree Structure Models
![img](https://raw.githubusercontent.com/dunwu/images/dev/snap/20200911194846.svg)
### Tree Structure Model with Parent References
The structure in the figure above can be represented with parent references:
```json
db.categories.insertMany([
{ "_id": "MongoDB", "parent": "Databases" },
{ "_id": "dbm", "parent": "Databases" },
{ "_id": "Databases", "parent": "Programming" },
{ "_id": "Languages", "parent": "Programming" },
{ "_id": "Programming", "parent": "Books" },
{ "_id": "Books", "parent": null }
])
```
- Retrieve the parent of a node:
```
db.categories.findOne( { _id: "MongoDB" } ).parent
```
- Create an index on the `parent` field to enable fast lookups by parent node:
```
db.categories.createIndex( { parent: 1 } )
```
- Query by the `parent` field to find a node's immediate children:
```
db.categories.find( { parent: "Databases" } )
```
- To retrieve subtrees, see [`$graphLookup`](https://docs.mongodb.com/manual/reference/operator/aggregation/graphLookup/#pipe._S_graphLookup).
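As a rough sketch of subtree retrieval with parent references, the pipeline below uses `$graphLookup` to collect every descendant of one node, and the function after it performs the same traversal in memory so the idea can be tried without a database. The names `descendantsPipeline` and `findDescendants` are hypothetical, and the in-memory version assumes the data is a proper tree with no cycles.

```javascript
// Start at "Programming" and repeatedly follow _id -> parent links.
const descendantsPipeline = [
  { $match: { _id: "Programming" } },
  {
    $graphLookup: {
      from: "categories",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "parent",
      as: "descendants"
    }
  }
];
// In mongosh: db.categories.aggregate(descendantsPipeline)

// In-memory equivalent: breadth-first walk over parent references.
function findDescendants(nodes, rootId) {
  const result = [];
  let frontier = [rootId];
  while (frontier.length > 0) {
    const children = nodes.filter((n) => frontier.includes(n.parent));
    result.push(...children.map((n) => n._id));
    frontier = children.map((n) => n._id);
  }
  return result;
}

const categories = [
  { _id: "MongoDB", parent: "Databases" },
  { _id: "dbm", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null }
];
console.log(findDescendants(categories, "Programming"));
// [ 'Databases', 'Languages', 'MongoDB', 'dbm' ]
```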
### Tree Structure Model with Child References
```json
db.categories.insertMany([
{ "_id": "MongoDB", "children": [] },
{ "_id": "dbm", "children": [] },
{ "_id": "Databases", "children": ["MongoDB", "dbm"] },
{ "_id": "Languages", "children": [] },
{ "_id": "Programming", "children": ["Databases", "Languages"] },
{ "_id": "Books", "children": ["Programming"] }
])
```
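- Retrieve the children of a node:
```
db.categories.findOne( { _id: "Databases" } ).children
```
- Create an index on the `children` field to enable fast lookups by child node:
```
db.categories.createIndex( { children: 1 } )
```
- Query for a node in the `children` field to find its parent and its siblings: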
```
db.categories.find( { children: "MongoDB" } )
```
### Tree Structure Model with an Array of Ancestors
```json
db.categories.insertMany([
{
"_id": "MongoDB",
"ancestors": ["Books", "Programming", "Databases"],
"parent": "Databases"
},
{
"_id": "dbm",
"ancestors": ["Books", "Programming", "Databases"],
"parent": "Databases"
},
{
"_id": "Databases",
"ancestors": ["Books", "Programming"],
"parent": "Programming"
},
{
"_id": "Languages",
"ancestors": ["Books", "Programming"],
"parent": "Programming"
},
{ "_id": "Programming", "ancestors": ["Books"], "parent": "Books" },
{ "_id": "Books", "ancestors": [], "parent": null }
])
```
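- Retrieving the ancestors, or path, of a node is fast and straightforward:
```json
db.categories.findOne({ "_id": "MongoDB" }).ancestors
```
- Create an index on the `ancestors` field to enable fast lookups by ancestor node:
```json
db.categories.createIndex({ "ancestors": 1 })
```
- Query by the `ancestors` field to find all descendants of a node: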
```json
db.categories.find({ "ancestors": "Programming" })
```
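The cost of this model is that the `ancestors` array has to be built correctly whenever a node is added. A minimal sketch of that step, assuming the parent document is already loaded (the helper `makeChildCategory` is a hypothetical name):

```javascript
// Build a new category document under `parent`:
// its ancestors are the parent's ancestors plus the parent itself.
function makeChildCategory(parent, id) {
  return {
    _id: id,
    ancestors: [...parent.ancestors, parent._id],
    parent: parent._id
  };
}

const databases = {
  _id: "Databases",
  ancestors: ["Books", "Programming"],
  parent: "Programming"
};
const mongodb = makeChildCategory(databases, "MongoDB");
console.log(mongodb.ancestors); // [ 'Books', 'Programming', 'Databases' ]
```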
### Tree Structure Model with Materialized Paths
```json
db.categories.insertMany([
{ "_id": "Books", "path": null },
{ "_id": "Programming", "path": ",Books," },
{ "_id": "Databases", "path": ",Books,Programming," },
{ "_id": "Languages", "path": ",Books,Programming," },
{ "_id": "MongoDB", "path": ",Books,Programming,Databases," },
{ "_id": "dbm", "path": ",Books,Programming,Databases," }
])
```
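- Query to retrieve the whole tree, sorted by the `path` field:
```
db.categories.find().sort( { path: 1 } )
```
- Use a regular expression on the `path` field to find the descendants of Programming:
```
db.categories.find( { path: /,Programming,/ } )
```
- Retrieve the descendants of Books, where Books is also at the top level of the hierarchy:
```
db.categories.find( { path: /^,Books,/ } )
```
- To create an index on the `path` field, use the following call: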
```
db.categories.createIndex( { path: 1 } )
```
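The path strings follow a simple rule: a node's path is its parent's path followed by the parent's id and a comma. The sketch below builds a child's path and a descendant-matching regular expression in plain JavaScript. The helpers `childPath`, `escapeRegExp`, and `descendantsPattern` are hypothetical names, and escaping the id is an added precaution for ids that contain regex metacharacters.

```javascript
// Path of a new node placed under `parent`.
function childPath(parent) {
  return parent.path === null ? `,${parent._id},` : `${parent.path}${parent._id},`;
}

// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Pattern that matches the path of every descendant of the node `id`.
function descendantsPattern(id) {
  return new RegExp(`,${escapeRegExp(id)},`);
}

const sample = [
  { _id: "Books", path: null },
  { _id: "Programming", path: ",Books," },
  { _id: "Databases", path: ",Books,Programming," },
  { _id: "MongoDB", path: ",Books,Programming,Databases," }
];
const programmingDescendants = sample
  .filter((c) => c.path !== null && descendantsPattern("Programming").test(c.path))
  .map((c) => c._id);
console.log(programmingDescendants); // [ 'Databases', 'MongoDB' ]
```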
### Tree Structure Model with Nested Sets
![img](https://raw.githubusercontent.com/dunwu/images/dev/snap/20200911204252.svg)
```javascript
db.categories.insertMany([
  { _id: 'Books', parent: 0, left: 1, right: 12 },
  { _id: 'Programming', parent: 'Books', left: 2, right: 11 },
  { _id: 'Languages', parent: 'Programming', left: 3, right: 4 },
  { _id: 'Databases', parent: 'Programming', left: 5, right: 10 },
  { _id: 'MongoDB', parent: 'Databases', left: 6, right: 7 },
  { _id: 'dbm', parent: 'Databases', left: 8, right: 9 }
])
```
Query to retrieve the descendants of a node:
```javascript
var databaseCategory = db.categories.findOne({ _id: 'Databases' })
db.categories.find({
  left: { $gt: databaseCategory.left },
  right: { $lt: databaseCategory.right }
})
```
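The `left` and `right` values come from a depth-first walk of the tree: a counter is incremented when the walk enters a node (`left`) and again when it leaves (`right`). The sketch below reproduces the numbers used above from a plain child-list description of the tree. The input shape and the helper `assignNestedSets` are assumptions made for illustration.

```javascript
// Assumed input: each node id mapped to its ordered list of child ids.
const childLists = {
  Books: ["Programming"],
  Programming: ["Languages", "Databases"],
  Languages: [],
  Databases: ["MongoDB", "dbm"],
  MongoDB: [],
  dbm: []
};

// Depth-first walk: number each node on entry (left) and on exit (right).
function assignNestedSets(children, rootId) {
  const docs = [];
  let counter = 0;
  function visit(id, parent) {
    const doc = { _id: id, parent, left: ++counter, right: 0 };
    docs.push(doc);
    for (const child of children[id]) {
      visit(child, id);
    }
    doc.right = ++counter;
  }
  visit(rootId, 0); // the example above stores 0 as the root's parent
  return docs;
}

const nested = assignNestedSets(childLists, "Books");
console.log(nested);
```

Because every insert or move renumbers part of the tree, this model suits trees that are read far more often than they change.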
## Design Patterns
### Large Documents with Many Fields and Many Indexes
Solution: turn columns into rows.
![img](https://raw.githubusercontent.com/dunwu/images/dev/snap/20200919225901.png)
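One common way to turn columns into rows is to move a group of similar fields into an array of key-value subdocuments, so that a single compound index on the key and value can stand in for one index per field. The sketch below shows that transformation in memory. The field names `attrs`, `k`, and `v`, the sample document, and the helper `columnsToRows` are all assumptions and may differ from the layout in the figure.

```javascript
// Move the listed fields into an array of { k, v } subdocuments.
function columnsToRows(doc, fields) {
  const rest = {};
  const attrs = [];
  for (const [key, value] of Object.entries(doc)) {
    if (fields.includes(key)) {
      attrs.push({ k: key, v: value });
    } else {
      rest[key] = value;
    }
  }
  return { ...rest, attrs };
}
// A single index could then cover all moved fields, for example
// (collection name is an assumption):
//   db.products.createIndex({ "attrs.k": 1, "attrs.v": 1 })

const wide = { _id: 1, name: "Super Widget", color: "red", size: "L", weight: 2 };
const narrow = columnsToRows(wide, ["color", "size", "weight"]);
console.log(narrow);
```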
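### Managing Different Document Versions
MongoDB's document format is very flexible, which inevitably makes it harder to keep track of document versions.
Solution: add a version number field to each document.
- Quickly filter out documents that do not need upgrading
- When upgrading, handle each document version differently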
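The two points above can be sketched as a query filter plus an upgrade function. Everything specific here is hypothetical: the field name `schema_version`, the rule that a missing field means version 1, and the sample change from a single `phone` string to a `phones` array.

```javascript
const CURRENT_VERSION = 2;

// Filter for documents that still need upgrading. A comparison such as $lt
// does not match documents where the field is missing, so those are
// selected explicitly with $exists.
const needsUpgradeFilter = {
  $or: [
    { schema_version: { $exists: false } },
    { schema_version: { $lt: CURRENT_VERSION } }
  ]
};

// Upgrade one document, branching on the version it was written with.
function upgradeDocument(doc) {
  const version = doc.schema_version ?? 1; // missing field: assume version 1
  if (version >= CURRENT_VERSION) {
    return doc; // already current, nothing to do
  }
  // Hypothetical v1 -> v2 change: `phone` becomes the array `phones`.
  const { phone, ...rest } = doc;
  return { ...rest, phones: phone ? [phone] : [], schema_version: 2 };
}

console.log(upgradeDocument({ _id: 1, name: "Joe", phone: "555-0100" }));
```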
### Counting Page Views
These statistics do not need to be exact.
Solution: use approximate counting.
Write to the database only once every 10 views:
```json
{ "$inc": { "views": 10 } }
```
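A minimal sketch of this batching, assuming views are counted in application memory and flushed with one `$inc` per batch, incremented by the batch size. The factory `createViewCounter` and its `write` callback are hypothetical; in a real application `write` might wrap something like `db.pages.updateOne({ _id: pageId }, update)`.

```javascript
// Count views in memory and emit one $inc update per `batchSize` views.
function createViewCounter(batchSize, write) {
  let pending = 0;
  return function recordView() {
    pending += 1;
    if (pending === batchSize) {
      write({ $inc: { views: batchSize } });
      pending = 0;
    }
  };
}

const writes = [];
const recordView = createViewCounter(10, (update) => writes.push(update));
for (let i = 0; i < 25; i++) {
  recordView();
}
// 25 views produce 2 writes of +10; the last 5 views are still pending.
console.log(writes.length);
```

The trade-off is that pending views are lost if the process stops before the next flush, which is acceptable only because the count is allowed to be approximate.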
### Exact Counting
Solution: use pre-aggregation.
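Pre-aggregation here means updating summary counters at write time so that reads do not have to scan raw events. The sketch below builds an upsert that bumps a per-day total and an hourly bucket for each view. The document layout (`page`, `day`, `total`, `hourly`), the `page_stats` collection name, and the helper `preAggregateUpdate` are assumptions chosen for illustration.

```javascript
// Build the upsert for one page view: increment the daily total
// and the counter for the hour in which the view happened (UTC).
function preAggregateUpdate(page, timestamp) {
  const day = timestamp.toISOString().slice(0, 10); // "YYYY-MM-DD"
  const hour = timestamp.getUTCHours();
  return {
    filter: { page, day },
    update: { $inc: { total: 1, [`hourly.${hour}`]: 1 } },
    options: { upsert: true }
  };
}
// In mongosh (collection name is an assumption):
//   const u = preAggregateUpdate("/home", new Date())
//   db.page_stats.updateOne(u.filter, u.update, u.options)

const op = preAggregateUpdate("/home", new Date("2020-09-20T15:12:17Z"));
console.log(op);
```

Reading the exact count for a day is then a single lookup of the summary document instead of an aggregation over every recorded view.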
## References
- [Data Model Examples and Patterns](https://docs.mongodb.com/manual/applications/data-models/)