Merge pull request #3 from jackfrued/master

Update data
2021-03-04 15:25:47 +08:00 · 2021-03-04 15:25:47 +08:00 · 6f59a2f581
parent 5f079328ee 0d01379603
commit 6f59a2f581
90 changed files with 46358 additions and 1864 deletions
--- a/Day66-70/66.数据分析概述.md
+++ b/Day66-70/66.数据分析概述.md
@ -24,14 +24,25 @@
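 ### The Data Analysis Workflow
-A complete data analysis workflow should include the following aspects, although the details vary slightly by industry and job content.
+When we say "data analysis", we often mean **data analysis in the narrow sense**, whose main goal is to produce visual reports and use them to gain insight into business problems. **Data analysis in the broad sense** also includes data mining: besides monitoring and analyzing the business through data, it uses machine learning algorithms to uncover the knowledge hidden behind the data and uses that knowledge to support future decisions. Put simply, **a complete data analysis should include two parts: basic data analysis and in-depth data mining**.
-1. Define the goal (input): understand the business and identify the problem to solve
+Basic data analysis work generally covers the following aspects, although the details vary slightly by industry and job content.
-2. Collect data (databases, spreadsheets, third-party APIs, web crawlers, open datasets, ...)
+
-3. Clean data (data cleaning, data transformation, feature engineering, ...)
+1. Define the goal (input): understand the business and settle the metric definitions
-4. Explore data (grouping, aggregation, joining, computation, visualization, ...)
+2. Acquire data: databases, spreadsheets, third-party APIs, web crawlers, open datasets, ...
-5. Iterate on models (model selection, algorithm application, model tuning, ...)
+3. Clean data: missing-value handling, outlier handling, formatting, data transformation, normalization, discretization, ...
-6. Deploy the model (output): put the model into production, improve the business, A/B testing, report writing
+4. Explore data: computation, statistics, grouping, aggregation, visualization (trends, changes, distributions, etc.), ...
 5. Data report (output): publish the data, then summarize and present the results
 6. Analysis and insight (follow-up): data monitoring, trend discovery, anomaly detection, ...
 In-depth data mining work should cover the following aspects, although the details vary slightly by industry and job content.
 1. Define the goal (input): understand the business and clarify the mining objective
 2. Data preparation: data collection, data description, data exploration, quality assessment, ...
 3. Data processing: data extraction, data cleaning, data transformation, normalization, discretization, special encoding, dimensionality reduction, feature selection, ...
 4. Data modeling: model comparison, model selection, algorithm application, ...
 5. Model evaluation: cross-validation, parameter tuning, result assessment, ...
 6. Model deployment (output): put the model into production, improve the business, monitor operations, write reports
 ### Libraries for Data Analysis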
@ -66,21 +77,64 @@
 ![](res/run-anaconda-navigator.png)
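-For Windows users, we recommend installing Anaconda by following the installer's prompts and recommended options. After installation, find Anaconda in the Windows Start menu and choose the feature to run. We can launch the tool named "Jupyter Notebook" (referred to simply as Notebook below) to begin exploring data science, or run the tool named "Spyder" to write Python code.
+For Windows users, we recommend installing Anaconda by following the installer's prompts and recommended options (apart from the installation path, there is little to choose). After installation, "Anaconda3" appears in the Start menu.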
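 #### conda Commands
 To manage dependencies or create a virtual environment for a project with the conda tool, use the conda command in a terminal or command prompt. Windows users can find "Anaconda3" in the Start menu and click "Anaconda Prompt" to open a command prompt that supports conda. macOS users are advised to use "Environments" in "Anaconda-Navigator" to manage virtual environments and dependencies graphically.
 1. Version and help information.
    - Show the version: `conda -V` or `conda --version`
    - Get help: `conda -h` or `conda --help`
    - Related information (installed packages): `conda list`
 2. Virtual environments.
    - List all virtual environments: `conda env list`
    - Create a virtual environment: `conda create --name venv`
    - Create a virtual environment with a specific Python version: `conda create --name venv python=3.7`
    - Create a virtual environment with a specific Python version and install the given dependencies: `conda create --name venv python=3.7 numpy pandas`
    - Create a virtual environment by cloning an existing one: `conda create --name venv2 --clone venv`
    - Export the virtual environment for sharing, redirecting the output to a file: `conda env export > environment.yml`
    - Create a virtual environment from a shared environment file: `conda env create -f environment.yml`
    - Activate a virtual environment: `conda activate venv`
    - Leave the virtual environment: `conda deactivate`
    - Delete a virtual environment: `conda remove --name venv --all`
    > **Note**: In the commands above, `venv` and `venv2` are the names of the virtual environment folders. You can replace them with names of your own, but it is **strongly recommended** to use English names without spaces or other special characters.
 3. Package (third-party library or tool) management.
    - List installed packages: `conda list`
    - Search for a package: `conda search matplotlib`
    - Install a package: `conda install matplotlib`
    - Update a package: `conda update matplotlib`
    - Remove a package: `conda remove matplotlib`
    > **Note**: When searching for, installing, and updating packages, conda connects to the official site by default. If that is too slow, you can replace the default site with a mirror in China; the open-source mirror of Tsinghua University is recommended. The command to switch the default source to the mirror is `conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/`. To switch back to the default source, use `conda config --remove-key channels`.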
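 ### Using Notebook
 #### Installing and Launching Notebook
-If Anaconda is already installed, you can launch Notebook directly as described above. Users who have a Python environment but not Anaconda can install `jupyter` with pip, Python's package manager, and then run the `jupyter notebook` command in a terminal (called the command prompt on Windows) to launch Notebook, as shown below.
+If Anaconda is already installed, macOS users can launch "Jupyter Notebook" (referred to simply as Notebook below) directly from "Anaconda-Navigator" as described above. Windows users can find the Anaconda folder in the Start menu and run "Jupyter Notebook" from that folder to begin exploring data science.
-Install:
+Users who have a Python environment but not Anaconda can install `jupyter` with pip, Python's package manager, and then run the `jupyter notebook` command in a terminal (called the command prompt on Windows) to launch Notebook, as shown below.
 Install Notebook: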
 ```Bash
 pip install jupyter
 ```
-Run:
+Install the three essential libraries:
 ```Bash
 pip install numpy pandas matplotlib
 ```
 Run Notebook:
 ```Bash
 jupyter notebook
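@ -175,6 +229,8 @@ Notebook is a web-based application for interactive computing that can be used for code
 ### Supplementary Knowledge
 > **Tip**: By default GitHub does not render math formulas in Markdown documents. So that this does not get in the way of reading, you can install a browser plugin that renders LaTeX math on GitHub, such as MathJax Plugin for GitHub for Chrome or LatexMathifyGitHub for Firefox.
 #### Descriptive Statistics
 1. Central tendency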
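@ -183,35 +239,25 @@ Notebook is a web-based application for interactive computing that can be used for code
    - **Mean**: the mean represents the overall level of a dataset. Its drawback is that it is easily affected by extreme values. A weighted mean can be used to reduce that effect, but the weights of the data may not be known in advance, so for positive numbers the geometric mean can be used in place of the arithmetic mean. The formulas for the two are shown below.
-        Arithmetic mean:
+        Arithmetic mean: $\bar{x}=\frac{\sum_{i=1}^{n}x_{i}}{n}=\frac{x_{1}+x_{2}+\cdots +x_{n}}{n}$
        $
        \bar{x}=\frac{\sum_{i=1}^{n}x_{i}}{n}=\frac{x_{1}+x_{2}+\cdots +x_{n}}{n}
        $
        Geometric mean: $\left(\prod_{i=1}^{n}x_{i}\right)^{\frac{1}{n}}={\sqrt[{n}]{x_{1}x_{2} \cdots x_{n}}}$
-        Geometric mean:
+    - **Quantile**: divides the range of a random variable's probability distribution into several contiguous intervals of equal probability. The most common example is the median (the 2-quantile), which splits the dataset into upper and lower halves of equal size. Other common quantiles include quartiles and percentiles.
        $
        \left(\prod_{i=1}^{n}x_{i}\right)^{\frac{1}{n}}={\sqrt[{n}]{x_{1}x_{2} \cdots x_{n}}}
        $
- **Quantile**: divides the range of a random variable's probability distribution into several contiguous intervals of equal probability. The most common example is the median (the 2-quantile), which splits the dataset into upper and lower halves of equal size. Other common quantiles include quartiles and percentiles.
+        - Median: when the number of data points $n$ is odd, ${Q}=x_{\frac{n+1}{2}}$; when $n$ is even, $Q=(x_{\frac{n}{2}} + x_{{\frac{n}{2}}+1}) / 2$.
    - Median:
            $
            {Q}_{\frac {1}{2}}(x)={\begin{cases}x'_{\frac{n+1}{2}},&n\ \text{is odd}\\{\frac {1}{2}}(x'_{\frac{n}{2}}+x'_{{\frac{n}{2}}+1}),&n\ \text{is even}\end{cases}}
            $
        - Quartiles:
            **First quartile** ($Q_1$), also called the **smaller quartile** or **lower quartile**: the value at the 25% position after all values in the sample are sorted in ascending order.
-        **Second quartile** ($Q_2$), also called the **median**: the value at the 50% position after all values in the sample are sorted in ascending order.
+            **Second quartile** ($Q_2$), that is, the **median**: the value at the 50% position after all values in the sample are sorted in ascending order.
            **Third quartile** ($Q_3$), also called the **larger quartile** or **upper quartile**: the value at the 75% position after all values in the sample are sorted in ascending order.
            **Interquartile range** ($IQR$, Inter-Quartile Range): the value of $Q_3-Q_1$.
-    In practice, we often use quartiles together with a [box plot](https://zhuanlan.zhihu.com/p/110580568) to find outliers. For example, values below $Q_1 - 1.5 \times IQR$ or above $Q3 + 1.5 \times IQR$ can be treated as mild outliers, while values below $Q_1 - 3 * IQR$ or above $Q3 + 3 * IQR$ are usually treated as extreme outliers. This method of detecting outliers follows the same reasoning as the ["$3\sigma$ rule"](https://zh.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7%E5%8E%9F%E5%89%87), as shown in the figure below.
+            In practice, we often use quartiles together with a [box plot](https://zhuanlan.zhihu.com/p/110580568) to find outliers. For example, values below $Q_1 - 1.5 \times IQR$ or above $Q_3 + 1.5 \times IQR$ can be treated as mild outliers, while values below $Q_1 - 3 \times IQR$ or above $Q_3 + 3 \times IQR$ are usually treated as extreme outliers. This method of detecting outliers follows the same reasoning as the ["3-sigma rule"](https://zh.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7%E5%8E%9F%E5%89%87), as shown in the figure below.
            ![](res/quartile_and_3sigma.png)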
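@ -219,7 +265,7 @@ Notebook is a web-based application for interactive computing that can be used for code
    - **Extremes**: the maximum and the minimum, which represent the upper and lower bounds of the dataset.
    - **Range**: also called the "full range", the difference between the largest and smallest observations in a set of data, denoted $R$. In general, the larger the range, the greater the dispersion and the more the data is affected by extreme values.
-        	- Variance: the sum of the squared deviations of each value from the mean, divided by the total number of data points. Put simply, it expresses how far the data deviates from the expected value. A larger variance means a larger sum of squared differences between the values and the mean, less stability, and more violent fluctuation, so the data as a whole is more spread out; a smaller variance means a smaller sum of squared differences, more stability, and smoother fluctuation, so the data as a whole is tightly concentrated.
+    - **Variance**: the sum of the squared deviations of each value from the mean, divided by the total number of data points. Put simply, it expresses how far the data deviates from the expected value. A larger variance means a larger sum of squared differences between the values and the mean, less stability, and more violent fluctuation, so the data as a whole is more spread out; a smaller variance means a smaller sum of squared differences, more stability, and smoother fluctuation, so the data as a whole is tightly concentrated.
    - **Standard deviation**: the square root of the variance; like the variance, it expresses how far the data deviates from the expected value.
    - **Quantile difference**: the difference between two quantiles, such as the interquartile range mentioned above.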
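The measures of central tendency and dispersion described above can be computed with NumPy. What follows is a minimal sketch rather than a reference implementation: the sample `data` is made up, with 95 planted as an extreme value, and all variable names are illustrative assumptions. Note that `np.percentile` uses linear interpolation by default, which is only one of several quartile conventions.

```Python
import numpy as np

# Hypothetical sample; 95 is planted as an extreme value.
data = np.array([12, 15, 14, 10, 13, 16, 11, 14, 95], dtype=float)

mean = data.mean()                       # arithmetic mean, pulled up by 95
geo_mean = np.exp(np.log(data).mean())   # geometric mean (positive data only)
median = np.median(data)                 # second quartile

# Quartiles and the interquartile range
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] are flagged as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

value_range = data.max() - data.min()    # range R
variance = data.var()                    # population variance (divides by n)
std = data.std()                         # square root of the variance

print(mean, geo_mean, median)
print(q1, q3, iqr, outliers)
print(value_range, variance, std)
```

In this sample the single extreme value moves the arithmetic mean well above the median, while the geometric mean and the median are affected far less, which is the behaviour the text describes.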
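@ -230,52 +276,53 @@ Notebook is a web-based application for interactive computing that can be used for code
 #### Inferential Statistics
-1. Probability distributions
+1. Basic concepts
    - Random experiment: an experiment that observes some random phenomenon under identical conditions. A random experiment has three characteristics:
        - It can be repeated under identical conditions.
        - Each trial has more than one possible outcome, and all possible outcomes can be stated in advance.
        - The outcomes of repeated trials occur at random (it is not known in advance which outcome will occur).
    - Random variable: if $X$ assigns a real number $X(e)$ to every event $e$ in the probability space $S$, and for every real number $r$ there is a corresponding event set $A_r$, where $A_r=\{e: X(e) \le r\}$, then $X$ is called a random variable. From this definition, $X$ is essentially a real-valued function whose argument is a given event; because a function produces a dependent variable for a given argument, $X$ is called a random variable.
    - Probability mass function / probability density function: the probability mass function gives the probability that a discrete random variable takes a particular value, and is usually abbreviated **PMF**. The probability density function describes how likely a continuous random variable is to fall near a particular value, and is usually abbreviated **PDF**. The difference is that a probability density is not itself a probability; only the integral of the density over an interval is a probability.
 2. Probability distributions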
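    - Discrete distributions: if randomly occurring events are entirely unrelated to one another, and each occurrence is independent, discontinuous, and unaffected by other events, then the probability distribution of these events is a discrete distribution.
-        - Binomial distribution: the discrete probability distribution of the number of successes in $n$ independent yes/no trials, each with success probability $p$. In general, if the random variable $X$ follows a binomial distribution with parameters $n$ and $p$, we write $X\sim B(n,p)$. The probability of getting exactly $k$ successes in $n$ trials is given by the probability mass function, shown below.
+        - Binomial distribution: the discrete probability distribution of the number of successes in $n$ independent yes/no trials, each with success probability $p$. In general, if the random variable $X$ follows a binomial distribution with parameters $n$ and $p$, we write $X\sim B(n,p)$. The probability of getting exactly $k$ successes in $n$ trials is given by the probability mass function $\displaystyle f(k,n,p)=\Pr(X=k)={n \choose k}p^{k}(1-p)^{n-k}$ for $k= 0, 1, 2, ..., n$, where ${n \choose k}={\frac {n!}{k!(n-k)!}}$.
-            $
+        - Poisson distribution: suited to describing the number of random events that occur per unit of time. Examples include the number of service requests a facility receives in a given period, the number of passengers waiting at a bus stop, the number of machine failures, the number of natural disasters, the number of mutations in a DNA sequence, and the number of decays of radioactive nuclei. The probability mass function of the Poisson distribution is $P(X=k)=\frac{e^{-\lambda}\lambda^k}{k!}$, where the parameter $\lambda$ is the average rate at which random events occur per unit of time (or unit of area).
            \displaystyle f(k,n,p)=\Pr(X=k)={n \choose k}p^{k}(1-p)^{n-k}
            $
            for $k= 0, 1, 2, ..., n$, where ${n \choose k}={\frac {n!}{k!(n-k)!}}$
        - Poisson distribution: suited to describing the number of random events that occur per unit of time. Examples include the number of service requests a facility receives in a given period, the number of passengers waiting at a bus stop, the number of machine failures, the number of natural disasters, the number of mutations in a DNA sequence, and the number of decays of radioactive nuclei. The probability mass function of the Poisson distribution is:
            $
            P(X=k)=\frac{e^{-\lambda}\lambda^k}{k!}
            $
          The parameter $\lambda$ of the Poisson distribution is the average rate at which random events occur per unit of time (or unit of area).
    - Continuous distributions:
-        - Normal distribution: also known as the **Gaussian distribution**, a very common continuous probability distribution, often used in the natural and social sciences to represent a random variable whose distribution is unknown. If the random variable $X$ follows a normal distribution with location parameter $\mu$ and scale parameter $\sigma$, we write $X \sim N(\mu,\sigma^2)$, and its probability density function is:
+        - Uniform distribution: if a continuous random variable $X$ has the probability density function $f(x)=\begin{cases}{\frac{1}{b-a}} \quad &{a \leq x \leq b} \\ {0} \quad &{\mbox{otherwise}}\end{cases}$, then $X$ is said to follow the uniform distribution on $[a,b]$, written $X\sim U[a,b]$.
-            $
+        - Exponential distribution: if a continuous random variable $X$ has the probability density function $f(x)=\begin{cases} \lambda e^{- \lambda x} \quad &{x \ge 0} \\ {0} \quad &{x \lt 0} \end{cases}$, then $X$ is said to follow the exponential distribution with parameter $\lambda$, written $X \sim Exp(\lambda)$. The exponential distribution can model the time between independent random events, such as the interval between passengers entering an airport, between calls reaching a customer service center, or between new questions appearing on Zhihu. An important property of the exponential distribution is memorylessness: if a random variable is exponentially distributed, its conditional probability satisfies $P(T \gt s+t\ |\ T \gt t)=P(T \gt s), \forall s,t \ge 0$.
-            \displaystyle f(x)={\frac {1}{\sigma {\sqrt {2\pi }}}}\;e^{-{\frac {\left(x-\mu \right)^{2}}{2\sigma ^{2}}}}
+        - Normal distribution: also known as the **Gaussian distribution**, a very common continuous probability distribution, often used in the natural and social sciences to represent a random variable whose distribution is unknown. If the random variable $X$ follows a normal distribution with location parameter $\mu$ and scale parameter $\sigma$, we write $X \sim N(\mu,\sigma^2)$, and its probability density function is $\displaystyle f(x)={\frac {1}{\sigma {\sqrt {2\pi }}}}e^{-{\frac {\left(x-\mu \right)^{2}}{2\sigma ^{2}}}}$.
-            $
+        - Gamma distribution: suppose $X_1, X_2, ... X_n$ are the waiting times of successive events and these $n$ waiting times are independent. Then their sum $Y$ ($Y=X_1+X_2+...+X_n$) follows a gamma distribution, $Y \sim \Gamma(\alpha,\beta)$, where $\alpha=n, \beta=\lambda$, and $\lambda$ is the average rate at which the successive events occur.
        - Chi-square distribution: if the $k$ random variables $Z_1,Z_2,...,Z_k$ are mutually independent and each follows the standard normal distribution (expectation 0, variance 1), then the sum of their squares $X=\sum_{i=1}^{k}Z_i^2$ is said to follow the chi-square distribution with $k$ degrees of freedom, written $X \sim \chi^2(k)$.
-        - Gamma distribution: suppose $X_1, X_2, ... X_n$ are the waiting times of successive events and these $n$ waiting times are independent. Then their sum $Y$ ($Y=X_1+X_2+...+X_n$) follows a gamma distribution, $Y \sim \Gamma(\alpha,\beta)$, where $\alpha=n, \beta=\lambda$, and $\lambda$ is the average rate at which the successive events occur.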
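The two probability mass functions given above translate directly into code. This is a small sketch using only the standard library; it assumes Python 3.8 or later for `math.comb`, and the function names `binomial_pmf` and `poisson_pmf` are illustrative, not part of the original material.

```Python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with rate lam."""
    return exp(-lam) * lam ** k / factorial(k)


# Exactly 2 heads in 4 tosses of a fair coin: C(4, 2) / 2^4
two_heads = binomial_pmf(2, 4, 0.5)

# A PMF sums to 1 over all possible values k = 0..n
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))

# For large n and small p, B(n, p) is close to Poisson(n * p)
approx_gap = abs(binomial_pmf(3, 1000, 0.003) - poisson_pmf(3, 3.0))

print(two_heads, total, approx_gap)
```

The last comparison illustrates why the Poisson distribution is often used for counts of rare events: with many trials and a small success probability, the binomial and Poisson probabilities nearly coincide.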
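+3. Law of large numbers: the larger the sample, the higher the probability that its arithmetic mean is close to the expected value.
-        - Chi-square distribution: if the $k$ random variables $Z_1,Z_2,...,Z_k$ are mutually independent and each follows the standard normal distribution (expectation 0, variance 1), then the sum of their squares $X=\sum_{i=1}^{k}Z_i^2$ is said to follow the chi-square distribution with $k$ degrees of freedom, written $X \sim \chi^2(k)$.
+    - Weak law of large numbers (Khinchin's theorem): the sample mean converges in probability to the expected value, that is, for any positive number $\epsilon$: $\lim_{n \to \infty}P(|\bar{X_n}-\mu|>\epsilon)=0$.
    - Strong law of large numbers: the sample mean converges to the expected value with probability 1, that is: $P(\lim_{n \to \infty}\bar{X_n}=\mu)=1$.
-    - Law of large numbers: the larger the sample, the higher the probability that its arithmetic mean is close to the expected value.
+4. Central limit theorem: if the objects of study are a large number of independent random variables, the distribution of their mean tends toward a normal distribution regardless of their original distribution. That is, if $X_1, X_2, ..., X_n$ are independent and identically distributed random variables with $E(X_i)=\mu, D(X_i)=\sigma ^2$, then when $n$ is large enough the distribution of the mean $\bar{X}=\frac{\sum_{i=1}^{n}X_i}{n}$ is close to the normal distribution $N(\mu,\sigma ^2/n)$; standardizing $\bar{X}$ gives $X'=\frac{\bar{X} - \mu}{\sigma / \sqrt n}$, which approximately follows the standard normal distribution.
-        - Weak law of large numbers (Khinchin's theorem): the sample mean converges in probability to the expected value, that is, for any positive number $\epsilon$:
+5. Hypothesis testing
            $
            \lim_{n \to \infty}P(|\overline{X_n}-\mu|>\epsilon)=0
            $
        - Strong law of large numbers: the sample mean converges to the expected value with probability 1, that is:
            $
            P(\lim_{n \to \infty}\overline{X_n}=\mu)=1
            $
    - Central limit theorem: if the objects of study are a large number of independent random variables, the distribution of their mean tends toward a normal distribution regardless of their original distribution.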
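Both the law of large numbers and the central limit theorem can be observed in a short simulation. The sketch below assumes an exponential population with rate `lam = 2` (so $\mu = \sigma = 1/\lambda = 0.5$); the seed, sample size, and number of trials are arbitrary choices made for illustration.

```Python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed so the run is repeatable
lam = 2.0                         # Exp(lam) has mean 1/lam and std 1/lam
mu, sigma = 1 / lam, 1 / lam

# Law of large numbers: the running mean of the draws approaches mu
draws = rng.exponential(scale=1 / lam, size=100_000)
running_mean = draws.cumsum() / np.arange(1, draws.size + 1)

# Central limit theorem: means of many samples of size n
n, trials = 100, 10_000
sample_means = rng.exponential(scale=1 / lam, size=(trials, n)).mean(axis=1)

# Standardize the sample means; they should look roughly like N(0, 1)
z = (sample_means - mu) / (sigma / np.sqrt(n))
coverage = np.mean(np.abs(z) < 1.96)   # close to 0.95 for a standard normal

print(running_mean[[9, 999, 99_999]])
print(sample_means.mean(), sample_means.std(), coverage)
```

Although the exponential population is strongly skewed, the sample means cluster around $\mu$ with a spread close to $\sigma/\sqrt{n}$, and roughly 95% of the standardized values fall within $\pm 1.96$.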
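 2. Hypothesis testing
    Hypothesis testing is a method of checking a claim about the population by drawing sample data and reasoning by **small-probability proof by contradiction**. The core idea is to assume that the proposition you want to overturn is true and then look for a contradiction that shows it to be false. Concretely, under the null hypothesis we estimate the probability of a certain event. If the event has a small probability, so that it should essentially not occur in a single study, and yet it did occur, we can reject the null hypothesis and accept the alternative hypothesis. If the event is not a small-probability event, we have no grounds to reject the earlier hypothesis, which in practice is taken as accepting the null hypothesis.
    Hypothesis testing can go wrong in two ways, called "rejecting the true" (a Type I error) and "accepting the false" (a Type II error). If the null hypothesis is correct but you reject it, the error is "rejecting the true"; the probability of this error is also called the significance level $\alpha$, or the tolerance. If the null hypothesis is wrong but you accept it, the error is "accepting the false"; we denote the probability of this error by $\beta$.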
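The two error probabilities can be estimated by simulation. The following sketch assumes a two-sided z-test of $H_0: \mu = 0$ with known $\sigma = 1$ and $\alpha = 0.05$ (critical value 1.96); the sample size, the number of simulated experiments, and the alternative $\mu = 0.5$ are all illustrative assumptions.

```Python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n, trials, sigma = 30, 10_000, 1.0
z_crit = 1.96                    # two-sided critical value for alpha = 0.05


def reject_rate(true_mu: float) -> float:
    """Fraction of simulated experiments in which H0: mu = 0 is rejected."""
    samples = rng.normal(loc=true_mu, scale=sigma, size=(trials, n))
    z = samples.mean(axis=1) / (sigma / np.sqrt(n))
    return float(np.mean(np.abs(z) > z_crit))


# H0 is true: every rejection is a Type I error, so the rate estimates alpha
type_1_rate = reject_rate(0.0)

# H0 is false (mu = 0.5): every non-rejection is a Type II error (beta)
type_2_rate = 1 - reject_rate(0.5)

print(type_1_rate, type_2_rate)
```

The first rate stays near the chosen $\alpha$ regardless of sample size, whereas the second depends on how far the true mean is from the null value and shrinks as `n` grows.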
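 6. Conditional probability and Bayes' theorem
    **Conditional probability** is the probability that event A occurs given that event B has occurred, usually written $P(A|B)$. Let A and B be two events in the sample space $\Omega$, with $P(B) \gt 0$. Then the conditional probability of A given B is $P(A|B)=\frac{P(A \cap B)}{P(B)}$, where $P(A \cap B)$ is the joint probability, that is, the probability that A and B both occur.
    The probability of A given that B has occurred is not the same as the probability of B given that A has occurred. The two are nevertheless related in a definite way, and **Bayes' theorem** states that relationship: $P(A|B)=\frac{P(A)P(B|A)}{P(B)}$, where:
    - $P(A|B)$ is the conditional probability of A once B is known to have occurred, also called the posterior probability of A.
    - $P(A)$ is the prior probability of A (also called the marginal probability), the probability of A without regard to B.
    - $P(B|A)$ is the conditional probability of B once A is known to have occurred, called the likelihood of B.
    - $P(B)$ is the prior probability of B.
    Following the description above, Bayes' theorem can be stated as `posterior = (likelihood * prior) / normalizing constant`. Put simply, the posterior probability is proportional to the product of the prior probability and the likelihood.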
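A worked example makes the formula concrete. The numbers below are hypothetical: a condition A with prior probability 1%, a test that is positive (event B) for 95% of those who have the condition, and a 5% false-positive rate. The function name `posterior` and its parameters are illustrative; $P(B)$ is expanded with the law of total probability.

```Python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B).

    P(B) is computed as P(B|A) * P(A) + P(B|not A) * P(not A).
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical inputs: P(A) = 0.01, P(B|A) = 0.95, P(B|not A) = 0.05
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 4))
```

Even with a fairly accurate test, the posterior is only about 16%, because the low prior means most positive results come from the much larger group without the condition.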
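 Descriptive statistics is usually used to study appearances, describing phenomena in terms of data. Inferential statistics is usually used to infer the underlying nature, that is, how likely it is that the appearances you observe match your guess about what lies behind them.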
--- a/Day66-70/67.NumPy的应用.md
+++ b/Day66-70/67.NumPy的应用.md
--- a/Day66-70/68.Pandas的应用.md
+++ b/Day66-70/68.Pandas的应用.md
@ -14,6 +14,10 @@ The core data types of Pandas are `Series` and `DataFrame`, used to handle one-dimensional
 #### Plotting Charts
 #### Using Index
--- a/Day66-70/69.数据可视化.md
+++ b/Day66-70/69.数据可视化.md
@ -24,6 +24,12 @@ from matplotlib import pyplot as plt
 %matplotlib inline
 ```
 The magic command below makes figures render as vector graphics (SVG).
 ```Python
 %config InlineBackend.figure_format='svg'
 ```
 #### The Plotting Workflow
 1. Create a canvas
--- a/Day66-70/70.数据分析项目实战.md
+++ b/Day66-70/70.数据分析项目实战.md
@ -1,2 +1,14 @@
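 ## Data Analysis Projects in Practice
 ### Analysis of the 2020 Beijing Points-Based Household Registration
 ### Analysis of Job Postings from a Recruitment Website
 ### Analysis of Order Data from an E-commerce Website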
--- a/Day66-70/code/Day67.ipynb
+++ b/Day66-70/code/Day67.ipynb
@ -0,0 +1,862 @@
 {
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array1 = np.array([1, 2, 3, 4, 5])\n",
    "array1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array2 = np.arange(0, 20, 2)\n",
    "array2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array3 = np.linspace(-5, 5, 101)\n",
    "array3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array4 = np.random.rand(10)\n",
    "array4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array5 = np.random.randint(1, 101, 10)\n",
    "array5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array6 = np.random.normal(50, 10, 20)\n",
    "array6"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array7 = np.array([[1, 2, 3], [4, 5, 6]])\n",
    "array7"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array8 = np.zeros((3, 4))\n",
    "array8"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array9 = np.ones((3, 4))\n",
    "array9"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array10 = np.full((3, 4), 10)\n",
    "array10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array11 = np.eye(4)\n",
    "array11"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array12 = np.array([1, 2, 3, 4, 5, 6]).reshape(2, 3)\n",
    "array12"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array13 = np.random.rand(3, 4)\n",
    "array13"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array14 = np.random.randint(1, 100, (3, 4))\n",
    "array14"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array15 = np.random.randint(1, 100, (3, 4, 5))\n",
    "array15"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array16 = np.arange(1, 25).reshape((2, 3, 4))\n",
    "array16"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array17 = np.random.randint(1, 100, (4, 6)).reshape((4, 3, 2))\n",
    "array17"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array18 = plt.imread('guido.jpg')\n",
    "array18"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array19 = np.arange(1, 100, 2)\n",
    "array20 = np.random.rand(3, 4)\n",
    "print(array19.size, array20.size)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(array19.shape, array20.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(array19.dtype, array20.dtype)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(array19.ndim, array20.ndim)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array21 = np.arange(1, 100, 2, dtype=np.int8)\n",
    "print(array19.itemsize, array20.itemsize, array21.itemsize)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(array19.nbytes, array20.nbytes, array21.nbytes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Iterable\n",
    "\n",
    "print(isinstance(array20.flat, np.ndarray), isinstance(array20.flat, Iterable))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array22 = array19[:]\n",
    "print(array22.base is array19, array22.base is array21)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array23 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])\n",
    "print(array23[0], array23[array23.size - 1])\n",
    "print(array23[-array23.size], array23[-1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array24 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n",
    "print(array24[2])\n",
    "print(array24[0][0], array24[-1][-1])\n",
    "print(array24[1][1], array24[1, 1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array24[1][1] = 10\n",
    "print(array24)\n",
    "array24[1] = [10, 11, 12]\n",
    "print(array24)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(array24[:2, 1:])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(array24[2])\n",
    "print(array24[2, :])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(array24[2:, :])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(array24[:, :2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(array24[1, :2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(array24[1:2, :2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array24[1:2, :2].base"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(array24[::2, ::2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(array24[::-2, ::-2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "center",
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "guido_image = plt.imread('guido.jpg')\n",
    "guido_image.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "plt.imshow(guido_image)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "plt.imshow(guido_image[::-1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "plt.imshow(guido_image)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "plt.imshow(guido_image[:,::-1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "plt.imshow(guido_image)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "plt.imshow(guido_image[30:350, 90:300])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array25 = np.array([50, 30, 15, 20, 40])\n",
    "array25[[0, 1, -1]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array26 = np.array([[30, 20, 10], [40, 60, 50], [10, 90, 80]])\n",
    "array26[[0, 2]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array26[[0, 2], [1, 2]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array26[[0, 2], [1]]\n",
    "array26[[0, 2], 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array27 = np.arange(1, 10)\n",
    "array27[[True, False, True, True, False, False, False, False, True]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array27 >= 5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "~(array27 >= 5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array27[array27 >= 5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array28 = np.array([1, 2, 3, 4, 5, 5, 4, 3, 2, 1])\n",
    "print(array28.sum())\n",
    "print(array28.mean())\n",
    "print(array28.max())\n",
    "print(array28.min())\n",
    "print(array28.std())\n",
    "print(array28.var())\n",
    "print(array28.cumsum())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array29 = np.array([3, 4])\n",
    "array30 = np.array([5, 6])\n",
    "array29.dot(array30)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array31 = np.array([[1, 2, 3], [4, 5, 6]])\n",
    "array32 = np.array([[1, 2], [3, 4], [5, 6]])\n",
    "array31.dot(array32)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array31.dump('array31-data')\n",
    "array32 = np.load('array31-data', allow_pickle=True)\n",
    "array32"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array32.flatten()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array33 = np.array([35, 96, 12, 78, 66, 54, 40, 82])\n",
    "array33.sort()\n",
    "array33"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array32.swapaxes(0, 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array32.transpose()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array34 = array33.take([0, 2, -3, -1])\n",
    "array34"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array35 = np.arange(1, 10)\n",
    "print(array35 + 10)\n",
    "print(array35 * 10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array36 = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])\n",
    "print(array35 + array36)\n",
    "print(array35 * array36)\n",
    "print(array35 ** array36)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(np.sqrt(array35))\n",
    "print(np.log2(array35))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array37 = np.array([[4, 5, 6], [7, 8, 9]])\n",
    "array38 = np.array([[1, 2, 3], [3, 2, 1]])\n",
    "print(array37 * array38)\n",
    "print(np.power(array37, array38))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array39 = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]])\n",
    "array40 = np.array([1, 2, 3])\n",
    "array39 + array40"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array41 = np.array([[1], [2], [3], [4]])\n",
    "array39 + array41"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "array42 = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])\n",
    "array43 = np.array([[4, 4, 4], [5, 5, 5], [6, 6, 6]])\n",
    "np.hstack((array42, array43))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.vstack((array42, array43))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.concatenate((array42, array43))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.concatenate((array42, array43), axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.array([1, 2, 3])\n",
    "y = np.array([4, 5, 6])\n",
    "np.cross(x, y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m1 = np.matrix('1 2 3; 4 5 6')\n",
    "m1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m2 = np.asmatrix(np.array([[1, 1], [2, 2], [3, 3]]))\n",
    "m2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m1.dot(m2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m1 * m2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m3 = np.array([[1., 2.], [3., 4.]])\n",
    "np.linalg.inv(m3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m4 = np.array([[1, 3, 5], [2, 4, 6], [4, 7, 9]])\n",
    "np.linalg.det(m4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solve the linear system ax = b\n",
    "# 3x + y = 9, x + 2y = 8\n",
    "a = np.array([[3,1], [1,2]])\n",
    "b = np.array([9, 8])\n",
    "np.linalg.solve(a, b)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
 }
--- a/Day66-70/res/3sigma_rule.jpg
+++ b/Day66-70/res/3sigma_rule.jpg
--- a/Day66-70/res/C558C1F83388892F5A2305AE4AAEB865.jpg
+++ b/Day66-70/res/C558C1F83388892F5A2305AE4AAEB865.jpg
--- a/Day66-70/res/IMG_3305(20201030-083355).PNG
+++ b/Day66-70/res/IMG_3305(20201030-083355).PNG
--- a/Day66-70/res/IMG_3306(20201030-083427).PNG
+++ b/Day66-70/res/IMG_3306(20201030-083427).PNG
--- a/Day66-70/res/IMG_3307(20201030-083545).PNG
+++ b/Day66-70/res/IMG_3307(20201030-083545).PNG
--- a/Day66-70/res/IMG_3308(20201030-083633).PNG
+++ b/Day66-70/res/IMG_3308(20201030-083633).PNG
--- a/Day66-70/res/IMG_3309(20201030-084050).PNG
+++ b/Day66-70/res/IMG_3309(20201030-084050).PNG
--- a/Day66-70/res/IMG_3310(20201030-084209).PNG
+++ b/Day66-70/res/IMG_3310(20201030-084209).PNG
--- a/Day66-70/res/IMG_3311(20201030-084301).PNG
+++ b/Day66-70/res/IMG_3311(20201030-084301).PNG
--- a/Day66-70/res/IMG_3312(20201030-084448).PNG
+++ b/Day66-70/res/IMG_3312(20201030-084448).PNG
--- a/Day66-70/res/IMG_3313(20201030-084559).PNG
+++ b/Day66-70/res/IMG_3313(20201030-084559).PNG
--- a/Day66-70/res/IMG_3314(20201030-084807).PNG
+++ b/Day66-70/res/IMG_3314(20201030-084807).PNG
--- a/Day66-70/res/IMG_3315(20201030-084832).PNG
+++ b/Day66-70/res/IMG_3315(20201030-084832).PNG
--- a/Day66-70/res/IMG_3316(20201030-084855).PNG
+++ b/Day66-70/res/IMG_3316(20201030-084855).PNG
--- a/Day66-70/res/IMG_3317(20201030-090626).PNG
+++ b/Day66-70/res/IMG_3317(20201030-090626).PNG
--- a/Day66-70/res/IMG_3318(20201030-091317).PNG
+++ b/Day66-70/res/IMG_3318(20201030-091317).PNG
--- a/Day66-70/res/IMG_3319(20201030-091350).PNG
+++ b/Day66-70/res/IMG_3319(20201030-091350).PNG
--- a/Day66-70/res/IMG_3320(20201030-092925).PNG
+++ b/Day66-70/res/IMG_3320(20201030-092925).PNG
--- a/Day66-70/res/IMG_3321(20201030-093408).PNG
+++ b/Day66-70/res/IMG_3321(20201030-093408).PNG
--- a/Day66-70/res/IMG_3322(20201030-093446).PNG
+++ b/Day66-70/res/IMG_3322(20201030-093446).PNG
--- a/Day66-70/res/IMG_3323(20201030-093637).PNG
+++ b/Day66-70/res/IMG_3323(20201030-093637).PNG
--- a/Day66-70/res/IMG_3324(20201030-094125).PNG
+++ b/Day66-70/res/IMG_3324(20201030-094125).PNG
--- a/Day66-70/res/IMG_3325(20201030-101519).PNG
+++ b/Day66-70/res/IMG_3325(20201030-101519).PNG
--- a/Day66-70/res/QQ20201208-135154@2x.png
+++ b/Day66-70/res/QQ20201208-135154@2x.png
--- a/Day66-70/res/broadcast-1.png
+++ b/Day66-70/res/broadcast-1.png
--- a/Day66-70/res/broadcast-2.png
+++ b/Day66-70/res/broadcast-2.png
--- a/Day66-70/res/broadcast-3.png
+++ b/Day66-70/res/broadcast-3.png
--- a/Day66-70/res/download-anaconda.png
+++ b/Day66-70/res/download-anaconda.png
--- a/Day66-70/res/image-flip-1.png
+++ b/Day66-70/res/image-flip-1.png
--- a/Day66-70/res/image-flip-2.png
+++ b/Day66-70/res/image-flip-2.png
--- a/Day66-70/res/image-flip-3.png
+++ b/Day66-70/res/image-flip-3.png
--- a/Day66-70/res/install-anaconda.png
+++ b/Day66-70/res/install-anaconda.png
--- a/Day66-70/res/jupyter-create-notebook.png
+++ b/Day66-70/res/jupyter-create-notebook.png
--- a/Day66-70/res/ndarray-dtype.png
+++ b/Day66-70/res/ndarray-dtype.png
--- a/Day66-70/res/ndarray-index.PNG
+++ b/Day66-70/res/ndarray-index.PNG
--- a/Day66-70/res/ndarray-slice.PNG
+++ b/Day66-70/res/ndarray-slice.PNG
--- a/Day66-70/res/notebook-get-help.png
+++ b/Day66-70/res/notebook-get-help.png
--- a/Day66-70/res/notebook-magic-command.png
+++ b/Day66-70/res/notebook-magic-command.png
--- a/Day66-70/res/notebook-search-namespace.png
+++ b/Day66-70/res/notebook-search-namespace.png
--- a/Day66-70/res/notebook-shortcut.png
+++ b/Day66-70/res/notebook-shortcut.png
--- a/Day66-70/res/quartile_and_3sigma.png
+++ b/Day66-70/res/quartile_and_3sigma.png
--- a/Day66-70/res/run-anaconda-navigator.png
+++ b/Day66-70/res/run-anaconda-navigator.png
--- a/Day66-70/res/use-jupyter-notebook.png
+++ b/Day66-70/res/use-jupyter-notebook.png
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/opencourse.iml
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/opencourse.iml
@ -0,0 +1,20 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <module type="JAVA_MODULE" version="4">
  <component name="NewModuleRootManager" inherit-compiler-output="true">
    <exclude-output />
    <content url="file://$MODULE_DIR$">
      <sourceFolder url="file://$MODULE_DIR$/src" isTestSource="false" />
    </content>
    <orderEntry type="inheritedJdk" />
    <orderEntry type="sourceFolder" forTests="false" />
    <orderEntry type="module-library">
      <library>
        <CLASSES>
          <root url="jar://$MAVEN_REPOSITORY$/org/jetbrains/annotations/18.0.0/annotations-18.0.0.jar!/" />
        </CLASSES>
        <JAVADOC />
        <SOURCES />
      </library>
    </orderEntry>
  </component>
 </module>
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/META-INF/opencourse.kotlin_module
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/META-INF/opencourse.kotlin_module
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/org/mobiletrain/Example01.class
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/org/mobiletrain/Example01.class
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/org/mobiletrain/Example02.class
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/org/mobiletrain/Example02.class
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/org/mobiletrain/Example03.class
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/org/mobiletrain/Example03.class
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/org/mobiletrain/Example04.class
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/org/mobiletrain/Example04.class
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/org/mobiletrain/Example05$RequestHandler.class
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/org/mobiletrain/Example05$RequestHandler.class
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/org/mobiletrain/Example05.class
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/out/production/opencourse/org/mobiletrain/Example05.class
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/src/org/mobiletrain/Example01.java
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/src/org/mobiletrain/Example01.java
@ -0,0 +1,8 @@
 package org.mobiletrain;
 class Example01 {
    public static void main(String[] args) {
        System.out.println("hello, world");
    }
 }
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/src/org/mobiletrain/Example02.java
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/src/org/mobiletrain/Example02.java
@ -0,0 +1,12 @@
 package org.mobiletrain;
 public class Example02 {
    public static void main(String[] args) {
        int total = 0;
        for (int i = 1; i <= 100; ++i) {
            total += i;
        }
        System.out.println(total);
    }
 }
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/src/org/mobiletrain/Example03.java
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/src/org/mobiletrain/Example03.java
@ -0,0 +1,18 @@
 package org.mobiletrain;
 import java.util.Arrays;
 public class Example03 {
    public static void main(String[] args) {
        boolean[] values = new boolean[10];
        Arrays.fill(values, true);
        System.out.println(Arrays.toString(values));
        int[] numbers = new int[10];
        for (int i = 0; i < numbers.length; ++i) {
            numbers[i] = i + 1;
        }
        System.out.println(Arrays.toString(numbers));
    }
 }
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/src/org/mobiletrain/Example04.java
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/src/org/mobiletrain/Example04.java
@ -0,0 +1,56 @@
 package org.mobiletrain;
 import java.util.List;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Scanner;
 class Example04 {
    /**
     * 产生[min, max)范围的随机整数
     */
    public static int randomInt(int min, int max) {
        return (int) (Math.random() * (max - min) + min);
    }
    /**
     * 输出一组双色球号码
     */
    public static void display(List<Integer> balls) {
        for (int i = 0; i < balls.size(); ++i) {
            System.out.printf("%02d ", balls.get(i));
            if (i == balls.size() - 2) {
                System.out.print("| ");
            }
        }
        System.out.println();
    }
    /**
     * 生成一组随机号码
     */
    public static List<Integer> generate() {
        List<Integer> redBalls = new ArrayList<>();
        for (int i = 1; i <= 33; ++i) {
            redBalls.add(i);
        }
        List<Integer> selectedBalls = new ArrayList<>();
        for (int i = 0; i < 6; ++i) {
            selectedBalls.add(redBalls.remove(randomInt(0, redBalls.size())));
        }
        Collections.sort(selectedBalls);
        selectedBalls.add(randomInt(1, 17));
        return selectedBalls;
    }
    public static void main(String[] args) {
        try (Scanner sc = new Scanner(System.in)) {
            System.out.print("机选几注: ");
            int num = sc.nextInt();
            for (int i = 0; i < num; ++i) {
                display(generate());
            }
        }
    }
 }
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/src/org/mobiletrain/Example05.java
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Java/opencourse/src/org/mobiletrain/Example05.java
@ -0,0 +1,29 @@
 package org.mobiletrain;
 import com.sun.net.httpserver.HttpExchange;
 import com.sun.net.httpserver.HttpHandler;
 import com.sun.net.httpserver.HttpServer;
 import java.io.IOException;
 import java.io.OutputStream;
 import java.net.InetSocketAddress;
 class Example05 {
    public static void main(String[] arg) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
        server.createContext("/", new RequestHandler());
        server.start();
    }
    static class RequestHandler implements HttpHandler {
        @Override
        public void handle(HttpExchange exchange) throws IOException {
            String response = "<h1>hello, world</h1>";
            exchange.sendResponseHeaders(200, 0);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(response.getBytes());
            }
        }
    }
 }
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/USvideos.csv
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/USvideos.csv
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example01.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example01.py
@ -0,0 +1 @@
 print('hello, world')
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example02.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example02.py
@ -0,0 +1 @@
 print(sum(range(1, 101)))
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example03.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example03.py
@ -0,0 +1,4 @@
 values = [True] * 10
 print(values)
 numbers = [x for x in range(1, 11)]
 print(numbers)
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example04.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example04.py
@ -0,0 +1,24 @@
 from random import randint, sample
 def generate():
    """生成一组随机号码"""
    red_balls = [x for x in range(1, 34)]
    selected_balls = sample(red_balls, 6)
    selected_balls.sort()
    selected_balls.append(randint(1, 16))
    return selected_balls
 def display(balls):
    """输出一组双色球号码"""
    for index, ball in enumerate(balls):
        print(f'{ball:0>2d}', end=' ')
        if index == len(balls) - 2:
            print('|', end=' ')
    print()
 num = int(input('机选几注: '))
 for _ in range(num):
    display(generate())
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example05.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example05.py
@ -0,0 +1,13 @@
 from http.server import HTTPServer, SimpleHTTPRequestHandler
 class RequestHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write('<h1>goodbye, world</h1>'.encode())
 server = HTTPServer(('', 8000), RequestHandler)
 server.serve_forever()
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example06.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example06.py
@ -0,0 +1,25 @@
 # 一行代码实现求阶乘函数
 fac = lambda x: __import__('functools').reduce(int.__mul__, range(1, x + 1), 1)
 print(fac(5))
 # 一行代码实现求最大公约数函数
 gcd = lambda x, y: y % x and gcd(y % x, x) or x
 print(gcd(15, 27))
 # 一行代码实现判断素数的函数
 is_prime = lambda x: x > 1 and not [f for f in range(2, int(x ** 0.5) + 1) if x % f == 0]
 for num in range(2, 100):
    if is_prime(num):
        print(num, end=' ')
 print()
 # 一行代码实现快速排序
 quick_sort = lambda items: len(items) and quick_sort([x for x in items[1:] if x < items[0]]) \
                           + [items[0]] + quick_sort([x for x in items[1:] if x >= items[0]]) \
                           or items
 items = [57, 12, 35, 68, 99, 81, 70, 22]
 print(quick_sort(items))
 # 生成FizzBuzz列表
 # 1 2 Fizz 4 Buzz 6 ... 14 ... FizzBuzz 16 ... 100
 print(['Fizz'[x % 3 * 4:] + 'Buzz'[x % 5 * 4:] or x for x in range(1, 101)])
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example07.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example07.py
@ -0,0 +1,12 @@
 from functools import lru_cache
@lru_cache()
 def fib(num):
    if num in (1, 2):
        return 1
    return fib(num - 1) + fib(num - 2)
 for n in range(1, 121):
    print(f'{n}: {fib(n)}')
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example08.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example08.py
@ -0,0 +1,23 @@
 from functools import wraps
 from threading import RLock
 def singleton(cls):
    instances = {}
    lock = RLock()
    @wraps(cls)
    def wrapper(*args, **kwargs):
        if cls not in instances:
            with lock:
                if cls not in instances:
                    instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return wrapper
@singleton
 class President:
    pass
 President = President.__wrapped__
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example09.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example09.py
@ -0,0 +1,19 @@
 import copy
 class PrototypeMeta(type):
    def __init__(cls, *args, **kwargs):
        super().__init__(*args, **kwargs)
        cls.clone = lambda self, is_deep=True: \
            copy.deepcopy(self) if is_deep else copy.copy(self)
 class Student(metaclass=PrototypeMeta):
    pass
 stu1 = Student()
 stu2 = stu1.clone()
 print(stu1 == stu2)
 print(id(stu1), id(stu2))
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example10.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part01/example10.py
@ -0,0 +1,15 @@
 import random
 import time
 import requests
 from bs4 import BeautifulSoup
 for page in range(10):
    resp = requests.get(
        url=f'https://movie.douban.com/top250?start={25 * page}',
        headers={'User-Agent': 'BaiduSpider'}
    )
    soup = BeautifulSoup(resp.text, "lxml")
    for elem in soup.select('a > span.title:nth-child(1)'):
        print(elem.text)
    time.sleep(random.random() * 5)
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part02/idiom01.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part02/idiom01.py
@ -0,0 +1,9 @@
 name = 'jackfrued'
 fruits = ['apple', 'orange', 'grape']
 owners = {'name': '骆昊', 'age': 40, 'gender': True}
 # if name != '' and len(fruits) > 0 and len(owners.keys()) > 0:
 #     print('Jackfrued love fruits.')
 if name and fruits and owners:
    print('Jackfrued love fruits.')
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part02/idiom02.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part02/idiom02.py
@ -0,0 +1,8 @@
 a, b = 5, 10
 # temp = a
 # a = b
 # b = temp
 a, b = b, a
 print(f'a = {a}, b = {b}')
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part02/idiom03.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part02/idiom03.py
@ -0,0 +1,8 @@
 chars = ['j', 'a', 'c', 'k', 'f', 'r', 'u', 'e', 'd']
 # name = ''
 # for char in chars:
 #     name += char
 name = ''.join(chars)
 print(name)
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part02/idiom04.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part02/idiom04.py
@ -0,0 +1,9 @@
 fruits = ['orange', 'grape', 'pitaya', 'blueberry']
 # index = 0
 # for fruit in fruits:
 #     print(index, ':', fruit)
 #     index += 1
 for index, fruit in enumerate(fruits):
    print(index, ':', fruit)
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part02/idiom05.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part02/idiom05.py
@ -0,0 +1,9 @@
 data = [7, 20, 3, 15, 11]
 # result = []
 # for i in data:
 #     if i > 10:
 #         result.append(i * 3)
 result = [num * 3 for num in data if num > 10]
 print(result)
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part02/idiom06.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part02/idiom06.py
@ -0,0 +1,14 @@
 data = {'x': '5'}
 # if 'x' in data and isinstance(data['x'], (str, int, float)) \
 #         and data['x'].isdigit():
 #     value = int(data['x'])
 #     print(value)
 # else:
 #     value = None
 try:
    value = int(data['x'])
    print(value)
 except (KeyError, TypeError, ValueError):
    value = None
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part03/example.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part03/example.py
@ -0,0 +1,60 @@
 """
 扑克
 """
 import enum
 import random
@enum.unique
 class Suite(enum.Enum):
    """花色(枚举)"""
    SPADE, HEART, CLUB, DIAMOND = range(4)
 class Card:
    """牌"""
    def __init__(self, suite, face):
        self.suite = suite
        self.face = face
    def __repr__(self):
        suites = '♠♥♣♦'
        faces = ['', 'A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
        return f'{suites[self.suite.value]}{faces[self.face]}'
 class Poker:
    """扑克"""
    def __init__(self):
        self.cards = [Card(suite, face) for suite in Suite
                      for face in range(1, 14)]
        self.current = 0
    def shuffle(self):
        """洗牌"""
        self.current = 0
        random.shuffle(self.cards)
    def deal(self):
        """发牌"""
        card = self.cards[self.current]
        self.current += 1
        return card
    @property
    def has_next(self):
        """还有没有牌可以发"""
        return self.current < len(self.cards)
 def main():
    """主函数（程序入口）"""
    poker = Poker()
    poker.shuffle()
    print(poker.cards)
 if __name__ == '__main__':
    main()
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part04/example.py
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part04/example.py
@ -0,0 +1,47 @@
 import cProfile
 # @profile
 def is_prime(num):
    for factor in range(2, int(num ** 0.5) + 1):
        if num % factor == 0:
            return False
    return True
 class PrimeIter:
    def __init__(self, total):
        self.counter = 0
        self.current = 1
        self.total = total
    def __iter__(self):
        return self
    def __next__(self):
        if self.counter < self.total:
            self.current += 1
            while not is_prime(self.current):
                self.current += 1
            self.counter += 1
            return self.current
        raise StopIteration()
@profile
 def eat_memory():
    items = []
    for _ in range(1000000):
        items.append(object())
    return items
 def main():
    eat_memory()
    # list(PrimeIter(1000))
    # cProfile.run('list(PrimeIter(10000))')
 if __name__ == '__main__':
    main()
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part04/example.py.lprof
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/part04/example.py.lprof
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/requirements.txt
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/opencourse/requirements.txt
@ -0,0 +1,37 @@
 appnope==0.1.0
 astroid==2.3.3
 backcall==0.1.0
 beautifulsoup4==4.9.0
 certifi==2020.4.5.1
 chardet==3.0.4
 decorator==4.4.2
 idna==2.9
 ipython==7.13.0
 ipython-genutils==0.2.0
 isort==4.3.21
 jedi==0.17.0
 lazy-object-proxy==1.4.3
 line-profiler==3.0.2
 lxml==4.5.0
 mccabe==0.6.1
 memory-profiler==0.57.0
 numpy==1.18.3
 pandas==1.0.3
 parso==0.7.0
 pexpect==4.8.0
 pickleshare==0.7.5
 prompt-toolkit==3.0.5
 psutil==5.7.0
 ptyprocess==0.6.0
 Pygments==2.6.1
 pylint==2.4.4
 python-dateutil==2.8.1
 pytz==2019.3
 requests==2.23.0
 six==1.14.0
 soupsieve==2.0
 traitlets==4.3.3
 typed-ast==1.4.1
 urllib3==1.25.9
 wcwidth==0.1.9
 wrapt==1.11.2
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/使用Pandas做数据分析.ipynb
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/使用Pandas做数据分析.ipynb
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/res/action.png
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/res/action.png
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/res/use-pandas-in-jupyter-notebook.png
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/res/use-pandas-in-jupyter-notebook.png
--- a/公开课/文档/年薪50W+的Python程序员如何写代码/年薪50W+的Python程序员如何写代码.md
+++ b/公开课/文档/年薪50W+的Python程序员如何写代码/年薪50W+的Python程序员如何写代码.md
@ -0,0 +1,766 @@
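 ## How a Python Programmer Earning 500K+ a Year Writes Code
 ### Why Write Code in Python
 #### The Difference Shows in Comparison
 > **Many internet and mobile internet companies care more about development efficiency than about execution efficiency.**
 ##### Example 1: hello, world
 The C version: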
 ```C
 #include <stdio.h>
 int main() {
    printf("hello, world\n");
    return 0;
 }
 ```
 The Java version:
 ```Java
 class Example01 {
    public static void main(String[] args) {
        System.out.println("hello, world");
    }
 }
 ```
 The Python version:
 ```Python
 print('hello, world')
 ```
 ##### Example 2: Sum of 1 to 100
 The C version:
 ```C
 #include <stdio.h>
 int main() {
    int total = 0;
    for (int i = 1; i <= 100; ++i) {
        total += i;
    }
    printf("%d\n", total);
 	return 0;
 }
 ```
 The Python version:
 ```Python
 print(sum(range(1, 101)))
 ```
 ##### Example 3: Creating and initializing an array (list)
 The Java version:
 ```Java
 import java.util.Arrays;
 public class Example03 {
    public static void main(String[] args) {
        boolean[] values = new boolean[10];
        Arrays.fill(values, true);
        System.out.println(Arrays.toString(values));
        int[] numbers = new int[10];
        for (int i = 0; i < numbers.length; ++i) {
            numbers[i] = i + 1;
        }
        System.out.println(Arrays.toString(numbers));
    }
 }
 ```
 The Python version:
 ```Python
 values = [True] * 10
 print(values)
 numbers = [x for x in range(1, 11)]
 print(numbers)
 ```
 ##### Example 4: Random number picks for the Double Color Ball lottery
 The Java version:
 ```Java
 import java.util.List;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Scanner;
 class Example04 {
    /**
     * Generates a random integer in the range [min, max)
     */
    public static int randomInt(int min, int max) {
        return (int) (Math.random() * (max - min) + min);
    }
    /**
     * Prints one set of Double Color Ball numbers
     */
    public static void display(List<Integer> balls) {
        for (int i = 0; i < balls.size(); ++i) {
            System.out.printf("%02d ", balls.get(i));
            if (i == balls.size() - 2) {
                System.out.print("| ");
            }
        }
        System.out.println();
    }
    /**
     * Generates one set of random numbers
     */
    public static List<Integer> generate() {
        List<Integer> redBalls = new ArrayList<>();
        for (int i = 1; i <= 33; ++i) {
            redBalls.add(i);
        }
        List<Integer> selectedBalls = new ArrayList<>();
        for (int i = 0; i < 6; ++i) {
            selectedBalls.add(redBalls.remove(randomInt(0, redBalls.size())));
        }
        Collections.sort(selectedBalls);
        selectedBalls.add(randomInt(1, 17));
        return selectedBalls;
    }
    public static void main(String[] args) {
        try (Scanner sc = new Scanner(System.in)) {
            System.out.print("机选几注: ");
            int num = sc.nextInt();
            for (int i = 0; i < num; ++i) {
                display(generate());
            }
        }
    }
 }
 ```
 The Python version:
 ```Python
 from random import randint, sample
 def generate():
    """Generate one set of random numbers"""
    red_balls = [x for x in range(1, 34)]
    selected_balls = sample(red_balls, 6)
    selected_balls.sort()
    selected_balls.append(randint(1, 16))
    return selected_balls
 def display(balls):
    """Print one set of Double Color Ball numbers"""
    for index, ball in enumerate(balls):
        print(f'{ball:0>2d}', end=' ')
        if index == len(balls) - 2:
            print('|', end=' ')
    print()
 num = int(input('机选几注: '))
 for _ in range(num):
    display(generate())
 ```
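 > **Friendly reminder**: Cherish your life and stay away from gambling in any form.
 ##### Example 5: A simple HTTP server
 The Java version:
 > **Note**: Before JDK 1.6, this had to be built with socket programming, using either a multithreaded or an NIO approach. From JDK 1.6 on, the `HttpServer` class in the `com.sun.net.httpserver` package can be used instead.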
 ```Java
 import com.sun.net.httpserver.HttpExchange;
 import com.sun.net.httpserver.HttpHandler;
 import com.sun.net.httpserver.HttpServer;
 import java.io.IOException;
 import java.io.OutputStream;
 import java.net.InetSocketAddress;
 class Example05 {
    public static void main(String[] arg) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
        server.createContext("/", new RequestHandler());
        server.start();
    }
    static class RequestHandler implements HttpHandler {
        @Override
        public void handle(HttpExchange exchange) throws IOException {
            String response = "<h1>hello, world</h1>";
            exchange.sendResponseHeaders(200, 0);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(response.getBytes());
            }
        }
    }
 }
 ```
 The Python version:
 ```Python
 from http.server import HTTPServer, SimpleHTTPRequestHandler
 class RequestHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write('<h1>hello, world</h1>'.encode())
 server = HTTPServer(('', 8000), RequestHandler)
 server.serve_forever()
 ```
 Or, from the command line:
 ```Bash
 python3 -m http.server 8000
 ```
 #### What One Line of Python Can Do
 > **Quite often, a single line of Python is enough to solve your problem.**
 ```Python
 # Factorial in one line
 fac = lambda x: __import__('functools').reduce(int.__mul__, range(1, x + 1), 1)
 # Greatest common divisor in one line
 gcd = lambda x, y: y % x and gcd(y % x, x) or x
 # Primality test in one line
 is_prime = lambda x: x > 1 and not [f for f in range(2, int(x ** 0.5) + 1) if x % f == 0]
 # Quicksort in one line (`>=` keeps elements equal to the pivot)
 quick_sort = lambda items: len(items) and quick_sort([x for x in items[1:] if x < items[0]]) + [items[0]] + quick_sort([x for x in items[1:] if x >= items[0]]) or items
 # Build the FizzBuzz list
 ['Fizz'[x % 3 * 4:] + 'Buzz'[x % 5 * 4:] or x for x in range(1, 101)]
 ```
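
One-liners like these are easy to get subtly wrong, so it can help to check them against the standard library. The following is a minimal sketch, not part of the original course code: the test data is made up, and the quicksort shown here uses `>=` for the right-hand partition so that elements equal to the pivot are kept.

```python
import math
from functools import reduce

fac = lambda x: reduce(int.__mul__, range(1, x + 1), 1)
gcd = lambda x, y: y % x and gcd(y % x, x) or x
# `>=` on the right-hand partition keeps elements equal to the pivot.
quick_sort = lambda items: len(items) and \
    quick_sort([x for x in items[1:] if x < items[0]]) + [items[0]] + \
    quick_sort([x for x in items[1:] if x >= items[0]]) or items

# Compare each one-liner with its standard-library counterpart.
assert all(fac(n) == math.factorial(n) for n in range(10))
assert all(gcd(a, b) == math.gcd(a, b)
           for a in range(1, 30) for b in range(1, 30))
data = [57, 12, 35, 12, 99, 81, 70, 22]  # illustrative data with a duplicate
assert quick_sort(data) == sorted(data)
print('all checks passed')
```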
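 #### Design Patterns Have Never Been This Simple
 > **Python is a dynamically typed language, so many design patterns are simplified or become less prominent in Python.**
 Exercise: how would you optimize the code below?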
 ```Python
 def fib(num):
    if num in (1, 2):
        return 1
    return fib(num - 1) + fib(num - 2)
 ```
 In Python, the proxy pattern can be implemented with built-in or custom decorators.
 ```Python
 from functools import lru_cache
@lru_cache()
 def fib(num):
    if num in (1, 2):
        return 1
    return fib(num - 1) + fib(num - 2)
 for n in range(1, 121):
    print(f'{n}: {fib(n)}')
 ```
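 > **Note**: The `lru_cache` decorator from the `functools` module in the Python standard library wraps `fib` in a caching proxy that stores intermediate results, which improves performance.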
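
To see the cache at work, here is a minimal sketch that inspects `cache_info()`, which `lru_cache` attaches to the wrapped function. The argument `30` and the use of `maxsize=None` are illustrative choices, not taken from the original example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(num):
    """Return the num-th Fibonacci number (1-indexed)."""
    if num in (1, 2):
        return 1
    return fib(num - 1) + fib(num - 2)


result = fib(30)
info = fib.cache_info()
print(result)       # 832040
print(info.misses)  # 30: each distinct argument is computed exactly once
print(info.hits)    # 27: every other recursive call is served from the cache
```

Without the decorator the same call would recompute the same sub-results many times over; with it, the number of real computations grows only linearly with `num`.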
 In Python, the singleton pattern can be implemented with a custom decorator or a metaclass.
 ```Python
 from functools import wraps
 from threading import RLock
 def singleton(cls):
    instances = {}
    lock = RLock()
    @wraps(cls)
    def wrapper(*args, **kwargs):
        if cls not in instances:
            with lock:
                if cls not in instances:
                    instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return wrapper
 ```
 > **Note**: A class that should be a singleton only needs the decorator above applied to it.
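
A usage sketch may make this concrete. The block below repeats the decorator in full, including the `return wrapper` line it needs in order to work, so that it runs on its own. The `President` class name follows `example08.py`; the `name` attribute and the values `'Alice'` and `'Bob'` are made up for illustration.

```python
from functools import wraps
from threading import RLock


def singleton(cls):
    """Class decorator that keeps at most one instance per decorated class."""
    instances = {}
    lock = RLock()

    @wraps(cls)
    def wrapper(*args, **kwargs):
        # Double-checked locking: the lock is only contended on first use.
        if cls not in instances:
            with lock:
                if cls not in instances:
                    instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return wrapper


@singleton
class President:
    def __init__(self, name):
        self.name = name


first = President('Alice')
second = President('Bob')  # arguments are ignored: the instance already exists
print(first is second)     # True
print(second.name)         # Alice
```

Because `functools.wraps` stores the original class in `__wrapped__`, the singleton behaviour can be removed again with `President = President.__wrapped__`, as `example08.py` does.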
 In Python, the prototype pattern can be implemented with a metaclass.
 ```Python
 import copy
 class PrototypeMeta(type):
    def __init__(cls, *args, **kwargs):
        super().__init__(*args, **kwargs)
        cls.clone = lambda self, is_deep=True: \
            copy.deepcopy(self) if is_deep else copy.copy(self)
 ```
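 > **Note**: The metaclass adds a `clone` method to every class declared with `metaclass=PrototypeMeta`. The method uses `copy` and `deepcopy` from the standard-library `copy` module for shallow and deep copies respectively.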
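
The difference between the two copy modes is easiest to see on an object that holds a mutable attribute. This is a small sketch under stated assumptions: the `Student` class name follows `example09.py`, while its `name` and `courses` attributes are hypothetical additions for the demonstration.

```python
import copy


class PrototypeMeta(type):
    """Metaclass that gives every class it creates a `clone` method."""

    def __init__(cls, *args, **kwargs):
        super().__init__(*args, **kwargs)
        cls.clone = lambda self, is_deep=True: \
            copy.deepcopy(self) if is_deep else copy.copy(self)


class Student(metaclass=PrototypeMeta):
    def __init__(self, name, courses):
        self.name = name
        self.courses = courses


stu1 = Student('Alice', ['Python'])
deep = stu1.clone()                     # deep copy by default
shallow = stu1.clone(is_deep=False)     # shallow copy on request

print(deep is stu1)                     # False: a new object
print(deep.courses is stu1.courses)     # False: the list was copied too
print(shallow.courses is stu1.courses)  # True: the list is shared
```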
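 #### Data Collection and Data Analysis Have Never Been This Simple
 > **Web data collection is one of the areas where Python excels.**
 Example: fetching the Douban Movie "Top250".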
 ```Python
 import random
 import time
 import requests
 from bs4 import BeautifulSoup
 for page in range(10):
    resp = requests.get(
        url=f'https://movie.douban.com/top250?start={25 * page}',
        headers={'User-Agent': 'BaiduSpider'}
    )
    soup = BeautifulSoup(resp.text, "lxml")
    for elem in soup.select('a > span.title:nth-child(1)'):
        print(elem.text)
    time.sleep(random.random() * 5)
 ```
 > **With NumPy, Pandas, and Matplotlib, data analysis and visualization are easy to do.**
 ![](res/use-pandas-in-jupyter-notebook.png)
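 ### The Right Way to Write Python Code
 > **If you write Python, write Pythonic code.**
 #### Idiom 1: The right way to write conditions
 Code from a developer coming from another language: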
 ```Python
 name = 'jackfrued'
 fruits = ['apple', 'orange', 'grape']
 owners = {'name': '骆昊', 'age': 40, 'gender': True}
 if name != '' and len(fruits) > 0 and len(owners.keys()) > 0:
    print('Jackfrued love fruits.')
 ```
 Pythonic code:
 ```Python
 name = 'jackfrued'
 fruits = ['apple', 'orange', 'grape']
 owners = {'name': '骆昊', 'age': 40, 'gender': True}
 if name and fruits and owners:
    print('Jackfrued love fruits.')
 ```
 #### Idiom 2: The right way to swap two variables
 Code from a developer coming from another language:
 ```Python
 temp = a
 a = b
 b = temp
 ```
 Or:
 ```Python
 a = a ^ b
 b = a ^ b
 a = a ^ b
 ```
 Pythonic code:
 ```Python
 a, b = b, a
 ```
 #### Idiom 3: The right way to build a string from a sequence
 Code from a developer coming from another language:
 ```Python
 chars = ['j', 'a', 'c', 'k', 'f', 'r', 'u', 'e', 'd']
 name = ''
 for char in chars:
    name += char
 ```
 Pythonic code:
 ```Python
 chars = ['j', 'a', 'c', 'k', 'f', 'r', 'u', 'e', 'd']
 name = ''.join(chars)
 ```
 #### Idiom 4: The right way to iterate over a list
 Code from a developer coming from another language:
 ```Python
 fruits = ['orange', 'grape', 'pitaya', 'blueberry']
 index = 0
 for fruit in fruits:
    print(index, ':', fruit)
    index += 1
 ```
 Pythonic code:
 ```Python
 fruits = ['orange', 'grape', 'pitaya', 'blueberry']
 for index, fruit in enumerate(fruits):
    print(index, ':', fruit)
 ```
 #### Idiom 5: The right way to create a list
 Code from a developer coming from another language:
 ```Python
 data = [7, 20, 3, 15, 11]
 result = []
 for i in data:
    if i > 10:
        result.append(i * 3)
 ```
 Pythonic code:
 ```Python
 data = [7, 20, 3, 15, 11]
 result = [num * 3 for num in data if num > 10]
 ```
 #### Idiom 6: The right way to make code robust
 Code from a developer coming from another language:
 ```Python
 data = {'x': '5'}
 if 'x' in data and isinstance(data['x'], (str, int, float)) \
        and data['x'].isdigit():
    value = int(data['x'])
    print(value)
 else:
    value = None
 ```
 Pythonic code:
 ```Python
 data = {'x': '5'}
 try:
    value = int(data['x'])
    print(value)
 except (KeyError, TypeError, ValueError):
    value = None
 ```
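
The `try`/`except` form also covers inputs that the check-first version does not anticipate. For example, an `int` value passes the `isinstance` test there but has no `isdigit` method. A minimal sketch, wrapping the same logic in a hypothetical helper named `to_int` (not part of the original code):

```python
def to_int(data, key):
    """Return data[key] converted to int, or None if that is not possible."""
    try:
        return int(data[key])
    except (KeyError, TypeError, ValueError):
        return None


print(to_int({'x': '5'}, 'x'))    # 5
print(to_int({'x': 5.7}, 'x'))    # 5
print(to_int({}, 'x'))            # None (KeyError)
print(to_int({'x': None}, 'x'))   # None (TypeError)
print(to_int({'x': 'abc'}, 'x'))  # None (ValueError)
```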
 ### Checking Your Code Style with a Lint Tool
 Read the code below and see how many problems, or violations of Python coding conventions, you can spot.
 ```Python
 from enum import *
@unique
 class Suite (Enum):
    SPADE, HEART, CLUB, DIAMOND = range(4)
 class Card(object):
    def __init__(self,suite,face ):
        self.suite = suite
        self.face = face
    def __repr__(self):
        suites='♠♥♣♦'
        faces=['','A','2','3','4','5','6','7','8','9','10','J','Q','K']
        return f'{suites[self.suite.value]}{faces[self.face]}'
 import random
 class Poker(object):
    def __init__(self):
        self.cards =[Card(suite, face) for suite in Suite
            for face in range(1, 14)]
        self.current=0
    def shuffle (self):
        self.current=0
        random.shuffle(self.cards)
    def deal (self):
        card = self.cards[self.current]
        self.current+=1
        return card
    def has_next (self):
        if self.current<len(self.cards): return True
        return False
 p = Poker()
 p.shuffle()
 print(p.cards)
 ```
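
For comparison, here is one possible cleaned-up version. It closely follows `part03/example.py` from the same course materials, with the docstrings rendered in English; treat it as a sketch of what a linter-friendly version might look like rather than the only acceptable answer.

```python
"""Poker deck example."""
import enum
import random


@enum.unique
class Suite(enum.Enum):
    """Card suits (enumeration)."""
    SPADE, HEART, CLUB, DIAMOND = range(4)


class Card:
    """A playing card."""

    def __init__(self, suite, face):
        self.suite = suite
        self.face = face

    def __repr__(self):
        suites = '♠♥♣♦'
        faces = ['', 'A', '2', '3', '4', '5', '6', '7',
                 '8', '9', '10', 'J', 'Q', 'K']
        return f'{suites[self.suite.value]}{faces[self.face]}'


class Poker:
    """A deck of cards."""

    def __init__(self):
        self.cards = [Card(suite, face) for suite in Suite
                      for face in range(1, 14)]
        self.current = 0

    def shuffle(self):
        """Shuffle the deck and start dealing from the top again."""
        self.current = 0
        random.shuffle(self.cards)

    def deal(self):
        """Deal the next card."""
        card = self.cards[self.current]
        self.current += 1
        return card

    @property
    def has_next(self):
        """Whether there are still cards left to deal."""
        return self.current < len(self.cards)


def main():
    """Program entry point."""
    poker = Poker()
    poker.shuffle()
    print(poker.cards)


if __name__ == '__main__':
    main()
```

Compared with the listing above, the wildcard import is gone, imports sit at the top of the module, spacing follows PEP 8, `has_next` returns the comparison directly, and the script body is wrapped in `main()`.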
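 #### Installing and Using Pylint
 Pylint is a code analysis tool for Python. It detects errors in Python code and flags code that violates coding style standards (PEP 8 by default) or has potential problems.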
 ```Bash
 pip install pylint
 pylint [options] module_or_package
 ```
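 Pylint's output has the following format.
 > module name:line number:column number:    message type    message
 The message types are:
 1. C - Convention: code that violates Python coding conventions (PEP 8).
 2. R - Refactor: poorly written code that should be refactored.
 3. W - Warning: problems in the code that do not prevent it from running.
 4. E - Error: errors in the code that do affect how it runs.
 5. F - Fatal: errors that prevent Pylint from continuing.
 Commonly used Pylint options:
 1. `--disable=<msg ids>` or `-d <msg ids>`: disable the specified messages.
 2. `--errors-only` or `-E`: show errors only.
 3. `--rcfile=<file>`: specify the configuration file.
 4. `--list-msgs`: list Pylint's messages.
 5. `--generate-rcfile`: generate a sample configuration file.
 6. `--reports=<y_or_n>` or `-r <y_or_n>`: whether to generate a report.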
 ### Profiling Your Code's Performance
 #### The cProfile module
 `example01.py`
 ```Python
 import cProfile
 def is_prime(num):
    for factor in range(2, int(num ** 0.5) + 1):
        if num % factor == 0:
            return False
    return True
 class PrimeIter:
    def __init__(self, total):
        self.counter = 0
        self.current = 1
        self.total = total
    def __iter__(self):
        return self
    def __next__(self):
        if self.counter < self.total:
            self.current += 1
            while not is_prime(self.current):
                self.current += 1
            self.counter += 1
            return self.current
        raise StopIteration()
 cProfile.run('list(PrimeIter(10000))')
 ```
 Output:
 ```
   114734 function calls in 0.573 seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.006    0.006    0.573    0.573 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 example.py:14(__init__)
        1    0.000    0.000    0.000    0.000 example.py:19(__iter__)
    10001    0.086    0.000    0.567    0.000 example.py:22(__next__)
   104728    0.481    0.000    0.481    0.000 example.py:5(is_prime)
        1    0.000    0.000    0.573    0.573 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 ```
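
By default the report is ordered by name, which makes the expensive functions harder to spot. As a sketch of one alternative, the standard-library `pstats` module can sort the same data by cumulative time and limit the number of rows. The `first_primes` helper below is a hypothetical stand-in for `PrimeIter`, used only to keep the example short.

```python
import cProfile
import io
import pstats


def is_prime(num):
    for factor in range(2, int(num ** 0.5) + 1):
        if num % factor == 0:
            return False
    return True


def first_primes(total):
    """Collect the first `total` primes (hypothetical helper for this sketch)."""
    primes, current = [], 1
    while len(primes) < total:
        current += 1
        if is_prime(current):
            primes.append(current)
    return primes


# Profile only the region between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
primes = first_primes(1000)
profiler.disable()

# Sort by cumulative time and print the five most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```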
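 #### line_profiler
 Add a `profile` decorator to the function whose running time you want to profile. The number of times each line of that function runs, and the time spent on it, are then recorded.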
 `example02.py`
 ```Python
@profile
 def is_prime(num):
    for factor in range(2, int(num ** 0.5) + 1):
        if num % factor == 0:
            return False
    return True
 class PrimeIter:
    def __init__(self, total):
        self.counter = 0
        self.current = 1
        self.total = total
    def __iter__(self):
        return self
    def __next__(self):
        if self.counter < self.total:
            self.current += 1
            while not is_prime(self.current):
                self.current += 1
            self.counter += 1
            return self.current
        raise StopIteration()
 list(PrimeIter(1000))
 ```
 Install and use the third-party `line_profiler` library.
 ```Bash
 pip install line_profiler
 kernprof -lv example02.py
 Wrote profile results to example02.py.lprof
 Timer unit: 1e-06 s
 Total time: 0.089513 s
 File: example02.py
 Function: is_prime at line 1
 #      Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
 1                                           @profile
 2                                           def is_prime(num):
 3     86624      43305.0      0.5     48.4      for factor in range(2, int(num ** 0.5) + 1):
 4     85624      42814.0      0.5     47.8          if num % factor == 0:
 5      6918       3008.0      0.4      3.4              return False
 6      1000        386.0      0.4      0.4      return True
 ```
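Because `profile` only exists while the script runs under `kernprof -l`, running `example02.py` directly with `python` fails with a `NameError`. One common workaround, sketched here as an assumption rather than something the original file does, is to fall back to a no-op decorator when the name is missing.

```python
import builtins

# kernprof -l injects `profile` into builtins; when the script is run
# with plain `python`, fall back to a decorator that does nothing.
try:
    profile = builtins.profile
except AttributeError:
    def profile(func):
        return func


@profile
def is_prime(num):
    for factor in range(2, int(num ** 0.5) + 1):
        if num % factor == 0:
            return False
    return True


primes = [n for n in range(2, 30) if is_prime(n)]
print(primes)
```

With this guard the same file can be run normally during development and profiled with `kernprof` when needed.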
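 #### memory_profiler
 Add the `profile` decorator to any function whose memory usage you want to profile; the memory usage of every line of that function is then reported.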
 `example03.py`
 ```Python
@profile
 def eat_memory():
    items = []
    for _ in range(1000000):
        items.append(object())
    return items
 eat_memory()
 ```
 Install and use the third-party `memory_profiler` library.
 ```Bash
 pip install memory_profiler
 python3 -m memory_profiler example03.py
 Filename: example03.py
 Line #    Mem usage    Increment   Line Contents
 ================================================
     1   38.672 MiB   38.672 MiB   @profile
     2                             def eat_memory():
     3   38.672 MiB    0.000 MiB       items = []
     4   68.727 MiB    0.000 MiB       for _ in range(1000000):
     5   68.727 MiB    1.797 MiB           items.append(object())
     6   68.727 MiB    0.000 MiB       return items
 ```
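If installing a third-party package is not an option, the standard-library `tracemalloc` module gives a rough view of the same behaviour. This is a different technique from `memory_profiler` (it tracks allocations made by Python rather than the memory of the whole process), so the numbers will not match the report above. The `count` parameter is an assumption added to keep the sketch quick to run.

```python
import tracemalloc


def eat_memory(count=100000):
    """Allocate `count` small objects and keep them alive in a list."""
    items = []
    for _ in range(count):
        items.append(object())
    return items


tracemalloc.start()
items = eat_memory()
# current: bytes still allocated; peak: highest value seen since start()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f'current: {current / 1024 / 1024:.2f} MiB')
print(f'peak:    {peak / 1024 / 1024:.2f} MiB')
```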
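 ### How to Build Well-Rounded Professional Skills
 #### Learning Summary
 1. Get the big picture
 2. Determine the scope
 3. Define the goal
 4. Find resources
 5. Create a learning plan
 6. Filter the resources
 7. Start learning, and go only as deep as you need (YAGNI)
 8. Get hands-on and learn by playing
 9. Master the material and put it to use
 10. Teach others to consolidate what you have learned
 #### Time Management
 1. Improve your focus
 2. Make full use of fragmented time
 3. Use the Pomodoro Technique
 4. Find out where your time is being wasted
 5. Any action is better than no action
   ![](res/action.png)
 #### Recommended Books
 1. Career planning: 《软技能 - 代码之外的生存指南》
 2. Books by Wu Jun: 《浪潮之巅》, 《硅谷之谜》, 《数学之美》, …
 3. Time management: 《成为一个更高效的人》, 《番茄工作法图解》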
--- a/番外篇/Python数据分析师面试题.md
+++ b/番外篇/Python数据分析师面试题.md
@ -0,0 +1,17 @@
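 ## Python Data Analyst Interview Questions
 ### Fundamentals
 ### Programming Skills
 ### Business Projects
 1. The company's metric X has dropped noticeably in recent weeks. Explain how you would systematically analyze the cause of the decline.
 2. The company released a new version of its app that changed feature X. Explain how you would evaluate the effect of the change.
 3. The company ran a marketing campaign to acquire new users for its app. Explain how you would evaluate the effect of the campaign.
 4. Describe what you usually take into account when designing a data report.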
 5. 
--- a/番外篇/Python面试题汇总.md
+++ b/番外篇/Python面试题汇总.md
--- a/番外篇/知乎问题回答.md
+++ b/番外篇/知乎问题回答.md
@ -7,14 +7,14 @@
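 > Note: the data below is based on the major job-recruitment portals and 职友集.
 | Position                                       | Required skills                                              | Hiring demand |
-| ---------------------------------------------- | ------------------------------------------------------------ | ---------------- |
+| ---------------------------------------------- | ------------------------------------------------------------ | ---------- |
-| Python backend developer                       | Python fundamentals<br>Django / Flask / Tornado / Sanic<br>RESTful / API documentation writing<br>MySQL / Redis / MongoDB / ElasticSearch<br>Linux / Git / Scrum / PyCharm | High |
+| Python backend developer                       | Python fundamentals<br>Django / Flask / Tornado / Sanic<br>RESTful / API documentation writing<br>MySQL / Redis / MongoDB / ElasticSearch<br>Linux / Git / Scrum / PyCharm | Moderate |
 | Python web crawler developer                   | Python fundamentals<br>Common standard and third-party libraries<br>Scrapy / PySpider<br>Selenium / Appium<br>Redis / MongoDB / MySQL<br>Front end / HTTP(S) / packet capture tools | Relatively low |
-| Python quantitative trading developer          | Python fundamentals<br>Data structures / algorithms / design patterns<br>NoSQL (key-value databases)<br>Finance (margin trading and securities lending, options, futures, stocks) / digital currencies | Relatively high (tier-1 cities) |
+| Python quantitative trading developer          | Python fundamentals<br>Data structures / algorithms / design patterns<br>NoSQL (key-value databases)<br>Finance (margin trading and securities lending, options, futures, stocks) / digital currencies | Moderate |
-| Python data analysis engineer /<br>Python machine learning engineer | Degree in statistics / mathematics / computer science<br>Python fundamentals / algorithm design<br>SQL / NoSQL / Hive / Hadoop / Spark<br>NumPy / Scikit-Learn / Pandas / Seaborn<br>PyTorch / Tensorflow / OpenCV | Relatively high (tier-1 cities) |
+| Python data analysis engineer /<br>Python machine learning engineer | Degree in statistics / mathematics / computer science<br>Python fundamentals / algorithm design<br>SQL / NoSQL / Hive / Hadoop / Spark<br>NumPy / Scikit-Learn / Pandas / Seaborn<br>PyTorch / Tensorflow / OpenCV | High |
 | Python test automation engineer                | Python fundamentals / unit testing / software testing fundamentals<br>Linux / Shell / JIRA / ZenTao (禅道) / Jenkins / CI / CD<br>Selenium / Robot Framework / Appium<br>ab / sysbench / JMeter / LoadRunner / QTP | High |
-| Python operations automation engineer          | Python fundamentals / Linux / Shell <br>Fabric / Ansible / Playbook<br>Zabbix / Saltstack / Puppet<br>Docker / paramiko | Relatively high (tier-1 cities) |
+| Python operations automation engineer          | Python fundamentals / Linux / Shell <br>Fabric / Ansible / Playbook<br>Zabbix / Saltstack / Puppet<br>Docker / paramiko | High |
-| Python cloud platform developer                | Python fundamentals<br>OpenStack / CloudStack<br>Ovirt / KVM<br>Docker / K8S | Relatively low (tier-1 cities) |
+| Python cloud platform developer                | Python fundamentals<br>OpenStack / CloudStack<br>Ovirt / KVM<br>Docker / K8S | Relatively low |
 Once you know which direction you want to pursue, you can start studying in a targeted way. Below is a list of recommended books.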
@ -14,6 +14,10 @@ The core Pandas data types are `Series` and `DataFrame`, used to handle one-dimensional
 #### Plotting charts
 #### Using Index