Deduplicating Large Amounts of Data in MySQL

ads_info

  • The table holds 10 million+ rows, of which 40k * 2 (grouped by ad_id) are duplicates that need to be removed. Index layout from show index from ads_info (a query for locating the duplicates follows the listing below):

+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ads_info |          0 | PRIMARY    |            1 | id          | A         |    16825292 |     NULL | NULL   |      | BTREE      |         |               |
| ads_info |          1 | ad_id      |            1 | ad_id       | A         |    16825292 |     NULL | NULL   | YES  | BTREE      |         |               |
| ads_info |          1 | adset_id   |            1 | adset_id    | A         |    16825292 |     NULL | NULL   | YES  | BTREE      |         |               |
| ads_info |          1 | account_id |            1 | account_id  | A         |      178992 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
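Before deduplicating, it helps to confirm which ad_id values are actually duplicated. A minimal sketch using a grouped count (this query is an assumption, not part of the original notes):

-- List every ad_id that appears more than once, with its count.
SELECT ad_id, COUNT(*) AS cnt
FROM ads_info
GROUP BY ad_id
HAVING cnt > 1;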

1 Force a UNIQUE KEY onto ads_info directly

ALTER IGNORE TABLE ads_info
     ADD UNIQUE KEY(ad_id);
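Note that ALTER IGNORE TABLE silently drops the rows that would violate the new unique key, but it was deprecated in MySQL 5.6 and removed in 5.7, so this approach only works on older servers. On 5.7+ a common substitute is a self-join DELETE that keeps the row with the smallest id, followed by a plain ALTER; a sketch under that assumption:

-- Delete every duplicate except the copy with the smallest id.
DELETE a
FROM ads_info a
JOIN ads_info b
  ON a.ad_id = b.ad_id
 AND a.id > b.id;

-- With the duplicates gone, the key can be added without IGNORE.
ALTER TABLE ads_info ADD UNIQUE KEY ad_id_u (ad_id);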

2 Create a new table with a UNIQUE KEY on ad_id, copy the data over, then drop the original table


create table `ads_info_c` like `ads_info`;

alter table `ads_info_c` add unique key ad_id_u(ad_id);

INSERT IGNORE INTO ads_info_c SELECT * FROM ads_info;

drop table ads_info;

RENAME TABLE ads_info_c TO ads_info;
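Between drop table and RENAME TABLE there is a short window in which ads_info does not exist, so any query against it would fail. RENAME TABLE can swap several tables in one atomic statement, which closes that window; a sketch of that variant (the ads_info_old name is an arbitrary choice):

-- Atomically swap in the deduplicated table, keeping the
-- original as a backup until the new data is verified.
RENAME TABLE ads_info TO ads_info_old,
             ads_info_c TO ads_info;

DROP TABLE ads_info_old;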
  • Batch inserts into MySQL (preventing a unique key error from aborting the whole batch)

    Since the unique key is already in place, use IGNORE:


INSERT IGNORE INTO `TABLE_NAME`(xx,xx,xx,..) VALUES(oo,oo,oo);
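A concrete multi-row example against ads_info (the values are invented for illustration; the columns come from the index listing above):

-- The third row collides with the first on ad_id and is
-- silently skipped instead of aborting the whole statement.
INSERT IGNORE INTO ads_info (ad_id, adset_id, account_id)
VALUES (1001, 501, 42),
       (1002, 501, 42),
       (1001, 502, 42);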

Note:

If a single batch INSERT of 1000 rows contains one duplicate, the other 999 rows are still written, but the AUTO_INCREMENT counter still advances by 1000, leaving a gap in the otherwise consecutive IDs.
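A small demonstration of the gap, assuming a fresh InnoDB table with the default auto-increment lock mode (the table t and its values are hypothetical):

CREATE TABLE t (
  id INT AUTO_INCREMENT PRIMARY KEY,
  ad_id INT,
  UNIQUE KEY (ad_id)
);

-- The third row duplicates ad_id = 1 and is ignored,
-- but it still consumes auto-increment value 3.
INSERT IGNORE INTO t (ad_id) VALUES (1), (2), (1);

INSERT INTO t (ad_id) VALUES (3);

-- Typical result: ids 1, 2, 4 -- id 3 is the gap left by
-- the ignored duplicate.
SELECT id, ad_id FROM t;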