mysql大量数据去重
表 ads_info
- 一共有1kw+的数据, 其中有4w * 2 (按ad_id分组)是重复的, 需要去重
show index from ads_info
+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ads_info | 0 | PRIMARY | 1 | id | A | 16825292 | NULL | NULL | | BTREE | | |
| ads_info | 1 | ad_id | 1 | ad_id | A | 16825292 | NULL | NULL | YES | BTREE | | |
| ads_info | 1 | adset_id | 1 | adset_id | A | 16825292 | NULL | NULL | YES | BTREE | | |
| ads_info | 1 | account_id | 1 | account_id | A | 178992 | NULL | NULL | YES | BTREE | | |
+----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 强制给
ads_info
加一个UNIQUE KEY
ALTER IGNORE TABLE ads_info
ADD UNIQUE KEY(ad_id);
2 新建一个
ad_id
为UNIQUE KEY
表, 插入数据, 删除原表
create table `ads_info_c` like `ads_info`;
alter table `ads_info_c` add qunique key ad_id_u(ad_id);
INSERT IGNORE INTO ads_info_c (SELECT * FROM ads_info)
drop table ads_info;
RENAME TABLE ads_info_c TO ads_info;
-
批量插入mysql(防止unique key 报错中断)
已经有 unique key, 使用 ignore
INSERT IGNORE INTO `TABLE_NAME`(xx,xx,xx,..) VALUES(oo,oo,oo);
注意:
假如 一次批量insert 1000, 有一个重复,另外的999条数据会写入 , 但是ID会有1000的增量,
连续的ID会有断层.