生而为人

程序员的自我修养

0%

1
2
3
4
5
with temp_table as (
select '1' as f1, null as f2
union all
select '1', '3'
)

count distinct 结果是字段均不为null的去重值

所以在通过count distinct判断字段的重复情况时,注意把相关字段的null附上值

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
select count(*)
from temp_table
--
`count`(*)
2

select count(distinct f1, f2)
from temp_table
--
`count`(DISTINCT `f1`, `f2`)
1

select count(f1, f2)
from temp_table
--
`count`(`f1`, `f2`)
1

关联字段中有null时是关联不上的,即使左右都是null

1
2
3
4
5
6
7
8
9
select a.f1, a.f2, b.f1, b.f2
from temp_table a
left join temp_table b
on a.f1 = b.f1
and a.f2 = b.f2
--
f1 f2 f1 f2
1 NULL NULL NULL
1 3 1 3

sum中包含null,不会影响sum的值

1
2
3
4
5
6
7
8
select sum(ct)
from(select 1 as ct
union all
select null)a

--
`sum`(`ct`)
1

1
2
3
select count(distinct seller_bu_id) as ct1, 
count(distinct seller_bu_id, management_city_id) as ct2
from mart_caterb2b.dim_seller_management_city_info
  1. 如果 ct1 = ct2,说明seller_bu_id:management_city_id是n:1的关系,n是1还是多,需要继续探查
  2. 如果 ct1 < ct2, 说明seller_bu_id:management_city_id是1:n的关系,n是多
  3. 如果 ct1 > ct2, 说明management_city_id中有null值,比例关系需要去掉这些null值重新计算。

Update

1. 单条更新

2. 批量更新

1
2
3
4
5
6
7
UPDATE mytable 
SET myfield = CASE other_field
WHEN 1 THEN 'value'
WHEN 2 THEN 'value'
WHEN 3 THEN 'value'
END
WHERE id IN (1,2,3)

Select

1. limit

1
2
语句1:select * from student limit 9,4 //第一个参数表示从该参数的下一条数据开始,第二个参数表示每次返回的数据条数。
语句2:slect * from student limit 4 offset 9

mysql limit分页

1. 时间类型字段问题

1.1 时区问题

现象一:写入时间在mysql中查看,与实际时间相差几个小时,程序读出来又是正常的

关于MySQL存入的时间和取出时间不一致的解决

MySQL数据时区问题,及datetime和timestamp类型存储的差异

Time Zone:GMT,UTC,DST,CST

MySQL Config–参数system_time_zone和参数time_zone

查询系统分区:

1
show global variables like'%time_zone%'

解决方案:

1
jdbc:mysql://10.48.204.231:5002/waimai_hubble_analysis_test?useUnicode=true&useSSL=false&serverTimezone=GMT%2B8

[toc]

find

grep

linux使用find命令和grep命令查找文件和文件内容

awk

curl

如果有特殊字符,导致解析失败

  1. 用单引号引起来,这样会原封不动的使用,而不会尝试去解析

    1
    curl -k -v -X POST "https://xxx.net/livy/batches" -u 'admin:xxx!QAZ' -H "Content-Type: application/json" -H "X-Requested-By: admin" -d '{"file":"/users/jingqicao/sparkSubmission/SAMStreaming/ver-01/bin/SAM-1.0-SNAPSHOT.jar", "driverMemory": "30G", "driverCores": 4, "executorCores": 14, "executorMemory": "98G", "numExecutors": 256, "name": "SAMStreaming-ver01-1020-02", "className":"com.microsoft.sam.SAMJobRunner", "args":["ver-01-1020-02","/users/jingqicao/sparkSubmission/SAMStreaming/ver-01/bin/SAMJob_4HD.conf"]}'

nautilus

ps

查看当前进程

1
2
ps aux
ps -ef

查看进程父子关系

1
2
3
4
5
6
7
8
9
10
11
# 1. 
ps -o pid,ppid,cmd | grep xxx

# 2.
ps l | grep xxx

# 3. 保底方式
for pid in $(ps w | grep network_monitor.sh | grep -v grep | awk '{print $1}'); do
ppid=$(awk '/PPid/ {print $2}' /proc/$pid/status 2>/dev/null)
echo "PID $pid, PPID $ppid"
done