[toc]
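File and directory comparison
When auditing code or verifying backups, we often need to check that a source directory and a target directory are consistent.
Python's standard library already ships a module for this purpose: filecmp.
filecmp can compare individual files, directories, and recursively traversed subdirectories.
For example, it can report the files or subdirectories that the target directory has beyond the original,
and it checks whether files with the same name are really the same file (a content-level comparison).
Python 2.3 and later include the filecmp module by default, so nothing extra needs to be installed.
Common methods of the module
filecmp provides three operations: cmp (single-file comparison), cmpfiles (multi-file comparison),
and dircmp (directory comparison).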
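| Single-file comparison uses filecmp.cmp(f1, f2[, shallow]), which compares the files named f1 and f2. |
| It returns True if they appear equal and False otherwise. shallow defaults to True, which means that |
| files whose os.stat() signatures (file type, size, and modification time) are identical are treated |
| as equal without reading their contents. When shallow is False, the file contents are compared. |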
| |
| ➜ test echo "good user" > f1 |
| ➜ test echo "good user" > f3 |
| ➜ test echo "good user2" > f2 |
| In [2]: import filecmp |
| |
| In [3]: filecmp.cmp("/Users/macbookdzsbe/Desktop/MyPython/test/f1","/Users/macbookdzsbe/Desktop/MyPython/test/f3") |
| Out[3]: True |
| |
| In [4]: filecmp.cmp("/Users/macbookdzsbe/Desktop/MyPython/test/f1","/Users/macbookdzsbe/Desktop/MyPython/test/f2") |
| Out[4]: False |
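The effect of the shallow flag can be sketched as follows. This is only an illustration under assumptions not in the original: it targets Python 3, and the helper names write_file and compare_both_ways, the temporary directory, and the file names g1 and g2 are made up for the demo. The last pair has the same size and the same modification time but different contents, which is the case where a shallow comparison can be misleading.

```python
import filecmp
import os
import tempfile


def write_file(path, text):
    """Create a small text file (hypothetical helper for this sketch)."""
    with open(path, "w") as fh:
        fh.write(text)


def compare_both_ways(path_a, path_b):
    """Return (shallow_result, deep_result) for two files."""
    filecmp.clear_cache()  # do not reuse results cached by earlier calls
    shallow = filecmp.cmp(path_a, path_b)               # shallow=True is the default
    deep = filecmp.cmp(path_a, path_b, shallow=False)   # decide by file contents
    return shallow, deep


workdir = tempfile.mkdtemp()
f1, f2, f3 = (os.path.join(workdir, name) for name in ("f1", "f2", "f3"))
write_file(f1, "good user\n")
write_file(f3, "good user\n")
write_file(f2, "good user2\n")

# Same size, same mtime, different contents: identical os.stat() signatures.
g1, g2 = (os.path.join(workdir, name) for name in ("g1", "g2"))
write_file(g1, "aaaa")
write_file(g2, "bbbb")
for path in (g1, g2):
    os.utime(path, (1000000000, 1000000000))

print(compare_both_ways(f1, f3))  # (True, True)
print(compare_both_ways(f1, f2))  # (False, False)
print(compare_both_ways(g1, g2))  # (True, False)
```

For backup verification, where a false "equal" is costly, passing shallow=False is the safer choice at the price of reading both files.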
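Comparing a given list of files in dir1 and dir2
Multi-file comparison uses filecmp.cmpfiles(dir1, dir2, common[, shallow]), which returns three lists: matching files, mismatched files, and files that could not be compared.
The md5 checksums of the files in the two directories are shown below: f1 and f2 match, and f3 does not.
f4 and f5 each exist in only one of the directories, so they cannot be compared.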
| [root@test dir2] |
| 1a9dbd408f626389539e9feb1d234df7 ../dir1/f1 |
| 235dce31eebce0213322102f698bc249 ../dir1/f2 |
| 1a9dbd408f626389539e9feb1d234df7 ../dir1/f3 |
| d41d8cd98f00b204e9800998ecf8427e ../dir1/f5 |
| [root@test dir2] |
| 1a9dbd408f626389539e9feb1d234df7 f1 |
| 235dce31eebce0213322102f698bc249 f2 |
| 68e51baf228d447b199a75268dd5634b f3 |
| 764c976782dec989b043d08714ec21a5 f4 |
| |
| In [4]: import filecmp |
| |
| In [5]: filecmp.cmpfiles("/root/test/dir1", "/root/test/dir2",['f1','f2','f3','f4','f5']) |
| Out[5]: (['f1', 'f2'], ['f3'], ['f4', 'f5']) |
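The same three-way result can be reproduced in a self-contained way. The sketch below is not from the original session: it assumes Python 3, builds two throwaway directories with made-up contents, and uses a hypothetical helper make_tree. shallow=False is passed so the outcome depends only on file contents.

```python
import filecmp
import os
import tempfile


def make_tree(root, files):
    """Create root and write each name -> content pair into it (demo helper)."""
    os.makedirs(root)
    for name, content in files.items():
        with open(os.path.join(root, name), "w") as fh:
            fh.write(content)


base = tempfile.mkdtemp()
dir1 = os.path.join(base, "dir1")
dir2 = os.path.join(base, "dir2")
make_tree(dir1, {"f1": "alpha\n", "f2": "beta\n", "f3": "gamma\n", "f5": ""})
make_tree(dir2, {"f1": "alpha\n", "f2": "beta\n", "f3": "changed\n", "f4": "delta\n"})

# cmpfiles returns (match, mismatch, errors); names keep the order of the input list.
match, mismatch, errors = filecmp.cmpfiles(
    dir1, dir2, ["f1", "f2", "f3", "f4", "f5"], shallow=False
)
print("match:", match)        # match: ['f1', 'f2']
print("mismatch:", mismatch)  # mismatch: ['f3']
print("errors:", errors)      # errors: ['f4', 'f5']
```

A name lands in the errors list whenever the file is missing or unreadable on either side, so that list alone does not say which directory lacks the file.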
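Directory comparison uses the dircmp(a, b[, ignore[, hide]]) class to create a directory comparison object.
Here a and b are the directories to compare. ignore is a list of names to skip and defaults to ['RCS', 'CVS', 'tags'] (newer Python versions extend this default list);
hide is a list of names to hide and defaults to [os.curdir, os.pardir]. The dircmp class exposes detailed comparison results,
such as the files found only in a, the subdirectories present in both a and b, and the matching files; it also supports recursion.
dircmp provides three report methods: report() compares a and b only, report_partial_closure() also covers their immediate common subdirectories, and report_full_closure() recurses through all common subdirectories.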
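To make the ignore parameter concrete, here is a minimal sketch. It assumes Python 3, and the directory names left and right, the file names keep.txt and skip.log, and the helper touch are invented for the illustration. Note that a list passed as ignore replaces the default list rather than extending it.

```python
import filecmp
import os
import tempfile


def touch(path, text=""):
    """Write a small file (hypothetical helper for this sketch)."""
    with open(path, "w") as fh:
        fh.write(text)


base = tempfile.mkdtemp()
left = os.path.join(base, "left")
right = os.path.join(base, "right")
os.makedirs(left)
os.makedirs(right)
touch(os.path.join(left, "keep.txt"), "same\n")
touch(os.path.join(right, "keep.txt"), "same\n")
touch(os.path.join(left, "skip.log"), "noise\n")  # exists only on the left

plain = filecmp.dircmp(left, right)
filtered = filecmp.dircmp(left, right, ignore=["skip.log"])

print(plain.left_only)      # ['skip.log']
print(filtered.left_only)   # []
print(filtered.same_files)  # ['keep.txt']
```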
Comparing the differences between dir1 and dir2
| [root@test test] |
| mkdir: created directory "dir1" |
| mkdir: created directory "dir1/a" |
| mkdir: created directory "dir1/a/a1" |
| mkdir: created directory "dir1/a/b" |
| mkdir: created directory "dir1/a/b/b1" |
| mkdir: created directory "dir1/a/b/b2" |
| mkdir: created directory "dir1/a/b/b3" |
| mkdir: created directory "dir1/f1" |
| mkdir: created directory "dir1/f2" |
| mkdir: created directory "dir1/f3" |
| mkdir: created directory "dir1/f4" |
| [root@test test] |
| [root@test test] |
| dir1/ |
| ├── a |
| │   ├── a1 |
| │   └── b |
| │       ├── b1 |
| │       ├── b2 |
| │       └── b3 |
| ├── f1 |
| ├── f2 |
| ├── f3 |
| ├── f4 |
| └── test.py |
| [root@tf-test test] |
| mkdir: created directory "dir2" |
| mkdir: created directory "dir2/a" |
| mkdir: created directory "dir2/a/a1" |
| mkdir: created directory "dir2/a/b" |
| mkdir: created directory "dir2/a/b/b1" |
| mkdir: created directory "dir2/a/b/b2" |
| mkdir: created directory "dir2/a/b/b3" |
| mkdir: created directory "dir2/aa{" |
| mkdir: created directory "dir2/aa{/aa1}" |
| mkdir: created directory "dir2/f1" |
| mkdir: created directory "dir2/f2" |
| mkdir: created directory "dir2/f3" |
| mkdir: created directory "dir2/f5" |
| [root@tf-test test] |
| [root@tf-test test] |
| dir2/ |
| ├── a |
| │   ├── a1 |
| │   └── b |
| │       ├── b1 |
| │       ├── b2 |
| │       └── b3 |
| ├── aa{ |
| │   └── aa1} |
| ├── f1 |
| ├── f2 |
| ├── f3 |
| ├── f5 |
| └── test.py |
| [root@test test] |
| |
| |
| |
| import filecmp |
| a = "/root/test/dir1"  # left directory |
| b = "/root/test/dir2"  # right directory |
| |
| # The third argument is the ignore list; 'test.y' matches nothing here, so test.py is still compared |
| dirobj=filecmp.dircmp(a,b,['test.y']) |
| |
| # Print the three reports in turn |
| dirobj.report() |
| dirobj.report_partial_closure() |
| dirobj.report_full_closure() |
| |
| # print(...) with a single argument works on both Python 2 and Python 3 |
| print("left_list:" + str(dirobj.left_list)) |
| print("right_list" + str(dirobj.right_list)) |
| print("common:" + str(dirobj.common)) |
| print("left_only:" + str(dirobj.left_only)) |
| print("right_only:" + str(dirobj.right_only)) |
| print("common_dirs:" + str(dirobj.common_dirs)) |
| print("common_files:" + str(dirobj.common_files)) |
| print("common_funny:" + str(dirobj.common_funny)) |
| print("same_file:" + str(dirobj.same_files)) |
| print("diff_files:" + str(dirobj.diff_files)) |
| print("funny_files:" + str(dirobj.funny_files)) |
| diff /root/test/dir1 /root/test/dir2 |
| Only in /root/test/dir1 : ['f4'] |
| Only in /root/test/dir2 : ['aa{', 'f5'] |
| Identical files : ['test.py'] |
| Common subdirectories : ['a', 'f1', 'f2', 'f3'] |
| diff /root/test/dir1 /root/test/dir2 |
| Only in /root/test/dir1 : ['f4'] |
| Only in /root/test/dir2 : ['aa{', 'f5'] |
| Identical files : ['test.py'] |
| Common subdirectories : ['a', 'f1', 'f2', 'f3'] |
| |
| diff /root/test/dir1/a /root/test/dir2/a |
| Common subdirectories : ['a1', 'b'] |
| |
| diff /root/test/dir1/f1 /root/test/dir2/f1 |
| |
| diff /root/test/dir1/f2 /root/test/dir2/f2 |
| |
| diff /root/test/dir1/f3 /root/test/dir2/f3 |
| diff /root/test/dir1 /root/test/dir2 |
| Only in /root/test/dir1 : ['f4'] |
| Only in /root/test/dir2 : ['aa{', 'f5'] |
| Identical files : ['test.py'] |
| Common subdirectories : ['a', 'f1', 'f2', 'f3'] |
| |
| diff /root/test/dir1/a /root/test/dir2/a |
| Common subdirectories : ['a1', 'b'] |
| |
| diff /root/test/dir1/a/a1 /root/test/dir2/a/a1 |
| |
| diff /root/test/dir1/a/b /root/test/dir2/a/b |
| Common subdirectories : ['b1', 'b2', 'b3'] |
| |
| diff /root/test/dir1/a/b/b1 /root/test/dir2/a/b/b1 |
| |
| diff /root/test/dir1/a/b/b2 /root/test/dir2/a/b/b2 |
| |
| diff /root/test/dir1/a/b/b3 /root/test/dir2/a/b/b3 |
| |
| diff /root/test/dir1/f1 /root/test/dir2/f1 |
| |
| diff /root/test/dir1/f2 /root/test/dir2/f2 |
| |
| diff /root/test/dir1/f3 /root/test/dir2/f3 |
| left_list:['a', 'f1', 'f2', 'f3', 'f4', 'test.py'] |
| right_list['a', 'aa{', 'f1', 'f2', 'f3', 'f5', 'test.py'] |
| common:['a', 'f1', 'f2', 'f3', 'test.py'] |
| left_only:['f4'] |
| right_only:['aa{', 'f5'] |
| common_dirs:['a', 'f1', 'f2', 'f3'] |
| common_files:['test.py'] |
| common_funny:[] |
| same_file:['test.py'] |
| diff_files:[] |
| funny_files:[] |
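The attributes printed above describe only the top level of the two directories. One possible way to gather the same information for the whole tree is to follow the subdirs attribute, a dictionary that maps each common subdirectory name to its own dircmp object. The sketch below assumes Python 3; the function name collect_differences and the small sample tree (including the directory b9) are hypothetical and only loosely modelled on the layout above.

```python
import filecmp
import os
import tempfile


def collect_differences(dcmp, prefix=""):
    """Walk a dircmp object and return sorted relative paths per category."""
    result = {"left_only": [], "right_only": [], "diff_files": []}
    for key in result:
        for name in getattr(dcmp, key):
            result[key].append(os.path.join(prefix, name))
    # subdirs maps each common subdirectory name to a nested dircmp object
    for name, sub in dcmp.subdirs.items():
        nested = collect_differences(sub, os.path.join(prefix, name))
        for key in result:
            result[key].extend(nested[key])
    for key in result:
        result[key].sort()
    return result


# Sample tree: f4 exists only in dir1; f5 and a/b/b9 exist only in dir2.
base = tempfile.mkdtemp()
for rel in ("dir1/a/b", "dir1/f4", "dir2/a/b/b9", "dir2/f5"):
    os.makedirs(os.path.join(base, *rel.split("/")))

report = collect_differences(
    filecmp.dircmp(os.path.join(base, "dir1"), os.path.join(base, "dir2"))
)
print(report["left_only"])   # ['f4']
print(report["right_only"])  # two entries: a/b/b9 (platform separator) and f5
```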
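Verifying differences between a source directory and its backup
Sometimes we cannot be sure that a backup directory is consistent with its source, for example whether files or directories that were newly added
or updated in the source have been synchronized successfully. We want to verify this regularly and, where synchronization failed, back up just the affected items.
This example uses the left_only and diff_files attributes of filecmp.dircmp to recursively collect the updated items in the source directory,
then copies them with shutil.copyfile and os.makedirs so that the two directories end up consistent.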
| ➜ test cat dir_contrast.py |
| |
| |
| import os, sys |
| import filecmp |
| import shutil |
| |
| holderlist = []  # paths in the source tree that need to be synced |
| |
| def compareme(dir1, dir2): |
|     """Recursively collect items that are new or changed in dir1.""" |
|     dircomp = filecmp.dircmp(dir1, dir2) |
|     only_in_one = dircomp.left_only    # present only in the source directory |
|     diff_in_one = dircomp.diff_files   # present on both sides but different |
| |
|     for x in only_in_one + diff_in_one: |
|         holderlist.append(os.path.abspath(os.path.join(dir1, x))) |
|     # Descend into every subdirectory that both sides share |
|     for item in dircomp.common_dirs: |
|         compareme(os.path.abspath(os.path.join(dir1, item)), |
|                   os.path.abspath(os.path.join(dir2, item))) |
|     return holderlist |
| |
| def backup_path(item, dir1, dir2): |
|     """Map a path under dir1 onto the same relative path under dir2.""" |
|     # Plain path arithmetic avoids the regex pitfalls of re.sub on paths |
|     return os.path.join(dir2, os.path.relpath(item, dir1)) |
| |
| def main(): |
|     if len(sys.argv) > 2: |
|         dir1 = sys.argv[1] |
|         dir2 = sys.argv[2] |
|     else: |
|         print("Usage: %s datadir backupdir" % sys.argv[0]) |
|         sys.exit() |
| |
|     source_files = compareme(dir1, dir2) |
|     dir1 = os.path.abspath(dir1) |
|     dir2 = os.path.abspath(dir2) |
|     destination_files = [] |
|     createdir_bool = False |
| |
|     for item in source_files: |
|         destination_dir = backup_path(item, dir1, dir2) |
|         destination_files.append(destination_dir) |
|         # Create directories that exist only in the source |
|         if os.path.isdir(item): |
|             if not os.path.exists(destination_dir): |
|                 os.makedirs(destination_dir) |
|                 createdir_bool = True |
| |
|     if createdir_bool: |
|         # New directories were created, so scan again to pick up their contents |
|         del holderlist[:]  # reset the shared list to avoid duplicate entries |
|         source_files = compareme(dir1, dir2) |
|         destination_files = [backup_path(item, dir1, dir2) for item in source_files] |
| |
|     print("update item:") |
|     print(source_files) |
| |
|     # Copy every new or changed file to its place in the backup |
|     for src, dst in zip(source_files, destination_files): |
|         if os.path.isfile(src): |
|             shutil.copyfile(src, dst) |
| |
| if __name__ == '__main__': |
|     main() |
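After a sync like the one above, it can be useful to confirm the result by content rather than by os.stat() signatures, since dircmp compares common files shallowly by default. The following is a minimal sketch of such a check, assuming Python 3; the function name verify_backup and the sample tree are hypothetical and not part of the original script. It walks the source with os.walk and calls filecmp.cmpfiles with shallow=False for each directory.

```python
import filecmp
import os
import tempfile


def verify_backup(src, dst):
    """Return relative paths of files in src that are missing from dst
    or whose contents differ from the copy in dst."""
    problems = []
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = os.path.normpath(os.path.join(dst, rel))
        # errors also covers files (or whole directories) absent from dst
        _match, mismatch, errors = filecmp.cmpfiles(root, target, files, shallow=False)
        for name in mismatch + errors:
            problems.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(problems)


def write_file(path, text):
    """Write a small file (hypothetical helper for this sketch)."""
    with open(path, "w") as fh:
        fh.write(text)


# Sample data: b.txt differs in content only, c.txt is missing from the backup.
base = tempfile.mkdtemp()
src = os.path.join(base, "src")
dst = os.path.join(base, "dst")
os.makedirs(os.path.join(src, "sub"))
os.makedirs(os.path.join(dst, "sub"))
write_file(os.path.join(src, "a.txt"), "1")
write_file(os.path.join(dst, "a.txt"), "1")
write_file(os.path.join(src, "sub", "b.txt"), "2")
write_file(os.path.join(dst, "sub", "b.txt"), "X")
write_file(os.path.join(src, "sub", "c.txt"), "3")

print(verify_backup(src, dst))  # the two paths under sub: b.txt and c.txt
```

An empty list means every file in the source has a byte-identical counterpart in the backup; extra files that exist only in the backup are deliberately not reported.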