What to compare two excel in Python?

  1. if it is about aggregate data -- copy and paste (minus)

  2. if it is about detail data -- try datacompy in python

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    import datacompy # pip install datacompy
    import pandas as pd
    import sys

    files = []
    try:
    files = [sys.argv[1], sys.argv[2]]
    sheet = int(sys.argv[3])
    except:
    raise Exception('No Input')

    df1 = pd.read_excel(files[0], sheet_name=sheet)
    df2 = pd.read_excel(files[1], sheet_name=sheet)
    join_columns = ['xxx'] # key to join

    compare = datacompy.Compare(df1,
    df2,
    join_columns=join_columns,
    rel_tol=0.01, # difference between two float <= 0.01
    df1_name='data 1',
    df2_name='data 2'
    )
    print(compare.matches()) # return bool
    print(compare.report()) # return detailed report

    what it does is joining two tables and comparing each column and each row.