Class Notes of CSCI 515 Data Engineering and Management (Spring 2024)
=====================================================================

Thursday, April 04, 2024
--------------------------
log_2(x) = log_10(x) / log_10(2)

Information gain(i) = Entropy of parent table D - Σ( |k|/|n| × Entropy of each value k of subset table S_i )
  where |k| = number of rows of D with value k of attribute i, and |n| = number of rows of D.
  (The same formula is used with Gini Index or Classification Error in place of Entropy.)

Entropy
  Gain(owner)  = 0.722 - (3 / 5) x 0.918 = 0.722 - 0.5508 = 0.1712
  Gain(gender) = 0.722 - (2 / 5) x 1.0   = 0.722 - 0.4    = 0.322

Classification Error
  Gain(Travel) = 0.60 - 5/10 x 0.20 = 0.60 - 0.10 = 0.50
Gini
  Gain(Travel) = 0.66 - 5/10 x 0.32 = 0.66 - 0.16 = 0.50
Entropy
  Gain(Travel) = 1.571 - 5/10 x 0.722 = 1.571 - 0.361 = 1.21

P(bus)   = 4 / 5 = 0.8
P(car)   = 0 / 5 = 0.0
P(train) = 1 / 5 = 0.2

Classification Error = 1 - max{p_j} = 1 - 0.8 = 0.2
Gini Index = 1 - Σ(p_j^2) for all j = 1 - (0.8x0.8 + 0.2x0.2) = 1 - 0.68 = 0.32
Entropy = Σ[-p_j(log_2(p_j))] for all j   (P(car) = 0 contributes 0)
        = -0.8xlog_10(0.8)/log_10(2) - 0.2xlog_10(0.2)/log_10(2)
        = 0.8x0.097/0.3 + 0.2x0.7/0.3
        = 0.259 + 0.467 ≈ 0.72

P(bus)   = 4 / 10 = 0.4
P(car)   = 3 / 10 = 0.3
P(train) = 3 / 10 = 0.3

Classification Error = 1 - max{p_j} = 1 - 0.4 = 0.6
Gini Index = 1 - Σ(p_j^2) for all j = 1 - (0.4x0.4 + 0.3x0.3 + 0.3x0.3) = 1 - 0.34 = 0.66
Entropy = Σ[-p_j(log_2(p_j))] for all j
        = -0.4x(log_2(0.4)) + -0.3x(log_2(0.3)) + -0.3x(log_2(0.3))
        = -0.4xlog_10(0.4)/log_10(2) + -0.3xlog_10(0.3)/log_10(2) + -0.3xlog_10(0.3)/log_10(2)
        = -0.4x(-0.4)/0.3 + -0.3x(-0.52)/0.3 + -0.3x(-0.52)/0.3
        = 0.533 + 1.04 ≈ 1.571

Entropy = Σ[-p_j(log_2(p_j))] for all j
Gini Index = 1 - Σ(p_j^2) for all j
Classification Error = 1 - max{p_j}

Thursday, February 22, 2024
--------------------------
    [  0    1    0    0  ]
H = [  0    0    1    0  ]
    [ 1/2   0    0   1/2 ]
    [  0    0    0    0  ]

S = H + dW   (d marks the dangling node, row 4 of H, which has no out-links; W is the uniform row)

    [  0    1    0    0  ]   [ 0 ]
  = [  0    0    1    0  ] + [ 0 ] x [ 1/4  1/4  1/4  1/4 ]
    [ 1/2   0    0   1/2 ]   [ 0 ]
    [  0    0    0    0  ]   [ 1 ]

    [  0    1    0    0  ]   [  0    0    0    0  ]
  = [  0    0    1    0  ] + [  0    0    0    0  ]
    [ 1/2   0    0   1/2 ]   [  0    0    0    0  ]
    [  0    0    0    0  ]   [ 1/4  1/4  1/4  1/4 ]

    [  0    1    0    0  ]
  = [  0    0    1    0  ]
    [ 1/2   0    0   1/2 ]
    [ 1/4  1/4  1/4  1/4 ]

G = αS + (1 - α)Iv   (α = 0.85; here I is the 4x1 column of ones, not the identity matrix)

           [  0    1    0    0  ]          [ 1 ]
  = 0.85 x [  0    0    1    0  ] + 0.15 x [ 1 ] x [ 0.25  0.25  0.25  0.25 ]
           [ 1/2   0    0   1/2 ]          [ 1 ]
           [ 1/4  1/4  1/4  1/4 ]          [ 1 ]

    [  3/80  71/80   3/80   3/80 ]
  = [  3/80   3/80  71/80   3/80 ]
    [ 37/80   3/80   3/80  37/80 ]
    [  1/4    1/4    1/4    1/4  ]

                              [  3/80  71/80   3/80   3/80 ]
P1 = [ 1/4  1/4  1/4  1/4 ] x [  3/80   3/80  71/80   3/80 ]
                              [ 37/80   3/80   3/80  37/80 ]
                              [  1/4    1/4    1/4    1/4  ]

   = [ 1/4x(3/80+3/80+37/80+1/4) ... ]
   = [ 0.197 ... ]

(1x4) x (4x4) => (1x4)

π(k) = [ 0.21 0.26 0.31 0.21 ] ≈ π (PageRank vector, reached by repeating P(k+1) = P(k) x G until it converges)

Tuesday, February 13, 2024
--------------------------
| 0 | --> a the has
| 1 | --> if be from
| 2 | --> then is so
| 3 | --> on and or at
...

Tuesday, January 30, 2024
-------------------------
Slide 4.3
---------
shell> cat test.php

shell> php test.php
5 $x

Slide 4.6
---------
$array[0] = 0
$array[1] = 1
$array[2] = 2
$array[3] = 3
$array[4] = 4

$total = 0
       = 0 + 1 + 2 + 3 + 4 = 10

Slide 4.6
---------
rwx rwx rwx
--- --- ---
 S   G   O

r: read   w: write   x: execute
S: Self   G: Group   O: Others (the Web)

shell> chmod 755 test.html

 S   G   O
--- --- ---
rwx rwx rwx
--- --- ---
111 101 101   (101 = 1*2**2 + 0*2**1 + 1*2**0 = 5)
--- --- ---
 7   5   5

=> Self: rwx (read, write, execute); Group and Others: r-x (read, execute)
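The impurity, information-gain, PageRank and chmod arithmetic worked by hand above can be re-checked with a short Python sketch. It is not from the course materials: the function names (`probs`, `classification_error`, `gini`, `entropy`, `gain`, `google_matrix`, `power_iteration`, `perm_digit`) are hypothetical, and the split used for Gain(Travel) is an assumption. The notes only imply that one subset is (4 bus, 0 car, 1 train) and that the other five rows contribute zero impurity, so they are modeled here as the hypothetical pure subsets (0, 3, 0) and (0, 0, 2). For PageRank, both W and v are taken to be the uniform row [1/4 1/4 1/4 1/4], as in the February 22 notes.

```python
import math

# ---- Impurity measures and information gain (April 04) ----
# A node is a tuple of class counts, e.g. (bus, car, train).

def probs(counts):
    """Class probabilities p_j = count_j / total."""
    n = sum(counts)
    return [c / n for c in counts]

def classification_error(counts):
    # 1 - max{p_j}
    return 1 - max(probs(counts))

def gini(counts):
    # 1 - sum of p_j^2 over all classes j
    return 1 - sum(p * p for p in probs(counts))

def entropy(counts):
    # sum of -p_j * log2(p_j); a class with p_j = 0 contributes 0.
    # math.log2(p) gives the same value as math.log10(p) / math.log10(2).
    return sum(-p * math.log2(p) for p in probs(counts) if p > 0)

def gain(impurity, parent, children):
    """impurity(D) - sum over subsets S_k of |S_k|/|D| * impurity(S_k)."""
    n = sum(parent)
    return impurity(parent) - sum(sum(c) / n * impurity(c) for c in children)

parent = (4, 3, 3)  # 4 bus, 3 car, 3 train
# Hypothetical split: (4, 0, 1) plus pure subsets for the remaining 5 rows.
children = [(4, 0, 1), (0, 3, 0), (0, 0, 2)]

for name, f in [("Classification Error", classification_error),
                ("Gini", gini), ("Entropy", entropy)]:
    print(f"{name}: parent={f(parent):.3f} gain={gain(f, parent, children):.3f}")

# ---- PageRank by power iteration (February 22) ----

def google_matrix(H, alpha=0.85):
    """G = alpha*S + (1 - alpha)*(column of ones)*v, where S = H + dW."""
    n = len(H)
    v = [1 / n] * n  # uniform row, used here for both W and v
    G = []
    for row in H:
        s_row = row if sum(row) > 0 else v  # dangling (all-zero) row -> W
        G.append([alpha * s + (1 - alpha) * vj for s, vj in zip(s_row, v)])
    return G

def power_iteration(G, iters=100):
    """Start from the uniform row vector and repeat P(k+1) = P(k) x G."""
    n = len(G)
    p = [1 / n] * n
    for _ in range(iters):
        # (1 x n) row vector times (n x n) matrix gives a (1 x n) row vector
        p = [sum(p[i] * G[i][j] for i in range(n)) for j in range(n)]
    return p

H = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0.5, 0, 0, 0.5],
     [0, 0, 0, 0]]
G = google_matrix(H)
print("P1:", [round(x, 3) for x in power_iteration(G, 1)])
print("PageRank:", [round(x, 2) for x in power_iteration(G)])

# ---- chmod digits (January 30) ----

def perm_digit(bits):
    """Binary rwx bits to one octal digit: '101' -> 1*2**2 + 0*2**1 + 1*2**0 = 5."""
    return int(bits, 2)

print(perm_digit("111"), perm_digit("101"), perm_digit("101"))
```

Run as-is, the gains should come out as 0.500 (classification error), 0.500 (Gini) and 1.210 (entropy), the first entry of P1 as 0.197, the converged vector as [0.21, 0.26, 0.31, 0.21], and the chmod digits as 7 5 5, matching the hand calculations. The pure-subset assumption only affects the gain lines; the parent impurities do not depend on it.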