Class Notes of CSCI 515 Data Engineering and Management (Spring 2025)
=====================================================================

Thursday, April 10, 2025
------------------------

Entropy = Σ[-p_j × log2(p_j)] for all j
(below, "log" is log base 10, so log2(p) = log p / log 2 with log 2 ≈ 0.301)

P(Bus) = 0.2
P(Car) = 0.4
P(Train) = 0.4

Entropy = -0.2×log2(0.2) - 2×0.4×log2(0.4)
        = -0.2×(log 0.2 / log 2) - 2×0.4×(log 0.4 / log 2)
        = -0.2×(-0.699/0.301) - 0.8×(-0.398/0.301)
        = 0.464 + 1.058
        = 1.522

Gini Index = 1 - Σ(p_j^2)
           = 1 - (0.2×0.2 + 0.4×0.4×2)
           = 1 - 0.04 - 0.32
           = 0.64

Classification Error = 1 - max{p_j} = 1 - 0.4 = 0.6

Information gain(i) = Entropy of parent table D - Σ( |k|/|n| × Entropy of each value k of subset table S_i )

Gain(Entropy) = 1.571 - 5/10 × 0.722 - 0 = 1.210

P(Bus) = 0.8
P(Car) = 0
P(Train) = 0.2

Entropy = -0.8×log2(0.8) - 0.2×log2(0.2)
        = -0.8×(log 0.8 / log 2) - 0.2×(log 0.2 / log 2)
        = -0.8×(-0.097/0.301) - 0.2×(-0.699/0.301)
        = 0.258 + 0.464
        = 0.722

Gini Index = 1 - Σ(p_j^2)
           = 1 - (0.8×0.8 + 0.2×0.2)
           = 1 - 0.64 - 0.04
           = 0.32

Classification Error = 1 - max{p_j} = 1 - 0.8 = 0.2

Entropy = Σ[-p_j × log2(p_j)] for all j

P(Bus) = 0.4
P(Car) = 0.3
P(Train) = 0.3

Entropy = -0.4×log2(0.4) - 2×0.3×log2(0.3)
        = -0.4×(log 0.4 / log 2) - 2×0.3×(log 0.3 / log 2)
        = -0.4×(-1.322) - 0.6×(-1.737)
        = 0.529 + 1.042
        = 1.571

Gini Index = 1 - Σ(p_j^2)
           = 1 - (0.4×0.4 + 0.3×0.3×2)
           = 1 - 0.16 - 0.18
           = 0.66

Classification Error = 1 - max{p_j} = 1 - 0.4 = 0.6

Tuesday, March 04, 2025
-----------------------

0.166 + 0.071 + 0.045 + 0.023 = 0.305
0.061 × 5 = 0.305

1. H (Web Hyperlink Matrix)

       0    1    0    0
H = [  0    0    1    0  ]
      1/2   0    0   1/2
       0    0    0    0

2. Dangling Node Fix

S = H + dw    (4×1) × (1×4)

       0    1    0    0         0
S = [  0    0    1    0  ] + [  0  ] × [ 1/4 1/4 1/4 1/4 ]
      1/2   0    0   1/2        0
       0    0    0    0         1

       0    1    0    0         0    0    0    0
S = [  0    0    1    0  ] + [  0    0    0    0  ]
      1/2   0    0   1/2        0    0    0    0
       0    0    0    0        1/4  1/4  1/4  1/4

       0    1    0    0
S = [  0    0    1    0  ]
      1/2   0    0   1/2
      1/4  1/4  1/4  1/4

3. Google Matrix

G = αS + (1 - α)1v    (α = 0.85; 1 is the 4×1 column of ones, v is the 1×4 row vector)

              0    1    0    0                1
G = 0.85 × [  0    0    1    0  ] + 0.15 × [  1  ] × [ 1/4 1/4 1/4 1/4 ]
             1/2   0    0   1/2               1
             1/4  1/4  1/4  1/4               1

              0    1    0    0               1/4  1/4  1/4  1/4
  = 0.85 × [  0    0    1    0  ] + 0.15 × [ 1/4  1/4  1/4  1/4 ]
             1/2   0    0   1/2              1/4  1/4  1/4  1/4
             1/4  1/4  1/4  1/4              1/4  1/4  1/4  1/4

       3/80  71/80   3/80   3/80
  = [  3/80   3/80  71/80   3/80 ]
      37/80   3/80   3/80  37/80
       1/4    1/4    1/4    1/4

                         3/80  71/80   3/80   3/80
[ 1/4 1/4 1/4 1/4 ] × [  3/80   3/80  71/80   3/80 ] = [ 0.197 ...
                        37/80   3/80   3/80  37/80
                         1/4    1/4    1/4    1/4

abaabb => 0 => a => 1 => b => 2 => a => 1 => a => 1 => b => 2 => b => 3

$array[0] = 0
$array[1] = 1
$array[2] = 2
$array[3] = 3
$array[4] = 4
$total = 10

page 1   page 2   page 3   ...   page 500

query = decision tree

Apache:        10, 23, 379
ASP.NET:       20, 56, 379
decision tree: 10, 200, 340
file:          ...
...
X platform:    ...
yacht:         ...

shell> ls -al note.txt
-rwx------. 1 wen.chen.hu domain users 181 Jan 12 04:48 note.txt*

 O  G  E
111000000 = 700 = 111 000 000
rwxrwxrwx         --- --- ---
                   7   0   0

O: Owner permission (you)
G: Group members permission
E: Everyone else permission (including the Web)

shell> chmod 755 note.txt
-rwxr-xr-x. 1 wen.chen.hu domain users 181 Jan 12 04:48 note.txt*

111101101 = 755 = 111 101 101
rwxrwxrwx         --- --- ---
                   7   5   5

-----------------------------------------------------------

mysql -h undcemmysql.mysql.database.azure.com -u wenchen -p
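
The April 10 impurity calculations (entropy, Gini index, classification error, information gain) can be checked with a short Python sketch. The function names are illustrative only. The two pure subsets with weights 0.3 and 0.2 are an assumption chosen to be consistent with the parent distribution and with the "- 0" term in the gain calculation; the notes themselves only give the 5/10 subset.

```python
import math

def entropy(probs):
    """Entropy in bits: sum of -p_j * log2(p_j); classes with p_j = 0 are skipped."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini index: 1 - sum of p_j squared."""
    return 1 - sum(p * p for p in probs)

def classification_error(probs):
    """Classification error: 1 - max p_j."""
    return 1 - max(probs)

def information_gain(parent, subsets):
    """subsets is a list of (weight, probs) pairs, weight = |subset| / |parent|."""
    return entropy(parent) - sum(w * entropy(p) for w, p in subsets)

# Distributions from the notes, ordered as (Bus, Car, Train).
parent = [0.4, 0.3, 0.3]
cheap = [0.8, 0.0, 0.2]

# Assumed pure subsets (entropy 0) covering the remaining 5/10 of the rows.
subsets = [(0.5, cheap), (0.3, [0.0, 1.0, 0.0]), (0.2, [0.0, 0.0, 1.0])]

gain = information_gain(parent, subsets)
print(round(entropy(parent), 3), round(entropy(cheap), 3), round(gain, 3))
```

Using `math.log2` directly avoids the rounding drift that comes from dividing two rounded base-10 logarithms by hand.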
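
The three March 04 steps (H, dangling node fix, Google matrix) followed by one multiplication by the starting vector can be sketched in plain Python. This is a minimal sketch, assuming α = 0.85 and the uniform vector v = [1/4 1/4 1/4 1/4] as in the notes; `google_matrix` and `step` are hypothetical helper names, not part of any library.

```python
def google_matrix(H, alpha=0.85):
    """Build G = alpha*S + (1 - alpha)*1v with a uniform row vector v."""
    n = len(H)
    v = [1.0 / n] * n
    G = []
    for row in H:
        # Dangling node fix: an all-zero row of H is replaced by v to form S.
        s_row = row if sum(row) > 0 else v
        G.append([alpha * s + (1 - alpha) * vj for s, vj in zip(s_row, v)])
    return G

def step(pi, G):
    """One power-iteration step: row vector pi times matrix G."""
    n = len(G)
    return [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]

# Hyperlink matrix H from the notes; the fourth page is a dangling node.
H = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]

G = google_matrix(H)
pi0 = [0.25, 0.25, 0.25, 0.25]
pi1 = step(pi0, G)
print([round(x, 3) for x in pi1])  # first entry is 0.197, as in the notes
```

Calling `step` repeatedly on its own output continues the power iteration; the sketch stops after the single step shown in the notes.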
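
The mapping between octal modes such as 700 or 755 and the rwx strings printed by ls can be reproduced with a small Python sketch. The helper names `mode_to_rwx` and `mode_to_bits` are made up for illustration; the code only handles the three owner/group/everyone digits discussed in the notes.

```python
def mode_to_rwx(octal_mode):
    """Convert an octal mode string such as '755' to 'rwxr-xr-x'."""
    parts = []
    for digit in octal_mode:
        bits = int(digit, 8)  # one octal digit = three permission bits
        parts.append("".join(
            flag if bits & mask else "-"
            for flag, mask in (("r", 4), ("w", 2), ("x", 1))
        ))
    return "".join(parts)

def mode_to_bits(octal_mode):
    """Convert an octal mode string such as '755' to '111 101 101'."""
    return " ".join(format(int(digit, 8), "03b") for digit in octal_mode)

for mode in ("700", "755"):
    print(mode, mode_to_bits(mode), mode_to_rwx(mode))
```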