Week 12

Sets

Sets are collections of objects. Unlike lists, they are unordered, and every element of a set is unique (no repetitions).

We create sets by enclosing elements in braces:

s = {1, 2, 3}

print(s)

{1, 2, 3}

In order to convert a list or a string into a set we can use the set() function:

mylist = [1, 2, 3, 4, 3, 2, 1, 0]

myset = set(mylist)

print(myset)

{0, 1, 2, 3, 4}

t = 'mississippi'
t_set = set(t)

print(t_set)

{'s', 'i', 'p', 'm'}

s = {'a', 2, 'b', 3, 'hello'}

print(s)

{2, 3, 'a', 'hello', 'b'}

Checking if an element is in a set:

3 in s

True

'c' in s

False

Note. Checking if an element is in a set is usually much faster than checking if an element is in a list:

from time import time

mylist = list(range(10**7))
myset = set(mylist)

st = time()
for n in range(10**7, 10**7+10):
    print(n, n in mylist)
print(time()-st)

10000000 False
10000001 False
10000002 False
10000003 False
10000004 False
10000005 False
10000006 False
10000007 False
10000008 False
10000009 False
1.182452917098999

st = time()
for n in range(10**7, 10**7+10):
    print(n, n in myset)
print(time()-st)

10000000 False
10000001 False
10000002 False
10000003 False
10000004 False
10000005 False
10000006 False
10000007 False
10000008 False
10000009 False
0.0020880699157714844
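The gap comes from how the two containers are searched: a list is scanned element by element, while a set is a hash table and can look up an element directly. A minimal sketch of the same comparison using the standard timeit module follows; the names as_list, as_set and missing are chosen here for illustration, and the actual timings depend on the machine.

```python
from timeit import timeit

items = list(range(10**5))
as_list = items
as_set = set(items)
missing = 10**5  # not in either container: the worst case for a list scan

# Repeat each membership test 100 times and report the total time in seconds.
list_time = timeit(lambda: missing in as_list, number=100)
set_time = timeit(lambda: missing in as_set, number=100)

print("list:", list_time)
print("set: ", set_time)
```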

for loops work with sets (the order in which elements are visited is arbitrary):

print(s)

{2, 3, 'a', 'hello', 'b'}

for x in s:
    print(x)

2
3
a
hello
b

Set operations

s1 = {'a', 'b', 'c', 'd'}
s2 = {'c', 'd', 'e', 'f'}

Union of sets:

s = s1 | s2

print(s)

{'e', 'a', 'd', 'f', 'b', 'c'}

Intersection of sets:

s = s1 & s2
print(s)

{'c', 'd'}

Difference of sets:

s = s1-s2
print(s)

{'a', 'b'}

s = s2-s1
print(s)

{'f', 'e'}

Symmetric difference (elements that are in either set but not in their intersection):

s = s1^s2
print(s)

{'e', 'a', 'f', 'b'}
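Each of these operators also has a method form. The sketch below uses two sets named left and right (names chosen here so that s1 and s2 are not overwritten) and checks that the operators and methods agree, and that the symmetric difference equals the union minus the intersection.

```python
left = {'a', 'b', 'c', 'd'}
right = {'c', 'd', 'e', 'f'}

# Operator form and method form give the same result.
print(left.union(right) == left | right)                  # True
print(left.intersection(right) == left & right)           # True
print(left.difference(right) == left - right)             # True
print(left.symmetric_difference(right) == left ^ right)   # True

# Symmetric difference = union minus intersection.
sym = (left | right) - (left & right)
print(sorted(sym))  # ['a', 'b', 'e', 'f']
```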

Adding an element to a set:

print(s1)

{'a', 'd', 'c', 'b'}

s1.add('x')

print(s1)

{'a', 'd', 'b', 'c', 'x'}

Removing an element from a set:

s1.discard('a')

print(s1)

{'d', 'b', 'c', 'x'}
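Sets also have a remove() method. The difference shows up when the element is missing: discard() does nothing, while remove() raises a KeyError. A short sketch, using a set named letters chosen here for illustration:

```python
letters = {'a', 'b', 'c'}

letters.discard('z')      # 'z' is not in the set: nothing happens
print(sorted(letters))    # ['a', 'b', 'c']

try:
    letters.remove('z')   # 'z' is not in the set: raises KeyError
    raised = False
except KeyError:
    raised = True

print(raised)             # True
```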

The pop() method removes an arbitrary element from a set and returns it (which element gets removed is not specified, so do not rely on it):

print(s1)

{'d', 'b', 'c', 'x'}

x = s1.pop()

print(s1)

{'b', 'c', 'x'}

print(x)

d

Checking if a set is a proper subset of another set:

s1 = {'a', 'b', 'c'}
s2 = {'a', 'b'}

s1 < s2

False

s2 < s1

True
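The < operator tests for a proper subset, so a set is never < itself. For a test that also allows equality, use <= or the issubset() method. A short sketch, with sets named big and small chosen here for illustration:

```python
big = {'a', 'b', 'c'}
small = {'a', 'b'}

print(small < big)            # True: proper subset
print(big < big)              # False: a set is not a proper subset of itself
print(big <= big)             # True: subset, equality allowed
print(small.issubset(big))    # True: same as small <= big
print(big > small)            # True: big is a proper superset of small
```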

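Note. Sets are mutable, so after s2 = s1 both names refer to the same set and a change made through one is visible through the other: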

s1 = {'a', 'b', 'c'}
s2 = s1

s2.discard('a')

print(s2)

{'c', 'b'}

print(s1)

{'c', 'b'}
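To get an independent set instead of a second name for the same one, make a copy with the copy() method. A minimal sketch, with names original, alias and clone chosen here for illustration:

```python
original = {'a', 'b', 'c'}
alias = original          # second name for the same set
clone = original.copy()   # new, independent set with the same elements

alias.discard('a')

print(alias is original)  # True
print(sorted(original))   # ['b', 'c']   changed through the alias
print(sorted(clone))      # ['a', 'b', 'c']   unaffected
```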

Note. Elements of a set must be hashable, which in practice means immutable objects. Lists are not allowed, but tuples and frozensets are:

s1 = {'a', 1, [1,2]}

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-46-6091d6037810> in <module>()
----> 1 s1 = {'a', 1, [1,2]}

TypeError: unhashable type: 'list'

s1 = {'a', 1, tuple([1,2])}

print(s1)

{'a', 1, (1, 2)}

s2 = frozenset(s1)

print(s2)

frozenset({'a', 1, (1, 2)})

s = {s2}

print(s)

{frozenset({'a', 1, (1, 2)})}
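The error message above mentions hashing because a set locates its elements by their hash values. The sketch below (variable names are chosen here for illustration) checks which objects can be hashed and then uses frozensets to build a set of sets.

```python
# Tuples of hashable objects are hashable; lists are not.
print(hash((1, 2)) == hash((1, 2)))   # True

try:
    hash([1, 2])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
print(list_is_hashable)               # False

# frozenset is an immutable set, so it can be an element of another set.
nested = {frozenset({1, 2}), frozenset({2, 3})}
print(frozenset({2, 1}) in nested)    # True: element order does not matter
```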

Note: Empty braces {} denote the empty dictionary. To create an empty set, use set():

s = set()

print(s)

set()

s.add(1)

print(s)

{1}

s1 = {'a', 'b'}
s2 = {'c', 'd'}
s = s1 & s2

print(s)

set()
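The difference between {} and set() can be checked directly with type(). A short sketch, with variable names chosen here for illustration:

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces).__name__)   # dict
print(type(empty_set).__name__)      # set
print(len(empty_set))                # 0
```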

Project 9: PageRank

from IPython.display import Image
Image("web.png", width=300)

System of equations for PageRank computations in the above network:

$$ \begin{cases} x_1 - x_2 - \frac{1}{2}x_4 = 0 \\ x_2 - \frac{1}{3} x_1 - \frac{1}{2}x_3 - \frac{1}{2}x_4 = 0 \\ x_3 - \frac{2}{3}x_1 = 0 \\ x_4 - \frac{1}{2}x_3= 0 \\ x_1 + x_2 + x_3 + x_4 = 1 \\ \end{cases} $$

Matrix equation:

$$ \begin{bmatrix} \ \ 1 & -1 & \ \ 0 & -\frac{1}{2} \\ -\frac{1}{3} & \ \ 1 & -\frac{1}{2} & - \frac{1}{2} \\ -\frac{2}{3} & \ \ 0 & \ \ 1 & \ \ 0 \\ \ \ 0 & \ \ 0 & -\frac{1}{2} &\ \ 1 \\ \ \ 1 & \ \ 1 & \ \ 1 & \ \ 1 \\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ \end{bmatrix} $$

Solving systems of linear equations

The numpy function np.linalg.solve(A, b) returns the solution of the matrix equation $Ax = b$.

import numpy as np

A = np.array([[1, 1], [1, -1]])
b = np.array([2,3])

np.linalg.solve(A, b)

array([ 2.5, -0.5])
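A computed solution can be checked by multiplying it back by A and comparing with b. The sketch below does this with np.allclose, which compares up to a small floating point tolerance; the variable name x is chosen here for illustration.

```python
import numpy as np

A = np.array([[1, 1], [1, -1]])
b = np.array([2, 3])

x = np.linalg.solve(A, b)

# A @ x should reproduce b up to rounding errors.
print(A @ x)
print(np.allclose(A @ x, b))   # True
```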

Note: This function works only if A is a square, invertible matrix:

A = np.array([[1,1], [1,1]])
b = np.array([1,1])

np.linalg.solve(A, b)

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-63-aad9f3567b18> in <module>()
----> 1 np.linalg.solve(A, b)

/Users/bb1/anaconda/lib/python3.6/site-packages/numpy/linalg/linalg.py in solve(a, b)
    382     signature = 'DD->D' if isComplexType(t) else 'dd->d'
    383     extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 384     r = gufunc(a, b, signature=signature, extobj=extobj)
    385 
    386     return wrap(r.astype(result_t, copy=False))

/Users/bb1/anaconda/lib/python3.6/site-packages/numpy/linalg/linalg.py in _raise_linalgerror_singular(err, flag)
     88 
     89 def _raise_linalgerror_singular(err, flag):
---> 90     raise LinAlgError("Singular matrix")
     91 
     92 def _raise_linalgerror_nonposdef(err, flag):

LinAlgError: Singular matrix

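Due to rounding errors this function may fail even for a matrix that is invertible on paper. In the example below the entry 1.00000000000000000000001 cannot be represented as a float and is stored as exactly 1.0, so the matrix numpy receives is singular: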

A = np.array([[1,1], [1,1.00000000000000000000001]])
b = np.array([1,1])

np.linalg.solve(A, b)

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-65-aad9f3567b18> in <module>()
----> 1 np.linalg.solve(A, b)

/Users/bb1/anaconda/lib/python3.6/site-packages/numpy/linalg/linalg.py in solve(a, b)
    382     signature = 'DD->D' if isComplexType(t) else 'dd->d'
    383     extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 384     r = gufunc(a, b, signature=signature, extobj=extobj)
    385 
    386     return wrap(r.astype(result_t, copy=False))

/Users/bb1/anaconda/lib/python3.6/site-packages/numpy/linalg/linalg.py in _raise_linalgerror_singular(err, flag)
     88 
     89 def _raise_linalgerror_singular(err, flag):
---> 90     raise LinAlgError("Singular matrix")
     91 
     92 def _raise_linalgerror_nonposdef(err, flag):

LinAlgError: Singular matrix
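The rounding can be seen without calling solve at all. The sketch below compares the entry with 1.0, prints the machine epsilon for floats, and asks numpy for the numerical rank of the matrix; the variable name entry is chosen here for illustration.

```python
import numpy as np

entry = 1.00000000000000000000001

# The literal is rounded to the nearest float, which is exactly 1.0.
print(entry == 1.0)                # True

# Gap between 1.0 and the next larger float (about 2.2e-16).
print(np.finfo(float).eps)

A = np.array([[1, 1], [1, entry]])
print(np.linalg.matrix_rank(A))    # 1, so A is singular as stored
```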

The numpy function np.linalg.lstsq(A, b) computes a least-squares solution of a matrix equation Ax = b, i.e. a vector x that makes the distance between Ax and b as small as possible. This works for any matrix, square or not.

A = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([1, 1, 1])

sol = np.linalg.lstsq(A, b)
print(sol)

(array([-1.,  1.]), array([  1.83244870e-31]), 2, array([ 9.52551809,  0.51430058]))

This function returns a tuple of four values.

The first element is a numpy array containing the least-squares solution of the matrix equation:

sol[0]

array([-1.,  1.])

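The second element is the sum of squared residuals, i.e. the squared distance between Ax and b, where x is the computed solution (this array is empty if A does not have full column rank or does not have more rows than columns):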

sol[1]

array([  1.83244870e-31])

The third element is the rank of the matrix A:

sol[2]

2

The last element contains the singular values of the matrix A:

sol[3]

array([ 9.52551809,  0.51430058])
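In the example above b happens to be an exact combination of the columns of A, so the residual is zero up to rounding. The sketch below uses a different right-hand side, b2, chosen here for illustration so that no exact solution exists, and compares the residual reported by lstsq with one computed by hand. Passing rcond=None is an addition made here: it selects the newer default cutoff for small singular values and avoids a FutureWarning that some numpy versions print.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
b2 = np.array([1, 2, 2])   # not a combination of the columns of A

x, residuals, rank, sing_vals = np.linalg.lstsq(A, b2, rcond=None)

# Sum of squared differences between A @ x and b2, computed by hand.
manual = np.sum((A @ x - b2) ** 2)

print(residuals)                        # array with one entry, about 1/6
print(manual)
print(np.allclose(residuals, manual))   # True
```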

Application: computation of rankings in our sample network:

A = np.array([[1, -1, 0, -1/2], [-1/3, 1, -1/2, -1/2], [-2/3, 0, 1, 0], [0,0,-1/2, 1], [1, 1, 1, 1]])
b = np.array([0,0,0,0,1])

print(np.linalg.lstsq(A, b))

(array([ 0.35294118,  0.29411765,  0.23529412,  0.11764706]), array([  2.44447803e-33]), 4, array([ 2.07871928,  1.75166294,  1.36867082,  1.1370571 ]))
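The first array in the output holds the PageRank values x_1, ..., x_4. The sketch below unpacks the result, checks that the values satisfy all five equations, and lists the pages from highest to lowest rank. The variable names and the use of rcond=None (which avoids a FutureWarning in some numpy versions) are choices made here for illustration.

```python
import numpy as np

A = np.array([[1, -1, 0, -1/2],
              [-1/3, 1, -1/2, -1/2],
              [-2/3, 0, 1, 0],
              [0, 0, -1/2, 1],
              [1, 1, 1, 1]])
b = np.array([0, 0, 0, 0, 1])

ranks, residuals, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)

# The solution should satisfy every equation, including x1 + x2 + x3 + x4 = 1.
print(np.allclose(A @ ranks, b))   # True

# Page numbers (starting from 1) sorted from highest to lowest rank.
order = np.argsort(-ranks) + 1
print(order)                       # [1 2 3 4]
```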