Thursday, July 30, 2015

13. Breadth-First Search (BFS) (notes)


Youtube link: https://www.youtube.com/watch?v=s-CYnVz-uh4

Notes:

1. In BFS, we need to check whether a node has already been seen before visiting it.
2. BFS can produce shortest paths in an unweighted graph (by recording parent pointers).
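The two notes above can be sketched together: the `parent` dict doubles as the "seen" check, and walking parents backward from the target reconstructs a shortest path. (Function and variable names are my own, not from the lecture.)

```python
from collections import deque

def bfs_parents(adj, source):
    """BFS from source; parent pointers let us rebuild shortest paths."""
    parent = {source: None}          # also serves as the "seen" set
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:      # skip nodes already seen
                parent[v] = u
                queue.append(v)
    return parent

def shortest_path(adj, source, target):
    """Walk parent pointers backward from target, then reverse."""
    parent = bfs_parents(adj, source)
    if target not in parent:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]
```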

Saturday, July 18, 2015

Tun/Tap interface tutorial (repost)


http://backreference.org/2010/03/26/tuntap-interface-tutorial/

Thursday, July 16, 2015

Introducing Linux Network Namespaces (repost)



http://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespaces/

Saturday, July 11, 2015

Lecture 4: Heaps and Heap Sort


https://www.youtube.com/watch?v=B7hVxCmfPtM

Notes: any array can be viewed as a heap, meaning imagine it as a nearly complete binary tree, with left(i) = 2i, right(i) = 2i+1, parent(i) = floor(i/2) (1-indexed).

A max-heap additionally has the "max" property: every node is greater than or equal to its children, so the root is the maximum.

max_heapify(A, i): fixes a single violation at node i, assuming both subtrees are already max-heaps. Building a heap applies it bottom-up and is recursive; all the leaf nodes trivially satisfy the property.
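A minimal sketch of max_heapify and the bottom-up build, using 0-indexed arrays (so the index arithmetic differs slightly from the 1-indexed formulas above):

```python
def max_heapify(a, i, n):
    """Fix a possible max-heap violation at index i, assuming both
    subtrees of i already satisfy the max-heap property."""
    left, right = 2 * i + 1, 2 * i + 2   # 0-indexed children
    largest = i
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, n)       # recurse into the subtree we disturbed

def build_max_heap(a):
    """Bottom-up: leaves are trivially heaps, so start at the last parent."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        max_heapify(a, i, n)
```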

Sunday, July 5, 2015

Linux start-up process


http://www.cromwell-intl.com/unix/linux-boot.html
http://luv.asn.au/overheads/linux-startup.html
http://www.yolinux.com/TUTORIALS/LinuxTutorialInitProcess.html
http://www.comptechdoc.org/os/linux/startupman/linux_surcsysinit.html

Notifier chain update: API changes


http://lwn.net/Articles/171560/

9. Table Doubling, Karp-Rabin


https://www.youtube.com/watch?v=BRO7mVIFt08

Lecture Notes: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/MIT6_006F11_lec09.pdf


1. In hashing, suppose n is the number of keys and m is the number of slots in the hash table. We want m to be similar in size to n. But n may grow or shrink, so we need to grow and shrink the table (and rehash) to keep good performance. When growing, double the size. When shrinking, wait until n drops to m/4, then halve m.

Amortized cost spreads the occasional expensive rehash over many cheap operations, like paying rent: $1500/month = $50/day.
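A toy sketch of the doubling/halving policy from the notes (grow at n = m, shrink at n = m/4). Class and method names are mine; a real hash table would rehash keys into buckets rather than copy a flat list:

```python
class DynTable:
    """Toy dynamic table: double when full, halve when n falls to m/4."""
    def __init__(self):
        self.m = 1               # number of slots
        self.n = 0               # number of stored items
        self.slots = [None]

    def _resize(self, new_m):
        # Stand-in for the rehash step: copy live items into a new table.
        old = [x for x in self.slots if x is not None]
        self.m = new_m
        self.slots = old + [None] * (new_m - len(old))

    def insert(self, x):
        if self.n == self.m:
            self._resize(2 * self.m)      # grow: double
        self.slots[self.n] = x
        self.n += 1

    def delete_last(self):
        self.n -= 1
        self.slots[self.n] = None
        if self.m > 1 and self.n <= self.m // 4:
            self._resize(self.m // 2)     # shrink: halve at n = m/4
```

The m/4 threshold (rather than m/2) prevents thrashing: after a shrink the table is still only half full, so a burst of inserts and deletes near the boundary cannot trigger a resize on every operation.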

2. Karp-Rabin algorithm: rolling hash. Adjacent text windows share most of their content, so the next window's hash can be computed from the previous one: drop the leading character and append the new one, in O(1).
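A compact sketch of Karp-Rabin with a polynomial rolling hash. The base and modulus are arbitrary choices of mine; on a hash match we still compare the strings directly, since distinct windows can collide:

```python
def karp_rabin(text, pattern, base=256, mod=10**9 + 7):
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return -1
    hp = ht = 0
    for i in range(m):                       # hashes of pattern and first window
        hp = (hp * base + ord(pattern[i])) % mod
        ht = (ht * base + ord(text[i])) % mod
    high = pow(base, m - 1, mod)             # weight of the leading character
    for i in range(n - m + 1):
        if ht == hp and text[i:i + m] == pattern:   # verify: hashes can collide
            return i
        if i + m < n:                        # roll: drop text[i], append text[i+m]
            ht = ((ht - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return -1
```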

Saturday, July 4, 2015

String Matching with Finite Automata


https://www.youtube.com/watch?v=M_XpGQyyqIQ

Notes:

1. A finite automaton is a state machine with a start state and accepting state(s). A three-letter pattern has four states (s0 to s3). Each matched letter advances the state; on a mismatch, the automaton falls back to the appropriate earlier state. (In contrast, naive matching goes all the way back to s0 on a mismatch.)

2. Finite automata are used for parsing/lexing in compilers and to implement regular expressions.

3. State transition table: one column per alphabet symbol, one row per state.

4. Matching time is O(n); preprocessing time depends on the implementation.
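A sketch of the transition table and the O(n) scan. This uses the straightforward (slow) table construction; delta[q][c] is the length of the longest pattern prefix that is a suffix of the matched text plus c:

```python
def build_transition_table(pattern, alphabet):
    """Rows are states 0..m, columns are alphabet symbols."""
    m = len(pattern)
    delta = [{} for _ in range(m + 1)]
    for q in range(m + 1):
        for c in alphabet:
            k = min(m, q + 1)
            # Fall back until pattern[:k] is a suffix of pattern[:q] + c.
            while k > 0 and not (pattern[:q] + c).endswith(pattern[:k]):
                k -= 1
            delta[q][c] = k
    return delta

def dfa_match(text, pattern, alphabet):
    """O(n) scan once the table is built; first match index or -1."""
    delta = build_transition_table(pattern, alphabet)
    q, m = 0, len(pattern)
    for i, c in enumerate(text):
        q = delta[q][c]
        if q == m:
            return i - m + 1       # match ends at position i
    return -1
```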

Knuth-Morris-Pratt algorithm for String Matching


https://www.youtube.com/watch?v=kBW6oPaVjq0

Need a prefix table (the "failure function").

The preprocessing is very similar to the matching itself:
one matches the pattern against itself, the other matches the pattern against the text.
Time complexity: O(m) + O(n) (O(m) for preprocessing, O(n) for matching).
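A sketch of KMP showing the symmetry noted above: the prefix-table loop and the search loop have the same shape, one matching the pattern against itself, the other against the text:

```python
def prefix_table(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]          # fall back, same as in the search loop
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi

def kmp_search(text, pattern):
    """O(m) preprocessing + O(n) matching; first match index or -1."""
    pi = prefix_table(pattern)
    k = 0
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = pi[k - 1]          # reuse the prefix table on mismatch
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```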

Scalability 101 and dropbox


Scalability from Harvard :

https://www.youtube.com/watch?v=-W9F__D3oY4

Notes:
1. Scalability 101.
2. Issues when running multiple servers: sharing data, etc.


Scale dropbox:

https://www.youtube.com/watch?v=PE4gwstWhmc

Notes:
1. Heavy-write vs. heavy-read workloads.
2. No need for internet scale at the beginning.

Friday, July 3, 2015

computational complexity: P, NP, EXP, R


MIT OpenCourseWare: https://www.youtube.com/user/MIT/search?query=algorithm

23. Computational Complexity


--------|---------|------------|-----------|------------------------>
<---P--->
<--------NP-------->
               NP-complete
<-------------------EXP-------------------->
<------------------------------R------------------------------------>

1. Tetris is NP-complete.
2. Reduction: mapping a new problem A to a known problem B, so a solver for B also solves A.


Notes about "Scaling Memcache at Facebook" paper (NSDI 2013)


Paper link: https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/nishtala

1. Memcached can reduce latency. Memcached is an in-memory key-value store, similar to a cache. On a hit, the data is returned directly from memory. On a miss, the web server falls back to the slow path: a SQL query that hits disk.

2. They provide a "lease" feature to reduce load and solve two problems: stale sets and thundering herds. A thundering herd occurs when a hot key is read and written many times in a short window, so a single invalidation sends many clients to the database at once. By handing a lease token to the first client that misses, memcached can reject sets without a valid token and make the other clients wait, throttling the redundant refills. (Btw, the Google File System has a "lease" concept too, where the master grants a lease to a primary chunkserver to order mutations, and lease expiry helps handle a failed primary.)
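A toy sketch of the lease idea, not the real memcached wire protocol (class and method names are mine): on a miss, the first client receives a token and fills the value; concurrent clients are told to wait, and a set without a matching token is rejected, which is what blocks stale sets:

```python
import time
import uuid

class LeasingCache:
    """Toy lease-based cache: one refiller per missing key at a time."""
    def __init__(self, lease_ttl=2.0):
        self.data = {}
        self.leases = {}          # key -> (token, expiry)
        self.lease_ttl = lease_ttl

    def get(self, key):
        if key in self.data:
            return ("hit", self.data[key])
        token, expiry = self.leases.get(key, (None, 0))
        if time.time() < expiry:
            return ("wait", None)            # someone else is refilling
        token = uuid.uuid4().hex
        self.leases[key] = (token, time.time() + self.lease_ttl)
        return ("miss", token)               # caller queries the DB, then sets

    def set(self, key, value, token):
        current = self.leases.get(key)
        if current is None or current[0] != token:
            return False                     # stale or tokenless set rejected
        self.data[key] = value
        del self.leases[key]
        return True
```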

3. The memcache deployment is scaled from a single cluster, to a single region, to multiple regions, with different issues at each scale. The main challenge is sheer size: billions of requests per second, trillions of items, and a billion users around the world.

Notes about Google file system paper (SOSP 2003)


Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. "The Google file system." ACM SIGOPS operating systems review. Vol. 37. No. 5. ACM, 2003.

Paper link: http://research.google.com/archive/gfs.html

1. As in other Google papers, one of Google's successful models is to use many cheap commodity machines to achieve high scale and good performance at low cost. Since the number of machines is high (hundreds or thousands) and the hardware is cheap, hardware failure is the norm. The core logic for high availability and fault tolerance therefore lives in software. This contrasts with using reliable but expensive hardware, which costs more and demands less from the software.

Hardware failure is handled in the design phase. The basic ideas are replication and fast recovery (via logging). The Google File System (GFS) uses one master server and many chunkservers. Each chunk of data is replicated on multiple chunkservers. To handle master failure, GFS uses shadow masters (slightly behind the master in time) and can restore the master's state by replaying the operation log.

2. To achieve high performance with a simple design, GFS does not stick to the existing standard file system interface (such as POSIX). The GFS designers analyzed the workload and discarded some conventional file system features and APIs. The benefit is that the new interface is cleaner and easier to scale; the drawback is that applications have to be rewritten against the new APIs.

The first example of such design is the 64 MB chunk size, much bigger than the block size in a normal file system. The rationale is that most workloads operate on huge files. A bigger chunk size means less metadata and easier scaling; the drawback is internal fragmentation. Also, GFS checksums each 64 KB block inside a chunk, so data corruption is detected at a finer granularity.
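The per-64 KB checksum idea can be sketched as follows. The paper does not specify the checksum function, so CRC32 is an assumption here; the point is that a read only verifies the blocks it touches, and corruption is localized to one 64 KB block instead of the whole 64 MB chunk:

```python
import zlib

BLOCK = 64 * 1024   # checksum granularity within a chunk

def block_checksums(chunk_bytes):
    """One CRC per 64 KB block of the chunk (CRC32 assumed)."""
    return [zlib.crc32(chunk_bytes[i:i + BLOCK])
            for i in range(0, len(chunk_bytes), BLOCK)]

def verify_read(chunk_bytes, checksums, offset, length):
    """Verify only the blocks that the read range [offset, offset+length) touches."""
    first, last = offset // BLOCK, (offset + length - 1) // BLOCK
    for b in range(first, last + 1):
        if zlib.crc32(chunk_bytes[b * BLOCK:(b + 1) * BLOCK]) != checksums[b]:
            return False         # corruption: re-read this block from another replica
    return True
```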

Another example is the modified append API. The new record append does not let the caller pass in an offset. Instead, the server side (the primary chunkserver holding the lease) decides the offset and returns it to the caller. This makes concurrent appends from many clients much easier to implement.

Yet another example is that GFS discards the inode concept. The directory metadata does not contain a per-directory list of the files inside it. The goal is to avoid the bottleneck when multiple files need to be created or modified in one directory.

3. Lessons learned:
 a) Understand the workload first; the system design can then be tailored (optimized) to that specific kind of workload.
 b) A standard (old) API can be modified to satisfy new requirements, as long as there is no backward-compatibility constraint.