Thirty-thousand-foot view. Running a Spark application essentially requires only two roles, a Driver and Executors. The Driver is responsible for dividing the user’s application into multiple jo...
How does Spark run?
Locality sensitive hashing
Similarity Search. How do we find the vector in a list that is most similar to a given vector? A brute-force algorithm compares the given vector once with each vector in the list, using similarity measurements like Euc...
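The brute-force approach in this excerpt can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance as the similarity measure (smaller distance means more similar). The function name `nearest_brute_force` is hypothetical. It scans the whole list once, so a query costs O(n·d) for n vectors of dimension d, which is the cost that approximate techniques such as locality sensitive hashing try to avoid.

```python
import math
from typing import Sequence


def nearest_brute_force(query: Sequence[float], vectors: list[Sequence[float]]) -> int:
    """Return the index of the vector closest to `query` (hypothetical helper).

    Assumes Euclidean distance as the similarity measure and compares the
    query against every vector exactly once: O(n * d) time.
    """
    if not vectors:
        raise ValueError("vectors must be non-empty")
    best_index = 0
    best_dist = math.dist(query, vectors[0])
    for i, v in enumerate(vectors[1:], start=1):
        d = math.dist(query, v)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index


vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(nearest_brute_force([0.9, 1.2], vectors))  # 1
```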
What problem does Spark solve?
Before Spark, there was Hadoop. Hadoop is an open-source framework used to efficiently store and process large datasets. It enables the clustering of multiple computers to analyze massive...
Async programming
Why is asynchronous programming necessary? An operating system can be seen as a virtual machine (VM) in which processes exist. Processes do not need to know the exact number of cores or how much memo...
RESTful API design
RESTful API. The REST architectural style treats all content as a resource, meaning that everything on the network is a resource, and the REST architecture operates on resources by fetching, creating, mod...
React, redux and hooks
React & React hooks Class Component vs Functional Component The following is an example of writing a class component in React: class Welcome extends React.Component { render() { return &...
Implementing a simple RPC framework
gRPC Example syntax = "proto3"; option java_multiple_files = true; option java_package = "io.grpc.hello"; option java_outer_classname = "HelloProto"; option objc_class_prefix = "HLW"; package hel...
Faster database queries
Cache. Most production systems use the classic combination of MySQL and Redis. Redis sits in front of MySQL as a cache, intercepting most query requests before they reach MySQL and essentially relieving the ...
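The read path described in this excerpt, where Redis absorbs most queries before they reach MySQL, is commonly implemented as the cache-aside pattern. The sketch below is illustrative only: plain dictionaries stand in for Redis and MySQL, and the class and method names (`CacheAsideStore`, `get`, `update`) are hypothetical. Invalidating on write is one common choice and an assumption here, not necessarily what the post describes.

```python
from typing import Optional


class CacheAsideStore:
    """Cache-aside sketch: dicts stand in for Redis (cache) and MySQL (database)."""

    def __init__(self, database: dict[str, str]):
        self.database = database          # hypothetical stand-in for MySQL
        self.cache: dict[str, str] = {}   # hypothetical stand-in for Redis
        self.db_queries = 0               # counts reads that reach the "database"

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:             # cache hit: the database is not touched
            return self.cache[key]
        self.db_queries += 1              # cache miss: fall through to the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value       # populate the cache so later reads hit it
        return value

    def update(self, key: str, value: str) -> None:
        self.database[key] = value        # write to the source of truth first
        self.cache.pop(key, None)         # then invalidate the stale cached copy


store = CacheAsideStore({"user:1": "alice"})
for _ in range(100):
    store.get("user:1")
print(store.db_queries)  # 1: only the first read reached the database
```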
Code refactoring
Refactoring is the improvement of the internal structure of software without changing its observable behavior. When to refactor When adding new features The most common time to refactor is wh...
How to design a distributed cache system?
The following are my notes from reading G.K’s system design book. How to design a caching system? Caching is a widely used technology in almost all applications today. In addition, ...