Hours
10
Expertise
state sync, RPC, NFT & DeFi
Mentor
Rongjian Lan
Timesheet
Telegram
Social
Day ONE
May 1, 2020
End Date
June 30, 2022
Presence
Active
Bio
Jacky Wang developed DxChain Network's public blockchain, focusing on storage services. He built the mainnet deployment, light client implementation, watchdog monitoring, DPoS consensus, and storage contracts. Jacky holds a bachelor's in Physics from Fudan University and a master's in Electrical and Computer Engineering from the University of Arizona.
Jacky enjoys writing beautiful, clean code. In his leisure time, he sings Chinese pop karaoke, hikes Lake Chabot twice a week, and watches drama films like “Three Billboards”.
Upcoming Plans
- Monthly Deliverables: 2021 Oct
- Keep testing the stream sync protocol. Fix bugs and issues and (hopefully) deploy to testnet for further testing.
- Code design, review, and local testing of liquidity staking in the core protocol.
- White paper, pitch deck, and demo for NFT^2 project.
- Quarterly Milestones: 2021 Q4
- Have the stream sync protocol tested and deployed to mainnet.
- Full product design and implementation for Harmony liquidity staking.
- Product testnet launch for Harmony liquidity staking.
- Yearly Planning: 2022 Q1 - Q3
- Product launch for NFT^2 project
- Product launch for Harmony liquidity staking
- Resharding
Achievements in 2021 Q2-Q3
- Optimizations and fixes for spamming attacks from the p2p network
- Explorer DB performance optimization and migration
- RPC performance optimization and fixes
2022/1/24
- 20+ hours fighting pager duty calls
- https://docs.google.com/document/d/1eZX0qubm5saHa6V3ehrojIAzdq41X2AIqIOFbXHcBfk/edit
- Flooded by pages, handling them at a rate of ~10 pages / 5 min
- Worked with Soph on the network update, storage expansion, etc.
- 10 hours on the rate limiter
- Worked with Rongjian and Soph on the rate limiter
- Though the node-level rate limiter was not deployed, it did give us some knowledge (a better understanding of request rates at the node level, application patterns, etc.); a minimal sketch follows this day's entries.
- 1 hour for DFK web console research
- Analyzed the request pattern from DFK. It looks like a lot can be optimized.
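As context for the rate limiter work above, here is a minimal sketch of a node-level rate limiter, assuming a plain token bucket in front of the RPC handler. The middleware shape, port, and numbers are illustrative, not the actual Harmony implementation:

```go
package main

import (
	"fmt"
	"net/http"

	"golang.org/x/time/rate"
)

// rateLimitMiddleware wraps an RPC handler with a global token bucket.
// Requests that exceed the configured rate get HTTP 429 instead of
// reaching the node, which is the behavior we want under request floods.
func rateLimitMiddleware(next http.Handler, rps float64, burst int) http.Handler {
	limiter := rate.NewLimiter(rate.Limit(rps), burst)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !limiter.Allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	rpc := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok") // stand-in for the real RPC handler
	})
	// Illustrative numbers: 100 req/s with a burst of 200.
	http.ListenAndServe(":9500", rateLimitMiddleware(rpc, 100, 200))
}
```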
2022/1/17
- 20 hours spent on the mainnet stuck issue
- RPC rate limiter:
- Proposed and tested the rate limiter
- 8 hours on stream sync
- 2 hours on migrating the rate limiter parameters to the harmony config (a config sketch follows this entry).
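A rough sketch of what surfacing the rate-limiter parameters through a TOML-backed config struct could look like. Section and field names here are my own assumptions, not the actual harmony config schema:

```go
// Sketch only; names are assumptions, not the real harmony config schema.
package nodeconfig

// RateLimiterConfig is the section we would surface in the TOML file, e.g.
//
//   [RPCOpt.RateLimiter]
//   Enabled = true
//   RequestsPerSecond = 100
//   Burst = 200
type RateLimiterConfig struct {
	Enabled           bool `toml:"Enabled"`
	RequestsPerSecond int  `toml:"RequestsPerSecond"`
	Burst             int  `toml:"Burst"`
}
```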
2022/1/10
- 25 hours on the node-status issue.
- The issue was introduced by , and exposed a data race in caching the sync results.
- Another issue exposed: when no DNS server returns any result, the return value is wrong (true instead of false).
- State Sync
- 4 hours for deploying the downloader code to stressnet. https://watchdog.hmny.io/report-stn
- 8 hours for debugging, code changes
- One bug discovered during deployment: there is no “memory” in discovery, so if there are no valid streams, the client node keeps discovering without any cooldown. A cooldown mechanism is therefore required in discovery (a minimal sketch follows this entry). Keep implementing.
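A minimal sketch of the cooldown idea, assuming a simple time-based backoff guarding the discover loop. Type and method names are hypothetical:

```go
package discovery

import (
	"sync"
	"time"
)

// cooldown remembers the last failed discovery attempt so that a client
// with no valid streams does not spin in a tight discover loop.
type cooldown struct {
	mu       sync.Mutex
	lastFail time.Time
	wait     time.Duration // e.g. 30 * time.Second
}

// Ready reports whether enough time has passed since the last failure
// for another discovery round to be attempted.
func (c *cooldown) Ready() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return time.Since(c.lastFail) >= c.wait
}

// Failed records an unsuccessful discovery so subsequent attempts back off.
func (c *cooldown) Failed() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.lastFail = time.Now()
}
```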
2021/12/13~2021/12/19
2021/12/13
- 3 hours for devops maintenance
- Write report
- Extending storage
- Stuck nodes
- Error analysis (sync loop stuck)
- 3 hours on the Prometheus issue for rpc2.
- Legacy problem from
- rpc2 Prometheus metrics were not counted correctly
2021/12/14
- 1 hour for the downloader CPU increase issue.
- Restarted testing instances (probe node)
- Code analysis
- Updated code and redeployed
- Still a large number of connection setups / teardowns. The problem was not resolved by the last fix.
- 5 hours for Prometheus rpc2
- Added a timeout mechanism to RPC at the util layer (a minimal sketch follows this day's entries).
- Added the timeout to the harmony config at the config layer
- Added Prometheus metrics for websocket traffic.
- Trying to fix the rpc2 metrics-not-showing issue. Another implementation attempt failed; still looking.
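A minimal sketch of the util-layer timeout idea, assuming a context deadline wrapped around the handler body. Names are illustrative, not the actual patch:

```go
package rpcutil

import (
	"context"
	"errors"
	"time"
)

// WithTimeout runs an RPC handler body under a deadline (taken from the
// harmony config in the real setup) so a slow backend cannot hold a
// request or websocket open forever.
func WithTimeout(parent context.Context, d time.Duration, fn func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel()

	done := make(chan error, 1)
	go func() { done <- fn(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return errors.New("rpc timed out")
	}
}
```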
2021/12/15
- 2 hours on the downloader CPU issue.
- Observed no difference.
- Changed the discovery mechanism to see whether it makes a difference.
2021/12/16
- 1 hour on the downloader CPU issue.
- Discovery does not have a strong relation with CPU usage.
- CPU still fine (45%); wait one more day to check again.
- 2 hours researching the Prometheus issue
- 3 hours implementing the rate limiter for stream sync
- Implementation ongoing
2021/12/17
- 1 hour on the downloader CPU issue
- CPU usage burst.
- Commented out code and retried
- 2.5 hours implementing test cases for the rate limiter
- 1 hour researching the Prometheus issue (no findings)
- 3.5 hours on the mainnet sync stuck issue
- Pulled logs.
- Analyzed code.
- Looks like a consensus cache race issue: multiple threads calling the UpdateBlockAndStatus function (an illustrative sketch follows this entry).
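UpdateBlockAndStatus is the function named above; the sketch below shows the general shape of serializing concurrent callers with a mutex, as an illustration of the suspected race rather than the actual fix. The surrounding type and fields are hypothetical:

```go
package consensus

import "sync"

// blockTracker stands in for the cached block/status state that the
// concurrent UpdateBlockAndStatus callers were racing on.
type blockTracker struct {
	mu        sync.Mutex
	lastBlock uint64
	status    string
}

// UpdateBlockAndStatus serializes updates so interleaved readers and
// writers cannot observe a half-updated block/status pair.
func (t *blockTracker) UpdateBlockAndStatus(block uint64, status string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if block > t.lastBlock { // ignore stale updates from slower threads
		t.lastBlock = block
		t.status = status
	}
}
```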
2021/12/18
- 1 hour on the Prometheus issue.
2021/10/26
- Continued debugging stream sync.
- The previously found suspect does impact CPU (very occasionally), but it is still not the root cause. The nodes die after 3 days' running...
- The next suspect is... still the discovery protocol... Keep researching
- Research on the design for liquidity staking
- Looked into the implementations of Lido, StaFi, and bLuna.
- Lido and StaFi use almost the same implementation. The final derivative token has an increasing intrinsic value (in units of the deposit token), which is not a perfect candidate for DeFi products because of impermanent loss (a toy illustration follows this day's entries).
- Started talking to the Lido / StaFi teams.
- Core protocol change for liquidity staking
- Discussed, tracked, and helped with the core protocol change for staking precompiles.
- Code review for https://github.com/harmony-one/harmony/pull/3906 (finished two rounds, third round in progress)
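A toy illustration (made-up numbers, hypothetical function) of why a value-accruing derivative is awkward for DeFi: the token supply stays fixed while rewards raise each token's redemption value, so its price in the deposit token drifts upward over time:

```go
package main

import "fmt"

// exchangeRate returns deposit tokens per derivative token for a
// value-accruing design: supply stays fixed, intrinsic value grows.
func exchangeRate(totalStaked, totalShares float64) float64 {
	if totalShares == 0 {
		return 1.0 // bootstrap: first depositor mints 1:1
	}
	return totalStaked / totalShares
}

func main() {
	shares := 1000.0 // derivative tokens in circulation
	staked := 1000.0 // underlying ONE backing them
	staked += 50.0   // epoch rewards accrue to the pool
	fmt.Printf("1 derivative token = %.2f ONE\n", exchangeRate(staked, shares))
	// Output: 1 derivative token = 1.05 ONE. Since the derivative's price
	// in ONE keeps drifting upward, an AMM pair against ONE rebalances
	// continuously, which is the impermanent-loss concern noted above.
}
```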
2021/10/19
- Continued working on stream sync. Found a suspect for the CPU explosion
- The problem lies in both peers sending out the INIT package of the streaming protocol at the same time, because discovery is triggered immediately once the stream count drops below the lower threshold. This fails the stream setup and results in a large number of discovery and stream-setup operations (which are heavy).
- Reproduced the issue with two nodes in different regions, resulting in high CPU cost (a spike, not slow growth).
- Started on the fix: adding a cooldown mechanism to discovery and the stream manager.
- Proceed with liquidity staking
- Code review and discussion with Xiaopeng
- Discussed the signatures of the precompiles, as well as the gas issue.
2021/10/08
- Working on stream sync. First, looking at the potential memory / CPU leak: running two machines with customized code on mainnet with stream sync turned on. The customized code includes:
- Frequent discovery calls (one discovery per 10s).
- An unstable stream sync protocol (errors injected into stream sync so it occasionally fails); a fault-injection sketch follows this section.
- Finished the code revisit and RPC fix concerning stream sync.
- Next steps:
- Investigate the CPU / memory leak
- Spin up stressnet to reproduce the stuck short-range sync.
The result remains to be observed.
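A sketch of the injected-error idea from the customized code above, assuming a probabilistic wrapper around stream requests. The function name, package, and failure rate are illustrative:

```go
package streamsync

import (
	"errors"
	"math/rand"
)

// flakyRequest wraps a stream request with a configurable failure rate,
// mimicking the injected errors used to stress the sync protocol.
// Example: flakyRequest(0.05, sendGetBlocks) fails ~5% of the time.
func flakyRequest(failRate float64, do func() error) error {
	if rand.Float64() < failRate {
		return errors.New("injected stream error")
	}
	return do()
}
```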
- 🎙️Jacky Wang 🛡: Streaming state sync on mainnet. 50/0/0%