Hours
10
Expertise
state sync, RPC, NFT & DeFi
Mentor
Rongjian Lan
Timesheet
Telegram
Social
Day ONE
May 1, 2020
End Date
June 30, 2022
Presence
Active
Bio
Jacky Wang developed DxChain Network's public blockchain, focusing on storage services. He built the mainnet deployment, light client implementation, watchdog monitoring, DPoS consensus, and storage contracts. Jacky holds a bachelor's in Physics from Fudan University and a master's in Electrical and Computer Engineering from the University of Arizona.
Jacky enjoys writing beautiful, clean code. In his leisure time, he sings Chinese pop karaoke, hikes Lake Chabot twice a week, and watches drama films like “Three Billboards”.
Upcoming Plans
- Monthly Deliverables: 2021 Oct
- Keep testing the stream sync protocol. Fix bugs and issues and (hopefully) deploy to testnet for further testing.
- Code design, review, and local testing of liquidity staking in the core protocol.
- White paper, pitch deck, and demo for NFT^2 project.
- Quarterly Milestones: 2021 Q4
- Have the stream sync protocol tested and deployed to mainnet.
- Full product design and implementation for Harmony liquidity staking.
- Product testnet launch for Harmony liquidity staking.
- Yearly Planning: 2022 Q1 - Q3
- Product launch for NFT^2 project
- Product launch for Harmony liquidity staking
- Resharding
Achievements in 2021 Q2-Q3
- Optimizations and fixes for spamming attacks from the p2p network
- Explorer DB performance optimization and migration
- RPC performance optimization and fixes
2022/1/24
- 20+ hours fighting pager duty calls
- https://docs.google.com/document/d/1eZX0qubm5saHa6V3ehrojIAzdq41X2AIqIOFbXHcBfk/edit
- Flooded by pages, handling them at a rate of ~10 pages / 5 min
- Worked with Soph on the network update, storage expansion, etc.
- 10 hours on the rate limiter
- Worked with Rongjian and Soph on the rate limiter
- Though the node-level rate limiter was not deployed, it did give us some knowledge (a better understanding of request rates at the node level, application patterns, etc.); a minimal sketch follows this day's entries.
- 1 hour for DFK web console research
- Analyzed the request pattern from DFK. It looks like a lot can be optimized.
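As context for the rate limiter work above, here is a minimal sketch of a node-level rate limiter, assuming a plain token bucket in front of the RPC handler. The middleware shape, port, and numbers are illustrative, not the actual Harmony implementation:

```go
package main

import (
	"fmt"
	"net/http"

	"golang.org/x/time/rate"
)

// rateLimitMiddleware wraps an RPC handler with a global token bucket.
// Requests that exceed the configured rate get HTTP 429 instead of
// reaching the node, which is the behavior we want under request floods.
func rateLimitMiddleware(next http.Handler, rps float64, burst int) http.Handler {
	limiter := rate.NewLimiter(rate.Limit(rps), burst)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !limiter.Allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	rpc := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok") // stand-in for the real RPC handler
	})
	// Illustrative numbers: 100 req/s with a burst of 200.
	http.ListenAndServe(":9500", rateLimitMiddleware(rpc, 100, 200))
}
```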
2022/1/17
- 20 hours spent on the mainnet stuck issue
- RPC rate limiter:
- Proposed and tested the rate limiter
- 8 hours on stream sync
- 2 hours on migrating the rate limiter parameters to the harmony config (a config sketch follows this entry).
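A rough sketch of what surfacing the rate-limiter parameters through a TOML-backed config struct could look like. Section and field names here are my own assumptions, not the actual harmony config schema:

```go
// Sketch only; names are assumptions, not the real harmony config schema.
package nodeconfig

// RateLimiterConfig is the section we would surface in the TOML file, e.g.
//
//   [RPCOpt.RateLimiter]
//   Enabled = true
//   RequestsPerSecond = 100
//   Burst = 200
type RateLimiterConfig struct {
	Enabled           bool `toml:"Enabled"`
	RequestsPerSecond int  `toml:"RequestsPerSecond"`
	Burst             int  `toml:"Burst"`
}
```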
2022/1/10
- 25 hours on the node-status issue.
- The issue was introduced by , and exposed a data race in caching the sync results.
- Another issue exposed: when no DNS server returns any result, the return value is wrong (true instead of false).
- State Sync
- 4 hours for deploying the downloader code to stressnet. https://watchdog.hmny.io/report-stn
- 8 hours for debugging, code changes
- One bug discovered during deployment: there is no “memory” in discovery, so if there are no valid streams, the client node keeps discovering without any cooldown. A cooldown mechanism is therefore required in discovery (a minimal sketch follows this entry). Keep implementing.
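A minimal sketch of the cooldown idea, assuming a simple time-based backoff guarding the discover loop. Type and method names are hypothetical:

```go
package discovery

import (
	"sync"
	"time"
)

// cooldown remembers the last failed discovery attempt so that a client
// with no valid streams does not spin in a tight discover loop.
type cooldown struct {
	mu       sync.Mutex
	lastFail time.Time
	wait     time.Duration // e.g. 30 * time.Second
}

// Ready reports whether enough time has passed since the last failure
// for another discovery round to be attempted.
func (c *cooldown) Ready() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return time.Since(c.lastFail) >= c.wait
}

// Failed records an unsuccessful discovery so subsequent attempts back off.
func (c *cooldown) Failed() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.lastFail = time.Now()
}
```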
2021/12/13~2021/12/19
2021/12/13
- 3 hours for devops maintenance
- Write report
- Extending storage
- Stuck nodes
- Error analysis (sync loop stuck)
- 3 hours on the Prometheus issue for rpc2.
- Legacy problem from
- rpc2 Prometheus metrics were not counted correctly
2021/12/14
- 1 hour for the downloader CPU increase issue.
- Restarted testing instances (probe node)
- Code analysis
- Updated code and redeployed
- Still a large number of connection setups / teardowns. The problem was not resolved by the last fix.
- 5 hours for Prometheus rpc2
- Added a timeout mechanism to RPC at the util layer (a minimal sketch follows this day's entries).
- Added the timeout to the harmony config at the config layer
- Added Prometheus metrics for websocket traffic.
- Trying to fix the rpc2 metrics-not-showing issue. Another implementation attempt failed; still looking.
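A minimal sketch of the util-layer timeout idea, assuming a context deadline wrapped around the handler body. Names are illustrative, not the actual patch:

```go
package rpcutil

import (
	"context"
	"errors"
	"time"
)

// WithTimeout runs an RPC handler body under a deadline (taken from the
// harmony config in the real setup) so a slow backend cannot hold a
// request or websocket open forever.
func WithTimeout(parent context.Context, d time.Duration, fn func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel()

	done := make(chan error, 1)
	go func() { done <- fn(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return errors.New("rpc timed out")
	}
}
```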
2021/12/15
- 2 hours on the downloader CPU issue.
- Observed no difference.
- Changed the discovery mechanism to see whether it makes a difference.
2021/12/16
- 1 hour on the downloader CPU issue.
- Discovery does not have a strong relation with CPU usage.
- CPU still fine (45%); wait one more day to check again.
- 2 hours researching the Prometheus issue
- 3 hours implementing the rate limiter for stream sync
- Implementation ongoing
2021/12/17
- 1 hour on the downloader CPU issue
- CPU usage burst.
- Commented out code and retried
- 2.5 hours implementing test cases for the rate limiter
- 1 hour researching the Prometheus issue (no findings)
- 3.5 hours on the mainnet sync stuck issue
- Pulled logs.
- Analyzed code.
- Looks like a consensus cache race issue: multiple threads calling the UpdateBlockAndStatus function (an illustrative sketch follows this entry).
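UpdateBlockAndStatus is the function named above; the sketch below shows the general shape of serializing concurrent callers with a mutex, as an illustration of the suspected race rather than the actual fix. The surrounding type and fields are hypothetical:

```go
package consensus

import "sync"

// blockTracker stands in for the cached block/status state that the
// concurrent UpdateBlockAndStatus callers were racing on.
type blockTracker struct {
	mu        sync.Mutex
	lastBlock uint64
	status    string
}

// UpdateBlockAndStatus serializes updates so interleaved readers and
// writers cannot observe a half-updated block/status pair.
func (t *blockTracker) UpdateBlockAndStatus(block uint64, status string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if block > t.lastBlock { // ignore stale updates from slower threads
		t.lastBlock = block
		t.status = status
	}
}
```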
2021/12/18
- 1 hour on the Prometheus issue.
2021/10/26
- Continued debugging stream sync.
- The previously found suspect does impact CPU (very occasionally), but it is still not the root cause. The nodes die after 3 days' running...
- The next suspect is... still the discovery protocol... Keep researching
- Research on the design for liquidity staking
- Looked into the implementations of Lido, StaFi, and bLuna.
- Lido and StaFi use almost the same implementation. The final derivative token has an increasing intrinsic value (in units of the deposit token), which is not a perfect candidate for DeFi products because of impermanent loss (a toy illustration follows this day's entries).
- Started talking to the Lido / StaFi teams.
- Core protocol change for liquidity staking
- Discussed, tracked, and helped with the core protocol change for staking precompiles.
- Code review for https://github.com/harmony-one/harmony/pull/3906 (finished two rounds, third round in progress)
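A toy illustration (made-up numbers, hypothetical function) of why a value-accruing derivative is awkward for DeFi: the token supply stays fixed while rewards raise each token's redemption value, so its price in the deposit token drifts upward over time:

```go
package main

import "fmt"

// exchangeRate returns deposit tokens per derivative token for a
// value-accruing design: supply stays fixed, intrinsic value grows.
func exchangeRate(totalStaked, totalShares float64) float64 {
	if totalShares == 0 {
		return 1.0 // bootstrap: first depositor mints 1:1
	}
	return totalStaked / totalShares
}

func main() {
	shares := 1000.0 // derivative tokens in circulation
	staked := 1000.0 // underlying ONE backing them
	staked += 50.0   // epoch rewards accrue to the pool
	fmt.Printf("1 derivative token = %.2f ONE\n", exchangeRate(staked, shares))
	// Output: 1 derivative token = 1.05 ONE. Since the derivative's price
	// in ONE keeps drifting upward, an AMM pair against ONE rebalances
	// continuously, which is the impermanent-loss concern noted above.
}
```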
2021/10/19
- Continued working on stream sync. Found a suspect for the CPU explosion
- The problem lies in both peers sending out the INIT package of the streaming protocol at the same time, because discovery is triggered immediately once the stream count drops below the lower threshold. This fails the stream setup and results in a large number of discovery and stream-setup operations (which are heavy).
- Reproduced the issue with two nodes in different regions, resulting in high CPU cost (a spike, not slow growth).
- Started on the fix: adding a cooldown mechanism to discovery and the stream manager.
- Proceed with liquidity staking
- Code review and discussion with Xiaopeng
- Discussed the signatures of the precompiles, as well as the gas issue.
2021/10/08
- Working on stream sync. First, looking at the potential memory / CPU leak: running two machines with customized code on mainnet with stream sync turned on. The customized code includes:
- Frequent discovery calls (one discovery per 10s).
- An unstable stream sync protocol (errors injected into stream sync so it occasionally fails); a fault-injection sketch follows this section.
- Finished the code revisit and RPC fix concerning stream sync.
- Next steps:
- Investigate the CPU / memory leak
- Spin up stressnet to reproduce the stuck short-range sync.
The result remains to be observed.
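A sketch of the injected-error idea from the customized code above, assuming a probabilistic wrapper around stream requests. The function name, package, and failure rate are illustrative:

```go
package streamsync

import (
	"errors"
	"math/rand"
)

// flakyRequest wraps a stream request with a configurable failure rate,
// mimicking the injected errors used to stress the sync protocol.
// Example: flakyRequest(0.05, sendGetBlocks) fails ~5% of the time.
func flakyRequest(failRate float64, do func() error) error {
	if rand.Float64() < failRate {
		return errors.New("injected stream error")
	}
	return do()
}
```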
- 🎙️Jacky Wang 🛡: Streaming state sync on mainnet. 50/0/0%