
The third person to come asking about tracker problems, this time why the server's UDP downstream traffic exceeds its upstream; Sakura answers by email: 80K connections per second of concurrent packets

Posted on 2026/5/6 08:33

This is something I told group members a while back.
For TCP, a CDN can amplify traffic several-fold, because it adds a large number of mostly useless headers; the origin would only return the small payload shown here. The header contents are shown in the screenshots:
1.jpg

2.jpg

Through the CDN the traffic was amplified more than tenfold. UDP returns even less and is far more economical; that is also why a UDP tracker doesn't even report the completed-download count, purely to save bytes.
The CDN also has to forward those headers to the origin, so the server picks up an extra ~10 MB of downstream traffic out of nowhere, and downstream no longer matches upstream. In short, nothing to worry about.
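To make the size difference concrete, here is a sketch (my own illustration, not code from any tracker discussed here) of a BEP 15 IPv4 announce response: a fixed 20-byte header plus 6 bytes per peer, with no HTTP headers at all.

```python
import struct

def udp_announce_response_size(num_ipv4_peers: int) -> int:
    """Bytes in a BEP 15 IPv4 announce response: action(4) + transaction_id(4)
    + interval(4) + leechers(4) + seeders(4), then 6 bytes (IP+port) per peer."""
    return 20 + 6 * num_ipv4_peers

def build_udp_announce_response(transaction_id, interval, leechers, seeders, peers):
    """peers: list of (4-byte packed IPv4 address, port) tuples."""
    resp = struct.pack("!IIIII", 1, transaction_id, interval, leechers, seeders)  # action=1: announce
    for ip_bytes, port in peers:
        resp += ip_bytes + struct.pack("!H", port)
    return resp
```

A full 50-peer reply is only 320 bytes; an HTTP announce carries the same payload wrapped in bencoding plus HTTP (and possibly CDN-injected) headers, which is where the amplification comes from.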

Below is the email thread with Tracy Rogers.
I am posting the entire conversation rather than excerpting it, long as it is; the replies originally written in Chinese are mine.
Hello,
I am the developer of https://tracker.wildkat.net/
It is a clean implementation of the BitTorrent tracker protocol over UDP/HTTP, written in Python 3. It is a tracker and a private or open server, all self-contained.
If configured, it is a full-fledged torrent web site plus tracker with automatic ingestion of torrents.
Once the server was accepted by ngosang, however, the load doubled from tracking about 120k torrents to 350k torrents. In this new scenario many UDP packets cannot be processed. The server receives them; the CPU is fine; the software design simply cannot keep up.
If I run a workload scenario on my MacBook Pro it is fine. I cannot stress the system whatsoever and it always works great, hooray! In the hosted VM, however, this is not the case.
I have tried a few different redesigns over the last couple of days to address the traffic and accept more, but nothing really helps. I recently did a redesign to use UDP_REUSEPORT, but it brings a lot of new problems of its own, since the overall design wasn't written for it. Performance also did not improve as much as one would anticipate.
Poke around the web site and let me know if you would be interested in doing some collaboration work on my project. It is on GitHub but marked private at the moment.

wc -l tracker_server.py
   41939 tracker_server.py

It is a large monolith of code at present.
This could even be an OCI limitation, I don't know. I don't think so, though, as my bandwidth and CPU are fine. I think it's just the design, with workers stalling and queuing up.
The alternative is doing nothing; it runs as is, serving half the traffic it receives.

   daily
                     rx      |     tx      |    total    |   avg. rate
     ------------------------+-------------+-------------+---------------
     yesterday     15.66 GiB |    8.67 GiB |   24.33 GiB |    2.42 Mbit/s
         today     10.40 GiB |    6.37 GiB |   16.77 GiB |    2.64 Mbit/s
     ------------------------+-------------+-------------+---------------

I receive much more traffic than I reply to.

I found your email in my spam folder.
It looks like you triggered the Linux 1024 open-file limit. You can read all the short comments here, or the article the comments link to. @victorarle is now running my optimizations, serving 75K connections per second concurrently, currently with 5 million peers:
https://github.com/ngosang/trackerslist/issues/640#issuecomment-4274220333

That said, as you mention, you are using a free Oracle server, which comes with stricter network limits. You can run opentracker on the same system as a control to determine whether the VPS is the problem. Better still, buy a paid VPS at $10 a year to host the tracker and get more reliable network stability.

Here is my repository:
https://github.com/1265578519/opentracker

What you should do now is swap in different software and test.


As for the UDP packets you receive but cannot answer: raising the buffer sizes helps somewhat, though the effect may not be dramatic. The right direction is to optimize the program source to reduce context switches.
```
# Edit /etc/sysctl.conf:

net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
# Socket buffer sizes: raise the default of 124928 to 67108864.
net.core.rmem_default = 33554432
net.core.wmem_default = 33554432
# Note that Linux programs read the *default* values, so change them together
# with the max values. (Restart running processes to pick up the new values.
# A passive UDP server uses a single socket, unlike TCP where the server
# allocates a new socket per client, so with these settings the UDP server
# process can use at most 32 MB of buffer memory.)

# Then run /sbin/sysctl -p to apply the parameters.
```
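To verify what the kernel actually grants after this change, you can ask a fresh UDP socket for its receive buffer; here is a sketch (function name is mine). Note that on Linux, getsockopt reports roughly double the configured value because the kernel accounts for bookkeeping overhead, and explicit setsockopt requests are capped at net.core.rmem_max.

```python
import socket

def effective_udp_rcvbuf(requested=None):
    """Return the receive-buffer size the kernel grants a new UDP socket.
    With no argument this reflects net.core.rmem_default; passing a size
    requests that much via SO_RCVBUF (capped at net.core.rmem_max)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if requested is not None:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        s.close()
```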

Hello,

It's not spam 🙂

The opentrackr number you report from victorarle is, to be honest, hard for me to believe as a true number. I should be receiving the same level of traffic, and actually more, because the tracker is also on another, more China-oriented list.

HTTP-METRICS dt=60.0s req=50 (0.8/s) get=50 post=0 2xx=50 3xx=0 4xx=0 5xx=0 err=0 announce=40 scrape=4 manage=0 live=6 splash=0 webhook=0 other=0 http=0 https=50 slow250=12 slow1000=0 avg_ms=154.8
UDP-METRICS dt=60.0s rx=40371 (672.8/s) enq=37470 deq=37547 done=37547 drop_adm=2901 drop_q=0 connect=9284 announce=19101 scrape=395 invalid_cid=8767 unknown_action=0 banned_err=15 pkt_err=0 send_ok=37547 send_err=0 send_kib_s=22.2 q=317/512 inflight=325/512 workers=8 conn_ids=23043
RUNTIME-METRICS dt=60.0s active_torrents=372922 live_peers=560388 registry_torrent_keys=401949 registry_downloaded_keys=217990 registry_downloaded_touched=217990 registry_hash_locks=417890 registry_persisted_rows=0 ban_cache=1038 wild_enq_q=4096/4096


Over 60 seconds only 672 packets per second were being processed. 75k connections per second would be an absolutely insane amount, in the realm of about 150,000 packets/sec minimum.

I actively ban abusive traffic like a crazy person as well. I think the numbers would be through the roof if I didn't do banning. I don't know what opentrackr's abuse handling is.


I already have this set:

cat /etc/sysctl.d/99-tracker-net.conf
# Tracker network tuning
net.ipv4.tcp_synack_retries=3
net.ipv4.tcp_fin_timeout=20

# RFS flow table for better RX CPU distribution under load
net.core.rps_sock_flow_entries=32768

# UDP receive/backlog tuning for tracker load
net.core.rmem_max=67108864
net.core.rmem_default=4194304
net.core.netdev_max_backlog=250000
net.ipv4.udp_rmem_min=32768
net.ipv4.udp_mem=8388608 12582912 16777216


sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max \
       net.core.netdev_max_backlog net.ipv4.udp_mem net.ipv4.udp_rmem_min net.ipv4.udp_wmem_min
net.core.rmem_default = 4194304
net.core.rmem_max = 67108864
net.core.wmem_default = 212992
net.core.wmem_max = 212992
net.core.netdev_max_backlog = 250000
net.ipv4.udp_mem = 8388608    12582912    16777216
net.ipv4.udp_rmem_min = 32768
net.ipv4.udp_wmem_min = 4096


while true; do
  a=$(grep '^Udp:' /proc/net/snmp | tail -n1)
  sleep 10
  b=$(grep '^Udp:' /proc/net/snmp | tail -n1)
  python3 - <<'PY' "$a" "$b"
import sys
def parse(line):
    return list(map(int, line.split()[1:]))
a=parse(sys.argv[1]); b=parse(sys.argv[2])
# /proc/net/snmp Udp fields order:
# InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors IgnoredMulti MemErrors
names=["InDatagrams","NoPorts","InErrors","OutDatagrams","RcvbufErrors","SndbufErrors","InCsumErrors","IgnoredMulti","MemErrors"]
d=[b[i]-a[i] for i in range(len(names))]
print(" ".join(f"{n}={v}" for n,v in zip(names,d)))
PY
done
InDatagrams=5672 NoPorts=129 InErrors=0 OutDatagrams=4827 RcvbufErrors=0 SndbufErrors=0 InCsumErrors=0 IgnoredMulti=0 MemErrors=0
InDatagrams=5471 NoPorts=166 InErrors=0 OutDatagrams=4909 RcvbufErrors=0 SndbufErrors=0 InCsumErrors=0 IgnoredMulti=0 MemErrors=0
InDatagrams=6346 NoPorts=134 InErrors=0 OutDatagrams=4892 RcvbufErrors=0 SndbufErrors=0 InCsumErrors=0 IgnoredMulti=0 MemErrors=0
InDatagrams=5889 NoPorts=183 InErrors=0 OutDatagrams=4482 RcvbufErrors=0 SndbufErrors=0 InCsumErrors=0 IgnoredMulti=0 MemErrors=0
InDatagrams=5553 NoPorts=139 InErrors=0 OutDatagrams=4417 RcvbufErrors=0 SndbufErrors=0 InCsumErrors=0 IgnoredMulti=0 MemErrors=0
InDatagrams=5451 NoPorts=125 InErrors=0 OutDatagrams=4793 RcvbufErrors=0 SndbufErrors=0 InCsumErrors=0 IgnoredMulti=0 MemErrors=0
^C

pid=$(pgrep -f 'tracker_server.py' | head -n1)
echo "pid=$pid"
ls /proc/$pid/fd | wc -l
pid=221037
169

First, your domain is not in ngosang's trackers_best.txt list:
https://github.com/ngosang/trackerslist/blob/master/trackers_best.txt

Note that @victorarle's vito-tracker domain appears twice in the list, so torrent clients send his tracker server double the request traffic.

So you are simply getting a smaller share of ngosang's traffic; receiving little traffic is normal. If you need traffic, I can temporarily route a random 5% of my server's traffic to you through my web server for testing.

opentracker provides an API at http://vito-tracker.space:6969/stats?mode=everything ; refresh the page (F5) repeatedly and watch the per-second fields change. `overall` is the combined total of transaction, announce, and other request types.

Several of the Linux parameters you are currently using are unrelated to a tracker: they optimize outbound requests from the server, whereas a tracker passively receives clients and returns data, and never initiates connections to external networks.

The Linux 1024 limit I mentioned at the start is the most important optimization; it must be unlocked via ulimit. Please confirm it against the PID and tell me the output:
cat /proc/28492/limits | grep files

On newer systems you must modify both of the following as well; since you used `UDP_REUSEPORT`, this step is mandatory:
echo "DefaultLimitNOFILE=1048576" >> /etc/systemd/system.conf
echo "DefaultLimitNOFILE=1048576" >> /etc/systemd/user.conf

For UDP, you have not changed the wmem_default value. After changing it, you must also restart the process for it to take effect; then confirm that UdpRcvbufErrors stops climbing. As I said before, though, this parameter does not buy much, because a tracker is bound by the code's CPU cycles: reduce mutexes and atomic operations, and cut context switches at the code level.
watch -n 1 "nstat -az | grep -E 'UdpRcvbufErrors|UdpSndbufErrors'"

ubuntu@hazen-a1:~$ sudo systemctl cat tracker | grep LimitNOFILE
LimitNOFILE=65536
ubuntu@hazen-a1:~$ pid=$(pgrep -f 'tracker_server.py' | head -n1)
ubuntu@hazen-a1:~$ echo "pid=$pid"
pid=221037
ubuntu@hazen-a1:~$ cat /proc/$pid/limits | grep files
Max open files            65536                65536                files
ubuntu@hazen-a1:~$

I am not using UDP_REUSEPORT. I made an attempt but didn't see much improvement, and it would require a very large refactor to make everything work correctly again. Syncing state between the UDP forks was not good; I think it would require a refactor, perhaps using Redis to sync data.


Every 1.0s: nstat -az | grep -E 'UdpRcvbufErrors|UdpSndbufErrors'                                                                                                                                                                               hazen-a1: Tue May  5 18:29:32 2026

UdpRcvbufErrors                 2422               0.0
UdpSndbufErrors                 0                  0.0

It does not increase.


root@hazen-a1:~# pid=$(pgrep -f 'tracker_server.py' | head -n1)
grep 'Max open files' /proc/$pid/limits
ls /proc/$pid/fd | wc -l
Max open files            65536                65536                files
179
root@hazen-a1:~#

Your current value of 65536 is on the small side; I suggest raising it to something larger, 1048576. A ping test shows your system has already hit the limit, causing roughly 10% packet loss:
https://www.itdog.cn/tcping/tracker.wildkat.net:8443
1.png


Another possible cause of packet loss is the firewall mentioned in the linked article: conntrack table overflow. I suggest disabling iptables so the nf driver module is not loaded:
https://github.com/ngosang/trackerslist/issues/640#issuecomment-4274220333

If you cannot disable the firewall, raise this value instead:
net.netfilter.nf_conntrack_max = 1048576

There is also TCP tuning. Looking at your site https://tracker.wildkat.net/ , TCP is only 0.2% of traffic, so for a UDP workload this change hardly matters right now; still, if TCP traffic grows in the future, the optimization becomes necessary:

echo "net.ipv4.tcp_max_orphans = 100000" >> /etc/sysctl.conf
echo "net.ipv4.tcp_orphan_retries = 3" >> /etc/sysctl.conf

So the remaining explanations for your system's packet loss are the ulimit value and the firewall's nf driver.

lsof gives a better view of how much of the ulimit is actually in use; the 179 you got by counting fd entries is not accurate:

lsof -n | awk '{ print $2; }' | uniq -c | sort -rn | head

root@hazen-a1:~# sysctl net.netfilter.nf_conntrack_max
sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_max: No such file or directory
root@hazen-a1:~# cat /proc/sys/net/netfilter/nf_conntrack_count
cat: /proc/sys/net/netfilter/nf_conntrack_count: No such file or directory
root@hazen-a1:~#

I think it's just this poop CPU. I cannot replicate any sort of performance issue on my personal computer no matter how hard I stress it.


root@hazen-a1:~# cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 50.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

processor       : 1
BogoMIPS        : 50.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

processor       : 2
BogoMIPS        : 50.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

processor       : 3
BogoMIPS        : 50.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

root@hazen-a1:~# lscpu
Architecture:                aarch64
  CPU op-mode(s):            32-bit, 64-bit
  Byte Order:                Little Endian
CPU(s):                      4
  On-line CPU(s) list:       0-3
Vendor ID:                   ARM
  BIOS Vendor ID:            QEMU
  Model name:                Neoverse-N1
    BIOS Model name:         virt-7.2  CPU @ 2.0GHz
    BIOS CPU family:         1
    Model:                   1
    Thread(s) per core:      1
    Core(s) per socket:      4
    Socket(s):               1
    Stepping:                r3p1
    BogoMIPS:                50.00
    Flags:                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
NUMA:
  NUMA node(s):              1
  NUMA node0 CPU(s):         0-3
Vulnerabilities:
  Gather data sampling:      Not affected
  Ghostwrite:                Not affected
  Indirect target selection: Not affected
  Itlb multihit:             Not affected
  L1tf:                      Not affected
  Mds:                       Not affected
  Meltdown:                  Not affected
  Mmio stale data:           Not affected
  Old microcode:             Not affected
  Reg file data sampling:    Not affected
  Retbleed:                  Not affected
  Spec rstack overflow:      Not affected
  Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:                Mitigation; __user pointer sanitization
  Spectre v2:                Mitigation; CSV2, BHB
  Srbds:                     Not affected
  Tsa:                       Not affected
  Tsx async abort:           Not affected
  Vmscape:                   Not affected
root@hazen-a1:~#

"No such file or directory" is actually the ideal state: it means the firewall is fully disabled, so nf-driver packet loss cannot occur.
Your packet loss is not catastrophic or continuous but sporadic, likely triggered by momentary traffic bursts. You should add reliable interval jitter to your code.
opentracker can jitter each request by a random ±3 minutes; for a 2-hour interval, for example, that means anywhere in 1:57:00 to 2:03:00, with a random value in that window returned to each user.
The load then becomes smooth, and the CPU spikes disappear.
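The jitter scheme described above takes only a few lines; here is a Python sketch (names are mine) of the 2-hour ± 3-minute example:

```python
import random

def jittered_interval(base_seconds=7200, spread_seconds=180):
    """Base announce interval plus a uniform random offset in
    [-spread, +spread], so clients added at the same moment drift
    apart instead of re-announcing in one synchronized burst."""
    return base_seconds + random.randint(-spread_seconds, spread_seconds)
```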

An example after jitter: the 24-hour packet graph is smooth, with no spikes at all, steady at around 80K concurrent connections per second:
2.png


Your server currently has clients announce on a fixed 30-minute interval, so instantaneous peaks trip your ulimit ceiling of 65536 and cause the packet loss:
3.png


Back to the original question: you wanted to know why your server's downstream traffic exceeds its upstream?


Here is the explanation of the up/down traffic difference:
4.png

5.png



Problem 1
Your code's BEP 15 support is flawed: on udp://tracker.wildkat.net:6969/announce the connection ID validity window is misconfigured at only one minute, so the tracker misjudges UDP connection requests and continuously returns `connection ID not recognized` error packets to clients. The window should be at least as long as the peer-expiry time; otherwise, in a torrent client, your tracker only ever works on the first request. Subsequent announce updates can never connect, and the client just keeps retrying on errors.
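A minimal sketch of the bookkeeping being argued for, assuming a TTL tied to the peer-expiry/announce interval rather than BEP 15's two-minute floor (class name and defaults are illustrative; this is not code from tracker_server.py):

```python
import os, struct, time

class ConnectionIDStore:
    """Issue and validate BEP 15 connection IDs with a configurable TTL."""
    def __init__(self, ttl_seconds=2 * 3600):  # illustrative: match a 2 h announce interval
        self.ttl = ttl_seconds
        self._expiry = {}  # cid -> absolute expiry timestamp

    def issue(self):
        cid = struct.unpack("!Q", os.urandom(8))[0]  # random 64-bit connection ID
        self._expiry[cid] = time.time() + self.ttl
        return cid

    def is_valid(self, cid):
        expiry = self._expiry.get(cid)
        if expiry is None or expiry < time.time():
            self._expiry.pop(cid, None)  # lazily purge expired IDs
            return False
        return True
```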

Problem 2
If a torrent has no other peers, the returned peer list is empty and the response packet is tiny, which is why Announce Request traffic exceeds response traffic.

I'll review that. I've never run into any issue in this regard.

What torrent client are you having the problem with?



./tracker_query.py -t udp://tracker.wildkat.net:6969/announce -H ACF3C904377C9122BB1CD2B4454C9EBF7EC39DD5 -p

──────────────────────────────────────────────────
UDP STARTED → udp://tracker.wildkat.net:6969/announce
──────────────────────────────────────────────────
Client: qBittorrent/5.1.4
Connecting to: tracker.wildkat.net:6969
Resolved to: 2603:c028:4507:f87e:0:7c2c:abdd:e5c2 (IPv6)
Sending connect request (transaction_id: 3573380164)...
Connected (connection_id: 16444701413930001236)
Sending announce request (transaction_id: 1928047667)...
Announce successful   Response time: 75.34ms
  → Received 42 IPv6 peers (34 IPv4-mapped, 8 native IPv6)

Tracker Response Summary:
──────────────────────────────────────────────────
Response Time:          75.34 ms (Excellent)
Interval:                1800 s
Min Interval:               ? s
Seeds:                     31
Leechers:                  12
Times Downloaded:           ?
IPv4 Peers:                 0 (0 bytes)
IPv6 Peers:                42 (756 bytes)
Requested:                 50 peers (respected)
──────────────────────────────────────────────────

Peer List:
──────────────────────────────────────────────────
  1. 2409:8a1e:8060:f610:42:c0ff:fea8:1402  :34567 [ipv6]
  2. 2002:753e:a43c:0:211:32ff:fe9a:c5b     :63219 [ipv6]
  3. 240e:b8f:88b8:c400:807c:e80f:8d51:c342 :11903 [ipv6]
  4. 2804:14d:4683:8126::68f                :24550 [ipv6]
  5. 2804:14d:4683:8126:9042:eb59:71fd:f349 :24550 [ipv6]
  6. 2804:48dc:328:bd01:c579:ea0c:ef29:91a3 :63510 [ipv6]
  7. 2408:8256:f18d:2a:87e1:90b3:c28d:f4b0  :13655 [ipv6]
  8. 2408:8256:f18d:2a:4866:a0ce:77df:2cd7  :13655 [ipv6]
  9. ::ffff:117.62.164.60                   :63219 [ipv4-mapped]
10. ::ffff:167.88.3.210                    :51413 [ipv4-mapped]
11. ::ffff:176.88.72.187                   :31246 [ipv4-mapped]
12. ::ffff:185.193.156.147                 :53923 [ipv4-mapped]
13. ::ffff:146.70.215.18                   :1443  [ipv4-mapped]
14. ::ffff:113.137.59.14                   :2021  [ipv4-mapped]
15. ::ffff:188.241.144.210                 :2586  [ipv4-mapped]
16. ::ffff:41.198.158.8                    :11850 [ipv4-mapped]
17. ::ffff:149.22.95.68                    :5956  [ipv4-mapped]
18. ::ffff:216.224.124.79                  :53868 [ipv4-mapped]
19. ::ffff:200.127.115.115                 :51413 [ipv4-mapped]
20. ::ffff:89.164.10.164                   :51413 [ipv4-mapped]
21. ::ffff:152.173.75.253                  :59106 [ipv4-mapped]
22. ::ffff:85.203.44.105                   :6881  [ipv4-mapped]
23. ::ffff:163.179.9.197                   :13655 [ipv4-mapped]
24. ::ffff:113.137.59.112                  :2021  [ipv4-mapped]
25. ::ffff:46.116.248.80                   :32326 [ipv4-mapped]
26. ::ffff:146.70.174.135                  :56117 [ipv4-mapped]
27. ::ffff:113.137.59.51                   :2021  [ipv4-mapped]
28. ::ffff:102.182.250.221                 :28955 [ipv4-mapped]
29. ::ffff:187.39.254.145                  :24550 [ipv4-mapped]
30. ::ffff:183.193.62.11                   :34567 [ipv4-mapped]
31. ::ffff:189.50.82.162                   :63510 [ipv4-mapped]
32. ::ffff:190.230.231.249                 :23088 [ipv4-mapped]
33. ::ffff:204.8.98.46                     :51159 [ipv4-mapped]
34. ::ffff:173.239.240.234                 :44975 [ipv4-mapped]
35. ::ffff:103.50.33.31                    :51413 [ipv4-mapped]
36. ::ffff:176.58.192.226                  :24298 [ipv4-mapped]
37. ::ffff:37.211.73.194                   :6881  [ipv4-mapped]
38. ::ffff:127.0.0.1                       :6881  [ipv4-mapped]
39. ::ffff:180.164.103.60                  :42192 [ipv4-mapped]
40. ::ffff:185.157.14.61                   :28220 [ipv4-mapped]
41. ::ffff:3.254.95.190                    :8114  [ipv4-mapped]
42. ::ffff:154.88.81.23                    :7174  [ipv4-mapped]
──────────────────────────────────────────────────

https://www.bittorrent.org/beps/bep_0015.html

A client can use a connection ID until one minute after it has received it. Trackers should accept the connection ID until two minutes after it has been send.

Confirmed in tracker_server.py:
UDP connection ID TTL is 120 seconds
_UDP_CONN_TTL = 120 at tracker_server.py (line 19467)
Connection IDs are generated in _gen_connection_id() and stored with expiry timestamp
tracker_server.py (line 19476)
Purge loop runs every 30s and removes expired IDs
tracker_server.py (line 19498)
Validation currently checks only presence in bucket (cid in bucket)
tracker_server.py (line 19506)
On invalid CID, server returns connection ID not recognized
tracker_server.py (line 19864)
So it is not 60s; it is 120s, which aligns with common BEP-15 behavior.

The problem I described happens with every client.
Every torrent client in the world has problems with your UDP tracker. What you need to do is make it compatible with existing clients along the lines I described. See the attachment:
https://web.archive.org/web/20260506004744/https://send.itzmx.com/files/K5xUweMjMuHm0ukD1SgACjE.gif

Strange, I've never had such an issue 🙂. I'll review; it seems that is uTorrent.

I did a fast test with qBittorrent and it works well.
I don't understand your issue 🙁

I cannot replicate your issue with uTorrent or qBittorrent. I assume the Great Firewall of China is blocking you; I don't know otherwise.

I advise switching to qBittorrent https://www.qbittorrent.org/ since uTorrent is like malware 😕

You can capture packets on your server and search for them; you will see a large number of Connection error packets going out every second:
tcpdump -ni eth0 udp port 6969 -A | grep -Ei "Connection"

I just tested qBittorrent v5.1.4 and it indeed does not show the symptom, because before each UDP announce qB always requests a brand-new transaction ID to obtain a Connection.
Every client other than qBittorrent has this problem. Unless your tracker is meant to deny those clients a connection to the server outright, you should add compatibility handling.

Thanks, I'll have to do more testing. I developed the tracker to be protocol compliant; I don't think the clients are compliant with the spec.
I did an analysis. The top offender among users is BitComet.
I tried again with BitComet and I cannot replicate the invalid-connection scenario with any endpoint I use. This could be China interrupting the handshake.

BitComet has the same Connection error problem, and it has nothing whatsoever to do with the "Chinese network" you mention; it is a design flaw in the tracker server code.
Attached is a reproduction screenshot taken on a torrent server in Canada (the log must be shown in debug mode; Wireshark can confirm it). If you cannot open this large attachment, use this link:
https://web.archive.org/web/20260506153635/https://send.itzmx.com/files/DHLbXxF1NC0K7FFT6kVI1qw.gif

I am working on it. After 2 minutes I expire that connection ID per the protocol. BitComet never requests a new one even though it gets the error telling it why! I made a UDP tool and monitored that client's behavior.
Below is BitComet stopping and starting a torrent until the server ages out the CID; BitComet then does not request a new one. At CID age 171 it is rejected and never requests again.
[2026-05-06 12:51:36] REQ src=x.x.x.x:20101 dst=10.0.0.110:6969 act=connect txid=2832492140 cid=4497486125440
[2026-05-06 12:51:37] REQ src=x.x.x.x:20101 dst=10.0.0.110:6969 act=connect txid=2832492140 cid=4497486125440
[2026-05-06 12:51:37] RESP src=10.0.0.110:6969 dst=x.x.x.x:20101 act=connect txid=2832492140 bytes=16
[2026-05-06 12:51:37] RESP-CONNECT assigned_cid=932386806213483733
[2026-05-06 12:51:38] REQ src=x.x.x.x:20101 dst=10.0.0.110:6969 act=announce txid=714407719 cid=932386806213483733 cid_age_sec=0
[2026-05-06 12:51:38] REQ-ANN peer_id=-BC0220-Q..Kq.....L. client_label=BitComet
[2026-05-06 12:52:01] RESP src=10.0.0.110:6969 dst=x.x.x.x:20101 act=announce txid=714407719 bytes=20
[2026-05-06 12:52:13] REQ src=x.x.x.x:20101 dst=10.0.0.110:6969 act=announce txid=714407719 cid=932386806213483733 cid_age_sec=35
[2026-05-06 12:52:13] REQ-ANN peer_id=-BC0220-Q..Kq.....L. client_label=BitComet
[2026-05-06 12:52:14] REQ src=x.x.x.x:20101 dst=10.0.0.110:6969 act=announce txid=714407719 cid=932386806213483733 cid_age_sec=36
[2026-05-06 12:52:14] REQ-ANN peer_id=-BC0220-Q..Kq.....L. client_label=BitComet
[2026-05-06 12:52:49] RESP src=10.0.0.110:6969 dst=x.x.x.x:20101 act=announce txid=714407719 bytes=20
[2026-05-06 12:52:49] REQ src=x.x.x.x:20101 dst=10.0.0.110:6969 act=announce txid=714407719 cid=932386806213483733 cid_age_sec=71
[2026-05-06 12:52:49] REQ-ANN peer_id=-BC0220-Q..Kq.....L. client_label=BitComet
[2026-05-06 12:52:58] RESP src=10.0.0.110:6969 dst=x.x.x.x:20101 act=announce txid=714407719 bytes=20
[2026-05-06 12:53:28] REQ src=x.x.x.x:20101 dst=10.0.0.110:6969 act=announce txid=714407719 cid=932386806213483733 cid_age_sec=110
[2026-05-06 12:53:28] REQ-ANN peer_id=-BC0220-Q..Kq.....L. client_label=BitComet
[2026-05-06 12:53:29] REQ src=x.x.x.x:20101 dst=10.0.0.110:6969 act=announce txid=714407719 cid=932386806213483733 cid_age_sec=111
[2026-05-06 12:53:29] REQ-ANN peer_id=-BC0220-Q..Kq.....L. client_label=BitComet
[2026-05-06 12:53:39] RESP src=10.0.0.110:6969 dst=x.x.x.x:20101 act=announce txid=714407719 bytes=20
[2026-05-06 12:53:39] REQ src=x.x.x.x:20101 dst=10.0.0.110:6969 act=announce txid=714407719 cid=932386806213483733 cid_age_sec=122
[2026-05-06 12:53:39] REQ-ANN peer_id=-BC0220-Q..Kq.....L. client_label=BitComet
[2026-05-06 12:53:58] RESP src=10.0.0.110:6969 dst=x.x.x.x:20101 act=announce txid=714407719 bytes=20
[2026-05-06 12:53:58] REQ src=x.x.x.x:20101 dst=10.0.0.110:6969 act=announce txid=714407719 cid=932386806213483733 cid_age_sec=140
[2026-05-06 12:53:58] REQ-ANN peer_id=-BC0220-Q..Kq.....L. client_label=BitComet
[2026-05-06 12:54:16] RESP src=10.0.0.110:6969 dst=x.x.x.x:20101 act=announce txid=714407719 bytes=20
[2026-05-06 12:54:17] REQ src=x.x.x.x:20101 dst=10.0.0.110:6969 act=announce txid=714407719 cid=932386806213483733 cid_age_sec=159
[2026-05-06 12:54:17] REQ-ANN peer_id=-BC0220-Q..Kq.....L. client_label=BitComet
[2026-05-06 12:54:29] RESP src=10.0.0.110:6969 dst=x.x.x.x:20101 act=error txid=714407719 bytes=36
[2026-05-06 12:54:29] RESP-ERROR msg='connection ID not recognized' reason=cid_expired_by_local_ttl_model label=BitComet peer_id=-BC0220-Q..Kq.....L. cid_age_sec=171
[2026-05-06 12:54:30] REQ src=x.x.x.x:20101 dst=10.0.0.110:6969 act=announce txid=714407719 cid=932386806213483733 cid_age_sec=172
[2026-05-06 12:54:30] REQ-ANN peer_id=-BC0220-Q..Kq.....L. client_label=BitComet

You can look at how other trackers implement this in their source.
Otherwise, in the eyes of visiting users your tracker is simply broken, because their client cannot establish a connection while everyone else's tracker works fine.

I did already.
This client is very buggy in my opinion. If the server ever restarts while the client is running, the client goes into error mode forever (until the client program restarts).
The only way to support this client accurately is to accept any CID it sends, since it never processes any rejection informing it to re-connect and get a valid new CID.
I created a lax option similar to the epoch mode opentracker uses. As long as the server sees a valid connection from BitComet it will work for 48 hours; after that it stops working unless the client reconnects (which appears to mean restarting the program).
Even when I restart the tracker there are a ton of BitComet clients connecting, and they will not fix themselves without a client restart.
Alternatively I need an explicit rule for peer_id -BC (BitComet) and must blindly accept any CID they use. It's crazy to me.
I submitted a bug report to BitComet; it should not behave the way it does. It is very easy to get stuck in an orphan scenario.
I updated my code to go back to the original strict method, but now, on a valid UDP announce or scrape from a client I have on record with a valid connect and CID, I extend that CID's lifetime by min_interval * 2. So BitComet can start the program and start a torrent, it will do a connect, and each announce or scrape from then on refreshes the CID by an hour. If they stop the torrent, let over an hour pass, and start it again without restarting the program, it will fail. I cannot keep CIDs indefinitely. This is a design flaw (in my opinion) in BitComet. Also, if the tracker is restarted, all CIDs from BitComet become invalid unless the program is restarted; the only way to solve that would be to store CIDs indefinitely in a DB.
BitComet amplifies tracker traffic by a lot because of this design. On failure it re-queries on a short timer over and over. It expects to use the same CID forever as long as the program is open. That just isn't the design of the protocol.
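The sliding-lifetime workaround described above (each valid announce/scrape extends the CID) can be sketched like this; the class name and one-hour window are illustrative, not the server's actual code:

```python
import time

class SlidingConnectionIDStore:
    """Accept a known CID and push its expiry forward on every valid
    announce/scrape, so a client that reuses one CID for the life of the
    process keeps working as long as it announces within the window."""
    def __init__(self, refresh_seconds=3600):  # e.g. min_interval * 2
        self.refresh = refresh_seconds
        self._expiry = {}  # cid -> absolute expiry timestamp

    def register(self, cid):
        """Record a CID handed out by a successful connect request."""
        self._expiry[cid] = time.time() + self.refresh

    def touch_if_valid(self, cid):
        """Validate a CID on announce/scrape; slide its window if valid."""
        expiry = self._expiry.get(cid)
        if expiry is None or expiry < time.time():
            self._expiry.pop(cid, None)
            return False
        self._expiry[cid] = time.time() + self.refresh  # slide the window
        return True
```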

https://www.cometbbs.com/t/220%E6%B5%8B%E8%AF%95%E7%89%88/97560/142
I already reported the restart problem you describe to one of BitComet's Chinese developers on March 10.
But you cannot file reports with every other torrent client author; uTorrent's developers, for example, cannot be reached, and contacting the authors of every client in the world is unrealistic.
And even if every client shipped a new version, users would not necessarily update; to anyone still on the old version, your tracker would still look broken.
Finally, you should at least fix the announce problem in your code, the inability to perform a second announce; as I said before, the Connection lifetime should at minimum equal the peer-deletion time, or exceed the 2-hour interval.
The restart case is a separate problem; you do not necessarily have to fix post-restart Connections in code, since opentracker has the same restart issue.

It is strange they didn't address the issue when you mentioned it, then.
My report is at https://cometforums.com/ but it is in a "pending state", so maybe no one ever sees it. As stated in a follow-up email, I have addressed the issue by updating the CID lifespan whenever the server sees an announce/scrape. It's strange to me that such a huge issue has not been found or corrected.

A few days ago the developers released 2.21 Beta1 [20260426].
The current release cycle focuses on multithreaded CPU performance; the feedback submitted earlier is on their to-do list.

```
Thanks everyone for the comments and suggestions. 2.21 Beta1 has been released; please try it out. Bugs and feature requests we have not yet gotten to will be handled in follow-up releases.
```

```
Added advanced option bittorrent.transfer_thread_pool: use worker threads for BT transfer encryption/decryption; off by default.
```

I replied to the developer asking them to raise the priority of this, so it gets in when the next Beta2 test build ships:
https://www.cometbbs.com/t/221%E6%B5%8B%E8%AF%95%E7%89%88/97858/31

Separately, your server's UDP packet loss still exists: the client fails to receive reply packets from the tracker and only connects after many retries.



You could try the interval jitter mentioned earlier in the emails; it takes only a little C code, see my opentracker repository:
https://github.com/1265578519/OpenTracker/blob/master/opentracker/trackerlogic.h#L51

A PHP implementation; porting this feature to the Python you are using would not be difficult:
https://bbs.itzmx.com/thread-106752-1-1.html

The opentracker in my repository also returns a different announce interval automatically based on peer count, instead of a fixed 30 minutes, which cuts the tracker server's CPU usage substantially.
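The load-based interval idea can be sketched as a simple step function; the actual curve lives in trackerlogic.h in the fork linked above, so the numbers here are hypothetical:

```python
def adaptive_interval(num_peers, min_interval=600, max_interval=7200, step_peers=1000):
    """Hypothetical mapping from swarm size to announce interval:
    small swarms re-announce often (fresh peer lists matter more there),
    large swarms are throttled toward max_interval to save tracker CPU."""
    interval = min_interval + (num_peers // step_peers) * 600
    return min(interval, max_interval)
```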

Yes, I am quite aware of the poor UDP performance. I am not draining the UDP queue fast enough. HTTPS is replying fine. As I mentioned before, this is a completely new tracker written from scratch.
It's frustrating for me, believe me 🙂
Back when it was around 120,000 torrents everything was fine and UDP replied in 50ms.
On a good note, I found which feature was causing the performance to tank. UDP is now fine. All packets are being processed.
Thu May  7 13:12:51 EDT 2026
Response Time:          40.47 ms (Excellent)
Thu May  7 13:12:56 EDT 2026
Response Time:          40.59 ms (Excellent)
Thu May  7 13:13:01 EDT 2026
Response Time:          36.48 ms (Excellent)
Thu May  7 13:13:07 EDT 2026
Response Time:          40.63 ms (Excellent)
On a bad note, a major feature is now disabled 🙁
As I mentioned before, the server is much more than just a tracker; the tracker part is simple, it's everything else piled on top. The core tracker logic really hasn't changed in months.
All fixed 🙂
Thanks for listening to my woes and highlighting the issue regarding BitComet.
In the end it had nothing to do with kernel parameters, but simply lock contention, with another feature holding up everything else. When that queue became full and stopped draining fast enough, UDP packets piled up.
Now the server is back to being quiet and idle.
It can be difficult to trace down where problems occur without adequate load, and testing in a sandbox, no matter what I do, never simulates real load.
The problems did not appear until the torrent load went over 120k.

Right: I said from the very start that the program source needed optimizing. The kernel parameters mainly address the case where even plain ping shows packet loss; on your server that symptom is mild and is probably caused by client concurrency. Many people, like me, run thousands or even tens of thousands of torrents in a single client, uploading or downloading; with a fixed 30-minute update interval, tens of thousands of packets hit the tracker within one second, and the instantaneous peak trips kernel limits and drops packets.
For C I would profile with perf; for your Python I'm not sure how to pinpoint the function, but you've already found and fixed the relevant code anyway.

As I said before:
```
As I said before, though, this parameter does not buy much, because a tracker is bound by the code's CPU cycles: reduce mutexes and atomic operations, and cut context switches at the code level.
```
Do you want me to route a random 5% of traffic to you now for testing? HTTP/HTTPS traffic only.

I have done no optimization on the HTTPS endpoint, so it could very likely fall over if you start routing traffic to it. The HTTPS endpoint is not publicly advertised, and I have a TXT record redirecting traffic back to the UDP endpoint per BEP 34.
dig +short TXT tracker.wildkat.net
"BITTORRENT UDP:6969 TCP:8443"
BEP 34-aware clients will automatically be redirected to UDP, though I don't know if any clients actually support this feature.

As far as I know only uTorrent supports BEP 34, so that wraps this up, meow! Your UDP tracker now works correctly. Being unable to announce a second time was far too severe a bug; the goal should not be chasing the BEPs (the BEPs themselves have design flaws) but accommodating the client behavior found on real networks, which is what matters most.
If you want to thank me, you could leave a public comment, for example in the ngosang issues or on my repository, a footprint on the internet that helps other developers and tracker-server operators tune things:
https://github.com/ngosang/trackerslist/issues
If you'd rather not, that's fine too.

@1265578519 thanks for taking the time and listening to me while sorting out the UDP response issues and highlighting the issue with BitComet. Hopefully it remains stable for the foreseeable future.

I think maybe this is what you were referring to in one of your optimizations?
The logic now introduces a deterministic interval response of plus or minus a set amount via CLI; I chose 300. With this, if a user has 300 torrents, they do not all reply back at the same time: a 25-35 min response interval.

That optimization is in the right direction, but it needs one more improvement: right now, if a client announces the same torrent over both UDP and TCP, the two protocols get back identical interval times.
Each announce should return a fresh random value. As it stands, the same user and torrent are pinned to one fixed time; every request should be fully randomized.
You have to account for both protocols running at once.
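The difference between the two schemes is easy to see side by side; here is a sketch (function names are mine) of the server's current hash+peer_id-derived interval versus the fully per-request randomization suggested above:

```python
import hashlib, random

def deterministic_interval(info_hash, peer_id, base=1800, spread=300):
    """Current scheme as described: offset derived from hash+peer_id, so the
    same torrent/peer pair always gets the same interval, and UDP and TCP
    announces for that pair land on the same schedule."""
    h = int.from_bytes(hashlib.sha1(info_hash + peer_id).digest()[:4], "big")
    return base - spread + h % (2 * spread + 1)

def per_request_interval(base=1800, spread=300):
    """Suggested scheme: a fresh random offset on every announce, so
    concurrent UDP and TCP announces drift apart."""
    return base + random.randint(-spread, spread)
```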

That torrent is randomized to that value based on hash and peer ID; it stays that way until a new peer ID is created. Add a new torrent and that specific torrent gets a new value. I did not implement "random" for each request; it is random based on hash + peer ID. Unless those values change, that is the value for that specific torrent for you.
I think it's OK. It achieves the goal that a person with a massive number of torrents has a unique value for each. The trackers you list should be tiered anyhow; place both on the same tier so you are not needlessly querying the same source.


