From fbcb23dc059f52355660f5861d757e9fdc27d23e Mon Sep 17 00:00:00 2001
From: Jiangtian Feng
Date: Sun, 17 May 2026 22:50:16 +0800
Subject: [PATCH 1/2] anolis: mm: coldpgs: only split partially mapped large folios

ANBZ: #35365

mm/coldpgs.c uses an outdated heuristic to decide whether a large folio
should be split before being added to the swap cache:

    if (folio_test_large(folio)) {
            if (!my_can_split_folio(folio, NULL))
                    goto keep_unlocked;

            if (!folio_entire_mapcount(folio) &&
                my_split_folio_to_list(folio, list))
                    goto keep_unlocked;
    }

The intent of !folio_entire_mapcount() was to detect "the folio is no
longer entirely PMD-mapped, so split it" back when only PMD-sized THP
existed. Once mTHP (multi-size THP, order < HPAGE_PMD_ORDER) is in play,
this is unconditionally true:

    mTHP order < HPAGE_PMD_ORDER, so mTHP can never be PMD-mapped
      => folio_entire_mapcount(mTHP) is always 0
      => the heuristic always fires
      => every fully-mapped mTHP selected for proactive reclaim is
         broken down to order-0 before swap, defeating mTHP for any
         memcg that opts into coldpgs.

Mainline shrink_folio_list() solved this in commit 8422acdc97ed ("mm:
introduce a pageflag for partially mapped folios"), which replaced the
heuristic with a precise check based on the deferred-split list and the
new PG_partially_mapped flag (see mm/vmscan.c:1311-1313). That commit
has already been backported to ANCK as ef6dcceef2e47 (ANBZ: #18805), so
PG_partially_mapped is available here as well.

Apply the same precise check to coldpgs:

    if (data_race(!list_empty(&folio->_deferred_list) &&
        folio_test_partially_mapped(folio)) &&
        my_split_folio_to_list(folio, list))
            goto keep_unlocked;

After this change, fully-mapped mTHP folios stay intact and reach the
existing add_to_swap() path as a whole; only large folios that have
genuinely lost some of their tail mappings (and are therefore queued on
_deferred_list with the PG_partially_mapped flag set) are split up
front, exactly as in the reactive reclaim path.

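For illustration only, and not part of the patch: the difference between
the two checks can be sketched in userspace C. This is a minimal model
under stated assumptions. struct model_folio and every function below
are hypothetical stand-ins for the kernel state each check reads, and
order 4 is just an arbitrary example of an mTHP order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the folio state the two checks read. */
struct model_folio {
	unsigned int order;	/* folio order; an mTHP has order < HPAGE_PMD_ORDER */
	int entire_mapcount;	/* models folio_entire_mapcount(): whole-folio PMD mappings */
	bool on_deferred_list;	/* models !list_empty(&folio->_deferred_list) */
	bool partially_mapped;	/* models folio_test_partially_mapped() */
};

/* Old heuristic: split whenever no PMD mapping covers the whole folio. */
bool old_wants_split(const struct model_folio *f)
{
	return f->entire_mapcount == 0;
}

/* New check: split only folios queued for deferred split and flagged partial. */
bool new_wants_split(const struct model_folio *f)
{
	return f->on_deferred_list && f->partially_mapped;
}

/* A fully PTE-mapped mTHP: never PMD-mapped, never queued for deferred split. */
void check_model(void)
{
	struct model_folio mthp = {
		.order = 4,
		.entire_mapcount = 0,
		.on_deferred_list = false,
		.partially_mapped = false,
	};

	assert(old_wants_split(&mthp));		/* old heuristic always fires */
	assert(!new_wants_split(&mthp));	/* new check keeps the folio whole */
}
```

Calling check_model() exercises the fully-mapped mTHP case described
above; a folio with both on_deferred_list and partially_mapped set is
the only shape the new check asks to split up front.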
No change for fully-mapped PMD-sized THP either: those are not on
_deferred_list, so they take the same fall-through behaviour as before
(any required split happens later in the add_to_swap() fallback).

Mirrors mm/vmscan.c:1311-1313.

Signed-off-by: Jiangtian Feng
---
 mm/coldpgs.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/coldpgs.c b/mm/coldpgs.c
index d1a938347aff..7b53e2b2f41c 100644
--- a/mm/coldpgs.c
+++ b/mm/coldpgs.c
@@ -882,8 +882,13 @@ static unsigned long reclaim_coldpgs_from_list(struct mem_cgroup *memcg,
 				if (folio_test_large(folio)) {
 					if (!my_can_split_folio(folio, NULL))
 						goto keep_unlocked;
-
-					if (!folio_entire_mapcount(folio) &&
+					/*
+					 * Split partially mapped folios right
+					 * away. We can free the unmapped pages
+					 * without IO.
+					 */
+					if (data_race(!list_empty(&folio->_deferred_list) &&
+					    folio_test_partially_mapped(folio)) &&
 					    my_split_folio_to_list(folio, list))
 						goto keep_unlocked;
 				}
--
Gitee

From 57956ba318b830fbfac7f14445c6cc11020c98b1 Mon Sep 17 00:00:00 2001
From: Jiangtian Feng
Date: Sun, 17 May 2026 23:05:42 +0800
Subject: [PATCH 2/2] anolis: mm: coldpgs: add large-folio swap statistics

ANBZ: #35365

After patch 1 in this series ("anolis: mm: coldpgs: only split partially
mapped large folios"), the coldpgs proactive reclaim path tries to keep
mTHP folios intact when swapping them out. Without dedicated counters
there is no way to tell from production whether the change is taking
effect:

- mainline already counts PSWPOUT / THP_SWPOUT / MTHP_STAT_SWPOUT in
  the swap_writepage IO layer (mm/page_io.c), so any large folio that
  successfully reaches __swap_writepage() shows up there. Those global
  counters, however, aggregate the reactive reclaim path and the
  coldpgs proactive path together, leaving the proactive contribution
  invisible.

- mainline counts THP_SWPOUT_FALLBACK / MTHP_STAT_SWPOUT_FALLBACK
  inline in mm/vmscan.c shrink_folio_list() when add_to_swap() fails
  and the folio has to be split.
  The wrappers used there (count_vm_event, count_memcg_folio_events,
  count_mthp_stat) are static inline functions backed by un-EXPORTed
  symbols (vm_event_states, __count_memcg_events, mthp_stats per-CPU).
  The builtin vmscan can use them; tristate coldpgs.ko cannot link
  against them.

Extend the existing reclaim_coldpgs_stats array with two new counters
scoped to the proactive reclaim path:

  RECLAIM_COLDPGS_STAT_LARGE_FOLIO_SWPOUT
      Bytes worth of large folios that were added to the swap cache
      whole, i.e. mTHP shape preserved across coldpgs.

  RECLAIM_COLDPGS_STAT_LARGE_FOLIO_SWPOUT_FALLBACK
      Bytes worth of large folios that had to be split before swap
      because add_to_swap() failed at the native order. The size of
      the original folio is recorded so that
      fallback / (fallback + swpout) yields a meaningful breakdown
      rate per memcg.

These two counters together let ops attribute mTHP breakdown
specifically to the proactive reclaim path and compute its own fallback
rate, independent of the global vmstat counters.

Following the precedent set by commit 9c90597a4e8a ("anolis: mm:
coldpgs: add mlock {dropped, refault} counter"), two CK_KABI_RESERVE
slots in struct reclaim_coldpgs_stats are consumed accordingly so that
the total size of the structure stays unchanged.

The new counters are exposed through the existing coldpgs sysfs stats
output as "large folio swap out" and "large folio swap out fallback".

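For illustration only, and not part of the patch: a minimal sketch of
the fallback / (fallback + swpout) rate mentioned above. The helper name
and its self-check are hypothetical; the only assumption taken from this
series is that both counters are byte counts. How the values are read
out of the stats file is deliberately left out.

```c
#include <assert.h>

/*
 * Hypothetical helper: both arguments are byte counts as accumulated by
 * the "large folio swap out" and "large folio swap out fallback"
 * counters of one memcg.
 */
double coldpgs_fallback_rate(unsigned long long swpout_bytes,
			     unsigned long long fallback_bytes)
{
	unsigned long long total = swpout_bytes + fallback_bytes;

	/* No large folio went through this path yet: avoid dividing by zero. */
	if (total == 0)
		return 0.0;

	return (double)fallback_bytes / (double)total;
}

/* 3 MiB swapped out whole and 1 MiB split first gives a rate of 1/4. */
void coldpgs_fallback_rate_selfcheck(void)
{
	assert(coldpgs_fallback_rate(3ULL << 20, 1ULL << 20) == 0.25);
	assert(coldpgs_fallback_rate(0, 0) == 0.0);
}
```

Because the fallback counter records the size of the original folio,
numerator and denominator are in the same unit and the ratio needs no
per-order weighting.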
Signed-off-by: Jiangtian Feng
---
 include/linux/memcontrol.h |  4 ++--
 mm/coldpgs.c               | 11 +++++++++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9e61015ba31f..dca465a67b70 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -239,6 +239,8 @@ enum reclaim_coldpgs_stat_item {
 	RECLAIM_COLDPGS_STAT_SLAB_DROP,
 	RECLIMA_COLDPGS_STAT_MLOCK_DROP,
 	RECLIMA_COLDPGS_STAT_MLOCK_REFAULT,
+	RECLAIM_COLDPGS_STAT_LARGE_FOLIO_SWPOUT,
+	RECLAIM_COLDPGS_STAT_LARGE_FOLIO_SWPOUT_FALLBACK,
 	RECLAIM_COLDPGS_STAT_MAX,
 };
 
@@ -247,8 +249,6 @@ struct reclaim_coldpgs_stats {
 
 	CK_KABI_RESERVE(1)
 	CK_KABI_RESERVE(2)
-	CK_KABI_RESERVE(3)
-	CK_KABI_RESERVE(4)
 };
 
 #endif /* CONFIG_RECLAIM_COLDPGS */
diff --git a/mm/coldpgs.c b/mm/coldpgs.c
index 7b53e2b2f41c..c3a5d58da951 100644
--- a/mm/coldpgs.c
+++ b/mm/coldpgs.c
@@ -901,8 +901,17 @@ static unsigned long reclaim_coldpgs_from_list(struct mem_cgroup *memcg,
 								  list))
 						goto keep_unlocked;
 
+					/* Original folio size; now order-0. */
+					reclaim_coldpgs_update_stats(memcg,
+						RECLAIM_COLDPGS_STAT_LARGE_FOLIO_SWPOUT_FALLBACK,
+						nr_pages << PAGE_SHIFT);
+
 					if (!my_add_to_swap(folio))
 						goto keep_unlocked;
+				} else if (folio_test_large(folio)) {
+					reclaim_coldpgs_update_stats(memcg,
+						RECLAIM_COLDPGS_STAT_LARGE_FOLIO_SWPOUT,
+						nr_pages << PAGE_SHIFT);
 				}
 
 				/* Update address space */
@@ -1603,6 +1612,8 @@ static int reclaim_coldpgs_read_stats(struct seq_file *m, void *v)
 		"slab drop",
 		"mlock dropped",
 		"mlock refault",
+		"large folio swap out",
+		"large folio swap out fallback",
 	};
 
 	self = kzalloc(sizeof(*self) * 3, GFP_KERNEL);
--
Gitee