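The revised text is as follows (language issues have been corrected systematically, covering terminological accuracy, logical rigor, grammatical correctness, appropriate collocation, and academic conventions):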
The 10 features that were
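Rationale:
- “significantly identified through the single-factor analysis” → “significantly associated with the outcome in univariate analysis”:
• “Single-factor analysis” is a literal rendering of the Chinese term and is not standard English; the correct term is univariate analysis;
• “Identified” is a poor collocation: the features are not objects that get “identified” but variables that show a statistical association with the outcome, so “significantly associated with the outcome” is more accurate and follows conventional usage in epidemiology and biostatistics;
• Adding “with the outcome” states the target of the association explicitly and removes ambiguity.
- “were selected as the dataset for final model construction” → “were selected for final model construction”: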
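• The original sentence contains a logical error: the “10 features” are variables, not a “dataset”, so treating the features as the dataset conflates two concepts. “Selected for model construction” already implies “as the feature set used for modeling”; “as input features” could be added if the data structure needs emphasis, but the shorter form is better here.
- “The data was randomly split…” → “The dataset was randomly split…”: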
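• Treated as a mass noun, “data” takes the singular “was”, which is grammatically acceptable (although many style guides treat “data” as plural). Academic writing, however, favors the countable and unambiguous collective noun dataset (here, specifically the sample matrix of 10 features constructed in this study), which is more precise and stays consistent with the set terminology “training set/test set” used later.
- “Due to the approximately 1:5 ratio… there was a significant imbalance…” → “Because the ratio… was approximately 1:5, the training set exhibited class imbalance”: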
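• In careful usage, “due to” is adjectival and should attach to a noun or follow a linking verb (e.g., “the bias was due to class imbalance”); a sentence-initial “Due to …, there was …” leaves the phrase without a proper head and rests on the weak existential “there was”, which loosens the structure and blurs the causal chain. A Because clause followed by a main clause makes the logic clearer;
• “Significant imbalance” is a poor collocation: “imbalance” already conveys the problem, and “significant” is easily misread as statistical significance; the standard term is class imbalance;
• The scope is restricted explicitly to the training set (because the subsequent resampling is applied only to the training set), which avoids the unclear scope of a general reference to “positive and negative samples”.
- “To avoid the model being biased towards the negative samples during prediction, resulting in a low recall rate for the predictions…” → “To mitigate model bias toward the majority (negative) class, which would otherwise lead to substantially reduced recall for the positive class…”: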
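• “Avoid the model being biased” is awkward rather than strictly ungrammatical: avoid does take a gerund, but “the model being biased” is a clumsy construction with a heavy passive feel; mitigate bias is a more precise and active academic verb phrase;
• “Negative samples” becomes majority (negative) class: at a 1:5 ratio the negative class is the majority class, which is the standard term, and the parenthetical “(negative)” keeps the text readable while using standard terminology;
• “Low recall rate for the predictions” is incorrect: recall is defined for a specific class (here, the positive class), as TP / (TP + FN) over the samples that truly belong to it, so it cannot be “for the predictions”; the text should state recall for the positive class explicitly;
• “Substantially reduced” replaces “low”: it describes the degradation relative to what the model could achieve (consistent with the marked 1:5 imbalance) instead of relying on a bare subjective adjective.
- “we performed sampling on the training set, while no processing was done on the test set” → “we applied resampling techniques (specifically, SMOTE) to the training set; the test set remained unmodified”: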
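• “Performed sampling” is vague because it does not say which kind of sampling. Naming resampling techniques, with SMOTE as an example (if the original study actually used it), improves methodological transparency. If no method is specified, “oversampling or undersampling” can be written instead. SMOTE is kept here only as a placeholder reflecting common practice, and the authors should replace it with the method actually used;
• “No processing was done” is passive and colloquial; remained unmodified is concise and formal;
• The semicolon replaces “while”: it emphasizes the parallel contrast between the two clauses (resampled vs. not resampled) and conveys the design intent more clearly than a subordinating conjunction.
- “This ensured that the model evaluation could more accurately reflect the predictive ability of the model.” → “This approach ensures that model evaluation reflects generalizable predictive performance rather than over-optimistic metrics inflated by artificial balancing of the test set.”: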
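• The original sentence is vague: “predictive ability” is too general, and academic writing should name the evaluation target explicitly, namely generalizable predictive performance;
• It adds the key rationale for keeping the test set unaltered, namely preventing distorted evaluation. If the test set were resampled, metrics such as recall could be inflated and would no longer reflect performance on the real class distribution, undermining external validity;
• “Over-optimistic metrics inflated by artificial balancing” pinpoints a common pitfall and gives the argument more depth and technical precision;
• The tense is unified to the present (ensures, reflects), the usual tense for stating general methodological principles.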
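The points about training-only resampling and per-class recall can be made concrete with a minimal pure-Python sketch. Several parts of it are assumptions chosen only for illustration and are not the study's actual procedure:
• the toy data (20 positive and 100 negative samples, i.e. 1:5);
• the hand-made 70/30 stratified split;
• the helper names `smote` and `recall`.

`smote` is a simplified version of the basic SMOTE interpolation idea, not the imbalanced-learn implementation. In practice one would more likely call `imblearn.over_sampling.SMOTE(...).fit_resample(X_train, y_train)`.

```python
import math
import random


def recall(y_true, y_pred, positive=1):
    """Recall for one class: TP / (TP + FN) over samples whose TRUE label is `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


def smote(X, y, minority=1, k=5, seed=0):
    """Hypothetical minimal SMOTE: oversample `minority` until both classes are equal in size.

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours (Euclidean).
    Assumes a binary label vector.
    """
    rng = random.Random(seed)
    minority_X = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_X)) - len(minority_X)
    if n_needed <= 0 or len(minority_X) < 2:
        return list(X), list(y)  # already balanced, or too few rows to interpolate
    k = min(k, len(minority_X) - 1)
    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(minority_X))
        base = minority_X[i]
        others = [m for j, m in enumerate(minority_X) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return list(X) + synthetic, list(y) + [minority] * n_needed


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: 20 positives and 100 negatives (1:5), two features each.
    X = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(20)]
    X += [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
    y = [1] * 20 + [0] * 100

    # Stratified 70/30 split by hand, so both sets keep the original 1:5 ratio.
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    train_idx, test_idx = pos[:14] + neg[:70], pos[14:] + neg[70:]
    X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
    X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

    # Resample the TRAINING set only; the test set is left untouched.
    X_res, y_res = smote(X_train, y_train, minority=1, k=5, seed=0)
    print("train before:", y_train.count(1), y_train.count(0))  # train before: 14 70
    print("train after:", y_res.count(1), y_res.count(0))  # train after: 70 70
    print("test:", y_test.count(1), y_test.count(0))  # test: 6 30

    # An all-negative baseline looks accurate but never finds a positive case.
    baseline = [0] * len(y_test)
    accuracy = sum(t == p for t, p in zip(y_test, baseline)) / len(y_test)
    print("accuracy:", round(accuracy, 3))  # accuracy: 0.833
    print("positive-class recall:", recall(y_test, baseline))  # positive-class recall: 0.0
```

Only `X_train`/`y_train` pass through `smote`. `X_test`/`y_test` keep their original 1:5 distribution, so metrics computed on them reflect the real class balance. The all-negative baseline also shows why recall must be tied to a class. It reaches about 0.83 accuracy on the untouched test set, yet its positive-class recall is 0.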
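In summary, the revised text removes terminology misuse, logical gaps, grammatical flaws, and vague expressions. It meets the language and methods-reporting standards of mainstream international medical and machine-learning journals.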