Classification using ensemble learning under weighted misclassification loss

Stat Med. 2019 May;38(11):2002-2012. doi: 10.1002/sim.8082. Epub 2019 Jan 4.

Classification using ensemble learning under weighted misclassification loss.

Yizhen Xu
Tao Liu
Michael J Daniels
Rami Kantor
Ann Mwangi
Joseph W Hogan

PMID: 30609090 PMCID: PMC7045125. DOI: 10.1002/sim.8082.

Abstract

Binary classification rules based on covariates typically depend on simple loss functions such as zero-one misclassification. Some cases may require more complex loss functions. For example, individual-level monitoring of HIV-infected individuals on antiretroviral therapy requires periodic assessment of treatment failure, defined as having a viral load (VL) value above a certain threshold. In some resource-limited settings, VL tests may be limited by cost or technology, and diagnoses are based on other clinical markers. Depending on the scenario, a higher premium may be placed on avoiding false positives, which bring greater cost and reduced treatment options. Here, the optimal rule is determined by minimizing a weighted misclassification loss/risk. We propose a method for finding and cross-validating optimal binary classification rules under weighted misclassification loss. We focus on rules comprising a prediction score and an associated threshold, where the score is derived using an ensemble learner. Simulations and examples show that our method, which derives the score and threshold jointly, more accurately estimates overall risk and has better operating characteristics than methods that derive the score first and the cutoff conditionally on the score, especially for finite samples.
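
To make the idea of a weighted misclassification risk concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a weight `w_fp` on false positives and `1 - w_fp` on false negatives (the abstract does not state the exact parameterization), and it picks a cutoff for a score that is already fixed. That corresponds to the simpler "score first, cutoff conditional on the score" approach the abstract compares against, not the joint derivation the paper proposes. The function names `weighted_risk` and `best_threshold` are hypothetical.

```python
import numpy as np


def weighted_risk(y_true, y_pred, w_fp=0.8):
    """Empirical weighted misclassification risk.

    Assumed form (not taken from the paper):
        w_fp * P(pred = 1, truth = 0) + (1 - w_fp) * P(pred = 0, truth = 1)
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    false_pos = np.mean((y_pred == 1) & (y_true == 0))
    false_neg = np.mean((y_pred == 0) & (y_true == 1))
    return w_fp * false_pos + (1 - w_fp) * false_neg


def best_threshold(scores, y_true, w_fp=0.8):
    """Grid-search the cutoff c minimizing the risk of the rule 1[score >= c].

    The score is treated as fixed here; this is the conditional approach,
    shown only to illustrate the loss.
    """
    scores = np.asarray(scores, dtype=float)
    # Candidate cutoffs: every observed score, plus +inf (nobody flagged).
    candidates = np.append(np.unique(scores), np.inf)
    risks = [
        weighted_risk(y_true, (scores >= c).astype(int), w_fp)
        for c in candidates
    ]
    i = int(np.argmin(risks))
    return candidates[i], risks[i]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    scores = [0.1, 0.4, 0.35, 0.8, 0.7]
    labels = [0, 0, 1, 1, 1]
    cutoff, risk = best_threshold(scores, labels, w_fp=0.8)
    print(f"cutoff={cutoff}, risk={risk:.3f}")
```

Raising `w_fp` pushes the selected cutoff upward, since each false positive is then penalized more heavily than each false negative, which matches the scenario in the abstract where avoiding false positives carries a high premium.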