Rを用いた一般化線形混合モデル（GLMM）の分析手法を身につける:言語研究分野の事例をもとに

日本言語テスト学会（JLTA）第26回（2023年度）全国研究大会ワークショップ

田村祐（関西大学）

2023-09-10

おしながき

はじめに
- 経緯
- 注意事項
WSの準備
- RおよびRStudioの準備
理論編：一般（化）線形（混合）モデルの概略
実践編：データ分析の実際
- データの下処理
- データ分析
- 分析結果の解釈
- 結果の図示と報告
おわりに

はじめに

今回のWSの分析の部分は下記の研究論文（データ，RコードすべてOSFにてオンライン公開済）のものです。
二値データの分析：Terai, M., Fukuta, J., & Tamura, Y. (2023). Learnability of L2 collocations and L1 influence on L2 collocational representations of Japanese learners of English. International Review of Applied Linguistics in Language Teaching. https://doi.org/10.1515/iral-2022-0234

※反応時間の分析も扱おうと思っていたのですが，ロジスティック回帰に加えて反応時間のGLMMをゼロから扱うのは3時間では足りないと判断して断念しました。田村のOSFページに反応時間の分析を扱った下記の論文のRコードがありますので，反応時間の分析に興味がお有りの方はそちらをご参照ください。

Tamura, Y. (2023). Investigation of the relationship between animacy and L2 learners’ acquisition of the English plural morpheme. Journal of Psycholinguistic Research, 52, 675–690. https://doi.org/10.1007/s10936-022-09915-2
Tamura, Y. (2023). Is cats one word or two? L2 Learners’ processing of number marking in English from the viewpoints of form–meaning mapping. Second Language Research. https://doi.org/10.1177/02676583231188933

注意事項①

本WSでは，Rの基本的な動作トラブルにつきましては個別に対応できない場合がございます
Rの基本的な操作等に習熟されたい方につきましては，書籍等御覧ください。
『Rによる教育データ分析入門』
教育・心理系研究のためのＲによるデータ分析―論文作成への理論と実践集
最近はChatGPTもRについての質問に割りと答えてくれます（間違えたりもしますが）

注意事項②

本WSは3時間の設定となっていますが，実際のGLMMの習得には，その倍以上の復習が必要となります
今回使用するコードやデータはすべてweb上で公開していますので，復習の際にご利用ください
- GitHub: https://github.com/tam07pb915/JLTA_2023_WS/
個別の分析に関するご相談にはお応えできない可能性が高いですので，参考文献をご参照ください

注意事項③

今回は，時間の都合上により線形混合効果モデル（Linear Mixed-Effect Model）は扱いません
ただし，LMEの発展がGLMMであり，LMEは確率分布の指定が不要なだけですのでR上の分析はほぼ同じで，Rの関数で言えば下記のようになります
- lme4::glmer()-> GLMM
- lme4::lmer() -> LME
評定値のデータの分析について
- 事前のアンケートで評定値データの扱いを書かれている方がいましたがすみません，今回は扱いません
- リッカート尺度の分析方法
  - 間隔尺度とみなして線形モデルで分析する（lme4::lmer()使う）
  - 順序尺度とみなして順序ロジットモデルで分析（ordinal::clmm()を使う）
  - 順序ロジットについては次の論文が参考になります：小林雄一郎. (2022). 学習者コーパス研究におけるマルチレベル順序ロジットモデルの活用. https://doi.org/10.31219/osf.io/tkqym

簡単に自己紹介

自身のこと

名古屋大学大学院国際開発研究科博士課程修了（2018年3月修了）
2018年4月より関西大学外国語学部に着任
専門は第二言語の文法習得や文処理
- Tamura, Y. (2023). Is cats one word or two? L2 Learners’ processing of number marking in English from the viewpoints of form–meaning mapping. Second Language Research. https://doi.org/10.1177/02676583231188933
- Tamura, Y., Fukuta, J., Nishimura, Y., & Kato, D. (2022). Rule-based or efficiency-driven processing of expletive there in English as a foreign language. International Review of Applied Linguistics in Language Teaching. https://doi.org/10.1515/iral-2021-0156
趣味はサッカー観戦でガンバ大阪を応援しています

データ分析について

2014年(D1年時)に俗に言う「緑本」（久保, 2012）を読んでから統計モデリングを勉強するように
Nagoya.R #12という勉強会で，「一般化線形混合モデル入門の入門」というRを使った線形混合効果モデルのやり方について発表
その後から自分の研究でもLMEやGLMMを使うように
- Tamura, Y. (2015). Reinvestigating consciousness-raising grammar task and noticing. JABAET Journal, 19, 19–47. -> mixed-effect logistic regression
- Tamura, Y., Fukuta, J., Nishimura, Y., Harada, Y., Hara, K., & Kato, D. (2019). Japanese EFL learners’ sentence processing of conceptual plurality: An analysis focusing on reciprocal verbs. Applied Psycholinguistics, 41, 59–91. -> Gamma GLMM
気づいたらこのこのネタでのトークを頼まれるように
- 田村祐 (2019). 「統計ワークショップ」JACET英語語彙・英語辞書・リーディング研究会合同研究会. 早稲田大学.資料
- 田村祐 (2021). 「一般化線形混合モデルの実践ー気をつけたい3つのポイント」連続公開講座「データサイエンス時代の英語教育」第2回言語教育研究の実際. 主催：名古屋大学大学院人文学研究科英語教育分野. オンライン開催. 投影資料
- 田村祐（2022). 「Rによる混合効果モデル分析の勘所」外国語教育メディア学会（LET）関西支部メソドロジー研究部会2021年度第3回研究会.
今回の発表資料の中には上記のワークショップ等で用いた資料と重複する部分もあります

一般（化）線形（混合）モデルの概略

確率分布を考える

一般線形モデル
- いわゆる普通の回帰分析（単回帰，重回帰）
- 正規分布が仮定される
- t検定や分散分析も線形モデルの一種
一般化線形モデル
- 正規分布以外の確率分布族を用いるもの
  - 回数データ（単位時間あたりの生起回数）->ポアソン分布，負の二項分布
  - 二値データ（正答・誤答）-> 二項分布
  - 非負の連続量（反応時間等）->ガンマ分布，逆正規分布

なぜ一般”化”線形モデルなのか

世の中には正規分布しないデータもたくさんある
正規分布してなくてもデータ分析はできる
データが生み出されたメカニズムに着目して分析できる
確率分布とリンク関数の組み合わせでより柔軟でデータの実体に即した分析が可能
導入としては，『緑本』がおすすめです

久保拓哉(2012). 『データ解析のための統計モデリング入門：一般化線形モデル・階層ベイズモデル・MCMC』. 岩波書店

混合効果（mixed effect）とはなにか

固定効果＋変量効果のこと
分散分析でいうF1（被験者分析）とF2（項目分析）を同時にやってしまうようなもの
固定効果->説明変数（独立変数）のこと。いわゆる研究者が見たいもの
変量効果->応答変数（従属変数）に影響を与えてしまうばらつき
- 追試を行うときに変わるモノ->変量効果
  - 例: 単語の頻度効果を調べる追試
    - 固定効果->単語の頻度（高・低またはあるコーパスでの頻度）
    - 変量効果->実験参加者，単語（もとの研究と同じ刺激語であっても）
実験系でよく扱うデータは参加者と項目（e.g,テスト項目や刺激文）のばらつきを考慮しなくてはいけない
- 例1：ある項目・参加者だけ難しい（or 反応時間が長い）->ランダム切片
- 例2：ある項目・参加者だけ頻度効果（説明変数の影響）が大きい->ランダム傾き

応用言語学の分野でも新しくない

Rと，混合効果モデルが簡単に実装可能なlme4パッケージの爆発的普及で一般化
混合効果モデルのイントロ論文として私がいつもおすすめするもの
- Cunnings, I. (2012). An overview of mixed-effects statistical models for second language researchers. Second Language Research, 28, 369–382. doi:10.1177/0267658312443651
- Linck, J. A. & Cunnings, I. (2015), The utility and application of mixed‐effects models in second language research. Language Learning, 65, 185-207. doi:10.1111/lang.12117
- Gries, S. T. (2021). (Generalized Linear) Mixed-Effects Modeling: A Learner Corpus Example. Language Learning, 71(3), 757–798. https://doi.org/10.1111/lang.12448
- Meteyard, L., & Davies, R. A. I. (2020). Best practice guidance for linear mixed-effects models in psychological science. Journal of Memory and Language, 112, 104092. https://doi.org/10.1016/j.jml.2020.104092

これだけは言いたい

“[T]here is no single correct way to implement an LMM, and…the choices they [researchers] make during analysis will comprise one path, however justified, amongst multiple alternatives.” (Meteyard & Davies, 2020, pp.1–2)

混合効果モデルの数学的な背景（Barr et al., 2013とそれに基づいた山口, 2018より)

一般的な単回帰モデル

固定効果（独立変数）が一つのシンプルな単回帰モデルを考えてみます。

\[Y_{si} = \beta_0 + \beta_1 X_i + e_{si}, \quad e_{si} \sim N(0, \sigma^2)\]

\(Y_{si}\) : \(_s\)番目の参加者の\(_i\)番目の項目に対する反応（従属変数，または応答変数）
\(\beta_0\): 固定効果の切片
\(\beta_1\): 固定効果の傾き
\(X_i\): 独立変数（または説明変数）
\(e_{si}\): 平均が0，分散\(\sigma^2\)の正規分布(\(N\))に従う誤差
Rのコードだと下記のようになります。

lm(Y ~ X, data=data)

混合効果モデルの回帰式

切片に変量効果を考慮したモデル

\[ Y_{si} = \beta_0 + \color{Red}{S_{0s}} + \beta_1 X_i + e_{si},\\ \color{Red}{\quad S_{0s} \sim N(0, \tau_{00}^2)}, \\ \quad e_{si} \sim N(0, \sigma^2)\]

\(Y_{si}\) : \(_s\)番目の参加者の\(_i\)番目の項目に対する反応（従属変数，または応答変数）
\(\beta_0\): 固定効果の切片
\(\beta_1\): 固定効果の傾き
\(X_i\): 独立変数（または説明変数）
\(e_{si}\): 平均が0，分散\(\sigma^2\)の正規分布(\(N\))に従う誤差
\(\color{Red}{S_{0s}}\): 参加者ごとに変動する切片（ランダム切片）で，平均が0，\(\tau_{00}^2\)の正規分布（\(N\)）に従う
ラテンのシンボルである\(\beta_0\)，\(\beta_1\), \(\sigma^2\), \(\tau_{00}^2\)はすべてパラメータで，これを推定することになる
Rのコードだと下記のようになります。

model <-lmer(Y ~ X + (1|subject), data=data)

参加者だけではなく項目の切片にも変量効果を考慮したモデル

\[ Y_{si} = \beta_0 + S_{0s} + \color{Red}{I_{0i}} + \beta_1 X_i + e_{si}, \\ \quad S_{0s} \sim N(0, \tau_{00}^2), \\ \color{Red}{\quad I_{0i} \sim N(0, \omega_{00}^2)}, \\ \quad e_{si} \sim N(0, \sigma^2) \]

\(Y_{si}\) : \(_s\)番目の参加者の\(_i\)番目の項目に対する反応（従属変数，または応答変数）
\(\beta_0\): 固定効果の切片
\(S_{0s}\): 参加者ごとに変動する切片（ランダム切片）で，平均が0，\(\tau_{00}^2\)の正規分布（\(N\)）に従う
\(\color{Red}{I_{0i}}\): 項目ごとに変動する切片（ランダム切片）で，平均が0，\(\omega_{00}^2\)の正規分布（\(N\)）に従う
\(\beta_1\): 固定効果の傾き
\(X_i\): 独立変数（または説明変数）
\(e_{si}\): 平均が0，分散\(\sigma^2\)の正規分布(\(N\))に従う誤差
Rのコードだと下記のようになります。

model <-lmer(Y ~ X + (1|subject)+(1|item), data=data)

切片だけではなく傾きの変量効果も考慮したモデル(まずは参加者の傾きだけ)

\[ Y_{si} = \beta_0 + S_{0s} + I_{0i} + \color{Red}{(\beta_1 + S_{1s}) X_i} + e_{si},\\ \color{Red}{(S_{0s},S_{1s}) \sim N \left( 0, \begin{pmatrix} \tau_{00}^2 & \rho \tau_{00} \tau_{11} \\ \rho \tau_{00} \tau_{11} & \tau_{11}^2 \end{pmatrix} \right)},\\ I_{0i} \sim N(0, \omega_{00}^2),\\ \quad e_{si} \sim N(0, \sigma^2) \]

\(Y_{si}\) : \(_s\)番目の参加者の\(_i\)番目の項目に対する反応（従属変数，または応答変数）
\(\beta_0\): 固定効果の切片
\(S_{0s}\): 参加者ごとに変動する切片（ランダム切片）で，平均が0，\(\tau_{00}^2\)の正規分布（\(N\)）に従う
\(I_{0i}\): 項目ごとに変動する切片（ランダム切片）で，平均が0，\(\omega_{00}^2\)の正規分布（\(N\)）に従う
\(\beta_1\): 固定効果の傾き
\(\color{Red}{S_{1s}}\): 参加者ごとに変動する傾き（ランダム傾き）で，平均が0，\(\tau_{11}^2\)の正規分布（\(N\)）に従う
\(X_i\): 独立変数（または説明変数）
\(e_{si}\): 平均が0，分散\(\sigma^2\)の正規分布(\(N\))に従う誤差
一般的に，切片と傾きには相関があることが仮定されたモデルを作ります
つまり，\(S_{0s}\)（切片）と\(S_{1s}\)（傾き）が独立した正規分布に従うのではなく，一つの二変量正規分布から推定されるということになります
- \(\color{Red}{\tau_{00}^2}\): 参加者のランダム切片の分散
- \(\color{Red}{\tau_{11}^2}\): 参加者のランダム傾きの分散
- \(\color{Red}{\rho}\): 参加者のランダム切片とランダム傾きの間の相関係数
- \(\color{Red}{\rho \tau_{00} \tau_{11}}\): 参加者のランダム切片とランダム傾きの共分散
- Rのコードだと下記のようになります。

model <-lmer(Y ~ X + (1+X|subject)+(1|item), data=data)

項目の傾きも入れたモデル（最大モデル）

\[ Y_{si} = \beta_0 + S_{0s} + I_{0i} + (\beta_1 + S_{1s} + \color{red}{I_{1i}}) X_i + e_{si}, \] \[ \begin{pmatrix} S_{0s},S_{1s} \end{pmatrix} \sim N \left( 0,\begin{pmatrix} \tau_{00}^2 & \rho \tau_{00} \tau_{11} \\ \rho \tau_{00} \tau_{11} & \tau_{11}^2 \end{pmatrix} \right), \] \[ \color{red}{ \begin{pmatrix} I_{0i} ,I_{1i} \end{pmatrix} } \sim N \left( 0, \begin{pmatrix} \omega_{00}^2 & \rho' \omega_{00} \omega_{11} \\ \rho' \omega_{00} \omega_{11} & \omega_{11}^2 \end{pmatrix} \right), \] \[ e_{si} \sim N(0, \sigma^2) \]

\(Y_{si}\) : \(_s\)番目の参加者の\(_i\)番目の項目に対する反応（従属変数，または応答変数）
\(\beta_0\): 固定効果の切片
\(S_{0s}\): 参加者ごとに変動する切片（ランダム切片）で，平均が0，\(\tau_{00}^2\)の正規分布（\(N\)）に従う
\(I_{0i}\): 項目ごとに変動する切片（ランダム切片）で，平均が0，\(\omega_{00}^2\)の正規分布（\(N\)）に従う
\(\beta_1\): 固定効果の傾き
\(S_{1s}\): 参加者ごとに変動する傾き（ランダム傾き）で，平均が0，\(\tau_{11}^2\)の正規分布（\(N\)）に従う
\(I_{0i}\): 項目ごとに変動する傾き（ランダム傾き）で，平均が0，\(\omega_{11}^2\)の正規分布（\(N\)）に従う
\(X_i\): 独立変数（または説明変数）
\(e_{si}\): 平均が0，分散\(\sigma^2\)の正規分布(\(N\))に従う誤差
参加者のランダム効果（\(S_{0s}\)と\(S_{1s}\)）だけでなく，項目のランダム効果（\(I_{0i}\)と\(I_{1i}\))についても同様の分散共分散行列があることがわかります
- \(\color{Red}{\omega_{00}^2}\): 項目のランダム切片の分散
- \(\color{Red}{\omega_{11}^2}\): 項目のランダム傾きの分散
- \(\color{Red}{\rho}\): 項目のランダム切片とランダム傾きの間の相関係数
- \(\color{Red}{\rho \omega_{00} \omega_{11}}\): 項目のランダム切片とランダム傾きの共分散
- Rのコードだと下記のようになります。

model <-lmer(Y ~ X + (1+X|subject)+(1+X|item), data=data)

いくつかのポイント

統制のとれた項目（問題アイテム，刺激文）であれば，基本的には項目のランダム効果は参加者のそれより小さくなる傾向があります
項目のランダム傾きをモデルに入れるかどうかは，実験デザインに依存します（※なんでもかんでもランダム効果に入れればいいわけではない）
- ランダム傾きを推定する固定効果が一つの項目に対して複数条件ある場合は項目のランダム傾きを考慮する意味がある（例：文法性判断課題における正文・非文の影響）
- 同じ項目で複数条件ないような場合は項目のランダム傾きを考慮する意味がない（例：文法性判断課題における文の長さの影響）
- もちろん，文法性判断課題における非文と正文で文の長さが変わる（語数が変わる）ような場合はランダム傾きを考慮する意味が出てきます
ランダム効果の推定において，ランダム傾きとランダム切片の間の相関を仮定する最も複雑なモデルは，推定がうまくいかないこともあります
- 複数のランダム傾きがモデルに含まれる場合，推定するパラメータの数がかなり多くなるため
- そもそも傾きと切片にほとんど相関が見られない場合
その場合，ランダム切片と傾きの間の相関パラメータを抜いて，切片と傾きがそれぞれ独立した正規分布に従うという推定をします（あとでR上でやります）

今回のWSで扱うロジスティック回帰分析の概要(田村, 2019,とほぼ同じ説明です)

0/1（成功・失敗）のデータを扱うときに使う
一般的な「テスト」のデータも，1問が正解・不正解という2値のデータの蓄積
良くない例
- 15点満点のデータを13点，10点，9点…のように扱う（欠損があったらどうする？）
- 比率データに置き換える（連続変数ですか？もとはカテゴリカルデータですよね？）

二項分布を用いる

\(n\)回（ただし \(n>0\) ）の試行のうちの成功回数 \(k\) が発生する確率 \(q\) を求める
各試行は独立な試行（ベルヌーイ試行）

\[p(k|N,q)=_{N}C_{k}*q^{k}(1-q)^{N-k}\]

コインを10回投げて4回表が出る確率は？（ただしq=0.5とする）
\(p(4|10,0.5)=_{10}C_{4}0.5^{4}(1-0.5)^{10-4}=\) 0.205

ロジスティック関数の導入

*例：30人の学習者がいて，それぞれの学習者がある文法問題に正解する確率を \(q_{i}\)としたときのロジスティック回帰モデル（ \(z_{i}\)には線形予測子が入っていると考える）

\[q_{i}={\rm logistic} (z_{i})=\frac{1}{1+\exp(-z_{i})}\]

ロジスティック関数からロジット関数へ

先ほどのロジスティック関数を変形すると次のようになる

\[{\rm log}(\frac{q_{i}}{1-q_{i}})=z_{i}\]

この式の左辺はロジット関数と呼ばれる
ロジスティック関数とロジット関数は逆関数の関係にある
ロジスティクス関数と線形予測子の関係をつなぐリンク関数がロジットリンク関数

ロジスティック回帰のまとめ

下記のような方程式を解くこと

\[{\rm log}(\frac{q_{i}}{1-q_{i}})=\beta_0 + \beta_1x_{ij}...\]

個人ごとの成功確率である \(q_{i}\)は得られたデータからわかることであり， \(X_{1}\)は独立変数の値である
\(\beta_{0}\)と \(\beta_{1}...\)を最尤推定(maximum Likelihood estimation)で求める
オッズ（ \(\frac{q_{i}}{1-q_{i}}\)）の対数(\(log\))が線形予測子と等しいので次のようにも表現できる

\[\frac{q_{i}}{1-q_{i}}=\exp(\beta_{0}+\beta_{1}X_{1}...)\]

\[=\exp(\beta_{0})*\exp(\beta_{1}X_{1})...\]

実践編

分析するデータはどんなデータか

Terai et al. (2023)は日本語を第一言語とする英語学習者を対象に，コロケーションの容認性判断課題を実施
実験では，コンピュータ画面上に提示される2語のコロケーションが英語として自然であるかの判断が求められた
その後，参加者はコロケーションを英語から日本語に翻訳する課題を行い，その際の翻訳の難しさを4段階で評価することを求められた
翻訳の難しさ評定値は第一言語の影響（L1 influence）の代理指標
コロケーションには次の3つの種類とベースライン条件項目（英語でも日本語でも不自然なコロケーション）があった
- 一致性のあるコロケーション（JE collocations）：thick fog/厚い霧, bitter experience/苦い経験
- 日本語にのみ存在するコロケーション（J-only collocations）：?sweet judgement/甘い判断，熱い戦い/hot battle
- 英語にのみ存在するコロケーション（E-only collocations）：busy road/?忙しい道/混雑した道，lucky guess/?運のいい推測/まぐれあたりの推測
- ベースライン条件項目：ランダムな形容詞と名詞と形容詞の組み合わせ(e.g., severe learner, proud idea)

研究課題

翻訳の難しさ評定値（L1 influence）が容認性判断課題の正答率に与える影響は熟達度によって変わるのか
コロケーションのタイプによって，熟達度の影響は異なるのか

というのが実際の論文なのですが，今回は実際の論文で行われたものと少し異なる分析をします（実際の分析では名義尺度の変数をつかっていませんが，名義尺度の変数の分析も扱いたいため）。

準備

RStudioの起動

ファイルのダウンロード
1. GitHubにいき，“Resources.zip”をクリック
2. 右端のダウンロードアイコンをクリックしてダウンロードして，zipファイルを解凍
3. 解凍したZIPファイルの中のWS.Rprojを起動
4. slide.Rmdの中にあるRコードを走らせていく
コードの実行の仕方
- Rmdファイル上でコードを実行する場合
  - カーソルのある行のコードを実行
    - Windows: Ctrl + Enter
    - Mac: Command + Enter
  - チャンクに囲まれているコード（あるいは複数のコード）をまとめて実行　 - Windows: Ctrl + Shift + Enter
    - Mac: Command + Shift + Enter
- コードをコピペして実行する場合
- >のマークが出ているところに，コンソール（RStudioの左下のペイン）にコピペしてEnter
- 右上にRmdファイルを開いて，そこでコードを実行するほうがいいです
コードを実行した結果がコンソールに表示されるかは設定で選択可
- Tools -> Global Options -> Rmarkdown
- Show output inline for all R markdown documentsのチェックボックス

すすめ方

ここからは，実際にRを動かしながらデータ分析を行っていきます
これより先，灰色の背景で囲まれた部分は全てRのコードを示しているとご理解ください
GitHubからslide.Rmdファイルまたはslide.htmlファイルをDLしてコピペすることをおすすめします

データ分析の下準備

パッケージのインストール

下記のコード（#以降はコメントアウトですので入力しなくてもOK）を実行してパッケージのインストールをお願いします。

install.packages("lme4") #分析
install.packages("sjPlot") #可視化
install.packages("emmeans") #下位検定と可視化
install.packages("tidyverse") #ハンドリング
install.packages("psych") #記述統計
install.packages("performance") #信頼性係数等
install.packages("irr") #評定者間信頼性
install.packages("gplots") #可視化
install.packages("car") #モデルの評価

パッケージの読み込み

インストールしたパッケージを使えるようにします。

library(lme4)
library(sjPlot)
library(emmeans)
library(tidyverse)
library(psych)
library(performance)
library(irr)
library(gplots)
library(car)

データの読み込み

Terai et al. (2023)のデータを読み込みます。必ずAJ.csvのファイルが今作業しているフォルダの中にあることを確認してください。

AJ <-read.csv("AJ.csv", header = T)[,-1] #[,-1]がついているのは，1列目の行番号が不要だからです

データの確認

head(AJ)

##   subject itemID pres.order        Item Itemtype Length ColFreq AdjFreq
## 1       2      1         87   thick fog       JE      9     145   29284
## 2       2      2         61 stupid idea       JE     11     112   17762
## 3       2      3         31  thin lips        JE      9     266   30704
## 4       2      4        121   deep love       JE      9     140   72697
## 5       2      5         43  small room       JE     10    1010  218756
## 6       2      6         98  long sleep       JE     10      82  339087
##   NounFreq MIScore answer res   RT
## 1     7176   14.24   TRUE   1 1273
## 2   126766   12.13   TRUE   1 1012
## 3    25883   14.18   TRUE   1 1265
## 4   192472   11.53   TRUE   1  930
## 5   212579   11.35   TRUE   1 1016
## 6    49735   10.71   TRUE   1  771

subject: 参加者
itemID: 項目番号
pres.order: 提示順序
Item: 提示された2語の刺激
Itemtype: コロケーションのタイプ（JE, Jonly, Eonly, Baseline)
Length: 2語の刺激の長さ（文字数）
ColFreq: コロケーションの頻度
AdjFreq: コロケーションの1語目の形容詞の頻度
NounFreq: コロケーションの2語目の名詞の頻度
MIScore: 相互情報量（MI）スコア（2つの語がどれだけ共起しやすいか）
answer: 模範解答
res: 参加者の反応（0: 誤答，1: 正答）
RT: 反応時間

課題の信頼性係数（Cronbach’s alpha）をチェック

まずは課題の信頼性をチェックします。ただ，データの型がその分析のために合っていませんよね？回帰モデルのためには縦型（long型）のデータ形式である必要がありますが，項目分析や信頼性係数の算出のためには行が項目，列が参加者の行列になっている必要があります。

AJ %>%
  select(subject, itemID, res) %>% #被験者ID，項目ID，参加者の0/1の反応だけを選んでね
  pivot_wider(names_from = itemID, values_from = res) %>% #itemIDを列番号にして，セルに入る数値はresから持ってきてね
  {as.data.frame(.[,-1])} %>% #データ・フレーム形式にして，1列目にあるsubject列は信頼性係数分析のためには不必要なので抜いてね
  psych::alpha(.,n.iter=2000) #ここまで処理したデータをpsychパッケージのalpha()関数に渡してね

## The determinant of the smoothed correlation was zero.
## This means the objective function is not defined.
## Chi square is based upon observed residuals.

## The determinant of the smoothed correlation was zero.
## This means the objective function is not defined for the null model either.
## The Chi square is thus based upon observed correlations.

## Some items ( 1 2 3 5 6 7 8 9 10 14 16 18 19 20 21 22 23 24 25 26 27 29 30 32 33 34 35 36 37 38 39 40 41 42 44 45 46 59 74 78 111 117 130 ) were negatively correlated with the total scale and 
## probably should be reversed.  
## To do this, run the function again with the 'check.keys=TRUE' option

## In smc, smcs < 0 were set to .0

## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0

## 
## Reliability analysis   
## Call: psych::alpha(x = ., n.iter = 2000)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean    sd median_r
##        0.8       0.8     0.8      0.03 4.1 0.048 0.72 0.074    0.014
## 
##     95% confidence boundaries 
##              lower alpha upper
## Feldt         0.68  0.80  0.88
## Duhachek      0.70  0.80  0.89
## bootstrapped  0.64  0.79  0.87
## 
##  Reliability if an item is dropped:
##     raw_alpha std.alpha G6(smc) average_r S/N var.r med.r
## 1        0.80      0.81    0.80     0.030 4.2 0.044 0.014
## 2        0.80      0.81    0.80     0.030 4.1 0.043 0.014
## 3        0.80      0.81    0.80     0.031 4.2 0.044 0.014
## 4        0.79      0.80    0.80     0.030 4.1 0.044 0.014
## 5        0.80      0.81    0.80     0.031 4.2 0.044 0.014
## 6        0.80      0.81    0.80     0.031 4.2 0.044 0.014
## 7        0.80      0.81    0.80     0.031 4.2 0.044 0.014
## 8        0.80      0.81    0.80     0.030 4.1 0.043 0.014
## 9        0.80      0.80    0.80     0.030 4.1 0.043 0.014
## 10       0.80      0.81    0.80     0.031 4.2 0.043 0.014
## 14       0.80      0.81    0.80     0.030 4.1 0.043 0.014
## 15       0.80      0.80    0.80     0.030 4.1 0.043 0.014
## 16       0.81      0.81    0.80     0.031 4.3 0.043 0.014
## 18       0.80      0.80    0.80     0.030 4.1 0.043 0.014
## 19       0.80      0.81    0.80     0.031 4.3 0.043 0.014
## 20       0.80      0.81    0.80     0.030 4.1 0.044 0.014
## 21       0.80      0.81    0.80     0.030 4.1 0.043 0.014
## 22       0.80      0.81    0.80     0.031 4.3 0.043 0.014
## 23       0.80      0.81    0.80     0.031 4.2 0.044 0.014
## 24       0.81      0.81    0.81     0.031 4.3 0.043 0.014
## 25       0.80      0.80    0.80     0.030 4.1 0.044 0.014
## 26       0.80      0.81    0.80     0.031 4.2 0.044 0.014
## 27       0.81      0.81    0.81     0.032 4.4 0.043 0.014
## 28       0.79      0.80    0.80     0.030 4.1 0.044 0.014
## 29       0.80      0.81    0.80     0.030 4.1 0.044 0.014
## 30       0.80      0.81    0.80     0.031 4.2 0.043 0.014
## 31       0.80      0.81    0.80     0.030 4.2 0.043 0.014
## 32       0.80      0.81    0.80     0.031 4.3 0.043 0.014
## 33       0.80      0.81    0.80     0.030 4.2 0.043 0.014
## 34       0.80      0.81    0.80     0.030 4.2 0.043 0.014
## 35       0.80      0.81    0.80     0.031 4.3 0.043 0.014
## 36       0.80      0.81    0.80     0.031 4.2 0.043 0.014
## 37       0.80      0.81    0.81     0.032 4.3 0.043 0.014
## 38       0.80      0.81    0.80     0.031 4.2 0.043 0.014
## 39       0.81      0.81    0.81     0.032 4.3 0.043 0.014
## 40       0.80      0.81    0.80     0.031 4.2 0.043 0.014
## 41       0.80      0.81    0.80     0.031 4.2 0.043 0.014
## 42       0.80      0.81    0.80     0.030 4.2 0.043 0.014
## 43       0.79      0.80    0.79     0.029 4.0 0.043 0.011
## 44       0.80      0.81    0.80     0.030 4.2 0.044 0.014
## 45       0.81      0.81    0.81     0.031 4.3 0.043 0.014
## 46       0.80      0.80    0.80     0.030 4.1 0.043 0.014
## 47       0.80      0.80    0.80     0.030 4.1 0.044 0.014
## 48       0.79      0.80    0.80     0.030 4.1 0.043 0.014
## 49       0.80      0.81    0.80     0.030 4.1 0.043 0.014
## 50       0.80      0.80    0.80     0.030 4.1 0.043 0.014
## 51       0.80      0.80    0.80     0.030 4.1 0.044 0.014
## 52       0.79      0.80    0.80     0.030 4.1 0.043 0.011
## 53       0.79      0.80    0.80     0.030 4.1 0.043 0.014
## 54       0.79      0.80    0.80     0.029 4.0 0.044 0.014
## 55       0.80      0.81    0.80     0.030 4.2 0.043 0.014
## 56       0.80      0.81    0.80     0.030 4.1 0.043 0.014
## 57       0.80      0.80    0.80     0.030 4.1 0.043 0.014
## 58       0.79      0.80    0.79     0.029 3.9 0.043 0.012
## 59       0.80      0.81    0.80     0.030 4.1 0.044 0.014
## 60       0.79      0.80    0.80     0.030 4.1 0.043 0.014
## 61       0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 62       0.80      0.81    0.80     0.030 4.1 0.043 0.014
## 63       0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 64       0.79      0.80    0.79     0.029 3.9 0.043 0.014
## 65       0.79      0.80    0.79     0.028 3.9 0.043 0.011
## 66       0.80      0.80    0.80     0.030 4.1 0.044 0.014
## 67       0.79      0.80    0.80     0.030 4.1 0.043 0.014
## 68       0.79      0.80    0.79     0.029 4.0 0.043 0.012
## 69       0.79      0.80    0.80     0.030 4.1 0.043 0.014
## 70       0.80      0.81    0.80     0.031 4.2 0.044 0.014
## 71       0.79      0.80    0.80     0.029 4.0 0.044 0.014
## 72       0.79      0.80    0.80     0.030 4.0 0.043 0.014
## 73       0.80      0.80    0.80     0.030 4.1 0.043 0.014
## 74       0.80      0.81    0.80     0.030 4.2 0.044 0.014
## 75       0.79      0.80    0.79     0.029 3.9 0.043 0.014
## 76       0.79      0.80    0.79     0.029 3.9 0.043 0.011
## 77       0.79      0.80    0.80     0.030 4.0 0.044 0.014
## 78       0.80      0.81    0.80     0.030 4.2 0.044 0.014
## 79       0.79      0.80    0.80     0.030 4.0 0.043 0.014
## 80       0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 81       0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 82       0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 83       0.80      0.81    0.80     0.030 4.1 0.044 0.014
## 84       0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 85       0.79      0.80    0.79     0.029 4.0 0.043 0.012
## 86       0.79      0.80    0.80     0.029 4.0 0.043 0.014
## 87       0.79      0.80    0.79     0.029 3.9 0.043 0.014
## 88       0.79      0.80    0.80     0.030 4.1 0.043 0.014
## 89       0.79      0.80    0.80     0.030 4.1 0.043 0.014
## 90       0.79      0.80    0.79     0.029 3.9 0.043 0.012
## 91       0.80      0.80    0.80     0.030 4.1 0.044 0.014
## 92       0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 93       0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 94       0.79      0.80    0.79     0.029 4.0 0.043 0.012
## 95       0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 96       0.79      0.80    0.80     0.030 4.1 0.043 0.014
## 97       0.79      0.80    0.79     0.029 4.0 0.043 0.012
## 98       0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 99       0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 100      0.79      0.80    0.79     0.029 4.0 0.043 0.012
## 101      0.80      0.80    0.80     0.030 4.1 0.044 0.014
## 102      0.80      0.80    0.80     0.030 4.1 0.043 0.014
## 103      0.79      0.80    0.80     0.030 4.1 0.044 0.012
## 104      0.79      0.80    0.79     0.028 3.9 0.043 0.012
## 105      0.79      0.80    0.79     0.029 3.9 0.043 0.014
## 106      0.79      0.80    0.79     0.029 4.0 0.043 0.012
## 107      0.79      0.80    0.79     0.029 3.9 0.043 0.012
## 108      0.79      0.80    0.80     0.030 4.1 0.043 0.014
## 109      0.79      0.80    0.79     0.029 3.9 0.043 0.014
## 110      0.80      0.80    0.80     0.030 4.1 0.044 0.014
## 111      0.80      0.81    0.80     0.031 4.3 0.044 0.014
## 112      0.79      0.80    0.80     0.029 4.0 0.043 0.014
## 113      0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 114      0.79      0.80    0.80     0.029 4.0 0.043 0.014
## 115      0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 116      0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 117      0.80      0.81    0.80     0.031 4.2 0.044 0.014
## 118      0.79      0.80    0.79     0.029 4.0 0.043 0.013
## 119      0.79      0.80    0.80     0.030 4.1 0.044 0.014
## 120      0.79      0.80    0.79     0.029 3.9 0.043 0.014
## 121      0.79      0.80    0.79     0.029 4.0 0.043 0.012
## 122      0.79      0.80    0.80     0.029 4.0 0.043 0.014
## 123      0.79      0.80    0.79     0.029 4.0 0.043 0.012
## 124      0.80      0.81    0.80     0.030 4.1 0.044 0.014
## 125      0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 126      0.79      0.80    0.80     0.029 4.0 0.043 0.014
## 127      0.79      0.80    0.79     0.029 3.9 0.043 0.012
## 128      0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 129      0.79      0.80    0.79     0.029 3.9 0.043 0.012
## 130      0.80      0.81    0.80     0.030 4.1 0.043 0.014
## 132      0.79      0.80    0.79     0.029 4.0 0.043 0.014
## 133      0.80      0.80    0.80     0.030 4.1 0.044 0.014
## 134      0.80      0.81    0.80     0.030 4.1 0.044 0.014
## 135      0.79      0.80    0.79     0.029 3.9 0.043 0.014
## 136      0.79      0.80    0.80     0.029 4.0 0.043 0.014
## 137      0.79      0.80    0.79     0.029 3.9 0.043 0.014
## 138      0.79      0.80    0.80     0.029 4.0 0.044 0.014
## 139      0.80      0.80    0.80     0.030 4.1 0.044 0.014
## 
##  Item statistics 
##      n   raw.r   std.r   r.cor   r.drop  mean   sd
## 1   32  0.0556  0.0325 -0.0070  0.00627 0.562 0.50
## 2   32  0.0713  0.0846  0.0513  0.04236 0.906 0.30
## 3   32 -0.1308 -0.1238 -0.1819 -0.17431 0.719 0.46
## 4   32  0.2208  0.2183  0.2009  0.18607 0.844 0.37
## 5   32 -0.1149 -0.0849 -0.1384 -0.13192 0.969 0.18
## 6   32 -0.2377 -0.2038 -0.2715 -0.26478 0.906 0.30
## 7   32 -0.0971 -0.0527 -0.1024 -0.11414 0.969 0.18
## 8   32  0.0100  0.0508  0.0135 -0.00725 0.969 0.18
## 9   32  0.0993  0.1328  0.1052  0.08213 0.969 0.18
## 10  32 -0.0866 -0.0703 -0.1221 -0.13464 0.594 0.50
## 14  32  0.0100  0.0508  0.0135 -0.00725 0.969 0.18
## 15  32  0.1299  0.1729  0.1501  0.10615 0.938 0.25
## 16  32 -0.2919 -0.2246 -0.2947 -0.33637 0.500 0.51
## 18  32  0.0786  0.1320  0.1043  0.05462 0.938 0.25
## 19  32 -0.2759 -0.2385 -0.3104 -0.31637 0.719 0.46
## 20  32  0.0529  0.0758  0.0415  0.02890 0.938 0.25
## 21  32  0.0016  0.0576  0.0211 -0.02245 0.938 0.25
## 22  32 -0.2422 -0.2264 -0.2968 -0.26461 0.938 0.25
## 23  32 -0.2042 -0.1836 -0.2489 -0.22064 0.969 0.18
## 24  32 -0.4066 -0.3788 -0.4673 -0.44490 0.344 0.48
## 25  32  0.0806  0.0959  0.0639  0.03139 0.562 0.50
## 26  32 -0.1800 -0.1897 -0.2557 -0.21714 0.812 0.40
## 27  32 -0.5029 -0.4923 -0.5944 -0.53263 0.219 0.42
## 28  32  0.2491  0.2468  0.2328  0.20652 0.719 0.46
## 29  32  0.0251  0.0756  0.0412 -0.01793 0.750 0.44
## 30  32 -0.1748 -0.1420 -0.2024 -0.22082 0.625 0.49
## 31  32  0.0053 -0.0218 -0.0678 -0.04185 0.344 0.48
## 32  32 -0.2700 -0.2677 -0.3430 -0.30004 0.875 0.34
## 33  32 -0.0393 -0.0447 -0.0934 -0.06815 0.094 0.30
## 34  32 -0.0212  0.0078 -0.0347 -0.06571 0.281 0.46
## 35  32 -0.2618 -0.2259 -0.2962 -0.30117 0.750 0.44
## 36  32 -0.1641 -0.1074 -0.1635 -0.19890 0.844 0.37
## 37  32 -0.3389 -0.4175 -0.5106 -0.37754 0.281 0.46
## 38  32 -0.2474 -0.2051 -0.2729 -0.28717 0.750 0.44
## 39  32 -0.3928 -0.4029 -0.4943 -0.43070 0.312 0.47
## 40  32 -0.0836 -0.0730 -0.1251 -0.13251 0.469 0.51
## 41  32 -0.0850 -0.1095 -0.1659 -0.13242 0.625 0.49
## 42  32 -0.0463 -0.0168 -0.0622 -0.09555 0.469 0.51
## 43  32  0.5395  0.5094  0.5266  0.50349 0.594 0.50
## 44  32 -0.0538 -0.0338 -0.0812 -0.09651 0.250 0.44
## 45  32 -0.3245 -0.3206 -0.4022 -0.36709 0.406 0.50
## 46  32  0.1304  0.1633  0.1393  0.08118 0.500 0.51
## 47  32  0.1239  0.1214  0.0925  0.07958 0.281 0.46
## 48  32  0.2298  0.1747  0.1521  0.18199 0.500 0.51
## 49  32  0.0978  0.0683  0.0331  0.04989 0.375 0.49
## 50  32  0.2421  0.2739  0.2632  0.22571 0.969 0.18
## 51  32  0.2069  0.1676  0.1442  0.18369 0.938 0.25
## 52  32  0.2154  0.2157  0.1980  0.18065 0.156 0.37
## 53  32  0.2275  0.1862  0.1650  0.17977 0.469 0.51
## 54  32  0.3209  0.3079  0.3012  0.27538 0.469 0.51
## 55  32 -0.0053 -0.0159 -0.0612 -0.05244 0.656 0.48
## 56  32  0.1323  0.0811  0.0473  0.09391 0.188 0.40
## 57  32  0.1573  0.1186  0.0893  0.10871 0.438 0.50
## 58  32  0.5971  0.5724  0.5971  0.57316 0.844 0.37
## 59  32  0.0326  0.0430  0.0048 -0.00348 0.844 0.37
## 60  32  0.2496  0.2259  0.2094  0.20260 0.562 0.50
## 61  32  0.4346  0.4043  0.4090  0.39438 0.625 0.49
## 62  32  0.1042  0.0706  0.0356  0.05635 0.375 0.49
## 63  32  0.3770  0.3746  0.3758  0.34296 0.812 0.40
## 64  32  0.5294  0.5276  0.5470  0.50809 0.906 0.30
## 65  32  0.5886  0.6340  0.6661  0.56424 0.844 0.37
## 66  32  0.1894  0.1844  0.1630  0.14099 0.531 0.51
## 67  32  0.2889  0.2519  0.2385  0.24326 0.406 0.50
## 68  32  0.3877  0.3476  0.3456  0.34508 0.594 0.50
## 69  32  0.2486  0.2653  0.2535  0.20201 0.594 0.50
## 70  32 -0.0886 -0.0896 -0.1437 -0.11719 0.906 0.30
## 71  32  0.3191  0.3076  0.3008  0.27684 0.688 0.47
## 72  32  0.3149  0.2786  0.2684  0.28183 0.844 0.37
## 73  32  0.2069  0.2302  0.2142  0.18369 0.938 0.25
## 74  32  0.0073  0.0230 -0.0176 -0.02164 0.906 0.30
## 75  32  0.5614  0.5734  0.5983  0.54106 0.906 0.30
## 76  32  0.5636  0.5387  0.5594  0.52834 0.469 0.51
## 77  32  0.3138  0.2907  0.2820  0.26836 0.438 0.50
## 78  32 -0.0625 -0.0245 -0.0708 -0.08645 0.938 0.25
## 79  32  0.2844  0.2922  0.2836  0.25741 0.906 0.30
## 80  32  0.4011  0.3612  0.3608  0.36258 0.719 0.46
## 81  32  0.4774  0.4615  0.4731  0.44881 0.844 0.37
## 82  32  0.4175  0.4015  0.4059  0.38705 0.844 0.37
## 83  32  0.0963  0.0819  0.0482  0.06355 0.875 0.34
## 84  32  0.3936  0.3640  0.3640  0.35257 0.656 0.48
## 85  32  0.4263  0.3901  0.3932  0.38751 0.688 0.47
## 86  32  0.2550  0.2951  0.2868  0.21597 0.781 0.42
## 87  32  0.5634  0.5762  0.6014  0.55140 0.969 0.18
## 88  32  0.2417  0.2291  0.2129  0.21418 0.906 0.30
## 89  32  0.2077  0.1903  0.1695  0.16434 0.719 0.46
## 90  32  0.6563  0.6302  0.6619  0.63070 0.750 0.44
## 91  32  0.1940  0.2406  0.2259  0.15622 0.812 0.40
## 92  32  0.4437  0.4942  0.5096  0.41674 0.875 0.34
## 93  32  0.3874  0.3812  0.3832  0.35890 0.875 0.34
## 94  32  0.4196  0.3843  0.3867  0.38056 0.688 0.47
## 95  32  0.3576  0.3734  0.3744  0.32557 0.844 0.37
## 96  32  0.2735  0.2622  0.2500  0.23708 0.812 0.40
## 97  32  0.4840  0.4524  0.4629  0.44863 0.719 0.46
## 98  32  0.4393  0.4486  0.4586  0.40006 0.656 0.48
## 99  32  0.4578  0.4170  0.4232  0.42441 0.781 0.42
## 100 32  0.4287  0.4131  0.4188  0.39119 0.719 0.46
## 101 32  0.1738  0.1780  0.1558  0.14551 0.094 0.30
## 102 32  0.1902  0.1900  0.1692  0.15820 0.875 0.34
## 103 32  0.2606  0.2396  0.2248  0.22958 0.125 0.34
## 104 32  0.6682  0.6773  0.7145  0.64445 0.781 0.42
## 105 32  0.4945  0.5233  0.5422  0.46651 0.844 0.37
## 106 32  0.5116  0.5065  0.5235  0.47746 0.719 0.46
## 107 32  0.5774  0.6307  0.6624  0.54760 0.750 0.44
## 108 32  0.2178  0.2480  0.2342  0.18043 0.812 0.40
## 109 32  0.5148  0.5449  0.5664  0.49680 0.938 0.25
## 110 32  0.1542  0.1699  0.1467  0.11598 0.812 0.40
## 111 32 -0.2590 -0.2516 -0.3249 -0.28576 0.906 0.30
## 112 32  0.3526  0.3256  0.3210  0.31131 0.688 0.47
## 113 32  0.4011  0.3860  0.3886  0.36258 0.719 0.46
## 114 32  0.3122  0.3227  0.3177  0.28213 0.875 0.34
## 115 32  0.4052  0.3697  0.3703  0.36834 0.750 0.44
## 116 32  0.4205  0.3689  0.3694  0.37839 0.469 0.51
## 117 32 -0.1792 -0.1899 -0.2559 -0.22186 0.719 0.46
## 118 32  0.4001  0.3702  0.3708  0.35934 0.656 0.48
## 119 32  0.3147  0.2552  0.2422  0.26898 0.469 0.51
## 120 32  0.5800  0.5638  0.5875  0.55533 0.844 0.37
## 121 32  0.4688  0.4698  0.4823  0.43997 0.844 0.37
## 122 32  0.2700  0.3034  0.2962  0.23128 0.781 0.42
## 123 32  0.4127  0.4155  0.4216  0.37778 0.781 0.42
## 124 32  0.0279  0.0587  0.0223  0.01060 0.969 0.18
## 125 32  0.4411  0.4546  0.4653  0.40545 0.750 0.44
## 126 32  0.3674  0.3295  0.3254  0.32553 0.656 0.48
## 127 32  0.5461  0.5220  0.5408  0.51361 0.719 0.46
## 128 32  0.4261  0.4381  0.4469  0.39586 0.844 0.37
## 129 32  0.5376  0.5774  0.6028  0.51363 0.875 0.34
## 130 32  0.0286  0.0624  0.0265 -0.00033 0.906 0.30
## 132 32  0.3686  0.3607  0.3602  0.33968 0.875 0.34
## 133 32  0.1707  0.1581  0.1335  0.15383 0.969 0.18
## 134 32  0.0931  0.0413  0.0028  0.04397 0.562 0.50
## 135 32  0.5752  0.6055  0.6342  0.55256 0.875 0.34
## 136 32  0.3727  0.3303  0.3263  0.33204 0.688 0.47
## 137 32  0.5790  0.6111  0.6405  0.56264 0.938 0.25
## 138 32  0.2978  0.3291  0.3249  0.26437 0.844 0.37
## 139 32  0.1526  0.1728  0.1500  0.12027 0.875 0.34
## 
## Non missing response frequency for each item
##        0    1 miss
## 1   0.44 0.56    0
## 2   0.09 0.91    0
## 3   0.28 0.72    0
## 4   0.16 0.84    0
## 5   0.03 0.97    0
## 6   0.09 0.91    0
## 7   0.03 0.97    0
## 8   0.03 0.97    0
## 9   0.03 0.97    0
## 10  0.41 0.59    0
## 14  0.03 0.97    0
## 15  0.06 0.94    0
## 16  0.50 0.50    0
## 18  0.06 0.94    0
## 19  0.28 0.72    0
## 20  0.06 0.94    0
## 21  0.06 0.94    0
## 22  0.06 0.94    0
## 23  0.03 0.97    0
## 24  0.66 0.34    0
## 25  0.44 0.56    0
## 26  0.19 0.81    0
## 27  0.78 0.22    0
## 28  0.28 0.72    0
## 29  0.25 0.75    0
## 30  0.38 0.62    0
## 31  0.66 0.34    0
## 32  0.12 0.88    0
## 33  0.91 0.09    0
## 34  0.72 0.28    0
## 35  0.25 0.75    0
## 36  0.16 0.84    0
## 37  0.72 0.28    0
## 38  0.25 0.75    0
## 39  0.69 0.31    0
## 40  0.53 0.47    0
## 41  0.38 0.62    0
## 42  0.53 0.47    0
## 43  0.41 0.59    0
## 44  0.75 0.25    0
## 45  0.59 0.41    0
## 46  0.50 0.50    0
## 47  0.72 0.28    0
## 48  0.50 0.50    0
## 49  0.62 0.38    0
## 50  0.03 0.97    0
## 51  0.06 0.94    0
## 52  0.84 0.16    0
## 53  0.53 0.47    0
## 54  0.53 0.47    0
## 55  0.34 0.66    0
## 56  0.81 0.19    0
## 57  0.56 0.44    0
## 58  0.16 0.84    0
## 59  0.16 0.84    0
## 60  0.44 0.56    0
## 61  0.38 0.62    0
## 62  0.62 0.38    0
## 63  0.19 0.81    0
## 64  0.09 0.91    0
## 65  0.16 0.84    0
## 66  0.47 0.53    0
## 67  0.59 0.41    0
## 68  0.41 0.59    0
## 69  0.41 0.59    0
## 70  0.09 0.91    0
## 71  0.31 0.69    0
## 72  0.16 0.84    0
## 73  0.06 0.94    0
## 74  0.09 0.91    0
## 75  0.09 0.91    0
## 76  0.53 0.47    0
## 77  0.56 0.44    0
## 78  0.06 0.94    0
## 79  0.09 0.91    0
## 80  0.28 0.72    0
## 81  0.16 0.84    0
## 82  0.16 0.84    0
## 83  0.12 0.88    0
## 84  0.34 0.66    0
## 85  0.31 0.69    0
## 86  0.22 0.78    0
## 87  0.03 0.97    0
## 88  0.09 0.91    0
## 89  0.28 0.72    0
## 90  0.25 0.75    0
## 91  0.19 0.81    0
## 92  0.12 0.88    0
## 93  0.12 0.88    0
## 94  0.31 0.69    0
## 95  0.16 0.84    0
## 96  0.19 0.81    0
## 97  0.28 0.72    0
## 98  0.34 0.66    0
## 99  0.22 0.78    0
## 100 0.28 0.72    0
## 101 0.91 0.09    0
## 102 0.12 0.88    0
## 103 0.88 0.12    0
## 104 0.22 0.78    0
## 105 0.16 0.84    0
## 106 0.28 0.72    0
## 107 0.25 0.75    0
## 108 0.19 0.81    0
## 109 0.06 0.94    0
## 110 0.19 0.81    0
## 111 0.09 0.91    0
## 112 0.31 0.69    0
## 113 0.28 0.72    0
## 114 0.12 0.88    0
## 115 0.25 0.75    0
## 116 0.53 0.47    0
## 117 0.28 0.72    0
## 118 0.34 0.66    0
## 119 0.53 0.47    0
## 120 0.16 0.84    0
## 121 0.16 0.84    0
## 122 0.22 0.78    0
## 123 0.22 0.78    0
## 124 0.03 0.97    0
## 125 0.25 0.75    0
## 126 0.34 0.66    0
## 127 0.28 0.72    0
## 128 0.16 0.84    0
## 129 0.12 0.88    0
## 130 0.09 0.91    0
## 132 0.12 0.88    0
## 133 0.03 0.97    0
## 134 0.44 0.56    0
## 135 0.12 0.88    0
## 136 0.31 0.69    0
## 137 0.06 0.94    0
## 138 0.16 0.84    0
## 139 0.12 0.88    0

なんだか色々出てきているけれども，raw_alphaのところを確認でとりあえずはOK。信頼区間が3つ（Feldt, Duhacheck, bootstrapped）算出されているが，計算の仕方が違う。ヘルプを見ると，次のようにあるので，bootstrap法で算出したと報告すればOK？＞識者

Because both of these procedures use normal theory, if you really care about confidence intervals, using the boot option (n.iter > 1) is recommended.

別の関数でもCronbach’s alphaを算出。

AJ %>% 
  select(subject, itemID,res) %>% 
  pivot_wider(.,names_from = itemID,values_from = res) %>% 
    {as.data.frame(.[,-1])} %>% 
  performance::cronbachs_alpha()

## [1] 0.7961712

ほとんど変わらないので問題なさそうですね。

頻度情報をlog変換

頻度情報はlog変換して分析される（差分を小さくするため）ので，頻度情報の入った列のデータをログ変換して別の列に保存します。ログ変換の際に，0があるとやっかいなので1を足しておきます。

AJ$ColFreq<-AJ$ColFreq+1
AJ$MIScore<-AJ$MIScore+1

AJ%>%
  mutate(across(c(ColFreq, AdjFreq, NounFreq, MIScore),log,.names = "{.col}_log"))->AJ
head(AJ)

##   subject itemID pres.order        Item Itemtype Length ColFreq AdjFreq
## 1       2      1         87   thick fog       JE      9     146   29284
## 2       2      2         61 stupid idea       JE     11     113   17762
## 3       2      3         31  thin lips        JE      9     267   30704
## 4       2      4        121   deep love       JE      9     141   72697
## 5       2      5         43  small room       JE     10    1011  218756
## 6       2      6         98  long sleep       JE     10      83  339087
##   NounFreq MIScore answer res   RT ColFreq_log AdjFreq_log NounFreq_log
## 1     7176   15.24   TRUE   1 1273    4.983607   10.284797     8.878497
## 2   126766   13.13   TRUE   1 1012    4.727388    9.784817    11.750098
## 3    25883   15.18   TRUE   1 1265    5.587249   10.332148    10.161342
## 4   192472   12.53   TRUE   1  930    4.948760   11.194055    12.167706
## 5   212579   12.35   TRUE   1 1016    6.918695   12.295712    12.267069
## 6    49735   11.71   TRUE   1  771    4.418841   12.734012    10.814464
##   MIScore_log
## 1    2.723924
## 2    2.574900
## 3    2.719979
## 4    2.528126
## 5    2.513656
## 6    2.460443

データフレームの右端に，ColFreq_logなど，頻度情報の列名に_logがついた列が追加されているのがわかりますね。ちなみに，ここで使っているmutate()関数とacross()関数の組み合わせはとても便利です。詳しくは下記の拙ブログ記事をご参照ください。　　

[R] mutateとacrossでデータの下処理を少しだけエレガントに　　

翻訳課題のデータ

翻訳課題のデータの読み込み

翻訳課題のデータは別のcsvファイルに保存されているので，分析に使えるように，容認性判断課題のデータと結合します。まずは，翻訳課題をTransという変数に入れます。

Trans<-read.csv("Translation.csv", header = T); head(Trans)

##   subject itemID order condition        word         JP rating RaterA RaterB
## 1       1      1    54         1   thick fog                 4      0      0
## 2       1      2    58         1 stupid idea 愚かな考え      2      1      1
## 3       1      3    53         1  thin lips      薄い唇      2      1      1
## 4       1      4    36         1   deep love     深い愛      1      1      1
## 5       1      5    52         1  small room 小さい部屋      2      1      1
## 6       1      6     1         1  long sleep   長い眠り      1      1      1
##   Agreement Final
## 1      TRUE     0
## 2      TRUE     1
## 3      TRUE     1
## 4      TRUE     1
## 5      TRUE     1
## 6      TRUE     1

subject: 参加者ID
itemID: 項目ID
order: 提示順序
condition: コロケーションのタイプ（1: JE, 2: Eonly, 3: Jonly）
word: 提示された刺激
JP: 学習者の翻訳（空欄は翻訳できなかった場合）
rating: 翻訳の難しさ
RaterA: 評価者Aの評価
RaterB: 評価者Bの評価
Agreement: AとBの一致度
Final: 正しく翻訳ができていると言えるかどうかの最終的な判断

翻訳課題の評価者間一致度

Trans %>%
  select(Agreement) %>% #Agreement列を選択
  table()->agree_tab #データを表形式で提示したものをagree_tabに入れる
print(agree_tab) #agree_tabの中身を表示

## Agreement
## FALSE  TRUE 
##    77   981

一致していないのが77件，一致しているのが981件というのがわかりましたので，これを単純に割合にすればいいですね。

agree_tab[2]/(agree_tab[1]+agree_tab[2])*100

##     TRUE 
## 92.72212

92.722%の一致度でした。

続いて，評定者間信頼性係数のCohen’s Kappaを出してみます。

cbind(Trans$RaterA,Trans$RaterB) %>% kappa2

##  Cohen's Kappa for 2 Raters (Weights: unweighted)
## 
##  Subjects = 1058 
##    Raters = 2 
##     Kappa = 0.827 
## 
##         z = 26.9 
##   p-value = 0

熟達度テスト（Oxford Quick Placement Test; OQPT）のデータ

次に，熟達度テストのデータを準備します。え？まだ準備なの？と思われるかもしれませんが，実際問題このあたりは論文で言うとメインのデータ分析の説明の前にくる部分です。ただし，こういうのも全部Rでやっておくことで，データ分析の透明性が格段にあがります。「RのことはいいからGLMMのことを早く教えてほしいんだけど…？」と思ったそこのあなた！こういうことをおろそかにしてはいけませんよ！

OQPTデータの読み込み

oqpt <-read.csv("OQPT_results.csv",header=T,sep=",")
head(oqpt)

##   ID Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21
## 1  1  1  1  1  1  1  1  1  1  1   1   0   0   1   0   1   1   0   0   1   1   1
## 2  2  1  1  1  1  1  1  1  1  1   1   0   1   1   0   1   1   1   1   1   1   1
## 3  3  1  1  1  1  1  1  1  1  1   1   1   1   0   0   1   1   1   1   1   0   1
## 4  4  1  1  1  1  1  1  1  1  1   1   1   1   0   0   1   0   1   1   1   0   1
## 5  5  1  1  1  1  1  1  1  1  1   1   0   1   0   1   0   1   0   0   0   0   1
## 6  6  1  1  1  1  1  1  1  1  0   1   1   1   1   1   1   1   1   1   1   1   1
##   Q22 Q23 Q24 Q25 Q26 Q27 Q28 Q29 Q30 Q31 Q32 Q33 Q34 Q35 Q36 Q37 Q38 Q39 Q40
## 1   1   0   0   0   1   0   0   1   0   0   1   1   1   1   1   0   0   1   1
## 2   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   1   0   0
## 3   1   1   1   1   0   1   1   1   1   0   1   1   1   1   1   0   1   0   0
## 4   1   1   0   1   1   0   0   0   1   1   1   1   1   1   1   0   0   1   0
## 5   0   1   1   0   0   0   0   0   1   1   1   0   0   1   1   0   0   0   0
## 6   0   0   1   1   0   0   0   0   0   0   1   1   0   1   1   0   0   1   0
##   Q41 Q42 Q43 Q44 Q45 Q46 Q47 Q48 Q49 Q50 Q51 Q52 Q53 Q54 Q55 Q56 Q57 Q58 Q59
## 1   0   1   1   1   1   1   1   0   0   1   0   0   0   1   0   0   1   0   0
## 2   0   1   0   1   0   1   1   1   0   1   0   1   0   1   1   0   1   1   0
## 3   1   1   0   1   1   0   1   0   0   1   1   0   1   1   0   1   1   1   0
## 4   0   0   1   1   0   1   0   0   0   0   0   1   1   1   1   0   1   1   0
## 5   0   0   0   1   0   0   0   1   0   1   0   1   0   0   0   0   1   1   0
## 6   0   0   0   1   0   1   1   0   0   0   1   1   1   1   0   0   1   1   0
##   Q60
## 1   1
## 2   1
## 3   1
## 4   1
## 5   1
## 6   1

OQPTの記述統計

apply(oqpt[,-1],1,sum)->oqpt_sum #パイプ演算子（ %>% ）を使ってもいいのですが，あとで合計得点のデータを熟達度の代理指標として容認性判断課題のデータに読み込むのであえて変数に入れています
describe(oqpt_sum)

##    vars  n  mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 34 37.88 6.38   36.5   37.82 6.67  26  53    27 0.27    -0.63 1.09

平均が37.88 ( SD = 6.38 )となっています。

OQPTの信頼性係数

熟達度テストの信頼性係数を報告せよ！はよく言われますよね。というわけでこちらも信頼性係数を出しておきましょう。

psych::alpha(oqpt[,-1], n.iter = 2000)

## Warning in psych::alpha(oqpt[, -1], n.iter = 2000): Item = Q1 had no variance
## and was deleted but still is counted in the score

## Warning in cor.smooth(r): Matrix was not positive definite, smoothing was done

## Warning in psych::alpha(oqpt[, -1], n.iter = 2000): Some items were negatively correlated with the total scale and probably 
## should be reversed.  
## To do this, run the function again with the 'check.keys=TRUE' option

## Some items ( Q6 Q8 Q13 Q14 Q16 Q19 Q39 Q40 Q43 Q51 Q52 ) were negatively correlated with the total scale and 
## probably should be reversed.  
## To do this, run the function again with the 'check.keys=TRUE' option

## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0

## 
## Reliability analysis   
## Call: psych::alpha(x = oqpt[, -1], n.iter = 2000)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N  ase mean   sd median_r
##       0.75      0.73    0.73     0.044 2.7 0.06 0.63 0.11    0.044
## 
##     95% confidence boundaries 
##              lower alpha upper
## Feldt         0.61  0.75  0.86
## Duhachek      0.63  0.75  0.87
## bootstrapped  0.58  0.75  0.83
## 
##  Reliability if an item is dropped:
##     raw_alpha std.alpha G6(smc) average_r S/N var.r med.r
## Q2       0.75      0.73    0.73     0.045 2.7 0.031 0.047
## Q3       0.75      0.73    0.73     0.045 2.7 0.031 0.045
## Q4       0.75      0.73    0.73     0.045 2.7 0.031 0.047
## Q5       0.74      0.72    0.72     0.043 2.6 0.032 0.044
## Q6       0.76      0.75    0.75     0.050 3.0 0.031 0.052
## Q7       0.75      0.73    0.73     0.044 2.7 0.032 0.044
## Q8       0.75      0.74    0.74     0.047 2.9 0.032 0.049
## Q9       0.75      0.74    0.73     0.046 2.8 0.032 0.047
## Q10      0.76      0.74    0.73     0.046 2.8 0.032 0.045
## Q11      0.75      0.73    0.73     0.045 2.8 0.032 0.047
## Q12      0.75      0.73    0.73     0.045 2.8 0.032 0.047
## Q13      0.76      0.74    0.74     0.047 2.9 0.032 0.049
## Q14      0.75      0.73    0.73     0.045 2.8 0.032 0.047
## Q15      0.74      0.72    0.72     0.043 2.6 0.031 0.044
## Q16      0.75      0.73    0.73     0.045 2.8 0.032 0.045
## Q17      0.75      0.73    0.73     0.044 2.7 0.032 0.045
## Q18      0.75      0.73    0.72     0.043 2.6 0.032 0.044
## Q19      0.75      0.74    0.73     0.046 2.8 0.032 0.047
## Q20      0.74      0.72    0.72     0.043 2.6 0.031 0.044
## Q21      0.75      0.73    0.73     0.044 2.7 0.032 0.044
## Q22      0.73      0.71    0.71     0.041 2.5 0.032 0.041
## Q23      0.74      0.72    0.72     0.043 2.6 0.031 0.044
## Q24      0.74      0.72    0.72     0.043 2.6 0.032 0.043
## Q25      0.74      0.72    0.71     0.042 2.5 0.031 0.043
## Q26      0.75      0.73    0.73     0.045 2.7 0.032 0.045
## Q27      0.73      0.71    0.71     0.041 2.5 0.031 0.041
## Q28      0.74      0.72    0.72     0.042 2.6 0.032 0.041
## Q29      0.75      0.73    0.73     0.045 2.8 0.032 0.045
## Q30      0.74      0.72    0.72     0.043 2.6 0.031 0.043
## Q31      0.75      0.73    0.73     0.044 2.7 0.031 0.044
## Q32      0.75      0.73    0.72     0.044 2.7 0.032 0.044
## Q33      0.74      0.72    0.72     0.042 2.6 0.031 0.044
## Q34      0.75      0.72    0.72     0.043 2.6 0.031 0.044
## Q35      0.74      0.72    0.72     0.043 2.6 0.032 0.044
## Q36      0.75      0.73    0.73     0.045 2.7 0.032 0.045
## Q37      0.75      0.73    0.73     0.044 2.6 0.032 0.041
## Q38      0.75      0.73    0.73     0.045 2.7 0.032 0.044
## Q39      0.76      0.74    0.73     0.046 2.8 0.032 0.047
## Q40      0.76      0.74    0.73     0.047 2.8 0.031 0.047
## Q41      0.75      0.73    0.72     0.044 2.7 0.032 0.044
## Q42      0.75      0.73    0.73     0.045 2.7 0.032 0.045
## Q43      0.76      0.74    0.73     0.046 2.8 0.032 0.047
## Q44      0.75      0.73    0.72     0.044 2.7 0.032 0.044
## Q45      0.74      0.72    0.72     0.042 2.6 0.031 0.043
## Q46      0.75      0.73    0.73     0.045 2.7 0.032 0.044
## Q47      0.74      0.72    0.72     0.043 2.6 0.031 0.044
## Q48      0.75      0.73    0.73     0.044 2.7 0.032 0.045
## Q49      0.75      0.73    0.72     0.044 2.7 0.032 0.043
## Q50      0.74      0.73    0.72     0.044 2.6 0.031 0.044
## Q51      0.75      0.74    0.73     0.047 2.8 0.031 0.045
## Q52      0.75      0.73    0.73     0.045 2.8 0.031 0.044
## Q53      0.75      0.73    0.72     0.043 2.6 0.032 0.044
## Q54      0.74      0.72    0.72     0.043 2.6 0.031 0.044
## Q55      0.75      0.73    0.72     0.044 2.7 0.031 0.044
## Q56      0.74      0.72    0.72     0.042 2.5 0.032 0.041
## Q57      0.74      0.72    0.72     0.043 2.6 0.031 0.044
## Q58      0.74      0.72    0.72     0.043 2.6 0.031 0.043
## Q59      0.74      0.71    0.71     0.041 2.5 0.032 0.040
## Q60      0.74      0.72    0.72     0.043 2.6 0.032 0.043
## 
##  Item statistics 
##      n   raw.r  std.r   r.cor  r.drop  mean   sd
## Q2  34  0.0799  0.174  0.1238  0.0531 0.971 0.17
## Q3  34  0.0948  0.170  0.1188  0.0575 0.941 0.24
## Q4  34  0.0799  0.174  0.1238  0.0531 0.971 0.17
## Q5  34  0.3565  0.369  0.3519  0.3105 0.882 0.33
## Q6  34 -0.4190 -0.374 -0.5201 -0.4408 0.971 0.17
## Q7  34  0.1943  0.214  0.1786  0.1579 0.941 0.24
## Q8  34 -0.1695 -0.128 -0.2309 -0.1955 0.971 0.17
## Q9  34  0.0245  0.037 -0.0381 -0.0024 0.971 0.17
## Q10 34  0.0524  0.070  0.0015 -0.0178 0.735 0.45
## Q11 34  0.0848  0.109  0.0530  0.0285 0.853 0.36
## Q12 34  0.0522  0.096  0.0394  0.0253 0.971 0.17
## Q13 34 -0.0949 -0.126 -0.2201 -0.1725 0.471 0.51
## Q14 34  0.1233  0.106  0.0438  0.0476 0.647 0.49
## Q15 34  0.3598  0.372  0.3555  0.3053 0.824 0.39
## Q16 34  0.1172  0.114  0.0530  0.0429 0.676 0.47
## Q17 34  0.3013  0.260  0.2448  0.2266 0.529 0.51
## Q18 34  0.3230  0.324  0.3180  0.2671 0.824 0.39
## Q19 34  0.0896  0.079  0.0113  0.0290 0.824 0.39
## Q20 34  0.4195  0.376  0.3598  0.3570 0.706 0.46
## Q21 34  0.2583  0.244  0.2057  0.2155 0.912 0.29
## Q22 34  0.5923  0.569  0.5871  0.5385 0.618 0.49
## Q23 34  0.3844  0.409  0.3987  0.3309 0.824 0.39
## Q24 34  0.4345  0.424  0.4161  0.3750 0.735 0.45
## Q25 34  0.4859  0.520  0.5288  0.4245 0.647 0.49
## Q26 34  0.1394  0.152  0.1075  0.0615 0.412 0.50
## Q27 34  0.5580  0.576  0.5951  0.5004 0.412 0.50
## Q28 34  0.4433  0.469  0.4697  0.3805 0.324 0.47
## Q29 34  0.1372  0.108  0.0457  0.0632 0.676 0.47
## Q30 34  0.3996  0.404  0.3932  0.3316 0.618 0.49
## Q31 34  0.1981  0.212  0.1676  0.1202 0.529 0.51
## Q32 34  0.3147  0.298  0.2685  0.2550 0.794 0.41
## Q33 34  0.4335  0.472  0.4729  0.3823 0.824 0.39
## Q34 34  0.3178  0.349  0.3282  0.2526 0.735 0.45
## Q35 34  0.3772  0.353  0.3334  0.3071 0.412 0.50
## Q36 34  0.2001  0.177  0.1270  0.1409 0.824 0.39
## Q37 34  0.2527  0.312  0.2857  0.2272 0.029 0.17
## Q38 34  0.1967  0.177  0.1273  0.1185 0.500 0.51
## Q39 34  0.0604  0.034 -0.0414 -0.0179 0.588 0.50
## Q40 34  0.0261 -0.035 -0.1215 -0.0529 0.441 0.50
## Q41 34  0.2532  0.258  0.2222  0.1885 0.235 0.43
## Q42 34  0.1972  0.156  0.1018  0.1242 0.676 0.47
## Q43 34  0.0936  0.045 -0.0277  0.0141 0.500 0.51
## Q44 34  0.3173  0.273  0.2398  0.2481 0.676 0.47
## Q45 34  0.4125  0.439  0.4339  0.3443 0.588 0.50
## Q46 34  0.2074  0.141  0.0841  0.1318 0.382 0.49
## Q47 34  0.3839  0.338  0.3269  0.3142 0.588 0.50
## Q48 34  0.2262  0.226  0.1847  0.1490 0.529 0.51
## Q49 34  0.2832  0.290  0.2599  0.2126 0.324 0.47
## Q50 34  0.3323  0.316  0.2901  0.2593 0.559 0.50
## Q51 34 -0.0077 -0.036 -0.1232 -0.0589 0.118 0.33
## Q52 34  0.1437  0.115  0.0544  0.0652 0.559 0.50
## Q53 34  0.3157  0.323  0.2985  0.2596 0.176 0.39
## Q54 34  0.3980  0.415  0.4074  0.3389 0.765 0.43
## Q55 34  0.2345  0.256  0.2195  0.1586 0.412 0.50
## Q56 34  0.4994  0.500  0.5060  0.4439 0.265 0.45
## Q57 34  0.3888  0.391  0.3783  0.3394 0.853 0.36
## Q58 34  0.4153  0.435  0.4293  0.3473 0.412 0.50
## Q59 34  0.5341  0.556  0.5710  0.5006 0.088 0.29
## Q60 34  0.4173  0.418  0.4325  0.3515 0.647 0.49
## 
## Non missing response frequency for each item
##        0    1 miss
## Q2  0.03 0.97    0
## Q3  0.06 0.94    0
## Q4  0.03 0.97    0
## Q5  0.12 0.88    0
## Q6  0.03 0.97    0
## Q7  0.06 0.94    0
## Q8  0.03 0.97    0
## Q9  0.03 0.97    0
## Q10 0.26 0.74    0
## Q11 0.15 0.85    0
## Q12 0.03 0.97    0
## Q13 0.53 0.47    0
## Q14 0.35 0.65    0
## Q15 0.18 0.82    0
## Q16 0.32 0.68    0
## Q17 0.47 0.53    0
## Q18 0.18 0.82    0
## Q19 0.18 0.82    0
## Q20 0.29 0.71    0
## Q21 0.09 0.91    0
## Q22 0.38 0.62    0
## Q23 0.18 0.82    0
## Q24 0.26 0.74    0
## Q25 0.35 0.65    0
## Q26 0.59 0.41    0
## Q27 0.59 0.41    0
## Q28 0.68 0.32    0
## Q29 0.32 0.68    0
## Q30 0.38 0.62    0
## Q31 0.47 0.53    0
## Q32 0.21 0.79    0
## Q33 0.18 0.82    0
## Q34 0.26 0.74    0
## Q35 0.59 0.41    0
## Q36 0.18 0.82    0
## Q37 0.97 0.03    0
## Q38 0.50 0.50    0
## Q39 0.41 0.59    0
## Q40 0.56 0.44    0
## Q41 0.76 0.24    0
## Q42 0.32 0.68    0
## Q43 0.50 0.50    0
## Q44 0.32 0.68    0
## Q45 0.41 0.59    0
## Q46 0.62 0.38    0
## Q47 0.41 0.59    0
## Q48 0.47 0.53    0
## Q49 0.68 0.32    0
## Q50 0.44 0.56    0
## Q51 0.88 0.12    0
## Q52 0.44 0.56    0
## Q53 0.82 0.18    0
## Q54 0.24 0.76    0
## Q55 0.59 0.41    0
## Q56 0.74 0.26    0
## Q57 0.15 0.85    0
## Q58 0.59 0.41    0
## Q59 0.91 0.09    0
## Q60 0.35 0.65    0

あまり高くはないですが，まあそこまで悪いというわけでもなさそうですね。

データの結合

OQPTデータをz変換

テストスコアをそのまま使うのではなく正規化してから使うのでその処理をしておきます

oqpt_sum%>%
  as.data.frame()%>% #いったんデータフレームに変換
  mutate(across(1,~scale(.x)[,1],.names = "z.oqpt"))->oqpt_sum #scale()関数の処理をして，それを"z.oqpt"という名前にする

翻訳課題データとOQPTデータの結合

Transというデータフレームには1人につき69行のデータがあるので，1人のOQPTスコアを69回ずつ入れてあげることになります。

nrow(Trans)

## [1] 2346

Trans$oqpt <-rep(oqpt_sum[,1],each=69)  #Transという変数にoqptという列を作り，opqt_sumの1列目からデータを取ってきてそれぞれの値を69回繰り返す
Trans$z.oqpt <-rep(oqpt_sum[,2],each=69) #z変換した数値も同様に。
str(Trans) #oqptとz.oqptという列ができているのがわかります

## 'data.frame':    2346 obs. of  13 variables:
##  $ subject  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ itemID   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ order    : int  54 58 53 36 52 1 44 11 49 64 ...
##  $ condition: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ word     : chr  "thick fog" "stupid idea" "thin lips " "deep love" ...
##  $ JP       : chr  "" "愚かな考え" "薄い唇" "深い愛" ...
##  $ rating   : int  4 2 2 1 2 1 1 1 2 4 ...
##  $ RaterA   : int  0 1 1 1 1 1 1 1 1 0 ...
##  $ RaterB   : int  0 1 1 1 1 1 1 1 1 0 ...
##  $ Agreement: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ Final    : int  0 1 1 1 1 1 1 1 1 0 ...
##  $ oqpt     : int  36 36 36 36 36 36 36 36 36 36 ...
##  $ z.oqpt   : num  -0.295 -0.295 -0.295 -0.295 -0.295 ...

被験者の除外

実は，ID001とID016の参加者は，容認性判断課題時にプログラムの不具合が起こってデータが全て消えてしまったので，この2名のデータを除外します。

Trans2 <- dplyr::filter(Trans,subject != 1 & subject != 16) #filter()関数で除外して，Trans2という新しい変数に。こういう処理をしたときは，新しい変数に入れることをおすすめします

ベースライン条件のデータを除外して翻訳課題データと容認性判断課題データを結合

ベースライン条件のデータは，翻訳のデータがありません。したがって，ベースライン条件のデータを除外してから結合します。

dplyr::left_join(AJ,Trans2,by = c('subject','itemID'))->dat #subjectとitemID列を使って2つのデータを照合して結合
str(dat)

## 'data.frame':    4448 obs. of  28 variables:
##  $ subject     : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ itemID      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ pres.order  : int  87 61 31 121 43 98 26 101 99 25 ...
##  $ Item        : chr  "thick fog" "stupid idea" "thin lips " "deep love" ...
##  $ Itemtype    : chr  "JE" "JE" "JE" "JE" ...
##  $ Length      : int  9 11 9 9 10 10 8 13 10 10 ...
##  $ ColFreq     : num  146 113 267 141 1011 ...
##  $ AdjFreq     : int  29284 17762 30704 72697 218756 339087 70610 35921 72740 133697 ...
##  $ NounFreq    : int  7176 126766 25883 192472 212579 49735 26082 64072 440005 6860 ...
##  $ MIScore     : num  15.2 13.1 15.2 12.5 12.3 ...
##  $ answer      : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ res         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ RT          : int  1273 1012 1265 930 1016 771 1031 1080 1023 2434 ...
##  $ ColFreq_log : num  4.98 4.73 5.59 4.95 6.92 ...
##  $ AdjFreq_log : num  10.28 9.78 10.33 11.19 12.3 ...
##  $ NounFreq_log: num  8.88 11.75 10.16 12.17 12.27 ...
##  $ MIScore_log : num  2.72 2.57 2.72 2.53 2.51 ...
##  $ order       : int  35 68 51 34 24 15 37 31 55 18 ...
##  $ condition   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ word        : chr  "thick fog" "stupid idea" "thin lips " "deep love" ...
##  $ JP          : chr  "濃い霧" "馬鹿げた考え" "薄い唇" "深い愛" ...
##  $ rating      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ RaterA      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ RaterB      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Agreement   : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ Final       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ oqpt        : int  47 47 47 47 47 47 47 47 47 47 ...
##  $ z.oqpt      : num  1.43 1.43 1.43 1.43 1.43 ...

上のデータをよく見ると，翻訳課題のwordという列と，容認性判断課題のItemという列が重複していることがわかります。また，Itemtypeとconditionも同じ情報ですので余剰ですね（こういうのは実験プログラムを作る際に起こさないようにしないといけないですね…）。よって，これらの重複を除外します。

select(dat,-c(word, condition))->dat1 #datからwordとcondition列を抜いてdat1に保存
str(dat1)

## 'data.frame':    4448 obs. of  26 variables:
##  $ subject     : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ itemID      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ pres.order  : int  87 61 31 121 43 98 26 101 99 25 ...
##  $ Item        : chr  "thick fog" "stupid idea" "thin lips " "deep love" ...
##  $ Itemtype    : chr  "JE" "JE" "JE" "JE" ...
##  $ Length      : int  9 11 9 9 10 10 8 13 10 10 ...
##  $ ColFreq     : num  146 113 267 141 1011 ...
##  $ AdjFreq     : int  29284 17762 30704 72697 218756 339087 70610 35921 72740 133697 ...
##  $ NounFreq    : int  7176 126766 25883 192472 212579 49735 26082 64072 440005 6860 ...
##  $ MIScore     : num  15.2 13.1 15.2 12.5 12.3 ...
##  $ answer      : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ res         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ RT          : int  1273 1012 1265 930 1016 771 1031 1080 1023 2434 ...
##  $ ColFreq_log : num  4.98 4.73 5.59 4.95 6.92 ...
##  $ AdjFreq_log : num  10.28 9.78 10.33 11.19 12.3 ...
##  $ NounFreq_log: num  8.88 11.75 10.16 12.17 12.27 ...
##  $ MIScore_log : num  2.72 2.57 2.72 2.53 2.51 ...
##  $ order       : int  35 68 51 34 24 15 37 31 55 18 ...
##  $ JP          : chr  "濃い霧" "馬鹿げた考え" "薄い唇" "深い愛" ...
##  $ rating      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ RaterA      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ RaterB      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Agreement   : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ Final       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ oqpt        : int  47 47 47 47 47 47 47 47 47 47 ...
##  $ z.oqpt      : num  1.43 1.43 1.43 1.43 1.43 ...

さて，これでデータの準備ができました！

データ分析（記述統計と可視化）

翻訳課題の記述統計

dat1 %>% 
  filter(Itemtype == "JE") %>% 
  select(res,Final) %>% 
  xtabs(~+res+Final,.)%>%
  balloonplot(.,xlab = "Acceptability",ylab = "Translation",main = "")

上図からわかるように，容認性判断課題と翻訳課題の両方に正解したのが全体の84.6％で，容認性判断で不正解となった88回答のうち、70.5％が翻訳タスクで正解していることがわかります。それでは，Eonlyの項目についても見てみます。

dat1 %>% 
  filter(Itemtype == "Eonly") %>% 
  select(res,Final) %>%
  xtabs(~+res+Final,.)%>%
  balloonplot(.,xlab = "Acceptability",ylab = "Translation",main = "")

Eonlyでは傾向が全く異なりますね。容認性判断課題で不正解となった項目のおよそ3割ほどが翻訳課題では正答になっています。また，容認性判断課題における正答の4割弱が翻訳課題では誤答と判断されています。翻訳が誤っていたものと，翻訳に無回答のデータは正しい知識を保有していないと判断し，このあとの分析から除外します。ただし，JonlyとBaselineは翻訳の正答・誤答がないので，分けて処理します。

4つがItemtypeという列に入っていますが，JonlyとBaselineは”NO”が正解となる項目である一方で，JEとEonlyは”YES”が正解になる項目で，一概に比較ができません。したがって，分けて分析を行ってみます。

データの分割と誤訳の排除

dat1 %>% 
  dplyr::filter(Itemtype %in% c("JE", "Eonly")) %>% 
  dplyr::filter(Final==1)->dat_y

dat1 %>% 
  dplyr::filter(Itemtype %in% c("Jonly", "Baseline"))->dat_n

容認性判断課題の記述統計

JE

dat_y%>%
  dplyr::filter(Itemtype == "JE") %>% 
  dplyr::group_by(subject)%>%
  summarize(Accuracy = mean(res,na.rm=T))%>%
  print(n=Inf)%>% #print all data
  summarize(.,describe(Accuracy)) #descriptive stats

## # A tibble: 32 × 2
##    subject Accuracy
##      <int>    <dbl>
##  1       2    0.957
##  2       3    0.957
##  3       4    1    
##  4       5    0.85 
##  5       6    0.913
##  6       7    1    
##  7       8    0.857
##  8       9    1    
##  9      10    0.696
## 10      11    0.727
## 11      12    0.957
## 12      13    0.9  
## 13      14    0.952
## 14      15    1    
## 15      17    0.722
## 16      18    0.833
## 17      19    0.955
## 18      20    0.909
## 19      21    0.818
## 20      22    0.810
## 21      23    1    
## 22      24    0.957
## 23      25    0.913
## 24      26    0.95 
## 25      27    0.955
## 26      28    0.909
## 27      29    0.955
## 28      30    1    
## 29      31    0.864
## 30      32    1    
## 31      33    0.909
## 32      34    0.85

## # A tibble: 1 × 13
##    vars     n  mean     sd median trimmed   mad   min   max range   skew
##   <dbl> <dbl> <dbl>  <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1     1    32 0.909 0.0854  0.932   0.920 0.101 0.696     1 0.304 -0.933
## # … with 2 more variables: kurtosis <dbl>, se <dbl>

Eonly

dat_y%>%
  dplyr::filter(Itemtype == "Eonly") %>% 
  dplyr::group_by(subject)%>%
  summarize(Accuracy = mean(res,na.rm=T))%>%
  print(n=Inf)%>% #print all data
  summarize(.,describe(Accuracy)) #descriptive stats

## # A tibble: 32 × 2
##    subject Accuracy
##      <int>    <dbl>
##  1       2    0.5  
##  2       3    0.714
##  3       4    0.615
##  4       5    0.556
##  5       6    0.6  
##  6       7    0.75 
##  7       8    0.5  
##  8       9    0.667
##  9      10    0.583
## 10      11    0.364
## 11      12    0.833
## 12      13    0.556
## 13      14    1    
## 14      15    1    
## 15      17    0.545
## 16      18    0.4  
## 17      19    0.786
## 18      20    0.625
## 19      21    0.889
## 20      22    0.6  
## 21      23    0.727
## 22      24    0.889
## 23      25    0.6  
## 24      26    0.625
## 25      27    0.9  
## 26      28    0.636
## 27      29    0.643
## 28      30    0.769
## 29      31    0.7  
## 30      32    0.625
## 31      33    0.727
## 32      34    0.692

## # A tibble: 1 × 13
##    vars     n  mean    sd median trimmed   mad   min   max range  skew kurtosis
##   <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1     1    32 0.676 0.154  0.640   0.671 0.127 0.364     1 0.636 0.305   -0.316
## # … with 1 more variable: se <dbl>

Jonly

dat_n%>%
  dplyr::filter(Itemtype == "Jonly") %>% 
  dplyr::group_by(subject)%>%
  summarize(Accuracy = mean(res,na.rm=T))%>%
  print(n=Inf)%>% #print all data
  summarize(.,describe(Accuracy)) #descriptive stats

## # A tibble: 32 × 2
##    subject Accuracy
##      <int>    <dbl>
##  1       2    0.565
##  2       3    0.478
##  3       4    0.304
##  4       5    0.696
##  5       6    0.522
##  6       7    0.522
##  7       8    0.826
##  8       9    0.217
##  9      10    0.783
## 10      11    0.826
## 11      12    0.435
## 12      13    0.522
## 13      14    0.565
## 14      15    0.478
## 15      17    0.565
## 16      18    0.783
## 17      19    0.696
## 18      20    0.652
## 19      21    0.609
## 20      22    0.826
## 21      23    0.565
## 22      24    0.609
## 23      25    0.783
## 24      26    0.609
## 25      27    0.391
## 26      28    0.609
## 27      29    0.652
## 28      30    0.478
## 29      31    0.435
## 30      32    0.304
## 31      33    0.522
## 32      34    0.783

## # A tibble: 1 × 13
##    vars     n  mean    sd median trimmed   mad   min   max range   skew kurtosis
##   <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
## 1     1    32 0.582 0.160  0.565   0.589 0.129 0.217 0.826 0.609 -0.214   -0.628
## # … with 1 more variable: se <dbl>

Baseline

dat_n%>%
  dplyr::filter(Itemtype == "Baseline") %>% 
  dplyr::group_by(subject)%>%
  summarize(Accuracy = mean(res,na.rm=T))%>%
  print(n=Inf)%>% #print all data
  summarize(.,describe(Accuracy)) #descriptive stats

## # A tibble: 32 × 2
##    subject Accuracy
##      <int>    <dbl>
##  1       2    0.8  
##  2       3    0.871
##  3       4    0.586
##  4       5    0.943
##  5       6    0.729
##  6       7    0.886
##  7       8    0.943
##  8       9    0.357
##  9      10    0.886
## 10      11    0.943
## 11      12    0.843
## 12      13    0.686
## 13      14    0.843
## 14      15    0.571
## 15      17    0.843
## 16      18    0.829
## 17      19    0.843
## 18      20    0.729
## 19      21    0.714
## 20      22    0.957
## 21      23    0.829
## 22      24    0.871
## 23      25    0.786
## 24      26    0.7  
## 25      27    0.471
## 26      28    0.686
## 27      29    0.843
## 28      30    0.9  
## 29      31    0.7  
## 30      32    0.657
## 31      33    0.629
## 32      34    0.843

## # A tibble: 1 × 13
##    vars     n  mean    sd median trimmed   mad   min   max range   skew kurtosis
##   <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
## 1     1    32 0.772 0.142  0.829   0.787 0.148 0.357 0.957   0.6 -0.980    0.553
## # … with 1 more variable: se <dbl>

容認性判断課題の正答率の可視化

グラフにするときは4つ横並びにしたいので，バラしたデータフレームをくっつけるというやや冗長なことをしています。

combined_data <- bind_rows(
  dat_y %>% dplyr::select(subject, Itemtype, res),
  dat_n %>% dplyr::select(subject, Itemtype, res)
)
combined_data %>%
  dplyr::group_by(subject) %>%
  dplyr::group_by(Itemtype, .add = TRUE) %>%
  summarize(Accuracy = mean(res, na.rm = TRUE)) %>%
  ggplot(aes(x = Itemtype, y = Accuracy)) +
  geom_jitter(aes(color = Itemtype), alpha = 0.5,
              width = 0.15, height = 0,
              show.legend = FALSE) +
  geom_boxplot(aes(fill = Itemtype),
               alpha = 0.5, show.legend = FALSE) +
  ylim(0, 1) +
  labs(x = "Item Type", y = "Accuracy") +
  scale_x_discrete(
    limit = c("JE", "Eonly", "Jonly", "Baseline"),
    labels = c("Congruent", "L2-only", "L1-only", "Baseline")
  ) +
  theme(
    axis.text.x = element_text(size = 15),
    axis.text.y = element_text(size = 15),
    axis.title.x = element_text(size = 20),
    axis.title.y = element_text(size = 20)
  )

混合効果ロジスティック回帰モデル

下準備

頻度データの正規化

ログ変換した頻度列を，正規化します。

str(dat_y)

## 'data.frame':    1030 obs. of  26 variables:
##  $ subject     : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ itemID      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ pres.order  : int  87 61 31 121 43 98 26 101 99 25 ...
##  $ Item        : chr  "thick fog" "stupid idea" "thin lips " "deep love" ...
##  $ Itemtype    : chr  "JE" "JE" "JE" "JE" ...
##  $ Length      : int  9 11 9 9 10 10 8 13 10 10 ...
##  $ ColFreq     : num  146 113 267 141 1011 ...
##  $ AdjFreq     : int  29284 17762 30704 72697 218756 339087 70610 35921 72740 133697 ...
##  $ NounFreq    : int  7176 126766 25883 192472 212579 49735 26082 64072 440005 6860 ...
##  $ MIScore     : num  15.2 13.1 15.2 12.5 12.3 ...
##  $ answer      : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ res         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ RT          : int  1273 1012 1265 930 1016 771 1031 1080 1023 2434 ...
##  $ ColFreq_log : num  4.98 4.73 5.59 4.95 6.92 ...
##  $ AdjFreq_log : num  10.28 9.78 10.33 11.19 12.3 ...
##  $ NounFreq_log: num  8.88 11.75 10.16 12.17 12.27 ...
##  $ MIScore_log : num  2.72 2.57 2.72 2.53 2.51 ...
##  $ order       : int  35 68 51 34 24 15 37 31 55 18 ...
##  $ JP          : chr  "濃い霧" "馬鹿げた考え" "薄い唇" "深い愛" ...
##  $ rating      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ RaterA      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ RaterB      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Agreement   : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ Final       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ oqpt        : int  47 47 47 47 47 47 47 47 47 47 ...
##  $ z.oqpt      : num  1.43 1.43 1.43 1.43 1.43 ...

str(dat_n)

## 'data.frame':    2976 obs. of  26 variables:
##  $ subject     : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ itemID      : int  47 48 49 50 51 52 53 54 55 56 ...
##  $ pres.order  : int  85 126 15 70 32 7 46 128 42 122 ...
##  $ Item        : chr  "active personality" "cheap pay" "dark society" "delicious season" ...
##  $ Itemtype    : chr  "Jonly" "Jonly" "Jonly" "Jonly" ...
##  $ Length      : int  18 9 12 16 7 14 10 13 14 13 ...
##  $ ColFreq     : num  1 1 1 1 1 7 2 2 1 3 ...
##  $ AdjFreq     : int  37799 16483 84962 7916 191270 36780 70927 138049 23315 95141 ...
##  $ NounFreq    : int  18372 111661 94963 114909 59867 178198 40461 42635 257709 35105 ...
##  $ MIScore     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ answer      : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ res         : int  0 0 0 1 1 0 0 1 0 0 ...
##  $ RT          : int  1675 1143 1346 1255 1988 1319 1586 1075 1385 1196 ...
##  $ ColFreq_log : num  0 0 0 0 0 ...
##  $ AdjFreq_log : num  10.54 9.71 11.35 8.98 12.16 ...
##  $ NounFreq_log: num  9.82 11.62 11.46 11.65 11 ...
##  $ MIScore_log : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ order       : int  27 49 23 40 16 64 50 8 56 29 ...
##  $ JP          : chr  "活発な個性" "安い支払い" "暗い社会" "美味しい季節" ...
##  $ rating      : int  1 1 1 3 3 1 1 3 3 1 ...
##  $ RaterA      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ RaterB      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ Agreement   : logi  NA NA NA NA NA NA ...
##  $ Final       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ oqpt        : int  47 47 47 47 47 47 47 47 47 47 ...
##  $ z.oqpt      : num  1.43 1.43 1.43 1.43 1.43 ...

全体でscaleすると，平均と分散が小さくなってしまうので，参加者ごとに取り出して処理して，元の形に戻します。

dat_y %>%
  group_by(subject) %>%
  mutate(across(contains("log"), ~scale(.x)[,1],.names = "z.{col}")) %>%
  ungroup() -> dat_y

dat_n %>%
  group_by(subject) %>%
  mutate(across(contains("log"), ~scale(.x)[,1],.names = "z.{col}")) %>%
  ungroup() -> dat_n

評定値データの正規化

4段階の評定値データを正規化します。

dat_y %>% 
  mutate(across(rating,~scale(.x)[,1],.names = "z.{.col}"))->dat_y

dat_n %>% 
  mutate(across(rating,~scale(.x)[,1],.names = "z.{.col}"))->dat_n

カテゴリカル変数のコーディング

カテゴリカル変数（名義尺度のデータ）をそのまま分析に使うと，ダミー・コーディング（トリートメント・コーディング）になります。これは2つ以上ある水準の1つをベースラインとし，そのベースラインと他の水準を比較するというものです。もしも，1変数のみであれば特に問題ありませんが，2変数以上の場合にダミーコーディングを使っていると，ある変数の主効果が別の変数にネストしてしまうことになります。　　例えば，2×2のデザインを考えます。変数Aと変数Bがあるとき，変数Aの主効果は，変数Bがダミーコーディングであれば変数BがゼロのときのAの「主効果」になります。これは，実際には「単純主効果」と呼ばれるものであり，分散分析の「主効果」ではありません。よって，結果の解釈には注意が必要になります。このような場合には，全体平均を切片とする解釈が可能になるサム・コーディング(シンプル・コーディング）が必要です。これは，2水準であればある水準を-0.5,もう一方の水準を0.5とするようなコーディング方法です。2つ（以上）のカテゴリカル変数の交互作用が含まれない場合には，この心配は必要ありません。コーディング方法については，Schad et al. (2020)が非常に詳しいのでおすすめです。

コーディング

contrasts()関数を使う方法もありますが，個人の経験で，変量効果のパラメータ推定時に相関外す手続きがうまくできないので，数字のコーディングをして新しく列を作る方法をおすすめします。

ifelse(dat_y$Itemtype == "Eonly", -0.5, 0.5) -> dat_y$Itemtype_c #cはcontrastの略のつもりです。Eonlyが-0.5でJEが0.5
ifelse(dat_n$Itemtype == "Baseline", -0.5, 0.5) -> dat_n$Itemtype_c #cはcontrastの略のつもりですBaselineが-0.5でJEが0.5

YESデータセットの分析

GLMMは分析を関数を何回も回してモデルを何個も作ります。よって，最初にリスト形式の「箱」を用意して，そこにモデルを入れいくようにするといいです。

m_y<-list()

まずは，興味のある変数（翻訳の評価，熟達度，コロケーション・タイプ）以外の共変量である頻度データのうち，どれがモデルを説明するのかを吟味して，今回の容認性判断課題の反応に影響のあるものだけ選び取ります。このような方法を前向き（forward）の方法と呼びます。変数が多くて，最も複雑なモデルから単純化するのが困難な場合に採用されます（このくらいの変数じゃ多いと言わないかもですが）。

切片のみモデル

m_y[[1]] <-glmer(res ~ (1|subject)+(1|itemID),
               data = dat_y, #データの指定
               family=binomial, #binomialにするとロジスティック回帰になります
               glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))) #これはおまじないだと思ってください

コロケーションの頻度を入れたモデル

m_y[[2]] <-glmer(res ~ z.ColFreq_log+(1|subject)+(1|itemID),
               data = dat_y,
               family=binomial,
               glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5)))

形容詞の頻度を入れたモデル

m_y[[3]] <-glmer(res ~ z.AdjFreq_log+(1|subject)+(1|itemID),
               data = dat_y,
               family=binomial,
               glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5)))

名詞の頻度を入れたモデル

m_y[[4]] <-glmer(res ~ z.NounFreq_log+(1|subject)+(1|itemID),
               data = dat_y,
               family=binomial,
               glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5)))

MI Scoreを入れたモデル

m_y[[5]] <-glmer(res ~ z.MIScore_log+(1|subject)+(1|itemID),
               data = dat_y,
               family=binomial,
               glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5)))

5つのモデルをAICで比較します。

sapply(m_y,AIC)%>%data.frame

##          .
## 1 809.6749
## 2 800.6994
## 3 804.4039
## 4 806.0400
## 5 797.5253

sapply(m_y,AIC)%>%which.min

## [1] 5

5番目のモデル（MI Scoreを入れたモデル）が最もAICが低いですね。というわけで，MI Scoreを入れたモデルにさらに別の変数を入れて比較していきます。

さらに追加する変数を選択

m_y2<-list()

MI Scoreを入れたモデルをベースに

m_y2[[1]]<-m_y[[5]]

新たに変数を追加

先程と同じようにすべての式を書いてもいいのですが，面倒なので，update()関数を使います。

m_y2[[2]]<-update(m_y2[[1]],.~.+z.ColFreq_log) #.~.は「元の式」の意味。そこにz.ColFreq_logを足してね，ということ。変数を抜くときはマイナス
m_y2[[3]]<-update(m_y2[[1]],.~.+z.AdjFreq_log)
m_y2[[4]]<-update(m_y2[[1]],.~.+z.NounFreq_log)

sapply(m_y2,AIC)%>%data.frame

##          .
## 1 797.5253
## 2 796.0641
## 3 798.4382
## 4 799.5253

sapply(m_y2,AIC)%>%which.min

## [1] 2

コロケーション頻度を入れた2番目のモデルが最もAICが低いですね。ただ，数字の差がそこまで大きくありません。尤度比検定してみましょう。

anova(m_y2[[1]],m_y2[[2]])

## Data: dat_y
## Models:
## m_y2[[1]]: res ~ z.MIScore_log + (1 | subject) + (1 | itemID)
## m_y2[[2]]: res ~ z.MIScore_log + (1 | subject) + (1 | itemID) + z.ColFreq_log
##           npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
## m_y2[[1]]    4 797.53 817.27 -394.76   789.53                       
## m_y2[[2]]    5 796.06 820.75 -393.03   786.06 3.4611  1    0.06283 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Mtuschek et al. (2017)では，α _LRT = .10 or .20の基準で有意ならより複雑なモデルを選択することが推奨されているので，コロケーション頻度も入れましょう。

さらに追加する変数を選択

m_y3<-list()

コロケーション頻度も追加されたモデルをベースに

m_y3[[1]]<-m_y2[[2]]

新たに変数を追加

m_y3[[2]]<-update(m_y3[[1]],.~.+z.AdjFreq_log)
m_y3[[3]]<-update(m_y2[[1]],.~.+z.NounFreq_log)

AICを比較。

sapply(m_y3,AIC)%>%data.frame

##          .
## 1 796.0641
## 2 797.2814
## 3 799.5253

sapply(m_y3,AIC)%>%which.min

## [1] 1

形容詞の頻度や名詞の頻度を足してもモデルが向上しないようです。よって，これらの変数はいれません。共変量を入れたモデルは次のようになりました。

summary(m_y3[[1]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: res ~ z.MIScore_log + (1 | subject) + (1 | itemID) + z.ColFreq_log
##    Data: dat_y
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##    796.1    820.8   -393.0    786.1     1025 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.6425  0.1318  0.2364  0.4245  1.6306 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  itemID  (Intercept) 1.6832   1.2974  
##  subject (Intercept) 0.5191   0.7205  
## Number of obs: 1030, groups:  itemID, 42; subject, 32
## 
## Fixed effects:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     2.1219     0.2831   7.495 6.62e-14 ***
## z.MIScore_log  -0.6774     0.2608  -2.597   0.0094 ** 
## z.ColFreq_log   0.5503     0.2996   1.837   0.0662 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.MIS_
## z.MIScor_lg -0.117       
## z.ColFrq_lg  0.142  0.432

YESデータ分析の本番はここから

それでは，ここで興味のある3つの変数の分析に入りましょう。

m_y4<-list()

下記のコードでは，glmer()関数を，system.time()という関数に渡しています。これは，計算にどれくらいの時間がかかるかを可視化するためです。今回はそこまで時間がかかりませんが，データセットの大きさやモデルの複雑さによっては，かなりの時間がかかる場合もあります。コードを共有する際に，データを再分析する側の人が，どれくらいの時間がかかるのか，コードを実行する前にわかるように，処理時間が表示されるようにしています。結果の単位は「秒」です。

system.time(m_y4[[1]] <-glmer(res ~ z.MIScore_log+z.ColFreq_log+z.oqpt*Itemtype_c+(1+z.oqpt*Itemtype_c|subject)+(1+z.oqpt+z.oqpt:Itemtype_c|itemID),
             data = dat_y,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

## boundary (singular) fit: see ?isSingular

##    user  system elapsed 
##  16.833   0.038  16.885

試しに今熟達度とコロケーションタイプの交互作用を入れたモデルを作ってみましたが，警告が出ました。この警告は，推定しようとしている分散のパラメータが0に近いものがあると出てきます。モデルが複雑すぎるので，ランダム効果を削る事を考えます。この際に，主成分分析を行い，分散が0に近いものがあれば問題があります。

rePCA(m_y4[[1]]) %>% summary

## $itemID
## Importance of components:
##                         [,1]      [,2] [,3]
## Standard deviation     0.872 1.787e-07    0
## Proportion of Variance 1.000 0.000e+00    0
## Cumulative Proportion  1.000 1.000e+00    1
## 
## $subject
## Importance of components:
##                          [,1]   [,2]      [,3] [,4]
## Standard deviation     0.7744 0.3214 6.016e-07    0
## Proportion of Variance 0.8531 0.1469 0.000e+00    0
## Cumulative Proportion  0.8531 1.0000 1.000e+00    1

system.time(m_y4[[2]] <-glmer(res ~ z.MIScore_log+z.ColFreq_log+z.oqpt*Itemtype_c+(1+z.oqpt*Itemtype_c||subject)+(1+z.oqpt+z.oqpt:Itemtype_c||itemID),
             data = dat_y,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

## boundary (singular) fit: see ?isSingular

##    user  system elapsed 
##   6.251   0.014   6.271

相関パラメータを外してもだめなようです。では，また主成分分析の結果を見てみましょう。

rePCA(m_y4[[2]]) %>% summary

## $itemID
## Importance of components:
##                          [,1] [,2] [,3]
## Standard deviation     0.8574    0    0
## Proportion of Variance 1.0000    0    0
## Cumulative Proportion  1.0000    1    1
## 
## $subject
## Importance of components:
##                          [,1]    [,2] [,3] [,4]
## Standard deviation     0.5824 0.14728    0    0
## Proportion of Variance 0.9399 0.06011    0    0
## Cumulative Proportion  0.9399 1.00000    1    1

項目，参加者，両方の変量効果に分散がゼロのものが含まれています。では，モデルの中身を見てみましょう。

summary(m_y4[[2]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: res ~ z.MIScore_log + z.ColFreq_log + z.oqpt * Itemtype_c + (1 +  
##     z.oqpt * Itemtype_c || subject) + (1 + z.oqpt + z.oqpt:Itemtype_c ||  
##     itemID)
##    Data: dat_y
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##    779.7    843.9   -376.8    753.7     1017 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.5189  0.1079  0.2234  0.4193  1.6847 
## 
## Random effects:
##  Groups    Name              Variance Std.Dev.
##  itemID    z.oqpt:Itemtype_c 0.00000  0.0000  
##  itemID.1  z.oqpt            0.00000  0.0000  
##  itemID.2  (Intercept)       0.73511  0.8574  
##  subject   z.oqpt:Itemtype_c 0.00000  0.0000  
##  subject.1 Itemtype_c        0.00000  0.0000  
##  subject.2 z.oqpt            0.02169  0.1473  
##  subject.3 (Intercept)       0.33920  0.5824  
## Number of obs: 1030, groups:  itemID, 42; subject, 32
## 
## Fixed effects:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         1.9220     0.2379   8.081 6.44e-16 ***
## z.MIScore_log      -0.4764     0.1980  -2.406  0.01614 *  
## z.ColFreq_log       0.5409     0.2322   2.330  0.01982 *  
## z.oqpt              0.4631     0.1662   2.787  0.00532 ** 
## Itemtype_c          1.9574     0.3682   5.316 1.06e-07 ***
## z.oqpt:Itemtype_c   0.2192     0.2179   1.006  0.31442    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.MIS_ z.ClF_ z.oqpt Itmty_
## z.MIScor_lg -0.137                            
## z.ColFrq_lg  0.167  0.433                     
## z.oqpt      -0.057  0.005  0.018              
## Itemtype_c   0.033  0.082  0.119  0.089       
## z.qpt:Itmt_  0.064 -0.042  0.026  0.181  0.094
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular

system.time(m_y4[[3]] <-glmer(res ~ z.MIScore_log+z.ColFreq_log+z.oqpt*Itemtype_c+(1+z.oqpt+Itemtype_c||subject)+(1+z.oqpt+z.oqpt:Itemtype_c||itemID),
             data = dat_y,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

## boundary (singular) fit: see ?isSingular

##    user  system elapsed 
##   3.778   0.011   3.796

rePCA(m_y4[[3]]) %>% summary

## $itemID
## Importance of components:
##                          [,1] [,2] [,3]
## Standard deviation     0.8574    0    0
## Proportion of Variance 1.0000    0    0
## Cumulative Proportion  1.0000    1    1
## 
## $subject
## Importance of components:
##                          [,1]   [,2] [,3]
## Standard deviation     0.5824 0.1473    0
## Proportion of Variance 0.9399 0.0601    0
## Cumulative Proportion  0.9399 1.0000    1

summary(m_y4[[3]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: res ~ z.MIScore_log + z.ColFreq_log + z.oqpt * Itemtype_c + (1 +  
##     z.oqpt + Itemtype_c || subject) + (1 + z.oqpt + z.oqpt:Itemtype_c ||  
##     itemID)
##    Data: dat_y
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##    777.7    836.9   -376.8    753.7     1018 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.5189  0.1079  0.2234  0.4193  1.6847 
## 
## Random effects:
##  Groups    Name              Variance Std.Dev.
##  itemID    z.oqpt:Itemtype_c 0.00000  0.0000  
##  itemID.1  z.oqpt            0.00000  0.0000  
##  itemID.2  (Intercept)       0.73511  0.8574  
##  subject   Itemtype_c        0.00000  0.0000  
##  subject.1 z.oqpt            0.02169  0.1473  
##  subject.2 (Intercept)       0.33920  0.5824  
## Number of obs: 1030, groups:  itemID, 42; subject, 32
## 
## Fixed effects:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         1.9220     0.2379   8.081 6.44e-16 ***
## z.MIScore_log      -0.4764     0.1980  -2.406  0.01614 *  
## z.ColFreq_log       0.5409     0.2322   2.330  0.01982 *  
## z.oqpt              0.4631     0.1662   2.787  0.00532 ** 
## Itemtype_c          1.9574     0.3682   5.316 1.06e-07 ***
## z.oqpt:Itemtype_c   0.2192     0.2179   1.006  0.31442    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.MIS_ z.ClF_ z.oqpt Itmty_
## z.MIScor_lg -0.137                            
## z.ColFrq_lg  0.167  0.433                     
## z.oqpt      -0.057  0.005  0.018              
## Itemtype_c   0.033  0.082  0.119  0.089       
## z.qpt:Itmt_  0.064 -0.042  0.026  0.181  0.094
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular

参加者のItemtype_cのランダム傾きが非常に小さいのがわかります。これを抜いて新しくモデルを作ります。

system.time(m_y4[[4]] <-glmer(res ~ z.MIScore_log+z.ColFreq_log+z.oqpt*Itemtype_c+(1+z.oqpt||subject)+(1+z.oqpt:Itemtype_c||itemID),
             data = dat_y,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

## boundary (singular) fit: see ?isSingular

##    user  system elapsed 
##   1.622   0.006   1.629

まだ収束しません。

rePCA(m_y4[[4]]) %>% summary

## $itemID
## Importance of components:
##                          [,1]      [,2]
## Standard deviation     0.8574 1.455e-07
## Proportion of Variance 1.0000 0.000e+00
## Cumulative Proportion  1.0000 1.000e+00
## 
## $subject
## Importance of components:
##                          [,1]   [,2]
## Standard deviation     0.5824 0.1473
## Proportion of Variance 0.9399 0.0601
## Cumulative Proportion  0.9399 1.0000

まだ項目のランダム傾きで分散が小さいものがあるようです。

summary(m_y4[[4]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: res ~ z.MIScore_log + z.ColFreq_log + z.oqpt * Itemtype_c + (1 +  
##     z.oqpt || subject) + (1 + z.oqpt:Itemtype_c || itemID)
##    Data: dat_y
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##    773.7    823.0   -376.8    753.7     1020 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.5189  0.1079  0.2234  0.4193  1.6847 
## 
## Random effects:
##  Groups    Name              Variance  Std.Dev. 
##  itemID    z.oqpt:Itemtype_c 2.118e-14 1.455e-07
##  itemID.1  (Intercept)       7.351e-01 8.574e-01
##  subject   z.oqpt            2.169e-02 1.473e-01
##  subject.1 (Intercept)       3.392e-01 5.824e-01
## Number of obs: 1030, groups:  itemID, 42; subject, 32
## 
## Fixed effects:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         1.9220     0.2379   8.081 6.44e-16 ***
## z.MIScore_log      -0.4764     0.1980  -2.406  0.01614 *  
## z.ColFreq_log       0.5409     0.2322   2.330  0.01982 *  
## z.oqpt              0.4631     0.1662   2.787  0.00532 ** 
## Itemtype_c          1.9574     0.3682   5.316 1.06e-07 ***
## z.oqpt:Itemtype_c   0.2192     0.2179   1.006  0.31442    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.MIS_ z.ClF_ z.oqpt Itmty_
## z.MIScor_lg -0.137                            
## z.ColFrq_lg  0.167  0.433                     
## z.oqpt      -0.057  0.005  0.018              
## Itemtype_c   0.033  0.082  0.119  0.089       
## z.qpt:Itmt_  0.064 -0.042  0.026  0.181  0.094
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular

交互作用項をなくしてみましょう。

system.time(m_y4[[5]] <-glmer(res ~ z.MIScore_log+z.ColFreq_log+z.oqpt*Itemtype_c+(1+z.oqpt||subject)+(1+z.oqpt||itemID),
             data = dat_y,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

## boundary (singular) fit: see help('isSingular')

##    user  system elapsed 
##   1.040   0.001   1.043

残念ながら収束しません。

summary(m_y4[[5]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: res ~ z.MIScore_log + z.ColFreq_log + z.oqpt * Itemtype_c + (1 +  
##     z.oqpt || subject) + (1 + z.oqpt || itemID)
##    Data: dat_y
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##    773.7    823.0   -376.8    753.7     1020 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.5189  0.1079  0.2234  0.4193  1.6847 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  itemID    z.oqpt      0.00000  0.0000  
##  itemID.1  (Intercept) 0.73511  0.8574  
##  subject   z.oqpt      0.02169  0.1473  
##  subject.1 (Intercept) 0.33920  0.5824  
## Number of obs: 1030, groups:  itemID, 42; subject, 32
## 
## Fixed effects:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         1.9220     0.2378   8.081 6.44e-16 ***
## z.MIScore_log      -0.4764     0.1980  -2.406  0.01614 *  
## z.ColFreq_log       0.5409     0.2322   2.330  0.01982 *  
## z.oqpt              0.4631     0.1662   2.787  0.00532 ** 
## Itemtype_c          1.9574     0.3682   5.316 1.06e-07 ***
## z.oqpt:Itemtype_c   0.2192     0.2179   1.006  0.31442    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.MIS_ z.ClF_ z.oqpt Itmty_
## z.MIScor_lg -0.137                            
## z.ColFrq_lg  0.167  0.433                     
## z.oqpt      -0.057  0.005  0.018              
## Itemtype_c   0.033  0.082  0.119  0.089       
## z.qpt:Itmt_  0.064 -0.042  0.026  0.181  0.094
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

項目ごとの熟達度テストスコアの傾きの分散がゼロなので，それを抜きます。このとき，(1||itemID)としないように注意します。傾きがないので，相関パラメータもありませんからね。

system.time(m_y4[[6]] <-glmer(res ~ z.MIScore_log+z.ColFreq_log+z.oqpt*Itemtype_c+(1+z.oqpt||subject)+(1|itemID),
             data = dat_y,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

##    user  system elapsed 
##    0.65    0.00    0.65

収束したようです。では，参加者のランダム傾きの相関パラメータを戻してみましょう。

system.time(m_y4[[7]] <-glmer(res ~ z.MIScore_log+z.ColFreq_log+z.oqpt*Itemtype_c+(1+z.oqpt|subject)+(1|itemID),
             data = dat_y,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

##    user  system elapsed 
##   0.674   0.001   0.674

収束しました。尤度比検定してみます。

anova(m_y4[[7]],m_y4[[6]])

## Data: dat_y
## Models:
## m_y4[[6]]: res ~ z.MIScore_log + z.ColFreq_log + z.oqpt * Itemtype_c + (1 + z.oqpt || subject) + (1 | itemID)
## m_y4[[7]]: res ~ z.MIScore_log + z.ColFreq_log + z.oqpt * Itemtype_c + (1 + z.oqpt | subject) + (1 | itemID)
##           npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## m_y4[[6]]    9 771.67 816.11 -376.84   753.67                     
## m_y4[[7]]   10 773.50 822.87 -376.75   753.50 0.1721  1     0.6783

パラメータが増えても残念ながらモデルは良くなっていません。よって，相関パラメータのないモデルを採用します。

最終的なモデル

summary(m_y4[[6]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: res ~ z.MIScore_log + z.ColFreq_log + z.oqpt * Itemtype_c + (1 +  
##     z.oqpt || subject) + (1 | itemID)
##    Data: dat_y
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##    771.7    816.1   -376.8    753.7     1021 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.5189  0.1079  0.2234  0.4193  1.6847 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  itemID    (Intercept) 0.73511  0.8574  
##  subject   z.oqpt      0.02169  0.1473  
##  subject.1 (Intercept) 0.33920  0.5824  
## Number of obs: 1030, groups:  itemID, 42; subject, 32
## 
## Fixed effects:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         1.9220     0.2379   8.081 6.44e-16 ***
## z.MIScore_log      -0.4764     0.1980  -2.406  0.01614 *  
## z.ColFreq_log       0.5409     0.2322   2.330  0.01982 *  
## z.oqpt              0.4631     0.1662   2.787  0.00532 ** 
## Itemtype_c          1.9574     0.3682   5.316 1.06e-07 ***
## z.oqpt:Itemtype_c   0.2192     0.2179   1.006  0.31442    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.MIS_ z.ClF_ z.oqpt Itmty_
## z.MIScor_lg -0.137                            
## z.ColFrq_lg  0.167  0.433                     
## z.oqpt      -0.057  0.005  0.018              
## Itemtype_c   0.033  0.082  0.119  0.089       
## z.qpt:Itmt_  0.064 -0.042  0.026  0.181  0.094

モデルの評価

回帰モデルは，説明変数間の相関が高いと多重共線性の問題があります。その評価のために，VIF(Variance Inflation Factor)をチェックします。このVIFの値がいくつなら問題になるのかは諸説ありますが，田中他 (2008, p.116）は，VIFが5より大きければ多重共線性の可能性が大きく注意が必要と述べています。

vif(m_y4[[6]])

##     z.MIScore_log     z.ColFreq_log            z.oqpt        Itemtype_c 
##          1.236905          1.243394          1.039676          1.029916 
## z.oqpt:Itemtype_c 
##          1.044850

交互作用の図示

残念ながら交互作用は非有意ですが，そのことを図示して確かめましょう。

plot_model(m_y4[[6]],type = "int")

結果の解釈

まず，コロケーションの頻度とMIスコアの頻度は係数が有意です。よって
- コロケーションの頻度が高くなると容認性判断課題の正答率が高くなる
- MIスコアが高くなると容認性判断課題の正答率が高くなる
次に，熟達度テストスコアの係数が有意ですから，熟達度があがれば，容認性判断課題の正答率もあがるといえます
さらに，コロケーションタイプの係数も有意ですから，JE項目のほうが，Eonly項目よりも容認性判断課題の正答率が高いといえます

オッズ比の算出

効果量（オッズ比）を求める①

ある事象の起こる確率が \(p\)で表されるとき，オッズは次のようになる

\[Odds = \frac{p}{1-p}\]

オッズ比はオッズの比で表されるので，確率 \(p\)と確率 \(q\)のオッズ比は次のようになる

\[\frac{\frac{p}{1-p}}{\frac{q}{1-q}}\]

効果量（オッズ比）を求める②

ロジット関数を思い出すと，これはオッズの対数であり，商の対数は対数の差と等しいので次のように表せる

\[{\rm log}(\frac{q_{i}}{1-q_{i}})={\rm log}(q_{i})-log(1-q_{i})\]

つまり，2つの確率のロジットの差がオッズ比の対数である

\[{\rm log}(\frac{p}{1-p})-{\rm log}(\frac{q}{1-q})={\rm log}(\frac{\frac{p}{1-p}}{\frac{q}{1-q}})\]

効果量（オッズ比）を求める③

このオッズ比の仕組みが今理解できなくても大丈夫です
大事なことは次のこと

偏回帰係数を指数変換するとオッズ比

Rではexp()という関数があるので，オッズ比は次のように求められる

fixef(m_y4[[6]]) %>% #fixefは固定効果の係数を取り出す関数
  exp()

##       (Intercept)     z.MIScore_log     z.ColFreq_log            z.oqpt 
##          6.834599          0.620988          1.717604          1.589008 
##        Itemtype_c z.oqpt:Itemtype_c 
##          7.081010          1.245135

例えば，z.oqptのオッズ比は1.589ですから，z.oqptが1単位増加すると（今回は正規化しているので，熟達度テストの1単位あたりの変動は1SDの変動になります），容認性判断課題に正答する確率が1.589倍あがると解釈できます。

NOデータ分析

ちょっとした注意事項

最初に，翻訳データと熟達度テストデータを結合し，その後に容認性判断課題のデータとくっつけました。したがって，今，Baselineアイテムの行には熟達度テストデータが入っていません。また，Baselineアイテムは翻訳課題に含まれていないので，翻訳の難しさの評定値データもありません。つまり，こちらの分析にはz.ratingの変数は使えません。また，どうにかして熟達度テストデータをBaselineアイテムの行に持ってこないといけません。

oqpt_sumデータに被験者番号を付与して，その番号を使って元のデータと紐づけます。

head(oqpt_sum)

##    .     z.oqpt
## 1 36 -0.2952283
## 2 47  1.4300119
## 3 45  1.1163319
## 4 39  0.1752918
## 5 28 -1.5499484
## 6 37 -0.1383882

oqpt_sum$subject<-1:nrow(oqpt)
names(oqpt_sum)[1]<-"oqpt"

結合には，left_join()関数を使います。

left_join(oqpt_sum,dat_n, by="subject") %>% 
  dplyr::select(-contains(".y")) %>% 
  dplyr::filter(.,subject != 1 & subject != 16)->dat_n2

.xという識別子がつくので，それを削除します。

names(dat_n2) <- gsub("\\.x$", "", names(dat_n2))

Baseline項目にもoqptやz.oqptのスコアの列が入っていることを確認しましょう。

dat_n2 %>% 
  filter(Itemtype == "Baseline") %>% 
  head()

##   oqpt   z.oqpt subject itemID pres.order                    Item Itemtype
## 1   47 1.430012       2     70        118             active gift Baseline
## 2   47 1.430012       2     71        120                bad lips Baseline
## 3   47 1.430012       2     72        119             bare growth Baseline
## 4   47 1.430012       2     73         13 beautiful understanding Baseline
## 5   47 1.430012       2     74         96          big atmosphere Baseline
## 6   47 1.430012       2     75         12             bitter hole Baseline
##   Length ColFreq AdjFreq NounFreq MIScore answer res   RT ColFreq_log
## 1     11       1   37799    26244       1  FALSE   1 1212   0.0000000
## 2      8       1  123592    25883       1  FALSE   0 1428   0.0000000
## 3     11       1   13042    71125       1  FALSE   1 1099   0.0000000
## 4     23       1   57918    58879       1  FALSE   1 1634   0.0000000
## 5     14       2  258911    19918       1  FALSE   1 1817   0.6931472
## 6     11       1   11204    29176       1  FALSE   1 1374   0.0000000
##   AdjFreq_log NounFreq_log MIScore_log order   JP rating RaterA RaterB
## 1   10.540038    10.175193           0    NA <NA>     NA     NA     NA
## 2   11.724741    10.161342           0    NA <NA>     NA     NA     NA
## 3    9.475930    11.172194           0    NA <NA>     NA     NA     NA
## 4   10.966783    10.983240           0    NA <NA>     NA     NA     NA
## 5   12.464240     9.899379           0    NA <NA>     NA     NA     NA
## 6    9.324026    10.281102           0    NA <NA>     NA     NA     NA
##   Agreement Final z.ColFreq_log z.AdjFreq_log z.NounFreq_log z.MIScore_log
## 1        NA    NA    -0.5850149    -0.1677051    -0.67052375           NaN
## 2        NA    NA    -0.5850149     1.0659607    -0.68226438           NaN
## 3        NA    NA    -0.5850149    -1.2757913     0.17457178           NaN
## 4        NA    NA    -0.5850149     0.2766775     0.01440701           NaN
## 5        NA    NA     0.2399097     1.8360220    -0.90431356           NaN
## 6        NA    NA    -0.5850149    -1.4339734    -0.58075129           NaN
##   z.rating Itemtype_c
## 1       NA       -0.5
## 2       NA       -0.5
## 3       NA       -0.5
## 4       NA       -0.5
## 5       NA       -0.5
## 6       NA       -0.5

では，YESデータセットの分析と同様に，まずは共変量としてどの変数を入れるかを検討します。　　

m_n<-list()

切片のみモデル

m_n[[1]] <-glmer(res ~ (1|subject)+(1|itemID),
               data = dat_n2, #データの指定
               family=binomial, #binomialにするとロジスティック回帰になります
               glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))) #これはおまじないだと思ってください

コロケーションの頻度を入れたモデル

m_n[[2]] <-glmer(res ~ z.ColFreq_log+(1|subject)+(1|itemID),
               data = dat_n2,
               family=binomial,
               glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5)))

形容詞の頻度を入れたモデル

m_n[[3]] <-glmer(res ~ z.AdjFreq_log+(1|subject)+(1|itemID),
               data = dat_n2,
               family=binomial,
               glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5)))

名詞の頻度を入れたモデル

m_n[[4]] <-glmer(res ~ z.NounFreq_log+(1|subject)+(1|itemID),
               data = dat_n2,
               family=binomial,
               glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5)))

MI Scoreを入れたモデル

MI ScoreはBaselineとJonlyは計算できないので，ここでは考慮しません。

4つのモデルをAICで比較します。

sapply(m_n,AIC)%>%data.frame

##          .
## 1 2917.465
## 2 2910.161
## 3 2915.097
## 4 2919.392

sapply(m_n,AIC)%>%which.min

## [1] 2

2番目のコロケーションの頻度を入れたものがAICが一番低いので，それをベースにしてYESデータセットの分析と同じ手順を繰り返します。

さらに追加する変数を選択

m_n2<-list()

コロケーション頻度を入れたモデルをベースに

m_n2[[1]]<-m_n[[2]]

新たに変数を追加

先程と同じようにすべての式を書いてもいいのですが，面倒なので，update()関数を使います。

m_n2[[2]]<-update(m_n2[[1]],.~.+z.AdjFreq_log)
m_n2[[3]]<-update(m_n2[[1]],.~.+z.NounFreq_log)

sapply(m_n2,AIC)%>%data.frame

##          .
## 1 2910.161
## 2 2911.007
## 3 2911.761

sapply(m_n2,AIC)%>%which.min

## [1] 1

名詞の頻度や形容詞の頻度を入れてもモデルは向上していないようです。ということで，NOデータセットの分析にはコロケーション頻度だけを共変量にします。

summary(m_n2[[1]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: res ~ z.ColFreq_log + (1 | subject) + (1 | itemID)
##    Data: dat_n2
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##   2910.2   2934.2  -1451.1   2902.2     2972 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3530 -0.4304  0.3243  0.5195  3.3366 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  itemID  (Intercept) 1.2906   1.1361  
##  subject (Intercept) 0.7435   0.8623  
## Number of obs: 2976, groups:  itemID, 93; subject, 32
## 
## Fixed effects:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     1.3171     0.2002   6.577 4.79e-11 ***
## z.ColFreq_log  -0.4021     0.1289  -3.119  0.00181 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## z.ColFrq_lg -0.010

NOデータ分析の本番はここから

それでは，ここで興味のある3つの変数の分析に入りましょう。

m_n3<-list()

system.time(m_n3[[1]] <-glmer(res ~ z.ColFreq_log+z.oqpt*Itemtype_c+(1+z.oqpt*Itemtype_c|subject)+(1+z.oqpt+z.oqpt:Itemtype_c|itemID),
             data = dat_n2,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

## boundary (singular) fit: see help('isSingular')

##    user  system elapsed 
##  25.153   0.071  25.227

?isSingularのエラーが出ています。主成分分析の結果を見てみます。

rePCA(m_n3[[1]]) %>% summary

## $itemID
## Importance of components:
##                         [,1]      [,2] [,3]
## Standard deviation     1.031 6.225e-07    0
## Proportion of Variance 1.000 0.000e+00    0
## Cumulative Proportion  1.000 1.000e+00    1
## 
## $subject
## Importance of components:
##                          [,1]    [,2]      [,3] [,4]
## Standard deviation     0.9097 0.27640 5.105e-07    0
## Proportion of Variance 0.9155 0.08451 0.000e+00    0
## Cumulative Proportion  0.9155 1.00000 1.000e+00    1

分散がゼロの要素があります。まずは相関パラメータを外してみましょう。

system.time(m_n3[[2]] <-glmer(res ~ z.ColFreq_log+z.oqpt*Itemtype_c+(1+z.oqpt*Itemtype_c||subject)+(1+z.oqpt+z.oqpt:Itemtype_c||itemID),
             data = dat_n2,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

## boundary (singular) fit: see help('isSingular')

##    user  system elapsed 
##   7.598   0.034   7.644

まだ同じエラーが出ます。

モデルの中身を見て，どれを除外するか，見当をつけましょう。

summary(m_n3[[2]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## res ~ z.ColFreq_log + z.oqpt * Itemtype_c + (1 + z.oqpt * Itemtype_c ||  
##     subject) + (1 + z.oqpt + z.oqpt:Itemtype_c || itemID)
##    Data: dat_n2
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##   2901.6   2973.5  -1438.8   2877.6     2964 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3272 -0.4404  0.3176  0.5072  3.3289 
## 
## Random effects:
##  Groups    Name              Variance  Std.Dev. 
##  itemID    z.oqpt:Itemtype_c 4.877e-13 6.983e-07
##  itemID.1  z.oqpt            0.000e+00 0.000e+00
##  itemID.2  (Intercept)       1.036e+00 1.018e+00
##  subject   z.oqpt:Itemtype_c 2.501e-13 5.001e-07
##  subject.1 Itemtype_c        9.385e-02 3.064e-01
##  subject.2 z.oqpt            0.000e+00 0.000e+00
##  subject.3 (Intercept)       7.187e-01 8.478e-01
## Number of obs: 2976, groups:  itemID, 93; subject, 32
## 
## Fixed effects:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        1.03488    0.20191   5.125 2.97e-07 ***
## z.ColFreq_log     -0.41020    0.11750  -3.491 0.000481 ***
## z.oqpt             0.04339    0.15979   0.272 0.785957    
## Itemtype_c        -1.12508    0.27411  -4.104 4.05e-05 ***
## z.oqpt:Itemtype_c -0.28555    0.11904  -2.399 0.016447 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.ClF_ z.oqpt Itmty_
## z.ColFrq_lg -0.003                     
## z.oqpt      -0.034 -0.001              
## Itemtype_c   0.311  0.027 -0.007       
## z.qpt:Itmt_ -0.014  0.006  0.103 -0.015
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

項目の交互作用の傾きがゼロなので，それを抜きます。項目タイプのランダム傾きはモデルに入らないデザインなので抜きます。

system.time(m_n3[[3]] <-glmer(res ~ z.ColFreq_log+z.oqpt*Itemtype_c+(1+z.oqpt*Itemtype_c||subject)+(1+z.oqpt||itemID),
             data = dat_n2,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

## boundary (singular) fit: see help('isSingular')

##    user  system elapsed 
##   5.424   0.022   5.447

まだダメです。

rePCA(m_n3[[3]]) %>% summary

## $itemID
## Importance of components:
##                         [,1] [,2]
## Standard deviation     1.018    0
## Proportion of Variance 1.000    0
## Cumulative Proportion  1.000    1
## 
## $subject
## Importance of components:
##                          [,1]   [,2] [,3] [,4]
## Standard deviation     0.8478 0.3064    0    0
## Proportion of Variance 0.8845 0.1155    0    0
## Cumulative Proportion  0.8845 1.0000    1    1

summary(m_n3[[3]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## res ~ z.ColFreq_log + z.oqpt * Itemtype_c + (1 + z.oqpt * Itemtype_c ||  
##     subject) + (1 + z.oqpt || itemID)
##    Data: dat_n2
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##   2899.6   2965.6  -1438.8   2877.6     2965 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3272 -0.4404  0.3176  0.5072  3.3288 
## 
## Random effects:
##  Groups    Name              Variance Std.Dev.
##  itemID    z.oqpt            0.00000  0.0000  
##  itemID.1  (Intercept)       1.03597  1.0178  
##  subject   z.oqpt:Itemtype_c 0.00000  0.0000  
##  subject.1 Itemtype_c        0.09385  0.3064  
##  subject.2 z.oqpt            0.00000  0.0000  
##  subject.3 (Intercept)       0.71875  0.8478  
## Number of obs: 2976, groups:  itemID, 93; subject, 32
## 
## Fixed effects:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        1.03488    0.20191   5.126 2.97e-07 ***
## z.ColFreq_log     -0.41020    0.11750  -3.491 0.000481 ***
## z.oqpt             0.04339    0.15979   0.272 0.785951    
## Itemtype_c        -1.12508    0.27411  -4.104 4.05e-05 ***
## z.oqpt:Itemtype_c -0.28555    0.11904  -2.399 0.016447 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.ClF_ z.oqpt Itmty_
## z.ColFrq_lg -0.003                     
## z.oqpt      -0.034 -0.001              
## Itemtype_c   0.311  0.027 -0.007       
## z.qpt:Itmt_ -0.014  0.006  0.103 -0.015
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

項目のz.oqptの傾きを除きます。

system.time(m_n3[[4]] <-glmer(res ~ z.ColFreq_log+z.oqpt*Itemtype_c+(1+z.oqpt+Itemtype_c||subject)+(1|itemID),
             data = dat_n2,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

## boundary (singular) fit: see help('isSingular')

##    user  system elapsed 
##   1.809   0.005   1.813

まだ問題があります。

rePCA(m_n3[[4]]) %>% summary

## $itemID
## Importance of components:
##                         [,1]
## Standard deviation     1.018
## Proportion of Variance 1.000
## Cumulative Proportion  1.000
## 
## $subject
## Importance of components:
##                          [,1]   [,2] [,3]
## Standard deviation     0.8478 0.3064    0
## Proportion of Variance 0.8845 0.1155    0
## Cumulative Proportion  0.8845 1.0000    1

summary(m_n3[[4]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## res ~ z.ColFreq_log + z.oqpt * Itemtype_c + (1 + z.oqpt + Itemtype_c ||  
##     subject) + (1 | itemID)
##    Data: dat_n2
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##   2895.6   2949.6  -1438.8   2877.6     2967 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3272 -0.4404  0.3176  0.5072  3.3288 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  itemID    (Intercept) 1.03597  1.0178  
##  subject   Itemtype_c  0.09386  0.3064  
##  subject.1 z.oqpt      0.00000  0.0000  
##  subject.2 (Intercept) 0.71875  0.8478  
## Number of obs: 2976, groups:  itemID, 93; subject, 32
## 
## Fixed effects:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        1.03488    0.20191   5.126 2.97e-07 ***
## z.ColFreq_log     -0.41020    0.11750  -3.491 0.000481 ***
## z.oqpt             0.04339    0.15979   0.272 0.785953    
## Itemtype_c        -1.12508    0.27410  -4.105 4.05e-05 ***
## z.oqpt:Itemtype_c -0.28555    0.11904  -2.399 0.016448 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.ClF_ z.oqpt Itmty_
## z.ColFrq_lg -0.003                     
## z.oqpt      -0.034 -0.001              
## Itemtype_c   0.311  0.027 -0.007       
## z.qpt:Itmt_ -0.014  0.006  0.103 -0.015
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

参加者のz.oqptの傾きを除きます。

system.time(m_n3[[5]] <-glmer(res ~ z.ColFreq_log+z.oqpt*Itemtype_c+(1+Itemtype_c||subject)+(1|itemID),
             data = dat_n2,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

##    user  system elapsed 
##   0.996   0.002   0.998

モデルが収束しました。

summary(m_n3[[5]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: res ~ z.ColFreq_log + z.oqpt * Itemtype_c + (1 + Itemtype_c ||  
##     subject) + (1 | itemID)
##    Data: dat_n2
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##   2893.6   2941.6  -1438.8   2877.6     2968 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3272 -0.4404  0.3176  0.5072  3.3288 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  itemID    (Intercept) 1.03597  1.0178  
##  subject   Itemtype_c  0.09386  0.3064  
##  subject.1 (Intercept) 0.71875  0.8478  
## Number of obs: 2976, groups:  itemID, 93; subject, 32
## 
## Fixed effects:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        1.03488    0.20191   5.125 2.97e-07 ***
## z.ColFreq_log     -0.41020    0.11750  -3.491 0.000481 ***
## z.oqpt             0.04339    0.15979   0.272 0.785951    
## Itemtype_c        -1.12508    0.27410  -4.105 4.05e-05 ***
## z.oqpt:Itemtype_c -0.28555    0.11904  -2.399 0.016448 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.ClF_ z.oqpt Itmty_
## z.ColFrq_lg -0.003                     
## z.oqpt      -0.034 -0.001              
## Itemtype_c   0.311  0.027 -0.007       
## z.qpt:Itmt_ -0.014  0.006  0.103 -0.015

相関パラメータを戻してみましょう。

system.time(m_n3[[6]] <-glmer(res ~ z.ColFreq_log+z.oqpt*Itemtype_c+(1+Itemtype_c|subject)+(1|itemID),
             data = dat_n2,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

##    user  system elapsed 
##   1.236   0.003   1.239

anova(m_n3[[5]],m_n3[[6]])

## Data: dat_n2
## Models:
## m_n3[[5]]: res ~ z.ColFreq_log + z.oqpt * Itemtype_c + (1 + Itemtype_c || subject) + (1 | itemID)
## m_n3[[6]]: res ~ z.ColFreq_log + z.oqpt * Itemtype_c + (1 + Itemtype_c | subject) + (1 | itemID)
##           npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## m_n3[[5]]    8 2893.6 2941.6 -1438.8   2877.6                     
## m_n3[[6]]    9 2893.6 2947.6 -1437.8   2875.6 1.9773  1     0.1597

Mtuschek et al. (2017)では，α _LRT = .10 or .20の基準で有意ならより複雑なモデルを選択することが推奨されているので，相関パラメータがあるモデルを最終モデルとして採用しましょう。

最終モデル

summary(m_n3[[6]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: res ~ z.ColFreq_log + z.oqpt * Itemtype_c + (1 + Itemtype_c |  
##     subject) + (1 | itemID)
##    Data: dat_n2
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##   2893.6   2947.6  -1437.8   2875.6     2967 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3958 -0.4533  0.3137  0.5080  2.8731 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev. Corr 
##  itemID  (Intercept) 1.02945  1.0146        
##  subject (Intercept) 0.66872  0.8178        
##          Itemtype_c  0.07928  0.2816   -0.70
## Number of obs: 2976, groups:  itemID, 93; subject, 32
## 
## Fixed effects:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        1.03799    0.19769   5.251 1.52e-07 ***
## z.ColFreq_log     -0.41004    0.11722  -3.498 0.000469 ***
## z.oqpt             0.04383    0.15471   0.283 0.776968    
## Itemtype_c        -1.15373    0.27299  -4.226 2.38e-05 ***
## z.oqpt:Itemtype_c -0.27957    0.11681  -2.393 0.016691 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.ClF_ z.oqpt Itmty_
## z.ColFrq_lg -0.004                     
## z.oqpt      -0.033  0.000              
## Itemtype_c   0.220  0.027 -0.002       
## z.qpt:Itmt_ -0.004  0.006 -0.179 -0.017

モデルの評価

VIFを確認しておきます。

vif(m_n3[[6]])

##     z.ColFreq_log            z.oqpt        Itemtype_c z.oqpt:Itemtype_c 
##          1.000768          1.033107          1.001034          1.033430

すべて1程度ですので問題なさそうです。

解釈

結果の表を見てみると，固定効果の推定結果の一番下の行にある交互作用項が有意になっています。

交互作用項の図示

plot_model(m_n3[[6]],type = "int")

図を見てみると，赤い線（Baseline）と青い線（Jonly）で傾きが違いそうです。では，実際にどうなのか見てみましょう。

単純主効果

Baseline項目をreference levelにしたダミー変数を作ります。

ifelse(dat_n2$Itemtype == "Baseline", 0, 1) -> dat_n2$Itemtype_d_Base #Baselineをゼロにして基準レベルにします。

新しくリストを作ります。

m_n4<-list()

さきほどのダミー変数を使ってもう一度最終モデルを分析します。

system.time(m_n4[[1]] <-glmer(res ~ z.ColFreq_log+z.oqpt*Itemtype_d_Base+(1+Itemtype_d_Base|subject)+(1|itemID),
             data = dat_n2,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

##    user  system elapsed 
##   0.670   0.002   0.672

summary(m_n4[[1]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## res ~ z.ColFreq_log + z.oqpt * Itemtype_d_Base + (1 + Itemtype_d_Base |  
##     subject) + (1 | itemID)
##    Data: dat_n2
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##   2893.6   2947.6  -1437.8   2875.6     2967 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3958 -0.4533  0.3137  0.5080  2.8731 
## 
## Random effects:
##  Groups  Name            Variance Std.Dev. Corr 
##  itemID  (Intercept)     1.02945  1.0146        
##  subject (Intercept)     0.84975  0.9218        
##          Itemtype_d_Base 0.07927  0.2816   -0.77
## Number of obs: 2976, groups:  itemID, 93; subject, 32
## 
## Fixed effects:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              1.6149     0.2141   7.542 4.63e-14 ***
## z.ColFreq_log           -0.4100     0.1172  -3.498 0.000468 ***
## z.oqpt                   0.1836     0.1749   1.050 0.293718    
## Itemtype_d_Base         -1.1537     0.2730  -4.226 2.37e-05 ***
## z.oqpt:Itemtype_d_Base  -0.2796     0.1168  -2.393 0.016689 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.ClF_ z.oqpt Itm__B
## z.ColFrq_lg -0.020                     
## z.oqpt      -0.028 -0.002              
## Itmtyp_d_Bs -0.435  0.027  0.004       
## z.qpt:It__B  0.007  0.006 -0.492 -0.017

さきほどのモデルと比較して，z.oqptの推定値が変わっていることがわかります。今，Itemtype_d_BaseはBaseline項目がゼロのときですから，Baseline項目についてはz.oqptの推定値が0.184だとわかります。

それでは，Jonly条件のときはどうでしょうか。今度はJonlyを0（reference level）にしたダミー変数を使って回帰モデルを作ってみます。

ifelse(dat_n2$Itemtype == "Jonly", 0, 1) -> dat_n2$Itemtype_d_J #Jonlyをゼロにして基準レベルにします。

system.time(m_n4[[2]] <-glmer(res ~ z.ColFreq_log+z.oqpt*Itemtype_d_J+(1+Itemtype_d_J|subject)+(1|itemID),
             data = dat_n2,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

##    user  system elapsed 
##   1.167   0.003   1.169

結果はいかに。

summary(m_n4[[2]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: res ~ z.ColFreq_log + z.oqpt * Itemtype_d_J + (1 + Itemtype_d_J |  
##     subject) + (1 | itemID)
##    Data: dat_n2
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##   2893.6   2947.6  -1437.8   2875.6     2967 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3958 -0.4533  0.3137  0.5080  2.8731 
## 
## Random effects:
##  Groups  Name         Variance Std.Dev. Corr
##  itemID  (Intercept)  1.02945  1.0146       
##  subject (Intercept)  0.52733  0.7262       
##          Itemtype_d_J 0.07928  0.2816   0.59
## Number of obs: 2976, groups:  itemID, 93; subject, 32
## 
## Fixed effects:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          0.46112    0.26376   1.748 0.080427 .  
## z.ColFreq_log       -0.41004    0.11722  -3.498 0.000468 ***
## z.oqpt              -0.09596    0.15528  -0.618 0.536599    
## Itemtype_d_J         1.15374    0.27299   4.226 2.38e-05 ***
## z.oqpt:Itemtype_d_J  0.27957    0.11681   2.393 0.016691 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.ClF_ z.oqpt Itm__J
## z.ColFrq_lg  0.011                     
## z.oqpt      -0.030  0.002              
## Itemtyp_d_J -0.682 -0.027  0.008       
## z.qpt:It__J  0.012 -0.006 -0.198 -0.017

先ほどと比較すると，z.oqptの推定値の符号が逆になっていることがわかります。しかしながら，傾きは有意ではありません。よって，Jonly条件のときも熟達度によって容認性判断課題の正答率が変わるとは言えないようです。

では交互作用が意味するところはなんなのでしょうか。可能性として，JonlyとBaselineの正答率の差が，熟達度によって変わるということが考えられます。これを調べるために，熟達度テストスコアのある特定の点におけるJonlyとBaselineの差をemmeans()関数を使って調べてみましょう。

emmeans(m_n4[[2]],~z.oqpt*Itemtype_d_J,at=list(Itemtype_d_J=c("Jonly","Baseline"),z.oqpt=c(min(oqpt_sum$z.oqpt):max(oqpt_sum$z.oqpt))))%>% #範囲は熟達度テストのzスコアの最小値から最大値まで
contrast(.,"pairwise",by="z.oqpt")

## z.oqpt = -1.864:
##  contrast         estimate    SE  df z.ratio p.value
##  Jonly - Baseline   -0.633 0.352 Inf  -1.798  0.0722
## 
## z.oqpt = -0.864:
##  contrast         estimate    SE  df z.ratio p.value
##  Jonly - Baseline   -0.912 0.293 Inf  -3.118  0.0018
## 
## z.oqpt =  0.136:
##  contrast         estimate    SE  df z.ratio p.value
##  Jonly - Baseline   -1.192 0.273 Inf  -4.363  <.0001
## 
## z.oqpt =  1.136:
##  contrast         estimate    SE  df z.ratio p.value
##  Jonly - Baseline   -1.471 0.302 Inf  -4.879  <.0001
## 
## z.oqpt =  2.136:
##  contrast         estimate    SE  df z.ratio p.value
##  Jonly - Baseline   -1.751 0.367 Inf  -4.774  <.0001
## 
## Results are given on the log odds ratio (not the response) scale.

解釈

z.oqpt = -1.864のとき（熟達度が最も低いとき），2水準の差は非有意になっています。
しかしながら，熟達度が高くなれば，JonlyとBaselineの差は統計的に有意になっています。
この推定値は，Baselineを0としたダミー変数を使ったモデルでは係数の符号が逆になりますが，値は同じです（下記参照）。

emmeans(m_n4[[1]],~z.oqpt*Itemtype_d_Base,at=list(Itemtype_d_Base=c("Baseline","Jonly"),z.oqpt=c(min(oqpt_sum$z.oqpt):max(oqpt_sum$z.oqpt))))%>% #範囲は熟達度テストのzスコアの最小値から最大値まで
contrast(.,"pairwise",by="z.oqpt")

## z.oqpt = -1.864:
##  contrast         estimate    SE  df z.ratio p.value
##  Baseline - Jonly    0.633 0.352 Inf   1.798  0.0722
## 
## z.oqpt = -0.864:
##  contrast         estimate    SE  df z.ratio p.value
##  Baseline - Jonly    0.912 0.293 Inf   3.118  0.0018
## 
## z.oqpt =  0.136:
##  contrast         estimate    SE  df z.ratio p.value
##  Baseline - Jonly    1.192 0.273 Inf   4.363  <.0001
## 
## z.oqpt =  1.136:
##  contrast         estimate    SE  df z.ratio p.value
##  Baseline - Jonly    1.471 0.302 Inf   4.880  <.0001
## 
## z.oqpt =  2.136:
##  contrast         estimate    SE  df z.ratio p.value
##  Baseline - Jonly    1.751 0.367 Inf   4.774  <.0001
## 
## Results are given on the log odds ratio (not the response) scale.

zスコア化したOQPTスコアは素点ではどのくらいでしょうか？

oqpt_sum %>% 
  arrange(oqpt) %>% 
  head

##   oqpt    z.oqpt subject
## 1   26 -1.863628      33
## 2   28 -1.549948       5
## 3   28 -1.549948      26
## 4   31 -1.079428      13
## 5   31 -1.079428      16
## 6   31 -1.079428      27

見てみると，26点です。これはCEFRでいうとA2レベルくらいです。A2レベルくらいだと，Baselineに対して「これは英語として自然ではない」と考えるのとJonlyに対して「これは英語として自然ではない」と答えるのが同じくらいの確率だということになります。

ただし，26点の学習者はたった1名であり，_p_値も有意水準の5%に近いので，この結果を強く主張することには慎重になるべきでしょう。

別の方法（余談）

今回は時間の関係でemmeans()関数を使いましたが，GLMMモデルの中で熟達度の影響を見ることもできます
z.oqpt変数の任意のポイントを基準点の0(reference-level)にしてモデルに投入し，その時のもう一方の変数（今回の場合はItemtype_d_J）の係数を見ます
0, 25, 50, 75, 100パーセンタイルを基準点にすることを考えてみます

dat_n2 %>% 
  dplyr::distinct(z.oqpt) %>% 
  pull(z.oqpt) %>% #1列だけ取ってくる関数です
  quantile(., probs = c(0,0.25, 0.5, 0.75,1.00))->percentiles #パーセンタイルを取得してpercentilesという変数に入れる
print(percentiles)

##          0%         25%         50%         75%        100% 
## -1.86362839 -0.72653830  0.01845177  0.92028184  2.37105197

0パーセンタイル（最小値）の時を例にします。

dat_n2$p0 <- dat_n2$z.oqpt-percentiles[1] 
dat_n2 %>% 
  filter(p0 == 0) %>% 
  head()

##   oqpt    z.oqpt subject itemID pres.order               Item Itemtype Length
## 1   26 -1.863628      33     47         73 active personality    Jonly     18
## 2   26 -1.863628      33     48        105          cheap pay    Jonly      9
## 3   26 -1.863628      33     49         16       dark society    Jonly     12
## 4   26 -1.863628      33     50        121   delicious season    Jonly     16
## 5   26 -1.863628      33     51         52            far eye    Jonly      7
## 6   26 -1.863628      33     52        122     frank question    Jonly     14
##   ColFreq AdjFreq NounFreq MIScore answer res   RT ColFreq_log AdjFreq_log
## 1       1   37799    18372       1  FALSE   0 1401     0.00000   10.540038
## 2       1   16483   111661       1  FALSE   0 1794     0.00000    9.710085
## 3       1   84962    94963       1  FALSE   0 1777     0.00000   11.349959
## 4       1    7916   114909       1  FALSE   1 1162     0.00000    8.976641
## 5       1  191270    59867       1  FALSE   1  801     0.00000   12.161441
## 6       7   36780   178198       1  FALSE   0 1169     1.94591   10.512709
##   NounFreq_log MIScore_log order           JP rating RaterA RaterB Agreement
## 1     9.818583           0     3 積極的な性格      1     NA     NA        NA
## 2    11.623223           0     6     安い料金      1     NA     NA        NA
## 3    11.461243           0    54       闇社会      1     NA     NA        NA
## 4    11.651896           0     7 美味しい季節      1     NA     NA        NA
## 5    10.999881           0    45       遠い目      3     NA     NA        NA
## 6    12.090651           0    19     軽い質問      3     NA     NA        NA
##   Final z.ColFreq_log z.AdjFreq_log z.NounFreq_log z.MIScore_log   z.rating
## 1    NA    -0.5850149    -0.1677051    -0.97279932           NaN -1.2047577
## 2    NA    -0.5850149    -1.0319593     0.55688039           NaN -1.2047577
## 3    NA    -0.5850149     0.6756896     0.41957999           NaN -1.2047577
## 4    NA    -0.5850149    -1.7957153     0.58118470           NaN -1.2047577
## 5    NA    -0.5850149     1.5207093     0.02851249           NaN  0.6561062
## 6    NA     1.7308412    -0.1961630     0.95308956           NaN  0.6561062
##   Itemtype_c Itemtype_d_Base Itemtype_d_J p0
## 1        0.5               1            0  0
## 2        0.5               1            0  0
## 3        0.5               1            0  0
## 4        0.5               1            0  0
## 5        0.5               1            0  0
## 6        0.5               1            0  0

こうすると，p0という変数はz.oqptが-1.8636284のときに0となっていることがわかります。つまり，p0という変数は，z.oqptが最小値のときがreference levelになります。よって，z.oqptのかわりにp0という変数を入れて次のようにモデルを作ったときのもう一方の変数（Itemtype_d_J）の係数を見る（頑健にいくなら95%CIを計算して信頼区間がゼロを含むかを確認）ことで，熟達度が最小値のときのもう一方の変数の影響を検討できます。

m_n5<-list()
system.time(m_n5[[1]] <-glmer(res ~ z.ColFreq_log+p0*Itemtype_d_J+(1+Itemtype_d_J|subject)+(1|itemID),
             data = dat_n2,
             family=binomial,
             glmerControl(optimizer=c("bobyqa"),optCtrl = list(maxfun=2e5))))

##    user  system elapsed 
##   1.113   0.003   1.116

summary(m_n5[[1]])

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: res ~ z.ColFreq_log + p0 * Itemtype_d_J + (1 + Itemtype_d_J |  
##     subject) + (1 | itemID)
##    Data: dat_n2
## Control: glmerControl(optimizer = c("bobyqa"), optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##   2893.6   2947.6  -1437.8   2875.6     2967 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3958 -0.4533  0.3137  0.5080  2.8731 
## 
## Random effects:
##  Groups  Name         Variance Std.Dev. Corr
##  itemID  (Intercept)  1.02946  1.0146       
##  subject (Intercept)  0.52733  0.7262       
##          Itemtype_d_J 0.07928  0.2816   0.59
## Number of obs: 2976, groups:  itemID, 93; subject, 32
## 
## Fixed effects:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      0.63995    0.39742   1.610 0.107344    
## z.ColFreq_log   -0.41004    0.11722  -3.498 0.000468 ***
## p0              -0.09596    0.15528  -0.618 0.536593    
## Itemtype_d_J     0.63272    0.35196   1.798 0.072220 .  
## p0:Itemtype_d_J  0.27957    0.11680   2.393 0.016689 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) z.ClF_ p0     Itm__J
## z.ColFrq_lg  0.006                     
## p0          -0.748  0.002              
## Itemtyp_d_J -0.450 -0.017  0.129       
## p0:Itmty__J  0.152 -0.006 -0.198 -0.631

上でやったemmeans関数の出力結果と見比べてみてください。推定値（Estimate)と有意確率が同じになっていると思います。これを同じことを，25, 50, 75, 100パーセンタイルでもやれば，emmeansと同じことができます（どのポイントで推定するかはemmeansでもこの方法でも任意に選べます）。

結果の報告

分析の結果を報告する際には，下記を報告しましょう。
- 推定値（Estimate）
- 標準誤差（SE）
- 統計量（_z_値または_t_値）
- _p_値
- 変量効果の標準偏差
- データポイント数
- モデル式
- 推定値の95%信頼区間 -（R二乗）

95%信頼区間の計算

95%信頼区間の計算の最も簡単な方法は，confint()関数を使うことです
lme4パッケージがロードされていたら，confint()とするとconfint.merMod()が使われます
変量効果の信頼区間も計算すると，かなり時間がかかります
1つの固定効果の推定値の信頼区間を求めるだけでも，データセットの大きさによっては数時間以上かかることもあります（推定方法によって早くできるものもありますが，基本的には早い＝不正確）

おわりに

ロジスティック回帰については，下記のテクニカルレポートと，それに基づくWS資料もあります。合わせてお読みいただくと理解が深まるかと思います
- 田村祐（2016）「外国語教育研究における二値データの分析－ロジスティック回帰を例に－」『外国語教育メディア学会中部支部外国語教育基礎研究部会2015年度報告論集』29–82.リンク
- 田村祐 (2019). 「統計ワークショップ」JACET英語語彙・英語辞書・リーディング研究会合同研究会. 早稲田大学.資料
今回はわりとデータ分析のかなり泥臭い部分満載でしたが，これを補う資料が下記の資料です
一般化線形混合モデルの実践：気をつけたい3つのポイント
論文での報告など，今回扱いきれなかった部分は上記資料と次ページの参考文献を御覧ください

参考文献リスト

Barr, D. J., Levy, R., Scheepers, C., and Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of memory and language, 68(3), 255–278.
Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious Mixed Models. https://arxiv.org/abs/1506.04967v2
Brauer, M., & Curtin, J. J. (2018). Linear mixed-effects models and the analysis of nonindependent data: A unified framework to analyze categorical and continuous independent variables that vary within-subjects and/or within-items. Psychological Methods, 23(3), 389–411. https://doi.org/10.1037/met0000159
Brysbaert, M., & Stevens, M. (2018). Power Analysis and Effect Size in Mixed Effects Models: A Tutorial. Journal of Cognition, 1(1), 9. https://doi.org/10.5334/joc.10
Burnham, K. P., & Anderson, D. R. (2004). Multimodel Inference: Understanding AIC and BIC in Model Selection. Sociological Methods & Research, 33(2), 261–304. https://doi.org/10.1177/0049124104268644
Frossard, J., & Renaud, O. (2019). Choosing the correlation structure of mixed effect models for experiments with stimuli. https://arxiv.org/abs/1903.10766v3
Gries, S. T. (2021). (Generalized Linear) Mixed-Effects Modeling: A Learner Corpus Example. Language Learning, 71(3), 757–798. https://doi.org/10.1111/lang.12448
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315. https://doi.org/10.1016/j.jml.2017.01.001
Meteyard, L., & Davies, R. A. I. (2020). Best practice guidance for linear mixed-effects models in psychological science. Journal of Memory and Language, 112, 104092. https://doi.org/10.1016/j.jml.2020.104092
Murakami, A. (2016). Modeling Systematicity and Individuality in Nonlinear Second Language Development: The Case of English Grammatical Morphemes: Modeling Individual Nonlinear Development. Language Learning, 66(4), 834–871. https://doi.org/10.1111/lang.12166
RPubs—Reduction of Complexity of Linear Mixed Models with Double-Bar Syntax. (n.d.). Retrieved November 3, 2021, from https://rpubs.com/Reinhold/22193
RPubs—The Correlation Parameter in the Random Effects of Mixed Effects Models. (n.d.). Retrieved November 3, 2021, from https://rpubs.com/yjunechoe/correlationsLMEM
Schad, D. J., Vasishth, S., Hohenstein, S., & Kliegl, R. (2020). How to capitalize on a priori contrasts in linear (mixed) models: A tutorial. Journal of Memory and Language, 110, 104038. https://doi.org/10.1016/j.jml.2019.104038
Scherbaum, C. A., & Ferreter, J. M. (2009). Estimating Statistical Power and Required Sample Sizes for Organizational Research Using Multilevel Modeling. Organizational Research Methods, 12(2), 347–367. https://doi.org/10.1177/1094428107308906
Should we fit maximal linear mixed models? | R-bloggers. (2014, November 25). https://www.r-bloggers.com/2014/11/should-we-fit-maximal-linear-mixed-models/
Tamura, Y., Fukuta, J., Nishimura, Y., Harada, Y., Hara, K., & Kato, D. (2019). Japanese EFL learners’ sentence processing of conceptual plurality: An analysis focusing on reciprocal verbs. Applied Psycholinguistics,41, 59–91. https://doi.org/10.1017/S0142716418000450
新井学, & Roland D. (2016). 「言語理解研究における眼球運動データ及び読み時間データの統計分析」. 統計数理, 64(2), 201–231.
井関龍太. (2020). 「心理学者は反応時間をどう分析するか」. 基礎心理学研究, 38(2), 243–249.
久保拓弥 (2012). 『データ解析のための統計モデリング入門』東京: 岩波書店.
清水裕士 (2014).『個人と集団のマルチレベル分析』京都：ナカニシヤ出版.
田中豊・森川敏彦・山中竹春・冨田誠 (2008) 『一般化線形モデル入門原著第2版』共立出版.
田村祐（2016）「外国語教育研究における二値データの分析－ロジスティック回帰を例に－」『外国語教育メディア学会中部支部外国語教育基礎研究部会2015年度報告論集』29–82.リンク
山口剛. (2018). 【心理学系】混合効果モデルの利用: 刺激の変量効果への対処 (2). 日本バイオフィードバック学会. https://doi.org/10.20595/jjbf.45.2_93