Nguyen D, Rosseel L, Grieve J. On learning and representing social meaning in NLP: a sociolinguistic perspective. In: Toutanova K, Rumshisky A, Zettlemoyer L, Hakkani-Tur D, Beltagy I, Bethard S, Cotterell R, Chakraborty T, Zhou Y, editors. Proceedings of the 2021 conference of the North American chapter of the association for computational linguistics: human language technologies. Association for Computational Linguistics; 2021. pp. 603–12. https://doi.org/10.18653/v1/2021.naacl-main.50. https://aclanthology.org/2021.naacl-main.50.
Plank B, Jensen KN, van der Goot R. DaN+: Danish nested named entities and lexical normalization. In: Scott D, Bel N, Zong C, editors. Proceedings of the 28th international conference on computational linguistics. Barcelona, Spain (Online): International Committee on Computational Linguistics; 2020. pp. 6649–62. https://doi.org/10.18653/v1/2020.coling-main.583. https://aclanthology.org/2020.coling-main.583.
Zupan K, Ljubešić N, Erjavec T. How to tag non-standard language: normalisation versus domain adaptation for Slovene historical and user-generated texts. Nat Lang Eng. 2019;25(5):651–74. https://doi.org/10.1017/S1351324919000366.
van der Goot R, van Noord G. Parser adaptation for social media by integrating normalization. In: Barzilay R, Kan M-Y, editors. Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 2: Short Papers). Vancouver, Canada: Association for Computational Linguistics; 2017. pp. 491–7. https://doi.org/10.18653/v1/P17-2078. https://aclanthology.org/P17-2078.
Sidarenka U. Sentiment analysis of German Twitter. PhD thesis; 2019. https://doi.org/10.25932/PUBLISHUP-43742. https://publishup.uni-potsdam.de/43742.
Bhat I, Bhat RA, Shrivastava M, Sharma D. Universal dependency parsing for Hindi-English code-switching. In: Walker M, Ji H, Stent A, editors. Proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: human language technologies, Volume 1 (Long Papers). New Orleans, Louisiana: Association for Computational Linguistics; 2018. pp. 987–98. https://doi.org/10.18653/v1/N18-1090. https://aclanthology.org/N18-1090.
Karamanolakis G, Mukherjee S, Zheng G, Awadallah AH. Self-training with weak supervision. In: Toutanova K, Rumshisky A, Zettlemoyer L, Hakkani-Tur D, Beltagy I, Bethard S, Cotterell R, Chakraborty T, Zhou Y, editors. Proceedings of the 2021 conference of the North American chapter of the association for computational linguistics: human language technologies. Association for Computational Linguistics, Online; 2021. pp. 845–63. https://doi.org/10.18653/v1/2021.naacl-main.66. https://aclanthology.org/2021.naacl-main.66.
van der Goot R, Ramponi A, Zubiaga A, Plank B, Muller B, San Vicente Roncal I, Ljubešić N, Çetinoğlu Ö, Mahendra R, Çolakoğlu T, Baldwin T, Caselli T, Sidorenko W. MultiLexNorm: a shared task on multilingual lexical normalization. In: Xu W, Ritter A, Baldwin T, Rahimi A, editors. Proceedings of the seventh workshop on noisy user-generated text (W-NUT 2021). Association for Computational Linguistics, Online; 2021. pp. 493–509. https://doi.org/10.18653/v1/2021.wnut-1.55. https://aclanthology.org/2021.wnut-1.55.
Baldwin T, de Marneffe MC, Han B, Kim Y-B, Ritter A, Xu W. Shared tasks of the 2015 workshop on noisy user-generated text: Twitter lexical normalization and named entity recognition. In: Xu W, Han B, Ritter A, editors. Proceedings of the workshop on noisy user-generated text. Association for Computational Linguistics, Beijing, China; 2015. pp. 126–35. https://doi.org/10.18653/v1/W15-4319. https://aclanthology.org/W15-4319.
Jin N. NCSU-SAS-ning: candidate generation and feature engineering for supervised lexical normalization. In: Xu W, Han B, Ritter A, editors. Proceedings of the workshop on noisy user-generated text. Association for Computational Linguistics, Beijing, China; 2015. pp. 87–92. https://doi.org/10.18653/v1/W15-4313. https://aclanthology.org/W15-4313.
Akhtar MS, Sikdar UK, Ekbal A. IITP: hybrid approach for text normalization in Twitter. In: Xu W, Han B, Ritter A, editors. Proceedings of the workshop on noisy user-generated text. Association for Computational Linguistics, Beijing, China; 2015. pp. 106–10. https://doi.org/10.18653/v1/W15-4316. https://aclanthology.org/W15-4316.
Supranovich D, Patsepnia V. IHS_RD: lexical normalization for English tweets. In: Xu W, Han B, Ritter A, editors. Proceedings of the workshop on noisy user-generated text. Association for Computational Linguistics, Beijing, China; 2015. pp. 78–81. https://doi.org/10.18653/v1/W15-4311. https://aclanthology.org/W15-4311.
Min W, Mott B. NCSU_SAS_WOOKHEE: a deep contextual long-short term memory model for text normalization. In: Xu W, Han B, Ritter A, editors. Proceedings of the workshop on noisy user-generated text. Association for Computational Linguistics, Beijing, China; 2015. pp. 111–9. https://doi.org/10.18653/v1/W15-4317. https://aclanthology.org/W15-4317.
Wagner J, Foster J. DCU-ADAPT: learning edit operations for microblog normalisation with the generalised perceptron. In: Xu W, Han B, Ritter A, editors. Proceedings of the workshop on noisy user-generated text. Association for Computational Linguistics, Beijing, China; 2015. pp. 93–8. https://doi.org/10.18653/v1/W15-4314. https://aclanthology.org/W15-4314.
van der Goot R. MoNoise: a multi-lingual and easy-to-use lexical normalization tool. In: Costa-jussà MR, Alfonseca E, editors. Proceedings of the 57th annual meeting of the association for computational linguistics: system demonstrations. Association for Computational Linguistics, Florence, Italy; 2019. pp. 201–6. https://doi.org/10.18653/v1/P19-3032. https://aclanthology.org/P19-3032.
Muller B, Sagot B, Seddah D. Enhancing BERT for lexical normalization. In: Xu W, Ritter A, Baldwin T, Rahimi A, editors. Proceedings of the 5th workshop on noisy user-generated text (W-NUT 2019). Association for Computational Linguistics, Hong Kong, China; 2019. pp. 297–306. https://doi.org/10.18653/v1/D19-5539. https://aclanthology.org/D19-5539.
Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T, editors. Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota; 2019. pp. 4171–86. https://doi.org/10.18653/v1/N19-1423. https://aclanthology.org/N19-1423.
Bucur A-M, Cosma A, Dinu LP. Sequence-to-sequence lexical normalization with multilingual transformers. In: Xu W, Ritter A, Baldwin T, Rahimi A, editors. Proceedings of the seventh workshop on noisy user-generated text (W-NUT 2021). Association for Computational Linguistics, Online; 2021. pp. 473–82. https://doi.org/10.18653/v1/2021.wnut-1.53. https://aclanthology.org/2021.wnut-1.53.
Tang Y, Tran C, Li X, Chen P-J, Goyal N, Chaudhary V, Gu J, Fan A. Multilingual translation from denoising pre-training. In: Zong C, Xia F, Li W, Navigli R, editors. Findings of the association for computational linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, Online; 2021. pp. 3450–66. https://doi.org/10.18653/v1/2021.findings-acl.304. https://aclanthology.org/2021.findings-acl.304.
Nguyen VH, Nguyen HT, Snasel V. Text normalization for named entity recognition in Vietnamese tweets. Comput Soc Netw. 2016;3(1):10. https://doi.org/10.1186/s40649-016-0032-0.
Trang NTT, Bach DX, Tung NX. A hybrid method for Vietnamese text normalization. In: Proceedings of the 2019 3rd International conference on natural language processing and information retrieval. NLPIR ’19. Association for Computing Machinery, New York, NY, USA; 2019. pp. 104–9. https://doi.org/10.1145/3342827.3342851.
Dang H-T, Vuong T-H-Y, Phan X-H. Non-standard Vietnamese word detection and normalization for text-to-speech. In: 2022 14th international conference on Knowledge and Systems Engineering (KSE); 2022. pp. 1–6. https://doi.org/10.1109/KSE56063.2022.9953791.
Nguyen AT-H, Nguyen DH, Nguyen T-N, Ho KT-D, Nguyen KV. Automatic textual normalization for hate speech detection. In: Abraham A, Bajaj A, Hanne T, Siarry P, editors. Intelligent systems design and applications. Springer, Cham; 2024. vol. 4.
Nguyen T-N, Le T-P, Nguyen K. ViLexNorm: a lexical normalization corpus for Vietnamese social media text. In: Graham Y, Purver M, editors. Proceedings of the 18th conference of the European chapter of the association for computational linguistics (Volume 1: Long Papers). Association for Computational Linguistics, St. Julian’s, Malta; 2024. pp. 1421–37. https://aclanthology.org/2024.eacl-long.85.
Ren P, Xiao Y, Chang X, Huang P-Y, Li Z, Gupta BB, Chen X, Wang X. A survey of deep active learning. ACM Comput Surv. 2021;54(9). https://doi.org/10.1145/3472291.
Xie Q, Dai Z, Hovy E, Luong M-T, Le QV. Unsupervised data augmentation for consistency training. In: Proceedings of the 34th international conference on neural information processing systems. NIPS ’20. Curran Associates Inc., Red Hook, NY, USA; 2020.
Wilson G, Cook DJ. A survey of unsupervised deep domain adaptation. ACM Trans Intell Syst Technol. 2020;11(5). https://doi.org/10.1145/3400066.
Hoffmann R, Zhang C, Ling X, Zettlemoyer L, Weld DS. Knowledge-based weak supervision for information extraction of overlapping relations. In: Lin D, Matsumoto Y, Mihalcea R, editors. Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies. Association for Computational Linguistics, Portland, Oregon, USA; 2011. pp. 541–50. https://aclanthology.org/P11-1055.
Yuen M-C, King I, Leung K-S. A survey of crowdsourcing systems. In: 2011 IEEE Third international conference on privacy, security, risk and trust and 2011 IEEE third international conference on social computing; 2011. pp. 766–73. https://doi.org/10.1109/PASSAT/SocialCom.2011.203.
Awasthi A, Ghosh S, Goyal R, Sarawagi S. Learning from rules generalizing labeled exemplars. In: International conference on learning representations; 2020. https://openreview.net/forum?id=SkeuexBtDr.
Mann GS, McCallum A. Generalized expectation criteria for semi-supervised learning with weakly labeled data. J Mach Learn Res. 2010;11(32):955–84.
Ratner A, Bach SH, Ehrenberg H, Fries J, Wu S, Ré C. Snorkel: rapid training data creation with weak supervision. Proc VLDB Endow. 2017;11(3):269–82. https://doi.org/10.14778/3157794.3157797.
Ratner A, Sa CD, Wu S, Selsam D, Ré C. Data programming: creating large training sets, quickly. In: Proceedings of the 30th international conference on neural information processing systems. NIPS’16. Curran Associates Inc., Red Hook, NY, USA; 2016. pp. 3574–82.
Ren W, Li Y, Su H, Kartchner D, Mitchell C, Zhang C. Denoising multi-source weak supervision for neural text classification. In: Cohn T, He Y, Liu Y, editors. Findings of the association for computational linguistics: EMNLP 2020. Association for Computational Linguistics, Online; 2020. pp. 3739–54. https://doi.org/10.18653/v1/2020.findings-emnlp.334. https://aclanthology.org/2020.findings-emnlp.334.
Mallinar N, Shah A, Ho TK, Ugrani R, Gupta A. Iterative data programming for expanding text classification corpora. Proc AAAI Conf Artif Intell. 2020;34(08):13332–7. https://doi.org/10.1609/aaai.v34i08.7045.
Zhang J, Yu Y, Li Y, Wang Y, Yang Y, Yang M, Ratner A. WRENCH: a comprehensive benchmark for weak supervision. In: Thirty-fifth conference on neural information processing systems datasets and benchmarks track; 2021. https://openreview.net/forum?id=Q9SKS5k8io.
Nguyen N, Phan T, Nguyen D-V, Nguyen K. ViSoBERT: a pre-trained language model for Vietnamese social media text processing. In: Bouamor H, Pino J, Bali K, editors. Proceedings of the 2023 conference on empirical methods in natural language processing. Association for Computational Linguistics, Singapore; 2023. pp. 5191–207. https://doi.org/10.18653/v1/2023.emnlp-main.315. https://aclanthology.org/2023.emnlp-main.315.
Nguyen DQ, Tuan Nguyen A. PhoBERT: pre-trained language models for Vietnamese. In: Cohn T, He Y, Liu Y, editors. Findings of the association for computational linguistics: EMNLP 2020. Association for Computational Linguistics, Online; 2020. pp. 1037–42. https://doi.org/10.18653/v1/2020.findings-emnlp.92. https://aclanthology.org/2020.findings-emnlp.92.
Tran NL, Le DM, Nguyen DQ. BARTpho: pre-trained sequence-to-sequence models for Vietnamese. CoRR. 2021;abs/2109.09701.
Luu ST, Nguyen KV, Nguyen NL-T. A large-scale dataset for hate speech detection on Vietnamese social media texts. In: Fujita H, Selamat A, Lin JC-W, Ali M, editors. Advances and trends in artificial intelligence. Artificial Intelligence Practices. Springer, Cham; 2021. pp. 415–26.
Ho VA, Nguyen DH-C, Nguyen DH, Pham LT-V, Nguyen D-V, Nguyen KV, Nguyen NL-T. Emotion recognition for Vietnamese social media text. In: Nguyen L-M, Phan X-H, Hasida K, Tojo S, editors. Computational linguistics. Springer, Singapore; 2020. pp. 319–33.
Hoang PG, Luu CD, Tran KQ, Nguyen KV, Nguyen NL-T. ViHOS: hate speech spans detection for Vietnamese. In: Vlachos A, Augenstein I, editors. Proceedings of the 17th conference of the European chapter of the association for computational linguistics. Association for Computational Linguistics, Dubrovnik, Croatia; 2023. pp. 652–69. https://doi.org/10.18653/v1/2023.eacl-main.47. https://aclanthology.org/2023.eacl-main.47.
Van Dinh C, Luu ST, Nguyen AG-T. Detecting spam reviews on Vietnamese e-commerce websites. In: Nguyen NT, Tran TK, Tukayev U, Hong T-P, Trawiński B, Szczerbicki E, editors. Intelligent information and database systems. Springer, Cham; 2022. pp. 595–607.
Luc Phan L, Huynh Pham P, Thi-Thanh Nguyen K, Khai Huynh S, Thi Nguyen T, Thanh Nguyen L, Van Huynh T, Van Nguyen K. Sa2sl: from aspect-based sentiment analysis to social listening system for business intelligence. In: Qiu H, Zhang C, Fei Z, Qiu M, Kung S-Y, editors. Knowledge science, engineering and management. Springer, Cham; 2021. pp. 647–58.
Ayetiran EF, Özgöbek Ö. An inter-modal attention-based deep learning framework using unified modality for multimodal fake news, hate speech and offensive language detection. Inf Syst. 2024;123:102378. https://doi.org/10.1016/j.is.2024.102378.
Ayetiran EF. Attention-based aspect sentiment classification using enhanced learning through CNN-BiLSTM networks. Knowl-Based Syst. 2022;252:109409. https://doi.org/10.1016/j.knosys.2022.109409.