Innodata has released an open-source LLM Evaluation Toolkit, together with a repository of 14 semi-synthetic and human-crafted evaluation datasets, that enterprises can utilize for evaluating the safety of their Large Language Models in the context of enterprise tasks. Using the toolkit and the datasets, data scientists can automatically test the safety of underlying LLMs across multiple harm categories simultaneously. By identifying the precise input conditions that generate problematic outputs, developers can understand how their AI systems respond to a variety of prompts and can identify remedial fine-tuning required to align the systems to the desired outcomes. Innodata encourages enterprise LLM developers to begin utilizing the toolkit and the published data sets as-is. Innodata expects a commercial version of the toolkit and more extensive, continually-updated benchmarking datasets to become available later this year.
Published first on TheFly – the ultimate source for real-time, market-moving breaking financial news. Try Now>>
Read More on INOD:
- Innodata awarded three LLM development programs by Big Tech customer
- Class Action Lawsuit against Innodata, Inc. (NASDAQ:INOD)
- Innodata launches Intelligent Insights feature
- Australian Stocks: Appen (APX) Shares Plummet on Failed Takeover
- Innodata Withdraws Appen Acquisition Bid, Affirms Projections