AI Developments: Report #2
- Johnny Shollaj
- Technology, Data
- 21 Aug, 2023
Another week where we noticed an exponential development in LLMs and more efforts to moderate answers. Progress has been noted also in multimodal and recommender system development, while more AI startups are receiving massive hype.
What is a liquid neural network, really?
Main Highlight: Liquid neural networks, a concept developed by Ramin Hasani and researchers at MIT, offer a flexible and adaptable approach to machine learning. These networks can scale down to have fewer but richer nodes, making them more efficient and less computationally expensive.
Applications of these systems in robotics show potential for real-world use, including using Raspberry Pi for complex reasoning, reducing the number of neurons for faster solutions, and creating more transparent decision-making processes. These networks require time series data and could help in safety-critical systems by avoiding unnecessary mistakes.
Brian Heater | Tech Crunch
Publication Date: August 18, 2023
Artificial intelligence for augmentation and productivity
Main Highlight: The MIT Schwarzman College of Computing has awarded seed grants to seven interdisciplinary projects focusing on AI-augmented management. Funded by Andrew W. Houston and Dropbox Inc., these projects aim to leverage artificial intelligence and human-computer interaction to enhance modern workspaces, achieving better management and higher productivity.
The selected projects span a wide range of applications, including memory prosthetics, social scenario simulation, healthcare improvement, and democratizing programming, demonstrating the potential breadth of impact on various sectors of society and the economy.
Schwarzman College of Computing | MIT
Publication Date: August 18, 2023
Reallusion Elevates Character Animation Workflows With Two-Way Live Sync and OpenUSD Support
Main Highlight: Reallusion has updated its iClone Omniverse Connector, enhancing character animation workflows by offering real-time previews, a bidirectional workflow with Omniverse, and improved support for OpenUSD.
The update facilitates seamless collaboration and expands creative possibilities by including real-time synchronization of projects and enhanced import functionality. These features are designed to make work between iClone and Omniverse quicker and more efficient, and come along with additional bug fixes and improvements.
DANE JOHNSTON | NVIDIA
Publication Date: August 16, 2023
OpenAI acquires Global Illumination
Main Highlight: OpenAI has acquired the team at Global Illumination, a company known for leveraging AI in creative tools and digital experiences. The team, who previously contributed to major companies like Instagram, Facebook, YouTube, Google, Pixar, and Riot Games, has joined OpenAI to work on core products including ChatGPT.
OpenAI | Various Authors
Publication Date: August 16, 2023
Using GPT-4 for content moderation
Main Highlight: GPT-4 is being used for content policy development and moderation, significantly speeding up the iteration on policy changes from months to hours and allowing more consistent labeling.
By understanding and interpreting rules and nuances in content policies, GPT-4 is able to adapt instantly to updates, reducing reliance on human moderators.
OpenAI | Various Authors
Publication Date: August 16, 2023
Research Blogs
Autonomous visual information seeking with large language models
Main Highlight: Google Research has introduced "AVIS: Autonomous Visual Information Seeking with Large Language Models", a novel method that elevates the performance of large language models (LLMs) on visual information seeking tasks.
Utilizing three types of tools - computer vision tools, web search tools, and image search tools - AVIS uses an LLM-powered planner and reasoner to extract and analyze information from various sources.
The method achieved state-of-the-art results on visual information seeking datasets, showing a significant improvement in accuracy and efficiency by incorporating human decision-making data and using a dynamic, structured approach to decision-making.
Ziniu Hu, Alireza Fathi, | Google
Publication Date: August 18, 2023
Consistent Collaborative Filtering via Tensor Decomposition
Main Highlight: A new model called Sliced Anti-symmetric Decomposition (SAD) has been developed by Apple, for collaborative filtering in recommendation systems, based on implicit feedback.
Unlike traditional methods that estimate latent vectors for users and items, SAD introduces an additional latent vector for each item using a novel three-way tensor view. This innovation extends user-item preferences, suggesting that users may have nonlinear mental models when evaluating items.
The model has demonstrated efficiency in both simulated and real-world datasets, outperforming seven state-of-the-art collaborative filtering models in consistency and accuracy of personalized preferences.
Shiwen Zhao, Charles Crissman, Guillermo R Sapiro | Apple
Publication Date: August 16, 2023
Which GPU(s) to Get for Deep Learning
Main Highlight: Tim Dettmers provides comprehensive advice and insights on selecting the best GPU for deep learning, emphasizing the importance of GPU RAM, cores, tensor cores, and caches.
The blog post is structured to guide readers through the details of GPU functionality, including comparisons between CPU and GPU, the unique features of NVIDIA's RTX 40 Ampere series, and practical recommendations for various scenarios.
The guide also includes a Q&A section to tackle common misconceptions and address specific questions like cloud vs desktop considerations, cooling strategies, and the comparison between AMD and NVIDIA.
Tim Dettmers
Updated Publication Date: August 8, 2023
Scientific discovery in the age of artificial intelligence
Main Highlight: The integration of AI into scientific discovery has led to breakthroughs such as self-supervised learning and geometric deep learning, augmenting traditional research methods.
These advancements enable the creation of new designs and help scientists throughout the research process, although challenges remain in data quality and the understanding of AI's capabilities and limitations.
Wang, Fu, Du, Huang, Deac, Gao, Liu, Bengio, et.al
Publication Date: August 2, 2023
Research Papers
Scaling Laws for Generative Mixed-Modal Language Models
Main Highlight: The study was conducted to explore the scaling properties of mixed-modal generative language models, examining the interaction between different modalities like text, speech, images, and code.
The research, including over 250 experiments with seven modalities and varying model sizes, has identified new mixed-modal scaling laws that capture both individual modalities and their interactions, predicting competition and synergy.
The findings provide insights into the design and training of mixed-modal models, including guidelines for hyperparameter selection, and will likely advance the development of unified models handling multiple modalities simultaneously.
Aghajanyan, Yu, Conneau, Hsu
Publication Date: 10 Jan , 2023
Sigmoid Loss for Language Image Pre-Training
Main Highlight:This new sigmoid loss method performs better, particularly at smaller batch sizes, and allows for further scaling up of the batch size, without requiring additional resources. With this method, the researchers were able to achieve up to 84.5% ImageNet zero-shot accuracy within two days, making it a promising advancement for image-text pre-training.
Zhai, Mustafa, Kolesnikov, Beyer
Publication Date: 4 May , 2023
“Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Main Highlight:The paper focuses on the misuse of Large Language Models (LLMs) and the emergence of "jailbreak prompts," which are crafted to bypass safeguards and elicit harmful content from LLMs.
Through an extensive study of 6,387 prompts, it identifies characteristics and major attack strategies of jailbreak prompts, such as prompt injection and privilege escalation. The paper also highlights that current LLMs and safeguards are not adequately defending against these threats, with some jailbreak prompts achieving a 0.99 attack success rate on models like ChatGPT and GPT-4, and stresses the need for stronger defense mechanisms.
Shen, Chen, Backes, Zhang
Publication Date: 7 Aug , 2023
Platypus: Quick, Cheap, and Powerful Refinement of LLMs
Main Highlight:Platypus, a family of fine-tuned and merged Large Language Models (LLMs), has achieved the strongest performance and tops HuggingFace's Open LLM Leaderboard.
The project introduces Open-Platypus, a curated dataset released to the public, and describes the process of fine-tuning and merging LoRA modules to bring specific domain knowledge to the forefront.
The 13B Platypus model can be trained on a single A100 GPU in 5 hours with 25k questions, offering strong performance with significantly less data and compute resources than other state-of-the-art LLMs, opening opportunities for more improvements in the field.
Lee, Hunter, Ruiz
Publication Date: 14 Aug , 2023
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
Main Highlight:TeCH has introduced a new method for reconstructing 3D clothed human figures from a single image, addressing the unsolved challenge of accurately restoring "unseen regions" of the body. The method leverages descriptive text prompts and a personalized Text-to-Image diffusion model to optimize the 3D human's geometry and texture.
The technology shows promise for various applications in augmented and virtual reality, gaming, and more, but also raises concerns regarding potential misuse for deep-fake avatars and intellectual property issues.
Huang, Yi, Xiu et.al
Publication Date: 16 Aug , 2023
XSTEST: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Main Highlight:This paper introduces a test suite called XSTEST to identify exaggerated safety behaviors in large language models (LLMs), such as refusing safe requests due to misinterpretation as unsafe.
Using XSTEST, the authors found that the Llama2 model exhibits substantial exaggerated safety behavior, refusin1g 38% of test prompts and partially refusing another 22%.
The findings suggest that this behavior is a result of lexical overfitting, making models overly sensitive to certain words, and that OpenAI's GPT-4 is better calibrated in comparison.
Röttger, Kirk, Vidgen, Attanasio, Bianchi et.al
Publication Date: 2 Aug , 2023
Development
Llama-GPT
Main Highlight:A self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device.
Author: GetUmbrel
Danswer
Main Highlight: Danswer allows you to ask natural language questions against internal documents and get back reliable answers backed by quotes and references from the source material so that you can always trust what you get back. You can connect to a number of common tools such as Slack, GitHub, Confluence, amongst others.
Author: Danswer-AI
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Main Highlight: This repo introduces the content deformation field CoDeF, a new video representation comprising a canonical content field for static content and a temporal deformation field for transformations along time.
The CoDeF system has been designed to lift image algorithms for video processing, enabling image-to-image translation to be adapted for video-to-video translation and keypoint detection for keypoint tracking without training. This approach provides superior cross-frame consistency in processed videos and can track non-rigid objects like water and smog.
Author: Ouyang, Wang, Xiao, Bai, Zhang, Zhou et.al
Practical Tutorials and Resources
Stanford Course on LLMs - Notes are open-sourced in the given links
Liang, Hashimoto, Re, Bommasani, Xie | Stanford
Large Language Models: Foundation Models from the Ground Up
Databricks course available on EDX, starting from ground up to the main elements of Generative AI. Complementary for the LLM specialty certification.
Zaharia, Raymond, Eng | Databricks
Various tutorials which integrate LangChain with specific tools or frameworks, such as Predibase, Zep or Qdrant
LangChain et. al
DeepLearning AI: Large Language Models with Semantic Search
Free tutorial teaching concepts like dense retrieval, which elevates the relevance of retrieved information, leading to improved search results beyond traditional keyword search, and reranking, which injects the intelligence of LLMs into asearch system, making it faster and more effective.
Cohere x DeepLearning
DeepLearning AI: Evaluating and Debugging Generative AI
Free tutorial teaching concepts like monitoring and tracing of LLMs over time in complex interactions and properly debugging by use case.
W&B x DeepLearning
DeepLearning AI: Building Generative AI Applications with Gradio
Free tutorial teaching how to create apps which integrate easily Generative AI (Hugging Face Integrations).
Hugging Face x DeepLearning