Weight Scope Alignment Method that Utilizes Weight Scope Regularization to Constrain the Alignment of Weight Scopes during Training

Model fusion involves merging multiple deep models into one. One intriguing benefit of model interpolation is that it can deepen researchers' understanding of neural networks' mode connectivity. In federated learning, intermediate models are typically sent from edge nodes to a server, where they are merged. This process has sparked significant interest among researchers due to its importance in various applications. The primary goal of model fusion is to enhance generalizability, efficiency, and robustness while preserving the capabilities of the original models.

The method of choice for model fusion in deep neural networks is coordinate-based parameter averaging: federated learning aggregates local models from edge nodes, and mode connectivity research uses linear or piecewise interpolation between models. Parameter averaging has appealing properties, but it may perform poorly in more complicated training situations, such as Non-Independent and Identically Distributed (Non-I.I.D.) data or differing training conditions. For instance, because Non-I.I.D. data in federated learning makes local node data inherently heterogeneous, model aggregation suffers from diverging update directions. Studies also show that neuron misalignment, caused by the permutation invariance of neural networks, further increases the difficulty of model fusion. Approaches have therefore been put forward that regularize elements individually or reduce the impact of permutation invariance. However, few of these approaches have considered how differing model weight ranges affect model fusion.

A new study by researchers at Nanjing University explores merging models under different weight scopes and the impact of training conditions on weight distributions (referred to as 'weight scope' in this study). This is the first work that formally investigates the influence of weight scope on model fusion. After running experiments under varied data quality and training hyperparameter settings, the researchers identified a phenomenon they call 'weight scope mismatch': the weight scopes of converged models differ significantly. Although each distribution is well approximated by a Gaussian, the work shows considerable shifts in model weight distributions across training settings, with parameters from models trained with the same optimizer clustering far more closely than those trained with different optimizers. This inconsistency matters because a mismatched weight scope leads to poor linear interpolation between models. The researchers explain that parameters with similar distributions are much easier to aggregate than dissimilar ones, making the fusion of models with mismatched parameters substantially harder.

Each layer's parameters follow a simple distribution: a Gaussian. This observation motivates a straightforward method of parameter alignment. The researchers use a target weight scope to guide training so that the weight scopes of the models to be merged stay in sync. For more complicated multi-stage fusion, they aggregate the target weight scope statistics from the means and variances of the parameters in the to-be-merged models. The proposed approach is named Weight Scope Alignment (WSA), and its two components are weight scope regularization and weight scope fusion.
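To make the regularization idea concrete, here is a minimal PyTorch-style sketch of what a weight scope penalty could look like. It is an illustration based on the description above, not the authors' implementation: the per-layer targets (`target_mu`, `target_sigma`) and the penalty weight `lam` are assumptions.

```python
import torch

def weight_scope_penalty(model, targets, lam=1e-3):
    """Penalize deviation of each layer's weight mean/std from a target scope.

    targets: dict mapping parameter name -> (target_mu, target_sigma).
    Returns a scalar regularization term to add to the task loss.
    """
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        if name in targets:
            target_mu, target_sigma = targets[name]
            # Align the first two moments of the layer's weight distribution.
            penalty = penalty + (param.mean() - target_mu) ** 2
            penalty = penalty + (param.std() - target_sigma) ** 2
    return lam * penalty

# Usage during training: loss = task_loss + weight_scope_penalty(model, targets)
```

In a federated setting, the targets would be shared across nodes so every local model is pulled toward the same weight scope before averaging.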

The team studies the benefits of WSA against related techniques by applying it in mode connectivity and federated learning scenarios. By training weights to stay as close as possible to a given target distribution, WSA optimizes for successful model fusion while balancing specificity and generality. It addresses the drawbacks of existing methods and compares favorably with related regularization techniques such as the proximal term and weight decay, providing valuable insights for researchers and practitioners in the field.


Srcbook: A New Open-Source Application for Prototyping in TypeScript

Observable notebooks are designed to serve data visualizations as static webpages, presenting data with plots, charts, graphs, and other techniques. Their main focus is on use cases in business analytics, research, reporting, and data journalism, as their explore and about pages make clear.

Meet Srcbook, a platform that serves as both a learning tool and a development assistant for everyday software development. Srcbook can be used to test code or HTTP endpoints (like a scripting version of Postman), explore libraries on npm, develop ideas in code, and more. Its AI capabilities can write whole srcbooks or modify existing ones, and because Srcbook is both a coding and an execution environment, AI-written code can be executed immediately. Letting Srcbook run your code as well as create it makes it a better AI playground and shortens development cycles. Note that code is never run without your permission.

Srcbook is a notebook-style programming environment and an ideal way to prototype code, explore libraries, study concepts, and iterate quickly on ideas in TypeScript. It has strong AI capabilities, runs locally, and exports to markdown for maximum portability. In short, Srcbook is a polished, open-source, local-first tool for writing and iterating on TypeScript code.

Key Features

  • Create, run, and share reproducible programs and ideas
  • Exports to a valid .src.md markdown format for version control and simple sharing
  • AI capabilities for exploring and iterating on concepts
  • Runs locally, behind a web interface
  • Powered by Node.js
  • Open source, licensed under Apache-2.0
  • Full access to the npm ecosystem
  • AI-assisted coding

Srcbook is a CLI program with a web interface that runs locally on your computer. Node.js v20+ and nvm (for managing local Node versions) are prerequisites. The command "npm install -g srcbook" installs the srcbook application from npm.

Srcbook currently executes your code in a Node.js process, which makes it best suited to backend code or code shared between the frontend and backend. It also means you can work directly with the file system, create web servers, open database connections, and so on from within Srcbook.

In Conclusion

Srcbook (pronounced "source-book") is an open-source, locally executable TypeScript notebook powered by Node.js. It works great for collaborating on ideas, exploring code, and rapid prototyping, and it draws inspiration from Elixir's Livebook and Python's Jupyter.

Guided Reasoning: A New Approach to Improving Multi-Agent System Intelligence

Gregor Betz of Logikon AI and KIT introduces Guided Reasoning. A multi-agent system with a guide agent and at least one client agent is a Guided Reasoning system if the guide interacts with the clients, deliberately and predominantly, to get them to reason in a way that follows a particular method M. The reasoning method M can be specified through standards and criteria, clear examples, or detailed rules and directions. Examples of Guided Reasoning include a coach helping a business unit carry out a SWOT analysis, a child helping their grandmother solve a crossword puzzle, and a Socratic dialogue.

At first glance, the case for AI-AI Guided Reasoning is based on these assumptions:

  1. AI should give correct answers and explain them. 
  2. AI systems can only honestly explain their answers if those answers are based on explicit reasoning. 
  3. Poor reasoning makes it harder for AI systems to give correct answers.
  4. Strong experts in a field don't always know how to use advanced reasoning techniques.

The cognitive specialization principle says that to build AI systems that are both explainable and accurate, more AI experts for reasoning methods (meta-reasoning specialists) should be added to work alongside experts in other domains. Guided Reasoning is a good design technique for advanced GenAI apps because it makes it easy to divide the cognitive work.

Logikon's standard way of using Guided Reasoning specifies that when client agents face a decision problem, they are instructed to explore and carefully weigh both the pro and con reasons.

  • Step 1: The Guided Reasoning process starts when the user query is submitted. This might happen immediately, with the client model calling a tool-use method, or when the user specifically asks for it.
  • Step 2: The client presents the problem statement to the guide. The guide's crucial role is to meticulously organize the reasoning steps that will be used to find the answer, giving the process a clear structure.
  • Step 3: The guide may ask the client questions.
  • Step 4: The guide receives the client's answers.
  • Step 5: The answers are further processed and reviewed.

The guide sets the rules for the reasoning process and manages the workflow, either statically or dynamically. After receiving the problem statement (in step 2), the guide rewrites the problem in different ways. Steps 3 and 4 let the client answer the different problem formulations independently of one another, each eliciting its own chain of thought. The guide then compares the candidate answers to determine whether the client understands the problem and what it should say in response. The client receives a properly written explanation and a summary of the reasoning process (a protocol). If the AI has not developed consistent lines of reasoning and answers across the similar problem formulations, the client may simply respond to the original user question.

After receiving the problem statement, the guide tells the client to brainstorm different ways to solve the problem and to list the pros and cons of each candidate solution. The guide uses the reasoning trace produced in this way as a starting point for further analysis. In particular, through a series of steps outlined below, it creates an informal argument map that makes explicit the different arguments put forward during brainstorming and shows how they bear, directly or indirectly, on the competing answer choices.

  • Each argument is represented in the informal argument map by a single claim. 
  • Next, the guide uses the argument map to walk the client through a systematic evaluation of the arguments.
  • The client is tasked with evaluating the persuasiveness of a claim C by examining all the pros and cons that have been deemed plausible.
  • This backward, argument-by-argument review starts at the argument map's leaf nodes and ends with an assessment of how plausible the main claim(s) are.

This is how Logikon typically guides reasoning by weighing pros and cons: a controversial question is assembled into a loose (fuzzy) argument map. Each step is handled by a different analyst class in the Logikon Python program, and the analyst classes mostly use internal LLM workflows to produce the required logical artifacts. (A schematic sketch of such a pipeline follows the list below.) 

  • The IssueBuilder takes the raw reasoning trace and, with the help of expert LLMs, describes the central issue the text is about, which is usually a concise restatement of the original problem. 
  • The ProsConsBuilder uses the reasoning trace to build a multi-root list of pros and cons that addresses the identified issue. This involves several steps: first, all reason statements relevant to the problem are extracted from the reasoning trace, regardless of their valence. Second, these reasons are organized into one or more pros-and-cons lists; this is also the step in which the central root claims are identified and added. Finally, the lists are checked for duplicates and completeness (relative to the reasons extracted at the start) and revised if needed. 
  • The RelevanceNetworkBuilder uses a set of prompt templates to estimate how likely it is that any two reason statements, or any pairing of a reason statement with a core claim, are relevant to each other. This yields a complete graph over all reason statements and main claims, with weighted support and attack relationships. (Any two root claims are assumed to contradict each other maximally.) 
  • The FuzzyArgmapBuilder takes this complete graph and applies an optimal branching algorithm to extract a tree that connects all argument nodes through the strongest edges; it then adds back any further edges whose weights exceed a threshold. The result is a fuzzy argument map, exported in various useful formats, which gives a comprehensive and visually intuitive representation of the argumentation that is easier to understand and analyze.
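The following minimal Python sketch illustrates how such an analyst pipeline could be chained together. It is a schematic based on the description above, not Logikon's actual API: the class interfaces, the `analyze` method, and the placeholder outputs are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Artifacts:
    """Accumulates the logical artifacts produced along the pipeline."""
    reasoning_trace: str
    issue: str = ""
    pros_cons: list = field(default_factory=list)

class IssueBuilder:
    def analyze(self, a: Artifacts) -> Artifacts:
        # Hypothetical: an expert LLM would restate the central issue here.
        a.issue = f"Restated issue for: {a.reasoning_trace[:40]}..."
        return a

class ProsConsBuilder:
    def analyze(self, a: Artifacts) -> Artifacts:
        # Hypothetical: extract reasons from the trace and organize pros/cons.
        a.pros_cons = [("pro", "reason 1"), ("con", "reason 2")]
        return a

# RelevanceNetworkBuilder and FuzzyArgmapBuilder would follow the same shape,
# each reading the artifacts produced so far and adding its own.

def run_pipeline(trace: str, analysts) -> Artifacts:
    artifacts = Artifacts(reasoning_trace=trace)
    for analyst in analysts:
        artifacts = analyst.analyze(artifacts)
    return artifacts

result = run_pipeline("Brainstormed reasoning...", [IssueBuilder(), ProsConsBuilder()])
print(result.issue, result.pros_cons)
```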

California's AI Safety Bill Sparks Controversy in Silicon Valley

If you follow AI news regularly, California's AI safety bill should have caught your attention: it is fueling intense debate in Silicon Valley. SB 1047, the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act, has been passed by the State Assembly and Senate, a big step forward in California's efforts to regulate artificial intelligence (AI). The bill has generated considerable controversy in the tech community, especially in Silicon Valley, and now awaits Governor Gavin Newsom's decision.

What is SB 1047?

SB 1047 is one of the first significant AI laws in the US, and its goal is to lower the risks of using advanced AI models. The bill targets large AI developers, particularly those working on models that take a long time to train and cost at least $100 million. It requires these companies to set up strict safety procedures, including an "emergency stop" button, testing methods to look for potential risks, and yearly third-party audits of their safety practices.

The bill also creates the Board of Frontier Models, a new governing body charged with ensuring compliance and advising on safety issues. Its members, chosen by the governor and legislature, will come from the AI industry, academia, and the open-source community.

Supporters vs. Opponents

Supporters of SB 1047 argue that the bill is necessary to prevent potential misuse of AI, such as AI-powered hacking or the development of autonomous weapons. State Senator Scott Wiener, the bill’s author, emphasizes the need for swift action, drawing on past struggles to regulate social media and data privacy. 

"Let's not wait for something bad to happen. Let's just get ahead of it," Wiener said, emphasizing the importance of putting safety measures in place before AI technologies become a global threat.

Geoffrey Hinton and Yoshua Bengio, two well-known AI researchers, have backed the bill out of concern over the existential risks posed by unchecked AI development. Groups like the Center for AI Safety have also supported it, arguing that preventing a major AI safety incident is in the tech industry's long-term interest.

Much of Silicon Valley, on the other hand, strongly opposes the bill, arguing that SB 1047 could stifle innovation, especially for startups and open-source AI developers. Venture capital firms such as Andreessen Horowitz (a16z) worry that the bill's thresholds are poorly chosen and could hurt the AI ecosystem: as AI models get more expensive, more startups will cross the $100 million line and face the bill's strict requirements, which could slow down growth.

Even tech giants like Meta, OpenAI, and Google have voiced concerns. OpenAI argues that AI-related national security measures should be regulated at the federal level, not by individual states. Yann LeCun, Meta's chief AI scientist, criticizes the bill as an overreaction to what he perceives as an "illusion of existential risk."

Changes and the Way Forward

In response to the backlash, several changes were made to SB 1047: potential criminal penalties were converted to civil ones, and the California attorney general's enforcement powers were narrowed. The amendments have softened the resistance; Dario Amodei, CEO of Anthropic, said the bill's benefits now "likely outweigh its costs."

Even with these changes, the bill remains controversial. Prominent figures, including Congressman Ro Khanna and Speaker Nancy Pelosi, worry that SB 1047 could hurt California's innovation environment. The US Chamber of Commerce has also criticized the bill, warning that it might push tech companies out of the state.

Governor Newsom’s Decision

With the bill now on his desk, the tech industry is watching closely to see what Governor Newsom does. He has until the end of September to sign SB 1047 into law or veto it. If signed, the bill would set a major precedent for AI regulation in the US, with potential effects on the tech business worldwide.

Whether or not the bill becomes law, the argument over SB 1047 shows how hard it is to regulate emerging technologies like AI. California sits at the center of the AI revolution and is still working out how to balance innovation with safety concerns.


Sources:

  • https://www.morganlewis.com/pubs/2024/08/californias-sb-1047-would-impose-new-safety-requirements-for-developers-of-large-scale-ai-models
  • https://www.theverge.com/2024/8/28/24229068/california-sb-1047-ai-safety-bill-passed-state-assembly-governor-newsom-signature
  • https://www.techtimes.com/articles/307315/20240830/california-sb-1047-k-controversial-ai-safety-bill-recently-passed.htm

Poplar: A Distributed Training System that Extends Zero Redundancy Optimizer (ZeRO) with Heterogeneous-Aware Capabilities

Training a model now requires more memory and computing power than a single accelerator can provide, due to the exponential growth of model parameters. Effective use of combined processing power and memory across a large number of GPUs is essential for training models at scale. Assembling a cluster of many identical high-end GPUs usually takes considerable time, whereas acquiring a sufficient number of heterogeneous GPUs is typically not a problem. The limited number of consumer-grade GPUs available to some researchers makes it impossible for them to train massive models independently, and buying new equipment is expensive because GPU products are released so frequently. Properly employing heterogeneous GPU resources can tackle these issues and speed up model exploration and experiments. Most distributed training methods, however, assume that all workers are identical; applied directly to heterogeneous clusters, they incur substantial downtime during synchronization.

Incorporating heterogeneity into the search space of auto-parallel algorithms has been the subject of numerous studies. However, previous studies address only certain aspects of heterogeneity: they run smoothly only on GPUs that differ in both architecture and memory (such as a V100 and an A100), which hinders the efficient exploitation of real heterogeneous GPU clusters. Given the characteristics of 3D parallelism, current approaches fail in two cases: (1) when the only difference is in memory capacity and computation capability, as with an A100-80GB and an A100-40GB, and (2) when the numbers of each kind of heterogeneous GPU are not equal.

Poplar, a groundbreaking distributed training system, has been developed by a team of researchers from Peking University, the PLA Academy of Military Science, and the Advanced Institute of Big Data. This innovative system takes a comprehensive approach to GPU heterogeneity, considering computing capabilities, memory capacity, quantity, and their combinations. By expanding ZeRO to include heterogeneous GPUs and independently assigning jobs to each GPU, Poplar ensures maximum global throughput. The team also introduces a novel method for evaluating GPU heterogeneity, conducting granular analyses for each ZeRO stage to bridge the performance gap between the cost model and real-world results. 

The team created a search algorithm that works independently of the batch allocation approach to guarantee that the load is balanced. It removes the need for manual modification and expert knowledge by automatically determining the optimal configuration across heterogeneous GPUs. 
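As an illustration of heterogeneity-aware batch allocation, the sketch below assigns each GPU a share of the global batch proportional to its measured throughput, capped by its memory limit. This is a simplified sketch of the general idea, not Poplar's actual algorithm; the throughput and memory figures are made-up assumptions.

```python
def allocate_batches(global_batch, throughputs, max_batches):
    """Split a global batch across heterogeneous GPUs.

    throughputs: measured samples/sec per GPU (proxy for compute capability).
    max_batches: per-GPU batch-size ceilings implied by memory capacity.
    """
    total = sum(throughputs)
    # Proportional share by throughput, clipped to each GPU's memory ceiling.
    alloc = [min(round(global_batch * t / total), cap)
             for t, cap in zip(throughputs, max_batches)]
    # Hand any remainder to the fastest GPUs that still have memory headroom.
    remainder = global_batch - sum(alloc)
    for i in sorted(range(len(alloc)), key=lambda i: -throughputs[i]):
        take = min(max_batches[i] - alloc[i], remainder)
        alloc[i] += take
        remainder -= take
    return alloc

# Example: an A100-80GB, an A100-40GB, and a consumer GPU sharing one batch.
print(allocate_batches(512, throughputs=[300, 250, 90], max_batches=[320, 160, 96]))
# -> [280, 160, 72]
```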

The researchers used three heterogeneous GPU clusters in their tests, each containing two different kinds of GPUs. To measure end-to-end cluster utilization, they report TFLOPs (FLOPs/1e12), averaging each experiment over 50 repetitions. They validated performance in the key experiments using Llama, then assessed generalizability with Llama and BERT at different sizes. Throughout their trials, they keep the global batch size at 2 million tokens. 

Four baselines clearly demonstrate Poplar's speedups. Baseline 1 uses only the less powerful homogeneous GPUs, while baseline 2 uses only the more powerful ones. Baseline 3 uses DeepSpeed, an advanced distributed training system, with manually assigned maximum batch sizes that satisfy the constraints. Baseline 4 is Whale, the gold standard among heterogeneous training systems providing hetero-aware load balancing, with batch sizes tuned to the maximum consistent with its strategy. Findings on the three real-world heterogeneous GPU clusters show that Poplar outperforms all of these approaches in training speed.

The team intends to investigate using ZeRO in heterogeneous clusters with network constraints. They also plan to explore the possibility of an uneven distribution of model parameters among diverse devices, taking into account their memory sizes.


Fortress: An Orchestration Platform for SaaS Applications, Allowing them to Manage a Multi-Instance Database Architecture in their Own Cloud Easily

For reasons of cost, latency, and data control, SaaS companies eventually move off third-party managed database platforms and into their own cloud, such as Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure. They also transition from a single shared database architecture to a multi-instance database architecture to meet performance, compliance, and enterprise data isolation requirements. Both shifts create operational problems for SaaS companies.

On top of complicated documentation and software development kits (SDKs), existing cloud-native database services demand even more DevOps work to orchestrate individual database instances: schema migrations, connection pooling, versioning across instances, and so on.

Meet Fortress, a database orchestrator that streamlines the DevOps work of maintaining a private cloud architecture with many database instances. Fortress is an orchestration platform that SaaS apps can use to easily manage their cloud-based database architecture, which can include both dedicated and shared instances.

SaaS apps can rely on Fortress as their optimal database platform as they grow. The advent of bring-your-own-cloud (BYOC) let the Fortress team build a state-of-the-art database-as-a-service (DBaaS) that takes advantage of the latest database developments and integrates them into customers' private clouds.

Benefits and main characteristics of Fortress

Strong Data Privacy and Security: By design, Fortress establishes separate database instances for every client, guaranteeing unmatched data privacy and security. This method assists SaaS companies in meeting demanding compliance standards while reducing the likelihood of data breaches.

Simplified DevOps: Fortress makes database management easy by automating the provisioning, scaling, and management of database instances, freeing DevOps teams to concentrate on higher-value work.

Deployment Options: Fortress lets you choose between fully managed and self-managed services, so you can tailor deployment to your specifications: set it up in your own cloud while drawing on the platform's expertise and infrastructure.

Accessible to Developers: The platform's developer-friendly tools and APIs make working with Fortress a breeze, shortening both development cycles and time-to-market.

Improved Safety: The platform uses advanced security protocols to keep user information safe. Protecting private data is easier with encrypted data transmission, network isolation, and role-based access control.

In Conclusion

A database orchestration platform like Fortress makes it easier to handle separate customer databases in a software-as-a-service setting, which makes it an attractive option. Businesses can use the platform's hybrid approach, setting up dedicated or shared database instances depending on their customers' needs. 

When it comes to improving their data management capabilities, SaaS organizations should seriously consider Fortress. Its strong security features, streamlined DevOps, and emphasis on segregated client databases make it an attractive option for companies of any size.

Saldor: The Web Scraper for AI

The quantity and quality of data directly impact the efficacy and accuracy of AI models, and getting accurate, relevant data is one of the biggest challenges in AI development. LLMs require current, high-quality internet data to address certain issues, yet compiling data from the internet is challenging: coordinating crawlers, locating the interesting pages within a website, preserving context from page layouts, and more. And because this data changes over time, keeping the store up to date can be expensive and time-consuming.

Meet Saldor, which gathers and preserves the best web data for RAG. Saldor collects material from websites through clever crawling. With only a few lines of code, engineers can turn jumbled online data into tidy, usable output, whether structured JSON for conventional programs or human-readable text for LLMs.

Saldor is a web scraping tool made especially for artificial intelligence use cases. It makes it easier for developers to get the data required to train their AI models by streamlining the process of pulling data from websites. By automating data collection, Saldor saves developers time and effort, freeing them to concentrate on creating and improving their AI models.

Saldor offers user-friendliness, dependability, and high-quality data. By automating the laborious web scraping process, it frees up developers' time for other parts of their AI projects, while providing a configurable and adaptable approach to scraping.

How Does Saldor Work?

Saldor works by following several key steps:

Target Selection: Users specify the domains or web pages they wish to scrape. This can be URLs, domains, or even specific page components.

Data Extraction: Saldor locates and retrieves the required data from the target websites. This can include text, images, links, and other information.

Data Cleaning: The extracted data is cleaned and formatted to guarantee quality and consistency. This might entail standardizing values, fixing mistakes, or eliminating duplicates.

Data Export: The cleaned data is exported in a suitable format such as CSV, JSON, or XML, making it simple to integrate into AI development workflows.
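To make the four steps concrete, here is a minimal generic Python sketch of the same select-extract-clean-export workflow, using the widely available requests and BeautifulSoup libraries. It illustrates the workflow only and is not Saldor's API; the target URL is a placeholder.

```python
import csv
import requests
from bs4 import BeautifulSoup

# Step 1: target selection (placeholder URL).
url = "https://example.com/articles"

# Step 2: data extraction - pull link titles and hrefs from the page.
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")
records = [{"title": a.get_text(strip=True), "link": a["href"]}
           for a in soup.select("a[href]")]

# Step 3: data cleaning - drop duplicates and empty titles.
seen = set()
cleaned = []
for r in records:
    if r["title"] and r["link"] not in seen:
        seen.add(r["link"])
        cleaned.append(r)

# Step 4: export to CSV for downstream AI workflows.
with open("scraped.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "link"])
    writer.writeheader()
    writer.writerows(cleaned)
```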

In Conclusion

With Saldor, an AI web scraper, you can quickly turn a website into a RAG agent. Saldor is an effective tool that simplifies web scraping for AI development: by automating data collection and guaranteeing data quality, it helps AI developers create more precise and useful models.

Sepal AI: A Data Development Platform that Enables You to Curate Useful Datasets

For optimal performance, AI models require top-notch data; unfortunately, obtaining and organizing this data can be quite a challenge. Publicly available datasets may be inadequate, too broad, or contaminated for some purposes, and domain experts can be hard to find, which limits many datasets. In a world where AI propels economic growth and promotes scientific research, there is a need for golden datasets and frontier benchmarking across several use cases: evaluation, where a model's efficacy is tested iteratively on different scenarios; data for training, for those who want to boost a model's performance with RLHF and fine-tuning; and red-teaming, to assess and predict the safety of LLMs before releasing them into the wild.

Most frontier data requires domain knowledge (e.g., medicine, biology, physics, finance) that is difficult to collect and curate, and advanced data is essential to deploy and scale AI safely. Yet gathering this information is no picnic, and the publicly available benchmarks, such as MMLU, GPQA, and MATH, are contaminated and overly simplistic, making them of little use to the people who build products and models.

Meet Sepal AI, a data development platform that lets you curate valuable datasets. Sepal offers advanced data and tools to promote ethical AI development; by developing AI responsibly, Sepal AI aims to expand human knowledge and capabilities.

Sepal AI places a high value on responsible practices and acknowledges the ethical considerations surrounding AI development. The platform helps build AI models that are fair, impartial, and good for society by providing resources for creating high-quality data. By incorporating human expertise, synthetic data augmentation, data generation tools, and stringent quality control, Sepal AI makes it easy to oversee the creation of reliable datasets.

Sepal AI is involved in the following engagements:

  • Molecular and Cellular Biology Benchmark: A novel benchmark for comparing models' complex reasoning abilities, developed by a group of highly regarded American PhD scientists.
  • Finance Q&A + SQL Eval: A golden dataset to evaluate an AI agent's database querying skills and its ability to answer complex finance questions at the level of human experts.
  • Uplift Trials & Human Baselining: Comprehensive end-to-end support for safe, in-person model evaluations.

In Conclusion

Sepal AI addresses this data shortage by enabling individuals and companies to develop meaningful datasets. It provides an all-encompassing approach to data development by integrating tools for data generation, synthetic data augmentation, and stringent quality control with an expert network.

DeepSim: AI-Accelerated 3D Physics Simulator for Engineers

Physics simulation uses computational power to solve the mathematical models that describe physical events. When dealing with complex geometries, fluid dynamics, or large-scale systems, the processing demands of these simulations can be enormous, but the insights they bring are vital. 3D physics simulations are time-consuming, costly, and painful to run. Experienced engineers invest significant time meshing their designs before even running a single simulation; many simulations take hours, if not weeks, on pricey computing systems; and some are so difficult that even skilled engineers must simplify them, greatly compromising accuracy. Simulators are delicate and can crash at any moment, so continual monitoring is required at every stage.

Meet DeepSim, a revolutionary AI simulation platform that automates the physics setup, allowing design simulations to run 1000X faster without sacrificing accuracy. The platform delivers solutions efficiently and rapidly by combining a powerful GPU-accelerated solver with lightweight, easily trainable AI models. This technology removes the bulkiness of classic finite element method (FEM) tools and overcomes the rigidity of other AI physics simulators.

The DeepSim team has developed the most advanced thermal simulator for circuit design in collaboration with semiconductor companies. This simulator can run thermal simulations with billions of nodes in minutes on a single GPU, providing a thousand times higher detail than a commercial tool using fifty or more CPU cores simultaneously.

Modeling exceedingly complicated geometries with length scales spanning six orders of magnitude, which are impossible to solve with FEM tools, is now possible using DeepSim's simulator. For example, DeepSim can complete a 10-minute thermal simulation of an integrated circuit with all the necessary components (heat sink, airflow, etc.), resolving hot spots in individual transistors (~10 nm in size) within a complete chip (1 cm).

To summarize

DeepSim is developing an AI-powered 3D physics simulator that can complete complicated simulations in minutes rather than days, thanks to its 1000X speed advantage over current FEM tools. The company was started by Stanford PhDs with extensive experience in GPU-accelerated solvers and thermal simulation of semiconductor devices and circuits. Quick and straightforward 3D physics simulation is now within engineers' reach: faster design iterations lead to better products, and real-world monitoring supports better decisions.

Revolutionizing Deep Model Fusion: Introducing Sparse Mixture of Low-rank Experts (SMILE) for Scalable Model Upscaling

The training of large-scale deep models on broad datasets is becoming increasingly costly, in both resources and environmental impact, due to the exponential growth of model and dataset sizes in deep learning. Deep model fusion techniques are a new, potentially game-changing approach: they combine the insights of several models into one without requiring substantial retraining. Combining the strengths of numerous models in this way decreases computational costs and enables the production of more robust and versatile models.

Model fusion approaches fall into three primary groups: model ensemble, merging, and mixing. Model ensemble techniques combine the predictions of multiple models to improve performance; ensembles can enhance training through knowledge distillation, but their memory and computation costs are high. Model merging approaches instead combine different models' parameters, usually by aligning or weighting them. Model mixing methods incorporate several models through depth concatenation or gating mechanisms, enabling more adaptable and flexible fusion strategies; these techniques shine in multi-task training because a single combined model can handle all the tasks. Model fusion has come a long way, but major obstacles still prevent it from reaching its full potential. Interference between model parameters, which can degrade performance, is a major cause for concern. Another big problem with fusion is limited interpretability: to understand combined models, it is important to know how their parameters are combined.

Researchers from Wuhan University, Sun Yat-sen University, JD Explore Academy, Beijing Institute of Technology, and Nanyang Technological University offer a new subspace viewpoint for understanding and solving the parameter interference problem, instead of depending on heuristic approaches or simplified assumptions. Using matrix decomposition, they first examine the fine-tuning of linear layers from a subspace-analysis perspective. This makes it possible to break a fine-tuned model's prediction into parts comprising both pre-trained knowledge and task-specific adaptation, giving a better understanding of how models adapt to downstream tasks while maintaining pre-trained information. 

Building a more thorough understanding of fine-tuning from their experimental data, the researchers recast parameter interference as an optimization problem, offering a more rigorous and quantifiable viewpoint. On this basis, they present the zero-shot Sparse MIxture of Low-rank Experts (SMILE), which upscales existing source models. The approach's zero-shot nature allows fused models to be deployed immediately in new contexts or tasks, significantly reducing the time and resources usually needed for model development.
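To illustrate the low-rank expert idea, the sketch below extracts a rank-k approximation of a fine-tuned model's weight delta via SVD, which is how a lightweight task expert could be carved out of a full fine-tune. This is a hedged sketch of the general construction suggested by the paper's subspace analysis, not the authors' code; the rank `k` and the toy shapes are illustrative assumptions.

```python
import torch

def low_rank_expert(w_pretrained, w_finetuned, k=8):
    """Compress the task-specific update (delta) into a rank-k expert."""
    delta = w_finetuned - w_pretrained
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    # Keep only the k most significant directions of the update.
    A = U[:, :k] * S[:k]          # (out_dim, k)
    B = Vh[:k, :]                 # (k, in_dim)
    return A, B

def expert_forward(x, w_pretrained, A, B):
    """Pre-trained path plus the low-rank task-specific correction."""
    return x @ w_pretrained.T + (x @ B.T) @ A.T

# Toy check: a rank-k expert closely reproduces the fine-tuned layer
# when the fine-tuning update is (approximately) low-rank.
torch.manual_seed(0)
w0 = torch.randn(64, 32)
w1 = w0 + torch.randn(64, 8) @ torch.randn(8, 32) * 0.1
A, B = low_rank_expert(w0, w1, k=8)
x = torch.randn(4, 32)
print(torch.allclose(expert_forward(x, w0, A, B), x @ w1.T, atol=1e-4))  # True
```

In a full mixture, one such expert per source model would sit alongside the shared pre-trained weights, with a sparse router choosing which experts to activate per input.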

They suggested the method’s efficacy stems from two important findings in subspace analysis:

  1. When adapting to new tasks, fine-tuning mostly uses less significant or previously unused dimensions of the parameter space while preserving the most important pre-trained weights. The parameter subspace needed to encode new information may differ from task to task, but this preservation guarantees that the crucial knowledge encoded during pre-training survives fine-tuning.
  2. Parameter interference is intractable in the original parameter space. However, as the dimensionality of the model increases, it becomes more tractable: the extra dimensions give task-specific parameter modifications more "room" to coexist.

The researchers performed comprehensive tests spanning various tasks and models in the vision and language domains, using both Low-Rank Adaptation (LoRA) and classic full fine-tuning. According to the findings, fully fine-tuned models can attain around 98-99% of the performance of eight separate fine-tuned models by adding about 50% more parameters, while LoRA fine-tuned models keep 99% of the individual performance with only a 2% increase in parameters, demonstrating the efficiency and practicality of the approach. The system also offers performance-size trade-offs by varying the rank k of the local experts.

Even though the MoE approach is sparsely activated for efficiency, it still adds computational cost, particularly as the number of tasks or experts grows. The team suggests that by identifying the subspaces with the most impact on task-specific performance, it is possible to develop more efficient, focused fine-tuning strategies that update only the parts of the model that need it. Other domains, such as multimodal large language models, can also benefit from this strategy, since different data types (modalities) can be treated as independent experts. 

